Unrelated Files and Panel of Normal
For many of the CGAP Pipelines, a collection of 20 de-identified UGRP samples is used to aid in filtering common variants. This documentation page outlines how the reference files derived from those samples were created.
SNV Pipeline - Unrelated RCK files
Sentieon
20 unrelated fastq files from the UGRP dataset were run through the Upstream Sentieon module (v1.0.0) to generate analysis-ready bam files.
The bam files were then processed using a custom module (SNV Unrelated, v1.0.0) that executes the granite mpileupCounts and rckTar commands.
The final file was uploaded to the CGAP Portal as:
196ef586-be28-40c5-a244-d739fd173984/GAPFIMO8Y4K1.rck.tar
GATK
20 unrelated fastq files from the UGRP dataset were run through the Upstream GATK module (v1.0.0) to generate analysis-ready bam files.
The bam files were then processed using a custom module (SNV Unrelated, v1.0.0) that executes the granite mpileupCounts and rckTar commands.
The final file was uploaded to the CGAP Portal as:
eac862c0-8c87-4838-83cb-9a77412bff6f/GAPFIMO8Y4PZ.rck.tar
Somatic Sentieon - Panel of Normal (PON)
20 unrelated fastq files from the UGRP dataset were run through the Upstream Sentieon module (v1.0.0) to generate analysis-ready bam files.
Following the protocol from Sentieon, each resulting bam file was run individually through the Somatic Sentieon Tumor Only module (v1.0.0), using the GAPFI4LJRN98.vcf.gz dbSNP file for known SNPs.
The 20 resulting vcf output files were merged into a single file using BCFtools (1.10.2).
The merged file was uploaded to the CGAP Portal as:
833c91e9-a8cd-470e-8100-32b49ed14159/GAPFIV1QKYU9.vcf.gz
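The merge step above can be sketched as a dry run. This is a minimal illustration, not the command that was actually executed: the helper name build_merge_cmd and all file names are hypothetical, and any extra merge options used for the real panel are not stated on this page. The sketch only prints the command, so it can be inspected before running it against real files. Note that bcftools merge expects bgzip-compressed, indexed inputs.

```shell
# Hypothetical helper (not part of the CGAP pipelines): assemble a
# `bcftools merge` command line for a set of per-sample VCFs.
# Arguments: output file name, then one or more input VCFs.
# Simple sketch: file names containing spaces are not handled.
build_merge_cmd() {
    out="$1"
    shift
    # -O z writes bgzip-compressed VCF, -o names the output file
    printf 'bcftools merge -O z -o %s' "$out"
    for vcf in "$@"; do
        printf ' %s' "$vcf"
    done
    printf '\n'
}

# Dry run with placeholder file names: print the command only.
merge_cmd="$(build_merge_cmd pon_merged.vcf.gz sample_01.vcf.gz sample_02.vcf.gz)"
echo "$merge_cmd"
```

With all 20 per-sample files passed as inputs, the printed command could be reviewed and then executed by hand.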
SV Pipeline - Manta
20 unrelated fastq files from the UGRP dataset were uploaded to the (now decommissioned) cgap-wolf environment.
Each of the 20 samples was run through the Upstream GATK module (v24), ending with a final bam file following workflow_gatk-ApplyBQSR.
Each of the resulting final bam files was run through a proband-only Manta workflow (v2) to produce vcf files.
The resulting vcf files were downloaded to a folder named unrelated, which was bundled into a tar archive (this command archives the folder without compressing it):
tar -cvf unrelated.tar unrelated
This file was uploaded to the CGAP Portal as:
cd647c0c-ac11-46db-9c51-bfe238e9ac13/GAPFIH794KXC.vcf.tar
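The archiving step can be reproduced and checked on dummy data. The sketch below is an illustration under assumptions: the sample file names are placeholders and the files are empty stand-ins for real Manta output. It runs the same tar command in a temporary directory and then lists the archive to count the vcf entries, which is a quick way to confirm that every sample made it into the tarball.

```shell
# Work in a throwaway directory so no real data is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-ins for the downloaded per-sample VCFs (placeholder names, empty files).
mkdir unrelated
for i in 1 2 3; do
    : > "unrelated/sample_${i}.vcf"
done

# Same command as in the text: -c create, -v verbose, -f archive name.
# There is no -z flag, so the archive is not compressed.
tar -cvf unrelated.tar unrelated

# List the archive contents and count the entries ending in .vcf.
n_vcf="$(tar -tf unrelated.tar | grep -c '\.vcf$')"
echo "VCF files in archive: ${n_vcf}"
```

For the real archive, the same listing should report 20 vcf entries, one per unrelated sample.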
CNV Pipeline - BICseq2
20 unrelated fastq files from the UGRP dataset were retrieved from Glacier Deep Archive and uploaded to the current cgap-wolf environment.
Each of the 20 samples was run through the Upstream GATK module (v27), ending with a final bam file following workflow_gatk-ApplyBQSR.
Each of the resulting final bam files was run through the development version of the CNV module (v1), which included only 2 steps (workflow_BICseq2_map_norm_seg and workflow_BICseq2_vcf_convert_vcf-check). This development version still included chromosomes X and Y, which have since been removed from the production version.
The resulting vcf files were downloaded to a folder named unrelated, which was bundled into a tar archive (this command archives the folder without compressing it):
tar -cvf unrelated.tar unrelated
This file was uploaded to the CGAP Portal as:
318788cd-661f-4327-b571-d58a9b7c301e/GAPFICPW2884.vcf.tar