Unrelated Files and Panel of Normals
For many of the CGAP Pipelines, a collection of 20 de-identified UGRP samples is used to help filter out common variants. This page outlines how those files were created.
SNV Pipeline - Unrelated RCK files
Sentieon
Fastq files for 20 unrelated samples from the UGRP dataset were run through the Upstream Sentieon module (v1.0.0) to generate analysis-ready bam files. The bam files were then processed with a custom module (SNV Unrelated, v1.0.0) that runs the granite mpileupCounts and rckTar commands. The final file was uploaded to the CGAP Portal as:
196ef586-be28-40c5-a244-d739fd173984/GAPFIMO8Y4K1.rck.tar
GATK
Fastq files for 20 unrelated samples from the UGRP dataset were run through the Upstream GATK module (v1.0.0) to generate analysis-ready bam files. The bam files were then processed with a custom module (SNV Unrelated, v1.0.0) that runs the granite mpileupCounts and rckTar commands. The final file was uploaded to the CGAP Portal as:
eac862c0-8c87-4838-83cb-9a77412bff6f/GAPFIMO8Y4PZ.rck.tar
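Each final file on this page is referenced in the form <uuid>/<accession>.<extension>. As a small sketch, assuming that layout holds for every reference here (it is inferred from the examples on this page, not from a documented portal API), the three parts can be separated with POSIX parameter expansion. The function name split_portal_ref is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "<uuid>/<accession>.<extension>" into tab-separated fields.
# The layout is inferred from the references on this page, not from a portal API.
split_portal_ref() {
  local ref="$1"
  case "$ref" in
    */*) ;;
    *) echo "not a <uuid>/<file> reference: $ref" >&2; return 1 ;;
  esac
  local uuid="${ref%%/*}"             # text before the first "/"
  local filename="${ref#*/}"          # text after the first "/"
  local accession="${filename%%.*}"   # text before the first "." in the file name
  local extension="${filename#*.}"    # remainder, which may itself contain dots (rck.tar)
  printf '%s\t%s\t%s\n' "$uuid" "$accession" "$extension"
}

split_portal_ref "196ef586-be28-40c5-a244-d739fd173984/GAPFIMO8Y4K1.rck.tar"
# prints: 196ef586-be28-40c5-a244-d739fd173984<TAB>GAPFIMO8Y4K1<TAB>rck.tar
```

The extension is taken from the first dot onward so that multi-part suffixes such as rck.tar and vcf.gz stay intact.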
Somatic Sentieon - Panel of Normals (PON)
Fastq files for 20 unrelated samples from the UGRP dataset were run through the Upstream Sentieon module (v1.0.0) to generate analysis-ready bam files. Following Sentieon's protocol for building a panel of normals, each resulting bam file was run individually through the Somatic Sentieon Tumor Only module (v1.0.0), using GAPFI4LJRN98.vcf.gz (dbSNP) as the known-SNP file. The 20 resulting vcf files were merged using BCFtools (v1.10.2). The merged file was uploaded to the CGAP Portal as:
833c91e9-a8cd-470e-8100-32b49ed14159/GAPFIV1QKYU9.vcf.gz
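The merge step can be sketched as follows. This is an illustration under stated assumptions, not the recorded command: it assumes bcftools merge was the subcommand used, that the per-sample tumor-only VCFs are bgzipped and indexed (which bcftools merge requires), and that the options shown (-m all, --force-samples) reflect the intent. The function names write_vcf_list and merge_pon are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of merging per-sample tumor-only VCFs into one panel-of-normals VCF.
# Assumptions: inputs are bgzipped + indexed; option choices are illustrative.

# Write one VCF path per line; bcftools merge reads this via -l/--file-list.
write_vcf_list() {
  local list_file="$1"; shift
  printf '%s\n' "$@" > "$list_file"
}

# Merge the per-sample VCFs into one bgzipped VCF, then build a .tbi index.
merge_pon() {
  local out_vcf="$1"; shift
  local list_file
  list_file="$(mktemp)"
  write_vcf_list "$list_file" "$@"
  # -m all: merge records at the same position into multiallelic records of any type
  # --force-samples: tolerate duplicate sample names across the inputs
  bcftools merge -m all --force-samples -l "$list_file" -O z -o "$out_vcf" \
    && bcftools index -t "$out_vcf"
  local status=$?
  rm -f "$list_file"
  return "$status"
}

# Example (hypothetical file names):
#   merge_pon pon.vcf.gz sample_*.tumor_only.vcf.gz
```

Passing inputs through a list file rather than on the command line keeps the call identical whether there are 20 inputs or many more.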
SV Pipeline - Manta
Fastq files for 20 unrelated samples from the UGRP dataset were uploaded to the (now decommissioned) cgap-wolf environment. Each of the 20 samples was run through the Upstream GATK module (v24), ending with the final bam file produced by workflow_gatk-ApplyBQSR. Each final bam file was then run through a proband-only Manta workflow (v2) to produce vcf files. The resulting vcf files were downloaded to a folder named unrelated, which was then packed into an uncompressed tar archive:
tar -cvf unrelated.tar unrelated
This file was uploaded to the CGAP Portal as:
cd647c0c-ac11-46db-9c51-bfe238e9ac13/GAPFIH794KXC.vcf.tar
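Since tar -cvf only bundles files, a quick sanity check before upload is to count the VCFs inside the archive. The sketch below assumes the Manta outputs end in .vcf or .vcf.gz; the function names archive_vcf_folder and count_vcfs_in_archive are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for packing the per-sample VCF folder and checking it.
# Assumption: Manta outputs are named *.vcf or *.vcf.gz.

# Bundle a folder into a plain, uncompressed tar archive (same flags as above).
# Usage: archive_vcf_folder <folder> <archive>
archive_vcf_folder() {
  tar -cvf "$2" "$1"
}

# Count archive members whose names end in .vcf or .vcf.gz.
count_vcfs_in_archive() {
  tar -tf "$1" | grep -cE '\.vcf(\.gz)?$'
}

# Example:
#   archive_vcf_folder unrelated unrelated.tar
#   [ "$(count_vcfs_in_archive unrelated.tar)" -eq 20 ] || echo "expected 20 VCFs" >&2
```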
CNV Pipeline - BICseq2
Fastq files for 20 unrelated samples from the UGRP dataset were retrieved from Glacier Deep Archive and uploaded to the current cgap-wolf environment. Each of the 20 samples was run through the Upstream GATK module (v27), ending with the final bam file produced by workflow_gatk-ApplyBQSR. Each final bam file was then run through the development version of the CNV module (v1), which included only two steps (workflow_BICseq2_map_norm_seg and workflow_BICseq2_vcf_convert_vcf-check). Unlike the production version, from which they have since been removed, this development version still included chromosomes X and Y. The resulting vcf files were downloaded to a folder named unrelated, which was then packed into an uncompressed tar archive:
tar -cvf unrelated.tar unrelated
This file was uploaded to the CGAP Portal as:
318788cd-661f-4327-b571-d58a9b7c301e/GAPFICPW2884.vcf.tar
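Because these development-version VCFs still contain chromosomes X and Y, comparing them with production output may call for dropping those records first. A minimal awk sketch, assuming uncompressed VCF input and chr-prefixed contig names (chrX, chrY); adjust the names if the contigs are labeled X and Y. The function name drop_sex_chromosomes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove chrX/chrY records from a VCF.
# Assumptions: uncompressed input, contigs named "chrX" and "chrY".
drop_sex_chromosomes() {
  # Keep every header line (starting with '#') and every data line
  # whose CHROM column (field 1) is neither chrX nor chrY.
  awk -F'\t' '/^#/ || ($1 != "chrX" && $1 != "chrY")' "$1"
}

# Example (hypothetical file names):
#   drop_sex_chromosomes sample.bicseq2.vcf > sample.noXY.vcf
#   zcat sample.bicseq2.vcf.gz | drop_sex_chromosomes - > sample.noXY.vcf
```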