HaplotypeCaller Exome Region File
Data sources and code used to generate the exome region file that GATK HaplotypeCaller uses in WES runs.
VEP v101
Accessed 2021-10-21.
VEP v101 archive website.
VEP v101 GTF file: Homo_sapiens.GRCh38.101.gtf.gz
A copy of this file is also stored within the exome_regions folder of the cgap-annotations S3 bucket.
Reference File Creation
To transform this VEP GTF file into a comprehensive BED file of all possible transcripts and UTR regions, one Python script and two BEDTools (v2.30.0) commands were used:
bgzip -d Homo_sapiens.GRCh38.101.gtf.gz
python exome_hg38_region_of_interest.py Homo_sapiens.GRCh38.101.gtf regions_bed_final.bed
bedtools sort -i regions_bed_final.bed > sort_regions_bed_final.bed
bedtools merge -i sort_regions_bed_final.bed > merge_sort_regions_bed_final.bed
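To make the effect of the sort and merge steps concrete, the following is a minimal Python sketch of the same idea, not a replacement for BEDTools. It assumes BED-style 0-based, half-open intervals and mirrors the default `bedtools merge` behaviour of combining overlapping and book-ended features on the same chromosome. The function name `sort_and_merge` and the `Interval` alias are hypothetical names introduced here for illustration; chromosome names are ordered lexicographically.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) in BED convention: 0-based start, exclusive end.
Interval = Tuple[str, int, int]


def sort_and_merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort intervals by chromosome and start, then merge any that
    overlap or touch (book-ended) on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Merging keeps the final region file free of duplicated or overlapping entries, so each base is listed at most once.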
exome_hg38_region_of_interest.py is available in this repository in /genes/exome_regions/.
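For readers unfamiliar with the conversion, the sketch below illustrates the general GTF-to-BED step that a script of this kind performs. It is not the repository script. The set of feature types kept (`transcript`, `five_prime_utr`, `three_prime_utr`) is an assumption based on the description above, and the names `KEPT_FEATURES` and `gtf_to_bed` are hypothetical. The key detail is the coordinate shift: GTF uses 1-based inclusive coordinates, while BED uses 0-based starts with exclusive ends.

```python
from typing import Iterable, Iterator, Set, Tuple

# Assumption: feature types chosen for illustration only; the actual
# script in /genes/exome_regions/ may select a different set.
KEPT_FEATURES: Set[str] = {"transcript", "five_prime_utr", "three_prime_utr"}


def gtf_to_bed(
    lines: Iterable[str], features: Set[str] = KEPT_FEATURES
) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) BED intervals for selected GTF features."""
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        chrom, _source, feature, start, end = fields[:5]
        if feature in features:
            # GTF: 1-based, inclusive. BED: 0-based start, exclusive end.
            yield chrom, int(start) - 1, int(end)
```

A caller could write each yielded tuple as a tab-separated line to produce a three-column BED file.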