Data Sources

Data sources available for gene annotation.

RefSeq

Files

ncbi refseq RefSeqGene:

  • LRG_RefSeqGene

  • refseqgene.<n>.genomic.gbff.gz

ncbi refseq mRNA_Prot:

  • human.<n>.rna.gbff.gz

ncbi gene:

  • gene2ensembl.gz

  • gene2refseq.gz

Description

LRG_RefSeqGene is a tab-delimited file reporting, for each gene, the accession.version of the genomic RefSeq (RSG) that is the standard reference. It also reports the accession.version of the associated RNA and protein RefSeqs.

#tax_id   GeneID   Symbol   RSG    LRG   RNA    t   Protein   p   Category
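As a rough illustration of how the columns above can be consumed, the following sketch reads LRG_RefSeqGene rows into dictionaries keyed by the header names. The function names `read_lrg_refseqgene` and `rsg_by_gene_id` are hypothetical helpers introduced here, and the code assumes the first line of the file is the `#`-prefixed header shown above.

```python
from typing import Dict, Iterable, Iterator


def read_lrg_refseqgene(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, keyed by the column names of the header."""
    header = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if header is None:
            # Assumption: the first non-empty line is the header and starts
            # with '#'; strip it so that the first key is 'tax_id'.
            fields[0] = fields[0].lstrip("#")
            header = fields
            continue
        yield dict(zip(header, fields))


def rsg_by_gene_id(lines: Iterable[str]) -> Dict[str, str]:
    """Map each GeneID to the accession.version of its RSG.

    If a gene appears on several rows, the first RSG seen is kept.
    """
    mapping: Dict[str, str] = {}
    for row in read_lrg_refseqgene(lines):
        mapping.setdefault(row["GeneID"], row["RSG"])
    return mapping
```

With the file opened in text mode, `rsg_by_gene_id(handle)` would return a plain dictionary that can be queried by GeneID.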

The refseqgene.<n>.genomic.gbff files report annotations for each RSG in GenBank format.

The human.<n>.rna.gbff files report annotations for each RNA and protein RefSeq in GenBank format.
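The GenBank flat-file layout of the .gbff files can be walked with the standard library alone when only record identifiers are needed. The sketch below is not a full GenBank parser: it relies only on the `LOCUS` line opening a record, the `VERSION` line carrying the accession.version, and the `//` line closing a record. The names `iter_genbank_records` and `iter_gbff_gz` are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def iter_genbank_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (locus_name, accession_version) for each GenBank record."""
    locus = ""
    version = ""
    for line in lines:
        if line.startswith("LOCUS"):
            fields = line.split()
            locus = fields[1] if len(fields) > 1 else ""
        elif line.startswith("VERSION"):
            fields = line.split()
            version = fields[1] if len(fields) > 1 else ""
        elif line.startswith("//"):
            # '//' terminates a record in the GenBank flat-file format.
            if locus:
                yield locus, version
            locus = ""
            version = ""


def iter_gbff_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Stream records from a gzip-compressed .gbff file without unpacking it."""
    with gzip.open(path, "rt") as handle:
        yield from iter_genbank_records(handle)
```

Reading features or sequences from these files would call for a complete GenBank parser; this sketch only enumerates which records a file contains.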

gene2ensembl is a tab-delimited file that maps NCBI gene, RNA and protein annotations to their Ensembl counterparts.

#tax_id   GeneID   Ensembl_gene_identifier   RNA_nucleotide_accession.version   Ensembl_rna_identifier   protein_accession.version   Ensembl_protein_identifier
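One way to use these columns is to build a lookup from NCBI GeneID to Ensembl gene identifiers. The sketch below assumes the header line shown above is the first line of the file; the function name `ensembl_genes_by_gene_id` is hypothetical, and the default `tax_id` of "9606" (the NCBI taxonomy identifier for human) is a choice made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


def ensembl_genes_by_gene_id(
    lines: Iterable[str], tax_id: str = "9606"
) -> Dict[str, Set[str]]:
    """Map each NCBI GeneID to the set of Ensembl gene identifiers it matches."""
    columns: Optional[Dict[str, int]] = None
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if columns is None:
            # Assumption: the first line is the '#'-prefixed header.
            fields[0] = fields[0].lstrip("#")
            columns = {name: index for index, name in enumerate(fields)}
            continue
        if len(fields) < len(columns):
            continue  # skip blank or truncated lines
        if fields[columns["tax_id"]] != tax_id:
            continue
        gene_id = fields[columns["GeneID"]]
        mapping[gene_id].add(fields[columns["Ensembl_gene_identifier"]])
    return dict(mapping)
```

A set is used per gene because the file has one row per RNA/protein pair, so the same gene-level match can appear on several rows.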

gene2refseq is a tab-delimited file reporting genomic/RNA/protein sets of matching RefSeqs.

#tax_id   GeneID   status   RNA_nucleotide_accession.version   RNA_nucleotide_gi   protein_accession.version   protein_gi   genomic_nucleotide_accession.version   genomic_nucleotide_gi   start_position_on_the_genomic_accession   end_position_on_the_genomic_accession   orientation   assembly   mature_peptide_accession.version   mature_peptide_gi   Symbol
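To show how the matching sets can be extracted, the sketch below collects the RNA, protein and genomic accession.version values reported for one gene. The function name `iter_refseq_sets` is hypothetical, and treating "-" as the placeholder for a missing value is an assumption that should be checked against the file itself.

```python
from typing import Dict, Iterable, Iterator, Optional

MISSING = "-"  # assumption: placeholder used for empty fields

WANTED = (
    "RNA_nucleotide_accession.version",
    "protein_accession.version",
    "genomic_nucleotide_accession.version",
    "assembly",
)


def iter_refseq_sets(
    lines: Iterable[str], gene_id: str
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield the RNA/protein/genomic accessions of every row for one GeneID."""
    columns: Optional[Dict[str, int]] = None
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if columns is None:
            # Assumption: the first line is the '#'-prefixed header.
            fields[0] = fields[0].lstrip("#")
            columns = {name: index for index, name in enumerate(fields)}
            continue
        if len(fields) < len(columns):
            continue  # skip blank or truncated lines
        if fields[columns["GeneID"]] != gene_id:
            continue
        # Replace the missing-value placeholder with None for easier handling.
        yield {
            name: (None if fields[columns[name]] == MISSING else fields[columns[name]])
            for name in WANTED
        }
```

Because the file is large, the function streams rows instead of loading the whole table. It can be combined with `gzip.open(path, "rt")` to read gene2refseq.gz directly.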

Version

Current versions, accessed 2020-10-22:

  • LRG_RefSeqGene: v20201020

  • refseqgene.<n>.genomic.gbff.gz: v20201020

  • human.<n>.rna.gbff.gz: v20201020

  • gene2ensembl.gz: v20201022

  • gene2refseq.gz: v20201022