Reference data overview
Human hg19
- Human reference sequence, ENSEMBL
ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/
Human reference DNA sequence GRCh37 (hg19) for chromosomes 1-22, X, Y and M.
- Human genes, coding sequences and transcripts, ENSEMBL
ftp://ftp.ensembl.org/pub/current_gtf/homo_sapiens/
All annotated protein-coding genes for the human reference sequence GRCh37 (hg19). The annotation was done by ENSEMBL and includes annotations from RefSeq and CCDS as well as ENSEMBL itself.
- HapMap variants, ENSEMBL
ftp://ftp.ensembl.org/pub/current_variation/gvf/homo_sapiens/
The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation (for more information about HapMap see http://hapmap.ncbi.nlm.nih.gov/). Please note that there are 12 different files (tracks) to be downloaded, one file for each population. It is recommended that you configure your workflows with the file from the population that best matches the ethnicity of the patient from whom the sample was taken. You can find out more about the population codes, which are part of the filename, here: http://www.sanger.ac.uk/resources/downloads/human/hapmap3.html
- Variants found by the 1000 Genomes Project, ENSEMBL
ftp://ftp.ensembl.org/pub/current_variation/gvf/homo_sapiens/
The 1000 Genomes Project Phase 1 created an integrated map of genetic variation from 1092 human genomes [ et al., 2012]. Please note that there are 4 different files (tracks) to be downloaded, one file for each population. It is recommended that you configure your workflows with the file from the population that best matches the ethnicity of the patient from whom the sample was taken. You can learn more about the population codes, which are part of the filename, here: http://www.1000genomes.org/.
- dbSNP variants, UCSC
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp138.txt.gz
Human variants present in the Single Nucleotide Polymorphism Database (dbSNP), which includes small insertions, deletions, replacements, SNPs and MNVs. Please note that most variants in dbSNP are not validated and that anyone can submit data to dbSNP. The collection includes clinically relevant as well as common variants. The URL must be modified according to the file you want to download: for example, if you are interested in snp141Common.txt.gz, replace "138" in the URL with "141Common" (for a full list see http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/).
- dbSNP common variants, UCSC
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp138Common.txt.gz
Uniquely mapped variants that appear in at least 1% of the population or are 100% non-reference. The URL must be modified according to the file you want to download: for example, if you are interested in snp141Common.txt.gz, replace "138" in the URL with "141" (for a full list see http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/).
- ClinVar database variants, NCBI
http://www.ncbi.nlm.nih.gov/clinvar/docs/maintenance_use/
ClinVar is designed to provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence.
- PhastCons Conservation Scores, UCSC
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/phastCons100way/hg19.100way.phastCons/
UCSC conservation track based on a multiple alignment of 100 species, with evolutionary conservation measured using the phastCons algorithm from the PHAST package.
- Human Gene Ontology (GO slim) file, EBI
http://www.ebi.ac.uk/QuickGO/GMultiTerm
Gene Ontology file in slim format (only high-level GO terms annotated) for the GO categories Molecular Function, Biological Process and Cellular Component annotated on human genes. The file was made using the QuickGO tool from the EBI (http://www.ebi.ac.uk/QuickGO/GMultiTerm).
- target primers and target regions QIAGEN_v2
https://www.qiagen.com/dk/shop/sample-technologies/dna-sample-technologies/genomic-dna/generead-dnaseq-gene-panels-v2/
These primers and regions are defined and provided by QIAGEN GeneRead DNAseq Targeted Panels V2.
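The dbSNP entries in the list above describe editing the version part of the UCSC URL by hand. As a minimal sketch, the same substitution can be written as a small helper. The function name `ucsc_snp_url` and its parameters are illustrative assumptions, not part of any UCSC tooling, and the helper only builds the URL string; it does not check that the file exists on the server.

```python
# Hypothetical helper: builds a UCSC dbSNP table URL following the pattern of
# the URLs listed above. Function and parameter names are assumptions.
UCSC_DATABASE_URL = "http://hgdownload.soe.ucsc.edu/goldenPath/{build}/database/"


def ucsc_snp_url(version: str, build: str = "hg19") -> str:
    """Return the URL for a table such as snp138 or snp141Common.

    `version` is the text that replaces "138" in the listed URL,
    for example "141Common".
    """
    return UCSC_DATABASE_URL.format(build=build) + f"snp{version}.txt.gz"


if __name__ == "__main__":
    print(ucsc_snp_url("138"))
    print(ucsc_snp_url("141Common"))
```

Consult the directory listing at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/ to see which versions are actually available before downloading.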
Human hg38
- Human reference sequence, ENSEMBL
ftp://ftp.ensembl.org/pub/release-80/fasta/homo_sapiens/dna/
The file Homo_sapiens.GRCh38.dna.toplevel.fa.gz has chromosomal sequences along with several scaffolds. The scaffolds were removed in the workbench.
- Human genes, coding sequences and transcripts, ENSEMBL
ftp://ftp.ensembl.org/pub/release-80/gtf/homo_sapiens/
filename: Homo_sapiens.GRCh38.80.gtf.gz
- HapMap variants, ENSEMBL
ftp://ftp.ensembl.org/pub/release-80/variation/gvf/homo_sapiens/
The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation (for more information about HapMap see http://hapmap.ncbi.nlm.nih.gov/). Please note that there are 12 different files (tracks) to be downloaded, one file for each population. It is recommended that you configure your workflows with the file from the population that best matches the ethnicity of the patient from whom the sample was taken. You can find out more about the population codes, which are part of the filename, here: http://www.sanger.ac.uk/resources/downloads/human/hapmap3.html
- Variants found by the 1000 Genomes Project, ENSEMBL
ftp://ftp.ensembl.org/pub/release-80/variation/gvf/homo_sapiens/
The 1000 Genomes Project Phase 1 created an integrated map of genetic variation from 1092 human genomes [ et al., 2012]. Please note that there are 4 different files (tracks) to be downloaded, one file for each population. It is recommended that you configure your workflows with the file from the population that best matches the ethnicity of the patient from whom the sample was taken. You can learn more about the population codes, which are part of the filename, here: http://www.1000genomes.org/.
- dbSNP variants, UCSC
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/
Human variants present in the Single Nucleotide Polymorphism Database (dbSNP), which includes small insertions, deletions, replacements, SNPs and MNVs. Please note that most variants in dbSNP are not validated and that anyone can submit data to dbSNP. The collection includes clinically relevant as well as common variants.
filename: snp142.txt.gz
- dbSNP common variants, UCSC
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/
Uniquely mapped variants that appear in at least 1% of the population or are 100% non-reference.
filename: snp142Common.txt.gz
- ClinVar database variants, NCBI
ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/
ClinVar is designed to provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence.
filename: clinvar_20150629.vcf
- PhastCons Conservation Scores, UCSC
http://hgdownload.cse.ucsc.edu/goldenPath/hg38/phastCons20way/
UCSC conservation track based on a multiple alignment of 20 species (the phastCons20way track), with evolutionary conservation measured using the phastCons algorithm from the PHAST package.
filename: hg38.phastCons20way.wigFix
- Human Gene Ontology (GO slim) file, EBI
http://www.ebi.ac.uk/QuickGO/GMultiTerm
Gene Ontology file in slim format (only high-level GO terms annotated) for the GO categories Molecular Function, Biological Process and Cellular Component annotated on human genes. The file was made using the QuickGO tool from the EBI (http://www.ebi.ac.uk/QuickGO/GMultiTerm).
- target primers and target regions QIAGEN_v2
https://www.qiagen.com/dk/shop/sample-technologies/dna-sample-technologies/genomic-dna/generead-dnaseq-gene-panels-v2/
These primers and regions are defined and provided by QIAGEN GeneRead DNAseq Targeted Panels V2.
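The reference sequence entry above notes that scaffolds were removed from the toplevel FASTA file in the workbench. For readers who want a comparable file outside the workbench, the following is a minimal sketch that keeps only records named as chromosomes. It is not the workbench's own procedure; the function name `keep_chromosomes` and the chromosome name set (ENSEMBL-style names 1-22, X, Y and MT) are assumptions to adjust for your data.

```python
from typing import Iterable, Iterator, Set

# Assumed ENSEMBL-style chromosome names; adjust for other naming schemes.
HUMAN_CHROMOSOMES: Set[str] = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def keep_chromosomes(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield only the FASTA records whose sequence name is in `wanted`."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first word after ">".
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line


if __name__ == "__main__":
    demo = [
        ">1 dna:chromosome\n", "ACGT\n",
        ">KI270728.1 dna:scaffold\n", "TTTT\n",
        ">MT dna:chromosome\n", "GGCC\n",
    ]
    # Prints the records for "1" and "MT" and drops the scaffold record.
    print("".join(keep_chromosomes(demo, HUMAN_CHROMOSOMES)), end="")
```

Because the function works on any iterable of lines, it can be fed from a file opened with `gzip.open(path, "rt")` and its output passed to `writelines` on an output file, so the whole genome never needs to be held in memory.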
Mouse Mm10
- Mouse reference sequence, ENSEMBL
ftp://ftp.ensembl.org/pub/release-80/fasta/mus_musculus/dna/
The file Mus_musculus.GRCm38.dna_sm.toplevel.fa.gz has chromosomal sequences along with several scaffolds. The scaffolds were removed in the workbench.
- Mouse genes, coding sequences and transcripts, ENSEMBL
ftp://ftp.ensembl.org/pub/release-80/gtf/mus_musculus/
filename: Mus_musculus.GRCm38.80.gtf.gz
- dbSNP variants, ENSEMBL
ftp://ftp.ensembl.org/pub/release-80/variation/gvf/mus_musculus/
filename: Mus_musculus.gvf.gz
- PhastCons Conservation Scores, UCSC
http://hgdownload.cse.ucsc.edu/goldenPath/mm10/phastCons60way/mm10.60way.phastCons/
Each chromosome has a separate wigFix file. All of them (22 files) need to be downloaded and then combined into a single wigFix file before importing into the workbench.
filename: *.phastCons60way.wigFix.gz
- Mouse Gene Ontology (GO slim) file, EBI
http://www.ebi.ac.uk/QuickGO/GMultiTerm
Gene Ontology file in slim format (only high-level GO terms annotated) for the GO categories Molecular Function, Biological Process and Cellular Component annotated on mouse genes. The file was made using the QuickGO tool from the EBI (http://www.ebi.ac.uk/QuickGO/GMultiTerm).
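The PhastCons entry above asks for the per-chromosome wigFix files to be combined into one file before import. One possible way to do this is plain concatenation, sketched below. It assumes each downloaded file is gzip-compressed, ends with a newline, and starts with its own `fixedStep` declaration line, so the blocks stay valid when placed one after another. The function name `combine_wigfix` and the paths in the usage note are hypothetical.

```python
import gzip
import shutil
from pathlib import Path
from typing import Iterable


def combine_wigfix(parts: Iterable[Path], combined: Path) -> None:
    """Concatenate gzipped per-chromosome wigFix files into one gzipped file."""
    with gzip.open(combined, "wb") as out:
        for part in parts:
            # Decompress each part and append it to the combined output.
            with gzip.open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

A call might look like `combine_wigfix(sorted(Path("downloads").glob("*.phastCons60way.wigFix.gz")), Path("mm10_combined.wigFix.gz"))`, where `downloads` is an assumed folder holding the downloaded files. Writing the combined file outside that folder keeps it from being picked up as an input on a later run.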
Rat Rnor5.0
- Rat reference sequence, ENSEMBL
ftp://ftp.ensembl.org/pub/release-79/fasta/rattus_norvegicus/dna/
The file Rattus_norvegicus.Rnor_5.0.dna.toplevel.fa.gz has chromosomal sequences along with several scaffolds. The scaffolds were removed in the workbench.
- Rat genes, coding sequences and transcripts, ENSEMBL
ftp://ftp.ensembl.org/pub/release-79/gtf/rattus_norvegicus
filename: Rattus_norvegicus.Rnor_5.0.79.gtf.gz
- dbSNP variants, ENSEMBL
ftp://ftp.ensembl.org/pub/release-79/variation/gvf/rattus_norvegicus/
filename: Rattus_norvegicus.gvf.gz
- PhastCons Conservation Scores, UCSC
http://hgdownload.cse.ucsc.edu/goldenPath/rn5/phastCons13way/
Each chromosome has a separate wigFix file. All of them (22 files) need to be downloaded and then combined into a single wigFix file before importing into the workbench.
filename: phastCons13way.wigFix.gz
- Rat Gene Ontology (GO slim) file, EBI
http://www.ebi.ac.uk/QuickGO/GMultiTerm
Gene Ontology file in slim format (only high-level GO terms annotated) for the GO categories Molecular Function, Biological Process and Cellular Component annotated on rat genes. The file was made using the QuickGO tool from the EBI (http://www.ebi.ac.uk/QuickGO/GMultiTerm).