How reads are downloaded
SRA reads are downloaded in the ".sra" format using the NCBI SRA Toolkit. A .sra file is typically about 2.5 times smaller than the equivalent zipped FASTQ file. The download is performed by the NCBI 'prefetch' utility, and the resulting file is read into the workbench using 'fastq-dump'.
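The two-step flow described above (fetch with 'prefetch', then convert with 'fastq-dump') can be sketched outside the workbench as follows. This is only an illustration: the options the workbench actually passes to these tools are not documented here, so the flags, directory names, and the example accession are assumptions, and the function names are hypothetical.

```python
import shlex
from pathlib import Path


def build_prefetch_command(accession: str, download_dir: Path) -> list[str]:
    """Build the command line that fetches the .sra file for one run."""
    # --output-directory tells prefetch where to save the download.
    return ["prefetch", "--output-directory", str(download_dir), accession]


def build_fastq_dump_command(sra_file: Path, fastq_dir: Path) -> list[str]:
    """Build the command line that converts a downloaded .sra file to FASTQ."""
    # --split-files writes each read of a spot (for example paired-end mates)
    # to its own file; --outdir selects the output directory.
    return ["fastq-dump", "--split-files", "--outdir", str(fastq_dir), str(sra_file)]


if __name__ == "__main__":
    accession = "SRR000001"  # illustrative accession only
    # Assumed location of the downloaded file; the real location depends on
    # the toolkit version and its configuration.
    sra_file = Path("downloads") / accession / f"{accession}.sra"
    for command in (
        build_prefetch_command(accession, Path("downloads")),
        build_fastq_dump_command(sra_file, Path("fastq")),
    ):
        # Print the commands instead of running them, so the sketch works
        # without the SRA Toolkit installed. Pass a command to
        # subprocess.run(command, check=True) to execute it.
        print(shlex.join(command))
```

Building the commands as lists rather than as shell strings avoids quoting problems when paths contain spaces.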
Some runs in SRA cannot be downloaded. The affected runs are listed in a Problems panel, each with a description of the problem. The remaining runs can still be downloaded.
The most common problems are:
- "The selected SRA reads contain no spots, and cannot be imported in the workbench.": The run has no associated sequencing data.
- "The selected SRA reads are dbGaP restricted.": For data protection reasons, you must request access to these reads. Access requests and downloads of restricted reads cannot be made from within the workbench, but you can follow the procedures described here: http://www.ncbi.nlm.nih.gov/books/NBK5295/.
- "The selected SRA reads are made with an unsupported sequencing platform.": For example, Complete Genomics reads consist of eight regions separated by gaps of variable lengths, and should be analyzed by specialist tools.
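The behavior described above, where problematic runs are reported with a message while the remaining runs stay downloadable, can be pictured with a small sketch. It is not the workbench's implementation: the `Run` record, its field names, the platform identifier, and the order of the checks are all assumptions made for illustration. Only the three messages are taken from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """Hypothetical summary of one SRA run (field names are assumptions)."""
    accession: str
    spots: int
    dbgap_restricted: bool
    platform: str


# Assumed identifier for the Complete Genomics platform.
UNSUPPORTED_PLATFORMS = {"COMPLETE_GENOMICS"}


def describe_problem(run: Run) -> Optional[str]:
    """Return the problem message for a run, or None if it can be downloaded."""
    if run.spots == 0:
        return ("The selected SRA reads contain no spots, "
                "and cannot be imported in the workbench.")
    if run.dbgap_restricted:
        return "The selected SRA reads are dbGaP restricted."
    if run.platform in UNSUPPORTED_PLATFORMS:
        return ("The selected SRA reads are made with an "
                "unsupported sequencing platform.")
    return None


def partition_runs(runs: list[Run]) -> tuple[list[str], dict[str, str]]:
    """Split runs into downloadable accessions and a problem report."""
    downloadable: list[str] = []
    problems: dict[str, str] = {}
    for run in runs:
        message = describe_problem(run)
        if message is None:
            downloadable.append(run.accession)
        else:
            problems[run.accession] = message
    return downloadable, problems
```

A run with a problem only ends up in the report; it does not stop the other runs from being collected for download.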
Reads can also be downloaded using FASP, a commercial transfer protocol from Aspera. In our testing, Aspera downloads were up to 10x faster than a normal HTTP download.
To enable this functionality, download and install the Aspera Connect software from http://downloads.asperasoft.com/connect2/.
On Windows, choose a "Custom" install and select "Install for all users of this machine". You can then test whether the installation worked by downloading a small file. If everything worked, the log (accessible from the Processes tab) will include the line "Downloading via fasp".
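If the log text has been copied out of the workbench, the check described above can be automated with a few lines. This is a minimal sketch: the function name is hypothetical, and the only log content assumed is the marker line quoted above.

```python
FASP_MARKER = "Downloading via fasp"  # the line quoted in the text above


def used_aspera(log_text: str) -> bool:
    """Return True if any line of the log contains the FASP marker."""
    return any(FASP_MARKER in line for line in log_text.splitlines())


if __name__ == "__main__":
    # Placeholder log text; a real log will contain other lines as well.
    sample_log = "(other log output)\nDownloading via fasp\n"
    print(used_aspera(sample_log))
```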
It is possible to change the Aspera options using Preferences | Advanced | SRA Download.
The following options are available:
- Use Aspera when available: By default, Aspera is used automatically if it is installed. This option makes it possible to disable Aspera.
- Limit Aspera download speeds to [ ] Mb/s (Mac and Linux only): Using Aspera may take up a lot of network resources. Use this option to specify a maximum download speed (in megabits per second). Note that this option is only available on Mac and Linux. For Windows users, it is possible to limit the maximum download speed by modifying the aspera.conf file, which can be found in C:\Program Files (x86)\Aspera\Aspera Connect\etc. See http://download.asperasoft.com/download/docs/csrv/3.3.4/linux/html/index.html and http://download.asperasoft.com/download/docs/csrv/3.3.4/linux/html/fasp/setting-global-bandwidth.html for more details.
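Because the speed limit is given in megabits per second while file sizes are usually quoted in megabytes, it can help to convert between the two when choosing a value. The sketch below is plain arithmetic (8 bits per byte, decimal megabytes); the function names and the example numbers are illustrative and are not taken from the workbench or from Aspera.

```python
def megabits_to_megabytes_per_second(limit_mbps: float) -> float:
    """Convert a speed limit in megabits/s (Mb/s) to megabytes/s (MB/s)."""
    return limit_mbps / 8


def estimated_transfer_seconds(file_size_mb: float, limit_mbps: float) -> float:
    """Lower bound on download time for a file of file_size_mb megabytes."""
    if limit_mbps <= 0:
        raise ValueError("the speed limit must be positive")
    return file_size_mb / megabits_to_megabytes_per_second(limit_mbps)


if __name__ == "__main__":
    limit = 100.0  # example limit in Mb/s
    print(megabits_to_megabytes_per_second(limit))    # 12.5 (MB/s)
    print(estimated_transfer_seconds(2000.0, limit))  # 160.0 (seconds)
```

The estimate is a lower bound: it assumes the download runs at the configured limit for its whole duration.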