How reads are downloaded

Reads are downloaded in SRA (.sra) format using the NCBI SRA Toolkit. These files are typically about 2.5 times smaller than an equivalent zipped FASTQ file. NCBI's prefetch utility downloads the data, and the resulting file is then processed using 'fastq-dump' (footnote 8.1).
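For readers who want to reproduce the same two-step download outside the Workbench, the following is a minimal sketch that builds and runs the SRA Toolkit commands from Python. It is an assumption-based illustration, not the Workbench's own invocation: the chosen flags (--output-directory for prefetch; --split-files, --skip-technical and --outdir for fastq-dump), the output folder name, and the file layout <outdir>/<accession>/<accession>.sra (used by recent SRA Toolkit releases) are choices made for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(accession: str, outdir: str = "reads") -> list[list[str]]:
    """Return the prefetch and fastq-dump command lines for one SRA run.

    Assumption: a typical manual SRA Toolkit invocation, not the
    Workbench's internal call.
    """
    sra_file = Path(outdir) / accession / f"{accession}.sra"
    return [
        # Step 1: download the run as a .sra file into outdir/<accession>/.
        ["prefetch", "--output-directory", outdir, accession],
        # Step 2: convert to FASTQ. --split-files writes one file per read;
        # --skip-technical leaves out technical reads.
        ["fastq-dump", "--split-files", "--skip-technical",
         "--outdir", outdir, str(sra_file)],
    ]


def download_run(accession: str, outdir: str = "reads") -> None:
    """Run both steps, failing early if the SRA Toolkit is not installed."""
    for tool in ("prefetch", "fastq-dump"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    for cmd in build_commands(accession, outdir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, download_run("SRR000001") would fetch and convert that run, with the accession used here purely as an illustration. Keeping command construction separate from execution makes it easy to inspect or log the exact commands before anything is downloaded.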

Only biological reads are imported; technical reads, such as barcodes and adapters, are not. For paired reads, two biological reads are expected per spot.
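The import rule for biological and technical reads can be sketched as a simple filter. The ReadSegment class and the biological_reads function are hypothetical, introduced only for illustration: each read within a spot is assumed to carry a flag marking it as biological or technical, and a paired spot is rejected unless exactly two biological reads remain.

```python
from dataclasses import dataclass


@dataclass
class ReadSegment:
    # Hypothetical model of one read within an SRA spot.
    sequence: str
    is_biological: bool


def biological_reads(spot: list[ReadSegment], paired: bool) -> list[str]:
    """Keep only biological reads; for paired data, require exactly two."""
    reads = [seg.sequence for seg in spot if seg.is_biological]
    if paired and len(reads) != 2:
        raise ValueError(
            f"expected 2 biological reads for paired data, found {len(reads)}"
        )
    return reads
```

A spot consisting of a technical barcode read followed by two biological reads would yield just the two biological sequences.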

Some runs in SRA cannot be downloaded. These runs are listed in a Problems panel together with a description of the problem. The remaining runs can still be downloaded.
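The behaviour described above, where a failed run is reported and the others still download, corresponds to a continue-on-error loop. The sketch below illustrates this pattern under stated assumptions: download_all is a hypothetical helper, the download callable stands in for whatever actually fetches a run, and the returned dictionary plays the role of the Problems panel.

```python
from typing import Callable


def download_all(
    accessions: list[str],
    download: Callable[[str], None],
) -> tuple[list[str], dict[str, str]]:
    """Try every run and collect failures instead of stopping at the first.

    Returns (downloaded accessions, {failed accession: problem description}).
    """
    done: list[str] = []
    problems: dict[str, str] = {}
    for acc in accessions:
        try:
            download(acc)
            done.append(acc)
        except Exception as exc:  # record the problem and move on
            problems[acc] = str(exc)
    return done, problems
```

Collecting problems rather than raising immediately means one unavailable run does not block the rest of the batch.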

The most common problems are:

Show Metadata for Selection

Information about SRA entries of interest can be downloaded to a CLC Metadata Table without downloading the sequence data. Select the rows of interest in the results table and then click on Show Metadata for Selection. Sequence data can be downloaded later if desired.

The first columns of the resulting CLC Metadata Table contain the same database identifiers as the original results table. The later columns contain details about the biosample, which are pulled from SRA. The columns shown in the table can be configured in the side panel on the right.
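The table layout described above, with identifier columns first followed by biosample attributes and then narrowed to a chosen set of visible columns, can be sketched as follows. This is a hypothetical illustration, not the Workbench's implementation: metadata_table is an invented helper, each selected row is assumed to have a "BioSample" identifier, and biosample attributes are assumed to be available as a dictionary keyed by that identifier.

```python
from typing import Optional


def metadata_table(
    rows: list[dict[str, str]],
    biosamples: dict[str, dict[str, str]],
    id_columns: list[str],
    visible: Optional[set[str]] = None,
) -> tuple[list[str], list[list[str]]]:
    """Build (column names, rows): identifiers first, then biosample details."""
    # Attribute columns in first-seen order across the selected rows.
    attr_columns: list[str] = []
    for row in rows:
        for key in biosamples.get(row["BioSample"], {}):
            if key not in attr_columns:
                attr_columns.append(key)
    columns = id_columns + attr_columns
    # Mimic the side-panel setting: keep only the chosen columns, in order.
    if visible is not None:
        columns = [c for c in columns if c in visible]
    table = []
    for row in rows:
        attrs = biosamples.get(row["BioSample"], {})
        # Identifier values come from the row; missing attributes are blank.
        table.append([row.get(c, attrs.get(c, "")) for c in columns])
    return columns, table
```

Because no sequence data is involved, a table like this stays small and can later be used to decide which runs are worth downloading.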

Tips relating to retrieving sequence data later using the CLC Metadata Table:



Footnotes

8.1 ('fastq-dump'): Downloading from SRA using Aspera is no longer supported. See https://github.com/ncbi/sra-tools/wiki/Avoid-using-ascp-directly-for-downloads