Downloading reads and metadata from SRA

At the bottom of the SRA results table are two buttons: Download Reads and Metadata and Show Metadata for Selection. The functionality of each is described in this section.

Download Reads and Metadata

Select the rows of interest in the results table and then click on Download Reads and Metadata to download and import reads and metadata from the selected runs.

Reads are imported into sequence lists. Import settings for reads from runs marked as paired are configurable, including the option to import technical reads in addition to biological reads.

Metadata is imported into a CLC Metadata Table, and each sequence list is associated with the relevant row of that table. See Finding data elements based on metadata for details about data associations with CLC Metadata Tables. The CLC Metadata Table can be used directly to define the experimental design for differential expression analyses (Differential Expression for RNA-Seq), or it can be edited first if desired (Editing Metadata tables).

Note: When the "Auto paired end distance detection" option is available in downstream analyses of paired data downloaded from SRA, we recommend enabling it. This is because some SRA entries report an insert size that includes the lengths of the reads, while others report one that excludes them.
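To illustrate why the two conventions matter, the sketch below converts between them. It is a minimal Python example that assumes an insert size including the reads equals the gap between the reads plus both read lengths. The function names and the 2 x 150 bp run are hypothetical and are not part of the Workbench or of SRA.

```python
def fragment_length(inner_distance: int, r1_len: int, r2_len: int) -> int:
    """Convert a distance that excludes the reads into one that includes them."""
    return inner_distance + r1_len + r2_len


def inner_distance(fragment: int, r1_len: int, r2_len: int) -> int:
    """Convert a distance that includes the reads into one that excludes them."""
    return fragment - r1_len - r2_len


def possible_fragment_lengths(reported: int, r1_len: int, r2_len: int) -> tuple[int, int]:
    """Return both fragment lengths that a reported insert size could imply.

    First value: the reported size already includes the reads.
    Second value: the reported size is only the gap between the reads.
    """
    return reported, fragment_length(reported, r1_len, r2_len)


if __name__ == "__main__":
    # Hypothetical run: 2 x 150 bp reads with a reported insert size of 300.
    print(possible_fragment_lengths(300, 150, 150))  # (300, 600)
```

A reported value of 300 is consistent with either a 300 bp or a 600 bp fragment, which is why estimating the distance from the data is safer than relying on the reported value.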

After clicking on Download Reads and Metadata, a wizard appears to guide you through the import of the selected runs.

Import Options

Figure 10.7: Import options in the SRA Download wizard

Space requirements

During download and import: The Download size reported is the combined size of all the SRA format files that will be retrieved.

We recommend that at least twice the download size of the largest sample be available as temporary space during download and import.

If an SRA file is reference-compressed, a copy of the reference genome must also be retrieved before the reads can be imported, and this requires additional disk space.
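The temporary space recommendation can be checked before starting a download. The following is a minimal Python sketch, not a Workbench feature: the function names are hypothetical, MB is assumed to mean 1024 x 1024 bytes, and because this section does not say how much space a reference genome needs, any `reference_genome_mb` value you supply is simply added on top of the recommendation.

```python
import shutil

# Assumption: MB here means 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def recommended_temp_space_mb(download_sizes_mb: list[float],
                              reference_genome_mb: float = 0.0) -> float:
    """At least twice the download size of the largest selected sample.

    reference_genome_mb is an assumption: reference-compressed runs also need
    space for the genome, so a caller-supplied estimate is added on top.
    """
    if not download_sizes_mb:
        return 0.0
    return 2 * max(download_sizes_mb) + reference_genome_mb


def has_enough_temp_space(temp_dir: str, download_sizes_mb: list[float],
                          reference_genome_mb: float = 0.0) -> bool:
    """Compare the recommendation with the free space on temp_dir's disk."""
    free_mb = shutil.disk_usage(temp_dir).free / BYTES_PER_MB
    return free_mb >= recommended_temp_space_mb(download_sizes_mb,
                                                reference_genome_mb)


if __name__ == "__main__":
    # Hypothetical selection of three runs with these SRA file sizes (MB).
    sizes = [84, 107, 1311]
    print(recommended_temp_space_mb(sizes))  # 2622.0
```

Only the largest sample drives the estimate, because the recommendation is phrased per sample rather than for the combined Download size.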

For the imported data: The sequence lists produced by import are often between half and twice the size of the downloaded SRA files. The exact size depends on several factors, including whether compression has been turned off, whether read names and quality scores were retained, and, where relevant, whether technical reads were imported as well as biological reads.

A few examples of SRA file sizes relative to imported sequence list sizes are given below. Relative sizes may differ on your system depending on your settings.

| Description | SRA file | After import, with read names and quality scores | After import, without read names or quality scores |
| --- | --- | --- | --- |
| Single end reads | 84 MB | 110 MB | 32 MB |
| Paired end biological reads | 107 MB | 138 MB | 59 MB |
| 2 technical reads and 1 biological read, only the biological read imported | 1311 MB | 760 MB | 208 MB |
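The relative sizes in the table above can be expressed as ratios of imported size to SRA file size. This short Python sketch only restates the table's numbers; the dictionary keys and the function name are illustrative.

```python
# SRA file size, then imported size with and without read names and
# quality scores, in MB, copied from the table above.
SIZES_MB = {
    "Single end reads": (84, 110, 32),
    "Paired end biological reads": (107, 138, 59),
    "2 technical + 1 biological read, biological only": (1311, 760, 208),
}


def import_ratios(sra_mb: float, with_names_mb: float,
                  without_names_mb: float) -> tuple[float, float]:
    """Imported size divided by the SRA file size, for both import settings."""
    return with_names_mb / sra_mb, without_names_mb / sra_mb


if __name__ == "__main__":
    for description, sizes in SIZES_MB.items():
        with_names, without_names = import_ratios(*sizes)
        print(f"{description}: {with_names:.2f}x with names/quality, "
              f"{without_names:.2f}x without")
```

For these examples the ratios are 1.31 and 0.38, 1.29 and 0.55, and 0.58 and 0.16, so four of the six imported sizes fall between half and twice the SRA file size. Dropping read names and quality scores, and skipping technical reads, account for the smallest ratios.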

Edit Paired End Settings

If at least one of the selected runs is marked as paired, the next wizard step allows you to review and edit the paired end settings (figure 10.8).

Values in shaded cells can be configured by selecting rows and clicking on the "Edit Selected Rows" button. When you click on OK in the edit dialog, its settings are applied to all of the selected rows, so we recommend selecting either a single row or a set of rows where the information should be the same.
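One way to find sets of rows that can be edited together is to group runs whose read structures are identical. The Python sketch below is a hypothetical, offline illustration: the run IDs and read structures are invented, and each read is labeled B (biological) or T (technical) in the same way as the Reads available column. Identical read structures do not guarantee identical distance or orientation, so check those values before editing a group.

```python
from collections import defaultdict

# Hypothetical runs: each read is a (name, label) pair, "B" or "T".
RUNS = {
    "run_1": (("R1", "B"), ("R2", "B")),
    "run_2": (("I1", "T"), ("R1", "T"), ("R2", "B")),
    "run_3": (("R1", "B"), ("R2", "B")),
    "run_4": (("I1", "T"), ("R1", "T"), ("R2", "B")),
}


def group_by_read_structure(runs):
    """Group run IDs whose read structures are identical."""
    groups = defaultdict(list)
    for run_id, structure in runs.items():
        groups[structure].append(run_id)
    return dict(groups)


if __name__ == "__main__":
    for structure, run_ids in group_by_read_structure(RUNS).items():
        labels = " ".join(f"{name}({kind})" for name, kind in structure)
        print(f"{labels}: {', '.join(run_ids)}")
```

For these example runs, the sketch prints two groups: run_1 and run_3, then run_2 and run_4.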

Figure 10.8: Paired end information includes the reads available for that run, as well as the read structure, distance and read orientation. Values in shaded cells can be configured by selecting rows and clicking on the "Edit Selected Rows" button.

N/A values in the Distance and Read orientation columns are expected when only one of the reads is biological (figure 10.9). If the read structure is edited such that paired reads will be imported, values will appear in these columns.
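The default behavior described here can be modeled as a small decision on the number of biological reads. This Python sketch is an illustration under stated assumptions, not the Workbench's own logic: it assumes that exactly two biological reads are imported as pairs and exactly one as single end reads, and it rejects any other structure because this section does not cover it. The function name and dictionary keys are hypothetical.

```python
def default_import(reads: list[tuple[str, str]]) -> dict:
    """Sketch of the default import for a run marked as paired.

    reads: (name, label) pairs, where label is "B" (biological) or
    "T" (technical).
    """
    biological = [name for name, label in reads if label == "B"]
    if len(biological) == 2:
        # Two biological reads: imported as pairs, so Distance and
        # Read orientation apply.
        return {"mode": "paired", "reads": biological, "distance_applies": True}
    if len(biological) == 1:
        # One biological read: imported as single end reads, which is
        # why Distance and Read orientation show N/A.
        return {"mode": "single", "reads": biological, "distance_applies": False}
    raise ValueError("read structure not covered by this sketch")


if __name__ == "__main__":
    # Hypothetical run like the one in figure 10.9: only R2 is biological.
    print(default_import([("I1", "T"), ("R1", "T"), ("R2", "B")]))
    # {'mode': 'single', 'reads': ['R2'], 'distance_applies': False}
```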

Figure 10.9: Only R2 is biological, so by default the R2 reads would be imported into a sequence list of single end reads.

Figure 10.10: Paired end information includes the reads available for that run, as well as the read structure, distance and read orientation. Reads specified as technical by the SRA submitter are marked with (T), while other reads are marked with (B) for biological. The first 4 entries listed in the SRA Download wizard are examples of runs marked as paired with only one read specified as biological. These are imported as single end reads by default.

Figure 10.12: Hovering the mouse pointer over an entry in the Reads available column in the Edit Paired End Settings wizard step reveals a tooltip with details for each read in that run.

When the settings match your expectations, click on Next to select where to save the data, and then start the download.

Figure 10.11: The read structure for import has been edited in the first 4 entries listed in the SRA Download wizard, using the settings shown in the Edit Paired Information dialog. Technical reads from I1 and R1 will be prepended to R2 reads and these sequences imported into a single end sequence list.
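Conceptually, prepending technical reads joins the reads of each spot end to end into one longer single end read. The Python sketch below illustrates that idea for one spot; it is not the Workbench's implementation. The sequences, the `Read` type and `merge_reads` are hypothetical, the I1, R1, R2 order is an assumption taken from the read names in figure 10.11, and quality scores are assumed to be retained and joined in the same order.

```python
from typing import NamedTuple


class Read(NamedTuple):
    sequence: str
    qualities: str  # one quality character per base


def merge_reads(spot: dict[str, Read], order: list[str]) -> Read:
    """Join the named reads of one spot, in order, into a single read."""
    parts = [spot[name] for name in order]
    return Read("".join(p.sequence for p in parts),
                "".join(p.qualities for p in parts))


if __name__ == "__main__":
    # Hypothetical spot: an 8 bp index read (I1), a 6 bp technical read (R1)
    # and a 14 bp biological read (R2).
    spot = {
        "I1": Read("ACGTACGT", "IIIIIIII"),
        "R1": Read("TTGCAA", "FFFFFF"),
        "R2": Read("GATTACAGATTACA", "HHHHHHHHHHHHHH"),
    }
    merged = merge_reads(spot, ["I1", "R1", "R2"])
    print(merged.sequence)  # ACGTACGTTTGCAAGATTACAGATTACA
```

Because the technical bases come first, any downstream step that should see only biological sequence would need to trim them.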

Show Metadata for Selection

Information about SRA entries of interest can be downloaded to a CLC Metadata Table without downloading the sequence data. Select the rows of interest in the results table and then click on Show Metadata for Selection. Sequence data can be downloaded later if desired.

The first columns of the resulting CLC Metadata Table contain the same database identifiers as the original results table. Later columns contain details about the biosample, pulled from SRA. The columns shown in the table can be configured in the side panel on the right.

Tips relating to retrieving sequence data later using the CLC Metadata Table: