RNA-Seq and Differential Gene Expression Analysis workflow

The RNA-Seq and Differential Gene Expression Analysis workflow calculates gene expression profiles per sample and then performs differential expression analysis across samples. It also generates reports and visualizations of the expression profiles and differential expression results. Running some parts of the workflow per sample and other parts on all samples together is made possible by the Iterate and the Collect and Distribute workflow elements; see Workflow control flow elements.

The results produced by the workflow should be validated. Some common workflow customizations are provided at the end of this section.

Inputs to the workflow

To run this workflow, you will need:

Figure 14.86: Metadata describing samples from a tumor-normal comparison experiment.
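
As an illustration of the kind of table shown in the figure, the following minimal sketch writes a small metadata table to CSV text using Python's standard library. The column names (`Sample ID`, `Patient`, `Type`) and the sample names are hypothetical and chosen only for this example; the workflow does not require these particular names.

```python
import csv
import io

# Hypothetical samples from a tumor-normal comparison.
# All column names and values below are illustrative assumptions.
SAMPLES = [
    {"Sample ID": "P1_tumor", "Patient": "P1", "Type": "tumor"},
    {"Sample ID": "P1_normal", "Patient": "P1", "Type": "normal"},
    {"Sample ID": "P2_tumor", "Patient": "P2", "Type": "tumor"},
    {"Sample ID": "P2_normal", "Patient": "P2", "Type": "normal"},
]


def check_unique(rows, column):
    """Raise ValueError if any value in `column` occurs more than once."""
    values = [row[column] for row in rows]
    duplicates = sorted({v for v in values if values.count(v) > 1})
    if duplicates:
        raise ValueError(f"Duplicate values in {column!r}: {duplicates}")


def metadata_to_csv(rows):
    """Serialize metadata rows to CSV text, one row per sample."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One row per sample, so the identifier column must be unique.
check_unique(SAMPLES, "Sample ID")
csv_text = metadata_to_csv(SAMPLES)
print(csv_text)
```

Saving `csv_text` to a file would give a CSV with one row per sample, an identifier column, and a column describing the tumor or normal state of each sample.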

Launching the workflow

The RNA-Seq and Differential Gene Expression Analysis workflow is available at:

        Toolbox | Template Workflows | Basic Workflow Designs | RNA-Seq and Differential Gene Expression Analysis

Launch the workflow and step through the wizard.

  1. Select the trimmed reads to be processed.
  2. In the next steps, select the reference sequence, genes, mRNA, and CDS tracks, and finally, the gene ontology. See the customizations at the end of this section if not all inputs are relevant for your species.
  3. Next, choose "Use metadata" for defining the batch units. Select the CLC Metadata Table or the Excel, CSV or TSV format file containing information about the samples, and choose the column used for grouping the reads into batch units (figure 14.87). For further details see Defining batch units based on metadata.
  4. In the next step, you can review the batch units resulting from your selections above.
  5. Next, specify the differential expression settings (figure 14.88).
  6. In the next step, you can click on Preview All Parameters to review your settings.
  7. In the final step, choose a location to save the results to.

Figure 14.87: After selecting the metadata source, specify the column containing the information that groups the reads appropriately for the RNA-Seq analysis. Usually this would be a column containing a unique identifier per sample.
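
The idea of grouping reads into batch units by a metadata column can be sketched as follows. This is only a conceptual illustration in Python, not how the software implements it; the read set names, the `Sample ID` column, and the `batch_units` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical association between read sets and metadata rows.
# Here one sample was sequenced on two lanes, giving two read sets.
READS = [
    {"reads": "P1_tumor_L001", "Sample ID": "P1_tumor"},
    {"reads": "P1_tumor_L002", "Sample ID": "P1_tumor"},
    {"reads": "P1_normal_L001", "Sample ID": "P1_normal"},
]


def batch_units(rows, column):
    """Group read sets by the value found in the chosen metadata column."""
    units = defaultdict(list)
    for row in rows:
        units[row[column]].append(row["reads"])
    return dict(units)


# Grouping on a per-sample identifier gives one batch unit per sample.
units = batch_units(READS, "Sample ID")
for sample, read_sets in units.items():
    print(sample, read_sets)
```

In this sketch, choosing a column with a unique identifier per sample places all read sets from the same sample in one batch unit, so they are analyzed together in the per-sample parts of the workflow.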

Figure 14.88: Specify the settings for the differential expression analysis. The columns from the metadata provided earlier will be available for selection in relevant options.

Tools in the workflow and generated outputs

The RNA-Seq and Differential Gene Expression Analysis workflow contains several tools and produces multiple outputs. The workflow has been configured to save many of the outputs to subfolders. These are created automatically within the folder that you selected to save results to when launching the workflow. See Configuring custom output names for details.

The following tools produce elements per sample:

The following tools output elements across all samples. Their outputs are saved to the subfolder Expression Analysis:

The following tools output elements across all samples. Their outputs are saved directly to the results folder selected when launching the workflow:

Customizing the workflow

Template workflows can be edited to add, remove, or change analysis steps. See Template workflows for information about how to open a copy of a template workflow for editing.