Identify Variants and Add Expression Values

The Identify Variants and Add Expression Values workflows can be used to identify novel and known mutations in RNA-Seq data, automatically map, quantify, and annotate the transcriptomes, and compare the mutational patterns in the samples with the expression values of the corresponding transcripts and genes.

The workflows can be found in the Toolbox at:

        Toolbox | Template Workflows | Biomedical Workflows (Image biomedical_twf_folder_open_16_n_p) | Whole Transcriptome Sequencing (Image rna_seq_group_closed_16_n_p) | Human (Image human_folder_closed_16_n_p) | Identify Variants and Add Expression Values (Image identify_variants_and_add_expression_values_wts_human_16_n_p)

        Toolbox | Template Workflows | Biomedical Workflows (Image biomedical_twf_folder_open_16_n_p) | Whole Transcriptome Sequencing (Image rna_seq_group_closed_16_n_p) | Mouse and Rat (Image mouse_folder_closed_16_n_p) | Identify Variants and Add Expression Values (M and R) (Image identify_variants_and_add_expression_values_wts_mouse_16_n_p)

After starting the workflow:

  1. If you are connected to a server, you will first be asked where you would like to run the analysis.

  2. Specify the RNA-Seq reads to analyze. The reads can be selected by double-clicking on the reads file name or clicking once on the file and then clicking on the arrow pointing to the right side in the middle of the wizard (figure 22.26).

    Image rnaseq_identify_variants_expression_step2
    Figure 22.26: Select the sequencing reads to analyze.

  3. Select the Reference Data Set that is relevant to your study (figure 22.27).

    Image rnaseq_identify_variants_expression_step1
    Figure 22.27: Select the relevant data set for the samples being studied.

  4. Configure the parameters for the RNA-Seq Analysis (figure 22.28).

    Image rnaseq_identify_variants_expression_step3
    Figure 22.28: Configure the RNA-Seq Analysis.

    If you wish to use spike-in controls, add the relevant file in the "Spike-in controls" field.

    You can also specify that the reads should be mapped only in their forward or reverse orientation (by default, both orientations are used). Restricting mapping to one direction is typically appropriate when a strand-specific protocol for read generation has been used, as it allows reads to be assigned to the right gene in cases where overlapping genes are located on different strands. In addition, applying the 'strand specific' 'reverse' option in an RNA-Seq run can allow the user to assess the degree of antisense transcription. Note that mate pairs are not supported when choosing the forward only or reverse only option.

  5. Specify a target region for the Indels and Structural Variants tool (figure 22.29).

    Image rnaseq_identify_variants_expression_step4
    Figure 22.29: Specify the target region for the Indels and Structural Variants tool.

    The targeted region file specifies which regions have been sequenced. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents. Remember to use an hg38-specific BED file when using hg38 as reference, and an hg19-specific BED file when using hg19 as reference.

  6. Set the parameters for the Low Frequency Variant Detection step (see figure 22.30). For a description of the different parameters that can be adjusted in the variant detection step, see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html.

    Image rnaseq_identify_variants_expression_step5
    Figure 22.30: Specify the parameters for transcriptomic variant detection.

  7. If you are working with the workflow for Human, specify here the relevant 1000 Genomes population (and HapMap populations at the next step) (see figure 22.31). Choose the population that best matches the population your samples are derived from.

    Image rnaseq_identify_variants_expression_step6
    Figure 22.31: Select the relevant population from the drop-down list for HapMap databases.

    Under "Locked settings" you can see that "Automatically join adjacent MNVs and SNVs" has been selected. The reason for this is that many databases, unlike CLC Workbench, do not report a succession of SNVs as one MNV. As a consequence, variants called with CLC Workbench cannot be compared directly with these databases. To support filtering against these databases anyway, the option to Automatically join adjacent MNVs and SNVs is enabled. This means that an MNV in the experimental data will get an exact match if a set of SNVs and MNVs in the database can be combined to provide the same allele. This assumes that SNVs and MNVs in the track of known variants represent the same allele, although there is no evidence for this in the track of known variants.

  8. In the last wizard step you can check the selected settings by clicking on the button labeled Preview All Parameters. The Preview All Parameters window only lets you review the settings. If you wish to make changes, use the Previous button in the wizard to go back and edit the parameters in the relevant windows.

  9. Choose to Save your results and click on the button labeled Finish.
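
The effect of the "Automatically join adjacent MNVs and SNVs" setting described in step 7 can be illustrated with a small Python sketch. This is not the Workbench's implementation, which is not described here. It is a minimal illustration under stated assumptions: variants are simple same-length substitutions, positions are 1-based, and the names Variant and can_compose are hypothetical. The function reports whether an experimental MNV can be reproduced exactly by chaining adjacent known SNVs and MNVs.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    """A substitution variant (hypothetical type, for illustration only)."""
    chrom: str
    pos: int   # 1-based position of the first affected base
    ref: str   # reference allele
    alt: str   # observed allele


def can_compose(target: Variant, known: list[Variant]) -> bool:
    """Return True if `target` equals one known variant or an unbroken
    run of adjacent known variants that together give the same allele."""
    # Only same-length substitutions (SNVs and MNVs) are considered.
    if len(target.ref) != len(target.alt) or not target.ref:
        return False

    # Index the known substitutions by chromosome and start position.
    by_start: dict[tuple[str, int], list[Variant]] = {}
    for v in known:
        if len(v.ref) == len(v.alt) and v.ref:
            by_start.setdefault((v.chrom, v.pos), []).append(v)

    def tile(offset: int) -> bool:
        # The whole target has been covered by known variants.
        if offset == len(target.ref):
            return True
        for v in by_start.get((target.chrom, target.pos + offset), []):
            end = offset + len(v.ref)
            if (end <= len(target.ref)
                    and target.ref[offset:end] == v.ref
                    and target.alt[offset:end] == v.alt
                    and tile(end)):
                return True
        return False

    return tile(0)


# Two adjacent SNVs in a (made-up) database of known variants.
known_variants = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 101, "C", "T"),
]

# An MNV called in the sample that spans both positions.
sample_mnv = Variant("chr1", 100, "AC", "GT")
print(can_compose(sample_mnv, known_variants))
```

As in the Workbench setting, a match found this way assumes that the adjacent known variants occur on the same allele, which the database itself does not confirm.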

The following outputs are generated:

  1. Gene expression (Image rnaseqtrack_16_h_p) A track showing gene expression annotations. Hold the mouse over or right-click on the track: a tooltip will appear with information about e.g. gene name and expression values.
  2. Transcript expression (Image rnaseqtrack_16_h_p) A track showing transcript expression annotations.
  3. RNA-Seq Mapping Report (Image proteinreport_16_n_p) This report contains information about the reads, reference, transcripts, and statistics. This is explained in more detail here: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=RNA_Seq_report.html.
  4. Read Mapping (Image read_track_16_n_p) The mapped RNA-Seq reads. The RNA-Seq reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Coloring_mapped_reads.html).
  5. Annotated Variants with Expression Values (Image variant_track_16_n_p) Annotation track showing the variants. Hold the mouse over one of the variants or right-click on it: a tooltip will appear with detailed information about the variant.
  6. RNA-Seq Track List (Image trackset_16_n_p) A collection of tracks presented together. Shows the annotated variant track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in ClinVar and dbSNP (see figure 22.16).
  7. Log (Image table) A log of the workflow execution.