Identify variants and add expression values

The Identify Variants and Add Expression Values ready-to-use workflow can be used to identify novel and known mutations in RNA-seq data, to automatically map, quantify, and annotate the transcriptomes, and to compare the mutational patterns in the samples with the expression values of the corresponding transcripts and genes.

To run the ready-to-use workflow:

        Toolbox | Ready-to-Use Workflows | Whole Transcriptome Sequencing | Identify Variants and Add Expression Values

  1. Double-click on the Identify Variants and Add Expression Values tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. Next, you will be asked to select the RNA-seq reads. Select the reads by double-clicking on the reads file name, or by clicking once on the file and then clicking on the right-pointing arrow in the middle of the wizard (figure 16.28).

    Figure 16.28: Select the sequencing reads to analyze.

    Click on the button labeled Next.

  2. In the next wizard step (figure 16.29), you can specify the parameters for variant detection. For a description of the parameters that can be adjusted in the variant detection step, see the description of the "Low Frequency Variant Detection" tool in the CLC Cancer Research Workbench user manual (http://www.clcsupport.com/clccancerresearchworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html). Because general filters are applied to all the variant detectors available in CLC Cancer Research Workbench, the filters are described in a separate section called "Filters" (see http://www.clcsupport.com/clccancerresearchworkbench/current/index.php?manual=Variant_Detectors_filters.html).

    Figure 16.29: Specify the parameters for variant calling.

  3. The next two wizard steps are annotation steps in which the detected variants are annotated with information from known databases. In this ready-to-use workflow the variants are annotated with a range of different data, but only databases that provide data from more than one population need to be specified by the user. This is the case for HapMap and the 1000 Genomes Project. First, the variants are annotated with information from the 1000 Genomes Project (see figure 16.30). From the drop-down list, choose the population that matches the population your samples are derived from. The drop-down list shows the populations that were selected under "Data Management", as described in the CLC Cancer Research Workbench user manual (http://www.clcsupport.com/clccancerresearchworkbench/current/index.php?manual=Download_configure_reference_data.html).

    Under "Locked settings" you can see that "Automatically join adjacent MNVs and SNVs" has been selected. The reason is that many databases do not report a succession of adjacent SNVs as one MNV, as CLC Cancer Research Workbench does, so variants called with CLC Cancer Research Workbench cannot be compared directly with these databases. To support filtering against these databases anyway, the option Automatically join adjacent MNVs and SNVs is enabled. This means that an MNV in the experimental data will get an exact match if a set of SNVs and MNVs in the database can be combined to provide the same allele.

    Note! This assumes that the SNVs and MNVs combined from the track of known variants lie on the same allele, although the track of known variants provides no evidence for this.

    Figure 16.30: Select the relevant population from the drop-down list.

  4. Click on the button labeled Next and do the same to annotate with information from HapMap (figure 16.31).

    Figure 16.31: Select the relevant population from the drop-down list.

  5. Click on the button labeled Next to go to the last wizard step (shown in figure 16.32).

    Figure 16.32: Check the selected parameters by pressing "Preview All Parameters".

    Pressing the button Preview All Parameters lets you preview all parameters. At this step you can only view the parameters; it is not possible to make any changes (see figure 16.33). Choose to save the results and click on the button labeled Finish.

    Figure 16.33: Preview all parameters. At this step it is not possible to introduce any changes; you can only view the settings.
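Step 3 explains that, with "Automatically join adjacent MNVs and SNVs" enabled, an MNV in the experimental data can match a combination of SNVs and MNVs from a database. The Python sketch below illustrates that matching idea under simplifying assumptions; it is not the Workbench's implementation. The `Variant` class and `matches_by_joining` function are hypothetical, variants are assumed to be substitutions (reference and allele strings of equal length) with 0-based coordinates, and the database pieces must tile the MNV contiguously. Like the workflow itself, the sketch assumes that the joined database variants lie on the same allele.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical substitution-only variant: ref and alt have equal, non-zero length."""
    start: int  # 0-based position of the first replaced base
    ref: str
    alt: str

    @property
    def end(self) -> int:
        # Exclusive end coordinate
        return self.start + len(self.ref)


def matches_by_joining(query: Variant, known: list[Variant]) -> bool:
    """Return True if adjacent known SNVs/MNVs can be joined to give exactly `query`.

    Assumption (mirrors the manual's note): adjacent known variants are taken
    to lie on the same allele, which the known-variant track does not prove.
    """
    # Only known variants lying entirely inside the query span can contribute.
    inside = [v for v in known if v.start >= query.start and v.end <= query.end]
    by_start: dict[int, list[Variant]] = {}
    for v in inside:
        by_start.setdefault(v.start, []).append(v)

    def extend(pos: int) -> bool:
        # Reaching the query end means the chosen pieces tile it exactly.
        if pos == query.end:
            return True
        for v in by_start.get(pos, []):
            offset = v.start - query.start
            piece_ref = query.ref[offset:offset + len(v.ref)]
            piece_alt = query.alt[offset:offset + len(v.alt)]
            # The piece must agree with the query on both reference and allele.
            if v.ref == piece_ref and v.alt == piece_alt and extend(v.end):
                return True
        return False

    return extend(query.start)


# Example: an experimental MNV AC>GT matched by two adjacent database SNVs.
mnv = Variant(100, "AC", "GT")
database = [Variant(100, "A", "G"), Variant(101, "C", "T")]
print(matches_by_joining(mnv, database))
```

The search tries every way of chaining database variants from the MNV's first base to its last, so mixtures of SNVs and shorter MNVs are handled as well as runs of SNVs.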

Seven different output types are generated:

  1. Gene expression. A track showing gene expression annotations. Hold the mouse over the track or right-click on it, and a tooltip will appear with information such as the gene name and expression values.
  2. Transcript expression. A track showing transcript expression annotations. Hold the mouse over the track or right-click on it, and a tooltip will appear with information such as the gene name and expression values.
  3. RNA-Seq Mapping Report. This report contains information about the reads, reference, transcripts, and statistics. It is explained in more detail in the section "RNA-Seq report" of the CLC Cancer Research Workbench reference manual (http://clcsupport.com/clccancerresearchworkbench/current/index.php?manual=RNA_Seq_report.html).
  4. Read Mapping. The mapped RNA-seq reads. The reads are shown in different colors depending on their orientation, whether they are single or paired reads, and whether they map unambiguously. For the color codes, see the description of sequence colors in the CLC Genomics Workbench manual: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html.
  5. Annotated Variants with Expression Values. An annotation track showing the variants. Hold the mouse over one of the variants or right-click on it, and a tooltip will appear with detailed information about the variant.
  6. RNA-Seq Genome Browser View. A collection of tracks presented together. It shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in COSMIC, ClinVar, and dbSNP (see figure 16.18).
  7. Log. A log of the workflow execution.
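The "Annotated Variants with Expression Values" output links each variant to the expression of the genes it falls in, which is what allows mutational patterns to be compared with expression. The Python sketch below shows one simple way such a join could work; it is an illustration, not the workflow's actual method. The `Gene` and `AnnotatedVariant` classes, the `add_expression_values` function, the gene names, and all numbers are hypothetical, and coordinates are assumed to be 0-based with half-open gene intervals.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: 0-based half-open interval plus an expression value."""
    name: str
    chrom: str
    start: int
    end: int
    expression: float


@dataclass
class AnnotatedVariant:
    """Hypothetical variant record that collects expression values of overlapping genes."""
    chrom: str
    position: int  # 0-based position of the variant
    allele: str
    # Expression value of every overlapping gene, keyed by gene name
    expression: dict[str, float] = field(default_factory=dict)


def add_expression_values(variants: list[AnnotatedVariant], genes: list[Gene]) -> list[AnnotatedVariant]:
    """Attach the expression value of each overlapping gene to each variant.

    Uses a linear scan per variant for clarity; a real tool would index the genes.
    """
    for var in variants:
        for gene in genes:
            if gene.chrom == var.chrom and gene.start <= var.position < gene.end:
                var.expression[gene.name] = gene.expression
    return variants


# Example with two overlapping hypothetical genes on chr1.
genes = [Gene("GENE_A", "chr1", 1000, 2000, 12.5), Gene("GENE_B", "chr1", 1500, 3000, 3.0)]
variants = [AnnotatedVariant("chr1", 1600, "T"), AnnotatedVariant("chr2", 1600, "C")]
for v in add_expression_values(variants, genes):
    print(v.chrom, v.position, v.expression)
```

A variant inside two overlapping genes receives both values, and a variant outside any gene keeps an empty mapping.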