Identify Candidate Variants and Genes from Tumor Normal Pair

The Identify Candidate Variants and Genes from Tumor Normal Pair tool identifies somatic variants and differentially expressed genes in a tumor normal pair. Only one tumor normal pair can be compared at a time. To compare more than one pair, repeat the analysis for each additional tumor normal pair.

To run the ready-to-use workflow:

        Toolbox | Ready-to-Use Workflows | Whole Transcriptome Sequencing (Image rna_seq_group_closed_16_n_p) | Identify Candidate Variants and Genes from Tumor Normal Pair (Image identify_variants_tumor_normal_pair_wts_16_n_p)

  1. Double-click on the Identify Candidate Variants and Genes from Tumor Normal Pair tool to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis. Next, you will be asked to select the RNA-seq reads from the normal sample. The panel on the left side of the wizard shows the kind of input that should be provided (figure 16.19). Select the reads either by double-clicking on the file name or by clicking once on the file and then clicking on the right-pointing arrow in the middle of the wizard. Click on the button labeled Next.

    Image rnaseq_identify_candidate_variants_step2
    Figure 16.19: Select the RNA-seq reads from the normal sample.

  2. In the next step you will be asked to select the RNA-seq reads from the tumor sample (see figure 16.20).

    Image rnaseq_identify_candidate_variants_step3
    Figure 16.20: Select the RNA-seq reads from the tumor sample.

  3. Click on the button labeled Next. In this wizard step (figure 16.21) you can adjust the settings for the Create fold change track tool. In brief, for each transcript or gene, the tool calculates the ratio between the expression values in the normal and the tumor sample. This makes it possible to filter on fold changes and expression values, and thereby to identify differentially expressed transcripts or genes. The parameters that can be adjusted in this wizard step are described in detail in the CLC Cancer Research Workbench user manual (see http://clcsupport.com/clccancerresearchworkbench/current/index.php?manual=Create_fold_change_track.html).

    Image rnaseq_identify_candidate_variants_step4
    Figure 16.21: Specify the parameters for the Create fold change track tool.

  4. Click on the button labeled Next. This will allow you to specify the parameters for the variant detection (figure 16.22). For a description of the parameters that can be adjusted in the variant detection step, see the description of the "Low Frequency Variant Detection" tool in the CLC Cancer Research Workbench user manual (http://www.clcsupport.com/clccancerresearchworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html). Because general filters are applied to the different variant detectors available in CLC Cancer Research Workbench, the description of the filters is found in a separate section called "Filters" (see http://www.clcsupport.com/clccancerresearchworkbench/current/index.php?manual=Variant_Detectors_filters.html). If you click on "Locked Settings", you will be able to see all parameters used for variant detection in the ready-to-use workflow.

    Image rnaseq_identify_candidate_variants_step5
    Figure 16.22: Specify the parameters for variant calling.

  5. The next wizard step (figure 16.23) concerns removal of germline variants. You are asked to supply the minimum number of reads in the control data set that must support the variant allele for it to count as a match. All variants for which at least this number of control reads show the particular allele are filtered out of the result track.

    Image rnaseq_identify_candidate_variants_step6
    Figure 16.23: Specify the number of reads to use as cutoff for removal of germline variants.

  6. In the next wizard step, variants found in known databases are removed. This ready-to-use workflow removes variants from a range of different databases, but only databases that provide data from more than one population need to be specified by the user. This is the case for the HapMap database. From the drop-down list you can choose the population that matches the population your samples are derived from (figure 16.24). The drop-down list shows the populations that were selected under "Data Management" as described in the CLC Cancer Research Workbench user manual (http://www.clcsupport.com/clccancerresearchworkbench/current/index.php?manual=Download_configure_reference_data.html).

    Image rnaseq_identify_candidate_variants_step7
    Figure 16.24: Select the relevant population from the drop-down list.

  7. Click on the button labeled Next to go to the last wizard step (shown in figure 16.25).

    Image rnaseq_identify_candidate_variants_step8
    Figure 16.25: Check the selected parameters by pressing "Preview All Parameters".

    Pressing the button Preview All Parameters allows you to preview all parameters. At this step you can only view the parameters; it is not possible to make any changes (see figure 16.26). Choose to save the results and click on the button labeled Finish.

    Image rnaseq_identify_candidate_variants_step7
    Figure 16.26: Preview all parameters. At this step it is not possible to make any changes; it is only possible to view the settings.
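
The germline filter in step 5 can be illustrated with a minimal Python sketch. This is not the workbench's implementation; the Variant class, its field names, and the remove_germline function are hypothetical and only mirror the rule described above: a tumor variant is removed when at least the chosen number of control reads show the same allele.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for a variant called in the tumor sample."""
    chromosome: str
    position: int
    allele: str
    control_reads_with_allele: int  # reads in the normal (control) sample showing this allele


def remove_germline(variants, min_control_reads):
    """Keep variants supported by fewer than min_control_reads control reads.

    Variants where at least min_control_reads control reads show the allele
    are treated as germline and dropped.
    """
    return [v for v in variants if v.control_reads_with_allele < min_control_reads]


# Made-up example data, for illustration only.
tumor_variants = [
    Variant("chr1", 1000, "A", 0),
    Variant("chr1", 2000, "T", 5),
    Variant("chr2", 3000, "G", 2),
]

candidates = remove_germline(tumor_variants, min_control_reads=2)
print([(v.chromosome, v.position) for v in candidates])
```

With a cutoff of 2, only the variant that no control read supports remains. A lower cutoff removes more variants and is therefore stricter about what is reported as somatic.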

Thirteen types of output are generated:

  1. Gene Expression Normal (Image rnaseqtrack_16_h_p) A track showing gene expression annotations. Hold the mouse over the track or right-click on it, and a tooltip will appear with information about e.g. gene name and gene expression values.
  2. Transcript Expression Normal (Image rnaseqtrack_16_h_p) A track showing transcript expression annotations. Hold the mouse over the track or right-click on it, and a tooltip will appear with information about e.g. gene name and transcript expression values.
  3. RNA-Seq Mapping Report Normal (Image proteinreport_16_n_p) This report contains information about the reads, reference, transcripts, and statistics. This is explained in more detail in the CLC Cancer Research Workbench reference manual in section RNA-Seq report (http://clcsupport.com/clccancerresearchworkbench/current/index.php?manual=RNA_Seq_report.html).
  4. Gene Expression Tumor (Image rnaseqtrack_16_h_p) A track showing gene expression annotations. Hold the mouse over the track or right-click on it, and a tooltip will appear with information about e.g. gene name and gene expression values.
  5. Transcript Expression Tumor (Image rnaseqtrack_16_h_p) A track showing transcript expression annotations. Hold the mouse over the track or right-click on it, and a tooltip will appear with information about e.g. gene name and transcript expression values.
  6. RNA-Seq Mapping Report Tumor (Image proteinreport_16_n_p) This report contains information about the reads, reference, transcripts, and statistics. This is explained in more detail in the CLC Cancer Research Workbench reference manual in section RNA-Seq report (http://clcsupport.com/clccancerresearchworkbench/current/index.php?manual=RNA_Seq_report.html).
  7. Differentially Expressed Genes (Image expression_comparison_track_16_n_p) A track showing the differentially expressed genes. The table view provides information about fold change, difference in expression, the maximum expression (observed in either the case or the control), the expression in the case, and the expression in the control.
  8. Read Mapping Tumor (Image read_track_16_n_p) The mapped RNA-seq reads. The RNA-seq reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes please see the description of sequence colors in the CLC Genomics Workbench manual that can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html.
  9. Read Mapping Normal (Image read_track_16_n_p) The mapped RNA-seq reads. The RNA-seq reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously. For the color codes please see the description of sequence colors in the CLC Genomics Workbench manual that can be found here: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=View_settings_in_Side_Panel.html.
  10. Variant Calling Report Tumor (Image proteinreport_16_n_p) Report showing error rates for quality categories, quality of examined sites, and estimated frequencies of actual to called bases for different quality score ranges.
  11. Annotated Somatic Variants with Expression Values (Image variant_track_16_n_p) A variant track showing the somatic variants. When mousing over a variant, a tooltip will appear with information about the variant.
  12. Genome Browser View RNA-Seq Tumor_Normal Comparison (Image trackset_16_n_p) A collection of tracks presented together. Shows the annotated variants track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in COSMIC, ClinVar and dbSNP (see figure 16.27).
  13. Log (Image table) A log of the workflow execution.
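
The columns listed for the Differentially Expressed Genes track (fold change, difference in expression, maximum expression, case expression, control expression) can be sketched in Python as follows. This is an illustration under stated assumptions, not the formula used by the Create fold change track tool: here the fold change is assumed to be the plain ratio of case (tumor) to control (normal) expression, and the handling of a zero control value is a choice made for this sketch. The function names, gene names, and thresholds are hypothetical.

```python
import math


def compare_expression(case, control):
    """Return comparison values for one gene.

    Assumption: fold change is case / control. The tool may use a
    different convention; consult the manual for the exact definition.
    """
    if control == 0:
        # Assumed handling: the ratio is undefined when the control is zero.
        fold_change = math.inf if case > 0 else math.nan
    else:
        fold_change = case / control
    return {
        "fold_change": fold_change,
        "difference": case - control,
        "max_expression": max(case, control),
        "case": case,
        "control": control,
    }


def is_candidate(row, min_fold=2.0, min_max_expression=10.0):
    """Filter on fold change (up or down) and on maximum expression."""
    changed = row["fold_change"] >= min_fold or row["fold_change"] <= 1.0 / min_fold
    return changed and row["max_expression"] >= min_max_expression


# Made-up expression values (case, control), for illustration only.
expression = {
    "GENE_A": (40.0, 10.0),
    "GENE_B": (5.0, 5.5),
    "GENE_C": (1.0, 4.0),
}

rows = {gene: compare_expression(case, control) for gene, (case, control) in expression.items()}
selected = [gene for gene, row in rows.items() if is_candidate(row)]
print(selected)
```

Filtering on the maximum expression as well as the fold change discards genes such as GENE_C in this example, where a large ratio arises from two low expression values.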

Image rnaseq_identify_candidate_variants_genomebrowserview
Figure 16.27: The Genome Browser View is a collection of tracks. The Genome Browser View makes it easy to compare the different tracks. Each track can be opened individually by double-clicking on the track name on the left side of the View Area.