Identify Candidate Variants and Genes from Tumor Normal Pair

The Identify Candidate Variants and Genes from Tumor Normal Pair workflows identify somatic variants and differentially expressed genes in a tumor normal pair. Only one tumor normal pair can be compared at a time. To compare more than one pair, repeat the analysis for each additional tumor normal pair.

The workflows can be found in the Toolbox at:

        Toolbox | Template Workflows | Biomedical Workflows (Image biomedical_twf_folder_open_16_n_p) | Whole Transcriptome Sequencing (Image rna_seq_group_closed_16_n_p) | Human (Image human_folder_closed_16_n_p) | Identify Candidate Variants and Genes from Tumor Normal Pair (Image identify_variants_tumor_normal_pair_wts_human_16_n_p)

        Toolbox | Template Workflows | Biomedical Workflows (Image biomedical_twf_folder_open_16_n_p) | Whole Transcriptome Sequencing (Image rna_seq_group_closed_16_n_p) | Mouse and Rat (Image mouse_folder_closed_16_n_p) | Identify Candidate Variants and Genes from Tumor Normal Pair (M and R) (Image identify_variants_tumor_normal_pair_wts_mouse_16_n_p)

After starting the workflow:

  1. If you are connected to a server, you will first be asked where you would like to run the analysis.

  2. Specify the RNA-Seq reads from the tumor sample (the panel on the left side of the wizard shows the kind of input that should be provided, as shown in figure 22.17). Click Next.

    Image rnaseq_identify_candidate_variants_step2
    Figure 22.17: Select the RNA-Seq reads from the tumor sample.

  3. In the next step you will be asked to select the RNA-Seq reads from the normal sample (see figure 22.18). Click Next.

    Image rnaseq_identify_candidate_variants_step3
    Figure 22.18: Select the RNA-Seq reads from the normal sample.

  4. Select the Reference Data Set that is relevant to your study (figure 22.19).

    Image rnaseq_identify_candidate_variants_step1
    Figure 22.19: Select the relevant data set for the samples being studied.

  5. Configure the parameters for the RNA-Seq Analysis (figure 22.20), first for the tumor sample, and then for the normal sample in the following step.

    Image rnaseq_identify_candidate_variants_step4
    Figure 22.20: Configure the RNA-Seq Analysis. Here we specified a file for spike-in control but left the strand specific parameter to its default value.

    If you wish to use spike-in controls, add the relevant file in the "Spike-in controls" field.

    You can also specify that the reads should be mapped only in their forward or reverse orientation (it is by default set to both). Choosing to restrict mapping to one direction is typically appropriate when a strand specific protocol for read generation has been used, as it allows assignment of the reads to the right gene in cases where overlapping genes are located on different strands. Also, applying the 'strand specific' 'reverse' option in an RNA-Seq run could allow the user to assess the degree of antisense transcription. Note that mate pairs are not supported when choosing the forward only or reverse only option.

  6. In the next two dialogs, specify a target region for the analysis of the sample with the Indels and Structural Variants tool, first for the tumor sample and then for the normal sample (figure 22.21).

    Image wts5
    Figure 22.21: Specify the target region for the Indels and Structural Variants tool.

    The targeted region file specifies which regions have been sequenced. You must provide this file yourself, as it depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents. Remember that you need an hg38-specific BED file when using hg38 as reference, and an hg19-specific BED file when using hg19 as reference.

  7. Set the parameters for the Low Frequency Variant Detection step (see figure 22.22). For a description of the different parameters that can be adjusted in the variant detection step, see

    Image wts6
    Figure 22.22: Specify the parameters for variant calling.

  8. The next dialog, Remove Variants Present in Control Reads (figure 22.23), concerns removal of germline variants. You are asked to supply the number of reads in the control data set that must support the variant allele for it to count as a match. All variants for which at least this number of control reads show the particular allele will be filtered out of the result track.

    Image wts7
    Figure 22.23: Specify the number of reads to use as cutoff for removal of germline variants.

  9. Finally, for the Remove Variants Found in HapMap step (figure 22.24), you can specify which HapMap population(s) best characterize(s) the samples.

    Image wts8
    Figure 22.24: Remove HapMap variants.

  10. In the last wizard step you can check the selected settings by clicking on the button labeled Preview All Parameters. The Preview All Parameters wizard only lets you inspect the settings; to make changes, use the Previous button in the wizard to return to the relevant windows and edit the parameters there.

  11. Choose to Save your results and click on the button labeled Finish.
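The filtering rule described in steps 8 and 9 can be illustrated with a short Python sketch. It is not the workbench's implementation, only a restatement of the rule under stated assumptions: the `Variant` record, its `control_support` field, and the `hapmap_like` set are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (not a workbench data type)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    control_support: int  # reads in the normal sample showing the same allele


def remove_present_in_control(variants, min_control_reads):
    """Drop variants supported by at least `min_control_reads` control reads."""
    return [v for v in variants if v.control_support < min_control_reads]


def remove_known(variants, known_variants):
    """Drop variants whose (chrom, pos, ref, alt) key is in a known-variant set."""
    return [v for v in variants
            if (v.chrom, v.pos, v.ref, v.alt) not in known_variants]


tumor_calls = [
    Variant("chr1", 1000, "A", "G", control_support=0),
    Variant("chr1", 2000, "C", "T", control_support=5),
    Variant("chr2", 3000, "G", "A", control_support=1),
]
# Assumed stand-in for variants from the selected HapMap population(s).
hapmap_like = {("chr2", 3000, "G", "A")}

after_control = remove_present_in_control(tumor_calls, min_control_reads=2)
somatic_candidates = remove_known(after_control, hapmap_like)
print([(v.chrom, v.pos) for v in somatic_candidates])
```

With a cutoff of 2, the variant seen in five control reads is treated as germline and removed, while the variant seen in only one control read passes the first filter and is removed only because it appears in the known-variant set. Raising the cutoff therefore keeps more variants in the result track.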

The following outputs are generated:

  1. Gene Expression Normal and Gene Expression Tumor (Image rnaseqtrack_16_h_p) A track showing gene expression annotations. Hold the mouse over the track or right-click on it, and a tooltip will appear with information such as gene name and gene expression values.

  2. Transcript Expression Normal and Transcript Expression Tumor (Image rnaseqtrack_16_h_p) A track showing transcript expression annotations.

  3. RNA-Seq Mapping Report Normal and RNA-Seq Mapping Report Tumor (Image proteinreport_16_n_p) This report contains information about the reads, reference, transcripts, and statistics. This is explained in more detail in the CLC Workbench reference manual in section RNA-Seq report (

  4. Read Mapping Normal and Read Mapping Tumor (Image read_track_16_n_p) The mapped RNA-Seq reads. The RNA-Seq reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously (see

  5. Differentially Expressed Genes file (Image expression_comparison_track_16_n_p) A track showing the differentially expressed genes. The table view provides information about fold change, difference in expression, the maximum expression (observed in either the case or the control), the expression in the case, and the expression in the control.

  6. Variant Calling Report Tumor (Image proteinreport_16_n_p) Report showing error rates for quality categories, quality of examined sites, and estimated frequencies of actual to called bases for different quality score ranges.

  7. Annotated Somatic Variants with Expression Values (Image variant_track_16_n_p) A variant track showing the somatic variants. When mousing over a variant, a tooltip will appear with information about the variant.

  8. Amino Acid Track

  9. Track List RNA-Seq Tumor_Normal Comparison (Image trackset_16_n_p) A collection of tracks presented together. Shows the annotated variant track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in ClinVar and dbSNP Common (see figure 22.25).

Image rnaseq_identify_candidate_variants_genomebrowserview
Figure 22.25: The Track List is a collection of tracks that makes it easy to compare them to each other. Each track can be opened individually by double-clicking on the track name on the left side of the Track List view.
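
To make the columns of the Differentially Expressed Genes table more concrete, the following Python sketch computes them for a single gene. It assumes that the tumor sample is the case, the normal sample is the control, and fold change is the plain ratio of case to control expression. The workbench may use a different convention (for example, a signed fold change), so the function and its field names are hypothetical.

```python
def compare_expression(case, control):
    """Return illustrative comparison values for one gene.

    Assumption: fold change = case / control, left undefined (None)
    when the control expression is zero.
    """
    return {
        "fold_change": case / control if control != 0 else None,
        "difference": case - control,
        "max_expression": max(case, control),
        "case": case,
        "control": control,
    }


# Example: tumor (case) expression 30.0, normal (control) expression 10.0
row = compare_expression(case=30.0, control=10.0)
print(row)
```

Looking at the difference and the maximum expression alongside the fold change helps separate large ratios that arise from very low expression values from changes that are also large in absolute terms.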