Compare Variants in DNA and RNA

Integrated analysis of genomic and transcriptomic sequencing data is a powerful tool that can help increase our current understanding of genomic variants. The Compare Variants in DNA and RNA workflow identifies variants in DNA and RNA and studies the relationship between the identified genomic and transcriptomic variants.

To run the Ready-to-Use workflow:

        Toolbox | Ready-to-Use Workflows | Whole Transcriptome Sequencing (Image rna_seq_group_closed_16_n_p) | (Human, Mouse or Rat) | Compare Variants in DNA and RNA (Image compare_variants_dna_rna_wts_16_n_p)

  1. Double-click on the Compare variants in DNA and RNA ready-to-use workflow to start the analysis. If you are connected to a server, you will first be asked where you would like to run the analysis.

  2. Select the RNA reads that you would like to analyze (figure 24.8).

    Image compare_variants_dna_rna_step2
    Figure 24.8: Select the RNA reads to analyze.

  3. Select now the DNA reads to analyze (see figure 24.9).

    Image compare_variants_dna_rna_step3
    Figure 24.9: Select the DNA reads to analyze.

  4. Select the Reference Data Set that is relevant to your study (figure 24.10).

    Image compare_variants_dna_rna_step1
    Figure 24.10: Select the relevant data set for the samples being studied.

  5. If you are working with the workflow for Human, it is possible to specify in the next two steps the 1000 Genomes population that describes best your samples (see figure 24.11).

    Image wts3
    Figure 24.11: Select the relevant population from the drop-down list.

  6. Repeat the previous steps to specify the Hapmap population that characterizes best your samples (see figure 24.12).

    Image wts4
    Figure 24.12: Select the relevant population from the drop-down list.

  7. Configure the parameters for the RNA-Seq Analysis (figure 24.13).

    If you wish to use spike-in controls, add the relevant file in the "Spike-in controls" field.

    You can also specify that the reads should be mapped only in their forward or reverse orientation (it is by default set to both). Choosing to restrict mapping to one direction is typically appropriate when a strand specific protocol for read generation has been used, as it allows assignment of the reads to the right gene in cases where overlapping genes are located on different strands. Also, applying the 'strand specific' 'reverse' option in an RNA-seq run could allow the user to assess the degree of antisense transcription. Note that mate pairs are not supported when choosing the forward only or reverse only option.

    Image wts0
    Figure 24.13: Configure the RNA-Seq Analysis. Here we specified a file for spike-in control but left the strand specific parameter to its default value.

  8. Specify a target region for the analysis of the RNA sample with the Indels and Structural Variants tool (figure 24.14). Repeat for the DNA sample at the next step.

    The targeted region file is a file that specifies which regions have been sequenced. This file is something that you must provide yourself, as this file depends on the technology used for sequencing. You can obtain the targeted regions file from the vendor of your targeted sequencing reagents. Remember that you have a hg38-specific BED file when using hg38 as reference, and hg19-specific BED file when using hg19 as reference.

    Image wts1
    Figure 24.14: Specify the target region for the Indels and Structural Variants tool.

  9. Set the parameters for the Low Frequency Variant Detection step for your RNA sample (see figure 24.15), and for the DNA sample at the next step. For a description of the different parameters that can be adjusted in the variant detection step, see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Low_Frequency_Variant_Detection.html.

    Image wts2
    Figure 24.15: Specify the parametes for transcriptomic variant detection.

  10. Click Next to go to the result handling step. Preview All Parameters allows you to view all parameters, but not edit them. Choose to save the results and click Finish to select a location to save the results to and start the analysis.

The following outputs are generated:

  1. A DNA Read Mapping and a RNA Read Mapping (Image read_track_16_n_p) The mapped DNA or RNA sequencing reads. The sequencing reads are shown in different colors depending on their orientation, whether they are single reads or paired reads, and whether they map unambiguously (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Coloring_mapped_reads.html.

  2. A DNA Mapping Report and a RNA Mapping Report (Image proteinreport_16_n_p) This report contains information about the reads, reference, transcripts, and statistics (see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=RNA_Seq_report.html).

  3. An RNA Gene Expression (Image rnaseqtrack_16_h_p) A track showing gene expression annotations. Hold the mouse over or right-click on the track: if you have zoomed in to nucleotide level, a tooltip will appear with information about gene name and expression values.

  4. An RNA Transcript Expression (Image rnaseqtrack_16_h_p) A track showing transcript expression annotations.

  5. A Filtered Variant Track with All Variants Found in DNA or RNA (Image variant_track_16_n_p) This track shows all variants that have been detected in either RNA, DNA or both.

  6. A Filtered Variant Track with Variants Found in Both DNA and RNA (Image variant_track_16_n_p) This track shows only the variants that are present in both DNA and RNA. With the table icon (Image table) found in the lower left part of the View Area it is possible to switch to table view. The table view provides details about the variants such as type, zygosity, and information from a range of different databases.

  7. A Track List Variants Found in DNA and RNA (Image trackset_16_n_p) A collection of tracks presented together. Shows the annotated variant track together with the human reference sequence, genes, transcripts, coding regions, and variants detected in ClinVar and dbSNP Common (see figure 24.16).

Image compare_variants_dna_rna_genomebrowserview
Figure 24.16: The Track List view makes it easy to compare a range of different data.

The three most important tracks generated are the Variants found in both DNA and RNA track, All variants found in DNA or RNA track, and the Track List. The Track List view makes it easy to get an overview in the context of a reference sequence, and compare variant and expression tracks with information from different databases. The two other tracks (Variants found in both DNA and RNA track and All variants found in DNA or RNA track) provides detailed information about the detected variants when opened in table view.