Detect QIAseq RNAscan Fusions

QIAseq Targeted RNAscan Panels use molecular barcode technology to quantify a large number of fusion genes and identify new fusion gene partners.

The concept of molecular barcoding is that during library preparation of the samples, a Unique Molecular Index (UMI) is added to each original molecule, so that every read derived from that molecule carries the same UMI. The barcoded molecules are then amplified by PCR. Due to intrinsic noise and sequence-dependent bias, barcoded sequences may be amplified unevenly. Thus, target quantification is more accurate when counting the number of distinct UMIs in the reads rather than the total number of reads for each gene. Reads with different UMIs represent different original molecules, while reads with the same UMI result from PCR duplication of one original molecule.
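The difference between counting reads and counting UMIs can be illustrated with a minimal Python sketch. This is not the workflow's implementation: the function name, the (gene, UMI) input format and the example values are assumptions made for illustration, and real UMI grouping typically also takes mapping position and sequencing errors in the UMI into account.

```python
from collections import defaultdict

def count_molecules(reads):
    """Return {gene: (read_count, umi_count)} for an iterable of (gene, umi) pairs.

    Hypothetical helper for illustration only: reads sharing a UMI within a
    gene are treated as PCR duplicates of one original molecule.
    """
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1
        umis[gene].add(umi)
    return {gene: (read_counts[gene], len(umis[gene])) for gene in read_counts}

# Made-up example data: ALK has four reads, but three share one UMI.
reads = [
    ("ALK", "AACGT"), ("ALK", "AACGT"), ("ALK", "AACGT"), ("ALK", "GGTCA"),
    ("RET", "TTAGC"), ("RET", "CCGAT"),
]
counts = count_molecules(reads)
for gene, (n_reads, n_umis) in sorted(counts.items()):
    print(f"{gene}: {n_reads} reads, {n_umis} distinct UMIs")
```

In this example ALK has twice as many reads as RET, yet both genes are supported by two distinct UMIs, which is the quantity that reflects the number of original molecules.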

The first step in the Detect QIAseq RNAscan Fusions template workflow trims the UMI while retaining the UMI information as an annotation on the read. UMI reads are then created using the Create UMI Reads from Reads tool. Any remaining PCR adapters are also trimmed away before reads are mapped to the human transcriptome for the purposes of RNA-Seq analysis. In the final stage of the workflow, potential fusion genes are detected, and the identified fusion events are refined to increase the sensitivity and specificity of the calls.

Detection. The workflow first trims all remaining adapters from the reads. The trimmed reads are then mapped to the reference transcriptome sequence. The Detect and Refine Fusion Genes tool (http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Detect_Refine_Fusion_Genes.html) identifies fusion events based primarily on the number of fusion crossing reads, and secondarily on the number of fusion spanning reads. When determining whether a read actually crosses the fusion, the tool takes into account the length of the unaligned end, as well as exon boundaries (at the RNA level, fusions usually happen at exon boundaries). Finally, other evidence, such as whether the unaligned end maps to many places in the genome, is considered. Note that when the Detect and Refine Fusion Genes tool is included in the workflow, its parameters are configured differently from the default values of the tool used on its own. In particular, filters have been relaxed so that no fusion is overlooked.

Refinement. The Detect and Refine Fusion Genes tool considers at most 200 identified fusions for refinement (the ones with the lowest provisional p-value, i.e., the highest provisional Z-score). The tool then re-maps the original trimmed reads against a set of fusion references (i.e., the transcriptome including putative fusion transcripts). We expect that some previously unmapped or poorly mapped reads will now map directly to the fusion transcripts, resulting in a more accurate detection of fusion supporting reads.

The Detect and Refine Fusion Genes tool uses a binomial model to evaluate the fusions. The null hypothesis is that there is no fusion, i.e., the reads originate from the wild type transcript. Hence, a small p-value suggests a fusion transcript. Reads are assigned to either fusion or wild type transcripts based on how well they map to each. This assignment is based on mapping, and it has an error rate (e) that we estimate from test data. In addition, we require a minimum number of reads to support any fusion breakpoint before considering it as a fusion. This guards against false positives due to low coverage.

The Z-score and p-value are then calculated using a standard one-tailed binomial test and an "Assumed error rate". This Assumed error rate is a mapping error rate, i.e., the probability of an unaligned end mapping to another gene at random. The p-value represents the probability of observing at least the observed number of spanning/crossing reads (indicating a fusion) under the null hypothesis, where a fraction of reads (i.e., the "Assumed error rate") map there by chance.
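To make the test concrete, the following Python sketch computes a one-tailed binomial p-value and a Z-score for a single candidate breakpoint. It is an illustration under stated assumptions, not the tool's implementation: the function names and the example numbers are hypothetical, and the Z-score shown uses the usual normal approximation to the binomial, which may differ from how the tool derives its Z-score.

```python
import math

def binomial_tail(k, n, e):
    """One-tailed p-value P(X >= k) for X ~ Binomial(n, e).

    k: reads supporting the fusion, n: total reads at the breakpoint,
    e: assumed error rate (chance that a read supports the fusion by mistake).
    """
    return sum(math.comb(n, i) * e**i * (1 - e) ** (n - i) for i in range(k, n + 1))

def z_score(k, n, e):
    """Z-score from the normal approximation to the binomial (an assumption)."""
    return (k - n * e) / math.sqrt(n * e * (1 - e))

# Made-up example: 5 of 100 reads support the fusion, assumed error rate 0.1%.
n_reads = 100
fusion_reads = 5
assumed_error_rate = 0.001

p_value = binomial_tail(fusion_reads, n_reads, assumed_error_rate)
z = z_score(fusion_reads, n_reads, assumed_error_rate)
print(f"p-value = {p_value:.3g}, Z-score = {z:.2f}")
```

With an error rate this low, only about 0.1 fusion-supporting reads are expected by chance among 100 reads, so observing five gives a very small p-value and a large Z-score.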

The Detect and Refine Fusion Genes tool outputs a maximum of 200 identified fusions in a fusion track. In addition to that track, the tool also generates a set of "fusion references", i.e., a sequence track, gene track, mRNA track, CDS track and primer track that assume the identified fusions. The tool also outputs a fusion breakpoint track and a report.

The Detect QIAseq RNAscan Fusions template workflow can be found in the Toolbox at:

        Template Workflows | Biomedical Workflows | QIAseq Sample Analysis | QIAseq RNA Workflows | Detect QIAseq RNAscan Fusions

Double-click on the Detect QIAseq RNAscan Fusions template workflow to run the analysis.

If you are connected to a CLC Server via the CLC Workbench, you will be asked where you would like to run the analysis. We recommend that you run the analysis on a CLC Server when possible.

In the Select reads dialog, specify the sequencing reads to analyze (figure 14.1).

Image selectreadsgfd
Figure 14.1: Select the sequencing reads by double-clicking on the file name or by clicking once on the file name and then on the arrow pointing to the right-hand side.

The following dialog helps you set up the relevant Reference Data Set. If you have not downloaded the Reference Data Set yet, the dialog will suggest the relevant data set and offer the opportunity to download it using the Download to Workbench button (figure 14.2).

Image selectrdsrnascan
Figure 14.2: The relevant Reference Data Set is highlighted; in the text to the right, the types of reference needed by the workflow are listed. There is also an indication of how many data sets can be used with the workflow. In this case, the other data sets would only be visible when opening the "QIAGEN Previous" or "QIAGEN Tutorial" folder.

Note that if you wish to Cancel or Resume the Download, you can close the template workflow and open the Reference Data Manager where the Cancel, Pause and Resume buttons are available.

If the Reference Data Set was previously downloaded, the option "Use the default reference data" is available and will ensure the relevant data set is used. You can always check the "Select a reference set to use" option to specify a Reference Data Set other than the one suggested.

In the Select primers dialog, choose the primer corresponding to the panel used to generate the reads (figure 14.3).

Image selectprimersrnascan
Figure 14.3: Select the primer track for the relevant panel.

In the Detect and Refine Fusion Genes dialog, it is possible to change the Promiscuity threshold, i.e., the maximum number of different fusion partners reported for a gene. You can also check for exon skipping by enabling the "Detect exon skippings" option, and check for fusions with novel exon boundaries by enabling the "Detect fusions with novel exon boundaries" option.

In the final wizard step, choose to Save the results of the workflow and specify a location in the Navigation Area before clicking Finish.

Launching using the QIAseq Panel Analysis Assistant

The workflow is also available in the QIAseq Panel Analysis Assistant under Targeted RNAscan.


