Detect and Refine Fusion Genes

Detect and Refine Fusion Genes finds fusion genes in a two-step process. The detect step identifies potential fusions, and the refine step accumulates and evaluates the evidence for each fusion. Briefly, the detect step re-maps the unaligned ends of reads and determines whether they are consistent with a fusion. In the first pass, a read supports a fusion only if its unaligned end starts close to an exon boundary and can be remapped close to another exon boundary. If the option Detect fusions with novel exon boundaries has been enabled, a second pass also considers reads that are far from an exon boundary and/or whose unaligned ends map far from an exon boundary.
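The first-pass proximity rule described above can be illustrated with a minimal sketch. This is not the tool's implementation: the function names, the use of plain integer coordinates, and the way the threshold is applied are assumptions made for illustration only. The threshold corresponds conceptually to the "Maximum distance to exon boundary" parameter.

```python
from bisect import bisect_left


def distance_to_nearest_boundary(position, exon_boundaries):
    """Distance from position to the closest exon boundary.

    exon_boundaries must be a sorted, non-empty list of coordinates.
    """
    i = bisect_left(exon_boundaries, position)
    candidates = []
    if i < len(exon_boundaries):
        candidates.append(exon_boundaries[i] - position)
    if i > 0:
        candidates.append(position - exon_boundaries[i - 1])
    return min(candidates)


def is_first_pass_candidate(read_breakpoint, remapped_position,
                            boundaries_a, boundaries_b, max_distance):
    """Hypothetical check: both ends must lie close to an exon boundary."""
    return (
        distance_to_nearest_boundary(read_breakpoint, boundaries_a) <= max_distance
        and distance_to_nearest_boundary(remapped_position, boundaries_b) <= max_distance
    )


# Example: the read breaks 2 bases from a boundary and its unaligned end
# remaps 2 bases from a boundary in the partner gene.
print(is_first_pass_candidate(102, 498, [100, 200], [500, 600], max_distance=10))
```

A read failing this check would only be considered in the second pass, and only when Detect fusions with novel exon boundaries is enabled.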

The refine step takes the fusions identified in the detect step and re-counts both the fusion crossing reads and the wild type supporting reads, using an RNA-Seq mapping against the wild type and fusion references. The fusion reference is an artificial reference sequence that "assumes" the detected fusions: in addition to the original chromosomes, it contains a new chromosome for each fusion (figure 7.13).

Image refinefusion
Figure 7.13: An artificial chromosome is created consisting of the vicinity of both ends of the fusion.
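As a rough illustration of how such an artificial chromosome could be assembled, the sketch below joins the sequence upstream of one breakpoint with the sequence downstream of the other. The function name, the 0-based coordinates, the `flank` size, and the assumption that both partners lie on the forward strand are all hypothetical simplifications; the tool's actual construction is not documented here.

```python
def build_fusion_chromosome(seq_a, breakpoint_a, seq_b, breakpoint_b, flank=1000):
    """Join the vicinity of two breakpoints into one artificial sequence.

    Takes up to `flank` bases ending just before breakpoint_a on seq_a and
    up to `flank` bases starting at breakpoint_b on seq_b (0-based).
    Returns the joined sequence and the junction offset within it.
    """
    left = seq_a[max(0, breakpoint_a - flank):breakpoint_a]
    right = seq_b[breakpoint_b:breakpoint_b + flank]
    return left + right, len(left)


fusion_seq, junction = build_fusion_chromosome("AAAACCCC", 4, "GGGGTTTT", 4, flank=3)
print(fusion_seq, junction)
```

A read that spans the junction offset in the returned sequence would count as a fusion crossing read in this simplified picture.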

All reads are remapped to the artificial reference, with the expectation that reads that were used to detect the fusion will now map to the fusion transcript as spliced reads. In addition, some reads that did not originally map at all will now map to the artificial reference sequence, increasing the evidence for the fusion event. The tool then calculates the Z-score and p-value using a binomial test.
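The exact null hypothesis used by the tool is not stated here. As a generic sketch of a binomial test on read counts, assume that `k` fusion crossing reads are observed among `n` reads covering the breakpoint (fusion crossing plus wild type supporting), and that `p0` is an assumed background probability of a fusion crossing read under the null hypothesis. All three names and the one-sided formulation are assumptions for illustration.

```python
from math import comb, sqrt


def binomial_upper_tail(k, n, p0):
    """One-sided p-value: probability of k or more successes in n trials."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def z_score(k, n, p0):
    """Normal-approximation Z-score of k successes in n trials."""
    return (k - n * p0) / sqrt(n * p0 * (1 - p0))


# Example with made-up counts: 60 fusion crossing reads out of 100,
# tested against an assumed background probability of 0.5.
print(z_score(60, 100, 0.5))
print(binomial_upper_tail(60, 100, 0.5))
```

The exact tail sum is used for the p-value, so no external statistics library is needed. The Z-score relies on the normal approximation and is mainly useful for ranking.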

The Detect and Refine Fusion Genes tool can be found in the Toolbox at:

        Tools | QIAseq Panel Expert Tools | QIAseq RNAscan Panel Expert Tools | Detect and Refine Fusion Genes

The Detect and Refine Fusion Genes tool takes a sequence list as input (figure 7.14).

Image detect_and_refine_1
Figure 7.14: Select sequences.

In the next dialog (figure 7.15), specify the reads track containing an RNA-Seq read mapping, as well as the reference sequence, gene and mRNA tracks from the CLC_References folder of the Navigation Area. Optionally, a CDS track or a primer track can also be provided.

Image detect_and_refine_2
Figure 7.15: Specify reads track, references and parameters for the detection.

The additional parameters to set are:

In the Refine dialog (figure 7.16), specify parameters related to the refine step.

Image detect_and_refine_3
Figure 7.16: Specify parameters for refinement.

The remaining parameters apply to the RNA-Seq read mapping to the artificial references (see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Mapping_settings.html for details).

Output from the Detect and Refine Fusion Genes tool

The Detect and Refine Fusion Genes tool produces a number of outputs:

  1. Fusion Genes (WT): The Fusion Genes track contains the breakpoints of all detected fusions. The track is described in more detail below, see 7.4.1.
  2. Reads (WT): A read mapping to the WT genome. Reads are mapped to a combination of the WT genome and the artificial fusion chromosomes. Reads mapping better to the artificial fusion chromosomes will be in the Reads (fusions) output.
  3. Unaligned Ends: A read mapping showing where the unaligned ends map to the reference genome. The unaligned ends track is useful when choosing how to set the parameters "Minimum unaligned end read count", "Minimum length of unaligned sequence", and "Maximum distance to exon boundary" for a particular panel and sequencing protocol in order to find known fusions, as it shows which unaligned ends of reads were considered and where they were mapped.
  4. Fusion Genes (fusions): Breakpoints for the detected fusions on the artificial reference.
  5. Reads (fusions): A read mapping to the artificial fusion chromosomes. Reads are mapped to a combination of the WT genome and the artificial fusion chromosomes. Reads mapping better to WT genome will be in the Reads (WT) output.
  6. Reference Sequence (fusions): Reference sequence for the artificial reference.
  7. mRNA (fusions): mRNA transcripts corresponding to the detected fusions on the artificial reference.
  8. Genes (fusions): Gene region for the fused gene product on the artificial reference.
  9. CDS (fusions): If the CDS track was provided, this track contains the CDS region for the fused gene product on the artificial reference.
  10. Primers (fusions): If a primer track was provided, this track contains the primer regions on the artificial reference. Note that only primers for genes involved in a detected fusion are represented here, and that the same primer can appear in multiple fusion chromosomes if the same gene is involved in multiple fusions.
  11. Report: A report containing graphical representations of the fusions passing all filters. The report is described in more detail below, see 7.4.1.


Fusion tracks

The fusion track has a table view containing the following information:


Detect and Refine Fusion Genes report

In the Result handling dialog, you can choose to output a report containing sample and unaligned ends information, as well as detailed information and plots for all fusions passing filters (figure 7.17).

This report can be used together with the Combine Reports tool (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Combine_Reports.html).

The report has three sections: "Summary", "Unaligned Ends" and "Fusions". The summary section has a table with the sample name. The unaligned ends section contains a table with statistics on the unaligned ends used to detect fusions. The table has the following information:

Image detect_and_refine_report_unaligned_ends
Figure 7.17: Unaligned ends section in Detect and Refine Fusions report.

The Fusions section lists all fusions with FILTER=PASS. Each fusion gene is described by two tables and a fusion plot (figure 7.18).

Image refinefusionreport
Figure 7.18: A report section for a fusion gene.

The first table contains an overview of the most supported fusion for the fusion gene. Values in this table include:

The second table lists values for all supported fusion breakpoints in the fusion gene, sorted by read count. The first row in the table therefore recapitulates some of the values from the first table. Additional rows show evidence for other fusions between the same two genes. At most 10 rows are shown.
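The ordering and truncation of the second table can be mimicked on exported data with a short sketch. The row structure and the `read_count` and `breakpoint` field names are hypothetical; descending order is assumed because the first row corresponds to the most supported fusion.

```python
def breakpoint_table_rows(breakpoints, max_rows=10):
    """Sort breakpoint rows by read count (highest first) and keep the top rows."""
    ranked = sorted(breakpoints, key=lambda row: row["read_count"], reverse=True)
    return ranked[:max_rows]


# Made-up rows for one fusion gene.
rows = [
    {"breakpoint": "exon 3 / exon 7", "read_count": 3},
    {"breakpoint": "exon 4 / exon 7", "read_count": 12},
    {"breakpoint": "exon 4 / exon 8", "read_count": 7},
]
for row in breakpoint_table_rows(rows):
    print(row["breakpoint"], row["read_count"])
```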

The fusion plot visualizes all fusions between the reported transcripts.

Known limitations