QIAseq RNA workflows

QIAseq Targeted RNAscan Panels use molecular barcode technology to quantify a large number of fusion genes and identify new fusion gene partners.

The concept of molecular barcoding is that, during library preparation, a Unique Molecular Index (UMI) is added to each original molecule before amplification. The barcoded molecules are then amplified by PCR. Due to intrinsic noise and sequence-dependent bias, barcoded molecules may be amplified unevenly. Target quantification is therefore more accurate when it is based on the number of distinct UMIs among the reads for each gene rather than on the total number of reads. Reads with different UMIs represent different original molecules, while reads with the same UMI result from PCR duplication of one original molecule.
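The difference between counting reads and counting UMIs can be illustrated with a minimal Python sketch. It is not the workbench implementation: the function name, the gene labels, and the UMI sequences are hypothetical, and the sketch assumes that each read has already been assigned to a gene and that UMIs contain no sequencing errors.

```python
from collections import defaultdict


def count_reads_and_umis(reads):
    """Count total reads and distinct UMIs per gene.

    reads: iterable of (gene, umi) pairs, one pair per sequenced read.
    """
    read_counts = defaultdict(int)
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1          # every read counts, including PCR duplicates
        umis_per_gene[gene].add(umi)    # duplicates of one molecule share a UMI
    umi_counts = {gene: len(umis) for gene, umis in umis_per_gene.items()}
    return dict(read_counts), umi_counts


# Hypothetical data: GENE_A has one molecule that was amplified three times.
example_reads = [
    ("GENE_A", "ACGTACGTACGT"),
    ("GENE_A", "ACGTACGTACGT"),
    ("GENE_A", "ACGTACGTACGT"),
    ("GENE_A", "TTGCAAGGCTAA"),
    ("GENE_B", "GGATCCATTGCA"),
    ("GENE_B", "CCTAGGTTAACG"),
]

read_counts, umi_counts = count_reads_and_umis(example_reads)
print(read_counts)
print(umi_counts)
```

In this example, read counting suggests GENE_A is twice as abundant as GENE_B, while UMI counting shows two original molecules for each gene.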

The first step in the Detect QIAseq RNAscan Fusions template workflow trims the UMI while retaining the UMI information as an annotation on the read. UMI reads are then created using the Create UMI Reads from Reads tool. Any remaining PCR adapters are also trimmed away before the reads are mapped to the human transcriptome for RNA-Seq analysis. In the final stage of the workflow, potential fusion genes are detected, and the identified fusion events are refined to increase the sensitivity and specificity of the calls.
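The idea of trimming the UMI while keeping it as an annotation can be sketched as follows. This is only an illustration under stated assumptions: the Read class, the trim_umi function, and the "umi" annotation key are hypothetical, and the sketch assumes the UMI occupies a fixed number of bases at the start of the read. The UMI_LENGTH value is a placeholder, not a documented setting of the workflow.

```python
from dataclasses import dataclass, field

UMI_LENGTH = 12  # placeholder value (assumption), not taken from the workflow


@dataclass
class Read:
    name: str
    sequence: str
    annotations: dict = field(default_factory=dict)


def trim_umi(read, umi_length=UMI_LENGTH):
    """Return a new read without the leading UMI, storing the UMI as an annotation."""
    if len(read.sequence) < umi_length:
        raise ValueError(f"read {read.name} is shorter than the UMI length")
    umi = read.sequence[:umi_length]
    return Read(
        name=read.name,
        sequence=read.sequence[umi_length:],         # biological sequence only
        annotations={**read.annotations, "umi": umi},  # UMI is kept, not discarded
    )


# Hypothetical read: 12 UMI bases followed by 12 bases of transcript sequence.
raw = Read("read_1", "ACGTACGTACGT" + "GGCTTAACCGGA")
trimmed = trim_umi(raw)
print(trimmed.sequence, trimmed.annotations["umi"])
```

Keeping the UMI as an annotation rather than in the sequence means it cannot interfere with mapping, yet it remains available for grouping reads that originate from the same molecule.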

Detection. The workflow first trims all remaining adapters from the reads. The trimmed reads are then mapped to the reference transcriptome sequence. The Detect and Refine Fusion Genes tool (http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Detect_Refine_Fusion_Genes.html) identifies fusion events based primarily on the number of fusion crossing reads, and secondarily on the number of fusion spanning reads. When determining whether a read actually crosses the fusion, the tool takes into account the length of the unaligned end as well as exon boundaries (at the RNA level, fusions usually happen at exon boundaries). Finally, other evidence is considered, such as whether the unaligned end maps to many places in the genome. Note that when the Detect and Refine Fusion Genes tool is included in the workflow, its parameters are configured differently from the default values used when the tool is run on its own. In particular, filters have been relaxed so that no fusion is overlooked.

Refinement. The Detect and Refine Fusion Genes tool considers at most 200 identified fusions for refinement (the ones with the lowest provisional p-value, i.e., the highest provisional Z-score). The tool then re-maps the original trimmed reads against a set of fusion references (i.e., the transcriptome including putative fusion transcripts). We expect that some previously unmapped or poorly mapped reads will now map directly to the fusion transcripts, resulting in a more accurate detection of fusion supporting reads.
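The selection of candidates for refinement can be pictured with a short Python sketch. It is not the tool's implementation: the function name, the candidate names, and the p-values are hypothetical, and only the limit of 200 and the ranking by lowest provisional p-value are taken from the description above.

```python
import heapq

MAX_FUSIONS_FOR_REFINEMENT = 200  # limit stated in the text above


def select_for_refinement(candidates, limit=MAX_FUSIONS_FOR_REFINEMENT):
    """Return up to `limit` candidates with the lowest provisional p-value.

    candidates: list of (fusion_name, provisional_p_value) tuples.
    The result is ordered from most to least significant.
    """
    return heapq.nsmallest(limit, candidates, key=lambda c: c[1])


# Hypothetical candidates with made-up provisional p-values.
candidates = [
    ("GENE1-GENE2", 0.03),
    ("GENE3-GENE4", 1e-9),
    ("GENE5-GENE6", 0.2),
]

# A small limit is used here only to make the effect visible.
selected = select_for_refinement(candidates, limit=2)
print(selected)
```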

The Detect and Refine Fusion Genes tool uses a binomial model to evaluate the fusions. The null hypothesis is that there is no fusion, i.e., the reads originate from the wild type transcript. Hence, a small p-value suggests a fusion transcript. Each read is assigned to either the fusion transcript or the wild type transcript, depending on which it maps to better. Because this assignment is based on mapping, it has an error rate (e) that we estimate from test data. In addition, we require a minimum number of reads to support any fusion breakpoint before considering it as a fusion. This guards against false positives due to low coverage.

The Z-score and p-value are then calculated using a standard one-tailed binomial test and an "Assumed error rate". This Assumed error rate is a mapping error rate, i.e., the probability of an unaligned end mapping to another gene at random. The p-value represents the probability of observing at least as many spanning/crossing reads (indicating a fusion) as were actually observed, under the null hypothesis that a fraction of the reads (i.e., the "Assumed error rate") map there by chance.
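A one-tailed binomial test of this kind can be sketched in Python using only the standard library. This is an illustration, not the tool's code: the function names and the example numbers are hypothetical. The p-value function follows the definition of an upper-tail binomial test. The Z-score function uses the normal approximation to the binomial, which is an assumption made here for illustration; the text does not state how the tool derives its Z-score.

```python
import math


def binomial_upper_tail(k, n, error_rate):
    """P(X >= k) for X ~ Binomial(n, error_rate).

    k: number of fusion supporting reads observed
    n: total number of reads considered at the breakpoint
    error_rate: assumed probability that a read supports the fusion by chance
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(
        math.comb(n, i) * error_rate**i * (1 - error_rate) ** (n - i)
        for i in range(k, n + 1)
    )


def z_score_normal_approx(k, n, error_rate):
    """Z-score from the normal approximation to the binomial (an assumption)."""
    expected = n * error_rate
    sd = math.sqrt(n * error_rate * (1 - error_rate))
    return (k - expected) / sd


# Hypothetical numbers: 10 of 100 reads support the fusion, assumed error rate 1%.
p_value = binomial_upper_tail(10, 100, 0.01)
z = z_score_normal_approx(10, 100, 0.01)
print(p_value, z)
```

With these made-up numbers, only about one supporting read is expected by chance, so observing ten yields a very small p-value and a large Z-score, which is evidence against the wild type null hypothesis.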

The Detect and Refine Fusion Genes tool outputs a maximum of 200 identified fusions in a fusion track. In addition to that track, the tool will also generate a set of "fusion references", i.e., a sequence track, gene track, mRNA track, CDS track and primer track that assumes the identified fusions. The tool also outputs a fusion breakpoint track and a report.