Output from the Perform QIAseq Multimodal Analysis (Illumina)
The Perform QIAseq Multimodal Analysis workflow produces many output elements. Some are saved directly in the root output folder and the rest are organized into subfolders.
The root folder contains four subfolders (QC & Reports, Tracks (WT), Tracks (Fusions) and VCF Exportable Tracks) in addition to the following output elements:
- A Workflow Result Metadata table keeping track of all generated output.
- A Gene Expression Track with gene expression counts.
- A DNA Combined QC Report summarizing important QC values for the DNA run.
- An RNA Combined QC Report summarizing important QC values for the RNA run.
- A Fusion Report (described in http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Output_from_Detect_Refine_Fusion_Genes_tool.html) with graphical representations of the fusions found. Double-clicking on a fusion plot in the report will open the plot in a view that allows it to be exported as a high resolution image.
- A Genome Browser View (WT) containing the WT part of the analysis, including fusions, DNA and RNA read mappings, and variant calls.
- A Genome Browser View (Fusions) containing only the fusion chromosomes, which are used for refining the fusions. Tracks for this view can be found in the Tracks (Fusions) folder.
The subfolder QC & Reports contains reports for both the DNA and RNA parts of the workflow. Each report has the prefix DNA or RNA as appropriate. The folder includes, among others, the following report types:
- Remove and annotate UMI reports, which contain statistics on UMI barcodes.
- Trim Reads reports, named trim adapter report and quality trim report.
- A DNA UMI group report containing a breakdown of UMI groups with different numbers of reads, along with the percentages of groups and reads (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Trim_output.html).
- Two UMI read reports, one for RNA reads and one for DNA reads, showing how many reads were ignored and why they were not included in a UMI read. Please note that the two reports are generated by different tools and have different content.
- A DNA coverage report from the QC for Targeted Sequencing tool (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=QC_Targeted_Sequencing.html).
- An RNA-Seq report with statistics on the mapping of RNA reads (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=RNA_Seq_report.html).
- A CNV results report, if CNV detection has been run.
The subfolder called Tracks (WT) includes:
- Four UMI Read Mappings. Three of these relate to the RNA reads: "RNA read mapping" is the original RNA-Seq mapping, "Fusion genes unaligned ends" is the re-mapping of the unaligned ends of the RNA read mapping, and "Read mapping refined" contains the reads that map to the original chromosomes when all reads are mapped to both wild-type and fusion chromosomes. The fourth read mapping is the DNA mapping.
- An unfiltered variant track holding all detected variants before filters have been applied.
- A Per-Region Statistical Report track holding information about coverage.
- An amino acid track that displays a graphical representation of the amino acid changes. The track is based on the CDS track. In addition to the amino acid sequence of the coding sequence, all amino acids that have been affected by variants are shown as individual amino acids below the amino acid track. Changes causing a frameshift are symbolized with two arrow heads, and variants causing a premature stop are marked with an asterisk.
- A Fusion Gene track from the Detect and Refine Fusion Genes tool.
- Region- and Gene-level CNV tracks, if CNV detection has been run.
- An Indels indirect evidence track produced by the Structural Variant Caller.
- An inversion track and a long indels track, containing any inversions and any indels longer than 100,000 bp, respectively. Both are produced by the Structural Variant Caller.
The subfolder Tracks (Fusions) contains data related to the fusion chromosomes:
- The Reference Elements for the Fusion genome (Reference sequence, Genes, mRNA, CDS and Primers).
- A Read Mapping of the RNA UMI reads against the fusion chromosomes.
The final folder, VCF Exportable Tracks, contains outputs that can be exported together as a single VCF file using the VCF exporter. This folder contains a variant track of variants passing filters, a track of fusions, and, if CNV detection has been run, a CNV target-level track.
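As a rough illustration of how an exported file could be inspected, the sketch below tallies the ALT alleles of a VCF by their generic notation: symbolic alleles in angle brackets, breakend alleles containing square brackets, and plain sequence alleles. How the VCF exporter encodes fusions and CNVs is not described in this section, so the classification relies only on general VCF conventions. The function names and the example records are hypothetical.

```python
from collections import Counter


def classify_alt(alt):
    """Classify one ALT allele using generic VCF notation only."""
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"   # e.g. <DUP>, <DEL>
    if "[" in alt or "]" in alt:
        return "breakend"   # mate-pair style breakend notation
    return "sequence"       # plain base string (SNV, MNV, short indel)


def tally_vcf_records(lines):
    """Count ALT alleles per notation class from an iterable of VCF lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a complete data line
        for alt in fields[4].split(","):
            if alt != ".":  # "." means no alternate allele
                counts[classify_alt(alt)] += 1
    return counts


# Made-up records for illustration only, not real workflow output.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tT\t.\tPASS\t.",
    "2\t321681\t.\tG\tG]17:198982]\t.\tPASS\tSVTYPE=BND",
    "7\t5000\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP",
    "1\t200\t.\tC\tG,CT\t.\tPASS\t.",
]
example_counts = tally_vcf_records(example_lines)
```

For a file on disk, the same function can be fed an open file handle, for example `tally_vcf_records(open(path))`, where `path` points to an uncompressed VCF.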
The difference between the Unfiltered variant track in the Tracks (WT) folder and the Variants passing filters track depends on the following options available in the filtering steps:
- Filter based on quality criteria: Average Quality (quality of the sequenced bases that carry the variant), QUAL (significance of the variant), and Read Direction Test Probability (relative presence of the variant in the reads from different directions that cover the variant position).
- Remove homopolymer error type variants, i.e., errors of the indel type that occur in homopolymer regions. These regions are known to be harder to sequence than non-homopolymeric regions.
- Remove false positives based on frequency: Variants with a frequency above the specified threshold will be included in the filtered variant track. Note that the unfiltered variant track is generated by the Low Frequency Variant Detection tool run with a frequency cut-off value of 0.5. This value can be considered a pre-filter: it is applied to each site in the alignment and determines which sites the variant caller should consider as potential variant sites when it starts estimating the error rate and the site type/frequency parameters. The option described here instead applies a frequency cut-off to the final candidate variant set (after variants that span multiple alignment sites have been reconstructed). It is only meaningful to apply this post-filter at a value that is at least as high as the pre-filter value, and we recommend using a value that is at least twice as high (1.0). This leaves some leeway when going from single-site to multiple-site variant construction, in particular to avoid long indels being fragmented due to coverage differences across the considered region.
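The relationship between the pre-filter and the post-filter can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not the tool's implementation. The class and function names are hypothetical, and the frequencies are assumed to be expressed in the same unit as the cut-off values (0.5 and 1.0).

```python
from dataclasses import dataclass

# Cut-off stated above for the Low Frequency Variant Detection run.
PRE_FILTER_CUTOFF = 0.5


@dataclass
class Variant:
    name: str         # hypothetical label, for illustration only
    frequency: float  # assumed to use the same unit as the cut-offs


def post_filter_is_recommended(post_cutoff, pre_cutoff=PRE_FILTER_CUTOFF):
    """Check a post-filter value against the guidance in the text.

    Raises ValueError when the post-filter is below the pre-filter, since
    such a value is not meaningful. Returns True when the post-filter is
    at least twice the pre-filter, which is the recommended setting.
    """
    if post_cutoff < pre_cutoff:
        raise ValueError("post-filter must be at least as high as the pre-filter")
    return post_cutoff >= 2 * pre_cutoff


def apply_frequency_post_filter(variants, post_cutoff):
    """Keep variants whose frequency is above the post-filter cut-off."""
    return [v for v in variants if v.frequency > post_cutoff]


# Made-up candidate variants, not real data.
candidates = [Variant("a", 0.6), Variant("b", 1.0), Variant("c", 2.5)]
kept = apply_frequency_post_filter(candidates, post_cutoff=1.0)
kept_names = [v.name for v in kept]  # only "c" is above 1.0
```

In this sketch a variant at exactly the cut-off is dropped, because the text says variants with a frequency above the threshold are included. Whether the tool treats the boundary value the same way is an assumption.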
For quality control of fusion calls, see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Interpretation_fusion_results.html. We particularly recommend carrying out manual quality control checks on results that include fusions with novel exon boundaries.