Targeted DNA

QIAseq Targeted DNA Panels integrate molecular barcode technology into a highly multiplexed, PCR-based target enrichment process, enabling accurate calling of variants present at very low frequency. The principle of molecular barcoding is as follows: during library preparation with a QIAseq Targeted DNA Panel, a Unique Molecular Index (UMI) and a common sequence are attached to each original DNA molecule before amplification. The barcoded molecules are then amplified by PCR. Because of intrinsic noise and sequence-dependent bias, barcoded molecules may be amplified unevenly, so counting the number of distinct UMIs for each gene gives a more accurate quantification than counting the total number of reads. Reads with different UMIs originate from different original molecules, while reads that share the same UMI are PCR duplicates of a single original molecule.
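To make the difference between read counting and UMI counting concrete, here is a minimal Python sketch. It assumes that each read has already been reduced to a hypothetical `(gene, umi)` pair. The `quantify` function and the toy data are illustrative only and are not part of any QIAGEN tool.

```python
from collections import defaultdict


def quantify(reads):
    """Count total reads and distinct UMIs per gene.

    `reads` is an iterable of (gene, umi) pairs; this input format is
    hypothetical and chosen only for illustration.
    """
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1   # every read counts, including PCR duplicates
        umis[gene].add(umi)      # duplicates of one molecule collapse to one UMI
    return {gene: (read_counts[gene], len(umis[gene])) for gene in read_counts}


if __name__ == "__main__":
    # Hypothetical toy data: gene A had 2 original molecules, one of them
    # heavily amplified; gene B had 3 original molecules, each read once.
    reads = [
        ("A", "AAGT"), ("A", "AAGT"), ("A", "AAGT"), ("A", "AAGT"), ("A", "CCTA"),
        ("B", "GGTC"), ("B", "TTAG"), ("B", "CAGA"),
    ]
    for gene, (n_reads, n_umis) in sorted(quantify(reads).items()):
        print(f"{gene}: {n_reads} reads, {n_umis} UMIs")
```

On this toy input, read counts rank gene A above gene B (5 reads versus 3), whereas UMI counts show that gene B had more original molecules (3 UMIs versus 2). This is the amplification bias that UMI counting is meant to correct.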

During secondary analysis of the sequenced reads, however, the UMIs and the common sequence attached to them hinder mapping of the reads to a reference sequence. The QIAseq Targeted DNA Panel Analysis ready-to-use workflow therefore begins by trimming remaining PCR adapters, the UMI, and the common sequence from each read, while retaining the UMI as an annotation on the read. The trimmed reads are then mapped to the human reference sequence. After mapping, the Create UMI Reads tool merges reads that share the same UMI into a single consensus read, called a "UMI read".
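The trimming and consensus steps can be sketched in Python under strong simplifying assumptions. The read layout (a fixed-length UMI at the start of the read, immediately followed by the common sequence) uses hypothetical placeholder values for `UMI_LENGTH` and `COMMON_SEQ`, not the actual QIAseq design. The consensus step uses a simple column-wise majority vote and groups reads by UMI only, ignoring mapping position, error tolerance in the UMI, and base qualities. The Create UMI Reads tool is not claimed to work this way.

```python
from collections import Counter, defaultdict

# Hypothetical read layout, for illustration only: a fixed-length UMI at the
# start of the read, followed by a common sequence. These are placeholders,
# not the actual QIAseq design.
UMI_LENGTH = 12
COMMON_SEQ = "ACGTACGTACG"


def trim_umi(name, seq):
    """Remove the UMI and common sequence, keeping the UMI as an annotation.

    Returns (name, umi, trimmed_seq), or None if the common sequence is not
    found directly after the UMI.
    """
    umi = seq[:UMI_LENGTH]
    rest = seq[UMI_LENGTH:]
    if not rest.startswith(COMMON_SEQ):
        return None
    return name, umi, rest[len(COMMON_SEQ):]


def consensus(seqs):
    """Column-wise majority vote (zip truncates to the shortest read)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def umi_reads(raw_reads):
    """Group trimmed reads by UMI and collapse each group to one consensus."""
    groups = defaultdict(list)
    for name, seq in raw_reads:
        trimmed = trim_umi(name, seq)
        if trimmed is not None:
            _, umi, insert = trimmed
            groups[umi].append(insert)
    return {umi: consensus(seqs) for umi, seqs in groups.items()}


if __name__ == "__main__":
    umi = "AAAACCCCGGGG"
    reads = [
        ("r1", umi + COMMON_SEQ + "TTGACA"),
        ("r2", umi + COMMON_SEQ + "TTGACA"),
        ("r3", umi + COMMON_SEQ + "TTCACA"),  # error at the third base
    ]
    print(umi_reads(reads))  # {'AAAACCCCGGGG': 'TTGACA'}
```

The majority vote shows why UMI reads are useful for low-frequency calling: an error present in only one PCR duplicate is outvoted by the other duplicates of the same original molecule, so it does not reach the variant caller.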

The workflow then removes ligation artifacts from the read mapping. Next, the Indels and Structural Variants detection step generates a guidance track that the Local Realignment tool uses to improve the mapping. Variants are then detected on the UMI reads, using the Low Frequency Variant Detection tool for somatic workflows and the Fixed Ploidy Variant Detection tool for germline workflows. Finally, a series of filtering steps removes variants that are not significant enough, that are likely to be artifacts or homopolymer errors, or that are too infrequent. Among other items, the workflow outputs a list of filtered variants, including some present at very low frequency in the original dataset.
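The final filtering stage can be pictured as a chain of independent tests, one for each category of variant that is removed. The following Python sketch assumes a hypothetical `Variant` record and placeholder thresholds (`min_quality`, `min_frequency`). These are not the workbench's actual variant track schema, filter tools, or default values.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical fields for illustration; not the workbench's variant schema.
    position: int
    alt: str
    quality: float           # hypothetical significance score from the caller
    frequency: float         # allele frequency among UMI reads (0 to 1)
    likely_artifact: bool    # flagged by some hypothetical artifact check
    homopolymer_indel: bool  # indel lying in a homopolymer run


def passes_filters(v, min_quality=20.0, min_frequency=0.005):
    """Apply each filter category in turn; thresholds are placeholders."""
    if v.quality < min_quality:
        return False  # not significant enough
    if v.likely_artifact:
        return False  # likely an artifact
    if v.homopolymer_indel:
        return False  # likely a homopolymer error
    if v.frequency < min_frequency:
        return False  # too infrequent
    return True


def filter_variants(variants, **thresholds):
    """Keep only the variants that pass every filter."""
    return [v for v in variants if passes_filters(v, **thresholds)]


if __name__ == "__main__":
    variants = [
        Variant(100, "T", 35.0, 0.010, False, False),  # kept
        Variant(200, "A", 10.0, 0.200, False, False),  # low quality
        Variant(300, "G", 40.0, 0.300, True, False),   # artifact
        Variant(400, "-", 40.0, 0.300, False, True),   # homopolymer indel
        Variant(500, "C", 40.0, 0.001, False, False),  # too infrequent
    ]
    print([v.position for v in filter_variants(variants)])  # [100]
```

Keeping each category as a separate test makes it easy to see why a given variant was removed and to adjust one threshold, such as the minimum frequency, without affecting the others.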