Trim Primers of Mapped Reads
The tool Trim Primers of Mapped Reads removes the primer parts of mapped reads, including RNA-seq mapped reads (except for primers that span an intron boundary), because these parts reflect the primer that was added and not the actual sample. Note that the tool also removes any insertion located right after the primer part. In addition, the tool removes artifacts caused by mispriming events, which happen when a primer binds to an off-target location and thus amplifies an off-target sequence.
The tool can be found in the Toolbox here:
Tools | QIAseq Panel Expert Tools | QIAseq DNA Panel Expert Tools | Trim Primers of Mapped Reads
In the first dialog (figure 3.40), select a read mapping.
Figure 3.40: Select a read mapping.
In the second dialog (figure 3.41), select the primer annotation track that was provided with the QIAseq DNA Panel. This track contains the original primers and their intended primer locations.
Figure 3.41: Select the primer annotation track specific to the panel, and add the parameters needed to deal with the type of reads you are working with.
This tool works on reads that potentially end in a primer (rather than start in a primer). The tool aims to unalign the primer parts of reads that came from that primer. It approximates this by only unaligning reads that end inside a primer. In addition, set the following parameters:
- Decide how many Additional bases to unalign right after the primer. This trimming is not applied to reads for which dimer artifacts were identified.
- Single reads. Reads that end up to 3 extra bases after the primer are unaligned, as set by the Maximal additional nucleotides parameter.
- Paired reads only. An aligned read is said to "hit" a primer if it starts within the span of the primer and overlaps the primer by at least 70% (the default value of the Minimal primer overlap fraction option). For reads that hit a primer, the part of the read that overlaps the primer is unaligned. Reads that do not hit a primer are either removed from or retained in the read mapping, depending on the Remove reads without primer option: if the option is checked, the tool removes these reads.
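The single-read and paired-read rules above can be sketched as two predicates. This is a minimal illustration, not the tool's implementation. It assumes 0-based, half-open reference coordinates, that a read's "start" and "end" are its leftmost and rightmost aligned positions, and that the overlap fraction is measured relative to the primer length. The names `Interval`, `single_read_ends_in_primer` and `paired_read_hits_primer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical reference interval [start, end), 0-based, end exclusive."""
    start: int
    end: int

    def overlap(self, other: "Interval") -> int:
        # Number of reference positions shared by the two intervals.
        return max(0, min(self.end, other.end) - max(self.start, other.start))


def single_read_ends_in_primer(read: Interval, primer: Interval,
                               max_additional: int = 3) -> bool:
    """Single reads: the last aligned base lies inside the primer, or at most
    max_additional bases past the primer end (assumed reading of the
    Maximal additional nucleotides parameter)."""
    return primer.start < read.end <= primer.end + max_additional


def paired_read_hits_primer(read: Interval, primer: Interval,
                            min_overlap_fraction: float = 0.7) -> bool:
    """Paired reads: the read starts within the primer span and covers at
    least min_overlap_fraction of the primer length (assumed denominator)."""
    starts_in_primer = primer.start <= read.start < primer.end
    fraction = read.overlap(primer) / (primer.end - primer.start)
    return starts_in_primer and fraction >= min_overlap_fraction
```

For a 20 bp primer at [100, 120), a paired read starting at position 102 covers 18 of the 20 primer bases (90%) and hits the primer, whereas a read starting at position 110 covers only 50% and does not.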
If one read in a UMI group runs past the primer it overlaps, this indicates that none of the reads in that group were created from that primer. In this case, the tool does not unalign any reads in this UMI group.
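The UMI group rule can be sketched as a filter over groups: a single read that runs past the primer disqualifies its whole group. This is an illustrative sketch only. It assumes each group is represented by the exclusive end coordinates of its reads overlapping one primer, and for simplicity it ignores any allowance such as Maximal additional nucleotides; the function name `umi_groups_to_trim` is hypothetical.

```python
from typing import Dict, List, Set


def umi_groups_to_trim(read_ends_by_umi: Dict[str, List[int]],
                       primer_end: int) -> Set[str]:
    """Return the UMIs whose reads may have their primer part unaligned.

    read_ends_by_umi maps a UMI to the exclusive end coordinates of the
    reads in that group that overlap one primer (hypothetical layout).
    If any read ends after primer_end, the whole group is left untouched.
    """
    return {
        umi
        for umi, ends in read_ends_by_umi.items()
        if all(end <= primer_end for end in ends)
    }
```

Working at the group level rather than per read reflects the idea that all reads sharing a UMI derive from the same original molecule, so one read running past the primer is evidence about the whole group.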
In the Remove mispriming artifacts dialog (figure 3.42), you can specify a Primer mispriming events track containing the predicted off-target priming locations of the original primers. Mispriming tracks are available for each TMB QIAseq panel in the Reference manager.
Figure 3.42: Select a mispriming track specific to the panel, and configure the associated parameters if needed.
In addition, set the following parameters:
- Mispriming artifacts removal. The tool unaligns the primer part of misprimed reads only if the part of the read overlapping the mispriming location has at least the Minimal primer sequence similarity fraction to the original primer sequence. For paired-end reads, the tool unaligns the primer part of misprimed reads only if the primer part overlaps the off-target primer by at least the Minimal primer overlap fraction.
- Pseudogene and gene family interference. You can specify a gene-pseudogene track containing information that links genes to their pseudogenes, so that reads that map well to pseudogene locations are removed.
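The similarity check used for mispriming artifact removal can be illustrated with a simple identity fraction. This is a sketch under a loud assumption: it uses a Hamming-style identity over equal-length sequences as a stand-in, because the similarity measure the tool actually applies is not described here. The function names and the `min_similarity` parameter (standing in for Minimal primer sequence similarity fraction) are hypothetical.

```python
def similarity_fraction(read_segment: str, primer_seq: str) -> float:
    """Fraction of identical positions between two equal-length sequences.

    Hamming-style identity, used only as an illustrative stand-in for the
    tool's own similarity measure (an assumption).
    """
    if not primer_seq or len(read_segment) != len(primer_seq):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(read_segment.upper(), primer_seq.upper()))
    return matches / len(primer_seq)


def unalign_misprimed_primer(read_segment: str, primer_seq: str,
                             min_similarity: float) -> bool:
    """Unalign the primer part only if the read segment at the mispriming
    location is similar enough to the original primer sequence."""
    return similarity_fraction(read_segment, primer_seq) >= min_similarity
```

With one mismatch over ten bases the identity is 0.9, so the primer part would be unaligned at a threshold of 0.8 but kept at 0.95.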
Finally, in the Post-filtering dialog (figure 3.43), you can choose to remove short reads with mismatches, as well as all reads shorter than a certain length.
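The two post-filters can be pictured as a single pass over the trimmed reads. In this sketch, the `AlignedRead` record and the parameters `min_length` and `short_read_length` are hypothetical. "Short reads with mismatches" is interpreted here, as an assumption, as reads below a length threshold that contain at least one mismatch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedRead:
    """Hypothetical record for a read after primer trimming."""
    name: str
    aligned_length: int  # bases still aligned after trimming
    mismatches: int      # mismatches within the aligned part


def post_filter(reads: List[AlignedRead], min_length: int,
                short_read_length: int) -> List[AlignedRead]:
    """Drop reads shorter than min_length, and drop reads shorter than
    short_read_length that contain any mismatch (assumed interpretation)."""
    kept = []
    for read in reads:
        if read.aligned_length < min_length:
            continue  # too short, regardless of mismatches
        if read.aligned_length < short_read_length and read.mismatches > 0:
            continue  # short read carrying mismatches
        kept.append(read)
    return kept
```

Filtering after trimming matters because unaligning primer parts can leave very short alignments, and a mismatch in a short alignment is more likely to reflect a misplaced read than a true variant.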
Figure 3.43: Post-filtering parameters.
The tool outputs a trimmed read mapping. In addition, as seen in figure 3.44, you can choose to output off-target primer and/or region tracks with counts.
If a mispriming track was specified in the earlier step, you can output two additional tracks, one containing primer mispriming statistics and one containing region mispriming statistics:
- The primer mispriming statistics track (figure 3.45) lists mispriming events and the number of reads unaligned because of each event during mispriming artifact removal. It contains:
  - Chromosome, Region and Name: the identified mispriming location.
  - Intended target chromosome and Intended region: the true location of the primer.
  - Unaligned primers: total number of reads whose primer sequence was unaligned because it matched a mispriming event location better than the original mapping location.
Figure 3.45: A primer mispriming track.
- The regions mispriming statistics track (figure 3.46) includes regions that are considered alternative mapping locations when attempting to remove mismapped reads. Primary target locations in this track are annotated with:
  - Name: the name of the primer.
  - Primary target: true or false, depending on whether the region describes the primer or a possible mispriming event.
  - Reads checked: total number of reads checked for better matches to a mispriming event region or to a position in the vicinity of the original mapping location.
  - Reads matching mispriming location: number of reads that, after the primer sequence had been unaligned, map with a better score to a mispriming event region than to the original location.
  - Reads matching target vicinity: number of reads that, after the primer sequence had been unaligned, have a better match to a position in the vicinity of the original mapping location.
Figure 3.46: A region mispriming track.
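The relationship between the three count columns of the region mispriming statistics track can be sketched as a per-region tally. Every checked read increments Reads checked, and at most one of the two "matching" columns. The outcome labels `"mispriming"`, `"vicinity"` and `"unchanged"` and the function name `region_statistics` are hypothetical; only the column names come from the track description.

```python
from typing import Dict, Iterable, Tuple

# Column names as they appear in the region mispriming statistics track.
COLUMNS = (
    "Reads checked",
    "Reads matching mispriming location",
    "Reads matching target vicinity",
)


def region_statistics(outcomes: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Tally per-region counts from (region_name, outcome) pairs.

    outcome is a hypothetical label: "mispriming" (better match to a
    mispriming event region), "vicinity" (better match near the original
    mapping location) or "unchanged" (original mapping kept).
    """
    stats: Dict[str, Dict[str, int]] = {}
    for region, result in outcomes:
        row = stats.setdefault(region, {column: 0 for column in COLUMNS})
        row["Reads checked"] += 1
        if result == "mispriming":
            row["Reads matching mispriming location"] += 1
        elif result == "vicinity":
            row["Reads matching target vicinity"] += 1
        elif result != "unchanged":
            raise ValueError(f"unknown outcome: {result!r}")
    return stats
```

Under this reading, Reads checked is always at least the sum of the two matching columns, with the difference being reads whose original mapping was kept.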