Trim Primers and their Dimers from Mapping

Trim Primers and their Dimers from Mapping unaligns the primer parts of reads in read mappings, and also unaligns parts of reads identified as primer dimerization artifacts. Unaligned regions are not considered by downstream tools such as variant callers, where it would be undesirable to consider primer regions.

Trim Primers and their Dimers from Mapping was designed for use with data generated using GeneRead™ DNAseq Targeted Panels V2, where target-specific primer pairs are used for multiplexed PCR-based target enrichment. We expect the tool to work with other targeted amplicon sequence data that employ target-specific primer pairs, but we have not tested it for that purpose. This tool is included in the QIAGEN GeneRead Panel Analysis ready-to-use workflow, where the relevant workflow element has been named "Trim Primers and their Dimers of Mapped Reads".

Primer trimming

Target primer locations need to be imported before using this tool. Importing descriptions of primer locations from a generic text format file or from a QIAGEN gene panel primer file is described in the Import Primer Pairs section of the CLC Genomics Workbench manual: http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Import_Primer_Pairs.html.

The fraction of the primer that must overlap with a read's aligned bases in order to record a primer hit is configurable.
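
As an illustration of how such an overlap fraction can be evaluated, the following Python sketch computes the share of a primer that is covered by a read's aligned bases and compares it to a threshold. It is not the tool's implementation: the function names, the 0-based half-open coordinates and the threshold values are assumptions made for this example.

```python
def primer_overlap_fraction(primer_start, primer_end, aligned_start, aligned_end):
    """Fraction of the primer covered by the read's aligned bases.

    All coordinates are 0-based, half-open reference positions
    (an assumption made for this sketch).
    """
    primer_length = primer_end - primer_start
    if primer_length <= 0:
        raise ValueError("primer interval must be non-empty")
    overlap = min(primer_end, aligned_end) - max(primer_start, aligned_start)
    return max(0, overlap) / primer_length


def is_primer_hit(primer, aligned, min_fraction):
    """Record a hit when the overlap fraction reaches the configured minimum.

    min_fraction is a hypothetical stand-in for the tool's configurable setting.
    """
    return primer_overlap_fraction(*primer, *aligned) >= min_fraction


# A 20 bp primer of which the last 10 bp are covered by aligned read bases.
print(primer_overlap_fraction(100, 120, 110, 200))
```

With a threshold of 0.5 this example would count as a primer hit, while a stricter threshold of 0.6 would not.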

By default, reads are retained in the mapping only if primer sequence is detected at the start of the read. This behavior can be turned off so that reads with no matching primer are kept.

For paired end data, if the primer found on R2 is not a member of the same target region primer pair as the primer found on R1, both members of the pair will be removed from the mapping.
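
The paired-end rule can be pictured with a small Python sketch. It only models the case described above, where a primer has been found on both R1 and R2; the Primer type, its target_id field and the function name are hypothetical and are not part of the tool.

```python
from typing import NamedTuple, Optional


class Primer(NamedTuple):
    name: str
    target_id: str  # hypothetical identifier of the target region primer pair


def keep_read_pair(r1_primer: Optional[Primer], r2_primer: Optional[Primer]) -> bool:
    """Return False when R1 and R2 carry primers from different target regions."""
    if r1_primer is None or r2_primer is None:
        # Reads without a detected primer are governed by other settings
        # of the tool and are not modeled in this sketch.
        return True
    return r1_primer.target_id == r2_primer.target_id


forward = Primer("T1_fwd", "T1")
reverse_same = Primer("T1_rev", "T1")
reverse_other = Primer("T2_rev", "T2")
print(keep_read_pair(forward, reverse_same), keep_read_pair(forward, reverse_other))
```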

Trim Primers and their Dimers from Mapping is strict regarding primer position. Primers are expected at the 5' end or the 3' end of the read, whether the end is aligned or unaligned. If there are any additional bases at the 5' or 3' end, the region will not be identified as primer sequence.

Primer dimer trimming

Two steps are involved in trimming primer dimerization artifacts:

  1. All primers are compared against all others to look for pairs likely to dimerize. The minimum number of overlapping bases, used to identify primers that may dimerize with each other, is configurable. A list is compiled containing, for each primer, all primers with potential to dimerize with it.

  2. If a primer p has been trimmed (unaligned), and the still-mapped section of the affected read starts with the sequence of a primer identified as dimerizing with p, it is assumed that the read contains primer-dimer artifact. This predicted dimerization artifact is then unaligned. If a read only contains primer-dimer artifact, the read is removed from the mapping and discarded.
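
The two steps can be sketched in Python as follows. This is a simplified illustration, not the algorithm used by the tool. In particular, it assumes that two primers may dimerize when their 3' ends are reverse complementary over at least min_overlap bases, and that a partner primer may appear in the read in either orientation. All function names are hypothetical.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def three_prime_overlap(a, b):
    """Longest 3' suffix of a that is reverse complementary to the 3' end of b."""
    rc_b = reverse_complement(b)
    for k in range(min(len(a), len(rc_b)), 0, -1):
        if a[-k:] == rc_b[:k]:
            return k
    return 0


def build_dimer_table(primers, min_overlap):
    """Step 1: for each primer name, collect the primers it may dimerize with."""
    table = {name: set() for name in primers}
    for (name_a, seq_a), (name_b, seq_b) in combinations(primers.items(), 2):
        if three_prime_overlap(seq_a, seq_b) >= min_overlap:
            table[name_a].add(name_b)
            table[name_b].add(name_a)
    return table


def dimer_artifact_length(remaining, trimmed_primer, primers, table):
    """Step 2: number of leading bases of the still-mapped read section to unalign.

    Returns 0 when the section does not start with a dimerizing partner.
    Checking both orientations of the partner is an assumption of this sketch.
    """
    for partner in sorted(table[trimmed_primer]):
        seq = primers[partner]
        for candidate in (seq, reverse_complement(seq)):
            if remaining.startswith(candidate):
                return len(candidate)
    return 0


primers = {"A": "TTTTGACC", "B": "AAAAGGTC", "C": "CCCCCCCC"}
table = build_dimer_table(primers, min_overlap=4)
print(table["A"], dimer_artifact_length("GACCTTTTAC", "A", primers, table))
```

When the returned length equals the length of the still-mapped section, the read consists only of primer-dimer artifact and would be removed from the mapping.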

The Trim Primers and their Dimers from Mapping tool also includes an option for trimming primers of amplicon fragments. This is particularly useful for trimming reads originating from short fragments. The Trim primers of amplicon fragments option allows trimming of both forward and reverse primers of reads in both directions. In cases where primers overlap, the innermost primer is used for trimming, irrespective of the read orientation and expected primer pairing. In other words, the primer ID is ignored when the Trim primers of amplicon fragments option is selected. This allows trimming of reads that end in a region with multiple overlapping primers, where it cannot be determined which primer the read originated from and, consequently, how much of the read end is primer sequence.
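
The effect of using the innermost primer can be illustrated with a short Python sketch. It assumes that "innermost" means the primer reaching furthest into the read from the end it covers, that a primer is considered when it covers a read end, and that positions are 0-based, half-open reference coordinates. The function name is hypothetical and the tool's actual logic may differ.

```python
def trim_amplicon_fragment(read_start, read_end, primers):
    """Trim both read ends using the innermost covering primer at each end.

    primers is an iterable of (start, end) intervals. Primer identity and
    strand are deliberately ignored, mirroring the description above.
    Returns the trimmed (start, end), or None if nothing remains aligned.
    """
    new_start, new_end = read_start, read_end
    for p_start, p_end in primers:
        if p_start <= read_start < p_end:
            # Primer covers the left read end: keep the one ending furthest in.
            new_start = max(new_start, p_end)
        if p_start < read_end <= p_end:
            # Primer covers the right read end: keep the one starting furthest in.
            new_end = min(new_end, p_start)
    if new_start >= new_end:
        return None
    return new_start, new_end


overlapping_primers = [(95, 120), (100, 125), (280, 300), (275, 305)]
print(trim_amplicon_fragment(100, 300, overlapping_primers))
```

In this example two primers overlap at each read end, and the aligned region is reduced to the part lying inside both innermost primers.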

Running the tool

To launch Trim Primers and their Dimers from Mapping, go to:

        Toolbox | Resequencing Analysis | Trim Primers and their Dimers from Mapping

In the first wizard step (figure 17.17), you are asked to select a read mapping. To analyze more than one read mapping, you can run the analysis in batch mode by ticking the "Batch" box in the lower left corner of the wizard. Running jobs in batch mode is described in the CLC Genomics Workbench manual: https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Standard_batch_processing.html.

Image trim_primers_and_their_dimers_of_mapped_reads_step1
Figure 17.17: Select files to import.

In the next wizard step (figure 17.18), settings for the tool are configured.

Image trim_primers_and_their_dimers_of_mapped_reads_step2
Figure 17.18: Select your primer location file and choose whether you want to keep or discard reads with no matching primers. The option Trim primers of amplicon fragments is useful when working with reads that originate from short fragments.

In the last wizard step, choose the results to save and click on Finish.

Output of Trim Primers and their Dimers from Mapping

The default output is a read mapping with primer and primer-dimer regions of the mapped reads unaligned. The name of the read mapping generated is based on the name of the mapping used as input, with "trimmed reads" appended. Optionally, a track containing the primer-dimer regions used for trimming the reads will also be generated. This track contains information about why each primer-dimer pair was predicted and the number of times it was used to partially trim a read or to remove a read from the mapping because it consisted only of primer-dimer sequence.