Identify Mispriming Events

Primers with high similarity to multiple genomic regions have the potential to be involved in mispriming, where reads are amplified from a region of the genome other than the intended target region. Reads resulting from such mispriming events are fused constructs: the primer part and the read part represent different regions of the genome.

A fraction of the population of a given primer may be involved in mispriming events, resulting in low frequency variants. How large this fraction is depends on the binding affinity and specificity of the primer, and the conditions that the lab work was performed under. Reads originating from mispriming events should be identified and removed from mappings to avoid calling false positive variants. If reads from mispriming events map to the region they originate from, and non-target regions of interest are known, then the primer can be unaligned, rather than removing the whole read.

Identify Mispriming Events generates a list of potential mispriming events for a set of panel primers and a specified reference genome. This list of mispriming events can then be supplied to Trim Primers of Mapped Reads to remove reads likely to represent a mispriming event, or to unalign primer parts of such reads, as relevant. This should precede variant detection, so as to minimize false positive variant calls due to artifacts generated from mispriming events.

Remove reads or unalign primer regions?

Trim Primers of Mapped Reads can handle misprimed reads when provided with a track of predicted mispriming events. Reads are either removed completely from the read mapping or have their primer region unaligned. This is done automatically during primer trimming to avoid calling false positive variants. The action needed depends on where reads resulting from a mispriming event are mapped:

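A given primer can be involved in mispriming events that produce reads mapping to the original target region (figure 6.3), reads mapping to the region they were amplified from (figure 6.4), or both. Reads of the first kind should be removed from the mapping. For reads of the second kind, only the primer region needs to be unaligned.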

Image misprimingevent1
Figure 6.3: An example of mispriming, where the reads map to the original intended target region. The two A variants and the single T variant, occurring in a non-primer part of the mapped reads, are consequences of mispriming. The reads supporting these variants should be removed from the mapping before variant detection is carried out. The reverse paired end reads (light blue) shown in the "Mapped reads" track were amplified from a mispriming binding site on chromosome 9 (not shown). While the primer had only 62% similarity with that site, the 3' primer end aligned perfectly, allowing it to anneal and reads to be generated. Most of these reads mapped to the original target region, shown here, due to the low similarity of the primer region and high overall similarity with the intended target region.

Image misprimingevent2
Figure 6.4: An example of mispriming, where the reads map to the non-target region they represent. The A to G variant, found in the primer part of these forward, paired end reads (dark blue), is a consequence of mispriming. This primer was designed for a different region, but had 95.24% similarity with the region shown. Thus some copies of the primer annealed to this region and generated reads with a single mismatch, as shown in the "Mapped reads" track. The primer part of these reads should be unaligned, but the remaining part of each read can still be used for variant calling, since it reflects DNA fragments from the genomic region it is mapped to.
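
The choice between the two actions can be summarized in a small conceptual sketch. This is not how Trim Primers of Mapped Reads is implemented: the function name, the `Action` values and the region labels are hypothetical, and in practice the region a read was amplified from is not known directly but is inferred from the predicted mispriming events. The fallback for reads that map to neither region is also an assumption of this sketch, as that case is not described here.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep read unchanged"
    REMOVE_READ = "remove read from the mapping"
    UNALIGN_PRIMER = "unalign primer region only"


def action_for_read(amplified_from: str, mapped_to: str, intended_target: str) -> Action:
    """Pick an action for a read, using hypothetical region labels."""
    if amplified_from == intended_target:
        # The primer bound where it was designed to bind: no mispriming.
        return Action.KEEP
    if mapped_to == intended_target:
        # Amplified elsewhere but mapped to the intended target (as in figure 6.3).
        return Action.REMOVE_READ
    if mapped_to == amplified_from:
        # Mapped to the region it really came from (as in figure 6.4).
        return Action.UNALIGN_PRIMER
    # Not covered by the description above; removed conservatively in this sketch.
    return Action.REMOVE_READ


# Labels below are made up for illustration.
print(action_for_read("chr9:site", "chr1:target", "chr1:target").value)
print(action_for_read("chr2:site", "chr2:site", "chr1:target").value)
```

The first call corresponds to the situation in figure 6.3 and the second to the situation in figure 6.4.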

How the Identify Mispriming Events tool works

Identify Mispriming Events takes this approach:

  1. A BLAST search is run using the primers as query sequences to search against a BLAST database of the relevant reference genome.
  2. The BLAST hits returned are filtered. For each primer, hits are kept if that sequence has a high enough similarity to the intended target region and few mismatches at the 3' end.
  3. The remaining BLAST hits for each primer are checked for their potential to cause mispriming artifacts of the two types mentioned above.
    • The sequence downstream of the intended target binding site is aligned to the sequence downstream of the mispriming site. If this pairwise alignment has a similarity fraction of at least 0.8, the BLAST hit is considered to be a mispriming event. The length of the sequence used for the alignment can be changed using the parameter Amplicon length (bp).
    • If a target region track is provided as input, and the mispriming region overlaps a target region, the BLAST hit is considered a mispriming event.
    BLAST hits that are unlikely to represent sources of false positive variant calls are discarded.
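
The filtering and classification steps above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the `BlastHit` class, the function names, and the values for `min_similarity`, `window`, `max_3p_mismatches` and `amplicon_length` are hypothetical, and a simple ungapped, position-by-position comparison stands in for the pairwise alignment. Only the 0.8 similarity fraction is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    primer: str                    # primer sequence, 5' to 3'
    site: str                      # genomic sequence at the hit, same length and orientation
    downstream: str                # genomic sequence downstream of the hit
    overlaps_target: bool = False  # True if the hit region overlaps a target region


def identity(a: str, b: str) -> float:
    """Fraction of matching positions in an ungapped comparison (a simplification)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def passes_filter(hit: BlastHit, min_similarity: float = 0.6,
                  window: int = 5, max_3p_mismatches: int = 1) -> bool:
    """Step 2: keep hits with enough similarity and few mismatches at the 3' end."""
    mismatches_3p = sum(x != y for x, y in zip(hit.primer[-window:], hit.site[-window:]))
    return identity(hit.primer, hit.site) >= min_similarity and mismatches_3p <= max_3p_mismatches


def is_mispriming_event(hit: BlastHit, target_downstream: str,
                        amplicon_length: int = 10, min_fraction: float = 0.8) -> bool:
    """Step 3: similar downstream sequence, or overlap with a target region."""
    similar = identity(target_downstream[:amplicon_length],
                       hit.downstream[:amplicon_length]) >= min_fraction
    return similar or hit.overlaps_target


# Toy data, made up for illustration.
target_downstream = "GGGGGGGGGG"
hits = [
    BlastHit("ACGTACGTAC", "ACGAACGTAC", "GGGGGGGGGT"),  # similar downstream sequence
    BlastHit("ACGTACGTAC", "ACGAACGTAC", "TTTTTTTTTT"),  # dissimilar downstream sequence
    BlastHit("ACGTACGTAC", "ACGTACGTTT", "GGGGGGGGGG"),  # two mismatches at the 3' end
]
events = [h for h in hits if passes_filter(h) and is_mispriming_event(h, target_downstream)]
print(len(events))
```

With these toy values only the first hit is reported. The second is discarded because its downstream sequence does not resemble the target amplicon, and the third because of its mismatches at the 3' end.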

Running the Identify Mispriming Events tool

To launch Identify Mispriming Events, go to:

        Toolbox | Biomedical Genomics Analysis | Biomedical Utility Tools | Identify Mispriming Events

In the first dialog, select a primer track as input (figure 6.5).

Image misprimingstep1
Figure 6.5: Select a primer track.

Settings related to the reference data are configured in the next wizard step (figure 6.6).

Image misprimingstep2
Figure 6.6: Reference data settings for the Identify Mispriming Events tool.

Tip: Use Create BLAST Database if you do not already have a BLAST database of your reference genome (see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Create_local_BLAST_databases.html). This tool takes a sequence list as input. If your reference genome is in a sequence track, use Convert from Tracks to convert it to a sequence list before running Create BLAST Database.

Specificity settings are configured in the next wizard step (figure 6.7).

Image misprimingstep3
Figure 6.7: The specificity settings of the Identify Mispriming Events tool. The default settings are a good starting point, but BLAST settings and/or Mispriming events filters can be adjusted to make the settings more relaxed or stringent.


