Extract Reads Matching Primers

Contamination of next-generation sequencing data is a common problem. When analyzing RNA sequencing data, some problems are caused by the presence of ribosomal RNA (rRNA) or genomic DNA (gDNA) in the sample. Likewise, RNA contamination can be a problem in DNA sequencing data.

Depletion of rRNA from RNA sequencing experiments is often performed using polyA enrichment, which retains RNA molecules with a polyA tail, a common feature of protein-coding transcripts, and in this way eliminates rRNA sequences. Although positive polyA selection is usually an efficient way to get rid of rRNA, polyA-rich rRNA sequences do exist, and this type of rRNA may remain in the sample.

A number of QIAseq RNA sequencing protocols can benefit from cleaning up the reads prior to mapping to remove contaminating rRNA reads. Using the panel primer sequences as anchors, so that only reads matching a primer sequence are kept, can eliminate potential rRNA sequences. This is of particular importance when detecting gene fusions, as polyA-rich rRNA sequences can produce false positive gene fusions. Removing these polyA-rich rRNA sequences not only increases the quality of the fusion calls but also decreases the run time, as fewer fusion events are formed and analyzed.

In multi-modal applications, contaminating RNA in the DNA samples can also lead to false positive variant calls because the read composition is of a different nature. InDels are especially problematic, but RNA editing can also introduce variants that reflect RNA editing rather than variation in the DNA.

To improve the sample purity, and thereby potentially decrease the number of false positive calls, it can be useful to remove reads that do not match any primers. The tool "Extract Reads Matching Primers" extracts reads that match a primer and discards reads that do not. The tool takes unmapped DNA or RNA sequencing reads as input. We recommend using the "Extract Reads Matching Primers" tool on the raw sequencing reads before analyzing the data.
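The filtering idea can be illustrated with a minimal Python sketch. It is not the tool's implementation: the tool works with a reference sequence and a primer track, whereas this sketch assumes, purely for illustration, that a read "matches" when it starts with one of the primer sequences exactly. The function names and the example sequences are hypothetical.

```python
from typing import Iterable, List, Optional


def matching_primer(read: str, primers: Iterable[str]) -> Optional[str]:
    """Return the first primer the read starts with, or None.

    Assumption: an exact prefix match stands in for the tool's
    matching criteria, which are not described here.
    """
    for primer in primers:
        if read.startswith(primer):
            return primer
    return None


def extract_reads_matching_primers(reads: Iterable[str],
                                   primers: Iterable[str]) -> List[str]:
    """Keep reads that match a primer; discard all other reads."""
    primer_list = list(primers)
    return [r for r in reads if matching_primer(r, primer_list) is not None]


# Hypothetical primers and reads, for illustration only.
primers = ["ACGTAC", "TTGGCA"]
reads = ["ACGTACGGGTTT", "CCCCCCCCCCCC", "TTGGCAAAAC"]
kept = extract_reads_matching_primers(reads, primers)
```

Here the second read starts with neither primer, so it would be discarded as a potential contaminant.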

To run the Extract Reads Matching Primers tool, go to:

        Toolbox | Biomedical Genomics Analysis | Biomedical Utility Tools | Extract Reads Matching Primers

After you have specified whether you want to run the job locally or connected to a server, you are asked to select sequencing reads (figure 6.1).

Image extractreadsmatchingprimers_step1
Figure 6.1: Select unmapped sequencing reads.

In the next dialog a number of different settings can be adjusted (figure 6.2).

Image extractreadsmatchingprimers_step2
Figure 6.2: In addition to selecting reference sequence and the relevant primer track, different settings can be adjusted in this dialog.

The settings you can specify or adjust are:

The output from the "Extract Reads Matching Primers" tool is a list of sequencing reads that match a primer. If the option "Ignore short reads" was selected, the list only contains reads whose length is at least the value specified under "Minimum read length excluding primer".
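The effect of the length setting on the output can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the tool's code: it assumes the read has already been matched to a primer and, going by the option name, that the length is measured on the part of the read that remains after the primer. The function and parameter names are hypothetical.

```python
def keep_read(read: str, primer: str,
              min_length_excluding_primer: int,
              ignore_short_reads: bool) -> bool:
    """Decide whether a primer-matched read ends up in the output.

    Assumption: "length excluding primer" is the read length minus
    the primer length.
    """
    if not ignore_short_reads:
        # Without the option, every primer-matched read is kept.
        return True
    return len(read) - len(primer) >= min_length_excluding_primer


# Hypothetical read of 12 bases with a 6-base primer: 6 bases remain.
read, primer = "ACGTACGGGTTT", "ACGTAC"
kept_at_5 = keep_read(read, primer, 5, ignore_short_reads=True)
kept_at_7 = keep_read(read, primer, 7, ignore_short_reads=True)
kept_without_option = keep_read(read, primer, 7, ignore_short_reads=False)
```

With a minimum of 5 the read is kept, with a minimum of 7 it is dropped, and with "Ignore short reads" deselected the length is not checked at all.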