Taxonomic profiling of unmapped reads

For RNA samples that contain more than 100,000 reads and in which less than 75% of the reads map, an additional taxonomic profiling analysis is performed to detect potential contamination from bacteria and archaea.
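
The trigger condition above can be written as a small check. This is only an illustrative sketch: the function name, argument names, and constants are hypothetical and are not part of the product; only the two thresholds (more than 100,000 reads, less than 75% mapped) come from the text.

```python
# Thresholds taken from the text; the names are made up for this sketch.
MIN_TOTAL_READS = 100_000      # sample must have MORE than this many reads
MAX_MAPPED_PERCENT = 75        # and LESS than this percentage of reads mapped


def needs_taxonomic_profiling(total_reads: int, mapped_reads: int) -> bool:
    """Return True if a sample meets both conditions for the extra analysis."""
    if total_reads <= MIN_TOTAL_READS:
        return False
    # Integer arithmetic avoids floating-point rounding at the 75% boundary.
    return mapped_reads * 100 < MAX_MAPPED_PERCENT * total_reads
```

Both comparisons are strict, so a sample with exactly 100,000 reads, or with exactly 75% of its reads mapped, would not be profiled under this reading of the text.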

The analysis uses the Taxonomic Profiling tool. It takes as input the reads left unmapped by the Align and count analysis and maps them to a reference database of complete archaeal and bacterial genomes. If a read maps to multiple genomes in the reference database, it is assigned to their lowest common ancestor.
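
To illustrate the lowest common ancestor rule, the sketch below treats each genome's taxonomy as a lineage listed from kingdom down to species and returns the deepest prefix shared by all lineages. It is not the Taxonomic Profiling tool's implementation; the function name and the two example lineages are assumptions chosen for illustration.

```python
from typing import Sequence, Tuple

# A lineage is an ordered tuple of names, from the highest rank to the lowest.
Lineage = Tuple[str, ...]


def lowest_common_ancestor(lineages: Sequence[Lineage]) -> Lineage:
    """Return the longest shared prefix of the given lineages."""
    if not lineages:
        return ()
    common = []
    # zip(*lineages) walks the ranks in parallel and stops at the shortest lineage.
    for names_at_rank in zip(*lineages):
        if all(name == names_at_rank[0] for name in names_at_rank):
            common.append(names_at_rank[0])
        else:
            break
    return tuple(common)


# Hypothetical example: a read that maps to genomes from two different genera.
ecoli = ("Bacteria", "Pseudomonadota", "Gammaproteobacteria",
         "Enterobacterales", "Enterobacteriaceae",
         "Escherichia", "Escherichia coli")
salmonella = ("Bacteria", "Pseudomonadota", "Gammaproteobacteria",
              "Enterobacterales", "Enterobacteriaceae",
              "Salmonella", "Salmonella enterica")

assigned = lowest_common_ancestor([ecoli, salmonella])
```

In this example the two lineages diverge at the genus level, so the read would be assigned to the family Enterobacteriaceae, the last element of `assigned`.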

Parameter settings and reference database for the Taxonomic Profiling analysis are independent of sample kit and reference:

Taxonomic Profiling
Reference index                  QMI-PTDB Genus (v2.0)
Filter host reads                No
Auto-detect paired distances     Yes
Minimum seed length              30
Adjust read count abundances     Yes

The Taxonomic Profiling tool is described in more detail in the CLC Microbial Genomics Module manual: https://resources.qiagenbioinformatics.com/manuals/clcmgm/2200/index.php?manual=Taxonomic_Profiling.html.

Taxonomic profiling reference index. The reference index QMI-PTDB Genus (v2.0) was obtained using the tool Download Curated Microbial Reference Database. In the previous version, the database (QMI-PTDB - Approx. 22GB (Jan2022)) was enriched with six additional species (Mycoplasma hominis, Mycoplasma fermentans, Mycoplasma salivarium, Mycoplasma arginini, Mycoplasma orale, Escherichia coli O157:H7). These are now included in the database as distributed, so no manual enrichment is needed. The index covers a total of 8626 species. See Table 1 for a taxonomic summary.


Table 1: Taxonomic summary of the QMI-PTDB Genus (v2.0) taxonomic profiling reference index.
Taxonomic level   Kingdom   Phylum   Class   Order   Family   Genus   Species
Classifications         2       64     140     374      872    4381      7794


The tools used for downloading curated and custom databases and for creating taxonomic indexes are described in the CLC Microbial Genomics Module manual at https://resources.qiagenbioinformatics.com/manuals/clcmgm/2200/index.php?manual=Databases_Taxonomic_Analysis.html.