QIAseq UPX 3' Transcriptome Kits
Quality control summary
This first summary table is a combination of the most important data points from the Quality control report. All the data can be seen in the context of related QC data below.
- Sample name: In all tables, the first column is the sample name; each row holds the data for one sample.
- Reads: The number of input reads in the sample data.
- Low numbers in any samples indicate failed library preps.
- Spillover into unused wells causes those wells to return small numbers of reads.
- Trimmed reads: Quality trimming is performed on the input reads.
- UMI Reads: Unique Molecular Indexes join similar reads into UMI reads. This allows for better quantification of the RNAs by eliminating any library amplification and sequencing bias.
- Avg Q score, UMI reads: the average quality score of the UMI reads.
- Numbers less than 30 indicate poor-quality library prep or instrument runs.
- Trimmed UMI reads: A second quality trimming is performed on the resulting UMI reads before mapping.
- Mapped: Percentage of UMI reads that mapped to the selected reference.
- If these percentages are low, make sure you selected the correct reference species. The selection of reference species is part of the Align and count process (step 2).
- Mapped to total rRNA: Percentage of reads mapped to ribosomal RNA.
- If rRNA was not depleted, percentages can vary widely from 10%-30%.
- If rRNA was depleted, percentages should be low, about or less than 1%.
- Samples not depleted of rRNA or samples with higher percentages can still be used for differential expression, but expression values such as TPM and RPKM may not be comparable to those of other samples. To troubleshoot the issues in future experiments, check for rRNA depletion prior to library preparation. Also, if an rRNA depletion kit was used, check that the kit matches the species being studied.
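The thresholds named in the list above can be applied to an exported summary table. The following is a minimal sketch and not part of the analysis software: the function name, the dictionary keys and the use of 1% as a hard cutoff for depleted samples are assumptions made for illustration.

```python
def flag_summary_row(row, rrna_depleted, max_rrna_pct=1.0):
    """Return a list of warnings for one row of the summary table.

    `row` is a hypothetical dict; the key names are illustrative and do not
    reflect any export format of the report.
    """
    warnings = []
    # An average Q score below 30 points to a poor-quality library prep
    # or instrument run.
    if row["avg_q_umi_reads"] < 30:
        warnings.append("low average Q score")
    # After rRNA depletion, about 1% or less is expected (cutoff assumed).
    if rrna_depleted and row["pct_mapped_rrna"] > max_rrna_pct:
        warnings.append("rRNA higher than expected after depletion")
    return warnings


sample = {"avg_q_umi_reads": 28.5, "pct_mapped_rrna": 4.2}
print(flag_summary_row(sample, rrna_depleted=True))
```

Read counts and mapping percentages are left out because the text gives no numeric cutoff for them.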
Trimming, raw Reads
These numbers highlight the process of quality trimming, showing how many reads were used as input, how many reads remained after trimming, and the average read length before and after trimming. A large drop in the number of reads and in average read length is an indication of poor-quality reads.
- Reads before trim: This is also shown in the 'Reads' column in the summary table.
- Avg length before trim
- Reads after trim: This is also shown in the 'Trimmed reads' column in the summary table.
- Avg length after trim
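To compare samples, the four values above can be expressed as percentage losses. This is a minimal sketch with a hypothetical helper function; the report defines no cutoff for a "large drop", so none is applied here.

```python
def trimming_loss(reads_before, reads_after, len_before, len_after):
    """Return (percent of reads lost, percent of average length lost)."""
    if reads_before <= 0 or len_before <= 0:
        raise ValueError("values before trimming must be positive")
    pct_reads_lost = 100.0 * (reads_before - reads_after) / reads_before
    pct_length_lost = 100.0 * (len_before - len_after) / len_before
    return pct_reads_lost, pct_length_lost


# Illustrative numbers only, not taken from a real run.
print(trimming_loss(1000, 950, 100.0, 90.0))
```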
Creation of UMI reads
Detailed QC from the process of creating the UMI reads as single consensus reads, from reads that have the same Unique Molecular Index.
- Read pairs and single reads annotated with UMIs: the percentage of reads that were annotated with UMIs.
- Input Reads: The number of reads in the sample - after trimming.
- Avg Q score, input reads: The quality score of the input data.
- Numbers less than 30 indicate poor-quality library preps or instrument runs.
- Detected barcode length: The sequence length of the UMI barcode is defined in the UMI protocol. Use this number as a check to see if the analysis detected the expected length.
- UMIs: The number of UMI reads in the sample after input reads are merged into UMI reads.
- Avg reads per UMI: The average number of input reads that contributed to each UMI read.
- Should be greater than one. For most applications, the ideal UMI group size will be around 2-4.
- UMIs with more than 10 reads (Pct of UMIs) (Pct of reads): These two columns highlight the extreme end of the UMI grouping distribution and should be seen as indications of potential problems in sequencing or library prep. For most applications, the ideal merged UMI group size will be around 2-4 reads. Larger UMI groups tend to have diminishing returns for the increased sequencing budget.
- A very low percentage is preferable. If the percentage is higher than 5, or there is large variation between samples, a reevaluation of the sample prep may be needed.
- Max reads per UMI: The size of the largest UMI group. Like the previous columns, it highlights the extreme end of the UMI grouping distribution and potential problems in sequencing.
- Avg Q Score: UMI reads: The quality score of the resulting UMI reads. This average Q score should be higher than the average Q score for the input reads.
- Numbers less than 30 indicate poor-quality library preps or instrument runs.
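The grouping columns above can be reproduced from a list of per-read UMIs. This is a simplified sketch: it groups reads by exact UMI string only, whereas the actual pipeline joins similar reads, so the function name and the grouping rule are assumptions for illustration.

```python
from collections import Counter


def umi_group_stats(umis_per_read, large_group=10):
    """Summarize UMI group sizes, given one UMI string per input read."""
    group_sizes = Counter(umis_per_read)  # UMI -> number of reads
    if not group_sizes:
        raise ValueError("no reads given")
    n_umis = len(group_sizes)
    n_reads = sum(group_sizes.values())
    # Groups with more than `large_group` reads form the extreme end.
    large = [size for size in group_sizes.values() if size > large_group]
    return {
        "umis": n_umis,
        "avg_reads_per_umi": n_reads / n_umis,
        "max_reads_per_umi": max(group_sizes.values()),
        "pct_umis_large": 100.0 * len(large) / n_umis,
        "pct_reads_large": 100.0 * sum(large) / n_reads,
    }


# Toy input: four UMIs with 3, 2, 12 and 3 reads.
reads = ["AAA"] * 3 + ["CCC"] * 2 + ["GGG"] * 12 + ["TTT"] * 3
print(umi_group_stats(reads))
```

In this toy input a single oversized group holds most of the reads, which is the pattern the "more than 10 reads" columns are meant to expose.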
Trimming, UMI reads
These numbers highlight the process of quality trimming of the merged UMI reads, showing how many reads were used as input, how many reads remained after trimming, and the average read length before and after trimming. A large drop in the number of reads and in average read length is an indication of poor-quality UMI reads. Trimming is performed on the UMI reads even though they are merged from reads that have already been quality trimmed. Very few reads should be trimmed at this step; if a considerable number are trimmed, it could indicate a problem in the UMI process or further upstream in the pipeline.
- UMI reads before trim: The number of UMI reads output by the Creation of UMI reads process mentioned above.
- Avg length before trim
- UMI reads after trim
- Avg length after trim
Spike-ins quality control
This section appears when the Spike-ins option was checked in the Align and Count dialog when the sample analysis was started.
- Number of spike-ins detected: The number of spike-ins detected relative to the spike-ins used.
- R2: Correlation of expected and sequenced spike-ins, using the Pearson correlation coefficient.
- When samples have a poor correlation (R2 < 0.8) between known and measured spike-in concentrations, it indicates problems with the spike-in protocol or a more serious problem with the sample.
- Reads mapped to spike-ins: The number of reads that mapped to the detected spike-ins.
- If fewer than 10,000 reads mapped to spike-ins, consider using more spike-in mix in future experiments.
- Lower limit of detection (attomoles/ul): Spike-ins concentration measurement. The lower limit of detection is the lowest concentration spike-in to which at least 3 reads map. This provides a rough estimate of the minimal concentration of mRNA that can be detected in this sample.
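Two of the metrics above can be sketched in a few lines. The code below is illustrative only: it assumes that R2 is the square of the Pearson coefficient computed on untransformed values (the report does not say whether values are log-transformed), and the function names and input layout are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def lower_limit_of_detection(spike_ins, min_reads=3):
    """Lowest concentration among spike-ins with at least `min_reads` reads.

    `spike_ins` is a list of (concentration, mapped_reads) tuples.
    Returns None when no spike-in reaches the read threshold.
    """
    detected = [conc for conc, reads in spike_ins if reads >= min_reads]
    return min(detected) if detected else None


# Illustrative values: (known concentration in attomoles/ul, mapped reads).
spike_ins = [(0.5, 1), (2.0, 4), (8.0, 15), (32.0, 70)]
r2 = pearson_r([c for c, _ in spike_ins], [r for _, r in spike_ins]) ** 2
print(r2, lower_limit_of_detection(spike_ins))
```

With these inputs the lowest spike-in has only one mapped read, so it does not count as detected and the limit moves up to the next concentration.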
Mapping statistics
This describes how the UMI reads were used in the mapping step.
- UMI reads: The number of UMI reads in each sample, after the trimming step mentioned above.
- Paired (yes/no): Indicates whether the input reads are paired. This should fit the applied protocol.
- Reads mapped: The percentage of the UMI reads that were mapped to the selected reference. Excludes both reads that were ignored due to wrong strand and reads that could not be mapped to the reference.
- Strand-specific setting: Read direction of UMI reads. This should fit the applied protocol.
- Forward % of reads mapped: Percent of UMI reads mapped in the forward direction.
- Reverse % of reads mapped: Percent of UMI reads mapped in the reverse direction.
- Ignored reads % (wrong strand): The percentage of UMI reads that were ignored because they did not match the strand orientation specified by the protocol.
- If percentages are greater than 20-25%, then the wrong strand protocol may have been used in library prep.
Mapped by Type
This section describes the relative mapping of the UMI reads in terms of the type of target.
- Mapped to gene: Percentage of UMI reads that map to genes.
- Mapped to gene, intron: Percentage of UMI reads that mapped partly or entirely within an intron.
- Mapped to gene, exon: Percentage of UMI reads that mapped entirely within an exon or in an exon-exon junction.
- Mapped to intergenic region: Percentage of UMI reads that mapped partly or entirely between genes.
Biotype Distribution
Details of the various biotype detection levels in each sample. The content of the plot and table depends on the result of the analysis and may vary between pipelines and sample batches. Both the plot and the table show which biotypes are found in the samples and at what relative abundance in each sample. The names and classification of the biotypes are based on the Ensembl definitions found here: http://www.ensembl.org/info/genome/genebuild/biotypes.html
Taxonomic profile of unmapped reads
Taxonomic profiling is performed for samples with a high level of unmapped reads as this can indicate sample contamination. If all samples have low levels of unmapped reads, this section will be empty.
Plot and table show the relative abundance at phylum level. Reads that map equally well to two or more phyla are assigned to the common ancestor (kingdom level).
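The rule for ambiguous reads can be pictured as taking the deepest rank shared by all candidate lineages. This is a sketch of the general idea only, with a hypothetical function and lineages written as (kingdom, phylum) tuples; it is not the profiler's actual implementation.

```python
def common_ancestor(lineages):
    """Return the longest shared prefix of several taxonomic lineages."""
    shared = []
    # Walk the ranks from the top and stop at the first disagreement.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return tuple(shared)


# A read matching two phyla equally well is reported at kingdom level.
hits = [("Bacteria", "Proteobacteria"), ("Bacteria", "Firmicutes")]
print(common_ancestor(hits))
```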
Taxonomic profiling summary
Information about which taxonomic levels were found in the data sample and how many different taxa were found on each level.
- Kingdom
- Phylum
- Total reads: The number of reads that were not mapped to the reference.
- Classified reads: The number of reads that mapped to the taxonomic profiling database.
- Unclassified reads: The number of reads that mapped to neither the reference nor the taxonomic profiling database. These are reads of unknown origin. If this number constitutes a significant portion of the input reads, the likely cause is that the wrong reference was selected in the Align and count setup for the sample creation and analysis. A new Align and count analysis will have to be initiated.