Report from LightSpeed Fastq to Variants tools
The report from the LightSpeed variant calling tools provides information about each step that has been enabled in a given analysis. In the following, each section in the report is described.
The following terms are used in many sections of the report:
- Specific read pairs Read pairs where one best match for mapping was identified.
- Non-specific read pairs Read pairs that map equally well to more than one genomic position.
- Proper read pairs Read pairs where the distance between read 1 and read 2 is within the expected range for a pair.
- Broken read pairs Read pairs where the distance between read 1 and read 2 exceeds the expected distance for a read pair, and read pairs where only one of the reads was mapped.
Summary
- Input read pairs Total number of read pairs in the fastq files.
- Read pairs discarded by quality trimming Read pairs that were discarded because, after quality trimming, they were shorter than the length specified in the option "Minimum read length after quality trim".
- Read pairs trimmed by quality trimming Read pairs that have been trimmed and are longer than "Minimum read length after quality trim".
- Read pairs discarded by adapter trimming Read pairs that were discarded because, after adapter trimming, they were shorter than the length specified in the option "Minimum read length after adapter trim".
- Read pairs trimmed by adapter trimming Read pairs that have been trimmed and are longer than "Minimum read length after adapter trim".
- Average read length before trimming Average length of reads in input.
- Average read length after trimming Average length of reads after quality trimming and adapter trimming.
- Read pairs remaining after trimming Read pairs remaining after quality and adapter trimming. These are the read pairs that are mapped.
- Unmapped read pairs Read pairs that did not map to the reference.
- Mapped read pairs The total number of mapped read pairs including specific, non-specific and broken pairs.
- Proper read pairs Read pairs that are mapped as pairs. The percentage is calculated relative to "Mapped read pairs".
- Broken read pairs Mapped read pairs where the distance between the individual reads in the pair exceeded the expected distance for paired reads, or where only one of the reads in the pair was mapped. The percentage is calculated relative to "Mapped read pairs".
- Specific proper read pairs Read pairs that are mapped as pairs and are specific. The percentage is calculated relative to "Mapped read pairs".
- Non-specific proper read pairs Read pairs that are mapped as pairs, but are non-specific. The percentage is calculated relative to "Mapped read pairs".
- Average insert size Average insert size calculated from specific proper read pairs.
- Median insert size Median insert size calculated from specific proper read pairs.
- Read pairs after deduplication Read pairs remaining after deduplication.
- UMI read pairs Read pairs after UMI grouping.
- Singleton UMI read pairs UMI read pairs generated from only one read pair. The percentage is calculated relative to "UMI read pairs".
- Simplex UMI read pairs UMI read pairs where input reads all originate from the same strand. Singleton UMI read pairs are a subset of the simplex UMI read pairs. The percentage is calculated relative to "UMI read pairs".
- Duplex UMI read pairs UMI read pairs that are based on input reads from both strands. The percentage is calculated relative to "UMI read pairs".
- Average number of reads per UMI The average number of read pairs per UMI read pair.
- Median number of reads per UMI The median number of read pairs per UMI read pair.
- Average number of reads per duplex UMI The average number of read pairs per duplex UMI read pair.
- Median number of reads per duplex UMI The median number of read pairs per duplex UMI read pair.
- Read pairs after primer trimming Read pairs remaining after primer trimming.
Input read QC
This section contains information about the input reads before quality and adapter trimming. Full descriptions of the per-sequence plots are available at http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Per_sequence_analysis.html and full descriptions of the per-base plots are available at http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Per_base_analysis.html.
Per-sequence analysis
- Lengths distribution Plot showing the distribution of R1 and R2 read lengths.
- GC-content Plot showing the distribution of GC-content in R1 and R2 reads.
- Ambiguous base-content Plot showing the distribution of ambiguous base-content in R1 and R2 reads.
- Quality distribution Plot showing the distribution of average quality per read for R1 and R2 reads.
- Reads passing average quality thresholds Table providing the percentage of R1 and R2 reads with average quality above 25, 30 and 35.
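The "Reads passing average quality thresholds" table can be illustrated with a minimal sketch. The function name, the input values, and the strict "greater than" comparison are assumptions made for illustration; the report itself only states that percentages are given for average quality above 25, 30 and 35.

```python
def percent_passing(avg_qualities, thresholds=(25, 30, 35)):
    """Return {threshold: percentage of reads whose average quality is above it}."""
    total = len(avg_qualities)
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {
        t: 100.0 * sum(1 for q in avg_qualities if q > t) / total
        for t in thresholds
    }


# Hypothetical per-read average qualities for R1 (not taken from a real report)
r1_avg_quality = [36.2, 31.5, 28.0, 24.9, 33.3]
result = percent_passing(r1_avg_quality)
print(result)
```

The same calculation would be repeated for the R2 reads to fill in the second row of the table.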
Per-base analysis
- Coverage Plot showing the coverage for individual base positions of R1 and R2.
- Nucleotide contributions Plots for R1 and R2 showing the nucleotide contributions per position.
- GC-content Plot showing the GC content in R1 and R2 reads per position.
- Ambiguous base-content Plot showing the ambiguous base-content in R1 and R2 reads per position.
- Quality distribution Plots for R1 and R2 individually and combined showing the quality per position.
Quality trimming
- Input read pairs Total number of read pairs in the fastq files.
- Read pairs discarded by quality trimming Read pairs that were discarded because, after quality trimming, they were shorter than the length specified in the option "Minimum read length after quality trim".
- Read pairs trimmed by quality trimming Read pairs that have been trimmed and are longer than "Minimum read length after quality trim".
- R1 reads trimmed by quality trimming Number of R1 reads trimmed by quality trimming.
- R2 reads trimmed by quality trimming Number of R2 reads trimmed by quality trimming.
- Average read length before quality trimming Average read length of the raw reads in the fastq files.
- Average read length after quality trimming Average read length after quality trimming. This read length may be longer than "Average read length before quality trimming" because short reads may have been discarded.
The plot Read lengths of quality trimmed reads before / after trimming shows the length and number of reads that were quality trimmed before and after trimming (figure 3.19).
Figure 3.19: The number and length of quality trimmed reads before and after quality trimming.
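The observation that the average read length can increase after trimming may seem counterintuitive, so here is a small worked sketch. The read lengths, the minimum length of 30, and the helper names are hypothetical, and for simplicity the sketch works on individual reads rather than read pairs.

```python
def average(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def keep_long_enough(trimmed_lengths, min_length):
    """Discard reads that are shorter than min_length after trimming."""
    return [n for n in trimmed_lengths if n >= min_length]


# Hypothetical read lengths before and after quality trimming
lengths_before = [150, 150, 150, 40]
lengths_trimmed = [150, 148, 150, 20]  # the last read lost 20 low-quality bases

lengths_after = keep_long_enough(lengths_trimmed, min_length=30)

avg_before = average(lengths_before)
avg_after = average(lengths_after)
print(f"before: {avg_before:.1f}, after: {avg_after:.1f}")
```

Because the one short read falls below the minimum length and is discarded, it no longer pulls the average down, and the average after trimming ends up higher than the average before trimming.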
Adapter trimming
- Input read pairs Total number of read pairs in the fastq files.
- Read pairs discarded by adapter trimming Read pairs that were discarded because, after adapter trimming, they were shorter than the length specified in the option "Minimum read length after adapter trim".
- Read pairs trimmed by adapter trimming Read pairs that have been trimmed and are longer than "Minimum read length after adapter trim".
- R1 reads trimmed by adapter trimming Trimmed R1 reads that are longer than "Minimum read length after adapter trim".
- R2 reads trimmed by adapter trimming Trimmed R2 reads that are longer than "Minimum read length after adapter trim".
- Average read length before adapter trimming Average length of the reads before adapter trimming. If quality trimming was enabled, the average read length after quality trimming is given.
- Average read length after adapter trimming Average read length after adapter trimming. This read length may be longer than "Average read length before adapter trimming" because short reads may have been discarded.
- Detected R1 adapter The consensus sequence of bases removed from R1 reads.
- Detected R2 adapter The consensus sequence of bases removed from R2 reads.
The plot Read lengths of adapter trimmed reads before / after trimming shows the number of reads as a function of read length before and after adapter trimming (figure 3.20).
Figure 3.20: The number of reads as a function of read length before and after adapter trimming.
The plot Lengths of trimmed adapters shows the number and lengths of trimmed adapter sequences (figure 3.21).
Figure 3.21: The number and length of trimmed adapter sequences.
Mapping statistics
- References The number of sequences in the reference genome.
- Input read pairs Total number of read pairs in the fastq files.
- Read pairs remaining after trimming The number of read pairs left after trimming.
- Unmapped read pairs The number of read pairs that could not be mapped to the reference.
- Mapped read pairs The number of mapped read pairs including specific, non-specific and broken read pairs.
- Mapped proper read pairs Read pairs that are mapped as pairs. The percentage is calculated relative to "Mapped read pairs".
- Mapped broken read pairs Mapped read pairs where the distance between the individual reads in the pair exceeded the expected distance for paired reads, or where only one of the reads in the pair was mapped. The percentage is calculated relative to "Mapped read pairs".
- Mapped specific proper read pairs Read pairs that are mapped as pairs and are specific. The percentage is calculated relative to "Mapped read pairs".
- Mapped non-specific proper read pairs Read pairs that are mapped as pairs, but are non-specific. The percentage is calculated relative to "Mapped read pairs".
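As an illustration of how the percentages in this section relate to "Mapped read pairs", the following sketch uses hypothetical counts. It assumes that every mapped pair is either proper or broken, and that every proper pair is either specific or non-specific; the variable names are not taken from the report.

```python
def percent_of(count, total):
    """Percentage of total represented by count, or 0.0 if total is zero."""
    return 100.0 * count / total if total else 0.0


# Hypothetical counts, chosen only for illustration
mapped_read_pairs = 1000
proper = 950
broken = 50
specific_proper = 900
non_specific_proper = 50

# All four percentages use "Mapped read pairs" as the denominator
percentages = {
    "Mapped proper read pairs": percent_of(proper, mapped_read_pairs),
    "Mapped broken read pairs": percent_of(broken, mapped_read_pairs),
    "Mapped specific proper read pairs": percent_of(specific_proper, mapped_read_pairs),
    "Mapped non-specific proper read pairs": percent_of(non_specific_proper, mapped_read_pairs),
}

for name, value in percentages.items():
    print(f"{name}: {value:.1f}%")
```

Note that the specific and non-specific proper percentages are relative to all mapped read pairs, not to the proper read pairs, so under these assumptions they sum to the proper percentage rather than to 100%.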
Insert size distribution
Plot showing the distribution of insert sizes in specific proper read pairs. The insert is defined as the distance between the 5' ends of R1 and R2. If reads are quality trimmed from the 5' end or are trimmed for UMI and common sequence, the removed bases are not included when calculating the insert size.
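The insert size definition can be sketched as follows. The `MappedRead` class and the coordinate convention (0-based start, exclusive end) are assumptions made for this example and do not describe how the tool stores alignments. Because the coordinates are those of the mapped, already trimmed reads, bases removed from the 5' end are not counted.

```python
from dataclasses import dataclass


@dataclass
class MappedRead:
    start: int     # leftmost reference position, 0-based (assumed convention)
    end: int       # one past the rightmost reference position
    forward: bool  # True if the read maps to the forward strand

    @property
    def five_prime(self) -> int:
        """Reference coordinate of the read's 5' end."""
        return self.start if self.forward else self.end


def insert_size(r1: MappedRead, r2: MappedRead) -> int:
    """Distance between the 5' ends of the two reads in a pair."""
    return abs(r1.five_prime - r2.five_prime)


# Hypothetical proper pair: R1 on the forward strand, R2 on the reverse strand
r1 = MappedRead(start=1000, end=1100, forward=True)
r2 = MappedRead(start=1250, end=1350, forward=False)
size = insert_size(r1, r2)
print(size)
```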
Deduplication
- Mapped read pairs The number of mapped read pairs including specific, non-specific and broken read pairs before deduplication.
- Duplicate read pairs Read pairs considered PCR duplicates.
- Read pairs after deduplication The number of mapped read pairs remaining after deduplication, including specific, non-specific and broken read pairs.
- Proper read pairs after deduplication Read pairs that are mapped as pairs. The percentage is calculated relative to "Read pairs after deduplication".
- Broken read pairs after deduplication Mapped read pairs where the distance between the individual reads in the pair exceeded the expected distance for paired reads, or where only one of the reads in the pair was mapped. The percentage is calculated relative to "Read pairs after deduplication".
- Specific proper read pairs after deduplication Read pairs that are mapped as pairs and are specific. The percentage is calculated relative to "Read pairs after deduplication".
- Non-specific proper read pairs after deduplication Read pairs that are mapped as pairs, but are non-specific. The percentage is calculated relative to "Read pairs after deduplication".
UMI
- Mapped read pairs The number of mapped read pairs including specific, non-specific and broken read pairs before UMI grouping.
- UMI read pairs Read pairs after UMI grouping.
- Singleton UMI read pairs UMI read pairs generated from only one read pair. The percentage is calculated relative to "UMI read pairs".
- Simplex UMI read pairs UMI read pairs where input reads all originate from the same strand. Singleton UMI read pairs are a subset of the simplex UMI read pairs. The percentage is calculated relative to "UMI read pairs".
- Duplex UMI read pairs UMI read pairs that are based on input reads from both strands. The percentage is calculated relative to "UMI read pairs".
- Proper UMI read pairs UMI read pairs that are mapped as pairs. The percentage is calculated relative to "UMI read pairs".
- Broken UMI read pairs UMI read pairs where the distance between the individual reads in the pair exceeded the expected distance for paired reads, or where only one of the reads in the pair was mapped. The percentage is calculated relative to "UMI read pairs".
- Specific proper UMI read pairs UMI read pairs that are mapped as pairs and are specific. The percentage is calculated relative to "UMI read pairs".
- Non-specific proper UMI read pairs UMI read pairs that are mapped as pairs, but are non-specific. The percentage is calculated relative to "UMI read pairs".
- Average number of reads per UMI The average number of read pairs per UMI read pair.
- Median number of reads per UMI The median number of read pairs per UMI read pair.
- Average number of reads per duplex UMI The average number of read pairs per duplex UMI read pair.
- Median number of reads per duplex UMI The median number of read pairs per duplex UMI read pair.
When calculating average and median number of read pairs per UMI, broken pairs are not included.
The plot Reads by group size shows the number of input reads distributed by the size of the UMI groups that they have been grouped to.
The plot Groups by group size shows the number of UMI groups distributed by the UMI group sizes.
The plot Reads by group size (duplex) shows the number of input reads distributed by the size of the duplex UMI groups that they have been grouped to. If reads are not assigned to a duplex UMI group, they are not represented in this plot.
The plot Groups by group size (duplex) shows the number of duplex UMI groups distributed by the duplex UMI group sizes.
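The difference between the "Reads by group size" and "Groups by group size" plots can be illustrated with a short sketch. The list of group sizes is hypothetical, and the sketch is only meant to show how the same UMI groups give two different distributions.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical UMI groups: each number is the count of input reads in one group
group_sizes = [1, 1, 2, 3, 3, 3]

# "Groups by group size": how many groups have each size
groups_by_size = dict(Counter(group_sizes))

# "Reads by group size": how many input reads belong to groups of each size
reads_by_size = {size: size * count for size, count in groups_by_size.items()}

# Average and median number of reads per UMI group
avg_reads_per_umi = mean(group_sizes)
median_reads_per_umi = median(group_sizes)

print(groups_by_size)
print(reads_by_size)
```

In this example there are as many groups of size 1 as there are reads in them, while the three groups of size 3 account for nine reads, which is why the two plots can have quite different shapes for the same data.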
Realignment
- Realigned regions The number of regions that were subjected to local realignment, e.g. regions with long unaligned ends.
- Combined length of realigned regions The combined length of the locally realigned regions.
- Reassembled regions The number of regions that were subjected to reassembly, e.g. regions with significant unaligned end breakpoints.
- Combined length of reassembled regions The combined length of the reassembled regions.
Primer trimming
- Mapped read pairs The number of mapped read pairs including specific, non-specific and broken read pairs before primer trimming.
- Discard read pairs without primer The setting of the option used when trimming primer sequences; either Yes or No.
- Read pairs without primers Read pairs that could not be assigned to a primer. If "Discard read pairs without primer" is set to Yes, these read pairs are discarded.
- Primer not found Read pairs not overlapping a primer.
- Inside alignment Read pairs overlapping a primer, but the primer is inside the alignment, not at the end.
- Not enough overlap Read pairs overlapping a primer, but the overlap is less than required as defined in the wizard step "Minimum primer overlap (%)".
- Read too short Read pairs where the remaining sequence of read 1 or read 2 is shorter than the threshold defined in the wizard step "Minimum read length after primer trim".
- Too many read mismatches Read pairs with at least 2 mismatches between the overlapping parts of the read and the primer.
- No primers on chromosome Read pairs mapped to chromosomes where there are no primers.
- Read pairs after primer trimming Read pairs remaining after primer trimming.
- Proper read pairs after primer trimming Read pairs that are mapped as pairs. The percentage is calculated relative to "Read pairs after primer trimming".
- Broken read pairs after primer trimming Mapped read pairs where the distance between the individual reads in the pair exceeded the expected distance for paired reads, or where only one of the reads in the pair was mapped. The percentage is calculated relative to "Read pairs after primer trimming".
- Specific proper read pairs after primer trimming Read pairs that are mapped as pairs and are specific. The percentage is calculated relative to "Read pairs after primer trimming".
- Non-specific proper read pairs after primer trimming Read pairs that are mapped as pairs, but are non-specific. The percentage is calculated relative to "Read pairs after primer trimming".
Variant detection
- Ignored bases due to complex regions The number of bases where it was not possible to detect variants due to high complexity of the region.
- Ignored intervals due to complex regions The number of intervals where it was not possible to detect variants due to high complexity of the region.