Somatic variant detection
Based on the read mapping, somatic variants are identified at positions where the read alignment supports a significant difference to the reference genome.
This is achieved by assessing significance relative to the global error rate, supplemented by an assessment relative to the local error rate, which is estimated from the data in the vicinity of the variant. In addition, variants are assessed for significant strand imbalance, and variants in low complexity contexts receive an additional significance assessment.
In contrast to the germline variant caller (Germline variant detection), the somatic variant caller makes no assumptions about the ploidy of a sample, and thus allows for sensitive detection of variant alleles at any frequency, including low frequencies.
Variant types
LightSpeed Fastq to Somatic Variants reports SNPs, MNVs, InDels and replacements, provided that the variants are contained within at least one paired-end read.
Variant annotations
Variants identified by LightSpeed Fastq to Somatic Variants are annotated with the following basic information: Chromosome, Region, Type, Reference, Allele, Reference allele, Length, Zygosity, Count, Coverage, Frequency, Forward read count, Reverse read count, Forward read coverage, Reverse read coverage, Forward/reverse balance and Genotype.
Read more about these general variant annotations here: https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Variant_tracks.html.
In addition, the following LightSpeed specific annotations are available:
- Average quality The average base quality score of the bases supporting a variant. The average quality score is calculated by adding the Q scores of the nucleotides supporting the variant and dividing this sum by the number of nucleotides supporting the variant. For deletions, the average quality score reported is the lower of the average qualities of the two bases neighboring the deletion. For insertions, the average quality is calculated for each of the inserted bases in the reads supporting the insertion, and the minimum of the average base qualities is reported. Average quality is only calculated for non-reference alleles; for reference alleles, no average quality is reported.
- p-value - global error rate The p-value from a binomial test given count, coverage and an error rate of 0.005. Note that if UMIs are utilized, i.e., a UMI preset has been selected or a custom read structure with UMIs has been specified in the UMI step (see LightSpeed Fastq to Somatic Variants), an error rate of 0.004 is used.
- p-value - global error rate (phred scaled) Log transformed p-value - global error rate.
- p-value - local error rate The minimum p-value from two individual tests: 1. A binomial test given forward count, forward coverage and a local error rate for forward reads estimated from the data. 2. A binomial test given reverse count, reverse coverage and a local error rate for reverse reads estimated from the data.
- p-value - low complexity The p-value from a binomial test given count and coverage. This p-value is only calculated for variants located at positions where two upstream and two downstream reference symbols are identical to the variant. For sites that do not meet this criterion, a p-value of 0 is reported.
- Homopolymer/STR Yes/No annotation. Yes, if the variant meets minimum repeat count, minimum repeat region length and maximum repeat element length specified in the wizard when calling variants. No, if one or more of the thresholds are not met.
- Repeat count The number of repeats excluding the variant. For example, if a reference allele "AAAA" is called and a low-frequency stutter insertion allele "AAAAA" is called, the repeat unit length is 1 and the repeat count is 4.
- Repeat unit length The length of a repeat unit. If the repeat is a homopolymer, the unit length is 1.
- Strand balance score 1 - (p-value from binomial test given forward count, count, and forward count/coverage).
- Inferred from unaligned ends Yes/No annotation indicating if the variant is a tandem duplication inferred from unaligned ends during detection of structural variants.
- Subtype Annotation indicating that an insertion is a tandem duplication. This annotation is added to tandem duplications inferred from unaligned ends during detection of structural variants, but also to insertions called by the standard variant caller that perfectly match a tandem duplication called during structural variant detection.
- Nearby similar called variant Annotation indicating if a tandem duplication inferred from unaligned ends during structural variant detection resembles, but is not identical to, an insertion called by the standard somatic variant detection.
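The binomial p-value annotations in the list above can be illustrated with a minimal Python sketch. It is not the LightSpeed implementation: the function names are hypothetical, the test is assumed to be one-sided (the probability of observing at least the given count), the phred scaling is assumed to be the standard -10 * log10(p) transform, and the local error rates, which the tool estimates from the data, are simply passed in as arguments.

```python
import math

GLOBAL_ERROR_RATE = 0.005      # value stated above for reads without UMIs
GLOBAL_ERROR_RATE_UMI = 0.004  # value stated above when UMIs are utilized


def binomial_upper_tail(count, coverage, error_rate):
    """Return P(X >= count) for X ~ Binomial(coverage, error_rate).

    Assumption: a one-sided test; requires 0 < error_rate < 1.
    Terms are computed in log space so that high coverage does not overflow.
    """
    if count <= 0:
        return 1.0
    log_p = math.log(error_rate)
    log_q = math.log1p(-error_rate)
    total = 0.0
    for k in range(count, coverage + 1):
        log_term = (math.lgamma(coverage + 1) - math.lgamma(k + 1)
                    - math.lgamma(coverage - k + 1)
                    + k * log_p + (coverage - k) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def phred_scale(p_value):
    """Assumed phred transform: -10 * log10(p); infinite for p == 0."""
    if p_value <= 0.0:
        return math.inf
    return -10.0 * math.log10(p_value)


def local_error_rate_p_value(fwd_count, fwd_coverage, fwd_error_rate,
                             rev_count, rev_coverage, rev_error_rate):
    """Minimum of the forward and reverse binomial p-values.

    The per-strand local error rates are inputs here (hypothetical
    parameters); how they are estimated is not covered by this sketch.
    """
    p_forward = binomial_upper_tail(fwd_count, fwd_coverage, fwd_error_rate)
    p_reverse = binomial_upper_tail(rev_count, rev_coverage, rev_error_rate)
    return min(p_forward, p_reverse)


# Example: 5 supporting reads out of 200 at the non-UMI global error rate.
p_global = binomial_upper_tail(5, 200, GLOBAL_ERROR_RATE)
print(p_global, phred_scale(p_global))
```

Under these assumptions, a smaller p-value (a larger phred-scaled value) means the observed count is less likely to be explained by sequencing error alone.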
If the variants are called from UMI reads, additional UMI specific annotations are added; see UMI grouping.
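To make the Repeat count and Repeat unit length annotations concrete, the following minimal Python sketch finds the shortest unit that tiles a repeat region and counts how often it occurs. The function name is hypothetical and the sketch only illustrates the terminology; it does not model how the tool detects repeat regions or applies the wizard thresholds.

```python
from typing import Tuple


def smallest_repeat_unit(sequence: str) -> Tuple[str, int]:
    """Return (unit, count) for the shortest unit that tiles the sequence.

    Illustration only: the input is assumed to be the repeat region
    itself, for example the reference allele "AAAA".
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    for unit_length in range(1, n + 1):
        if n % unit_length != 0:
            continue  # the unit must tile the region exactly
        unit = sequence[:unit_length]
        if unit * (n // unit_length) == sequence:
            return unit, n // unit_length
    return sequence, 1  # not reached: the full sequence always tiles itself


unit, repeat_count = smallest_repeat_unit("AAAA")
print(len(unit), repeat_count)  # repeat unit length and repeat count
```

For the homopolymer example given for Repeat count, the reference allele "AAAA" yields a unit length of 1 and a repeat count of 4, while a dinucleotide region such as "CACACA" yields a unit length of 2 and a repeat count of 3.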