Limitations
Data
LightSpeed is developed for, and has been optimized on, Illumina paired-end short read sequencing data. Paired-end sequencing data from other platforms that use the same data structure and similar read lengths can be expected to work equally well with LightSpeed, unless the background error rate is markedly different. Analysis of other types of sequencing reads may not result in similar processing times or variant calls of an equivalent quality. Reads that are longer than 800 base pairs cannot be processed.
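Because reads longer than 800 base pairs cannot be processed, it can be useful to check read lengths before launching an analysis. The following is a minimal sketch of such a check and is not part of LightSpeed. It assumes an uncompressed FASTQ file with exactly four lines per record, and the function name overlong_reads is hypothetical.

```python
from itertools import islice

MAX_READ_LENGTH = 800  # limit stated in this section


def overlong_reads(fastq_lines, max_length=MAX_READ_LENGTH):
    """Yield (read_name, length) for each read longer than max_length.

    Assumes four-line FASTQ records: header, sequence, '+', qualities.
    """
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        header, sequence = record[0].strip(), record[1].strip()
        if len(sequence) > max_length:
            # Read name = header without '@', up to the first whitespace
            yield header.lstrip("@").split()[0], len(sequence)
```

An open file handle can be passed directly, for example `list(overlong_reads(open("reads_R1.fastq")))`, where the file name is a placeholder.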
Variant detection
The germline variant detection algorithm in LightSpeed is based on a model expecting diploid genomes. Therefore, LightSpeed cannot be expected to accurately detect germline variants in genomes with other ploidies. In addition, alternate ploidies of sex chromosomes are not considered in the variant detection algorithm.
Somatic variant detection with LightSpeed is possible for variants down to a variant allele frequency of 0.1%. Variants below this frequency will not be considered. However, in order to ensure high accuracy in variant calling, we recommend only calling variants down to a variant allele frequency of approximately 1%.
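The two somatic thresholds above can be illustrated with a small sketch. It assumes that variant allele frequency is computed as variant-supporting reads divided by total coverage at the position. The function name classify_vaf and the returned labels are hypothetical, and the code does not reflect how LightSpeed itself evaluates variants.

```python
def classify_vaf(alt_reads: int, depth: int) -> str:
    """Place a variant relative to the 0.1% and 1% thresholds.

    Integer arithmetic is used so that values exactly on a threshold
    are not affected by floating-point rounding.
    """
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("expected 0 <= alt_reads <= depth and depth > 0")
    if alt_reads * 1000 < depth:   # VAF below 0.1%
        return "not considered"
    if alt_reads * 100 < depth:    # VAF from 0.1% up to, but not including, 1%
        return "below recommended threshold"
    return "within recommended range"
```

For example, 1 supporting read at a coverage of 2,000 corresponds to 0.05% and falls below the 0.1% limit, whereas 10 supporting reads at a coverage of 1,000 correspond to 1%.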
Reference sequence
LightSpeed considers all chromosomes to be linear. Hence, for read mapping, circular chromosomes are linearized with position 1 starting at the junction of the chromosome. No reads will be mapped across the junction of circular chromosomes.
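To make the consequence of linearization concrete, the sketch below tests whether a read that originates at a given position on the circular molecule would extend past the end of the linearized reference. It is an illustration only: the function name crosses_junction is hypothetical, positions are assumed to be 1-based, and the 16,569 bp length used in the example is that of the human mitochondrial reference genome.

```python
def crosses_junction(start: int, length: int, chromosome_length: int) -> bool:
    """Return True if a read starting at 1-based `start` with `length` bases
    would run past the last position of the linearized chromosome."""
    if length < 1 or not 1 <= start <= chromosome_length:
        raise ValueError("start must lie on the chromosome and length must be >= 1")
    last_base = start + length - 1
    return last_base > chromosome_length


# A 150 bp read starting 70 bases before the end of a 16,569 bp chromosome
# spans the junction, so it would not be mapped across it.
print(crosses_junction(16500, 150, 16569))
```

Reduced coverage close to position 1 and close to the last position of a circular chromosome is therefore to be expected.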
UMI grouping
- The maximum number of reads used for creating a UMI consensus read is 100,000. Therefore, UMI groups with more than 100,000 reads will be merged into more than one consensus UMI read.
- LightSpeed UMI grouping requires that reads have similar mapping positions. In data from single primer extension protocols, such as many primer-based QIAseq protocols, read pairs representing the same DNA fragment with the same UMI sequence can originate from different primers. This can happen if primers in the same direction are located near each other, making it possible for a downstream primer to amplify a PCR product generated from an upstream primer. LightSpeed will not group reads originating from different primers.
- When UMIs are used to group reads, the sequences are compared base by base. If an insertion or deletion is present near the beginning of a UMI sequence, the reads will likely not be grouped, because the bases after the variant are shifted and are compared as mismatches.
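The effect of an insertion or deletion on a base-by-base comparison can be shown with a short sketch. This is not the LightSpeed grouping code. The function name count_mismatches and the UMI sequences are made up for illustration, and the number of mismatches tolerated by the actual grouping is not specified here.

```python
def count_mismatches(umi_a: str, umi_b: str) -> int:
    """Count differing positions between two equal-length UMIs.

    Positions are compared one to one; no alignment or gaps are used.
    """
    if len(umi_a) != len(umi_b):
        raise ValueError("UMIs must have the same length")
    return sum(a != b for a, b in zip(umi_a, umi_b))


original = "ACGTTGCAAGCT"

# A single substitution changes exactly one position.
substituted = "ACGTTGCATGCT"

# Deleting the second base shifts every later base one position to the left;
# the trailing "G" stands in for the next base read after the UMI (assumption).
shifted = original[:1] + original[2:] + "G"

print(count_mismatches(original, substituted))
print(count_mismatches(original, shifted))
```

The substitution yields a single mismatch, while the early deletion yields mismatches at most of the downstream positions. Only positions where the shifted base happens to be identical still match.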
Output naming support
The LightSpeed tools support custom names for workflow results. However, not all CLC Genomics Workbench placeholders for workflow output elements are supported. Specifically, the following are supported:
- {input:1} or {2:1} The name of the first input to the workflow. This is the recommended output naming.
- {name} or {1} The default name for that output from that tool, i.e. the name that would be used if the tool was run outside a workflow context.
- {metadata} or {3} The batch unit identifier for workflows executed in batch mode. Depending on how the workflow was configured at launch, this value may be obtained from metadata. Workflows not executed in batch mode or without Iterate elements are not supported with this placeholder as the value will be identical to that substituted using {input} or {2}.
- {user} The username of the person who launched the job.
- {host} The name of the machine the job is run on.
- {year}, {month}, {day}, {hour}, {minute}, and {second} Timestamp information based on the time an output is created. Using these placeholders, items generated by a workflow at different times can have different file names.
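As an illustration of how the supported placeholders combine into an output name, the sketch below expands a naming template. It is not the implementation used by LightSpeed or CLC Genomics Workbench. The function name expand_output_name, the zero-padded timestamp formats, and the choice to leave unsupported placeholders unchanged are all assumptions made for this example.

```python
import re
from datetime import datetime

# Matches {key} and {key:N}, for example {name}, {2} or {input:1}
PLACEHOLDER = re.compile(r"\{(\w+)(?::(\d+))?\}")


def expand_output_name(template, input_names, default_name, batch_unit,
                       user, host, created):
    """Replace supported placeholders in `template` with their values."""
    simple = {
        "name": default_name, "1": default_name,
        "metadata": batch_unit, "3": batch_unit,
        "user": user,
        "host": host,
        # Zero-padded timestamp fields (formatting is an assumption)
        "year": f"{created:%Y}", "month": f"{created:%m}",
        "day": f"{created:%d}", "hour": f"{created:%H}",
        "minute": f"{created:%M}", "second": f"{created:%S}",
    }

    def substitute(match):
        key, index = match.group(1), match.group(2)
        if key in ("input", "2"):
            # {input:N} selects the N-th workflow input; default to the first
            position = int(index) if index else 1
            if 1 <= position <= len(input_names):
                return input_names[position - 1]
            return match.group(0)
        if index is None and key in simple:
            return simple[key]
        return match.group(0)  # unsupported placeholder: left as-is (assumption)

    return PLACEHOLDER.sub(substitute, template)


print(expand_output_name(
    "{input:1}_{name}_{year}{month}{day}",
    input_names=["sample_A", "sample_B"],
    default_name="Variants",
    batch_unit="batch_1",
    user="alice",
    host="node01",
    created=datetime(2024, 3, 5, 14, 7, 9),
))
```

With these made-up values, the template combines the name of the first input, the default output name, and the creation date into a single name.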