LightSpeed Fastq to Germline Variants
The LightSpeed Fastq to Germline Variants tool is designed to provide variant calls from raw sequencing data within a very short timeframe.
The tool can perform read trimming, mapping, deduplication, local realignment and germline variant calling. For a description of each step, see LightSpeed Methods.
LightSpeed Fastq to Germline Variants can only analyze one sample per analysis start. To analyze samples in batch, LightSpeed Fastq to Germline Variants must be included in a workflow. Template workflows for LigthSpeed analysis are available (see Template Workflows), but it is also possible to create custom workflows. Read about workflows here http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Workflows.html.
To run the LightSpeed tool go to:
Tools | LightSpeed () | LightSpeed Fastq to Germline Variants (
)
If you are connected to a CLC Server via your Workbench, you will be asked where you would like to run the analysis. We recommend that you run the analysis on a CLC Server when possible.
In the first wizard step, specify fastq files and a reference sequence (figure 3.1):
- Input data
- Reads (fastq) Fastq files for analysis. At least two fastq files representing R1 and R2 reads must be provided.
- References
- References The reference sequence that reads will be mapped to.
- Reference masking
- No masking Reads are mapped to the full reference sequence.
- Exclude annotated Reads are mapped to the full reference sequence except regions specified in the masking track.
- Include annotated only Reads are only mapped to the regions specified in the masking track.
- Masking track The track specifying the masking regions.
Figure 3.1: Input fastq files and references, and, optionally, a track for reference masking.
Next, options are available for trimming (figure 3.2):
- Trimming
- Quality trim Reads are trimmed for low quality nucleotides.
- Minimum read length after quality trim Trimmed reads shorter than this length are removed.
- Adapter trim Reads are trimmed for read-through adapter sequence.
- Minimum read length after adapter trim Trimmed reads shorter than this length are removed.
Figure 3.2: Options for trimming.
Next, options are available for UMI and duplicate reads (figure 3.3):
- UMI
- UMI preset Set UMI status of reads. Use the custom setting to modify all UMI settings.
- UMI length (Read 1) The number of nucleotides at the start of read 1 that are part of the UMI.
- Common sequence length (Read 1) The number of nucleotides between the UMI barcode and the fragment of the first reads from paired-end reads. These nucleotides are discarded.
- UMI length (Read 2) The number of nucleotides at the start of read 2 that are part of the UMI.
- Common sequence length (Read 2) The number of nucleotides between the UMI barcode and the fragment of the first reads from paired-end reads. These nucleotides are discarded.
- Minimum UMI group size Discard UMI reads created from fewer than this number of input read pairs. A UMI group is a merged read from multiple input read pairs with the same UMI barcode and mapping to the same genomic position.
- Maximum UMI differences Add input read pairs to the same UMI group when their UMI barcodes nucleotides have at most this number of differences. One difference is a single nucleotide mismatch, insertion, or deletion.
- UMI window size Add input read pairs to the same UMI group when they map to genomic positions at most this number of bases apart.
- Mapped read handling
- Discard duplicate mapped reads Reads likely representing PCR duplicates are collapsed. This option is disabled when UMIs are used to group reads.
Figure 3.3: Options for UMI and duplicate reads.
Next, options are available for primer trimming (figure 3.4):
- Trim primers
- No primer trim Disable the trim primers step.
- Start of read 1 Primers are at the start of read 1.
- Start of read 2 Primers are at the start of read 2.
- Primers track Annotation track with location and strand of primers. Unalign parts of mapped reads that overlap a primer.
- Discard reads without primer Discard reads that do not overlap with a primer in the primers track.
- Additional bases to trim Unalign this number of additional mapped bases in reads matching a primer. Bases are unaligned at the beginning of the mapped read downstream from the primer.
- Minimum primer overlap (%) Reads overlap a primer when the expected part of the read (start of read 1 or read 2) maps to a genomic location that overlaps at least this percentage of a primer.
Figure 3.4: Options for primer trimming.
Next, options are available for variant detection (figure 3.5):
- Variant detection
- Restrict calling to target regions Optional. A track defining where variants are called.
- Ignore non-specific matches Reads that map equally well to more than one genomic position, are not used for variant calling.
- Structural variant detection
- Lenient inversion detection Enable lenient inversion detection to allow detection of inversions which only has read support in one direction on each of the breakpoints. This is recommended for targeted data. Enabling this option can increase processing time and can result in detection of more false positive inversions.
Figure 3.5: Options for variant detection.
In the final wizard step, choose which outputs should be generated and whether results should be saved or opened. If a reads track is selected as output, runtime will increas.
Subsections