To run the Compare Variants Across Samples workflow, go to:
Toolbox | Template Workflows | Microbial Workflows | Typing and Epidemiology | Compare Variants Across Samples
An overview of the workflow can be seen in figure 13.28.
- Select two or more read sets as input (figure 13.29). The workflow uses internal batching and creates an analysis for each sample as well as a combined variant track and SNP tree.
- Select the reference to use (figure 13.30). The reference should match all the samples selected.
- Select a CDS track associated with the reference (figure 13.31).
- In the Result handling window, click Preview All Parameters to view all parameters. They cannot be changed at this step.
Saving the workflow output will generate the files shown in figure 13.32 and, optionally, a workflow result metadata table.
The workflow generates outputs for each batch analysis run as well as a folder for each sample. For each sample, the following outputs are generated:
- Annotated variant track: output from the Low Frequency Variant Detection tool after coverage and quality filtering. Note that it is possible to export multiple variant track files from monoploid data into a single VCF file with the Multi-VCF exporter. This exporter becomes available when installing the CLC Microbial Genomics Module. All variant track files must have the same reference genome for the Multi-VCF export to work.
- Amino acid track: amino acid track including amino acid changes resulting from the called variants.
- Read mapping: output from the Local Realignment tool, mapping of the reads to the specified reference. For increased sensitivity, duplicate mapped reads are removed before local realignment.
- Track list: output from the Create Track List tool. The track list combines the read mapping, variant, amino acid and CDS tracks. An example can be seen in figure 13.33.
For each batch analysis run, the following outputs are generated:
- Variant track list for all samples: output from the Create Track List tool. The track list combines the variant tracks for all analyzed samples.
- Combined QC report: a combined report built from QC for sequencing reads, Read mapping summary and QC for read mapping. This report contains a summary of all analyzed samples.
- SNP tree report: summarizes the effect of the filtering settings applied in the Create SNP Tree tool, as well as the ignored positions attributed to the different read mappings.
- SNP matrix: a matrix containing the pairwise number of SNP differences between all pairs of samples included in the analysis (see figure 13.34).
- SNP tree: the output tree built from the SNPs called in all samples (see figure 13.35). A number of different visualizations are available; see Visualization of SNP Tree including metadata and analysis result metadata.
In figure 13.35, the leaf nodes have been colored according to the geographic location of the collected samples.
For more information on the tree tools, see Create SNP Tree.
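
To make the SNP matrix output more concrete, the following Python sketch computes pairwise SNP differences for a few made-up samples. It is an illustration only and not the implementation used by the Create SNP Tree tool. The function name `snp_distance_matrix`, the input layout (one dictionary of position-to-allele calls per sample) and the sample names are all hypothetical. The sketch also assumes that a position with no call in a sample carries the reference allele, whereas the actual tool applies its own filters before counting.

```python
from itertools import combinations


def snp_distance_matrix(calls, ignored_positions=frozenset()):
    """Count pairwise SNP differences between samples.

    calls: dict mapping sample name -> dict of {position: alternate allele}.
           Assumption: a position missing from a sample is the reference allele.
    ignored_positions: positions excluded from the comparison (hypothetical
           stand-in for positions that a filtering step would discard).
    """
    samples = sorted(calls)
    # Start with a zero-filled symmetric matrix stored as nested dicts.
    matrix = {a: {b: 0 for b in samples} for a in samples}
    for a, b in combinations(samples, 2):
        # Only positions called in at least one of the two samples can differ.
        positions = (set(calls[a]) | set(calls[b])) - set(ignored_positions)
        differences = sum(
            1 for pos in positions if calls[a].get(pos) != calls[b].get(pos)
        )
        matrix[a][b] = matrix[b][a] = differences
    return samples, matrix


# Made-up example data: three samples with SNP calls against one reference.
calls = {
    "sample_A": {101: "T", 250: "G"},
    "sample_B": {101: "T"},
    "sample_C": {250: "G", 977: "A"},
}

samples, matrix = snp_distance_matrix(calls)
for row in samples:
    print(row, [matrix[row][col] for col in samples])
```

Each row of the printed result corresponds to one sample, and each entry is the number of positions at which that sample differs from the sample in the corresponding column, so the diagonal is always zero and the matrix is symmetric.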