QIAGEN Bioinformatics Manuals

Compare Variants Across Samples

Compare Variants Across Samples can be used to compare samples originating from strains or species sharing a common reference. Input should be sequence lists of trimmed reads from which host reads have been removed, e.g. using Taxonomic Profiling (see Taxonomic Profiling).

As the workflow removes duplicate mapped reads, amplicon data is not recommended as input. However, the workflow can be modified to work on amplicon data by opening a copy of the workflow, removing the Remove Duplicate Mapped Reads tool and saving the modified workflow.
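To illustrate why duplicate removal is problematic for amplicon data, the following minimal sketch collapses reads that share the same mapping coordinates. It assumes duplicates are identified purely by identical start, end and strand, which is a common simplification and not necessarily the exact logic used by the Remove Duplicate Mapped Reads tool. The function `remove_duplicates` and the read dictionaries are hypothetical and only serve the illustration.

```python
def remove_duplicates(reads):
    """Keep the first read seen for each (start, end, strand) combination."""
    seen = set()
    kept = []
    for read in reads:
        key = (read["start"], read["end"], read["strand"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Shotgun-like reads: start positions are spread across the reference,
# so only the one true duplicate (start 57) is removed.
shotgun = [{"start": s, "end": s + 150, "strand": "+"} for s in (10, 57, 57, 203, 340)]

# Amplicon-like reads: every read covers the same primer-defined region,
# so all but one are treated as duplicates.
amplicon = [{"start": 100, "end": 250, "strand": "+"} for _ in range(5)]

shotgun_kept = remove_duplicates(shotgun)
amplicon_kept = remove_duplicates(amplicon)

print(len(shotgun_kept), "of", len(shotgun), "shotgun reads kept")
print(len(amplicon_kept), "of", len(amplicon), "amplicon reads kept")
```

Under this assumption, nearly all amplicon coverage would be discarded, which is why the tool should be removed from a copy of the workflow before analyzing amplicon data.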

To run the Compare Variants Across Samples workflow, go to

Workflows | Template Workflows | Microbial Workflows | Typing and Epidemiology | Compare Variants Across Samples

Select two or more samples as input.
Select the reference track to use. The reference should match all the samples selected.
Select a CDS track associated with the reference.
Define batch units. For details, see Running part of a workflow multiple times.
Check that batching is as intended.
In the "Create Sample Report" step, various summary items have been set. These are guidelines to help evaluate the quality of the results (see Create Sample Report).
In the Result handling window, pressing the Preview All Parameters button allows you to preview, but not change, all parameters. Choose to save the results (we recommend creating a new folder for them) and click Finish.

The output will be saved in the location you chose.

The batch-specific outputs provided by this workflow are:

Sample report. The sample report is curated to contain the most important information for analysis interpretation. All full reports are linked throughout the Sample report or can be found in the QC & Reports folder. The Sample report icon will be colored based on whether Summary item thresholds were met. See the "Quality control" section in the sample report for specifics.
Track list. Collection of all the tracks in the "Tracks" folder, and the input Reference and CDS tracks.
Tracks. Folder containing all tracks output during analysis.
- Annotated variant track. Output from the Low Frequency Variant Detection tool after coverage and quality filtering. Note: Multiple variant track files from monoploid data that are based on the same reference genome can be exported to a single VCF file using the Multi-VCF exporter.
- Amino acid track. Amino acid track including amino acid changes resulting from the called variants.
- Read mapping. Mapping of the reads to the specified reference. For increased sensitivity, duplicate mapped reads are removed before local realignment.
QC & Reports. Folder containing the individual reports generated during the analysis.
- All reports from the sample report are found here in their full length.

The combined outputs provided by this workflow are:

Combined report. Combined report of all sample reports. The combined report contains all quality control information and analysis results. The combined report icon will be colored based on whether Summary item thresholds were met in each sample. See the "Quality control" section in the combined report for specifics.
Variant track list for all samples. The track list combines the variant tracks for all analyzed samples.
SNP tree report. Summarizes the filtering settings applied in the Create SNP Tree tool, as well as the ignored positions attributed to the different read mappings.
SNP Matrix. A matrix containing the pairwise number of SNP differences between all pairs of samples included in the analysis.
SNP tree. The output tree built from the SNPs called in all samples. A number of different visualizations are available, see SNP Tree.
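As an illustration of what the SNP Matrix represents, the sketch below counts pairwise SNP differences between samples. It is a simplified example under stated assumptions, not the procedure used by the Create SNP Tree tool, which additionally applies filtering and ignores certain positions. The function `snp_matrix`, the sample names and the variant calls are all hypothetical; each sample is assumed to be described by a mapping from reference position to the called allele, with positions not listed taken to match the reference.

```python
from itertools import combinations


def snp_matrix(calls):
    """Return sample names and a symmetric matrix of pairwise SNP differences.

    calls: dict mapping sample name -> dict of {reference position: allele}.
    A position missing from a sample is treated as matching the reference.
    """
    samples = sorted(calls)
    matrix = {s: {t: 0 for t in samples} for s in samples}
    for a, b in combinations(samples, 2):
        positions = set(calls[a]) | set(calls[b])
        # Count positions where the two samples carry different alleles.
        diff = sum(1 for p in positions if calls[a].get(p) != calls[b].get(p))
        matrix[a][b] = matrix[b][a] = diff
    return samples, matrix


# Hypothetical variant calls for three samples against the same reference.
calls = {
    "sample_A": {101: "T", 250: "G"},
    "sample_B": {101: "T", 400: "C"},
    "sample_C": {101: "T"},
}

samples, matrix = snp_matrix(calls)
for s in samples:
    print(s, [matrix[s][t] for t in samples])
```

Here sample_A and sample_B differ at two positions (250 and 400), while each differs from sample_C at a single position.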

Inspect the Combined report to determine whether the quality of the sequencing reads and of the analysis results is acceptable.

For more information on the Create SNP Tree tool, see Create SNP Tree.
