Create SNP Tree

The Create SNP Tree tool is inspired by [Kaas et al., 2014]. There are two ways to initiate creation of a SNP tree: from the Result Metadata Table (see subsection 9.4) or by running the tool from the Toolbox. Note that you can only create a SNP tree if you have identified a common reference for the different strains you are trying to type, and have used it for read mapping and variant calling for each of these samples.

To create a SNP tree from the Toolbox:

Toolbox | Microbial Genomics Module | Typing and Epidemiology (beta) | Trees | Create SNP Tree

Select the relevant read mappings as shown in figure 14.1.

Figure 14.1: Select read mappings to be included in the SNP tree analysis.
Alternatively, select data recursively by right-clicking on the folder name and selecting Add folder contents (recursively) (figure 14.2), but remember to double-check that only files relevant for the downstream analysis are selected. An efficient alternative to these methods is to use the Quick filtering functionality in the Result Metadata Table to filter the data and initiate the SNP tree creation from there.

Figure 14.2: For selection of all sequence files in a folder, right-click and select Add folder contents (recursively).
Select the variant tracks you want to use (figure 14.3). The variant tracks determine which positions to include in the SNP tree. The variant tracks need to have the same reference as the previously selected read mappings. Under normal circumstances you would select one variant track for each read mapping given in the input step, but that is not a requirement.

Figure 14.3: Select variant tracks and specify relevant parameters before generation of a SNP tree.
The following parameters may be specified before the generation of the SNP tree (see figure 14.3):
- SNV parameters
  - Variant tracks. Select the variant tracks you want to use. The variant tracks determine which positions to include in the SNP tree.
  - Whether to include MNVs along with SNVs when building the SNP tree.
  - Minimum coverage required in each sample on a given position. The position is skipped if at least one sample has coverage below the specified threshold.
  - Minimum coverage percentage of average required on a given position. The position is skipped if at least one sample has coverage below this percentage of its own average coverage.
  - Prune distance specifies the minimum number of nucleotides between unfiltered positions. If a position is within this distance of a previously used position it will be filtered.
  - Minimum z-score required. Defining $x$ as the count of the most prevalent nucleotide at a position and $y$ as the coverage minus $x$, the z-score is calculated as $z = \frac{x-y}{\sqrt{x+y}}$. If the calculated z-score for a given position is less than the specified minimum value, the position is filtered.
- Result metadata
  - Result Metadata Table. Specify the location of the Result Metadata Table file.
- Tree view
  - Tree view settings. Select a standard tree setting (i.e., None, K-mer Tree Default or SNP Tree Default) or your own custom tree setting. Read more on Tree Settings in general: http://www.clcsupport.com/clcgenomicsworkbench/current/index.php?manual=Tree_Settings.html.
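
To make the SNV parameters above more concrete, the following Python snippet is a minimal sketch of how the coverage, z-score and prune distance filters could interact. It is not the tool's implementation. The function names, the data layout (per-sample nucleotide counts per position), the order in which the filters are applied, the requirement that every sample passes the z-score threshold, and the exact boundary used for the prune distance are all assumptions made for illustration.

```python
import math

def z_score(x, y):
    """z = (x - y) / sqrt(x + y), where x is the count of the most
    prevalent nucleotide and y is the coverage minus x."""
    total = x + y
    if total == 0:
        return float("-inf")  # no coverage: can never reach a minimum z-score
    return (x - y) / math.sqrt(total)

def filter_positions(candidates, samples, min_coverage, min_coverage_pct,
                     min_z, prune_distance):
    """Return the candidate positions that survive all filters.

    Hypothetical data layout: each sample is a dict with an
    "average_coverage" value and a "counts" dict mapping a position to
    its nucleotide counts, e.g. {100: {"A": 36, "C": 4}}.
    """
    kept = []
    for pos in sorted(candidates):
        passes = True
        for sample in samples:
            counts = sample["counts"].get(pos, {})
            coverage = sum(counts.values())
            # Minimum coverage: skip if any sample is below the threshold.
            if coverage < min_coverage:
                passes = False
            # Minimum percentage of the sample's own average coverage.
            elif coverage < sample["average_coverage"] * min_coverage_pct / 100:
                passes = False
            else:
                x = max(counts.values(), default=0)
                # Assumption: every sample must reach the minimum z-score.
                if z_score(x, coverage - x) < min_z:
                    passes = False
            if not passes:
                break
        if not passes:
            continue
        # Prune distance (assumed boundary): drop positions closer than
        # prune_distance to the previously kept position.
        if kept and pos - kept[-1] < prune_distance:
            continue
        kept.append(pos)
    return kept

# Made-up example data for two samples.
samples = [
    {"average_coverage": 40, "counts": {
        100: {"A": 36, "C": 4}, 105: {"G": 40}, 300: {"T": 20, "C": 20},
        500: {"A": 5}, 700: {"C": 30}, 900: {"G": 38, "T": 2}}},
    {"average_coverage": 50, "counts": {
        100: {"A": 50}, 105: {"G": 45, "A": 5}, 300: {"T": 48, "C": 2},
        500: {"A": 50}, 700: {"C": 20}, 900: {"G": 52}}},
]
candidates = [100, 105, 300, 500, 700, 900]

print(filter_positions(candidates, samples, min_coverage=10,
                       min_coverage_pct=50, min_z=1.96, prune_distance=10))
# -> [100, 900]
```

In this made-up example, position 105 is pruned because it lies within 10 nucleotides of position 100, position 300 fails the z-score filter in the first sample (20 against 20 reads gives z = 0), position 500 fails the minimum coverage in the first sample, and position 700 falls below 50% of the second sample's average coverage.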
