Calculate Unique Molecular Index Groups

The Calculate Unique Molecular Index Groups tool annotates mapped reads with a "Unique Molecular Index group ID" that is identical for all reads determined to belong to the same UMI.

Calculate Unique Molecular Index Groups is available under the Tools menu at:

        Tools | Biomedical Genomics Analysis | UMI Tools | Calculate Unique Molecular Index Groups

In the first dialog (figure 4.2), select a read mapping whose reads were previously annotated with UMIs.

Image readmappingannotatebarcode
Figure 4.2: Select a read mapping made from reads whose UMI was removed and annotated on the sequences.

The grouping of reads into UMI groups works as follows:

  1. The tool first groups reads that
    • start at the same position, based on the end of the read to which the UMI is ligated. This end can be defined either in the Remove and Annotate with Unique Molecular Index tool or directly in the wizard. For example, if the UMI was removed from the start of read 2 using the Remove and Annotate with Unique Molecular Index tool, this tool groups reads where the start of read 2 maps to the same position,
    • are from the same strand, and
    • have identical UMIs.
  2. The tool then merges smaller groups into larger groups if
    • their start positions are sufficiently close, as defined by the Window size parameter.
    • their UMIs are similar enough, as defined by the Fuzzy match Unique Molecular Indices parameter.

    Merging is only done if the larger group is sufficiently large compared to the smaller group, as defined by the parameters described below. If a smaller group could be merged into several larger groups that are equally good in terms of UMI similarity, start position, and group size, the smaller group is not merged.
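
The grouping and merging logic described above can be illustrated with a minimal Python sketch. This is not the tool's implementation. The names Read, group_reads, max_umi_mismatches and min_size_ratio are hypothetical, UMI similarity is assumed to be measured as Hamming distance, both merge conditions (window and UMI similarity) are assumed to be required together, and the size criterion is assumed to be a simple ratio greater than 1.

```python
from collections import Counter
from typing import NamedTuple


class Read(NamedTuple):
    start: int   # mapped start of the read end the UMI was ligated to
    strand: str  # "+" or "-"
    umi: str     # UMI sequence annotated on the read


def hamming(a: str, b: str) -> int:
    """Count mismatching positions; unequal lengths are treated as dissimilar."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def group_reads(reads, window_size=0, max_umi_mismatches=1, min_size_ratio=2.0):
    """Map each exact (start, strand, UMI) group to the group it ends up in.

    Assumes min_size_ratio > 1, so merges always go from smaller to
    strictly larger groups and cannot form cycles.
    """
    # Step 1: exact groups keyed by start position, strand and UMI.
    sizes = Counter(reads)
    assignment = {key: key for key in sizes}

    # Step 2: try to merge each group into one larger, similar group.
    for small in sizes:
        candidates = []
        for large in sizes:
            if large == small or large.strand != small.strand:
                continue
            offset = abs(large.start - small.start)
            dist = hamming(large.umi, small.umi)
            if offset > window_size or dist > max_umi_mismatches:
                continue
            if sizes[large] < min_size_ratio * sizes[small]:
                continue
            # Lower tuple = better: closer UMI, closer start, bigger group.
            candidates.append((dist, offset, -sizes[large], large))
        if not candidates:
            continue
        candidates.sort(key=lambda c: c[:3])
        # Several equally good targets: leave the group unmerged.
        if len(candidates) > 1 and candidates[0][:3] == candidates[1][:3]:
            continue
        assignment[small] = candidates[0][3]

    def root(key):
        # Follow chains such as small -> medium -> large.
        while assignment[key] != key:
            key = assignment[key]
        return key

    return {key: root(key) for key in sizes}


if __name__ == "__main__":
    reads = (
        [Read(100, "+", "ACGT")] * 5
        + [Read(100, "+", "ACGA")]      # one mismatch, same start: merged
        + [Read(500, "+", "ACGT")] * 2  # outside the window: kept separate
    )
    for key, target in group_reads(reads).items():
        print(key, "->", target)
```

In this sketch, a group that has two equally ranked merge targets keeps its own ID, mirroring the rule that ambiguous groups are not merged.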

Duplex groups are created if input reads were defined as duplex data in the Remove and Annotate with Unique Molecular Index tool, or if the UMI location setting has been set to duplex. Duplex groups consist of two paired end UMI groups from different strands of the original fragment. Two UMI groups, A and B, are grouped to a duplex group when:

  1. Both are paired reads.
  2. One consists of forward paired end reads (referred to as group A) and the other consists of reverse paired end reads (referred to as group B).
  3. The genomic positions of read 1 in group A and read 2 in group B are the same, and the genomic positions of read 2 in group A and read 1 in group B are the same.
  4. The UMI is the same for read 1 of group A and read 2 of group B, and the UMI is the same for read 2 of group A and read 1 of group B.
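
The four duplex criteria above can be written as a single predicate. The following Python sketch is only an illustration under stated assumptions: the UmiGroup record and its field names are hypothetical, each UMI group is assumed to be summarized by one genomic position and one UMI per read of the pair, and "the same" is assumed to mean exact equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UmiGroup:
    paired: bool      # True if the group consists of paired reads
    orientation: str  # "forward" or "reverse" (hypothetical encoding)
    read1_pos: int    # genomic position of read 1
    read2_pos: int    # genomic position of read 2
    read1_umi: str    # UMI associated with read 1
    read2_umi: str    # UMI associated with read 2


def is_duplex_pair(g1: UmiGroup, g2: UmiGroup) -> bool:
    """Return True if the two UMI groups meet the four duplex criteria."""
    # 1. Both groups are paired reads.
    if not (g1.paired and g2.paired):
        return False
    # 2. One group is forward (A), the other reverse (B).
    if {g1.orientation, g2.orientation} != {"forward", "reverse"}:
        return False
    a, b = (g1, g2) if g1.orientation == "forward" else (g2, g1)
    # 3. Positions are swapped between read 1 and read 2.
    same_positions = a.read1_pos == b.read2_pos and a.read2_pos == b.read1_pos
    # 4. UMIs are swapped between read 1 and read 2.
    same_umis = a.read1_umi == b.read2_umi and a.read2_umi == b.read1_umi
    return same_positions and same_umis


if __name__ == "__main__":
    group_a = UmiGroup(True, "forward", 100, 300, "ACGT", "TTGA")
    group_b = UmiGroup(True, "reverse", 300, 100, "TTGA", "ACGT")
    print(is_duplex_pair(group_a, group_b))
```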

It is possible to change the following parameters (figure 4.3):

Image fuzzyparameters
Figure 4.3: Select a read mapping made from reads whose UMI was removed and annotated on the sequences.

Click Next to choose whether to Open or Save the resulting read mapping, in which the reads now have a "UMI group ID" annotation.

A report can also be generated. It contains:

Note: Very large group sizes (the number of reads in a UMI group) can indicate problems, such as quality issues with the sample. In most cases, more than 10 reads in a UMI group is not desirable. Large groups can also indicate that the sequencing depth could be reduced.
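
As a rough illustration of the check described in the note, the sketch below counts reads per UMI group ID and lists the groups above a threshold. It assumes the group IDs have already been exported as one value per read. The function name oversized_groups and the max_reads parameter are hypothetical, and the default of 10 is simply the guideline value from the note.

```python
from collections import Counter


def oversized_groups(group_ids, max_reads=10):
    """Return ({group_id: size} for groups above max_reads, total group count)."""
    sizes = Counter(group_ids)
    large = {gid: n for gid, n in sizes.items() if n > max_reads}
    return large, len(sizes)


if __name__ == "__main__":
    # One hypothetical group ID per read.
    ids = ["g1"] * 12 + ["g2"] * 3 + ["g3"]
    large, total = oversized_groups(ids)
    print(f"{len(large)} of {total} UMI groups exceed the threshold: {large}")
```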