UMI group sizes
The tools Calculate Unique Molecular Index Groups, Create UMI Reads from Reads and Create UMI Reads for miRNA all identify UMI groups, i.e. groups of reads that originate from the same fragment.
An important set of QC metrics common to all three tools concerns the sizes of these UMI groups.
The reports from the tools all contain a Group table with the following information:
- Output groups: The total number of UMI groups
- Singleton groups: The number of singleton UMI groups
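- Average, Median and Standard deviation: The average, median and standard deviation of the number of reads per group
- Reads in largest group: The number of reads in the largest UMI group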
- Reads by group size: "Group size" is the number of raw reads in a UMI group. The group size of each read is recorded and these values are sorted; the group sizes at a set of percentiles of this sorted list are then reported
- Groups with sizes >=x (% of groups) (% of reads): A series of values giving the number of groups containing at least x reads, together with the percentage of all UMI groups this represents and the percentage of all reads contained in these groups
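As a rough illustration of how the Group table values relate to each other, the following Python sketch computes them from a list of group sizes (reads per UMI group). The function name `group_table`, the chosen percentiles and thresholds, the use of the population standard deviation and the nearest-rank percentile method are all assumptions for illustration; the tools may compute these values differently.

```python
import math
import statistics


def group_table(group_sizes, percentiles=(5, 25, 50, 75, 95), thresholds=(2, 5, 10, 50)):
    """Hypothetical summary of UMI group sizes (raw reads per group)."""
    sizes = list(group_sizes)
    if not sizes:
        raise ValueError("at least one UMI group is required")
    total_groups = len(sizes)
    total_reads = sum(sizes)

    # "Reads by group size": record each read's group size, then sort.
    per_read = sorted(s for s in sizes for _ in range(s))

    def nearest_rank(p):
        # Nearest-rank percentile (assumption; the tool may interpolate).
        rank = max(1, math.ceil(p / 100 * len(per_read)))
        return per_read[rank - 1]

    # "Groups with sizes >=x": (count, % of groups, % of reads) per threshold.
    at_least = {}
    for x in thresholds:
        selected = [s for s in sizes if s >= x]
        at_least[x] = (
            len(selected),
            100 * len(selected) / total_groups,
            100 * sum(selected) / total_reads,
        )

    return {
        "output_groups": total_groups,
        "singleton_groups": sum(1 for s in sizes if s == 1),
        "average": statistics.mean(sizes),
        "median": statistics.median(sizes),
        "std_dev": statistics.pstdev(sizes),  # population SD (assumption)
        "largest_group": max(sizes),
        "reads_by_group_size": {p: nearest_rank(p) for p in percentiles},
        "groups_at_least": at_least,
    }
```

For the group sizes [1, 1, 2, 3, 3, 4, 10], the median group size is 3, while the 50th percentile of Reads by group size is 4: because every read is counted, larger groups carry more weight in the per-read distribution.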
The following plots are also available:
- Reads by group size. The first plot shows the number of reads in groups of each size. The second plot includes only groups with fewer than 50 reads
- Group Sizes graphs. The first plot shows the sizes of all UMI groups. The second plot includes only groups with fewer than 50 reads
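The data behind both pairs of plots can be derived from the same list of group sizes. This hypothetical Python sketch (the name `plot_series` is illustrative only) counts groups per size and reads per size, optionally keeping only groups with fewer than a cutoff number of reads, as the second plot of each pair does with 50. Drawing the plots is left out, and the tools may bin sizes differently.

```python
from collections import Counter


def plot_series(group_sizes, cutoff=None):
    """Return (groups_per_size, reads_per_size) as sorted (size, count) lists.

    If cutoff is given, only groups with fewer than `cutoff` reads are kept,
    mirroring the second plot of each pair (cutoff=50).
    """
    counts = Counter(s for s in group_sizes if cutoff is None or s < cutoff)
    groups_per_size = sorted(counts.items())
    # A group of size s contributes s reads, so reads = size * number of groups.
    reads_per_size = [(size, size * n) for size, n in groups_per_size]
    return groups_per_size, reads_per_size
```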
For most applications, the ideal UMI group size is around 2-4; larger UMI groups tend to give diminishing returns for the increased sequencing budget.
Please refer to the kit handbook to see the suggested UMI group size for your application.
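The diminishing returns can be illustrated with a simplified model. This model is an assumption and is not part of the tool documentation. Suppose the number of reads sequenced from each original molecule is Poisson-distributed with mean λ. Then the fraction of molecules seen at least once is 1 - e^(-λ), and the average size of the observed (non-empty) UMI groups is λ / (1 - e^(-λ)). The model ignores PCR amplification bias and UMI sequencing errors, and the function name below is hypothetical.

```python
import math


def poisson_umi_model(mean_reads_per_molecule):
    """Return (fraction of molecules detected, mean observed group size)
    under a simplified Poisson sampling model (assumption)."""
    lam = mean_reads_per_molecule
    detected = 1 - math.exp(-lam)             # P(at least one read)
    mean_observed_group_size = lam / detected  # zero-truncated Poisson mean
    return detected, mean_observed_group_size


if __name__ == "__main__":
    for lam in (1, 2, 3, 6, 12):
        detected, size = poisson_umi_model(lam)
        print(f"lambda={lam:>2}: {100 * detected:5.1f}% of molecules detected, "
              f"mean group size {size:.2f}")
```

Under this model, going from about 3 to about 6 reads per molecule roughly doubles the sequencing needed, but it raises the fraction of molecules detected only from about 95.0% to about 99.8%.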