The clonotype identification algorithm

The algorithm for identifying the clonotypes is composed of three sequential steps described below.

Assembly

All reads originating from the same barcode are collected and:

The assembly summary reports:

Trimming

Prior to clonotype identification, the contigs are trimmed with the following settings:

The trimming summary reports the average length of the contigs before and after trimming, and how many barcodes and contigs remain after trimming.
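To make the trimming summary concrete, the following minimal sketch computes the four reported values from contigs grouped by barcode. The input layout (a dict mapping barcode to a list of contig sequences), the function name `summarize_trimming`, and the assumption that fully trimmed contigs appear as empty strings or are absent are all hypothetical. It also assumes that the "after" average is taken over the contigs that remain. The trimming settings themselves are not modelled.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TrimmingSummary:
    avg_length_before: float
    avg_length_after: float
    barcodes_remaining: int
    contigs_remaining: int


def summarize_trimming(before: dict[str, list[str]],
                       after: dict[str, list[str]]) -> TrimmingSummary:
    """Hypothetical summary: `before`/`after` map barcode -> contig sequences.

    Assumption: a contig removed by trimming is either absent from `after`
    or present as an empty string; neither is counted as remaining.
    """
    lengths_before = [len(c) for contigs in before.values() for c in contigs]
    # Drop empty contigs, then drop barcodes that have no contigs left.
    kept = {bc: [c for c in contigs if c] for bc, contigs in after.items()}
    kept = {bc: contigs for bc, contigs in kept.items() if contigs}
    lengths_after = [len(c) for contigs in kept.values() for c in contigs]
    return TrimmingSummary(
        avg_length_before=mean(lengths_before) if lengths_before else 0.0,
        avg_length_after=mean(lengths_after) if lengths_after else 0.0,
        barcodes_remaining=len(kept),
        contigs_remaining=len(lengths_after),
    )


before = {"AAA": ["ACGTACGT", "ACGT"], "CCC": ["ACGTAC"]}
after = {"AAA": ["ACGTA", ""], "CCC": []}
print(summarize_trimming(before, after))
```

In this toy input, barcode `CCC` loses its only contig and therefore no longer counts as a remaining barcode.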

Clonotype identification

Clonotyping a contig consists of identifying which V, D, J and C segments from the reference data are used, and extracting the CDR3 region found between the conserved amino acids.

Segments are identified by mapping the contigs against the references provided in "Reference segments".

Depending on how much of a segment the contig covers and how diverse the reference segments are in that region, it might not be possible to identify the segment unambiguously. In this case, all possible segments are reported.
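The effect of partial coverage on ambiguity can be illustrated with a toy scorer. This sketch is not the tool's mapper. It swaps in a deliberately simple ungapped match count, and the names `ungapped_best_matches` and `assign_segment` and the sequences are hypothetical. What it does show is the reporting rule: every reference tied for the best score is returned, so a contig covering only a region shared by several references yields all of them.

```python
def ungapped_best_matches(contig: str, reference: str) -> int:
    """Toy score (stand-in for real read mapping): the largest number of
    identical bases over all ungapped placements of `reference` on `contig`."""
    best = 0
    for shift in range(-len(reference) + 1, len(contig)):
        matches = 0
        for i, base in enumerate(reference):
            j = shift + i
            if 0 <= j < len(contig) and contig[j] == base:
                matches += 1
        best = max(best, matches)
    return best


def assign_segment(contig: str, references: dict[str, str]) -> frozenset[str]:
    """Return every reference tied for the best score (empty if nothing matches)."""
    scores = {name: ungapped_best_matches(contig, seq)
              for name, seq in references.items()}
    top = max(scores.values(), default=0)
    if top == 0:
        return frozenset()
    return frozenset(name for name, score in scores.items() if score == top)


refs = {"V1": "ACGTACGTTT", "V2": "ACGTACGTGG"}  # hypothetical references
print(sorted(assign_segment("NNNACGTACGT", refs)))  # covers only the shared part
print(sorted(assign_segment("ACGTACGTTT", refs)))   # covers the distinguishing tail
```

The first contig stops before the last two bases, which are the only place where `V1` and `V2` differ, so both segments are reported. The second contig resolves the ambiguity.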

The V and J segments are required to clonotype a contig successfully, because the CDR3 cannot be determined without them.
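The following sketch shows why both V and J are needed: the CDR3 is delimited by one conserved residue on each side. This passage does not name the conserved amino acids, so the sketch assumes the commonly used convention of a cysteine near the end of the V segment and the Phe/Trp of a J-region `[FW]G.G` motif. The function name `extract_cdr3` and the `include_anchors` flag are hypothetical, and whether the two anchor residues belong to the reported CDR3 is also an assumption. A real implementation would locate the anchors using the mapping coordinates rather than by searching the whole translated contig.

```python
import re
from typing import Optional

# Assumed conserved J-region motif: Phe or Trp, Gly, any residue, Gly.
J_MOTIF = re.compile(r"[FW]G.G")


def extract_cdr3(aa: str, v_hits: frozenset[str], j_hits: frozenset[str],
                 include_anchors: bool = True) -> Optional[str]:
    """Return the CDR3 of a translated contig, or None if it cannot be clonotyped."""
    if not v_hits or not j_hits:
        return None  # V and J are both required: they supply the CDR3 boundaries
    motif = J_MOTIF.search(aa)  # heuristic: first occurrence of the J motif
    if motif is None:
        return None
    j_anchor = motif.start()                # conserved Phe/Trp
    v_anchor = aa.rfind("C", 0, j_anchor)   # assumed V anchor: last Cys upstream
    if v_anchor == -1:
        return None
    if include_anchors:
        return aa[v_anchor:j_anchor + 1]
    return aa[v_anchor + 1:j_anchor]


aa = "QVQLYYCAKDRGYFDYWGQGTLVTVSS"  # hypothetical translated contig
print(extract_cdr3(aa, frozenset({"V1"}), frozenset({"J1"})))
print(extract_cdr3(aa, frozenset({"V1"}), frozenset()))  # no J segment assigned
```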

The D and C segments are optional. Note, however, that whether these two segment types are identified can determine whether the tool reports two clonotypes as the same or as different:

  • If two cells have the same assigned V and J segments and share the CDR3 sequence, they would typically be considered to have the same clonotype. However, if the C segment is identified for one cell but the contigs for the other cell do not cover the C segment, the two clonotypes are reported separately.
  • If two contigs for the same cell have the same assigned V and J segments and nearly identical CDR3 sequences, they would typically be merged and considered to have the same clonotype (see below). However, because the CDR3 sequences are not identical, a D segment might be assigned to one contig but not to the other, in which case the two clonotypes are considered distinct.
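The first case above can be reproduced by treating a clonotype as a key in which every field, including the optional D and C assignments, takes part in equality. This is a minimal sketch under that assumption. The `Clonotype` class, the segment names `V1`, `J1`, `C1` and the CDR3 `CASSLGQYF` are hypothetical, and the sketch does not implement the merging step that follows.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Clonotype:
    """Hypothetical clonotype key: every field takes part in equality and hashing."""
    v: frozenset[str]
    j: frozenset[str]
    cdr3: str
    d: frozenset[str] = frozenset()  # empty: no D segment identified
    c: frozenset[str] = frozenset()  # empty: the contigs did not cover the C segment


# Same V, J and CDR3 for all three cells; only cell_2 lacks a C assignment.
cell_1 = Clonotype(v=frozenset({"V1"}), j=frozenset({"J1"}), cdr3="CASSLGQYF",
                   c=frozenset({"C1"}))
cell_2 = Clonotype(v=frozenset({"V1"}), j=frozenset({"J1"}), cdr3="CASSLGQYF")
cell_3 = Clonotype(v=frozenset({"V1"}), j=frozenset({"J1"}), cdr3="CASSLGQYF",
                   c=frozenset({"C1"}))

counts = Counter([cell_1, cell_2, cell_3])
for clonotype, n_cells in counts.items():
    print(sorted(clonotype.c), n_cells)
```

Cells 1 and 3 are counted as one clonotype, while cell 2 is reported separately only because its C segment was not covered. A missing D assignment would split clonotypes in the same way.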

After the initial clonotyping of the contigs, merging of clonotypes identified for the same barcode is performed as follows: