The clonotype identification algorithm

The algorithm for identifying the clonotypes is composed of three sequential steps described below.

Assembly

All reads originating from the same barcode are collected and:

The assembly summary reports:

Trimming the C gene segments

The C gene segments need trimming prior to clonotype identification. The contigs are therefore trimmed with the following settings:

The trimming summary reports the average length of the contigs before and after trimming, and how many barcodes and contigs remain after trimming.
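As an illustration of the quantities in the trimming summary, the following minimal Python sketch computes them from two collections of contigs. The function name `trimming_summary` and the data layout (a dictionary mapping a contig id to a barcode and sequence pair) are assumptions made for this example and are not the tool's internal representation.

```python
from statistics import mean


def trimming_summary(before, after):
    """Summarize the effect of trimming (hypothetical helper).

    `before` and `after` map a contig id to a (barcode, sequence) pair.
    Contigs discarded during trimming are simply absent from `after`.
    """
    return {
        "mean_length_before": mean(len(seq) for _, seq in before.values()),
        "mean_length_after": mean(len(seq) for _, seq in after.values()) if after else 0,
        "contigs_remaining": len(after),
        # A barcode remains if at least one of its contigs survived trimming.
        "barcodes_remaining": len({barcode for barcode, _ in after.values()}),
    }


before = {
    "contig1": ("AAAC", "ACGTACGTAC"),
    "contig2": ("AAAC", "ACGTAC"),
    "contig3": ("GGGT", "ACGT"),
}
after = {
    "contig1": ("AAAC", "ACGTAC"),
    "contig2": ("AAAC", "ACGT"),
}
summary = trimming_summary(before, after)
```

In this toy input, the only contig of barcode `GGGT` is discarded, so both the contig count and the barcode count drop.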

Clonotype identification

T-cell receptors come in two varieties, $\alpha$ + $\beta$ or $\delta$ + $\gamma$, with the $\alpha$ + $\beta$ type being by far the most abundant. Each chain is encoded by a gene that undergoes somatic recombination, in which gene segments are joined together with random nucleotides added at the junction sites. $\alpha$ and $\gamma$ chains result from the recombination of V and J gene segments, while $\beta$ and $\delta$ chains result from the recombination of V, D and J gene segments.

The V gene segment contains a conserved cysteine marking the beginning of the CDR3 region, and the J gene segment contains a conserved phenylalanine marking the end of the CDR3 region. The CDR3 region is highly variable.

Clonotyping a contig consists of identifying which V and J gene segments are used and extracting the CDR3 region. The D gene segments are not identified here.
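The CDR3 extraction can be sketched as follows, assuming the contig has already been translated in the correct reading frame and that the positions of the two conserved residues within the contig are known from the V and J mappings. The function `extract_cdr3` and its parameters are hypothetical, and including both anchor residues in the returned region is a convention chosen for this example, not necessarily the one used by the tool.

```python
def extract_cdr3(protein, cys_index, phe_index):
    """Return the CDR3 region of a translated contig, or None.

    `protein` is the amino-acid sequence of the contig.
    `cys_index` is the 0-based position of the conserved cysteine (from the V mapping).
    `phe_index` is the 0-based position of the conserved phenylalanine (from the J mapping).
    Both anchors are included in the result (an assumption of this sketch).
    """
    # The anchors must lie inside the contig, with the cysteine first.
    if not (0 <= cys_index < phe_index < len(protein)):
        return None
    # Reject the contig if the expected conserved residues are not present.
    if protein[cys_index] != "C" or protein[phe_index] != "F":
        return None
    return protein[cys_index:phe_index + 1]


# Toy sequence: cysteine at position 2, phenylalanine at position 12.
cdr3 = extract_cdr3("YLCASSLGTEAFFGQG", 2, 12)
```

Returning `None` when an anchor is missing or out of range keeps contigs that do not span the whole CDR3 region from producing a truncated result.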

The identification of V and J gene segments is done by mapping the contigs against references containing all V or all J gene segments, provided in "Reference segments".

Depending on the length of the gene segment that is covered by the contig and the diversity of the gene segment, it might not be possible to unambiguously detect the gene segment. In this case, all possible gene segments are reported.
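One way to picture the reporting of ambiguous gene segments is the sketch below, which keeps every candidate segment that shares the best mapping score. The scoring scheme and the tie rule are assumptions made for illustration; the actual criteria used by the tool are not described here.

```python
def candidate_segments(scores):
    """Return all gene segments tied for the best score (hypothetical helper).

    `scores` maps a gene segment name to its mapping score for one contig.
    If several segments cannot be told apart, all of them are returned.
    """
    if not scores:
        return []
    best = max(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


# Illustrative scores: two V segments match the covered region equally well.
v_candidates = candidate_segments({"TRBV6-2": 95, "TRBV6-3": 95, "TRBV7-9": 60})
```

Here two segments are indistinguishable over the covered part of the contig, so both are reported rather than one being picked arbitrarily.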

After the initial clonotyping of the contigs, merging of clonotypes is performed as follows: