Immune Repertoire Analysis

Using RNA sequencing data as input, the Immune Repertoire Analysis tool can be used to characterize the T cell receptor repertoire.

To run the tool, T cell receptor reference data for V- and J genes and a trim adapter list containing the constant (C) regions for all chain types are needed. This data is available in the Reference Data Library provided by QIAGEN.

Due to licensing considerations, the reference data required to run this tool could not be made available at the time of the Biomedical Genomics Analysis 20.1 release. To check if the reference data is available when you plan to run this workflow, please open the Reference Data Manager by clicking on the References button in the top toolbar. Click on the QIAGEN Sets tab and look for these reference sets:

  • QIAseq Immune Repertoire Analysis IMGT Reference Sequences for analysis of human data
  • QIAseq Immune Repertoire Analysis Mouse IMGT Reference Sequences for analysis of mouse data
If these are available, this tool can be run. If they are not, please register your interest in this area by emailing our Support team at [email protected].

To download the reference data, open the Reference Data Manager by clicking on the References button, located on the right hand side of the Workbench toolbar.

Click on the "QIAGEN Sets" tab in the Reference Data Manager. For analysis of human data, select the Reference Data Set called "QIAseq Immune Repertoire Analysis IMGT Reference Sequences". For analysis of mouse data, select the Reference Data Set called "QIAseq Immune Repertoire Analysis Mouse IMGT Reference Sequences".

See section http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=QIAGEN_Sets.html for further information about downloading QIAGEN reference data sets using the Reference Data Manager.

Note: Before analyzing RNA sequencing data, the constant region must be trimmed off the reads using the Trim Reads tool and the relevant trim adapter list. Two trim adapter lists containing sequences for the constant regions of the T cell receptor are provided: one for human data (homo_sapiens_tcr_constant_trimming) and one for mouse data (mus_musculus_tcr_constant_trimming). Details about the Trim Reads tool can be found at http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Trim_Reads.html.

To run Immune Repertoire Analysis go to the Toolbox and select:

        Tools | QIAseq Panel Expert Tools | QIAseq Immune Repertoire Expert Tools | Immune Repertoire Analysis

This opens a dialog where the RNA sequencing reads to analyze can be selected. These reads must already have been trimmed to remove the constant (C) region from the reads, as described above.

When the trimmed RNA reads have been selected, click Next to get to the step shown in figure 7.2, where most of the configuration is done.

Image ImmuneRepertoireAnalysisToolWizard

Figure 7.2: This dialog allows selection of TCR reference sequences as well as adjustment of the clustering and mapping options.

The options available are:

V and J segments

Click on the button labeled Next to run the analysis with default settings. Alternatively, the following options are available for adjustments:

Clustering options

Mapping options


Output from the Immune Repertoire Analysis tool

Two different outputs are produced from the Immune Repertoire Analysis tool:

  1. TCR analysis report A report that summarizes statistics of the detected T cell repertoire.

  2. Clonotypes A table presenting chain type, V- and J gene segment, CDR3 nucleotide sequence and length, CDR3 amino acid sequence, count and a column indicating whether the detected sequence is productive or not.
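
The productive column can be illustrated with a small Python sketch. It is not the tool's implementation: it assumes that a CDR3 nucleotide sequence is out-of-frame when its length is not a multiple of three, and contains a premature stop codon when an in-frame codon is TAA, TAG or TGA. The function names `classify_cdr3` and `productive_summary` are hypothetical.

```python
from collections import Counter

# Standard DNA stop codons (assumption: the standard genetic code applies).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cdr3(nt_seq):
    """Classify a CDR3 nucleotide sequence under the assumptions above."""
    seq = nt_seq.upper()
    # A length that is not a multiple of 3 cannot be read in frame.
    if len(seq) % 3 != 0:
        return "out-of-frame"
    # Split the in-frame sequence into codons and look for a stop codon.
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "premature stop codon"
    return "productive"


def productive_summary(cdr3_sequences):
    """Return the percentage of sequences in each category."""
    labels = Counter(classify_cdr3(seq) for seq in cdr3_sequences)
    total = sum(labels.values())
    return {label: 100.0 * count / total for label, count in labels.items()}
```

For example, `classify_cdr3("TGTGCCAGCAGT")` falls in the productive category under these assumptions, because the sequence has 12 nucleotides and none of its four codons is a stop codon.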

TCR analysis report

The TCR analysis report includes the following information:

  1. Summary A summary table showing the total number of input reads along with information about how many reads were successfully clonotyped, which chain type the reads belong to, and the number of unique clonotypes before and after merging.

  2. Diversity indices For each chain type there is a table containing diversity indices. Because some rare clonotypes are likely to be missing from the sequencing data, the extrapolated diversity indices give a projection of how many additional clonotypes there are and what the diversity would have been if the sample had been sequenced deeply enough to represent all clonotypes.

    • Total number: The number of different clonotypes detected.

    • Extrapolated diversity (chaoE): The extrapolated number of detected clonotypes by the method described in [].

    • Lorenz curve at 50% of total: The percent of clonotypes that account for 50% of the total read count. Also sometimes denoted as D50.

    • Inverse Simpson's index: Let $ c_i$ denote the read count for the $ i$th clonotype and let $ n = \sum_i c_i$. Then the inverse Simpson's index is defined as:

      $\displaystyle \frac{1}{\sum_i \left(c_i / n\right)^2}$

    • Extrapolated Inverse Simpson's index (chaoE): The extrapolated inverse Simpson's index by the method described in [].

    • Shannon-Wiener index: With $ c_i$ and $ n$ defined as above, the Shannon-Wiener index is defined as:

      $\displaystyle -\sum_i \frac{c_i}{n} \log \left(\frac{c_i}{n}\right)$

      Note that the logarithm is the natural logarithm. To convert to base 2 logarithm, the index can be multiplied by $ \log_2(e) \approx 1.443$.

    • Extrapolated Shannon-Wiener index (chaoE): The extrapolated Shannon-Wiener index by the method described in [].

  3. CDR3 length A plot for each chain type shows the length distribution of the CDR3 nucleotide sequences. Peaks are expected every 3 nt because repertoires consist predominantly of in-frame CDR3 sequences, see figure 7.3.

    Image immune_cdr3_length
    Figure 7.3: CDR3 length distribution plot for the $ \alpha$-chain with peaks every 3 nt.

  4. V and J usage Histograms for each chain and segment type showing the frequency of each of the detected V- and J segments, respectively. Double-clicking the plot opens it in a new window. A table view can be selected from the bottom pane, showing counts for all segments, see figure 7.4.

    Image immune_v_segment_usage_table
    Figure 7.4: A full table showing segment usage can be obtained by double clicking the segment usage plot and selecting table view.

  5. Cumulative frequencies of clonotypes A plot for each chain type shows the cumulative frequencies of the identified clonotypes ordered by descending read count. If the curve is steep in the beginning and then flattens, this indicates that a few clonotypes account for most of the reads. If the curve is instead closer to linear, this indicates a more even distribution of reads among the clonotypes.

  6. Productive summary The percentage distribution for each chain type of CDR3 nucleotide sequences that are productive, out-of-frame or contain a premature stop codon.
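
The non-extrapolated indices and the cumulative frequency curve can be reproduced from a list of per-clonotype read counts. The Python sketch below is illustrative only and is not the tool's code. It uses the standard definitions of the inverse Simpson's index and the Shannon-Wiener index (natural logarithm), and it assumes that the Lorenz curve value (D50) is the smallest share of clonotypes, taken in descending order of read count, whose counts reach at least half of the total. All function names are hypothetical, and the extrapolated (chaoE) values are not covered.

```python
import math


def _positive(counts):
    """Keep only clonotypes with at least one read, largest first."""
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    if not ordered:
        raise ValueError("at least one clonotype with a positive count is required")
    return ordered


def diversity_indices(counts):
    """Total number, inverse Simpson's and Shannon-Wiener indices."""
    ordered = _positive(counts)
    n = sum(ordered)
    freqs = [c / n for c in ordered]
    shannon = -sum(p * math.log(p) for p in freqs)  # natural logarithm
    return {
        "total_number": len(ordered),
        "inverse_simpson": 1.0 / sum(p * p for p in freqs),
        "shannon_wiener": shannon,
        # Conversion to base 2 by multiplying with log2(e).
        "shannon_wiener_bits": shannon * math.log2(math.e),
    }


def d50(counts):
    """Percent of clonotypes needed to reach 50% of the reads (assumed definition)."""
    ordered = _positive(counts)
    half = sum(ordered) / 2
    running = 0
    for k, c in enumerate(ordered, start=1):
        running += c
        if running >= half:
            return 100.0 * k / len(ordered)


def cumulative_frequencies(counts):
    """Cumulative read fractions with clonotypes ordered by descending count."""
    ordered = _positive(counts)
    n = sum(ordered)
    running = 0
    curve = []
    for c in ordered:
        running += c
        curve.append(running / n)
    return curve
```

For four clonotypes with equal read counts, the inverse Simpson's index equals 4 and the cumulative curve is linear, while a sample dominated by one clonotype gives an index close to 1 and a curve that rises steeply and then flattens.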

Clonotypes table

The Clonotypes table includes the following columns:

Note: All plots can be opened in table view by double-clicking on the plot and clicking on the table icon in the lower left corner.