Single Cell RNA-Seq Analysis

Single Cell RNA-Seq Analysis can be found in the Toolbox here:

        Gene Expression | Cell Preparation | Single Cell RNA-Seq Analysis

The tool takes as input one or more sequence lists of reads that have been annotated using Annotate Reads with Cell and UMI. It outputs an Expression Matrix with spliced and unspliced counts for gene expression and, optionally, an Expression Matrix for transcript expression, a report, and unmapped reads.

Sample: All input sequence lists must originate from the same sample, which is set when executing the Annotate Reads with Cell and UMI tool (see Annotate Reads with Cell and UMI). This is because Single Cell RNA-Seq Analysis assumes that reads with the same cell barcode that are present in different inputs represent the same cell. The wizard does not allow executing the tool with inputs that are annotated with different samples.

It is important to provide all the data for a sample to Single Cell RNA-Seq Analysis at the same time. For example, if one sample was sequenced on 4 lanes of an Illumina sequencer, then all 4 lanes should be supplied together. This allows reads originating from the same cell, but coming from different lanes, to be analyzed jointly such that amplification duplicates are detected using UMIs and only give one count in the output Expression Matrix.
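The effect of analyzing lanes jointly can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function `count_unique_umis`, the read tuples, and the use of exact UMI matching are assumptions made for illustration only.

```python
from collections import defaultdict

def count_unique_umis(lanes):
    """Count distinct UMIs per (cell barcode, gene) across all supplied lanes.

    Assumption: two reads are amplification duplicates when they share the
    same cell barcode, gene, and exact UMI sequence.
    """
    seen = defaultdict(set)
    for lane in lanes:
        for cell, umi, gene in lane:
            seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Hypothetical reads as (cell barcode, UMI, gene) tuples, one list per lane.
lane_1 = [("AAAC", "TTGCA", "GeneA"), ("AAAC", "GGATC", "GeneA")]
lane_2 = [("AAAC", "TTGCA", "GeneA"), ("CCGT", "TTGCA", "GeneA")]

# All lanes supplied together: the duplicate UMI in lane 2 is counted once.
joint = count_unique_umis([lane_1, lane_2])

# Lanes processed separately and then summed: the duplicate is counted twice.
separate = (count_unique_umis([lane_1])[("AAAC", "GeneA")]
            + count_unique_umis([lane_2])[("AAAC", "GeneA")])
```

In this sketch, cell AAAC gets 2 counts for GeneA when the lanes are analyzed jointly, but 3 if each lane is counted on its own and the results are added.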

Matrix with spliced and unspliced counts: The Expression Matrix with spliced and unspliced counts is an extension of the Expression Matrix that contains separate information about the spliced and unspliced reads for each cell and gene. Reads mapping to transcripts are counted towards the spliced expression of a gene. Reads mapping to a gene but not to a transcript, for example reads in introns of known transcripts or upstream/downstream of known transcripts, are counted towards the unspliced expression. The Expression Matrix with spliced and unspliced counts can be used as input to any tool that accepts an Expression Matrix.
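The distinction between spliced and unspliced counts can be sketched in Python as follows. This is a simplified illustration, not the tool's actual assignment logic: the function `classify_read`, the half-open coordinates, and the example gene are hypothetical, and reads spanning exon junctions are not handled.

```python
def classify_read(start, end, gene, exons):
    """Classify a read interval [start, end) relative to a single gene.

    gene  -- (start, end) of the gene region from the Gene track
    exons -- list of (start, end) exon intervals from the mRNA track
    Assumption: a read counts as spliced only if it lies entirely in one exon.
    """
    if any(ex_start <= start and end <= ex_end for ex_start, ex_end in exons):
        return "spliced"
    gene_start, gene_end = gene
    if start < gene_end and end > gene_start:
        # Overlaps the gene but not a transcript, e.g. an intron or a region
        # of the gene upstream/downstream of the known transcripts.
        return "unspliced"
    return "unassigned"

# Hypothetical gene spanning 100-1000 with two exons.
gene = (100, 1000)
exons = [(200, 300), (600, 700)]

exonic = classify_read(220, 280, gene, exons)      # inside the first exon
intronic = classify_read(400, 450, gene, exons)    # between the two exons
upstream = classify_read(120, 160, gene, exons)    # in the gene, before exon 1
outside = classify_read(2000, 2050, gene, exons)   # outside the gene
```

Here the exonic read counts towards spliced expression, while the intronic and upstream reads count towards unspliced expression.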

Filtering: The output matrix should be filtered by QC for Single Cell before being used in any other tool in the CLC Single Cell Analysis Module. This is because sequencing errors often lead to many barcodes that have few counts, and which do not represent real cells. If no filtering is performed, the large number of barcodes can cause downstream tools to run extremely slowly and results can be negatively affected by the added noise.
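The reason for filtering can be illustrated with a minimal Python sketch. It does not reproduce the criteria used by QC for Single Cell: the function `filter_barcodes`, the fixed `min_total` threshold, and the example matrix are assumptions for illustration only.

```python
def total_counts(matrix):
    """Sum the counts over all genes for each barcode."""
    return {barcode: sum(genes.values()) for barcode, genes in matrix.items()}

def filter_barcodes(matrix, min_total):
    """Keep only barcodes whose total count is at least min_total.

    Assumption: a single total-count threshold stands in for real QC.
    """
    totals = total_counts(matrix)
    return {bc: genes for bc, genes in matrix.items() if totals[bc] >= min_total}

# Hypothetical unfiltered matrix: barcode -> gene -> count.
matrix = {
    "AAAC": {"GeneA": 120, "GeneB": 45},
    "AAAG": {"GeneA": 1},              # plausibly a sequencing error of AAAC
    "CCGT": {"GeneA": 80, "GeneC": 30},
    "TTTT": {"GeneB": 2},
}

kept = filter_barcodes(matrix, min_total=10)
```

In this sketch, half of the barcodes carry only one or two counts and are removed, leaving the two barcodes that plausibly represent cells.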

Barcode whitelists: In some protocols, the set of valid barcodes is known in advance and is available as a barcode whitelist. In the CLC Single Cell Analysis Module, it is not possible to use such a list directly. Instead, QC for Single Cell is usually able to detect the barcodes that correspond to cells using the Empty droplets filter (see Empty droplets filter), and specific barcodes can be prevented from being filtered away (see Choosing barcodes to retain).

The tool requires a genome, supplied as References, as well as a Gene track and a corresponding mRNA track. These data can be obtained in two ways:

The following additional options are available:


