Single Cell RNA-Seq Analysis

Single Cell RNA-Seq Analysis can be found in the Toolbox here:

        Cell Preparation | Single Cell RNA-Seq Analysis

The tool takes as input one or more lists of reads that have been annotated using Annotate Reads with Cell and UMI. It outputs an Expression Matrix for gene expression and, optionally, an Expression Matrix for transcript expression and a report.

Note: The output Expression Matrix should be filtered by QC for Single Cell before being used in any other tool in the CLC Single Cell Analysis Module. This is because sequencing errors often produce many barcodes that have few counts and do not represent real cells. If no filtering is performed, the large number of barcodes can cause downstream tools to run extremely slowly, and results can be negatively affected by the added noise.
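To illustrate why unfiltered matrices contain many spurious barcodes, the following minimal Python sketch removes barcodes with very few total counts. It is only an illustration under stated assumptions: it uses a fixed count threshold rather than the Empty droplets filter of QC for Single Cell, and the data layout, the function name `keep_barcodes`, and the parameter `min_total_counts` are hypothetical.

```python
def keep_barcodes(matrix, min_total_counts):
    """Keep only barcodes whose summed counts reach a fixed threshold.

    `matrix` maps barcode -> {gene: count}. This is a simple stand-in,
    not the Empty droplets filter used by QC for Single Cell.
    """
    return {
        barcode: genes
        for barcode, genes in matrix.items()
        if sum(genes.values()) >= min_total_counts
    }


# Toy data (invented for illustration): two real cells and two low-count
# barcodes of the kind that sequencing errors can produce.
matrix = {
    "AAAC": {"GeneA": 120, "GeneB": 45},
    "AAAG": {"GeneA": 1},
    "TTTG": {"GeneA": 80, "GeneC": 30},
    "CCGA": {"GeneB": 2},
}

filtered = keep_barcodes(matrix, min_total_counts=10)
print(sorted(filtered))
```

In real data the low-count barcodes typically far outnumber the cells, which is why leaving them in slows downstream tools and adds noise.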

Barcode whitelists: In some protocols, the set of valid barcodes is known in advance and is available as a barcode whitelist. In the CLC Single Cell Analysis Module, it is not possible to use such a list directly. Instead, QC for Single Cell can usually detect the barcodes that correspond to cells using the Empty droplets filter, and it can prevent specific barcodes from being filtered away (see Choosing barcodes to retain).

It is important to provide all the data for a sample to the tool at the same time. For example, if one sample was sequenced on 4 lanes of an Illumina sequencer, then all 4 lanes should be supplied together. This allows reads originating from the same cell with the same UMI, but coming from different lanes, to be detected as amplification duplicates, such that they only give one count in the output Expression Matrix.
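The effect of supplying all lanes together can be illustrated with a minimal Python sketch. It is not the tool's implementation: it assumes each read has already been reduced to a hypothetical (cell barcode, UMI, gene) tuple, and it treats reads as duplicates only when all three values match exactly. The function name `count_unique_umis` and the toy data are invented for illustration.

```python
from collections import defaultdict


def count_unique_umis(lanes):
    """Count distinct UMIs per (cell, gene) across all supplied lanes.

    `lanes` is an iterable of lanes; each lane is an iterable of
    (cell_barcode, umi, gene) tuples. Reads that share all three values
    are treated as amplification duplicates and counted once.
    """
    seen = set()
    counts = defaultdict(int)
    for lane in lanes:
        for cell, umi, gene in lane:
            key = (cell, umi, gene)
            if key in seen:
                continue  # duplicate of a read already counted
            seen.add(key)
            counts[(cell, gene)] += 1
    return dict(counts)


# Toy data: the same molecule (cell AAAC, UMI1, GeneA) was read on both lanes.
lane1 = [("AAAC", "UMI1", "GeneA"), ("AAAC", "UMI2", "GeneA")]
lane2 = [("AAAC", "UMI1", "GeneA"), ("TTTG", "UMI1", "GeneA")]

# All lanes supplied together: the cross-lane duplicate is counted once.
together = count_unique_umis([lane1, lane2])

# Lanes processed separately and summed: the duplicate is counted twice.
separate = defaultdict(int)
for lane in (lane1, lane2):
    for key, n in count_unique_umis([lane]).items():
        separate[key] += n

print(together[("AAAC", "GeneA")])  # 2
print(separate[("AAAC", "GeneA")])  # 3
```

Processing the lanes separately inflates the count for cell AAAC because the duplicate on the second lane cannot be matched against the first.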

The tool requires a genome, supplied as References, and both a Gene track and a corresponding mRNA track. These data can be obtained in two ways:

The following additional options are available:


