Perform Single Cell Analysis from Expression Matrix

The workflow Perform Single Cell Analysis from Expression Matrix takes one or more Expression Matrix as input and performs quality control, normalization, clustering and cell type prediction. The workflow uses iterate functionality and allows for a combined analysis of multiple samples to produce a single Dimensionality Reduction Plot associated with the automatic clusters, predicted cell types and additional cell annotations, a Heat Map and a Dot Plot with the predicted cell types as cluster information.

The workflow can be found in the toolbox here:

        Workflows | Perform Single Cell Analysis from Expression Matrix

If you are connected to a CLC Server via your Workbench, you will be asked where you would like to run the analysis. We recommend that you run the analysis on a CLC Server when possible.

Choose either one or more Expression Matrix or Select files for import and select the expression matrix format that is compatible with the selected input. Read more about the import options in Data import.

The workflow offers a number of options described below. Note that not all parameters can be configured. Open parameters indicate places where customization may be necessary for different samples, but default settings are suitable in most cases.

The workflow can be run using Single Cell hg38 (Ensembl) or Single Cell Mouse (Ensembl) reference data (see The Reference Data Manager). It is also possible to create a custom reference data set, for instance to analyze other species or to use other gene and mRNA annotations; see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Custom_Sets.html. A custom Cell Type Classifier can also be used. However, the gene annotations used for training the classifier should match those in the reference data set; see Features used for training and prediction.

The workflow allows the analysis of multiple samples, and you can specify metadata during workflow execution. The metadata is converted to cell annotations and can be used for coloring the cells in the Dimensionality Reduction Plot. However, the workflow expects each sample to be present in just one matrix, and attempting to define batch units containing more than one matrix will lead to a failure during execution. For more details on configuring workflow execution with metadata, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Batching_part_workflow.html. Make sure to inspect the batch overview to check that the analysis will be performed correctly.
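
The one-matrix-per-batch-unit rule can be checked before launching the workflow. The following is a minimal Python sketch, not part of the Workbench: the function name find_invalid_batch_units and the matrix and sample names are hypothetical, and the mapping is assumed to have been prepared by hand from your own metadata table.

```python
from collections import defaultdict


def find_invalid_batch_units(matrix_to_sample):
    """Return the batch units (sample names) that map to more than one matrix.

    matrix_to_sample: dict mapping a matrix name to its sample name.
    This is a hypothetical helper, not a Workbench API.
    """
    units = defaultdict(list)
    for matrix_name, sample in matrix_to_sample.items():
        units[sample].append(matrix_name)
    # Keep only the units that contain more than one matrix.
    return {s: sorted(m) for s, m in units.items() if len(m) > 1}


# Hypothetical metadata: sample_2 is split across two matrices.
metadata = {
    "matrix_A": "sample_1",
    "matrix_B": "sample_2",
    "matrix_C": "sample_2",
}
invalid = find_invalid_batch_units(metadata)
```

An empty result means every batch unit contains exactly one matrix. Any entry in the result identifies a sample whose matrices would need to be merged or assigned distinct sample names before running the workflow.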

Several options exist for quality control. The option to remove empty droplets is not suitable for protocols that do not use droplets; for such protocols, removing barcodes with a low number of reads or expressed features might be more appropriate. Quality Control (QC) uses the number of reads mapped to the mitochondrial chromosome, so the name of this chromosome needs to be provided. The default value is often the correct name. After quality control, the matrices are collected and normalized jointly. Note that batch correction is not performed. Read more about QC and normalization in Cell preparation.
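
To illustrate the kind of filtering described above, here is a minimal NumPy sketch. It is not the Workbench implementation: the function name qc_filter, the thresholds, and the chromosome name "MT" are all assumptions chosen for the example, and the actual QC tool may use different criteria.

```python
import numpy as np


def qc_filter(counts, gene_chromosomes, mito_name="MT",
              min_reads=5, min_features=2, max_mito_fraction=0.5):
    """Return a boolean mask of the cells (rows) that pass simple QC.

    counts: cells x genes matrix of read counts.
    gene_chromosomes: chromosome name for each gene (column).
    All thresholds are hypothetical illustration values.
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.asarray(gene_chromosomes) == mito_name

    total_reads = counts.sum(axis=1)
    expressed_features = (counts > 0).sum(axis=1)
    mito_reads = counts[:, is_mito].sum(axis=1)
    # Avoid division by zero for barcodes without any reads.
    mito_fraction = np.divide(mito_reads, total_reads,
                              out=np.zeros_like(total_reads),
                              where=total_reads > 0)

    return ((total_reads >= min_reads)
            & (expressed_features >= min_features)
            & (mito_fraction <= max_mito_fraction))


# Three cells, four genes; the last gene lies on the mitochondrial chromosome.
counts = np.array([[10, 0, 5, 5],
                   [1, 0, 0, 0],
                   [2, 2, 2, 14]])
chromosomes = ["1", "2", "X", "MT"]
mask = qc_filter(counts, chromosomes)
```

In this toy input the second cell is dropped for having too few reads and the third for a high mitochondrial fraction. If the mitochondrial chromosome name does not match the annotation, no genes are flagged and the mitochondrial filter silently has no effect, which is why the name must be checked.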

For clustering and creation of the Dimensionality Reduction Plot, it is possible to restrict the analysis to highly variable genes. The data is then projected to a lower dimensional space using PCA. You can read about this feature in Feature selection and PCA.
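
The two steps can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the Workbench algorithm: genes are ranked by plain variance (real tools typically model the mean-variance trend), and the function name select_hvg_and_pca and its parameters are hypothetical.

```python
import numpy as np


def select_hvg_and_pca(expr, n_top_genes=5, n_components=2):
    """Keep the most variable genes, then project the cells with PCA.

    expr: cells x genes matrix of normalized expression values.
    Returns (indices of the selected genes, cells x n_components coordinates).
    """
    expr = np.asarray(expr, dtype=float)

    # Rank genes by variance across cells and keep the top ones.
    variances = expr.var(axis=0)
    top = np.sort(np.argsort(variances)[::-1][:n_top_genes])

    # PCA via SVD of the centered, gene-restricted matrix.
    centered = expr[:, top] - expr[:, top].mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    return top, coords


# Synthetic data: 50 cells, 20 genes, two of which are made highly variable.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 20))
expr[:, 3] *= 10
expr[:, 7] *= 5
top, coords = select_hvg_and_pca(expr)
```

Restricting to highly variable genes before PCA reduces the influence of genes that mostly contribute noise, so the leading components are more likely to reflect differences between cell populations.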

The expression plots (Heat Map and Dot Plot) group the cells based on the predicted cell types, but the grouping can be changed to use the automatic clusters by changing the text to e.g. "Leiden (resolution=1.0)". All resolutions from 0.1 to 1.5 are produced.
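
As a convenience, the candidate grouping names can be listed programmatically. The sketch below assumes that resolutions are spaced in steps of 0.1 and that every name follows the "Leiden (resolution=1.0)" pattern quoted above; neither the step size nor the helper leiden_labels comes from the Workbench, so verify the names against the cell annotations actually present in your output.

```python
def leiden_labels(start=0.1, stop=1.5, step=0.1):
    """Build label strings for each resolution from start to stop.

    The step of 0.1 is an assumption made for this example.
    """
    n = round((stop - start) / step) + 1
    return [f"Leiden (resolution={start + i * step:.1f})" for i in range(n)]


labels = leiden_labels()
```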


