Cell format in importers
All importers in the CLC Single Cell Analysis Module import information about cells. Cells are identified by a combination of their barcode, e.g. "AAGCT", and their sample name.
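A cell is the combination of a sample name and a barcode, so the same barcode in two samples refers to two different cells. The following minimal Python sketch illustrates this. It is not the module's data model: the `CellId` type, the `shared_cells` helper and the sample names are hypothetical. It shows that cells from two imports only line up when both parts agree.

```python
from __future__ import annotations

from typing import NamedTuple


class CellId(NamedTuple):
    """Hypothetical key for a cell: its sample name together with its barcode."""
    sample: str
    barcode: str


def shared_cells(first: set[CellId], second: set[CellId]) -> set[CellId]:
    """Cells present in both imports; sample and barcode must both match."""
    return first & second


# Made-up example data: the barcode CCGTA appears under two different sample names.
expression = {CellId("demo", "AAGCT"), CellId("demo", "CCGTA")}
peaks = {CellId("demo", "AAGCT"), CellId("demo-atac", "CCGTA")}
print(shared_cells(expression, peaks))  # {CellId(sample='demo', barcode='AAGCT')}
```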
Importers share the following common options:
- Cell format. This option allows the barcode and the sample name to be extracted separately from the cell name. By default, the entire cell name is used as the barcode. For matrix importers, the sample name is then set to the name of the imported file. For the other importers, the sample name must be provided through one of several options.
The cell format is specified using a mixture of keywords (see figure 2.8) and literal text. See table 2.1 and figure 2.9 for examples.
Figure 2.8: Keywords that can be used to specify how to extract the barcode and sample name for a cell.
Table 2.1: Examples of cell formats and the resulting samples and barcodes.
| Cell format                      | Name of the cell | Sample     | Barcode  |
|----------------------------------|------------------|------------|----------|
| {barcode}-1                      | AAGCT-1          | *          | AAGCT    |
| {sample}-{barcode}               | demo-AAGCT       | demo       | AAGCT    |
| {sample}-{barcode}               | de-mo-AAGCT      | de         | mo-AAGCT |
| {sample}-{barcode:trailing}      | de-mo-AAGCT      | de-mo      | AAGCT    |
| {barcode:1}-{sample}-{barcode:2} | AA-demo-GCT      | demo       | AAGCT    |
| {barcode}-{sampleSuffix}         | AAGCT-1          | demo-1 **  | AAGCT    |
* The sample is obtained from the name of the imported file (for matrices) or from the other sample options.
** This example assumes a matrix file named "demo.h5".
Figure 2.9: The top panel shows the result of importing a matrix file with Cell format = {barcode}. After import, the sample name is the name of the imported file, and the barcode is the entire cell name. In the bottom panel, Cell format = SRX41800{sample}_filter.{barcode}. Here, both the sample name and the barcode are extracted from the cell name, and the other parts of the name are discarded.
- Sample (Optional). This can be used to specify a custom sample name. It should only be used when the file contains a single sample. It overrides the default sample name.
This is relevant, for example, when jointly analyzing an imported Expression Matrix and an imported Peak Count Matrix, because the corresponding cells in both must have the same sample name.
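The extraction rules illustrated in table 2.1 can be approximated with regular expressions. The following is a minimal Python sketch under stated assumptions, not the module's implementation. The helper names (`compile_cell_format`, `sample_from_file`, `parse_cell`) are hypothetical, and the keyword semantics are inferred from the table alone: fields match as little text as possible; a `:trailing` barcode makes the other fields match as much as possible instead; numbered `{barcode:N}` pieces are concatenated in order; and `{sampleSuffix}` is appended to the default sample with a hyphen. The `default_sample` parameter stands for either the file-based sample name or the value of the Sample option.

```python
from __future__ import annotations

import re
from pathlib import Path

# Keywords handled by this sketch; their semantics are inferred from table 2.1.
KEYWORD = re.compile(r"\{(sampleSuffix|barcode|sample)(?::(\w+))?\}")


def compile_cell_format(fmt: str) -> re.Pattern:
    """Translate a cell format such as "{sample}-{barcode}" into a regex."""
    # Assumption: fields match as little text as possible, unless the format
    # contains a ":trailing" field; then the other fields match as much as possible.
    field = ".+" if ":trailing}" in fmt else ".+?"
    pattern, pos = "", 0
    for m in KEYWORD.finditer(fmt):
        pattern += re.escape(fmt[pos:m.start()])  # literal text between keywords
        name, modifier = m.groups()
        if name == "barcode" and modifier == "trailing":
            pattern += "(?P<barcode>.+?)"  # shortest possible trailing barcode
        elif name == "barcode" and modifier and modifier.isdigit():
            pattern += f"(?P<barcode_{modifier}>{field})"  # one piece of a split barcode
        else:
            pattern += f"(?P<{name}>{field})"
        pos = m.end()
    return re.compile(pattern + re.escape(fmt[pos:]))


def sample_from_file(path: str) -> str:
    """Assumed default sample for matrix importers: the file name without extension."""
    return Path(path).stem


def parse_cell(name: str, fmt: str, default_sample: str) -> tuple[str, str]:
    """Split a cell name into (sample, barcode); raise ValueError on a mismatch."""
    m = compile_cell_format(fmt).fullmatch(name)
    if m is None:
        raise ValueError(f"cell name {name!r} does not match cell format {fmt!r}")
    fields = m.groupdict()
    # Join numbered barcode pieces, e.g. {barcode:1} and {barcode:2}, in order.
    pieces = sorted((int(k.split("_")[1]), v)
                    for k, v in fields.items() if k.startswith("barcode_"))
    barcode = fields.get("barcode") or "".join(v for _, v in pieces)
    if not barcode:
        raise ValueError(f"cell format {fmt!r} contains no barcode keyword")
    sample = fields.get("sample") or default_sample
    if fields.get("sampleSuffix"):
        # Assumption: the suffix is appended to the default sample with a hyphen.
        sample = f"{default_sample}-{fields['sampleSuffix']}"
    return sample, barcode


default = sample_from_file("demo.h5")
print(parse_cell("de-mo-AAGCT", "{sample}-{barcode:trailing}", default))  # ('de-mo', 'AAGCT')
print(parse_cell("AAGCT-1", "{barcode}-{sampleSuffix}", default))  # ('demo-1', 'AAGCT')
```

Translating the format into one anchored regular expression makes a non-matching cell name fail as a whole, which corresponds to the per-cell errors shown in the preview.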
Importers contain a Preview cells section that shows how each cell name is parsed into a sample and a barcode, as shown in figure 2.10.
Figure 2.10: Previewing how the cell name (input barcode) resolves to sample and barcode.
This helps ensure that the provided cell format matches the input. Figure 2.11 shows an example where the sample and barcode have clearly been swapped. In figure 2.12, the sample and barcode cannot be identified for one of the cells, because its cell name does not match the cell format.
Figure 2.11: Preview where the sample and barcode have been swapped.
Figure 2.12: Preview where a cell does not match the pattern. The tooltip contains a detailed error message.
If the configuration in the wizard is invalid, the preview may be unable to determine the sample, the barcode, or both for every cell, as shown in figure 2.13.
Figure 2.13: Preview where the sample cannot be determined. The tooltip indicates why, and the message typically matches a validation error from the wizard. Here, the sample is specified in two ways. The barcode can still be determined.
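The preview behaviour described in figures 2.11 to 2.13 can be sketched in Python as follows. This is an illustration under stated assumptions, not the module's code. The `PreviewRow` type, the helper functions and the tooltip texts are hypothetical. Only the `{sample}` and `{barcode}` keywords are handled, each matching as little text as possible. A name that does not fit the format yields neither part. A sample given both in the cell format and through the Sample option is treated as a configuration error that hides the sample while the barcode still resolves.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreviewRow:
    cell_name: str
    sample: Optional[str]   # None when the sample cannot be determined
    barcode: Optional[str]  # None when the barcode cannot be determined
    tooltip: str = ""       # explanation shown when something is missing


def to_regex(fmt: str) -> re.Pattern:
    """Handle only {sample} and {barcode}; each matches as little text as possible."""
    pattern, pos = "", 0
    for m in re.finditer(r"\{(sample|barcode)\}", fmt):
        pattern += re.escape(fmt[pos:m.start()]) + f"(?P<{m.group(1)}>.+?)"
        pos = m.end()
    return re.compile(pattern + re.escape(fmt[pos:]))


def config_error(fmt: str, sample_option: Optional[str], file_sample: Optional[str]) -> str:
    """Return a validation message for the sample settings, or "" if they are valid."""
    in_format = "{sample}" in fmt
    if sample_option and in_format:
        return "The sample is specified both in the cell format and in the Sample option."
    if not (sample_option or in_format or file_sample):
        return "No sample is specified."
    return ""


def preview_cells(names: list[str], fmt: str, sample_option: Optional[str] = None,
                  file_sample: Optional[str] = None) -> list[PreviewRow]:
    """Resolve each cell name to a sample and barcode, as a preview table would."""
    error = config_error(fmt, sample_option, file_sample)
    pattern = to_regex(fmt)
    rows = []
    for name in names:
        m = pattern.fullmatch(name)
        if m is None:
            # Neither part can be identified when the name does not fit the format.
            rows.append(PreviewRow(name, None, None,
                                   f"'{name}' does not match the cell format '{fmt}'."))
            continue
        barcode = m.groupdict().get("barcode")
        if error:
            # A configuration error hides the sample, but the barcode still resolves.
            rows.append(PreviewRow(name, None, barcode, error))
        else:
            # Use the Sample option, else {sample} from the name, else the file name.
            sample = sample_option or m.groupdict().get("sample") or file_sample
            rows.append(PreviewRow(name, sample, barcode))
    return rows


# Swapped keywords (compare figure 2.11) and a non-matching name (compare figure 2.12).
for row in preview_cells(["demo-AAGCT", "AAGCT"], "{barcode}-{sample}"):
    print(row)
```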
The preview can be disabled if it is not needed. This is useful for large input files, for which generating the preview may take some time.