Other formats
MEX importer
This importer requires the following files to be supplied:
- Barcodes file. A file with the extension .tsv, conventionally barcodes.tsv. It contains tab-separated columns, and has one row per barcode. It can optionally contain a header. The barcodes are read from the first column. Empty lines are ignored.
Use the Cell format option to control how the barcodes are interpreted, for example if they also include information about the sample.
- Feature file. A file with the extension .tsv, conventionally features.tsv or genes.tsv. It contains one row per feature. It can optionally contain a header. Empty lines are ignored.
It contains one or more tab-separated columns, which are interpreted according to how many are present:
- One column: the feature name.
- Two columns: the feature identifier and name.
- Three columns: the feature identifier, name, and type. Of the commonly used feature types, "Gene Expression", "Transcript Expression", and "Spike-in" are the most important ones. Other feature types, such as "Antibody Capture", will be silently ignored by most tools.
For 10x Multiome files there will be six columns. The last three consist of genome coordinates and will be ignored. Lines with feature type "Peaks" will also be ignored. They should instead be imported as a Peak Count Matrix (see Import Peak Count Matrix).
- Matrix file(s). File(s) with the extension .mtx in the Matrix Market Exchange Coordinate Format, see https://math.nist.gov/MatrixMarket/formats.html for details. Features must be in the first dimension (rows) and cells in the second (columns).
Expressions and/or spliced and unspliced counts can be imported using:
- Matrix file for expressions, conventionally named matrix.mtx.
- Matrix file (spliced) for spliced counts, conventionally named spliced.mtx.
- Matrix file (unspliced) for unspliced counts, conventionally named unspliced.mtx.
See Options for importing spliced and unspliced counts for more details.
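The column rules for the feature file described above can be illustrated with a short Python sketch. This is not the importer's implementation; the function name `read_features` is hypothetical, and the sketch assumes the file has no header row.

```python
import csv
import io

def read_features(handle):
    """Return (identifier, name, type) tuples; fields not present are None."""
    features = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or all(not field.strip() for field in row):
            continue  # empty lines are ignored
        if len(row) == 1:
            features.append((None, row[0], None))
        elif len(row) == 2:
            features.append((row[0], row[1], None))
        else:
            # Three or more columns: anything after the third column
            # (e.g. genome coordinates in 10x Multiome files) is dropped.
            if row[2] == "Peaks":
                continue  # peaks belong in a Peak Count Matrix instead
            features.append((row[0], row[1], row[2]))
    return features

example = (
    "ENSG00000243485\tMIR1302-2HG\tGene Expression\n"
    "\n"
    "chr1:100-200\tchr1:100-200\tPeaks\tchr1\t100\t200\n"
)
print(read_features(io.StringIO(example)))
```

The example input is made up for illustration: one gene row, one empty line, and one six-column Multiome-style "Peaks" row.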
Additional options are:
- Name. The name of the imported matrix. If Cell format is not configured to parse a sample name from each barcode in the barcodes file, then this will also be the sample name for all the imported barcodes.
- Files are in same directory. This option is provided for convenience and works for local files. When checked, if any file option is updated to a file in a new directory, the other files are automatically updated, if files with the conventional names can be found in the directory.
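Before importing, it can be useful to confirm that the matrix really has features in rows and cells in columns. The following Python sketch compares the size line of a Matrix Market coordinate file with the number of rows in the feature and barcodes files. The function names are hypothetical, and the sketch assumes that neither .tsv file has a header row.

```python
import io

def mtx_dimensions(handle):
    """Return (rows, columns, entries) from the size line of a coordinate file."""
    for line in handle:
        if line.startswith("%") or not line.strip():
            continue  # skip the banner, comments, and blank lines
        rows, cols, entries = (int(value) for value in line.split()[:3])
        return rows, cols, entries
    raise ValueError("no size line found")

def count_rows(handle):
    """Count non-empty lines."""
    return sum(1 for line in handle if line.strip())

def layout_matches(matrix, features, barcodes):
    """True if matrix rows match features and matrix columns match barcodes."""
    n_rows, n_cols, _ = mtx_dimensions(matrix)
    return n_rows == count_rows(features) and n_cols == count_rows(barcodes)

matrix = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "2 3 2\n"
    "1 1 5\n"
    "2 3 1\n"
)
features = io.StringIO("GeneA\nGeneB\n")
barcodes = io.StringIO("AAAC\nAAAG\nAAAT\n")
print(layout_matches(matrix, features, barcodes))
```

If the check fails but the counts match when swapped, the matrix is probably transposed (cells in rows, features in columns).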
MEX archive importer
The MEX archive importer is provided for convenience. A .zip, .tar or .tar.gz archive containing the files required by the MEX importer can be provided in the Archive file option. In order to uniquely identify each file, the files must have specific names:
- Barcodes file must be named barcodes.tsv
- Feature file must either be named features.tsv or genes.tsv
- Matrix file must be named matrix.mtx
- Matrix file (spliced) must be named spliced.mtx
- Matrix file (unspliced) must be named unspliced.mtx
The importer can be configured to import either an Expression Matrix or an Expression Matrix with spliced and unspliced counts. For the first option, Import expressions must be enabled, while for the second option, Import spliced/unspliced must be enabled.
Either Matrix file, or both Matrix file (spliced) and Matrix file (unspliced), can be missing from the archive, depending on how the importer has been configured.
For all three .mtx files, the features must be in the first dimension (rows) and cells in the second (columns). See https://math.nist.gov/MatrixMarket/formats.html for details of the Matrix Market Exchange Coordinate Format.
See Options for importing spliced and unspliced counts for more details on how the total expression is calculated.
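The naming rules above can be checked before import. The Python sketch below lists the members of a .zip archive and reports which import configurations the file names would support. The function name `importable_modes` is hypothetical, and matching on the base name of each member (ignoring any directory inside the archive) is an assumption rather than documented importer behavior.

```python
import io
import posixpath
import zipfile

def importable_modes(member_names):
    """Report which import configurations the archive's file names allow."""
    names = {posixpath.basename(name) for name in member_names}
    has_common = "barcodes.tsv" in names and bool(
        {"features.tsv", "genes.tsv"} & names
    )
    return {
        "expressions": has_common and "matrix.mtx" in names,
        "spliced/unspliced": has_common
        and {"spliced.mtx", "unspliced.mtx"} <= names,
    }

# Build a small in-memory archive with empty placeholder files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("barcodes.tsv", "features.tsv", "matrix.mtx"):
        archive.writestr(name, "")

with zipfile.ZipFile(buffer) as archive:
    modes = importable_modes(archive.namelist())
print(modes)
```

For .tar and .tar.gz archives, the member names could be obtained with `tarfile.open(...).getnames()` and passed to the same function.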
Parse Bio MTX importer
This importer requires three files to be supplied:
- Cell metadata file. A file with the extension .csv and comma separated columns, with one row per barcode. It must contain headers. The following options relating to the cell metadata file are available:
- Barcode column. The name of the column containing the barcodes.
- Cell metadata has sample name. If checked, the sample name is read from the file. Otherwise, the sample name is defined by the general options (see Cell format in importers).
- Sample column (Optional). The name of the column containing the sample names.
- Feature file. A file with the extension .csv and comma separated columns, with one row per feature. The following options relating to the feature file are available:
- Feature id column (Optional). The name of the column containing the feature identifiers (e.g., ENSG00000243485 for ENSEMBL).
- Feature name column. The name of the column containing the feature names.
- Matrix file. A file containing the expression with the extension .mtx in the Matrix Market Exchange Coordinate Format. Cells must be in the first dimension (rows) and features in the second (columns). See https://math.nist.gov/MatrixMarket/formats.html for details of the Matrix Market Exchange Coordinate Format.
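To illustrate how the Barcode column and Sample column options relate to the cell metadata file, here is a minimal Python sketch. It is not the importer's implementation: the function name, the `default_sample` parameter (standing in for a sample name defined elsewhere), and the column names in the example are all hypothetical.

```python
import csv
import io

def read_cell_metadata(handle, barcode_column, sample_column=None,
                       default_sample="sample"):
    """Return (barcode, sample) pairs from a comma-separated file with headers."""
    cells = []
    for row in csv.DictReader(handle):
        # Use the per-cell sample name only when a sample column is given.
        sample = row[sample_column] if sample_column else default_sample
        cells.append((row[barcode_column], sample))
    return cells

metadata = "barcode,sample_name\n01_02_03,liver\n01_02_04,kidney\n"
print(read_cell_metadata(io.StringIO(metadata), "barcode", "sample_name"))
print(read_cell_metadata(io.StringIO(metadata), "barcode"))
```

The first call mirrors checking Cell metadata has sample name; the second mirrors leaving it unchecked, so every barcode receives the same sample name.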
Batches and samples: QC for Single Cell runs separately for each sample detected in the input Expression Matrix. This might not be appropriate for Parse Biosciences data, where samples are sequenced together in one batch. If the matrix to be imported is not filtered, we recommend to:
Plain Text Table importer
This importer supports the import of text data in a full plain text table format. The following options are available:
- Expression matrix. A single file to be imported.
- Table layout. Choose whether the table has cells in columns and features in rows, or is transposed such that features are in columns and cells are in rows.
- Separator. Choose the column separator.
Working with spreadsheets: Be careful to check that all the data is present before import if the file originates from a spreadsheet program. Such programs often impose limits on the number of rows and columns.
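One way to check for truncation is to compare the table's dimensions with known spreadsheet limits. The Python sketch below does this for the Microsoft Excel limits (1,048,576 rows by 16,384 columns for .xlsx, 65,536 rows by 256 columns for .xls). The function names are hypothetical, and a table that exactly hits a limit is only a hint of truncation, not proof.

```python
import csv
import io

# (maximum rows, maximum columns) per Excel file format.
SPREADSHEET_LIMITS = {
    "xlsx": (1_048_576, 16_384),
    "xls": (65_536, 256),
}

def table_shape(handle, separator="\t"):
    """Return (rows, columns) of a delimited table, ignoring empty lines."""
    n_rows = 0
    widths = set()
    for row in csv.reader(handle, delimiter=separator):
        if not row:
            continue
        n_rows += 1
        widths.add(len(row))
    if len(widths) > 1:
        raise ValueError(f"rows have differing column counts: {sorted(widths)}")
    n_cols = widths.pop() if widths else 0
    return n_rows, n_cols

def possibly_truncated(n_rows, n_cols):
    """List the formats whose row or column limit the table exactly reaches."""
    return [
        fmt
        for fmt, (max_rows, max_cols) in SPREADSHEET_LIMITS.items()
        if n_rows == max_rows or n_cols == max_cols
    ]

shape = table_shape(io.StringIO("gene\tcell1\tcell2\nGeneA\t3\t0\n"))
print(shape, possibly_truncated(*shape))
```

Comparing the reported shape with the expected number of cells and features is a more reliable check when those numbers are known.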