HDF5 formats

AnnData, Cell Ranger HDF5, h5Seurat and Loom are all HDF5-based formats, each with specific requirements for how the data is structured. An HDF5 file is organized hierarchically into groups, which act like folders, and datasets, which hold the data arrays.

Metadata for groups and datasets is stored in associated attribute lists. Groups and datasets can often themselves be interpreted semantically as attributes.
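To make the hierarchy concrete, the following is a minimal sketch that lists every group and dataset in a file together with the names of its attributes. It assumes the third-party h5py package is installed; the function names `walk` and `describe_file` are made up for this illustration and are not part of any importer.

```python
def walk(node, path="/"):
    """Yield (path, kind, attribute names) for `node` and everything below it."""
    # Groups behave like dictionaries of children; datasets do not.
    kind = "group" if hasattr(node, "items") else "dataset"
    yield path, kind, sorted(node.attrs)
    if kind == "group":
        for name, child in node.items():
            yield from walk(child, path.rstrip("/") + "/" + name)


def describe_file(filename):
    """Open an HDF5 file read-only and return its hierarchy as a list."""
    import h5py  # third-party package, assumed to be installed

    with h5py.File(filename, "r") as handle:
        return list(walk(handle))
```

Calling `describe_file` on one of the formats below is a quick way to check which groups, datasets and attributes a given file actually contains before importing it.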

All HDF5 importers have an Expression matrix option, which specifies the HDF5 file to import.

AnnData importer

The expression matrix in an AnnData (h5ad) file is stored in the sparse dataset 'X', while features and cells are described by the 'var' and 'obs' groups, respectively. See https://anndata.readthedocs.io/ for more details.

The '_index' attribute on the 'obs' group defines the cell identifiers; how they are interpreted is specified by the Cell format option.

h5Seurat importer

An h5Seurat file may contain multiple assays, and each assay may contain multiple expression matrices, e.g., counts and normalized expression values. The matrices can be sparse or dense. See https://mojaveazure.github.io/seurat-disk/articles/h5Seurat-spec.html for more details.

Only one assay and one matrix can be imported at a time. The h5Seurat importer expects format version 4.0.0.

The 'cell.names' attribute contains the cell identifiers; how they are interpreted is specified by the Cell format option. If the sample is not set through Cell format or Sample, the sample for each cell is read from the 'orig.ident' attribute on the 'meta.data' group.

The gene or transcript names are read from the 'features' attribute of the selected assay.
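Because only one assay and one matrix can be imported at a time, it can help to check beforehand which ones a file contains. The sketch below is one way to do that with h5py. The 'assays' group and the matrix names 'counts', 'data' and 'scale.data' are assumptions taken from the h5Seurat specification linked above, as is the convention that sparse matrices are stored as groups and dense matrices as datasets. The function names are hypothetical.

```python
# Matrix names assumed from the h5Seurat specification, not from the importer.
MATRIX_NAMES = ("counts", "data", "scale.data")


def list_assay_matrices(root):
    """Map each assay name to its matrices and their assumed storage type."""
    result = {}
    for assay_name, assay in root["assays"].items():
        result[assay_name] = {
            # Assumption: a sparse matrix is a group, a dense matrix a dataset.
            name: ("sparse" if hasattr(assay[name], "items") else "dense")
            for name in MATRIX_NAMES
            if name in assay
        }
    return result


def list_assay_matrices_in_file(filename):
    """Open an h5Seurat file read-only and summarize its assays."""
    import h5py  # third-party package, assumed to be installed

    with h5py.File(filename, "r") as handle:
        return list_assay_matrices(handle)
```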

Loom importer

A Loom file has an internal structure consisting of a main matrix, optional 'layers' of the same size as the main matrix, and row and column attributes (describing features and cells, respectively). See https://linnarssonlab.org/loompy/format/index.html for details on the format.

The Loom importer expects the Loom format version 3.0.0.
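The Loom layout described above can be inspected with a few lines of h5py. The sketch below assumes the group and dataset names used by the Loom specification ('matrix', 'layers', 'row_attrs' and 'col_attrs'); the function names are hypothetical and the sketch does not check the format version.

```python
def loom_summary(root):
    """Summarize the main matrix, layers and row/column attributes of a Loom file."""
    return {
        # Rows describe features and columns describe cells.
        "shape": tuple(root["matrix"].shape),
        # Layers are optional, so tolerate a file without them.
        "layers": sorted(root["layers"]) if "layers" in root else [],
        "row_attrs": sorted(root["row_attrs"]),
        "col_attrs": sorted(root["col_attrs"]),
    }


def loom_summary_of_file(filename):
    """Open a Loom file read-only and return its summary."""
    import h5py  # third-party package, assumed to be installed

    with h5py.File(filename, "r") as handle:
        return loom_summary(handle)
```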