Metadata refers to information about data. In the context of the CLC Genomics Workbench, this usually means information about samples. For example, a set of reads could come from a particular specimen at a particular time point with particular characteristics. The specimen, time and characteristics would be metadata for that set of reads.

Examples in this chapter refer to tools present in the CLC Genomics Workbench, but the principles apply to other CLC Workbenches.

What is metadata used for?

Core uses of metadata in CLC software include:

Metadata tables

An example of a CLC Metadata Table in the CLC Genomics Workbench is shown in figure 13.1. Each column represents a property of a sample (e.g., identifier, height, age, treatment), and each row holds the information for one sample. A single column can be designated as the key column; every entry in that column must be unique.

Figure 13.1: A simple metadata table, with the key column highlighted in blue.
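The uniqueness rule for the key column can be illustrated with a small model. The following is a minimal sketch, assuming a metadata table is represented as a list of Python dicts (one per row, keyed by column name); the function name `validate_key_column` and the sample values are hypothetical and are not part of any CLC API.

```python
from typing import Dict, List

# Hypothetical in-memory model of a metadata table: each row maps
# column names to values. This is an illustration, not a CLC API.
Row = Dict[str, str]


def validate_key_column(rows: List[Row], key: str) -> None:
    """Raise ValueError if a row lacks a key value or repeats one."""
    seen = set()
    for index, row in enumerate(rows):
        value = row.get(key, "")
        # Every row needs a non-empty value in the key column.
        if value == "":
            raise ValueError(f"row {index} has no value in key column {key!r}")
        # Key values must be unique across all rows.
        if value in seen:
            raise ValueError(f"duplicate key {value!r} in column {key!r}")
        seen.add(value)


# Made-up example rows; "ID" plays the role of the key column.
rows = [
    {"ID": "27T", "Treatment": "control"},
    {"ID": "28T", "Treatment": "treated"},
]
validate_key_column(rows, "ID")  # passes: every ID is unique
```

Adding a second row with `"ID": "27T"` would make the check raise `ValueError`, mirroring the requirement that the key column contain unique entries.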

Each row can have associations with one or more data elements, such as sequence lists, expression tracks and variant tracks. Associating data elements with relevant metadata rows, automatically or manually, is covered in Associating data elements with metadata.

Information from an Excel, CSV or TSV format file can be imported into a CLC Metadata Table, as described in Importing metadata. CLC Metadata Tables are also generated by workflows, as described in Workflow outputs and workflow result metadata tables.

A template workflow for importing sequence data with associated metadata can be found in the Preparing Raw Data folder in the Template Workflows section of the Toolbox (see Import with Metadata).

Metadata Elements table

Data elements associated with a row in a CLC Metadata Table can be listed by selecting the rows of interest and clicking the Find Associated Data button. This opens the Metadata Elements table, which lists information about the elements associated with the selected rows (figure 13.2).

Figure 13.2: A CLC Metadata Table and corresponding Metadata Elements table showing elements associated with sample 27T.
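Conceptually, Find Associated Data is a lookup from selected row keys to the data elements associated with them. The sketch below assumes, for simplicity, that associations are stored as a mapping from each element name to the key of the single row it is associated with; the mapping, element names and the function `find_associated_data` are hypothetical illustrations, not how the Workbench stores associations internally.

```python
from typing import Dict, List, Set

# Hypothetical association map: data element name -> key of the
# metadata row it is associated with. Names are illustrative only.
associations: Dict[str, str] = {
    "27T reads": "27T",
    "27T reads (trimmed)": "27T",
    "28T reads": "28T",
}


def find_associated_data(assoc: Dict[str, str], selected_keys: Set[str]) -> List[str]:
    """Return the elements associated with any selected row, sorted by name."""
    return sorted(element for element, key in assoc.items() if key in selected_keys)


print(find_associated_data(associations, {"27T"}))
# ['27T reads', '27T reads (trimmed)']
```

Selecting several rows corresponds to passing several keys, in which case elements associated with any of them are returned.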

When a data element is associated with a metadata row, the outputs of analyses involving that data usually inherit the metadata association automatically. For example, if a sequence list with an association to a CLC Metadata Table row is used as input to analyses, results of these analyses may also be associated with that row (figure 13.2).

Inheritance of associations to metadata requires that a single association can be unambiguously identified for an output when a tool is run. If an output is derived from two or more inputs with different metadata associations, then no association will be inherited.
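The inheritance rule can be expressed as a small decision function. This is a minimal sketch of the rule as stated above, not the Workbench's implementation; the function name `inherited_association` is hypothetical, and treating inputs with no association (`None`) as ignorable is an assumption, since the text only says that differing associations prevent inheritance.

```python
from typing import Iterable, Optional


def inherited_association(input_keys: Iterable[Optional[str]]) -> Optional[str]:
    """Return the row key an output would inherit, or None if ambiguous.

    Assumption: inputs without an association (None) are ignored.
    """
    distinct = {key for key in input_keys if key is not None}
    # Exactly one distinct association: it can be identified unambiguously.
    if len(distinct) == 1:
        return next(iter(distinct))
    # No association, or several different ones: nothing is inherited.
    return None


print(inherited_association(["27T"]))         # 27T
print(inherited_association(["27T", "27T"]))  # 27T
print(inherited_association(["27T", "28T"]))  # None
```

Two inputs associated with the same row still yield one unambiguous association, which is why the second call inherits `27T`.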