Create Annotated Sequence List

The Create Annotated Sequence List tool can be used for merging sequences and sequence lists into a single sequence list and/or for annotating the individual sequences with metadata annotations. Metadata annotations are the type of annotations that are visible in columns of the table view of a sequence list, i.e. these annotations are applicable to the whole sequence. This tool can be used to create a variety of databases, e.g. taxonomic and amplicon-based reference databases, gene and resistance databases and many more.

The tool takes any sequence or sequence list as input and outputs a sequence list of nucleotide and/or protein sequences, depending on the input. All nucleotide sequences from the input will be collected into a single output nucleotide sequence list and all protein sequences from the input will be collected into a single protein sequence list.
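The collection behavior described above can be pictured with a minimal Python sketch. It is not the tool's implementation: the `Sequence` class, its `kind` field, and the `collect` function are hypothetical names introduced only for illustration, and the sketch assumes that every input sequence already knows whether it is a nucleotide or a protein sequence.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Sequence:
    # Hypothetical record type; "kind" is assumed to be "nucleotide" or "protein".
    name: str
    residues: str
    kind: str


def collect(inputs: List[Union[Sequence, List[Sequence]]]) -> Tuple[List[Sequence], List[Sequence]]:
    """Flatten single sequences and sequence lists into one list per sequence type."""
    nucleotide: List[Sequence] = []
    protein: List[Sequence] = []
    for item in inputs:
        # An input is either one sequence or a list of sequences.
        members = item if isinstance(item, list) else [item]
        for seq in members:
            if seq.kind == "nucleotide":
                nucleotide.append(seq)
            else:
                protein.append(seq)
    return nucleotide, protein
```

With mixed input, the sketch returns two lists; with input of only one type, the other list is simply empty.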

To run the tool, go to

Tools | Create Annotated Sequence List

After selecting which sequences are to be combined and annotated, the annotation sequence sources and general annotation behavior may be specified.

Image create_annotated_sl_step1
Figure 22.2: The General Annotation Settings of the Create Annotated Sequence List tool.

The incoming data, e.g. external spreadsheet files or metadata tables, sometimes contain empty values in some columns. The Overwrite with empty values section controls how these are handled: select Update with empty values to overwrite existing values with the empty ones, or Ignore empty values to leave the existing values as they are.
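The difference between Update with empty values and Ignore empty values can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not the tool's code: `apply_annotations` and its `overwrite_with_empty` parameter are hypothetical names, and treating `None` and whitespace-only strings as "empty" is an assumption made here.

```python
from typing import Dict, Optional


def apply_annotations(
    existing: Dict[str, Optional[str]],
    incoming: Dict[str, Optional[str]],
    overwrite_with_empty: bool,
) -> Dict[str, Optional[str]]:
    """Return a copy of `existing` updated with the values from `incoming`."""
    updated = dict(existing)
    for column, value in incoming.items():
        # Assumption: None and whitespace-only strings count as empty.
        is_empty = value is None or str(value).strip() == ""
        if is_empty and not overwrite_with_empty:
            # "Ignore empty values": keep whatever is already annotated.
            continue
        # Non-empty values always win; empty ones only with "Update with empty values".
        updated[column] = value
    return updated
```

For a sequence already annotated with a Taxonomy value, an incoming row with an empty Taxonomy cell leaves the annotation untouched when `overwrite_with_empty` is `False` and clears it when it is `True`.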

Optionally, annotations may be set for the whole sequence list. For example, if the sequence list contains sequences clustered at a specific sequence similarity, this can be set under Clustering similarity fraction after checking Set Clustering similarity fraction annotation. Such options are required, for example, when constructing an amplicon-based reference database.

In the next wizard step, an Excel spreadsheet or plain text file with annotations for the sequences can optionally be specified in the Import section. As soon as a file has been selected, it appears in the preview table at the bottom of the wizard. Excel files are recognized directly, while for plain text files it may be necessary to set the Encoding, Separator and Quote symbols in order to parse them. The preview is updated to show how the file is read with the specified settings. Note that the tool transfers the specified values by matching the names found in a column called "Name" with the names of the sequences, so the file must contain a column called "Name".

Image create_annotated_sl_step2
Figure 22.3: Options for annotating a sequence list with an external file.
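The role of the Encoding, Separator and Quote settings, and of the required "Name" column, can be sketched in Python with the standard `csv` module. This is only an illustration of the matching idea, not how the tool parses files: `read_annotations` and `annotate` are hypothetical helper names, and the default values chosen for the parameters are assumptions.

```python
import csv
import io
from typing import Dict, List


def read_annotations(
    data: bytes, encoding: str = "utf-8", separator: str = "\t", quote: str = '"'
) -> Dict[str, Dict[str, str]]:
    """Parse a plain text annotation file into {sequence name: {column: value}}."""
    text = data.decode(encoding)
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=separator, quotechar=quote)
    # Without a "Name" column, rows cannot be matched to sequences.
    if reader.fieldnames is None or "Name" not in reader.fieldnames:
        raise ValueError('a column called "Name" is required')
    return {
        row["Name"]: {col: val for col, val in row.items() if col != "Name"}
        for row in reader
    }


def annotate(sequence_names: List[str], table: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Look up each sequence by name; sequences without a matching row get no values."""
    return {name: table.get(name, {}) for name in sequence_names}
```

In this sketch, a wrong separator typically puts the whole header into a single field, so the "Name" column is not found. This mirrors why the preview is worth checking after changing the parsing settings.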

The preview table at the bottom is limited to 100 lines or 1 MB of the input file. Its first line can be modified to specify how the columns should be mapped to metadata annotations. A column named "Name" is required in order to match a table row with a sequence. Other columns may have a special meaning, indicated by the color of the column in the preview. Possible colors are

The most important header names with special meaning are


