Update Sequence Attributes in Lists

Update Sequence Attributes in Lists updates information about sequences in a Sequence List. Information can be added to existing attributes and new attribute types can be added.

The tool takes attribute information from an Excel file (.xls/.xlsx), a comma-separated text file (.csv), or a tab-separated text file (.tsv). Each sequence is updated with the relevant information by matching the contents of a particular column in the file, specified when launching the tool, with the contents of a column of the same name in the Sequence List.

The columns to take information from are specified when launching the tool. Column names are used as attribute names. If a column name matches an existing attribute in the Sequence List, the information from that column can be added to the existing attribute (details below). When a column name does not match an existing attribute, a new attribute is added to the Sequence List.
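The matching and update logic described above can be sketched outside the Workbench. The following Python example is only an illustration, not the tool's implementation: the function name `update_attributes`, the dictionary representation of sequences, and the rule that existing values are kept unless `overwrite` is set are all assumptions made for the sketch.

```python
import csv
import io


def update_attributes(sequences, rows, key, columns, overwrite=False):
    """Return updated copies of `sequences` (a list of dicts).

    Hypothetical helper: rows from the attribute file are matched to
    sequences on the `key` column. Assumption: an existing, non-empty
    value is only replaced when `overwrite` is True.
    """
    by_key = {row[key]: row for row in rows}
    updated = []
    for seq in sequences:
        seq = dict(seq)  # work on a copy
        row = by_key.get(seq.get(key))
        if row is not None:
            for col in columns:
                value = row.get(col, "")
                if value == "":
                    continue  # nothing to add for this column
                if overwrite or seq.get(col) in ("", None):
                    seq[col] = value  # new attribute, or overwrite allowed
        updated.append(seq)
    return updated


# A small in-memory stand-in for a .csv attribute file.
attribute_file = io.StringIO(
    "Name,Organism,Other\n"
    "seq1,Escherichia coli,note A\n"
    "seq3,Homo sapiens,note B\n"
)
rows = list(csv.DictReader(attribute_file))

sequences = [
    {"Name": "seq1", "Organism": "E. coli"},
    {"Name": "seq2"},
]

result = update_attributes(sequences, rows, key="Name",
                           columns=["Organism", "Other"])
```

In this sketch, `seq1` gains a new "Other" attribute while its existing "Organism" value is left alone, and `seq2` is unchanged because no row in the file has a matching Name.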

Additional notes:

To launch the Update Sequence Attributes in Lists tool, go to:

        Toolbox | Utility Tools | Sequence Lists | Update Sequence Attributes in Lists

and select one or more Sequence Lists of the same type (nucleotide or peptide) as input (figure 27.14).

Note: Sequences in all inputs provided are treated as a single entity. A single Sequence List containing all sequences is output.

Figure 27.14: Select one or more Sequence Lists as input.

In the Settings wizard step, the file containing attribute information is specified, along with details about how to handle that information (figure 27.15).

Figure 27.15: Information in the attribute file will be matched with the relevant sequence based on contents of the Name column in the file and in the Sequence List. Five columns containing relevant attribute information have been selected. The option to overwrite existing information has been left unchecked.

Attribute information source fields

Configure settings checkboxes

The results of the choices made in the Settings step are reflected in the Preview wizard step (figure 27.16). The upper pane lists the attribute types to be updated or added, as well as the attribute to be used to match sequences with the relevant information. How particular columns will be handled is indicated in the "Content handling" column, including whether validation will be applied. The columns subject to validation checks are described later in this section.

Shown in the lower pane is a small subset of the incoming information from the attribute file, based on the choices made in the Settings wizard step. Click on the "Previous" button to go back to that step if anything needs to be adjusted.

Figure 27.16: The Preview wizard step shows information about how columns from the attribute file will be handled, and whether any problems were detected. Where validation checks are carried out, a yellow exclamation mark is shown in the bottom pane for any column where a check has failed. Here, all entries pass. The "Other" column is not subject to validation checks. Only one sequence in the list is being updated in this example.

Column headings and value validation

Certain column names are recognized by the software and validation rules are applied to these. When the contents pass the validation checks, entries in those columns may be further processed.

In most cases, this further processing involves adding hyperlinks to online data resources. However, the contents of columns with the following names trigger different handling:

Other columns where contents are validated are those with the headings listed below. If a value in such a column cannot be validated, it is neither added nor used to update attributes.

If you wish to add information of this type but do not want this level of validation applied, use a heading other than the ones listed below.
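The effect of heading-based validation can be illustrated with a short Python sketch. It does not reproduce the tool's actual rules: the heading "Example ID", its digits-only rule, and the function `filter_valid` are hypothetical and chosen only to show the principle that values failing validation are dropped, while columns with unrecognized headings pass through unchecked.

```python
import re

# Hypothetical mapping from recognized heading to validation rule.
# These are NOT the headings or rules used by the tool.
VALIDATORS = {
    "Example ID": lambda value: re.fullmatch(r"\d+", value) is not None,
}


def filter_valid(row, validators=VALIDATORS):
    """Split a row into accepted and rejected values.

    Values under a recognized heading are kept only if they pass that
    heading's check. Values under any other heading are always kept.
    """
    accepted, rejected = {}, {}
    for heading, value in row.items():
        check = validators.get(heading)
        if check is None or check(value):
            accepted[heading] = value
        else:
            rejected[heading] = value
    return accepted, rejected


accepted, rejected = filter_valid({"Example ID": "12a", "Other": "free text"})
```

Under these assumptions, the value "12a" is rejected because of its heading, whereas the same value under a different heading, such as "Example ID raw", would be accepted without checks.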

Updating Location-specific attributes

Location-specific attributes can be created, which are then present for all elements created in that CLC File Location. Such attributes can be updated using the Update Sequence Attributes in Lists tool.

Of note when working with such attributes:

Location-specific attributes are described in Customized attributes on data locations.