Update Sequence Attributes in Lists

Update Sequence Attributes in Lists updates information pertinent to sequences within a sequence list. For example, descriptions can be updated, or new information types can be added. The attribute information to add to the sequences within a sequence list is provided via an Excel file. The attribute/column to match upon, so that the information in the relevant row is added to the corresponding sequence, is specified when launching the tool.
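The matching idea described above can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function name update_attributes, the use of plain dictionaries for sequences and spreadsheet rows, and the example column names are all hypothetical. The sketch also always overwrites matched values, whereas the tool itself has settings governing how existing values are handled.

```python
def update_attributes(sequences, rows, match_on):
    """Copy attribute values from rows onto sequences with the same match_on value.

    sequences and rows are lists of dicts (hypothetical stand-ins for the
    sequences in a list and the rows of the attribute file).
    """
    # Index the spreadsheet rows by the value in the matching column.
    rows_by_key = {row[match_on]: row for row in rows}

    updated = []
    for seq in sequences:
        merged = dict(seq)  # leave the input untouched
        row = rows_by_key.get(seq.get(match_on))
        if row is not None:
            for column, value in row.items():
                if column != match_on:
                    # Adds a new attribute or replaces an existing one.
                    merged[column] = value
        updated.append(merged)
    return updated


sequences = [
    {"Name": "seq1", "Description": "old"},
    {"Name": "seq2", "Description": "unchanged"},
]
rows = [{"Name": "seq1", "Description": "new", "Organism": "E. coli"}]

result = update_attributes(sequences, rows, match_on="Name")
```

In this example only seq1 has a matching row, so it gains the Organism attribute and an updated Description, while seq2 is left as it was.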

Attribute information for each sequence can be seen when viewing a sequence list in table view. Many attributes can be edited directly, or updated using the Update Sequence Attributes in Lists tool. Some cannot be changed, however, for example the length or the start of the sequence, as these are characteristics of the sequence itself.

Individual values can be updated manually by right-clicking in the relevant cell and choosing to edit that attribute (figure 25.3). Working with editable attributes in tables is described in Working with tables.

Alternatively, right click on an individual sequence in the sequence list and choose to open that sequence. Then navigate to the Element info view and change attribute values there. Changes made in the Element info view are reflected immediately in the open sequence list.

For updating information for many sequences, the Update Sequence Attributes in Lists tool is recommended.

Figure 25.3: Attributes on individual sequences in a sequence list can be updated. Right click in the relevant cell in table view, and choose to edit that attribute.

To launch the Update Sequence Attributes in Lists tool, go to:

        Toolbox | Utility Tools | Sequence Lists | Update Sequence Attributes in Lists

and select a sequence list as input (figure 25.4).

Multiple sequence lists of the same type (nucleotide or peptide) can be selected as input. Note, however, that the sequences in all lists are considered together as a single input, and a single sequence list is output, containing all sequences from the input lists.

Figure 25.4: Select a sequence list as input to the tool.

In the second wizard step, the source of the attribute information is specified, along with details about how to handle that information.

Figure 25.5: Attributes from 5 columns in the specified file will be added or updated. Existing information will not be overwritten. If one of the specified columns is called TaxID, then a 7-step taxonomy will be downloaded from the NCBI and added to an attribute called Taxonomy.

Attribute information source

Configure settings

The next step provides a preview of the updates that will be made. The upper pane lists the attribute types to be considered. For certain attribute types, recognized by particular column names, validation rules are applied. For example, a column named GO-terms is expected to contain terms in the format GO:<id>, e.g. GO:0046782. For these, the attribute values, as seen in table view, will be hyperlinked to the relevant GO entry online at http://amigo.geneontology.org.
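As an illustration of this kind of format check, the following Python sketch tests whether a value looks like a GO term. The tool's actual validation rules are not documented here, so this is an assumption: the sketch relies only on the standard Gene Ontology identifier form of "GO:" followed by seven digits, and the name is_valid_go_term is hypothetical.

```python
import re

# Assumption: a GO identifier is "GO:" followed by exactly seven digits.
GO_TERM_PATTERN = re.compile(r"GO:\d{7}")


def is_valid_go_term(value: str) -> bool:
    """Return True if value, ignoring surrounding whitespace, looks like a GO term."""
    return GO_TERM_PATTERN.fullmatch(value.strip()) is not None


checks = {
    "GO:0046782": is_valid_go_term("GO:0046782"),
    "0046782": is_valid_go_term("0046782"),
    "GO:46782": is_valid_go_term("GO:46782"),
}
```

Running a check like this on a column before importing can help spot entries that are likely to be flagged in the preview.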

The column headings recognized in this way, and how the values in those columns are handled, are described below.

In the bottom pane, attribute values that will be added are shown for a small subset of sequences. If these are not as expected, clicking on the "Previous" button takes you back to the previous step, where the configuration can be updated.

Figure 25.6: Attributes from several columns are subject to validation checks. If any had failed the check, a yellow exclamation mark in the bottom pane would be shown for that column. Here, all entries pass. The "Other" column is not subject to validation checks. Only one sequence in the list is being updated in this example.

Column headings and value validation

Certain column headings are recognized, and if the contents pass validation rules, the entries are handled by the software, generally by adding hyperlinks to online data resources.

Two columns subject to validation have additional handling:

Other columns where contents are validated are those with the headings listed below. If a value in such a column cannot be validated, it is neither added nor used to update attributes.

If you wish to add information of this type but do not want this level of validation applied, use a heading other than the ones listed below.

Updating Location-specific attributes

Location-specific attributes can be created, which are then present for all elements created in that CLC File Location. Such attributes can be updated using the Update Sequence Attributes in Lists tool.

Of note when working with such attributes:

Location-specific attributes are described in Customized attributes on data locations.