Split Sequence List
The Split Sequence List tool can be used to split a sequence list based on the metadata annotation entries for the individual sequences or into a preselected number of sequence lists.
To run the tool, go to
Tools | Split Sequence List
After selecting the input sequence list, select the split mode (see figure 22.1):
- Annotation value based: Split the sequence list into one sequence list per value of a selected metadata annotation column.
- Partition amount based: Split the sequence list into a specified number of partitions.
When the "Annotation value based" splitting mode is chosen, a number of options are available to control the behavior:
- Annotation column: The column from the table view of a sequence list to be used for separating the sequence list.
- Only specified values: If this option is selected, sequence lists are produced only for the specified values, with all sequences sharing a value collected into one list. Otherwise, a sequence list is produced for every value present in the selected annotation column.
- Collect sequences with undefined values: If this option is selected, all sequences without a value in the selected annotation column will be collected into a separate sequence list. Otherwise, they are ignored.
- Separator: The separator used to delimit multiple values entered in the "Annotation values" field. Note that the separator specified here has no effect on the "Annotation values file".
- Annotation values: This option is only available if "Only specified values" has been selected. The field can be populated using the pick list button on the right, with individual values separated by the specified separator. An annotation value must not contain the separator: such a value would be split into fragments that no longer match, and no sequence list would be produced for it. Note that collecting the values for the pick list may take a very long time and can be interrupted. If the desired annotation value is already known, it can be typed into the field as free text instead.
- Annotation values file: A file listing the annotation values to produce lists for may be specified. The file contains the annotation values separated by newline characters, i.e. one value per line, and is independent of the separator specified above. This option is useful if the same annotation values are reused often or in a workflow setting. Note that if annotation values are specified both in a file and in the "Annotation values" field, sequence lists will be produced for the union of all specified annotation values.
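The way these options interact can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function names `parse_values` and `split_by_annotation`, the representation of sequences as dictionaries, and the trimming of whitespace around values are all assumptions made for illustration.

```python
from collections import defaultdict


def parse_values(field_text, separator, file_lines=()):
    """Return the union of values from the "Annotation values" field and a values file.

    The field is split on the separator; the file is read one value per line
    and is not affected by the separator. Trimming whitespace is an assumption.
    """
    from_field = field_text.split(separator) if field_text else []
    candidates = [v.strip() for v in from_field] + [line.strip() for line in file_lines]
    return {v for v in candidates if v}


def split_by_annotation(records, column, wanted=None, collect_undefined=False):
    """Group records (hypothetical dicts) by their value in `column`.

    wanted=None mimics "Only specified values" being unselected: every value
    present gets its own list. Records without a value are stored under the
    key None when collect_undefined is True, and are skipped otherwise.
    """
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(column)
        if value is None or value == "":
            if collect_undefined:
                groups[None].append(rec)
            continue
        if wanted is not None and value not in wanted:
            continue
        groups[value].append(rec)
    return dict(groups)


if __name__ == "__main__":
    records = [
        {"name": "s1", "host": "human"},
        {"name": "s2", "host": "mouse"},
        {"name": "s3", "host": ""},
    ]
    wanted = parse_values("human", ",", file_lines=["bat\n"])
    print(split_by_annotation(records, "host", wanted, collect_undefined=True))
```

The sketch also shows why a value containing the separator cannot be matched: a hypothetical value "Homo sapiens, male" entered with "," as separator would be parsed as the two values "Homo sapiens" and "male", neither of which equals the original annotation value.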
When the "Partition amount based" splitting mode is chosen, two options are available:
- Number of partitions: The sequence list will be split into the requested number of lists, with the sequences being distributed evenly between the lists.
- Randomize: If this option is chosen, the order of sequences will be randomized in the produced sequence lists.
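As an illustration of partition-based splitting, the following Python sketch distributes items into a requested number of lists whose sizes differ by at most one. How the tool itself assigns leftover sequences, and whether randomization also changes which list a sequence ends up in, is not stated here; the function name `partition`, the contiguous chunking, and the `seed` parameter are assumptions.

```python
import random


def partition(records, n_partitions, randomize=False, seed=None):
    """Split records into n_partitions lists of near-equal size.

    Assumption: the first (len(records) % n_partitions) lists each receive
    one extra item. With randomize=True the items are shuffled before
    chunking, which in this sketch affects both membership and order.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    items = list(records)
    if randomize:
        random.Random(seed).shuffle(items)
    base, extra = divmod(len(items), n_partitions)
    parts = []
    start = 0
    for i in range(n_partitions):
        size = base + (1 if i < extra else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


if __name__ == "__main__":
    print(partition(list(range(10)), 3))
    # → [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

If more partitions are requested than there are items, this sketch returns some empty lists; whether the tool produces empty sequence lists in that case is not covered here.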
Optionally, the tool produces a metadata table for all produced sequence lists containing all the columns for which consistent annotation values exist.
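One possible reading of "consistent" is that a column is kept when, within every produced list, all sequences share a single value for it. The Python sketch below implements that reading only; it is an assumption about the selection rule, and the function name `consistent_columns` and the dictionary-based records are hypothetical.

```python
def consistent_columns(groups):
    """Build a metadata table with one row per produced list.

    groups maps a list name to its records (hypothetical dicts). A column is
    kept only if every list has exactly one defined value for it.
    Returns (kept_columns, rows), where rows maps list name to column values.
    """
    # Collect column names in first-seen order.
    all_columns = []
    for recs in groups.values():
        for rec in recs:
            for col in rec:
                if col not in all_columns:
                    all_columns.append(col)

    def single_value(recs, col):
        values = {rec.get(col) for rec in recs}
        return len(values) == 1 and None not in values

    kept = [
        col for col in all_columns
        if all(single_value(recs, col) for recs in groups.values())
    ]
    rows = {name: {col: recs[0][col] for col in kept} for name, recs in groups.items()}
    return kept, rows


if __name__ == "__main__":
    groups = {
        "A": [{"id": "s1", "host": "human"}, {"id": "s2", "host": "human"}],
        "B": [{"id": "s3", "host": "mouse"}],
    }
    print(consistent_columns(groups))
```

In this example the "id" column is dropped because list "A" holds two different values for it, while "host" is kept because each list has exactly one host value.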
Figure 22.1: The Split Sequence List options.
Note that the Split Sequence List tool can be used in a workflow and be connected to a subsequent iterate statement, such that the resulting lists can be analyzed individually. One limitation is that such iterates cannot be nested, i.e. a sequence list can only be split with respect to one column at a time during a workflow.