Mask Low-Complexity Regions

The Mask Low-Complexity Regions tool can be used to identify and mask repetitive regions in sequences. In some cases this can remove erroneous matches: for instance, when doing taxonomic profiling, a read with a highly repetitive sequence is likely to match a reference genome purely by chance.

The tool takes any sequence or sequence list as input (including reads and genomes) and accepts both nucleotide and protein sequences.
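To make the idea of masking concrete, the following is a minimal sketch of one common approach: slide a window along the sequence, compute the Shannon entropy of each window, and mask every position covered by a low-entropy window. This is an illustration only and is not a description of the algorithm the tool uses. The function names, the window size of 12, the entropy threshold of 1.0 bit, and the mask character are all assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per symbol) of the characters in `window`."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 1.0,
                        mask_char: str = "N") -> str:
    """Replace every position covered by a low-entropy window with `mask_char`.

    Hypothetical helper for illustration; the parameters are assumptions,
    not the tool's actual options. Sequences shorter than `window` are
    returned unchanged.
    """
    masked = list(seq)
    for start in range(max(len(seq) - window + 1, 0)):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            # Mask the whole window, so neighbouring bases that share a
            # low-entropy window with the repeat are masked as well.
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    read = "ACGTTGCAAGCT" + "A" * 20 + "GATCCGTAGCTA"
    print(mask_low_complexity(read))
```

In this sketch a homopolymer run is fully masked while a sequence with an even base composition is left untouched. For protein input, "X" would be a more natural mask character than "N", and the threshold would need to be chosen for a 20-letter alphabet.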

To run the tool, choose

Tools | Mask Low-Complexity Regions.

The following general options are available (figure 22.4):

Figure 22.4: The Mask Low-Complexity Regions options.

The Sequence filtering options make it possible to specify whether some or all input sequences should be output:

Finally, the Sequence modifications options determine how the output sequences are marked:

The tool optionally outputs a report with statistics on the detected regions. The report is described in detail below.
