The clc_extract_consensus Program

This program makes it possible to extract a consensus sequence from a read mapping. It operates on a cas file produced by the reference assembly programs.

The clc_extract_consensus program writes a consensus sequence file that contains all the original data, with the reference sequences changed to reflect the information recovered by the read mapping. The program can also output a list of the regions in the reference that have zero coverage in the read mapping. The following options control how the consensus sequence is constructed.

The -r option determines how conflicts between reads should be resolved in the consensus sequence. The default is a simple vote (the majority of the read bases determine the consensus base), but it is also possible to output ambiguity characters (this will cause sequencing errors to be reflected in the consensus sequence) or to report the positions as containing unknown bases (using N's).
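The three ways of resolving a conflict can be illustrated with a small Python sketch. This is not the program's implementation. The function name, the mode labels ("vote", "ambiguity", "unknown") and the alphabetical tie-breaking rule are assumptions made for illustration only, and the sketch looks at the bases of a single position with no gaps.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases each one represents.
_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
IUPAC = {frozenset(bases): code for bases, code in _CODES.items()}


def resolve_conflict(bases, mode="vote"):
    """Return one consensus character for the read bases at one position.

    The mode labels are hypothetical; they are not the values accepted by -r.
    """
    counts = Counter(bases)
    if len(counts) == 1:
        # All reads agree, so there is no conflict to resolve.
        return bases[0]
    if mode == "vote":
        # Majority vote; ties go to the alphabetically first base (assumption).
        return max(sorted(counts), key=lambda b: counts[b])
    if mode == "ambiguity":
        # Report every observed base through its IUPAC ambiguity code.
        return IUPAC[frozenset(counts)]
    if mode == "unknown":
        # Report the conflicting position as an unknown base.
        return "N"
    raise ValueError(f"unknown mode: {mode}")


column = "AAG"  # two reads show A, one read shows G
print(resolve_conflict(column, "vote"))       # A
print(resolve_conflict(column, "ambiguity"))  # R
print(resolve_conflict(column, "unknown"))    # N
```

Note how the single G, which may well be a sequencing error, disappears under the vote but is carried into the consensus as R when ambiguity characters are used.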

The -c option specifies the minimum coverage required for a difference from the reference to be considered valid and reported. The default for this option is 2.
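As a rough illustration of a minimum coverage threshold, the Python sketch below keeps the reference base whenever the reads disagree with the reference but too few reads cover the position. It assumes that coverage means the number of reads covering the position and that conflicts are settled by a majority vote; the function name and parameters are hypothetical and are not taken from the program.

```python
from collections import Counter


def call_base(ref_base, read_bases, min_coverage=2):
    """Return the consensus base for one position (illustrative only)."""
    if not read_bases:
        # No reads at all: nothing can contradict the reference here.
        return ref_base
    # Majority vote among the reads covering this position.
    voted = Counter(read_bases).most_common(1)[0][0]
    if voted != ref_base and len(read_bases) < min_coverage:
        # A difference backed by too few reads is not trusted.
        return ref_base
    return voted


print(call_base("A", "G"))                  # A (one read is below the default of 2)
print(call_base("A", "GG"))                 # G (two reads reach the threshold)
print(call_base("A", "G", min_coverage=1))  # G
```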

The -z option controls how positions with zero coverage are reported in the consensus sequence. The default is to report the base from the reference sequence, but it is also possible to report the position as an unknown base (N) or to simply remove the position.
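The effect of the three choices on the output can be sketched in Python as follows. The policy labels ("reference", "unknown", "remove") and the function name are made up for this illustration and are not the values accepted by -z. For simplicity, covered positions are copied from the reference unchanged, where the real program would write the consensus base.

```python
def apply_zero_coverage_policy(reference, coverage, policy="reference"):
    """Build an output sequence, treating zero-coverage positions per policy.

    reference: the reference sequence as a string.
    coverage:  one read count per reference position.
    """
    out = []
    for base, depth in zip(reference, coverage):
        if depth > 0:
            # Covered position: stand-in for the consensus base.
            out.append(base)
        elif policy == "reference":
            # Keep the base from the reference sequence.
            out.append(base)
        elif policy == "unknown":
            # Report the position as an unknown base.
            out.append("N")
        elif policy == "remove":
            # Drop the position from the output entirely.
            continue
        else:
            raise ValueError(f"unknown policy: {policy}")
    return "".join(out)


reference = "ACGTAC"
coverage = [3, 0, 0, 2, 0, 1]
print(apply_zero_coverage_policy(reference, coverage, "reference"))  # ACGTAC
print(apply_zero_coverage_policy(reference, coverage, "unknown"))    # ANNTNC
print(apply_zero_coverage_policy(reference, coverage, "remove"))     # ATC
```

Removing positions shortens the consensus, so coordinates in the output no longer line up with coordinates in the reference.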

Using the -i option will make the program ignore all indels completely while constructing the consensus sequence.

The following three options control the input and output of the program:

Use the -a option to specify the input cas file to be analyzed. This option is required.

Use the -o option to specify where the output fasta file should be placed. This option is required.

Finally, use the -w option if you want a list of zero coverage regions output to the screen.
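For scripted use, the input and output options can be assembled into a command line. The Python sketch below uses only the -a, -o and -w options described above. The helper name and the file names mapping.cas and consensus.fasta are placeholders, and the sketch assumes that -w is a switch that takes no argument.

```python
def build_command(cas_path, fasta_path, list_zero_coverage=False):
    """Return the argument list for a clc_extract_consensus run."""
    # -a and -o are the two required options.
    cmd = ["clc_extract_consensus", "-a", cas_path, "-o", fasta_path]
    if list_zero_coverage:
        # Assumed to be a plain switch with no argument.
        cmd.append("-w")
    return cmd


cmd = build_command("mapping.cas", "consensus.fasta", list_zero_coverage=True)
print(" ".join(cmd))
# clc_extract_consensus -a mapping.cas -o consensus.fasta -w

# To run it, pass the list to subprocess.run(cmd, check=True) on a machine
# where the program is installed and on the PATH.
```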

See Options for All Programs for further details.