The clc_find_variations Program

This program makes it possible to detect variants between a reference sequence and the reads. It operates on a cas file produced by the reference assembly programs.

It writes a new consensus sequence file that contains all the original data, changed where necessary so that the references reflect the read sequences of the assembly. The new consensus file is always in FASTA format. The program can also be run so that it only prints a list of differences instead of writing a new file.

The `-c' option sets the minimum coverage required for read differences to be reported.
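The effect of a minimum coverage threshold can be sketched in a few lines of Python. This is only an illustration of the idea, not the program's implementation: the record layout, the function name `reportable` and the reading of "coverage" as the number of reads covering the position are all assumptions made for the example.

```python
def reportable(differences, min_coverage):
    """Keep only differences at positions covered by at least min_coverage reads.

    The dictionary layout used here is hypothetical and is not the
    output format of clc_find_variations.
    """
    return [d for d in differences if d["coverage"] >= min_coverage]


# Hypothetical differences between a reference and the reads.
differences = [
    {"position": 120, "ref": "A", "read": "G", "coverage": 2},
    {"position": 845, "ref": "C", "read": "T", "coverage": 17},
]

# With a threshold of 5, only the difference at position 845 remains.
print(reportable(differences, min_coverage=5))
```

A higher threshold suppresses differences that are supported by only a few reads, at the cost of missing true variants in thinly covered regions.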

The `-r' option determines how conflicts in the reads are resolved in the consensus sequence. The default is a simple vote (the majority of the reads determine the consensus base), but it is also possible to get ambiguity characters instead. Note that with ambiguity characters, sequencing errors will also be reflected in the consensus sequence, so this setting should be used with caution.
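The difference between the two ways of resolving conflicts can be illustrated with a small Python sketch that works on a single alignment column. It is not the program's own code: the function names are made up for the example, the tie-breaking rule in the vote is an assumption, and the ambiguity characters are taken from the standard IUPAC nucleotide codes.

```python
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases observed.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def majority_vote(bases):
    """Return the most frequent base in a column.

    Ties are broken alphabetically; this rule is an assumption of the sketch.
    """
    counts = Counter(bases)
    best = max(counts.values())
    return min(base for base, n in counts.items() if n == best)


def ambiguity_code(bases):
    """Return the IUPAC code covering every base seen in the column."""
    return IUPAC[frozenset(bases)]


# Three reads support A and one read shows G (possibly a sequencing error).
column = ["A", "A", "A", "G"]
print(majority_vote(column))   # A
print(ambiguity_code(column))  # R
```

The single G in the example is enough to turn the consensus base into R when ambiguity characters are used, which is why a sequencing error shows up in the consensus with that setting but not with the vote.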

With the `-w' option, the program outputs a list of zero-coverage regions in the assembly.
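What a list of zero-coverage regions amounts to can be sketched as follows, assuming the coverage of a reference is available as one read count per position. The function name, the input layout and the 0-based, end-exclusive coordinates are assumptions of the example and do not describe the program's actual output format.

```python
def zero_coverage_regions(coverage):
    """Return (start, end) pairs, 0-based and end-exclusive, where coverage is 0."""
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth == 0 and start is None:
            start = pos                    # a zero-coverage run begins
        elif depth != 0 and start is not None:
            regions.append((start, pos))   # the run ended just before pos
            start = None
    if start is not None:                  # the run extends to the end
        regions.append((start, len(coverage)))
    return regions


# Hypothetical per-position read counts for a short reference.
coverage = [0, 0, 3, 5, 0, 2, 0]
print(zero_coverage_regions(coverage))  # [(0, 2), (4, 5), (6, 7)]
```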

If you wish to see the reads matched to the new reference sequences, a complete new round of reference assembly has to be performed, because the changes to the references may significantly change the optimal locations of the reads in the changed regions. Sometimes the new read alignments suggest a few more changes to the reference sequences, so another run of clc_find_variations may be in order.

There is also an option `-i' that will ignore insertions and deletions completely. This can be an advantage when looking for variations in data sets from sequencing platforms producing many indel sequencing errors.

See Options for All Programs for further details.