High-throughput sequencing technologies enable rapid sequencing of
whole genomes. However, short read lengths and repetitive
sequences often complicate full genome assembly and result in
fragmented assemblies. CLC Genome Finishing Module has been developed to help
finish small genomes, such as bacterial genomes, in order to reduce
the extensive workload previously associated with genome finishing
and to facilitate as many steps in the procedure as possible.
CLC Genome Finishing Module is an add-on module to the CLC Genomics Workbench with a number of new tools that can be used in different combinations. The individual tools are listed below and described in detail in the following chapters.
- Align Contigs. Aligns contigs to a reference sequence or, in the absence of a reference, to the contigs themselves.
- Analyze Contigs. Analyzes the contig read mappings for possible misassemblies, single strandedness, coverage, broken pairs, and unaligned ends.
- Annotate from Reference. Transfers annotations to contigs from one or more already annotated references.
- Collect Paired Reads Statistics. Detects paired reads that map to separate contigs.
- Create Amplicons. Places amplicon annotations on sequences. Used before Create Primers to subdivide regions of interest into fragments of suitable sizes.
- Create Primers. Automated primer design for re-sequencing purposes.
- Add Reads to Contigs. Allows further sequence data to be added to existing contigs.
- Sample Reads. Allows a user-defined reduction of the number of reads.
- Find Sequence. Tool to search for names, sequences, or annotations in sequencing data.
- Reassemble Regions. Reassembly of selected regions in contigs. Useful for solving small misassemblies.
- Extend Contigs. Extends contigs with existing reads.
- Join Contigs. An automated way of joining contigs.
- Remove Extension of Contigs. Allows the user to remove the extensions from the contigs after the extended contigs have been joined.
- Import PacBio Reads. An automated way to import the two file formats containing PacBio reads.
- Correct PacBio Reads (beta). Corrects sequencing errors and detects and resolves untrimmed adapter sequences and chimeric reads in PacBio SMRT reads.
- De Novo Assemble PacBio Reads (beta). Assembles error-corrected long reads into high-quality contigs.
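To make the idea behind a tool such as Collect Paired Reads Statistics more concrete, the following is a minimal Python sketch of how read pairs whose mates map to separate contigs might be counted. It is an illustration only and not the module's implementation: the function name `count_cross_contig_pairs` and the input layout (one tuple of contig names per read pair, with `None` for an unmapped mate) are assumptions made for this example.

```python
from collections import Counter


def count_cross_contig_pairs(pair_mappings):
    """Count read pairs whose two mates map to different contigs.

    pair_mappings: iterable of (contig_of_mate1, contig_of_mate2) tuples.
    None marks an unmapped mate. This input layout is hypothetical.
    """
    links = Counter()
    for contig_a, contig_b in pair_mappings:
        if contig_a is None or contig_b is None:
            continue  # skip pairs with an unmapped mate
        if contig_a == contig_b:
            continue  # both mates lie on the same contig
        # Sort the names so (A, B) and (B, A) count as the same link.
        links[tuple(sorted((contig_a, contig_b)))] += 1
    return links


# Toy data: each tuple is one read pair.
pairs = [
    ("contig_1", "contig_1"),
    ("contig_1", "contig_2"),
    ("contig_2", "contig_1"),
    ("contig_3", None),
    ("contig_2", "contig_3"),
]

links = count_cross_contig_pairs(pairs)
for (first, second), pair_count in links.most_common():
    print(first, second, pair_count)
```

Contig pairs supported by many such read pairs are plausible candidates for being adjacent in the genome, which is the kind of evidence that can guide contig joining.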
CLC Genome Finishing Module is under continuous development. A detailed list describing the new features, improvements, bug fixes, and changes in the current version can be found at: