De novo assembly
The de novo assembly algorithm of CLC Genomics Workbench supports a variety of data formats, including both short and long reads, and allows paired-read data sets with different insert sizes and orientations to be mixed.
The de novo assembly process has two stages:
- First, simple contig sequences are created using all the information that is in the read sequences. This is the actual de novo part of the process. These simple contig sequences do not contain any information about which reads the contigs are built from. This part is described in more detail in How it works.
- Second, all the reads are mapped using the simple contig sequences as reference. This is done in order to show coverage levels along the contigs and to enable further downstream analyses such as SNP detection and the creation of mapping reports. Note that a read aligning to a certain position on a contig does not mean that the information from this read was used for building the contig, because the mapping of the reads is a completely separate part of the algorithm.
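The separation between the two stages can be illustrated with a toy sketch. This is not the Workbench's algorithm: it assumes a naive k-mer graph for the first stage and exact substring matching for the second, and the names `build_contigs`, `map_reads`, and the parameter `k` are hypothetical and chosen only for illustration. The point it shows is that the first stage returns plain sequences with no record of the contributing reads, and that the second stage works from those sequences alone.

```python
from collections import defaultdict

def build_contigs(reads, k=4):
    """Stage 1 (toy): build contigs from k-mers; read identity is discarded."""
    # Collect every distinct k-mer seen in any read.
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    # Link k-mers that overlap by k-1 bases.
    succ, pred = defaultdict(set), defaultdict(set)
    for km in kmers:
        for base in "ACGT":
            nxt = km[1:] + base
            if nxt in kmers:
                succ[km].add(nxt)
                pred[nxt].add(km)
    contigs, visited = [], set()
    for km in sorted(kmers):
        if km in visited:
            continue
        preds = pred[km]
        # A k-mer starts a contig unless it simply continues a unique path.
        # (Isolated cycles have no start and are ignored in this toy.)
        is_start = len(preds) != 1 or len(succ[next(iter(preds))]) != 1
        if not is_start:
            continue
        contig, cur = km, km
        visited.add(km)
        # Extend while the path is unambiguous in both directions.
        while len(succ[cur]) == 1:
            nxt = next(iter(succ[cur]))
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            contig += nxt[-1]
            visited.add(nxt)
            cur = nxt
        contigs.append(contig)
    return contigs  # plain strings: no record of which reads built them

def map_reads(reads, contigs):
    """Stage 2 (toy): map reads back by exact match and count coverage."""
    coverage = {c: [0] * len(c) for c in contigs}
    for read in reads:
        for c in contigs:
            pos = c.find(read)
            if pos != -1:
                for i in range(pos, pos + len(read)):
                    coverage[c][i] += 1
                break  # place each read once, at its first match
    return coverage

reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
contigs = build_contigs(reads, k=4)
coverage = map_reads(reads, contigs)
```

Because `map_reads` receives only the contig strings, a read contributes to the coverage wherever it matches, regardless of whether its k-mers shaped that part of the contig during the first stage.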
Note: The De Novo Assembly tool was optimized for genomes up to the size and complexity of the human genome. Please contact AdvancedGenomicsSupport@qiagen.com if you would like to use the De Novo assembler with genomes that are larger and more complex than the human genome. We take such requests into account when prioritizing future features.
Subsections
- Best practices
- How it works
- Resolve repeats using reads
- Automatic paired distance estimation
- Optimization of the graph using paired reads
- AGP export
- Bubble resolution
- Converting the graph to contig sequences
- Summary
- Randomness in the results
- SOLiD data support in de novo assembly
- De novo assembly parameters
- De novo assembly report