De novo assembly parameters

To start the assembly:

        Toolbox | De Novo Sequencing | De Novo Assembly

In this dialog, you can select one or more sequence lists or single sequences.

Click Next to set the parameters for the assembly. This will show a dialog similar to the one in figure 28.18.

Figure 28.18: Setting parameters for the assembly.

At the top, you select the Word size and the Bubble size to be used.

The principles of setting the word size are described in How it works. When automatic calculation is used, the word size that was chosen is recorded in the History of the result files. Note that the range of word sizes is 12-24 on 32-bit computers and 12-64 on 64-bit computers.

The meaning of the bubble size parameter is explained in Bubble resolution. When the setting is automatic, the bubble size is 50 for reads shorter than 110 bp; for longer reads, it is the average read length. The value used is also recorded in the History of the result files.
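The automatic bubble size rule and the word size limits can be illustrated with a short Python sketch. It is not part of the Workbench. The function names are hypothetical, and two details are assumptions that the text does not state: the 110 bp cutoff is compared against the average read length, and the average is rounded to a whole number.

```python
def automatic_bubble_size(read_lengths):
    """Return the bubble size described for the automatic setting.

    Assumptions (not stated in the manual): the 110 bp cutoff is tested
    against the average read length, and the average is rounded.
    """
    if not read_lengths:
        raise ValueError("at least one read length is required")
    average = sum(read_lengths) / len(read_lengths)
    if average < 110:
        # Reads shorter than 110 bp: fixed bubble size of 50.
        return 50
    # Longer reads: the bubble size is the average read length.
    return round(average)


def word_size_range(is_64_bit):
    """Allowed word sizes: 12-24 on 32-bit and 12-64 on 64-bit computers."""
    return (12, 64) if is_64_bit else (12, 24)


def validate_word_size(word_size, is_64_bit=True):
    """Return word_size unchanged, or raise if it is outside the range."""
    low, high = word_size_range(is_64_bit)
    if not low <= word_size <= high:
        raise ValueError(
            f"word size {word_size} is outside the range {low}-{high}"
        )
    return word_size
```

For example, `automatic_bubble_size([100, 100, 100])` gives 50, while a data set of 150 bp reads gives 150.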

The next option is to specify Guidance only reads. Only the pair information from these reads will be used, and the reads will contribute only to the scaffolding step. The construction of the word table and the graph will not be based on these reads. An example use case is SOLiD data, which has a high error rate when used in base space. By using SOLiD reads for guidance only, it is possible to make use of the pair information without the errors complicating the graph.

You can also specify the Minimum contig length when doing de novo assembly. Contigs shorter than this length will not be reported. The default value is 200 bp.
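As an illustration of what the minimum contig length does, the following Python sketch filters a set of contigs by length. It is not the Workbench's implementation. The function name, the constant name and the dictionary layout are hypothetical; only the 200 bp default comes from the text.

```python
DEFAULT_MIN_CONTIG_LENGTH = 200  # default value given in the text, in bp


def reported_contigs(contigs, min_length=DEFAULT_MIN_CONTIG_LENGTH):
    """Keep only the contigs that are at least min_length bases long.

    contigs is a hypothetical mapping of contig name to sequence string.
    """
    return {
        name: sequence
        for name, sequence in contigs.items()
        if len(sequence) >= min_length
    }


contigs = {
    "contig_1": "A" * 199,   # one base too short, so not reported
    "contig_2": "C" * 200,   # exactly at the threshold, so reported
    "contig_3": "G" * 1500,
}
kept = reported_contigs(contigs)
```

Here `kept` contains `contig_2` and `contig_3` only.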

Finally, there is an option to Perform scaffolding. The scaffolding step is explained in greater detail in Optimization of the graph using paired reads. Selecting this option also causes scaffolding annotations to be added to the contig sequences (except when you also choose to Update contigs, see below).

When you click Next, you will see the dialog shown in figure 28.19.

Figure 28.19: Parameters for mapping reads back to the contigs.

At the top, you choose whether a read mapping should be performed after the initial contig creation. If you choose to do that, you can specify the parameters for the read mapping. These are all explained in The read mapper tool.

At the bottom, you can choose to Update contigs based on mapped reads. This means that the original contig sequences produced by the de novo assembly will be updated to reflect the mapping of the reads. In most cases this will mean no change, but in some cases the subsequent mapping step provides new information. In effect, all contig sequences in the output will be supported by at least one read mapped back. Note that if this option is selected, contig lengths may fall below the threshold specified in figure 28.18, because this threshold is applied to the original contig sequences. If the Update contigs based on mapped reads option is not selected, the original contig sequences from the assembler will be preserved completely, even where the reads that are mapped back do not support them.

If you update the contigs, scaffolding annotations will not be added to the contig sequences. Because the contig sequences may change in this process, it is not possible to place these annotations correctly. This in turn affects the de novo assembly report, which will not contain statistics about scaffolding when the update contigs option is selected.
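The way these options interact can be summarized in a small Python sketch. It is only a model of the behavior described in this section, not Workbench code. The class, field and key names are hypothetical. It also assumes that updating contigs only takes effect when reads are mapped back, and that scaffolding statistics appear in the report only when scaffolding is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyOptions:
    """Hypothetical container for three of the dialog options."""
    perform_scaffolding: bool
    map_reads_back: bool
    update_contigs: bool


def expected_outputs(options):
    """Summarize what the output contains for a given option combination."""
    # Assumption: contigs can only be updated if reads are mapped back.
    updating = options.map_reads_back and options.update_contigs
    # Annotations are added when scaffolding, unless contigs are updated.
    annotated = options.perform_scaffolding and not updating
    return {
        "scaffolding_annotations": annotated,
        # The report has no scaffolding statistics when contigs are updated.
        "scaffolding_statistics_in_report": annotated,
        # The length threshold applies to the original contig sequences.
        "contigs_may_fall_below_minimum_length": updating,
    }


summary = expected_outputs(
    AssemblyOptions(
        perform_scaffolding=True,
        map_reads_back=True,
        update_contigs=True,
    )
)
```

With all three options selected, `summary` reports no scaffolding annotations and no scaffolding statistics, and flags that contigs may end up shorter than the minimum contig length.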