Specifying information for the assembly
The options in this section can change the results of the assembly. The How it works section of the manual explains in more detail how these settings may affect an assembly.
-w <n> / --wordsize <n>
Set the word size for the de Bruijn graph to n. The default depends on the size of the input, as described in the How it works section of the manual.
-b <n> / --bubblesize <n>
Set the maximum bubble size for the de Bruijn graph. The default is 50 bases.
-e <file> / --estimatedistances <file>
This option estimates the distances for paired reads as observed within unscaffolded contigs. These distances are then used in the scaffolding step. If multiple sets of paired data have been input, the distances are estimated separately for each dataset. The estimated distances are saved to the file given as the argument to this option.
When this flag is used, the program will aim to identify tight distance intervals from areas containing a substantial number of the mapped reads for each dataset.
There are situations where it is not possible to estimate accurate paired distances from the data, such as:
- No single best candidate interval can be identified, because the two best candidate intervals differ by only a factor of two in the number of pairs they contain.
- The best interval suggests a negative average distance for the paired reads in a dataset.
- More than half the reads have the wrong relative orientation.
- The best interval contains less than 1% of the mapped reads in that dataset.
If it is not possible to estimate an accurate distance from the data for any particular paired read set, then the original paired distance entered as part of the parameter settings associated with the -p flag will be used. Errors and warnings associated with such situations will be written to the file specified with the -e parameter.
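The four failure conditions and the fallback to the -p distance can be summarised in a short Python sketch. This is an illustration only, not the program's actual implementation: the IntervalStats structure and the function names are hypothetical, counts are expressed in pairs rather than reads, and "differ only by a factor of two" is assumed to mean that the best interval holds at most twice as many pairs as the runner-up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IntervalStats:
    """Summary of one candidate distance interval (hypothetical structure)."""
    pair_count: int       # pairs whose observed distance falls in the interval
    mean_distance: float  # average observed distance of those pairs


def estimation_problems(best: IntervalStats,
                        runner_up: Optional[IntervalStats],
                        mapped_pairs: int,
                        wrong_orientation_pairs: int) -> List[str]:
    """Return the reasons why an estimate for one dataset is not trusted."""
    problems = []
    # Assumption: "no best candidate" means the best interval holds at most
    # twice as many pairs as the second-best interval.
    if runner_up is not None and best.pair_count <= 2 * runner_up.pair_count:
        problems.append("no clear best candidate interval")
    if best.mean_distance < 0:
        problems.append("negative average distance")
    if wrong_orientation_pairs > mapped_pairs / 2:
        problems.append("more than half have the wrong relative orientation")
    if best.pair_count < 0.01 * mapped_pairs:
        problems.append("best interval holds less than 1% of the mapped data")
    return problems


def distance_to_use(problems: List[str],
                    estimated: float,
                    supplied_with_p: float) -> float:
    """Fall back to the distance given with -p when estimation failed."""
    return supplied_with_p if problems else estimated


# Example: a clean dataset keeps its estimate, an ambiguous one falls back.
clean = estimation_problems(IntervalStats(900, 310.0),
                            IntervalStats(50, 800.0), 1000, 20)
ambiguous = estimation_problems(IntervalStats(100, 310.0),
                                IntervalStats(60, 800.0), 1000, 20)
print(distance_to_use(clean, 310.0, 250.0))
print(distance_to_use(ambiguous, 310.0, 250.0))
```

In this sketch each dataset is checked independently, mirroring the per-dataset estimation described above, and the returned list of problems stands in for the warnings that are written to the file given with -e.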