Randomness in the results

A side effect of the very compact data structures needed to keep memory consumption low is that the results vary slightly from run to run, even on the same data set. When counting the number of occurrences of a word, the assembler does not keep track of the exact number (which would consume a lot of memory) but uses an approximation that relies on probability calculations. When running on a multi-threaded CPU, the data structure is built in a different way on each run, so the probability calculations for certain parts of the algorithm differ a bit from run to run. This leads to differences in the results.
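
To make the idea concrete, here is a minimal Python sketch of one well-known approximate counting technique, a Morris-style counter. It is an illustration only: the manual does not say which approximation the assembler uses, and the function name `approximate_count` and the use of a random seed to stand in for run-to-run differences are assumptions made for this example.

```python
import random


def approximate_count(n_events, rng):
    """Estimate n_events while storing only a small exponent.

    Hypothetical illustration (Morris-style counter); this is NOT
    the assembler's actual data structure.
    """
    exponent = 0
    for _ in range(n_events):
        # Increment the stored exponent with probability 2**-exponent,
        # so the counter grows roughly logarithmically with the true count.
        if rng.random() < 2.0 ** -exponent:
            exponent += 1
    # 2**exponent - 1 is an unbiased estimate of the true count.
    return 2 ** exponent - 1


if __name__ == "__main__":
    true_count = 1000
    # Each seed plays the role of a separate run on the same data set.
    for seed in (1, 2, 3):
        estimate = approximate_count(true_count, random.Random(seed))
        print(f"run with seed {seed}: estimated count = {estimate}")
```

The counter needs only a few bits per word instead of a full integer, but the estimate depends on the sequence of random decisions, so two runs over identical input can report different counts.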

The differences are minor and do not affect the overall results. Keep in mind that, whether you use CLC bio's assembler or another assembler, there is never one correct answer to the problem of de novo assembly. From this perspective, the small differences should not be considered a problem.
