The system requirements of CLC Genome Finishing are the same as for CLC Genomics Workbench, except in the cases described below.
Most types of analyses in the Join Contigs tool run in a single thread. The exception is the long read scaffolding option, which uses the CLC read mapper and can therefore use all available cores in a system. Because mapping reads to contigs is one of the most time-consuming steps in long read scaffolding, a machine with many cores is often an advantage for this type of analysis.
The memory requirements of the Join Contigs tool can exceed the recommended memory requirements for CLC Genome Finishing. The memory required for joining contigs depends on several factors, described below, and the maximum memory consumption of an analysis cannot be predicted in advance. For most bacterial data sets, the Join Contigs tool can be run on a machine that fulfills the system requirements for CLC Genome Finishing. Some examples where more memory may be needed:
- Long read scaffolding using long reads with a high error rate, such as PacBio reads, on a machine with many cores.
- Running the tool on highly fragmented assemblies.
- A large genome.
To help estimate memory requirements for both bacterial-sized and larger genomes, some examples are given below. Memory consumption was measured on a machine with four cores; long read scaffolding can consume more memory on machines with more cores.
|Organism (genome size)||Scaffolding method||Input reads||Memory required|
|E. coli (4.6 Mbp)||Long read scaffolding + reference based scaffolding||273,232 454 reads, avg. length 514 bp||5 GB|
|S. cerevisiae (12.5 Mbp)||Paired read scaffolding||22,262,792 Illumina reads||5 GB|
|E. coli (4.6 Mbp)||Long read scaffolding||163,478 PacBio reads, avg. length 6.5 kbp||8 GB|
|B. lactucae (88 Mbp)||Long read scaffolding||6,086,612 PacBio reads, avg. length 2.4 kbp||10 GB|
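The measured values in the table can be used as a rough pre-flight check before starting a run. The Python sketch below is not a memory predictor, since the maximum memory consumption of an analysis cannot be predicted. It only lists which of the measured example runs peaked above a given amount of RAM, and flags machines with more than the four cores used for the measurements. The names `MeasuredRun`, `runs_exceeding` and `preflight_report` are hypothetical, and reading physical RAM through `os.sysconf` assumes a Linux system.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MeasuredRun:
    """One row of the measured examples (recorded on a four-core machine)."""
    organism: str
    genome_mbp: float
    analysis: str
    memory_gb: int


# Values copied from the table of measured examples.
MEASURED_RUNS: List[MeasuredRun] = [
    MeasuredRun("E. coli", 4.6, "long read + reference based scaffolding, 454 reads", 5),
    MeasuredRun("S. cerevisiae", 12.5, "paired read scaffolding, Illumina reads", 5),
    MeasuredRun("E. coli", 4.6, "long read scaffolding, PacBio reads", 8),
    MeasuredRun("B. lactucae", 88.0, "long read scaffolding, PacBio reads", 10),
]

# Number of cores on the machine used for the measurements.
MEASURED_CORES = 4


def physical_memory_gb() -> float:
    """Total physical RAM in GB (1 GB = 1024**3 bytes), via POSIX sysconf (assumes Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def runs_exceeding(available_gb: float, runs: List[MeasuredRun] = MEASURED_RUNS) -> List[MeasuredRun]:
    """Return the measured runs whose memory use was larger than available_gb."""
    return [r for r in runs if r.memory_gb > available_gb]


def preflight_report(available_gb: float, cores: int) -> List[str]:
    """Build warning lines; an empty list means no measured example exceeded the RAM."""
    lines = []
    for r in runs_exceeding(available_gb):
        lines.append(f"{r.organism} ({r.genome_mbp} Mbp), {r.analysis}: measured {r.memory_gb} GB")
    # Long read scaffolding can use more memory on machines with more cores
    # than the four-core machine the measurements were taken on.
    if cores > MEASURED_CORES:
        lines.append(f"{cores} cores > {MEASURED_CORES}: long read scaffolding may need more memory than measured")
    return lines


if __name__ == "__main__":
    for line in preflight_report(physical_memory_gb(), os.cpu_count() or 1):
        print(line)
```

For example, `preflight_report(8, 4)` returns a single warning for the B. lactucae run, because 8 GB of RAM is below its measured 10 GB; all other examples fit.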
The tools for error correction and de novo assembly of raw PacBio reads in the CLC Genome Finishing Module can generate high-quality assemblies in a fraction of the time needed by leading alternatives, while consuming less than 10 percent of the memory used by alternative solutions (see https://digitalinsights.qiagen.com/wp-content/uploads/2015/07/pac-bio-benchmark-data.png).
To help estimate memory requirements, some real-world examples are given in the table below.
|Organism||SMRT cells||Memory required|