System requirements

The system requirements of CLC Genome Finishing are the same as for CLC Genomics Workbench, except in the cases described below.

Special requirements for Join Contigs

Most types of analyses in the Join Contigs tool run in a single thread. An exception is the long reads scaffolding option, which utilizes the CLC read mapper and can therefore use all available cores in a system. Because mapping reads to contigs is one of the most time-consuming steps in long reads scaffolding, it is often an advantage to use a machine with many cores for this type of analysis.

The memory requirements for the Join Contigs tool can exceed the recommended memory requirements for CLC Genome Finishing. The memory required for joining contigs depends on several factors, as described below, and it is not possible to predict the maximum memory consumption for an analysis. For most bacterial data sets, it will be possible to run the Join Contigs tool on a machine that fulfills the system requirements for CLC Genome Finishing. Some examples where more memory can be needed:

To help estimate the required memory, both for bacterial-sized genomes and for larger genomes, some examples are given below. The memory consumption was measured on a machine with four cores; memory consumption for long reads scaffolding can be larger on machines with more cores.

Organism                  Analysis                                             Reads                                         Memory required
E. coli (4.6 Mbp)         Long read scaffolding + Reference based scaffolding  273,232 454 reads (avg. length 514 bp)        5 GB
S. cerevisiae (12.5 Mbp)  Paired read scaffolding                              22,262,792 Illumina reads                     5 GB
E. coli (4.6 Mbp)         Long read scaffolding                                163,478 PacBio reads (avg. length 6.5 kbp)    8 GB
B. lactucae (88 Mbp)      Long read scaffolding                                6,086,612 PacBio reads (avg. length 2.4 kbp)  10 GB
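
As a rough aid for reading the measurements above, the following Python sketch lists which of the example runs would fit within a given amount of memory. It is only an illustration: the names `JOIN_CONTIGS_EXAMPLES` and `examples_within` are hypothetical and not part of any CLC software, and the figures are the four-core measurements from the table, not upper bounds. As noted above, the actual memory consumption cannot be predicted and can be larger on machines with more cores.

```python
# Measured memory use (in GB) for the Join Contigs examples listed above.
# Values were measured on a four-core machine; they are examples, not limits.
# All names here are illustrative and not part of any CLC API.
JOIN_CONTIGS_EXAMPLES = [
    ("E. coli (4.6 Mbp)", "Long read scaffolding + Reference based scaffolding", 5),
    ("S. cerevisiae (12.5 Mbp)", "Paired read scaffolding", 5),
    ("E. coli (4.6 Mbp)", "Long read scaffolding", 8),
    ("B. lactucae (88 Mbp)", "Long read scaffolding", 10),
]


def examples_within(available_gb):
    """Return the example runs whose measured memory fits in available_gb."""
    return [row for row in JOIN_CONTIGS_EXAMPLES if row[2] <= available_gb]


if __name__ == "__main__":
    # List the example analyses that stayed within 8 GB in the measurements.
    for organism, analysis, memory_gb in examples_within(8):
        print(f"{organism}: {analysis} ({memory_gb} GB)")
```

A data set comparable to one of the listed examples can be expected to need roughly the listed amount of memory, but larger genomes, more reads, or more cores may push the requirement higher.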

Special requirements for Correct PacBio Reads (legacy) and De Novo Assemble PacBio Reads (legacy)

The tools for error correction and de novo assembly of raw PacBio reads in the CLC Genome Finishing Module can generate high-quality assemblies in a fraction of the time needed by leading alternatives, while consuming less than 10 percent of the memory used by those alternatives (see https://digitalinsights.qiagen.com/wp-content/uploads/2015/07/pac-bio-benchmark-data.png).

To help estimate the required memory, some real-world examples are given in the table below.

Organism       SMRT cells  Memory required
E. coli        1           4 GB
S. cerevisiae  11          9 GB
C. elegans     11          11 GB