System requirements

Memory and CPU settings for mapping reads

For mapping reads to the human genome (approximately 3.2 gigabases), or genomes of a similar size, 16 GB RAM is required. Systems with less memory can be used when mapping to small genomes.

Larger amounts of memory can help the overall speed of the analysis when working with large datasets, but little gain is expected above about 32 GB of RAM.

Increasing the number of CPUs can decrease the time a read mapping takes. However, the performance gain is expected to be limited above approximately 40 threads.
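As a rough illustration, the memory and thread figures above can be turned into a small pre-flight check. This is a minimal sketch, not part of the CLC software: the function name assess_mapping_host and the constant names are hypothetical, and the "about 32 GB" and "approximately 40 threads" figures are treated as hard cutoffs purely for simplicity.

```python
import os

# Thresholds taken from the text above; the names are hypothetical.
REQUIRED_RAM_GB = 16   # required for human-sized genomes
RAM_PLATEAU_GB = 32    # little gain expected above about this much RAM
THREAD_PLATEAU = 40    # limited gain expected above about this many threads


def assess_mapping_host(ram_gb, threads):
    """Compare a host's RAM and thread count against the stated figures."""
    notes = []
    if ram_gb < REQUIRED_RAM_GB:
        notes.append("RAM below 16 GB: only suitable for mapping to small genomes")
    elif ram_gb > RAM_PLATEAU_GB:
        notes.append("RAM above 32 GB: little further speed gain expected")
    else:
        notes.append("RAM sufficient for human-sized genomes")
    if threads > THREAD_PLATEAU:
        notes.append("More than 40 threads: limited further gain expected")
    return notes


# Example: RAM is supplied by hand; the logical CPU count is detected.
for note in assess_mapping_host(ram_gb=32, threads=os.cpu_count() or 1):
    print(note)
```

The RAM value is passed in rather than detected, because operating systems typically report slightly less than the nominal installed memory, which would make a strict comparison against 16 GB misleading.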

System requirements for de novo assembly

De novo assembly may need more memory than stated above. The amount depends on the number of reads, their error profile, and the complexity and size of the genome.

When assembling PacBio HiFi reads with the De Novo Assemble Long Reads tool, at least 32 GB RAM is recommended.

For examples of the memory usage of various data sets when using the De Novo Assembly tool, see https://resources.qiagenbioinformatics.com/white-papers/White_paper_on_de_novo_assembly_4.pdf.

Other requirements for long read analysis

The following tools for working with long reads require an AMD/Intel CPU that supports AVX2 or an Apple M series CPU:

Note that the Polish Contigs with Reads tool uses Racon [Vaser et al., 2017], which consumes memory proportional to the size of the required input files during polishing.
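On Linux, one way to check the AVX2 requirement mentioned above is to look for the avx2 flag in /proc/cpuinfo. The following is a minimal sketch under that assumption; it is not a CLC utility, the function names are hypothetical, and it does not cover Windows, macOS, or Apple M series CPUs.

```python
import os


def cpuinfo_has_avx2(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def host_has_avx2(path="/proc/cpuinfo"):
    """Check the current Linux host; return None if the file is unavailable."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8", errors="replace") as handle:
        return cpuinfo_has_avx2(handle.read())


print("AVX2 detected:", host_has_avx2())
```

A result of None means the check could not be performed on this platform, not that AVX2 is absent.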

System requirements for 3D viewers

A graphics card that supports OpenGL 2.0 is required.

Note: 3D rendering is only supported when the CLC Genomics Workbench is installed on the same machine the viewer is opened on. Indirect rendering (such as X11 forwarding through ssh), remote desktop connection/VNC, and running in virtual machines are not supported.

Performance on large systems

The performance of tools that take advantage of multiple cores does not scale linearly with high numbers of cores. With a large system (e.g. >64 cores), a CLC Genomics Server with job nodes running on virtual machines would provide the potential to use more of the compute capacity in a controlled manner than a CLC Genomics Workbench. Jobs submitted to a CLC Genomics Server job node setup can be run in parallel, with appropriate CPU limits configurable for each node. For further details, see https://resources.qiagenbioinformatics.com/manuals/clcserver/current/admin/index.php?manual=Introduction.html.
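To illustrate why speedup flattens at high core counts, the sketch below uses Amdahl's law, a general model of parallel speedup. It is used here only as an illustration: the source does not state that CLC tools follow this model, and the parallel fraction of 0.95 is a hypothetical value, not a measurement.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Theoretical speedup under Amdahl's law for a given core count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload in which 95% of the run time can be parallelized.
for cores in (8, 16, 32, 64, 128):
    print(f"{cores:>3} cores: {amdahl_speedup(0.95, cores):.1f}x")
```

Under this assumed parallel fraction, doubling from 64 to 128 cores adds far less than doubling from 8 to 16. This is consistent with the suggestion above to split a large machine into several job nodes with CPU limits rather than give all cores to a single job.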

CLC data location requirements

Requirements for CLC data locations are provided in Adding locations.