Polish with Reads
The Polish with Reads tool refines a set of sequences using high-quality reads. This enables hybrid assemblies: an assembly is first created from long, error-prone reads, and the resulting contigs are then polished with high-quality Illumina reads.
Before polishing, adapters and low-quality bases should be trimmed from the high-quality reads. This can be done using Trim Reads (http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Trim_Reads.html).
The tool uses Racon [Vaser et al., 2017] with additional improvements inspired by Minipolish [Wick and Holt, 2019]. Racon uses a divide-and-conquer approach for rapid consensus calling. Partial order alignment (POA) of the reads against the target sequences is computed in non-overlapping windows along each target sequence. As a consequence, Racon may not always use a globally optimal alignment of reads for consensus calling.
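The windowing idea can be illustrated with a minimal Python sketch. It is not Racon's code: the function name `split_into_windows` and the window size of 500 are assumptions chosen for illustration. The sketch only shows how a target of a given length is cut into non-overlapping intervals, each of which would then receive its own consensus.

```python
def split_into_windows(target_length, window_size):
    """Return half-open (start, end) intervals that tile the target without overlap."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        (start, min(start + window_size, target_length))
        for start in range(0, target_length, window_size)
    ]


# A 1,200 bp target with a hypothetical window size of 500 gives three windows;
# the last one is shorter because the target length is not a multiple of 500.
windows = split_into_windows(1200, 500)
print(windows)  # [(0, 500), (500, 1000), (1000, 1200)]
```

Because each window is handled on its own, the part of a read alignment that falls in one window is considered independently of the part in the next window, which is one way to see why the overall result need not be globally optimal.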
Polishing is conducted in two steps. Following each step, a set of corrections, inspired by Minipolish [Wick and Holt, 2019], is carried out to improve contig quality:
- Contig ends that were truncated by Racon are reintroduced.
- Circular contigs are rotated by half the sequence length in each polishing step.
- For circular contigs, the mapping is corrected for reads that span the junction.
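The rotation of circular contigs can be pictured with a short Python sketch. This is an illustration only, not the tool's implementation, and `rotate_by_half` is a hypothetical helper. Rotating a circular contig moves the former start/end junction into the interior of the sequence, presumably so that the region around it is covered like any other region in the following pass.

```python
def rotate_by_half(sequence):
    """Rotate a circular sequence so that it starts at its midpoint."""
    midpoint = len(sequence) // 2
    return sequence[midpoint:] + sequence[:midpoint]


contig = "AAAACCCC"  # toy circular contig; the junction sits between the last C and the first A
rotated = rotate_by_half(contig)
print(rotated)                            # CCCCAAAA: the old junction is now in the middle
print(rotate_by_half(rotated) == contig)  # True for even-length sequences
```

For a sequence of even length, two such rotations restore the original start position; for odd lengths the start ends up shifted by one base.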
To run the Polish with Reads tool, go to:
Toolbox | Long Read Support | Polish with Reads
Select a sequence list containing contigs or long reads.
In the next dialog, set the polishing parameters (figure 6.1):
Figure 6.1: Polish with Reads parameters
- Reads. Select a sequence list of trimmed, high-quality reads to be used for polishing.
- POA window size. The size of the windows in which Racon computes partial order alignments (POA). A larger window captures more of the global structure during polishing, but it also increases the memory requirement.
- Include unpolished sequences. Check the checkbox to keep sequences for which polishing was not possible.
- Minimum sequence length. The minimum length of sequences to be included in the output.
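How the last two options might interact can be sketched in Python. The record layout (name, sequence, polished flag) and the function `select_output` are hypothetical, and the sketch assumes that the length filter applies to polished and unpolished sequences alike; the tool's actual behavior may differ.

```python
def select_output(records, include_unpolished, minimum_length):
    """Return the names of records that pass both output options.

    records: iterable of (name, sequence, was_polished) tuples (assumed layout).
    """
    selected = []
    for name, sequence, was_polished in records:
        if not was_polished and not include_unpolished:
            continue  # drop sequences that could not be polished
        if len(sequence) < minimum_length:
            continue  # drop sequences shorter than the minimum length
        selected.append(name)
    return selected


records = [
    ("contig_1", "ACGT" * 50, True),   # 200 bp, polished
    ("contig_2", "ACGT" * 5, True),    # 20 bp, polished but short
    ("contig_3", "ACGT" * 50, False),  # 200 bp, not polished
]
print(select_output(records, include_unpolished=False, minimum_length=100))  # ['contig_1']
print(select_output(records, include_unpolished=True, minimum_length=100))   # ['contig_1', 'contig_3']
```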