PacBio De Novo Assembly Pipeline

This template workflow will be retired in a future version of the software. It has been replaced by De Novo Assemble Long Reads and Polish with Short Reads, available from the Long Read Support plugin (see http://resources.qiagenbioinformatics.com/manuals/longreadsupport/current/index.php?manual=De_Novo_Assemble_Long_Reads_Polish_with_Short_Reads.html).

The PacBio De Novo Assembly Pipeline (legacy) template workflow is at:

        Toolbox | Legacy Tools | PacBio De Novo Assembly Pipeline (legacy)

Please note that the tools Correct PacBio Reads (legacy) and De Novo Assemble PacBio Reads (legacy) are optimized for PacBio data and readily support data generated with different generations of PacBio chemistry (sequencing reagents). Because of these algorithmic optimizations, using these tools for other data types is not supported. Moreover, the Correct PacBio Reads (legacy) tool relies on certain methods that are the intellectual property of Pacific Biosciences. The use of the Correct PacBio Reads (legacy) tool or the predefined workflow PacBio De Novo Assembly Pipeline (legacy) with any data other than data generated on a Pacific Biosciences instrument constitutes a violation of the end user license agreement that users of the CLC Genome Finishing Module agree to during installation.

The template workflow takes imported PacBio reads as input and produces a high-quality assembly together with a number of reports that can be used to evaluate the quality of both the input data and the assembly. It consists of seven steps:

  1. Raw PacBio reads import: Raw PacBio reads are imported from FASTQ or H5 files (see http://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Import_high_throughput_sequencing_data.html).
  2. Correct PacBio Reads (legacy): In a subset of the longest reads in the input data set, sequencing errors are corrected and chimeric reads and untrimmed adapters are resolved. The corrected reads are output in a file named 'Corrected reads', and a summary of the error correction is saved in a file named 'Corrected reads - report'. This report can be used both to evaluate the quality of the input reads and to assess the error-correction and assembly parameters.
  3. De Novo Assemble PacBio Reads (legacy): The error-corrected reads are assembled into high-quality contigs.
  4. Map Reads to Contigs: The corrected reads are mapped to the contigs so that the Join Contigs tool can be run.
  5. Join Contigs: Contigs are joined by automatic scaffolding based on the read mapping created in step 4. The final contigs are saved to a file named 'Contig sequences'.
  6. Map Reads to Contigs: The corrected reads are mapped to the final contigs so that the Analyze Contigs tool can be run. Together with the output from the Analyze Contigs tool, this read mapping can also be used to evaluate the support for each contig and to manually identify and resolve possible assembly errors. The read mapping is saved to a file named 'Corrected reads mapped to contigs', and a report summarizing the read mapping is saved to a file named 'Corrected reads mapped to contigs - report'.
  7. Analyze Contigs: The final contigs are analyzed to find problematic regions that may need manual curation. A summary of the analysis is saved to a file named 'Contig analysis report', and the problematic regions are reported in a file named 'Contig analysis table'.
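The data flow through the seven steps can be sketched as plain Python data, purely as an illustration of which step depends on which. This is not a CLC Workbench API or scripting interface. Only the step names and the saved output names are taken from the list above; the intermediate item labels ("raw reads", "initial contigs", "initial mapping", and so on) and the helper check_and_collect are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    inputs: tuple[str, ...]     # data items consumed (hypothetical labels)
    produces: tuple[str, ...]   # data items passed on to later steps (hypothetical labels)
    saved: tuple[str, ...] = () # output files the workflow saves (names from the manual)


# The seven steps in workflow order.
PIPELINE = [
    Step("Raw PacBio reads import", (), ("raw reads",)),
    Step("Correct PacBio Reads (legacy)", ("raw reads",), ("corrected reads",),
         ("Corrected reads", "Corrected reads - report")),
    Step("De Novo Assemble PacBio Reads (legacy)", ("corrected reads",),
         ("initial contigs",)),
    Step("Map Reads to Contigs", ("corrected reads", "initial contigs"),
         ("initial mapping",)),
    Step("Join Contigs", ("initial contigs", "initial mapping"), ("final contigs",),
         ("Contig sequences",)),
    Step("Map Reads to Contigs", ("corrected reads", "final contigs"), ("final mapping",),
         ("Corrected reads mapped to contigs",
          "Corrected reads mapped to contigs - report")),
    Step("Analyze Contigs", ("final contigs", "final mapping"), (),
         ("Contig analysis report", "Contig analysis table")),
]


def check_and_collect(steps):
    """Hypothetical helper: verify that every step's inputs were produced by an
    earlier step, and return the names of the saved outputs in order."""
    available = set()
    saved = []
    for i, step in enumerate(steps, start=1):
        missing = [item for item in step.inputs if item not in available]
        if missing:
            raise ValueError(f"step {i} ({step.name}) is missing inputs: {missing}")
        available.update(step.produces)
        saved.extend(step.saved)
    return saved


if __name__ == "__main__":
    for output_name in check_and_collect(PIPELINE):
        print(output_name)
```

Running the sketch prints the seven saved output names in workflow order. Removing or reordering a step (for example, dropping the first Map Reads to Contigs step) raises a ValueError. This mirrors why each read-mapping step must run before the tool that consumes it.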