PacBio Onso
The PacBio Onso importer is designed to import fastq (.fastq/.fq) files generated by PacBio Onso sequencing technology. Uncompressed files as well as files compressed using gzip (.gz), zip (.zip) or bzip2 (.bz2) can be provided as input. Quality scores are expected to be in the NCBI/Sanger format (see Quality scores in the Illumina platform). The importer processes UMI information from the fastq read headers (see General notes on UMIs).
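As an illustration of the NCBI/Sanger quality format mentioned above, the following minimal Python sketch decodes a fastq quality line. It relies on the standard Sanger convention, where each character encodes a Phred score as its ASCII code minus 33. The function names are hypothetical and are not part of the importer.

```python
def decode_sanger_qualities(quality_line):
    """Convert a Sanger-encoded (Phred+33) quality string to a list of Phred scores."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(phred):
    """Return the base-call error probability for a Phred score: 10^(-Q/10)."""
    return 10 ** (-phred / 10)


if __name__ == "__main__":
    scores = decode_sanger_qualities("!5I")
    print(scores)
    print([error_probability(q) for q in scores])
```

A quality character of "!" corresponds to a Phred score of 0, and "I" to a score of 40.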
To launch the PacBio Onso importer, go to:
Import | PacBio | PacBio Onso
This opens a dialog where files can be selected and import options specified (figure 7.11).
Figure 7.11: Importing data from PacBio Onso.
The General options are:
- Paired reads.
Files will be paired up based on their names, which are assumed to contain _R1 and _R2 (alternatively, _1 and _2), respectively. Other than the R1/R2 (or the 1/2), the file names in a pair are expected to be identical.
Under Paired read information:
- Choose the orientation of the paired reads, either Forward-reverse or Reverse-forward.
- Specify the insert sizes by setting Minimum distance and Maximum distance. Data sets with different insert sizes should be imported separately, with the correct minimum and maximum distance.
Read more about handling paired data in General notes on handling paired data.
- Discard read names. Read names can be discarded to save disk space without affecting analysis results. Keeping read names can be useful in some circumstances, such as when inspecting sequence list contents or when working downstream with subsets of sequences.
- Discard quality scores. Quality scores are visible in read mappings and are used by various tools, e.g. for variant detection. If quality scores are not relevant, use this option to discard them and reduce disk space and memory consumption.
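The name-based pairing described under Paired reads can be sketched in Python as follows, for example to check before import which files would form pairs. This is a rough approximation under stated assumptions, not the importer's actual matching logic: it assumes the _R1/_R2 (or _1/_2) token is followed by "_", "." or the end of the name, that the last such token in the name is the one that counts, and that _R1/_R2 takes precedence over _1/_2. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: try the _R1/_R2 convention first, then fall back to _1/_2.
_MATE_PATTERNS = [
    re.compile(r"_R([12])(?=[_.]|$)"),
    re.compile(r"_([12])(?=[_.]|$)"),
]


def pair_key(filename):
    """Return (key, mate), where key is the name with the mate digit masked.

    Returns None if the name carries no recognizable mate token.
    """
    for pattern in _MATE_PATTERNS:
        matches = list(pattern.finditer(filename))
        if matches:
            last = matches[-1]  # assumption: the last token in the name counts
            key = filename[:last.start(1)] + "#" + filename[last.end(1):]
            return key, int(last.group(1))
    return None


def pair_files(filenames):
    """Group names that are identical apart from the 1/2 mate digit."""
    groups = defaultdict(dict)
    for name in filenames:
        result = pair_key(name)
        if result is None:
            continue
        key, mate = result
        groups[key][mate] = name
    # Keep only complete pairs, ordered as (first mate, second mate).
    return sorted((g[1], g[2]) for g in groups.values() if 1 in g and 2 in g)


if __name__ == "__main__":
    for first, second in pair_files(
        ["s_L001_R1.fastq.gz", "s_L001_R2.fastq.gz", "x_1.fq", "x_2.fq"]
    ):
        print(first, "<->", second)
```

Files without a matching partner are left out of the result, which makes it easy to spot names that deviate from the expected convention.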
The PacBio Onso options are:
- Join reads from different lanes.
When checked, fastq files from the same sequencing run but from different lanes are imported as a single sequence list.
Lane information is expected in the filenames as "_L<digits>", e.g. "L001" for lane 1. If this pattern occurs more than once in a filename, the last occurrence in the name is used. For example, for a filename "myFile_L001_L1.fastq" the lane information is assumed to be L1.
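The lane rule above (the last "_L<digits>" occurrence is used) can be illustrated with a short Python sketch. It is only an approximation of the documented behavior: the regular expression, the function names, and the idea of grouping files by the name that remains after removing the lane token are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a lane token is an underscore, "L", and one or more digits.
_LANE = re.compile(r"_L(\d+)")


def split_lane(filename):
    """Return (name without the last lane token, lane label or None)."""
    matches = list(_LANE.finditer(filename))
    if not matches:
        return filename, None
    last = matches[-1]  # the last occurrence in the name is used
    base = filename[:last.start()] + filename[last.end():]
    return base, "L" + last.group(1)


def group_by_run(filenames):
    """Group files that differ only in their (last) lane token."""
    groups = defaultdict(list)
    for name in filenames:
        base, _lane = split_lane(name)
        groups[base].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(split_lane("myFile_L001_L1.fastq"))
    print(group_by_run(["a_L001_R1.fq", "a_L002_R1.fq", "b_L001_R1.fq"]))
```

For "myFile_L001_L1.fastq" the sketch reports lane L1, matching the example in the text, and treats "_L001" as part of the remaining file name.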