PacBio Long Reads
The PacBio Long Reads importer is designed to import long reads generated by PacBio sequencing technology.
To launch the PacBio Long Reads importer, go to:
Import | PacBio | PacBio Long Reads
This opens a dialog where files can be selected and import options specified (figure 7.10).
Note: PacBio HiFi reads are not fully supported. In particular, tools that carry out de novo assembly do not support this data type, and quality scores are capped at 64, which affects tools that make use of such scores, e.g. variant detection tools.
Figure 7.10: Importing data from PacBio.
The following file formats are supported:
- H5 files (.bas.h5/.bax.h5) which contain one of two things. .bas.h5 files produced by instruments prior to PacBio RS II contain sequencing data such as reads and quality scores. .bas.h5 files from more recent PacBio instruments contain a list of .bax.h5 files where the actual sequencing data is stored. When importing H5 files, the user needs to select both the .bas.h5 file and all the accompanying .bax.h5 files belonging to a data set.
- Fastq files (.fastq/.fq). Uncompressed files as well as files compressed using gzip (.gz), zip (.zip) or bzip2 (.bz2) can be provided as input. Quality scores are expected to be in the NCBI/Sanger format, see Quality scores in the Illumina platform. The importer processes UMI information from the fastq read headers, see General notes on UMIs.
- Fasta files (.fasta) which contain sequence data. Compressed Fasta (.fasta.gz) files are also supported.
- SAM or BAM files (.sam/.bam). Mapping information in the file is disregarded.
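Before starting an import, it can be handy to check that a folder only contains files the importer accepts. The following is a minimal Python sketch of such a check, based solely on the extensions listed above. It is not part of the Workbench, and the function name classify_pacbio_input is hypothetical. It assumes that Fasta files are only accepted uncompressed or gzip-compressed, as the list above mentions only .fasta and .fasta.gz.

```python
from pathlib import Path

# Compression suffixes listed for Fastq input in the text above.
COMPRESSION_SUFFIXES = (".gz", ".zip", ".bz2")


def classify_pacbio_input(filename: str) -> str:
    """Return a format label for a file name, or "unsupported".

    Hypothetical helper: it looks only at the file extension and
    does not inspect the file contents.
    """
    name = Path(filename).name.lower()

    if name.endswith((".bas.h5", ".bax.h5")):
        return "h5"
    if name.endswith((".sam", ".bam")):
        return "sam/bam"

    # Strip at most one compression suffix, remembering which one it was.
    compression = None
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            compression = suffix
            name = name[: -len(suffix)]
            break

    if name.endswith((".fastq", ".fq")):
        return "fastq"
    # Assumption: only plain and gzip-compressed Fasta are accepted.
    if name.endswith(".fasta") and compression in (None, ".gz"):
        return "fasta"
    return "unsupported"


if __name__ == "__main__":
    for example in ("run1.bas.h5", "reads.fq.gz", "reads.fasta.bz2", "notes.txt"):
        print(example, "->", classify_pacbio_input(example))
```

Note that for H5 data, a check like this only confirms the extensions. It does not verify that all .bax.h5 files belonging to a .bas.h5 file have been selected.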
The General options are:
- Discard read names. Read names can be discarded to save disk space without affecting analysis results. Keeping read names can be useful in some circumstances, such as when inspecting sequence list contents or when working downstream with subsets of sequences.
- Discard quality scores. Quality scores are visible in read mappings and are used by various tools, e.g. for variant detection. If quality scores are not relevant, use this option to discard them and reduce disk space and memory consumption. As PacBio quality scores currently contain very little information, we recommend that you discard them. When importing Fasta files, this option is not available, since Fasta files do not contain quality scores.
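To judge whether the quality scores in a Fastq file are worth keeping, it can help to look at the values they encode. In the NCBI/Sanger format, each character of the quality line represents a Phred score with an ASCII offset of 33, and a Phred score Q corresponds to an error probability of 10^(-Q/10). The sketch below decodes a quality line outside the Workbench. The function names are hypothetical and chosen for illustration only.

```python
from typing import List


def phred33_to_scores(quality_line: str) -> List[int]:
    """Decode a Sanger-format (Phred+33) quality line into integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(score: int) -> float:
    """Convert a Phred score Q into the error probability 10^(-Q/10)."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    scores = phred33_to_scores("!5I")
    print(scores)  # [0, 20, 40]
    print([error_probability(q) for q in scores])
```

If the decoded scores are nearly constant across reads, they carry little information, and discarding them on import costs little.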
The PacBio options are:
- Mark as HiFi reads. If checked, the reads will be recognized as PacBio HiFi sequencing reads as opposed to regular PacBio reads, allowing tools to apply HiFi-specific settings when relevant.
Click Next and choose how the result of the import should be handled. We recommend choosing Save, which saves the results directly to disk.
Mark imported PacBio reads as HiFi
To mark already imported PacBio reads as PacBio HiFi, change the so-called "Read group" value for the imported sequence list:
- Open the sequence list in the View Area.
- Click on the "Show Element Info" icon found at the bottom of the window.
- Click on Edit next to Read group.
- In the Platform dropdown, change from PACBIO to PACBIO_HIFI.
- Click on OK.