PacBio Long Reads
The PacBio Long Reads importer is designed to import long reads generated by PacBio sequencing technology.
To launch the PacBio Long Reads importer, go to:
Import | PacBio | PacBio Long Reads.
This opens a dialog where files can be selected and import options specified (figure 7.11).
Figure 7.11: Importing data from PacBio.
The following file formats are supported:
- H5 files (.bas.h5/.bax.h5). The content of a .bas.h5 file depends on the instrument that produced it. .bas.h5 files produced by instruments prior to PacBio RS II contain the sequencing data itself, such as reads and quality scores. .bas.h5 files from more recent PacBio instruments contain a list of .bax.h5 files, and the actual sequencing data is stored in those .bax.h5 files. When importing H5 files, select both the .bas.h5 file and all the accompanying .bax.h5 files belonging to the data set.
- Fastq (.fastq/.fq). Uncompressed files as well as files compressed using gzip (.gz), zip (.zip) or bzip2 (.bz2) can be provided as input. Quality scores are expected to be in the NCBI/Sanger format, see Quality scores in the Illumina platform. The importer processes UMI information from the fastq read headers, see General notes on UMIs.
- Fasta files (.fasta) which contain sequence data. Compressed Fasta (.fasta.gz) files are also supported.
- SAM or BAM files (.sam/.bam). Mapping information in the file is disregarded.
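When preparing a folder of files for import, it can help to check in advance which of the formats listed above each file name corresponds to. The following is a minimal sketch in Python, not part of the importer itself: the function name classify_pacbio_input and the "unknown" label are hypothetical, and the table of compression suffixes reflects only the combinations named in the list above. Any combination the list does not mention is reported as "unknown" rather than as unsupported.

```python
# Map of file name suffixes to the formats named in the list above.
FORMAT_SUFFIXES = {
    ".bas.h5": "h5",
    ".bax.h5": "h5",
    ".fastq": "fastq",
    ".fq": "fastq",
    ".fasta": "fasta",
    ".sam": "sam/bam",
    ".bam": "sam/bam",
}

# Compression suffixes the text explicitly mentions for each format.
# Assumption: anything not mentioned is treated as "unknown" here.
MENTIONED_COMPRESSION = {
    "h5": set(),
    "fastq": {".gz", ".zip", ".bz2"},
    "fasta": {".gz"},
    "sam/bam": set(),
}


def classify_pacbio_input(filename: str) -> str:
    """Return 'h5', 'fastq', 'fasta', 'sam/bam' or 'unknown' for a file name."""
    name = filename.lower()

    # Strip at most one trailing compression suffix and remember it.
    compression = ""
    for suffix in (".gz", ".zip", ".bz2"):
        if name.endswith(suffix):
            compression = suffix
            name = name[: -len(suffix)]
            break

    for suffix, fmt in FORMAT_SUFFIXES.items():
        if name.endswith(suffix):
            if compression and compression not in MENTIONED_COMPRESSION[fmt]:
                return "unknown"
            return fmt
    return "unknown"


if __name__ == "__main__":
    for example in ["run1.bas.h5", "reads.fq.bz2", "reads.fasta.gz", "notes.txt"]:
        print(example, "->", classify_pacbio_input(example))
```

The check looks only at file names, so it cannot tell whether a .bas.h5 file holds sequencing data or a list of .bax.h5 files.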
The General options are:
- Discard read names. Read names can be discarded to save disk space without affecting analysis results. Keeping read names can be useful in some circumstances, such as when inspecting sequence list contents or when working downstream with subsets of sequences.
- Discard quality scores. Quality scores are visible in read mappings and are used by various tools, e.g. for variant detection. If quality scores are not relevant, use this option to discard them and reduce disk space and memory consumption. As PacBio quality scores currently contain very little information, we recommend that you discard them. When importing Fasta files, this option is not available, since Fasta files do not contain quality scores.
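Before deciding whether to discard quality scores, it can be useful to look at what a Fastq file actually contains. The sketch below assumes the NCBI/Sanger encoding mentioned above, in which each quality character encodes a Phred score as its ASCII code minus 33. The function names are hypothetical and the code is independent of the importer.

```python
from typing import List


def decode_sanger_quality(quality_line: str) -> List[int]:
    """Decode a Fastq quality string in Sanger (Phred+33) encoding."""
    return [ord(character) - 33 for character in quality_line]


def mean_quality(quality_line: str) -> float:
    """Return the mean Phred score of one read, or 0.0 for an empty string."""
    scores = decode_sanger_quality(quality_line)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # '!' is the lowest score (0) and 'I' corresponds to a score of 40.
    print(decode_sanger_quality("!I"))
    print(mean_quality("II"))
```

If every read in a file decodes to nearly the same score, the quality line adds little beyond the sequence, which is the situation the recommendation above refers to.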
The PacBio options are:
- Mark as HiFi reads. If checked, the reads will be recognized as PacBio HiFi sequencing reads instead of regular reads, enabling tools to apply HiFi-specific settings when relevant. When importing SAM/BAM files where the Platform Model is set to HIFI (PM:HIFI), the reads are imported as HiFi reads regardless of whether Mark as HiFi reads is checked.
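To see in advance whether a SAM file will be imported as HiFi reads regardless of the checkbox, the read group lines in its header can be inspected. The following is a minimal sketch that works on header text only, for example the output of a header-printing tool or the top of a .sam file. It relies on the SAM convention that @RG lines are tab-separated TAG:VALUE fields with ID and PM tags. The function name hifi_read_groups is hypothetical, and comparing the PM value case-insensitively is an assumption, as the text above only mentions PM:HIFI.

```python
from typing import Dict


def hifi_read_groups(sam_header: str) -> Dict[str, bool]:
    """Map each read group ID in a SAM header to True if its PM tag is HIFI."""
    result: Dict[str, bool] = {}
    for line in sam_header.splitlines():
        if not line.startswith("@RG"):
            continue
        # Each field after the record type has the form TAG:VALUE.
        tags: Dict[str, str] = {}
        for field in line.split("\t")[1:]:
            key, _, value = field.partition(":")
            tags[key] = value
        # Assumption: the PM value is compared case-insensitively.
        result[tags.get("ID", "")] = tags.get("PM", "").upper() == "HIFI"
    return result


if __name__ == "__main__":
    header = (
        "@HD\tVN:1.6\tSO:unknown\n"
        "@RG\tID:rg1\tPL:PACBIO\tPM:HIFI\n"
        "@RG\tID:rg2\tPL:PACBIO\tPM:SEQUEL\n"
    )
    print(hifi_read_groups(header))
```

For BAM files the header is stored in binary form, so it would first have to be converted to text with a suitable tool before this function can be applied.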
Mark imported PacBio reads as HiFi
To mark already imported PacBio reads as PacBio HiFi, change the Read group value:
- Open the sequence list in the View Area.
- Click the "Show Element Info" icon at the bottom of the window.
- Click Edit next to Read group.
- In the Platform dropdown, change from PACBIO to PACBIO_HIFI.
- Click OK.