Choosing the BGI/MGI import will open the dialog shown in figure 2.1. This data type can also be imported using the on-the-fly import functionality available in workflows, described at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Launching_workflows_individually_in_batches.html.
The following file formats from the BGI and MGI systems can be imported:
- Fastq (.fastq/.fq). Quality scores are expected to be in the NCBI/Sanger format (https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Quality_scores_in_Illumina_platform.html). Compressed data in gzip format is also supported (.gz).
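To check that a file uses the expected quality encoding before importing it, the quality line of a record can be decoded by hand. The sketch below assumes the NCBI/Sanger convention of Phred scores stored as ASCII characters offset by 33; the function name is hypothetical and is not part of the Workbench.

```python
def decode_sanger_qualities(quality_line: str) -> list[int]:
    """Convert a FASTQ quality string (assumed Phred+33) to Phred scores."""
    scores = []
    for ch in quality_line:
        q = ord(ch) - 33  # Sanger encoding: ASCII code minus 33
        if q < 0 or q > 93:
            # Characters outside '!'..'~' cannot be Phred+33 scores.
            raise ValueError(f"character {ch!r} is outside the Phred+33 range")
        scores.append(q)
    return scores


print(decode_sanger_qualities("II5!"))  # prints [40, 40, 20, 0]
```

A ValueError here suggests that the file uses a different encoding or is damaged.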
The General options to the left are:
- Paired reads. The Workbench will pair up files based on the name of the first read in each file. At the bottom of the dialog, you can choose if read 1 and read 2 are Forward-reverse or Reverse-forward. As an example, you could have a data set with two files where the names of the first reads are @sample1/1 and @sample1/2. With Forward-reverse ordering, the reads from the file with @sample1/1 are forward and the reads from the file with @sample1/2 are reverse. Note that you can specify the insert sizes when importing paired read data. If you have data sets with different insert sizes, you should import each data set individually in order to be able to specify different insert sizes. Read more about paired data at https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=General_notes_on_handling_paired_data.html.
- Discard read names. For high-throughput sequencing data, the naming of the individual reads is often irrelevant given the huge number of reads. This option allows you to discard the read names to save disk space.
- Discard quality scores. Quality scores are visualized in the mapping view and are used for SNP detection. If this is not relevant for your work, you can choose to Discard quality scores. Doing so reduces disk space usage and memory consumption considerably.
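The pairing rule described under Paired reads can be previewed outside the Workbench before an import. The following is a minimal sketch, not the Workbench's own implementation. It assumes that mates are marked by a trailing /1 or /2 on the first read name, as in the @sample1/1 and @sample1/2 example, and all function names are hypothetical.

```python
import gzip


def first_read_name(path):
    """Return the name of the first read in a FASTQ file (plain or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline().strip()
    fields = header[1:].split()
    if not header.startswith("@") or not fields:
        raise ValueError(f"{path}: first line is not a FASTQ header")
    return fields[0]  # drop the '@' and any description after whitespace


def split_mate_suffix(name):
    """Split 'sample1/2' into ('sample1', '2'); mate is None if unmarked."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2], name[-1]
    return name, None


def pair_files(paths):
    """Group files whose first read names differ only in the /1 or /2 suffix.

    Returns {base_name: (read1_file, read2_file)} for complete pairs only.
    """
    by_base = {}
    for path in paths:
        base, mate = split_mate_suffix(first_read_name(path))
        if mate is not None:
            by_base.setdefault(base, {})[mate] = path
    return {
        base: (mates["1"], mates["2"])
        for base, mates in by_base.items()
        if "1" in mates and "2" in mates
    }
```

With Forward-reverse ordering, the first file in each returned tuple would hold the forward reads and the second the reverse reads. Files without a recognized suffix, or without a partner, are left out of the result.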