QIAGEN Bioinformatics Manuals

Fasta read files

The Fasta importer is designed for high volumes of read data, such as high-throughput sequencing data (NGS reads). When using this import option, the read names can be included, but the descriptions from the fasta files are ignored. This data type can also be imported using the on-the-fly import functionality available in workflows, described in Launching workflows individually and in batches.

For import of other fasta format data, such as reference sequences, please use the Standard Import, as this import format also includes the descriptions. To have a reference in track format, use the Tracks option and set the "Type of file to import" to FASTA.

The dialog for importing data in fasta format is shown in figure 7.13.

Figure 7.13: Importing data in fasta format.

Compressed data in gzip format is also supported (.gz).
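
To illustrate what this means for the data, the following is a minimal Python sketch of reading a fasta file, plain or gzip-compressed, while keeping only the read name and dropping the description. It does not show how the Workbench itself works. The function names `open_fasta` and `read_fasta` and the `keep_names` parameter are hypothetical, and the sketch assumes that the read name is the part of the header line before the first whitespace.

```python
import gzip
from pathlib import Path


def open_fasta(path):
    """Open a fasta file as text; files ending in .gz are decompressed."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(path, keep_names=True):
    """Yield (name, sequence) tuples from a fasta file.

    Assumption: the name is the header text before the first whitespace;
    everything after it is treated as the description and ignored.
    With keep_names=False the name is replaced by None.
    """
    name = None
    chunks = []
    with open_fasta(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    yield (name if keep_names else None, "".join(chunks))
                fields = line[1:].split(None, 1)
                name = fields[0] if fields else ""
                chunks = []
            elif name is not None:
                # Sequence lines of one record are concatenated.
                chunks.append(line)
    if name is not None:
        yield (name if keep_names else None, "".join(chunks))
```

Calling `read_fasta` with `keep_names=False` mimics the effect of discarding read names: only the sequences are kept.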

The General options to the left are:

Paired reads. For paired import, the Workbench expects the forward reads to be in one file and the reverse reads in another. The Workbench will sort the files before import and then assume that the first and second file belong together, that the third and fourth file belong together, and so on. At the bottom of the dialog, you can choose whether the ordering of the files is Forward-reverse or Reverse-forward. As an example, you could have a data set with two files: sample1_fwd containing all the forward reads and sample1_rev containing all the reverse reads. The reads in the two files have to match each other, so that the first read in the fwd file is paired with the first read in the rev file. Note that you can specify the insert sizes when importing paired read data. If you have data sets with different insert sizes, you should import each data set individually in order to be able to specify different insert sizes. Read more about handling paired data.
Discard read names. For high-throughput sequencing data, the naming of the individual reads is often irrelevant given the huge amount of reads. This option allows you to discard the read names to save disk space.
Discard quality scores. This option is not relevant for fasta import, since quality scores are not supported.
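
The pairing logic described under Paired reads can be sketched in Python as follows. This is an illustration only, not the Workbench's implementation. The function names `pair_files` and `pair_reads` are hypothetical, and the sketch assumes that sorting means plain alphabetical sorting of the file names, which the manual does not specify.

```python
def pair_files(paths, order="forward-reverse"):
    """Group sorted file names into (forward, reverse) tuples.

    Assumption: files are sorted alphabetically, then taken two at a time.
    `order` states which file of each sorted pair holds the forward reads.
    """
    if order not in ("forward-reverse", "reverse-forward"):
        raise ValueError("order must be 'forward-reverse' or 'reverse-forward'")
    ordered = sorted(paths)
    if len(ordered) % 2 != 0:
        raise ValueError("paired import needs an even number of files")
    pairs = []
    for i in range(0, len(ordered), 2):
        first, second = ordered[i], ordered[i + 1]
        if order == "forward-reverse":
            pairs.append((first, second))
        else:
            pairs.append((second, first))
    return pairs


def pair_reads(forward_reads, reverse_reads):
    """Pair reads by position: the first forward read with the first reverse read."""
    forward_reads = list(forward_reads)
    reverse_reads = list(reverse_reads)
    if len(forward_reads) != len(reverse_reads):
        raise ValueError("forward and reverse files must hold the same number of reads")
    return list(zip(forward_reads, reverse_reads))
```

With the example files above, `pair_files(["sample1_rev", "sample1_fwd"])` sorts sample1_fwd before sample1_rev, so the default Forward-reverse ordering treats sample1_fwd as the forward file. If the file naming makes the reverse file sort first, the Reverse-forward ordering is the one to choose.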

Click Next to adjust how to handle the results. We recommend choosing Save in order to save the results directly to a folder, since you probably want to save anyway before proceeding with your analysis. There is an option to put the imported data into a separate folder. This can be handy for better organizing subsequent analysis results and for batch processing.
