Tabular mapping files

The CLC Genomics Workbench supports import and export of files in tabular format, such as Eland files from the Illumina Pipeline. The importer is flexible: it can be used to import any kind of mapping file in a tab-delimited format in which each line represents one read.

The idea behind the importer is that you import the mapping file, which includes all the reads, and then specify one or more reference sequences that have already been imported into the Workbench. The Workbench then combines the two to create mapping results (Image contig) or mapping tables (Image multicontig). To import a tabular mapping file:

        File | Import High-Throughput Sequencing Data (Image ngs_import) | Tabular Mapping Files (Image ngs_assembly_import)

This will open a dialog where you choose the reference sequences to be used, as shown in figure 6.16.

Image importngsdialog-eland-step1
Figure 6.16: Defining reference sequences.

Select one or more reference sequences. Note that the name of each reference sequence has to match the reference name specified in the file. Click Next.
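Because the reference names must match, it can be useful to check a file before importing it. The following is a minimal Python sketch, not part of the Workbench: the function name `unmatched_reference_names`, the column position, and the sample data are hypothetical, and it assumes that matching means exact string equality.

```python
import io


def unmatched_reference_names(handle, reference_column, imported_names):
    """Return reference names in the file that match none of the imported names.

    reference_column is the zero-based index of the column holding the
    reference name; which column that is depends on your file format.
    """
    known = set(imported_names)
    unmatched = set()
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        # Skip blank lines and lines too short to contain the reference column.
        if len(fields) <= reference_column:
            continue
        name = fields[reference_column]
        if name not in known:
            unmatched.add(name)
    return sorted(unmatched)


# Made-up three-column data: read name, reference name, position.
sample = io.StringIO(
    "read1\tchr1\t100\n"
    "read2\tchr2\t250\n"
    "read3\tchrM\t17\n"
)
result = unmatched_reference_names(
    sample, reference_column=1, imported_names=["chr1", "chr2"]
)
print(result)  # ['chrM']
```

For a real file, pass an open file handle instead of the in-memory sample. Any name the function returns would need a correspondingly named reference sequence in the Workbench, or the sequence would need to be renamed.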

Image importngsdialog-eland-step2
Figure 6.17: Defining reference sequences.

In this dialog, select (Image browse) one or more tab-delimited files as shown in figure 6.17.

Once the tab-delimited file has been selected, you have to specify the following information:

Note that the Workbench looks in the first line of the file to provide a preview when filling in this information.
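To see what such a first-line preview contains, you can split the first line of the file into numbered columns yourself. This is a rough Python sketch, not the Workbench's own code: the helper `preview_first_line` and the sample line are hypothetical, and numbering columns from 1 is an assumption.

```python
import io


def preview_first_line(handle):
    """Return (column number, value) pairs for the first line of a tab-delimited file."""
    first_line = handle.readline().rstrip("\r\n")
    if not first_line:
        return []
    # Number the columns from 1 so they are easy to refer to.
    return list(enumerate(first_line.split("\t"), start=1))


# Made-up data; only the first line is inspected.
sample = io.StringIO("read1\tACGTACGT\tchr1\t100\tF\nread2\tTTGACCAA\tchr2\t250\tR\n")
preview = preview_first_line(sample)
for number, value in preview:
    print(f"column {number}: {value}")
```

Since only the first line is inspected, a file whose first line is a header or is otherwise atypical would give a misleading preview.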

Click Next to choose how to handle the results. We recommend choosing Save to save the results directly to a folder, since you will probably want to save them anyway before proceeding with your analysis.

Note that this import operation is very memory-consuming for large data sets.