Assemble sequences to reference

This section describes how to assemble a number of sequence reads into a contig using a reference sequence, a process called read mapping. A reference sequence can be particularly helpful when the objective is to characterize SNP variation in the data.
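To make the idea of read mapping concrete, the following is a minimal toy sketch in Python. It is not the algorithm used by the Workbench: it assumes ungapped placement, slides each read along the reference, and picks the position with the fewest mismatches. The function names `best_offset` and `mismatch_positions` are hypothetical and introduced only for illustration. Positions are 0-based.

```python
def best_offset(reference, read):
    """Return (offset, mismatches) for the ungapped placement of `read`
    on `reference` that has the fewest mismatches, or None if the read
    is longer than the reference."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def mismatch_positions(reference, read, offset):
    """List (reference_position, reference_base, read_base) for every
    position where the placed read disagrees with the reference."""
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(read)
        if reference[offset + i] != base
    ]


reference = "AAACCCGGGTTT"
read = "CCCGAG"
offset, _ = best_offset(reference, read)
print(offset, mismatch_positions(reference, read, offset))
```

Positions where many reads disagree with the reference in the same way are the candidates for SNP variation. A real read mapper also handles gaps, quality scores, and reverse-complemented reads, all of which this sketch ignores.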

Note! You can assemble a maximum of 10000 sequences at a time.

To assemble more sequences, you need the CLC Genomics Workbench (see http://www.qiagenbioinformatics.com/products/clc-genomics-workbench/).
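If your reads are also available as a FASTA file outside the Workbench, you can check the number of sequences against the 10000-sequence limit before importing. The sketch below assumes plain FASTA input, where each record starts with a `>` header line. The names `count_fasta_records` and `within_limit` are hypothetical helpers, not part of any CLC tool.

```python
import io

MAX_SEQUENCES = 10000  # limit stated in the note above


def count_fasta_records(handle):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


def within_limit(handle, limit=MAX_SEQUENCES):
    """Return True if the FASTA input holds no more than `limit` records."""
    return count_fasta_records(handle) <= limit


# Small in-memory example; for a real file, pass open("reads.fasta") instead
# (the file name here is only a placeholder).
example = io.StringIO(">read1\nACGT\n>read2\nACGA\n")
print(count_fasta_records(example))
```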

To start the assembly:

        Toolbox | Sequencing Data Analysis | Assemble Sequences to Reference

This opens a dialog where you can adjust which sequences to assemble. If you have already selected sequences in the Navigation Area, these will be shown in Selected Elements. You can remove these or add others by using the arrows to move sequences between the Navigation Area and Selected Elements boxes. You can also add sequence lists.

When the sequences are selected, click Next to see the dialog shown in figure 18.7.

Image assembletoreferencestep2
Figure 18.7: Parameters for how the reference should be handled when assembling sequences to a reference sequence.

This dialog gives you the following options for assembling:

When the reference sequence has been selected, click Next to see the dialog shown in figure 18.8.

Image assembletoreferencestep3
Figure 18.8: Options for how the input sequences should be aligned and how nucleotide conflicts should be handled.

In this dialog, you can specify the following options:

Click Finish to start the assembly. See View and edit contigs for how to work with the resulting contigs.