Assemble sequences to reference

This section describes how to assemble a number of sequence reads into a contig using a reference sequence. A reference sequence can be particularly helpful when the objective is to characterize SNP variation in the data.
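To illustrate why a reference helps with SNP characterization, the following is a minimal sketch of the general idea, not the algorithm the Workbench uses. It assumes ungapped placement of each read at the offset with the fewest mismatches, followed by a simple majority vote per reference position. The function names `best_offset` and `call_variants` are hypothetical, and positions are 0-based.

```python
from collections import Counter, defaultdict


def best_offset(reference, read):
    """Return (offset, mismatches) for the ungapped placement with the
    fewest mismatches, or None if the read is longer than the reference."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def call_variants(reference, reads):
    """Place every read on the reference, then report positions where the
    most common read base differs from the reference base.

    Returns a list of (position, reference_base, read_base, read_count)."""
    columns = defaultdict(Counter)
    for read in reads:
        placed = best_offset(reference, read)
        if placed is None:
            continue  # read does not fit on the reference; skip it
        offset, _ = placed
        for i, base in enumerate(read):
            columns[offset + i][base] += 1

    variants = []
    for pos in sorted(columns):
        base, count = columns[pos].most_common(1)[0]
        if base != reference[pos]:
            variants.append((pos, reference[pos], base, count))
    return variants


# Toy data: two of the three reads carry a C where the reference has a T.
reference = "ACGTTGCAAGCTTAGGCATCGATCCGTA"
reads = ["CAAGCTCAGGCA", "GCTCAGGCATCG", "ACGTTGCAAG"]
print(call_variants(reference, reads))
```

Because every read is positioned against the same coordinate system, a disagreement between reads and reference can be reported as a position on the reference, which is what makes the reference useful for describing SNPs.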

To start the assembly:

        select sequences to assemble | Toolbox in the Menu Bar | Sequencing Data Analysis | Assemble Sequences to Reference

This opens a dialog where you can change which sequences to assemble. You can also add sequence lists.

Note! You can assemble a maximum of 2000 sequences at a time.

To assemble more sequences, you need the CLC Genomics Workbench (see http://www.clcbio.com/genomics).
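If the reads are available as a FASTA file before import, the count can be checked against this limit ahead of time. The sketch below is an assumption about such a workflow and is not part of the Workbench; `count_fasta_records` and `within_limit` are hypothetical helper names, and `reads.fasta` in the usage note is a placeholder path.

```python
MAX_SEQUENCES = 2000  # limit stated in this section


def count_fasta_records(lines):
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def within_limit(lines, limit=MAX_SEQUENCES):
    """Return True if the number of records does not exceed the limit."""
    return count_fasta_records(lines) <= limit


example = [">read1", "ACGT", ">read2", "GGCA"]
print(count_fasta_records(example), within_limit(example))
```

For a file on disk, pass the open file handle, for example `with open("reads.fasta") as handle: within_limit(handle)`.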

When the sequences are selected, click Next to see the dialog shown in figure 18.17.

Figure 18.17: Parameters for how the reference should be handled when assembling sequences to a reference sequence.

This dialog gives you the following options for assembling:

When the reference sequence has been selected, click Next to see the dialog shown in figure 18.18.

Figure 18.18: Options for how the input sequences should be aligned and how nucleotide conflicts should be handled.

In this dialog, you can specify the following options:

Click Next if you wish to adjust how the results are handled. If not, click Finish to start the assembly process. See the section View and edit contigs for how to use the resulting contigs.