Assemble sequences to reference

This section describes how to assemble a number of sequence reads into a contig using a reference sequence. A reference sequence can be particularly helpful when the objective is to characterize SNP variation in the data.

To start the assembly:

        Toolbox | Sequencing Data Analysis | Assemble Sequences to Reference

This opens a dialog where you can adjust the selection of sequences to assemble. If you have already selected sequences in the Navigation Area, these will be shown in Selected Elements. You can remove these or add others by using the arrows to move sequences between the Navigation Area and Selected Elements boxes. You can also add sequence lists.

Note! You can assemble a maximum of 2000 sequences at a time.

To assemble more sequences, use Map Reads to Reference under NGS Core Tools in the Toolbox; this requires the CLC Genomics Workbench (see http://www.clcbio.com/genomics).
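If your reads are stored in a plain FASTA file, it can be convenient to check the number of sequences against the 2000-sequence limit before importing them. The following is a minimal Python sketch, not part of the Workbench; the function names are hypothetical, and it assumes every record starts with a `>` header line.

```python
from typing import Iterable

# Limit stated in the note above for Assemble Sequences to Reference.
MAX_SEQUENCES = 2000


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def within_assembly_limit(lines: Iterable[str], limit: int = MAX_SEQUENCES) -> bool:
    """Return True if the number of records does not exceed the limit."""
    return count_fasta_records(lines) <= limit


# Small in-memory example with two records (hypothetical data).
example = [">read1\n", "ACGTACGT\n", ">read2\n", "TTGACC\n"]
print(count_fasta_records(example), within_assembly_limit(example))
```

To check a file on disk, pass an open file handle instead of the list, for example `within_assembly_limit(open("reads.fasta"))`, where `reads.fasta` is a placeholder file name.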

When the sequences are selected, click Next to see the dialog shown in figure 31.9.

Figure 31.9: Parameters for how the reference should be handled when assembling sequences to a reference sequence.

This dialog gives you the following options for assembling:

When the reference sequence has been selected, click Next to see the dialog shown in figure 31.10.

Figure 31.10: Options for how the input sequences should be aligned and how nucleotide conflicts should be handled.

In this dialog, you can specify the following options:

Click Next if you wish to adjust how the results are handled. Otherwise, click Finish to start the assembly process. See the section View and edit contigs for how to work with the resulting contigs.