Reverse translation from protein into DNA

A protein sequence can be back-translated into DNA using CLC Main Workbench. Because the genetic code is degenerate, most amino acids can be encoded by several different codons (there are only 20 amino acids but 64 different codons). The program therefore offers a number of choices for determining which codons should be used. These choices are explained in this section. For background information, see Bioinformatics explained: Reverse translation.
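To illustrate the degeneracy described above, the following is a minimal Python sketch, not part of CLC Main Workbench. It assumes the standard genetic code, in which 3 of the 64 codons are stop codons, and counts how many different DNA sequences encode a given protein. The names `SYNONYMOUS` and `count_back_translations` are hypothetical and chosen for this example only.

```python
from itertools import product
from math import prod

# Standard genetic code, listed in T, C, A, G order for each codon position.
# '*' marks the three stop codons. (Assumption: the standard code applies.)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the codons by the amino acid they encode.
SYNONYMOUS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMOUS.setdefault(aa, []).append(codon)


def count_back_translations(protein):
    """Return the number of distinct DNA sequences that encode `protein`."""
    return prod(len(SYNONYMOUS[aa]) for aa in protein.upper())


print(len(SYNONYMOUS["M"]))            # 1: methionine has a single codon
print(len(SYNONYMOUS["L"]))            # 6: leucine has six codons
print(count_back_translations("MKL"))  # 12 = 1 * 2 * 6
```

Even a three-residue peptide can have a dozen possible back-translations, which is why a rule for choosing codons is needed.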

In order to make a reverse translation:

        Toolbox | Protein Analysis | Reverse Translate

This opens the dialog displayed in figure 20.24:

Figure 20.24: Choosing a protein sequence for reverse translation.

If a sequence was selected before choosing the Toolbox action, the sequence is now listed in the Selected Elements window of the dialog. Use the arrows to add or remove sequences or sequence lists from the selected elements. You can translate several protein sequences at a time.

Adjust the parameters for the translation in the dialog shown in figure 20.25.

Figure 20.25: Choosing parameters for the reverse translation.

The Codon Frequency Table is used to determine the frequencies of the codons. From the list, select the frequency table that best fits the organism you are working with. A codon frequency table for an organism is created by counting all the codons in its coding sequences. Every codon in a Codon Frequency Table has its own count, frequency (per thousand) and fraction, which are calculated from the occurrences of the codon in the organism. The tables provided were made using the Codon Usage Database (http://www.kazusa.or.jp/codon/), which was built on the NCBI-GenBank Flat File Release 160.0 [June 15 2007]. You can customize the list of codon frequency tables for your installation; see Custom codon frequency tables.
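The three values stored for each codon can be illustrated with a short Python sketch. This is not the Workbench's own implementation. It assumes the standard genetic code, and it assumes that "fraction" means a codon's share among all codons encoding the same amino acid, as is common in codon usage tables. The function name `codon_frequency_table` and the toy input sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in T, C, A, G order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_frequency_table(coding_sequences):
    """Build count, per-thousand frequency and fraction for each observed codon."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        # Read the sequence in frame, ignoring any trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TO_AA:  # skip codons with ambiguous bases
                counts[codon] += 1

    total = sum(counts.values())

    # Total number of observed codons per encoded amino acid.
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TO_AA[codon]] += n

    return {
        codon: {
            "count": n,
            "per_thousand": 1000 * n / total,
            "fraction": n / per_amino_acid[CODON_TO_AA[codon]],
        }
        for codon, n in counts.items()
    }


# Toy input: one short coding sequence (ATG CTG CTG TTA TAA).
table = codon_frequency_table(["ATGCTGCTGTTATAA"])
print(table["CTG"])
```

In this toy input, CTG accounts for two of the five codons and for two of the three leucine codons, so its per-thousand frequency and its fraction differ. A real table would be built from all coding sequences of an organism.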

Click Finish to start the tool. The newly created nucleotide sequence is shown. If the analysis was performed on several protein sequences, a corresponding number of nucleotide sequence views is opened.

