Sequence statistics

CLC Sequence Viewer can produce a report containing many relevant statistics for protein sequences. Some of these statistics are also relevant for DNA sequences, so this section covers both types. The steps for producing the statistics are the same in both cases.

To create statistics for a sequence, do the following:

        Toolbox | General Sequence Analysis | Create Sequence Statistics

Select one or more sequences and/or one or more sequence lists. Note! You cannot create statistics for DNA and protein sequences at the same time; they must be run separately.

In the next step (figure 9.4), the dialog lets you adjust the parameters for the statistics.

Figure 9.4: Setting parameters for the sequence statistics.

You can also choose to include Background distribution of amino acids. If this box is ticked, an extra column with the amino acid distribution of the chosen species is included in the table output. (The distributions are calculated from UniProt, www.uniprot.org, version 6.0, dated September 13, 2005.)

Click Finish to start the tool. An example of protein sequence statistics is shown in figure 9.5.

Figure 9.5: Example of protein sequence statistics.
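To make the idea of an amino acid distribution with a background column more concrete, the following minimal Python sketch computes the relative frequency of each residue in a protein sequence and pairs it with a background value. It is only an illustration and not how CLC Sequence Viewer computes its report: the function names are hypothetical, and the background values are made up for the example rather than taken from UniProt.

```python
from collections import Counter


def amino_acid_distribution(sequence):
    """Return the relative frequency of each residue in a protein sequence."""
    residues = sequence.upper()
    total = len(residues)
    if total == 0:
        return {}
    counts = Counter(residues)
    return {aa: n / total for aa, n in counts.items()}


def compare_to_background(sequence, background):
    """Pair each observed frequency with a background frequency.

    Residues missing from the background mapping get 0.0.
    """
    observed = amino_acid_distribution(sequence)
    return {
        aa: (freq, background.get(aa, 0.0))
        for aa, freq in sorted(observed.items())
    }


# Hypothetical background values for illustration only; NOT UniProt figures.
example_background = {"A": 0.08, "G": 0.07, "K": 0.06, "L": 0.10}

table = compare_to_background("MKLAAGLK", example_background)
for aa, (obs, bg) in table.items():
    # One row per residue: observed frequency next to the background value.
    print(f"{aa}\t{obs:.3f}\t{bg:.3f}")
```

Each printed row corresponds to one residue, with the observed frequency next to the assumed background frequency, mirroring the extra column described above.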

Nucleotide sequence statistics are generated using the same dialog as protein sequence statistics. However, the output for nucleotide sequences is less extensive than that for protein sequences.

Note! The headings of the tables change depending on whether you calculate 'individual' or 'comparative' sequence statistics.

The output of protein sequence statistics includes:

The output of nucleotide sequence statistics includes:

If nucleotide sequences are used as input, and these are annotated with CDS, a section on Codon statistics for Coding Regions is included.
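As a rough illustration of what codon statistics for a coding region involve, the sketch below counts codons in a single CDS read in frame from its first base. It is an assumption-laden example rather than the tool's actual procedure: the function name `codon_counts` is hypothetical, the input is assumed to be the already-extracted coding sequence, and a trailing partial codon is simply ignored.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence, reading in frame from the first base."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon, if any
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


counts = codon_counts("ATGGCTGCTAAATAA")
total = sum(counts.values())
for codon, n in sorted(counts.items()):
    # Print each codon with its count and its share of all codons in the CDS.
    print(f"{codon}\t{n}\t{n / total:.3f}")
```

For several annotated coding regions, the per-region counters could be summed to give statistics across all CDS annotations on the sequence.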


