Calculation of sequence logos

A comprehensive walk-through of the calculation of the information content in sequence logos is beyond the scope of this document; it can be found in the original paper [Schneider and Stephens, 1990]. In brief, the conservation of each position is defined as $R_{seq}$, the difference between the maximal entropy ($S_{max}$) and the observed entropy of the residue distribution ($S_{obs}$):

$\displaystyle R_{seq}=S_{max}-S_{obs}=\log_2N-\bigg(-\sum_{n=1}^Np_n\log_2p_n\bigg)$

Here, $p_n$ is the observed frequency of the amino acid residue or nucleotide with symbol $n$ at a particular position, and $N$ is the number of distinct symbols in the sequence alphabet: 20 for proteins or 4 for DNA/RNA. This means that the maximal sequence information content per position is $\log_2 4 = 2$ bits for DNA/RNA and $\log_2 20 \approx 4.32$ bits for proteins.
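To make the formula concrete, the following minimal Python sketch evaluates $R_{seq}$ for a single gap-free alignment column. It is an illustration only, not the Workbench's implementation; the function name `column_information` and its parameters are assumptions introduced here.

```python
import math
from collections import Counter


def column_information(column, alphabet_size):
    """Return R_seq = log2(N) - S_obs (in bits) for one gap-free column.

    `column` is a string of the symbols observed at one alignment position;
    `alphabet_size` is N (4 for DNA/RNA, 20 for proteins).
    Hypothetical helper for illustration only.
    """
    total = len(column)
    if total == 0:
        raise ValueError("column must contain at least one symbol")
    counts = Counter(column)
    # Observed entropy: S_obs = -sum(p_n * log2(p_n)) over the observed symbols.
    # Symbols with zero frequency contribute nothing and are simply absent.
    s_obs = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Maximal entropy: S_max = log2(N).
    return math.log2(alphabet_size) - s_obs


print(column_information("AAAA", 4))  # fully conserved DNA column: 2 bits
print(column_information("ACGT", 4))  # all four nucleotides equally frequent: 0 bits
```

A fully conserved column reaches the maximum of $\log_2 N$ bits, while a column in which all symbols are equally frequent carries no information.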

The original implementation by Schneider does not handle sequence gaps.

We have slightly modified the algorithm so an estimated logo is presented in areas with sequence gaps.

If amino acid residues or nucleotides of one sequence are found in an area containing gaps, we have chosen to show the particular residue as the fraction of the sequences. For example, if one position in the alignment contains 9 gaps and only one alanine (A), the A represented in the logo has a height of 0.1.
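The gap example can be reproduced with a short Python sketch. It assumes that the displayed height is simply the residue count divided by the total number of sequences in the column, which matches the example given; whether the Workbench applies any further scaling is not stated here. The helper name `gap_column_heights` and the gap character `-` are assumptions.

```python
from collections import Counter


def gap_column_heights(column, gap="-"):
    """Return each residue's height as its fraction of all sequences.

    Hypothetical helper: assumes height = count / number of sequences,
    with gaps counted in the denominator but not drawn.
    """
    total = len(column)
    counts = Counter(symbol for symbol in column if symbol != gap)
    return {symbol: count / total for symbol, count in counts.items()}


# One alanine and nine gaps at the same alignment position.
print(gap_column_heights("A---------"))  # {'A': 0.1}
```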

Other useful resources
The website of Tom Schneider
http://www-lmmb.ncifcrf.gov/~toms/

WebLogo
http://weblogo.berkeley.edu/

[Crooks et al., 2004]