Calculation of the prior and error probabilities

The prior probabilities are estimated from the mapped reads alone, through four rounds of Expectation Maximization, and are calculated for each potential combination of alleles (site type). The priors therefore reflect how likely each combination of alleles is to be observed in the genome under study. The reference sequence is not taken into account during this first part of the analysis. Background on Maximum Likelihood Estimation (MLE) can be found at http://en.wikipedia.org/wiki/Maximum_likelihood.

For a diploid organism, the initial values of the priors, which are subsequently updated, are shown in Table 31.1. The probabilities of all site types always sum to 1.

Table 31.1: Site Types for a diploid organism with example probabilities.

Site Type	Prior probability
A/A	0.2475
A/C	0.001
A/G	0.001
A/T	0.001
T/C	0.001
T/G	0.001
T/T	0.2475
G/C	0.001
C/C	0.2475
G/G	0.2475
G/-	0.001
A/-	0.001
C/-	0.001
T/-	0.001
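
As a quick check of the statement that the site-type probabilities sum to 1, the values in Table 31.1 can be added up directly: 4 x 0.2475 + 10 x 0.001 = 0.99 + 0.01 = 1. The short Python sketch below does the same thing; the variable names are illustrative only and are not part of the Workbench.

```python
# Initial diploid priors, copied from Table 31.1.
HOMOZYGOUS = ["A/A", "C/C", "G/G", "T/T"]
HETEROZYGOUS = ["A/C", "A/G", "A/T", "T/C", "T/G", "G/C",
                "A/-", "C/-", "G/-", "T/-"]

initial_priors = {site: 0.2475 for site in HOMOZYGOUS}
initial_priors.update({site: 0.001 for site in HETEROZYGOUS})

# The priors form a probability distribution over site types.
total = sum(initial_priors.values())
print(f"{len(initial_priors)} site types, total probability = {total:.4f}")
# prints: 14 site types, total probability = 1.0000
```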

If the expected ploidy is set to 1, values analogous to those in Table 31.1 are calculated, but only for the single-allele site types A, C, G, T and -.

If the expected ploidy is set to 3, analogous values are calculated for three-allele site types such as A|A|A, A|C|G and G|G|-.
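
To illustrate how the set of site types grows with ploidy, the sketch below enumerates every unordered combination of the five symbols A, C, G, T and -. This is an assumption about how site types could be listed, not a description of the Workbench internals; the function name `site_types` is hypothetical. Note that a plain enumeration for ploidy 2 also yields a gap/gap combination, which Table 31.1 does not list.

```python
from itertools import combinations_with_replacement

ALLELES = ["A", "C", "G", "T", "-"]


def site_types(ploidy):
    """Return all unordered allele combinations for the given ploidy."""
    return ["|".join(combo)
            for combo in combinations_with_replacement(ALLELES, ploidy)]


for ploidy in (1, 2, 3):
    types = site_types(ploidy)
    print(f"ploidy {ploidy}: {len(types)} site types")
# prints:
# ploidy 1: 5 site types
# ploidy 2: 15 site types
# ploidy 3: 35 site types
```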

Error probabilities are calculated alongside the priors, for each combination of observed allele and assumed reference allele, before the reference sequence is incorporated into the analysis. Table 31.2 shows an example of the values in an error probability matrix.

Table 31.2: Error probability matrix - observed sequenced nucleotide in read versus actual nucleotide at this position.

	A	C	G	T	-
A	0.90	0.025	0.025	0.025	0.025
C	0.025	0.90	0.025	0.025	0.025
G	0.025	0.025	0.90	0.025	0.025
T	0.025	0.025	0.025	0.90	0.025
-	0.025	0.025	0.025	0.025	0.90
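
The example matrix in Table 31.2 has a simple structure: each symbol is read correctly with probability 0.90, and the remaining 0.10 is spread evenly over the four other symbols, so every row sums to 1. The sketch below reproduces these example values. The function name and the even split are illustrative assumptions; the matrix estimated from real data will generally not be this regular.

```python
SYMBOLS = ["A", "C", "G", "T", "-"]


def example_error_matrix(p_correct=0.90):
    """Build a matrix like Table 31.2, indexed as matrix[actual][observed]."""
    # Spread the error mass evenly over the four incorrect symbols.
    p_error = (1.0 - p_correct) / (len(SYMBOLS) - 1)
    return {
        actual: {
            observed: p_correct if observed == actual else p_error
            for observed in SYMBOLS
        }
        for actual in SYMBOLS
    }


matrix = example_error_matrix()
for actual in SYMBOLS:
    row = matrix[actual]
    print(actual, [round(row[o], 3) for o in SYMBOLS], round(sum(row.values()), 6))
```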

If quality values are available, an error matrix is calculated for each quality value.
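
To show how the priors and the error matrix fit together, the following is a minimal sketch of a generic Expectation Maximization update for site-type priors. It is not the Workbench's implementation. It makes several simplifying assumptions: the error matrix is held fixed at the Table 31.2 example values rather than re-estimated, each read is assumed to come from either allele of a site with equal chance, quality values are ignored, and the per-site read counts are invented for illustration. All function and variable names are hypothetical.

```python
from itertools import combinations_with_replacement

SYMBOLS = "ACGT-"

# Site types and starting priors as in Table 31.1 (the table has no -/- entry).
SITE_TYPES = [t for t in combinations_with_replacement(SYMBOLS, 2)
              if t != ("-", "-")]
INITIAL_PRIORS = {t: 0.2475 if t[0] == t[1] else 0.001 for t in SITE_TYPES}

# Fixed example error matrix as in Table 31.2: ERROR[actual][observed].
ERROR = {a: {o: 0.90 if a == o else 0.025 for o in SYMBOLS} for a in SYMBOLS}


def site_likelihood(counts, site_type):
    """Probability of the observed symbol counts at one site, given a site type."""
    likelihood = 1.0
    for observed, n in counts.items():
        # Assumption: a read samples each allele of the site with equal chance.
        p_observed = sum(ERROR[a][observed] for a in site_type) / len(site_type)
        likelihood *= p_observed ** n
    return likelihood


def em_priors(sites, priors, rounds=4):
    """Update site-type priors with a fixed number of EM rounds."""
    for _ in range(rounds):
        expected = dict.fromkeys(priors, 0.0)
        for counts in sites:
            # E-step: posterior weight of each site type at this site.
            weights = {t: priors[t] * site_likelihood(counts, t) for t in priors}
            norm = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / norm
        # M-step: the new prior is the average posterior weight across sites.
        priors = {t: expected[t] / len(sites) for t in priors}
    return priors


# Invented read counts for three sites (observed symbol -> number of reads).
sites = [{"A": 10}, {"A": 9, "C": 1}, {"A": 5, "C": 5}]
updated = em_priors(sites, INITIAL_PRIORS)
for site_type, p in sorted(updated.items(), key=lambda kv: -kv[1])[:3]:
    print("/".join(site_type), round(p, 4))
```

With these toy counts, most of the prior mass moves to A/A and A/C, the two site types that explain the observed reads. For real coverage depths, the product of probabilities would normally be computed in log space to avoid numerical underflow.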