Quality trimming

This opens the dialog shown in figure 23.1, where you can specify the parameters for quality trimming.

Figure 23.1: Specifying quality trimming.

The following parameters can be adjusted in the dialog:

Trim using quality scores. If the sequence files contain quality scores from a base-caller algorithm, this information can be used to trim the sequence ends. The program uses the modified-Mott trimming algorithm for this purpose (Richard Mott, personal communication):
Quality scores in the Workbench are on a Phred scale (formats using other scales are converted during import). The first step in the trimming process is to convert the quality score $Q$ to an error probability: $p_{error} = 10^{-Q/10}$. (This means that low values now correspond to high-quality bases.)
Next, a new value, $Limit - p_{error}$, is calculated for every base, where $Limit$ is the error-probability threshold used for trimming. This value is negative for low-quality bases, i.e. bases whose error probability exceeds $Limit$.
For every base, the Workbench calculates the running sum of this value. If the sum drops below zero, it is set to zero. The part of the sequence that is kept is the region between the first positive value of the running sum and the highest value of the running sum. Everything before and after this region is trimmed.
A read is removed completely if the running sum never rises above zero.
At http://www.clcbio.com/files/usermanuals/trim.zip you can find an example sequence and an Excel sheet showing the calculations done for this particular sequence, illustrating the procedure described above.
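The modified-Mott procedure described above can be sketched in Python as follows. This is a minimal illustration, not the Workbench's own implementation: the function names and the default limit of 0.05 are assumptions chosen for the example, and "the first positive value of the running sum" is interpreted here as the start of the run of positive sums that reaches the maximum (a new run starts whenever the sum is reset to zero).

```python
from __future__ import annotations


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability: p = 10^(-Q/10)."""
    return 10 ** (q / -10)


def mott_trim(qualities: list[int], limit: float = 0.05) -> tuple[int, int] | None:
    """Sketch of modified-Mott trimming under the interpretation stated above.

    Returns the half-open interval (start, end) of bases to keep, or None if the
    running sum never rises above zero, in which case the read is discarded.
    The default `limit` is a hypothetical value chosen for illustration.
    """
    running = 0.0   # running sum of (limit - p_error)
    best = 0.0      # highest running sum seen so far
    run_start = 0   # index where the current run of positive sums began
    keep = None     # interval ending at the highest running sum
    for i, q in enumerate(qualities):
        running += limit - phred_to_error_prob(q)
        if running <= 0:
            # Sum dropped to zero or below: clamp it and restart after this base.
            running = 0.0
            run_start = i + 1
        elif running > best:
            best = running
            keep = (run_start, i + 1)
    return keep


if __name__ == "__main__":
    quals = [5, 5, 30, 30, 30, 30, 5, 5]
    print(mott_trim(quals))  # (2, 6)
```

For example, with two Q5 bases at each end of four Q30 bases, `mott_trim` returns `(2, 6)`: each Q5 base contributes a negative value (its error probability, about 0.32, exceeds the limit), so the running sum is clamped to zero at the start and the region after the maximum is discarded. Tracking where the current positive run began makes the kept region a maximum-sum segment of the per-base values.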
Trim ambiguous nucleotides. This option trims the sequence ends based on the presence of ambiguous nucleotides (typically N). Note that the automated sequencer generating the data must be set to output ambiguous nucleotides for this option to apply. The algorithm takes as input the maximum number of ambiguous nucleotides allowed in the sequence after trimming. If this maximum is set to, for example, 3, the algorithm finds the longest region containing 3 or fewer ambiguities and then trims away the ends not included in this region.
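Trimming on ambiguous nucleotides can be expressed as a sliding-window search for the longest region that contains at most the allowed number of ambiguous bases. The sketch below is illustrative only: the function name is hypothetical, and it assumes that any character other than A, C, G, T or U counts as ambiguous and that ties between equally long regions are resolved in favour of the leftmost one.

```python
# Assumption: any base outside A/C/G/T/U (e.g. N or other IUPAC codes) is ambiguous.
UNAMBIGUOUS = frozenset("ACGTU")


def trim_ambiguous(seq: str, max_ambiguous: int) -> str:
    """Return the longest region of seq containing at most max_ambiguous
    ambiguous bases; the ends outside this region are trimmed away.
    Ties are resolved in favour of the leftmost region (an assumption)."""
    if max_ambiguous < 0:
        raise ValueError("max_ambiguous must be non-negative")
    best_start, best_len = 0, 0
    left = 0
    ambiguous_in_window = 0
    for right, base in enumerate(seq):
        if base.upper() not in UNAMBIGUOUS:
            ambiguous_in_window += 1
        # Shrink the window from the left until it satisfies the limit again.
        while ambiguous_in_window > max_ambiguous:
            if seq[left].upper() not in UNAMBIGUOUS:
                ambiguous_in_window -= 1
            left += 1
        # Record the window if it is the longest seen so far.
        if right - left + 1 > best_len:
            best_start, best_len = left, right - left + 1
    return seq[best_start:best_start + best_len]


if __name__ == "__main__":
    print(trim_ambiguous("NNACGTNACGTNN", 1))  # ACGTNACGT
```

With a maximum of 1, the internal N is kept because it lies inside the longest qualifying region, while the Ns at both ends are trimmed. Each base enters and leaves the window at most once, so the search runs in linear time.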