Quality trimming

This opens the dialog displayed in figure 24.2 where you can specify parameters for quality trimming.

Figure 24.2: Specifying quality trimming.

The following parameters can be adjusted in the dialog:

Trim using quality scores. If the sequence files contain quality scores from a base-calling algorithm, this information can be used for trimming sequence ends. The program uses the modified-Mott trimming algorithm for this purpose (Richard Mott, personal communication):
Quality scores in the Workbench are on a Phred scale, and formats using other scales will be converted during import. The Phred quality score (Q) is defined as $Q = -10 \log_{10}(P)$, where P is the base-calling error probability. Quality scores can therefore be converted to error probabilities, which in turn are used to set the limit for which bases should be trimmed.
Hence, the first step in the trim process is to convert the quality score (Q) to an error probability: $p_{error} = 10^{\frac{Q}{-10}}$. (This means that low values now correspond to high-quality bases.)
Next, for every base a new value is calculated: $Limit - p_{error}$. This value will be negative for low-quality bases, where the error probability is high.
For every base, the Workbench calculates the running sum of this value. If the sum drops below zero, it is set to zero. The part of the sequence not trimmed will be the region ending at the highest value of the running sum and starting at the last zero value before this highest score. Everything before and after this region will be trimmed. A read will be completely removed if the score never makes it above zero.
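The steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Workbench's implementation: the function name `mott_trim`, the default `limit` of 0.05, and the choice to treat a running sum of exactly zero as a reset point are all assumptions made for the example.

```python
def mott_trim(quals, limit=0.05):
    """Return (start, end) half-open indices of the region to keep,
    or None if the whole read would be removed.

    quals: Phred quality scores, one per base.
    limit: error probability limit (hypothetical default for illustration).
    """
    running = 0.0        # running sum of (limit - p_error)
    best = 0.0           # highest running sum seen so far
    best_end = None      # index just past the base where the best sum occurred
    best_start = 0       # start of the region that leads to the best sum
    last_zero = 0        # index of the first base after the most recent zero

    for i, q in enumerate(quals):
        p_error = 10 ** (q / -10)       # Phred score -> error probability
        running += limit - p_error      # negative for low-quality bases
        if running <= 0:
            running = 0.0               # sum dropped below zero: reset
            last_zero = i + 1
        elif running > best:
            best = running
            best_end = i + 1
            best_start = last_zero

    if best_end is None:
        return None                     # score never made it above zero
    return best_start, best_end


# Low-quality ends (Q2) around a high-quality core (Q30).
print(mott_trim([2, 2, 30, 30, 30, 30, 2, 2]))
```

With these inputs the Q2 bases at both ends are trimmed and the four Q30 bases are kept. A read consisting only of Q2 bases yields `None`, which corresponds to the read being removed completely.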
Trim ambiguous nucleotides. This option trims the sequence ends based on the presence of ambiguous nucleotides (typically N). Note that the automated sequencer generating the data must be set to output ambiguous nucleotides in order for this option to apply. The algorithm takes as input the maximal number of ambiguous nucleotides allowed in the sequence after trimming. If this maximum is set to e.g. 3, the algorithm finds the maximum length region containing 3 or fewer ambiguities and then trims away the ends not included in this region. The "Trim ambiguous nucleotides" option trims all types of ambiguous nucleotides (see IUPAC codes for nucleotides).
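The search for the longest region with a bounded number of ambiguities can be illustrated with a sliding window. The sketch below is an assumption-laden example rather than the Workbench's code: the function name `trim_ambiguous`, the treatment of every character other than A, C, G, T and U as ambiguous, and keeping the earliest region when several have the same length are all choices made for illustration.

```python
def trim_ambiguous(seq, max_ambiguous=3, unambiguous="ACGTU"):
    """Return the longest contiguous part of seq containing at most
    max_ambiguous ambiguous nucleotides (earliest one on ties)."""
    best_start, best_end = 0, 0   # half-open bounds of the best region so far
    left = 0                      # left edge of the current window
    count = 0                     # ambiguous nucleotides inside the window

    for right, base in enumerate(seq):
        if base.upper() not in unambiguous:
            count += 1
        # Shrink the window from the left until it is valid again.
        while count > max_ambiguous:
            if seq[left].upper() not in unambiguous:
                count -= 1
            left += 1
        # Record the window only if it is strictly longer than the best one.
        if right + 1 - left > best_end - best_start:
            best_start, best_end = left, right + 1

    return seq[best_start:best_end]


print(trim_ambiguous("NNACGTNACGTNN", max_ambiguous=1))
```

Here one ambiguity is allowed, so the internal N is kept and the Ns at both ends are trimmed away. Note that, following the description literally, the retained region may itself begin or end with an ambiguous nucleotide as long as the maximum is not exceeded.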