Copy Number Variant Detection

The Copy Number Variant Detection (Targeted) tool is designed to detect copy number variations (CNVs) from targeted resequencing experiments.

The tool takes read mappings and target regions as input, and produces amplification and deletion annotations. The annotations are generated by a 'depth-of-coverage' method, where the target-level coverages of the case and the controls are compared in a statistical framework using a model based on 'selected' targets. Note that to be 'selected', a target has to have a coverage higher than the specified coverage cutoff AND must be found on a chromosome that was not identified as a coverage outlier in the chromosomal analysis step. If fewer than 50 'selected' targets are found suitable for setting up the statistical models, the CNV tool will terminate prematurely.
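The selection rule described above can be illustrated with a minimal Python sketch. This is not the tool's implementation: the `Target` record, the `select_targets` function, and the choice to test a single per-target coverage value are assumptions made for illustration. Only the two selection criteria and the minimum of 50 selected targets come from the text.

```python
from dataclasses import dataclass

# From the text: with fewer than 50 selected targets the analysis stops.
MIN_SELECTED_TARGETS = 50


@dataclass
class Target:
    """Hypothetical record for one target region."""
    name: str
    chromosome: str
    coverage: float  # assumed: one summary coverage value per target


def select_targets(targets, coverage_cutoff, outlier_chromosomes):
    """Keep targets above the cutoff that lie on non-outlier chromosomes."""
    selected = [
        t for t in targets
        if t.coverage > coverage_cutoff
        and t.chromosome not in outlier_chromosomes
    ]
    if len(selected) < MIN_SELECTED_TARGETS:
        # Mirrors the documented behavior of terminating early.
        raise ValueError(
            f"only {len(selected)} selected targets; "
            f"at least {MIN_SELECTED_TARGETS} are needed"
        )
    return selected
```

In this sketch, a target on an outlier chromosome is excluded even when its coverage is high, which is why both conditions are combined with `and`.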

The algorithm implemented in the Copy Number Variant Detection (Targeted) tool is inspired by the following papers:

Li et al., CONTRA: copy number analysis for targeted resequencing, Bioinformatics. 2012, 28(10): 1307-1313 [Li et al., 2012].
Niu and Zhang, The screening and ranking algorithm to detect DNA copy number variations, Ann Appl Stat. 2012, 6(3): 1306-1326 [Niu and Zhang, 2012].

For more information, you can also read our whitepaper.

The Copy Number Variant Detection (Targeted) tool identifies CNV regions where the normalized coverage is statistically significantly different from that of the controls.

The algorithm carries out the analysis in several steps.

1. Base-level coverages are analyzed for all samples, and a robust coverage baseline is generated using the control samples.
2. Chromosome-level coverage analysis is carried out on the case sample, and any chromosomes with unexpectedly high or low coverages are identified.
3. Sample coverages are normalized, and a global, target-level statistical model is set up for the variation in fold-change as a function of coverage in the baseline.
4. Each chromosome is segmented into regions of similar fold-changes.
5. The expected fold-change variation in each region is determined using the statistical model for target-level coverages. Region-level CNVs are identified as the regions with fold-changes significantly different from 1.0.
6. If chosen in the parameter steps, gene-level CNV calls are also produced.
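To make the normalization and segmentation steps more concrete, here is a small Python sketch of the general depth-of-coverage idea. It is an illustration under stated assumptions, not the algorithm used by the tool: median scaling, the greedy segmentation rule, the `tolerance` parameter, and both function names are hypothetical choices. Coverages are assumed to be positive, and the list of targets is assumed to be non-empty and ordered along one chromosome.

```python
import math
from statistics import median


def normalized_fold_changes(case_cov, baseline_cov):
    """Per-target fold-change after scaling each sample by its median.

    Assumption: median scaling stands in for the tool's normalization.
    """
    case_scale = median(case_cov)
    base_scale = median(baseline_cov)
    return [
        (c / case_scale) / (b / base_scale)
        for c, b in zip(case_cov, baseline_cov)
    ]


def segment(fold_changes, tolerance=0.3):
    """Greedily group consecutive targets with similar log2 fold-change.

    Returns (first_index, last_index, region_fold_change) tuples, where
    the region fold-change is the geometric mean over its targets.
    """
    logs = [math.log2(fc) for fc in fold_changes]
    regions = []
    start, total = 0, logs[0]
    for i in range(1, len(logs)):
        mean = total / (i - start)
        if abs(logs[i] - mean) > tolerance:
            # Close the current region and start a new one at target i.
            regions.append((start, i - 1, 2 ** mean))
            start, total = i, logs[i]
        else:
            total += logs[i]
    regions.append((start, len(logs) - 1, 2 ** (total / (len(logs) - start))))
    return regions
```

A region whose fold-change is close to 1.0 would be treated as copy-neutral. Deciding whether a deviation from 1.0 is significant requires the statistical model described in the steps above, which this sketch does not attempt to reproduce.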
