Duplicated sequences analysis

The duplicated sequences analysis identifies sequences that have been sequenced multiple times. To achieve reasonable performance, not all input sequences are analyzed. Instead, a sequence dictionary is used, whose entries are sampled evenly from the input sequences. Once a sequence enters the dictionary (which is a random process), it is tracked for duplicates until all sequences have been examined. The dictionary size is 250 000 sequences. Please note that if you select multiple sequence lists as input, they are all treated as one data set for this analysis (batching can be used to generate a separate report for each sequence list).

Because all current sequencing techniques tend to report declining quality scores toward the 3' ends of sequences, there is a risk that duplicates are not detected merely because of sequencing errors near their 3' ends. Therefore, the identity of two sequences is determined using only the first 50 nt from the 5' end.
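The idea of a bounded dictionary combined with a 5' prefix comparison can be sketched in Python as follows. This is an illustrative sketch only, not the Workbench implementation: the function name `count_duplicates`, the `admit_probability` parameter, and the simple "admit until full" sampling rule are assumptions made for the example. Only the prefix length (50 nt) and the dictionary size (250 000) come from the description above.

```python
import random

PREFIX_LENGTH = 50          # from the text: only the first 50 nt are compared
DICTIONARY_SIZE = 250_000   # from the text: maximum number of tracked sequences


def count_duplicates(sequences, dictionary_size=DICTIONARY_SIZE,
                     prefix_length=PREFIX_LENGTH,
                     admit_probability=1.0, rng=None):
    """Count how often each tracked sequence is seen (hypothetical sketch).

    Two reads count as the same sequence if their first `prefix_length`
    bases are identical. A read whose prefix is not yet tracked enters the
    dictionary with probability `admit_probability`, as long as the
    dictionary is not full. This sampling rule is an assumption.
    """
    if rng is None:
        rng = random.Random()
    counts = {}
    for seq in sequences:
        key = seq[:prefix_length].upper()   # ignore the error-prone 3' end
        if key in counts:
            counts[key] += 1                # already tracked: one more copy
        elif len(counts) < dictionary_size and rng.random() < admit_probability:
            counts[key] = 1                 # newly admitted to the dictionary
    return counts


# Two reads that differ only after position 50 are treated as duplicates.
reads = ["A" * 50 + "CCC", "A" * 50 + "GGG", "T" * 60]
counts = count_duplicates(reads)
```

With the default `admit_probability` of 1.0 every new prefix is admitted until the dictionary is full, which keeps the example deterministic. A lower value would mimic sampling only a subset of the input.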

Sequence duplication levels: A table correlating duplication counts with the number of sequences that had that duplication count. For example, if the dictionary contains 10 sequences and each sequence was seen exactly once, the table contains a single row showing duplication-count=1 and sequence-count=10. Note: due to space restrictions, the corresponding bar plot only shows bars for duplication counts of x=[0-100]. Bar heights for duplication counts >100 are accumulated at x=100, so a markedly elevated bar at x=100 is a normal observation. Please refer to the table report for a full list of individual duplication counts.
Duplicated sequences: A list of the sequences observed most often. The list contains at most 25 of the most frequently observed sequences and is only present in the supplementary report.
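
The two report outputs described above can be derived from per-sequence duplicate counts roughly as follows. This is a hedged sketch rather than the tool's actual code: the function names (`duplication_levels`, `plot_bins`, `most_duplicated`) and the input format (a dict mapping each tracked sequence to the number of times it was seen) are assumptions made for illustration. The cap at x=100 and the limit of 25 sequences are taken from the text.

```python
from collections import Counter

PLOT_MAX = 100   # from the text: counts >100 are accumulated at x=100
TOP_N = 25       # from the text: at most 25 sequences are listed


def duplication_levels(counts):
    """Return sorted (duplication_count, sequence_count) table rows."""
    levels = Counter(counts.values())
    return sorted(levels.items())


def plot_bins(levels, plot_max=PLOT_MAX):
    """Collapse table rows into bar heights, capping the x value at plot_max."""
    bins = Counter()
    for dup_count, n_sequences in levels:
        bins[min(dup_count, plot_max)] += n_sequences
    return dict(bins)


def most_duplicated(counts, top_n=TOP_N):
    """Return the top_n (sequence, count) pairs, most frequent first."""
    return Counter(counts).most_common(top_n)


# The example from the text: 10 sequences, each seen exactly once.
unique_only = {f"seq{i}": 1 for i in range(10)}
table = duplication_levels(unique_only)

# Counts above 100 end up in the x=100 bar of the plot.
mixed = {"a": 150, "b": 300, "c": 100, "d": 2}
bars = plot_bins(duplication_levels(mixed))
```

In the second example, the sequences seen 100, 150 and 300 times all contribute to the bar at x=100, which illustrates why that bar can be much taller than its neighbours.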