Updating equations for the prior site type probabilities

We first derive the updating equations for the prior site type probabilities $f_s, s \in S$. The probability that the site is of type $t$, given that we observe the nucleotides $n_1,...,n_k$ in the reads at the site, is:

$\displaystyle P(t\vert n_1,...,n_k) = \frac{P(t, n_1,...,n_k)}{\sum_{s \in S} P(s, n_1,...,n_k)} = \frac{P(t) P(n_1,...,n_k\vert t)}{\sum_{s \in S} P(s) P(n_1,...,n_k\vert s)}$

(31.4)

Now, for $P(t)$ we use our current value for $f_t$, and if we further insert the expression for $P(n_1,...,n_k\vert t)$ (31.2), we get:

$\displaystyle P(t\vert n_1,...,n_k) = \frac{f_t \prod_{i=1}^k \sum_{N \in \{A, C, G, T, -\}}P_t(N) \times e_q(N \rightarrow n_i)}{\sum_{s \in S} f_s \prod_{i=1}^k \sum_{N \in \{A, C, G, T, -\}}P_s(N) \times e_q(N \rightarrow n_i)}$

(31.5)
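As an illustration of equation 31.5, the following minimal Python sketch computes the site type probabilities for a single site. The two site types, their nucleotide distributions, and the error probabilities are made-up values chosen only for the example. As a simplifying assumption, a single error table `e` is used for all reads, whereas the manual's $e_q$ is indexed by $q$. The function and variable names are hypothetical and not part of the Workbench.

```python
from math import prod

ALPHABET = "ACGT-"  # the four nucleotides plus the gap symbol


def site_type_posterior(f, P, e, reads):
    """Return P(t | n_1..n_k) for every site type t, following equation 31.5.

    f:     dict, site type -> current prior probability f_t
    P:     dict, site type -> dict of true nucleotide N -> P_t(N)
    e:     dict, (N, n) -> probability of reading n when the true nucleotide is N
           (assumption: one table for all reads, ignoring the index q)
    reads: string of observed nucleotides n_1..n_k at the site
    """
    # Numerator of 31.5 for each site type: f_t * prod_i sum_N P_t(N) * e(N -> n_i)
    joint = {
        t: f[t] * prod(sum(P[t][N] * e[(N, n)] for N in ALPHABET) for n in reads)
        for t in f
    }
    # Denominator of 31.5: the same quantity summed over all site types s in S
    total = sum(joint.values())
    return {t: joint[t] / total for t in joint}


# Toy inputs (assumed values, for illustration only)
f = {"A/A": 0.9, "A/C": 0.1}
P = {
    "A/A": {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0, "-": 0.0},
    "A/C": {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0, "-": 0.0},
}
e = {(N, n): 0.96 if N == n else 0.01 for N in ALPHABET for n in ALPHABET}

posterior = site_type_posterior(f, P, e, "AACA")
print(posterior)
```

With these toy values, a single C among four reads moves probability from the homozygous type towards the heterozygous type relative to the prior, and the returned probabilities sum to one over the site types.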

We get the updating equation for the prior site type probabilities, $f_t, t \in S$, from equation 31.5. Let $h$ index the sites in the alignment ($h = 1,...,H$). Given the current values for the set of site frequencies, $f_t, t \in S$, and the current values for the set of error probabilities, we obtain updated values for the site frequencies, $f_t^*, t \in S$, by summing the site type probabilities given the data (as given by equation 31.5) across all sites in the alignment and dividing by the number of sites, $H$:

$\displaystyle f_t^* = \frac{\sum_{h=1}^H \frac{f_t \prod_{i=1}^k \sum_{N \in \{A, C, G, T, -\}}P_t(N) \times e_q(N \rightarrow n_i^h)}{\sum_{s \in S} f_s \prod_{i=1}^k \sum_{N \in \{A, C, G, T, -\}}P_s(N) \times e_q(N \rightarrow n_i^h)}}{H}$

(31.6)
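The update in equation 31.6 can be sketched in Python as one pass over the sites: compute the site type probabilities for each site, then average them. This is a minimal sketch under stated assumptions, not the Workbench implementation. The site types, distributions, error table, and read data are invented for the example, one error table is shared by all reads (the index $q$ is ignored), and all names are hypothetical.

```python
from math import prod

ALPHABET = "ACGT-"  # the four nucleotides plus the gap symbol


def site_type_posterior(f, P, e, reads):
    """P(t | n_1..n_k) for every site type t at one site (equation 31.5)."""
    joint = {
        t: f[t] * prod(sum(P[t][N] * e[(N, n)] for N in ALPHABET) for n in reads)
        for t in f
    }
    total = sum(joint.values())
    return {t: joint[t] / total for t in joint}


def update_site_type_priors(f, P, e, sites):
    """Return the updated priors f* (equation 31.6).

    sites: list of strings, one per alignment site h = 1..H, each holding
           the nucleotides observed in the reads covering that site.
    """
    H = len(sites)
    f_star = {t: 0.0 for t in f}
    for reads in sites:
        post = site_type_posterior(f, P, e, reads)
        for t in f:
            f_star[t] += post[t] / H  # sum over sites, divided by H
    return f_star


# Toy inputs (assumed values, for illustration only)
f = {"A/A": 0.9, "A/C": 0.1}
P = {
    "A/A": {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0, "-": 0.0},
    "A/C": {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0, "-": 0.0},
}
e = {(N, n): 0.96 if N == n else 0.01 for N in ALPHABET for n in ALPHABET}
sites = ["AAAA", "ACAC", "AAA", "AACA"]

f_star = update_site_type_priors(f, P, e, sites)
print(f_star)
```

Because each per-site result sums to one over the site types, the averaged values $f_t^*$ also sum to one, so they can be fed back in as the current priors for the next update.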