Corrected p-values

Clicking Next will display a dialog as shown in figure 32.48.

Figure 32.48: Additional settings for the statistical analysis.

At the top, you can select which values to analyze (see Selecting transformed and normalized values for analysis).

Below that, you can choose to add two kinds of corrected p-values to the analysis (in addition to the standard p-value produced for the test statistic):

- Bonferroni corrected
- FDR corrected

Both are calculated from the original p-values, and aim in different ways to take into account the issue of multiple testing [Dudoit et al., 2003]. The problem of multiple testing arises because the original p-values relate to a single test: the p-value is the probability, when there is no true difference, of observing a value at least as extreme as the one observed in the test carried out. If the p-value is 0.04, we would expect a value at least that extreme in about 4 out of 100 tests carried out among groups with no difference in means. Popularly speaking, if we carry out 10,000 tests and select the features with original p-values below 0.05, we can expect about 0.05 times 10,000 = 500 false positives when none of the features truly differ, and nearly as many when only a few do.
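The arithmetic above can be checked with a small simulation. This is only an illustrative sketch, not part of the Workbench: it assumes that p-values from tests with no true difference are uniformly distributed between 0 and 1, and the function name `count_false_positives` is made up for this example.

```python
import random


def count_false_positives(n_tests, alpha, seed=0):
    """Count how many simulated null p-values fall below alpha.

    Assumption: with no true difference, p-values are uniform on [0, 1].
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pvalues = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in pvalues)


n_tests = 10_000
alpha = 0.05
expected = alpha * n_tests  # 0.05 times 10,000 = 500
observed = count_false_positives(n_tests, alpha)
print(f"expected about {expected:.0f} false positives, simulated {observed}")
```

The simulated count varies from seed to seed but stays close to 500, which is why an uncorrected cutoff of 0.05 is of little use when thousands of features are tested.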

The Bonferroni corrected p-values handle the multiple testing problem by controlling the 'family-wise error rate': the probability of making at least one false positive call. They are calculated by multiplying the original p-values by the number of tests performed. The probability of having at least one false positive among the set of features with Bonferroni corrected p-values below 0.05 is at most 5%. The Bonferroni correction is conservative: genes that are truly differentially expressed may end up with Bonferroni corrected p-values above 0.05, and these will be missed if this correction is applied.
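As a minimal sketch of the calculation described above, the following multiplies each original p-value by the number of tests. The function name `bonferroni` is hypothetical, and capping the result at 1 is a common convention assumed here; the text above does not say how the Workbench treats products larger than 1.

```python
def bonferroni(pvalues):
    """Return Bonferroni corrected p-values for a list of original p-values."""
    n_tests = len(pvalues)
    # Multiply by the number of tests; the cap at 1.0 is an assumption
    # (a probability cannot exceed 1), not taken from the manual.
    return [min(1.0, p * n_tests) for p in pvalues]


original = [0.01, 0.04, 0.5]
corrected = bonferroni(original)
for p, q in zip(original, corrected):
    print(f"original {p:.3f} -> Bonferroni corrected {q:.3f}")
```

With three tests, an original p-value of 0.04 becomes 0.12 and no longer falls below a 0.05 cutoff, which illustrates how conservative the correction is.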

Instead of controlling the family-wise error rate, we can control the false discovery rate (FDR). The false discovery rate is the expected proportion of false positives among all features declared positive. Among the features with FDR corrected p-values below 0.05, we expect at most about 5% to be false positives. There are many methods for controlling the FDR. The method used in CLC Genomics Workbench is that of [Benjamini and Hochberg, 1995].
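The following sketch shows the usual textbook form of the Benjamini and Hochberg step-up adjustment. It is offered as an illustration under stated assumptions, not as the Workbench's actual code: the function name `benjamini_hochberg` is hypothetical, and the manual text does not describe implementation details such as tie handling.

```python
def benjamini_hochberg(pvalues):
    """Return FDR corrected p-values using the Benjamini-Hochberg step-up rule."""
    n_tests = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(n_tests), key=lambda i: pvalues[i])
    adjusted = [0.0] * n_tests
    running_min = 1.0
    # Walk from the largest p-value (rank n_tests) down to the smallest (rank 1).
    for rank in range(n_tests, 0, -1):
        idx = order[rank - 1]
        # Scale by n_tests / rank, then keep the running minimum so that
        # corrected values never decrease as the original p-values increase.
        running_min = min(running_min, pvalues[idx] * n_tests / rank)
        adjusted[idx] = running_min
    return adjusted


original = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(original, benjamini_hochberg(original)):
    print(f"original {p:.3f} -> FDR corrected {q:.3f}")
```

In this example all four features have FDR corrected p-values below 0.05, whereas multiplying by the number of tests would push 0.03 and 0.04 above that cutoff. This reflects the less conservative nature of FDR control.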

Click Finish to start the tool.

Note that if you have already performed statistical analysis on the same values, the existing results will be overwritten.