Copy Number Variant Detection
The Copy Number Variant Detection tool is designed to detect copy number variations (CNVs) from targeted resequencing experiments.
The tool takes read mappings and target regions as input, and produces amplification and deletion annotations. The annotations are generated by a 'depth-of-coverage' method, where the target-level coverages of the case and the controls are compared in a statistical framework using a model based on 'selected' targets. Note that to be 'selected', a target has to have a coverage higher than the specified coverage cutoff AND must be found on a chromosome that was not identified as a coverage outlier in the chromosomal analysis step. If fewer than 50 'selected' targets are found suitable for setting up the statistical models, the CNV tool will terminate prematurely.
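The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the two stated conditions and the 50-target minimum, not the tool's actual implementation; the `Target` class, the `select_targets` function and its parameter names are hypothetical and were introduced for this example.

```python
from dataclasses import dataclass

# Minimum number of 'selected' targets needed to set up the statistical models,
# as stated in the text above.
MIN_SELECTED_TARGETS = 50


@dataclass
class Target:
    """Hypothetical container for one target region and its coverage."""
    name: str
    chromosome: str
    coverage: float


def select_targets(targets, coverage_cutoff, outlier_chromosomes):
    """Return the targets that satisfy both selection conditions.

    A target is kept only if its coverage is higher than the cutoff AND its
    chromosome was not flagged as a coverage outlier.
    """
    selected = [
        t for t in targets
        if t.coverage > coverage_cutoff
        and t.chromosome not in outlier_chromosomes
    ]
    if len(selected) < MIN_SELECTED_TARGETS:
        # Mirrors the premature termination described in the text.
        raise ValueError(
            f"only {len(selected)} selected targets; "
            f"at least {MIN_SELECTED_TARGETS} are required"
        )
    return selected
```

In this sketch, a target with high coverage on an outlier chromosome is excluded just like a low-coverage target, which is why a small panel concentrated on one affected chromosome can fall below the minimum.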
The algorithm implemented in the Copy Number Variant Detection tool is inspired by the following papers:
- Li et al., CONTRA: copy number analysis for targeted resequencing, Bioinformatics. 2012, 28(10): 1307-1313 [Li et al., 2012].
- Niu and Zhang, The screening and ranking algorithm to detect DNA copy number variations, Ann Appl Stat. 2012, 6(3): 1306-1326 [Niu and Zhang, 2012].
For more information, you can also read our whitepaper: https://digitalinsights.qiagen.com/files/whitepapers/Biomedical_Genomics_Workbench_CNV_White_Paper.pdf.
The Copy Number Variant Detection tool identifies CNV regions, that is, regions where the normalized coverage of the case sample differs statistically significantly from that of the controls.
The algorithm carries out the analysis in the following steps:
- Base-level coverages are analyzed for all samples, and a robust coverage baseline is generated using the control samples.
- Chromosome-level coverage analysis is carried out on the case sample, and any chromosomes with unexpectedly high or low coverages are identified.
- Sample coverages are normalized, and a global, target-level statistical model is set up for the variation in fold-change as a function of coverage in the baseline.
- Each chromosome is segmented into regions of similar fold-changes.
- The expected fold-change variation in each region is determined using the statistical model for target-level coverages. Region-level CNVs are identified as the regions with fold-changes significantly different from 1.0.
- If chosen in the parameter steps, gene-level CNV calls are also produced.
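To make the normalization and fold-change steps above more concrete, the following Python sketch computes a baseline from control samples and target-level fold-changes for a case sample. It rests on several assumptions that are not taken from the tool: median scaling is used for normalization, the baseline is the per-target median of the normalized controls, and the function names are hypothetical. The statistical model, segmentation and significance testing are not covered.

```python
from statistics import median


def normalize(coverages):
    """Scale target coverages so their median is 1.0 (assumed normalization)."""
    m = median(coverages)
    if m <= 0:
        raise ValueError("median coverage must be positive")
    return [c / m for c in coverages]


def baseline_from_controls(controls):
    """Per-target median of the normalized control coverages.

    `controls` is a list of samples, each a list of target coverages in the
    same target order.
    """
    normalized = [normalize(sample) for sample in controls]
    return [median(values) for values in zip(*normalized)]


def fold_changes(case, baseline):
    """Fold-change of the normalized case coverage relative to the baseline.

    Assumes every baseline value is positive, e.g. because low-coverage
    targets were filtered out beforehand.
    """
    case_norm = normalize(case)
    return [c / b for c, b in zip(case_norm, baseline)]
```

For example, with two controls whose coverages are `[100, 200, 100, 100, 100]` and `[50, 100, 50, 50, 50]`, and a case sample with `[200, 400, 400, 200, 200]`, the third target gets a fold-change of 2.0 while all others stay at 1.0, even though the case sample was sequenced more deeply overall.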
Subsections
- The Copy Number Variant Detection tool
- Region-level CNV track (Region CNVs)
- Target-level CNV track (Target CNVs)
- Gene-level annotation track (Gene CNVs)
- How to interpret fold-changes when the sample purity is not 100%
- CNV results report
- CNV algorithm report