CNV and LOH Detection

The CNV and LOH Detection tool is designed to detect copy number variations (CNVs) and loss-of-heterozygosity (LOH) from targeted resequencing experiments.

The tool takes read mappings, target regions and, optionally, variant tracks as input, and produces amplification and deletion annotations. The annotations are generated by a 'depth-of-coverage' method, where the target-level coverages of the case and the controls are compared in a statistical framework using a model based on 'selected' targets. To be 'selected', a target must have a coverage higher than the specified coverage cutoff AND must be located on a chromosome that was not identified as a coverage outlier in the chromosomal analysis step. If fewer than 50 'selected' targets are available for setting up the statistical models, the CNV tool will terminate prematurely. If a somatic variant track is provided, the tool can use B-allele frequencies to improve the normalization of target coverages in small panels and to infer LOH.
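The selection rule for targets can be illustrated with a minimal Python sketch. This is not the tool's implementation; the `Target` class, the function name and the error message are hypothetical, and only the two selection criteria and the threshold of 50 targets come from the description above.

```python
from dataclasses import dataclass

# Minimum number of 'selected' targets, as stated in the text.
MIN_SELECTED_TARGETS = 50


@dataclass
class Target:
    """Hypothetical container for a target region and its coverage."""
    name: str
    chromosome: str
    coverage: float


def select_targets(targets, coverage_cutoff, outlier_chromosomes):
    """Keep targets above the coverage cutoff that are not on an outlier chromosome."""
    selected = [
        t for t in targets
        if t.coverage > coverage_cutoff
        and t.chromosome not in outlier_chromosomes
    ]
    if len(selected) < MIN_SELECTED_TARGETS:
        # Mirrors the documented behaviour: too few targets, no statistical model.
        raise ValueError(
            f"only {len(selected)} selected targets; "
            f"at least {MIN_SELECTED_TARGETS} are required"
        )
    return selected
```

In this sketch, lowering the coverage cutoff increases the number of selected targets, which may help small panels reach the minimum of 50.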

The algorithm implemented in the CNV and LOH Detection tool is inspired by the following papers:

For more information, you can also read our whitepaper: https://digitalinsights.qiagen.com/files/whitepapers/Biomedical_Genomics_Workbench_CNV_White_Paper.pdf.

The CNV and LOH Detection tool identifies CNV regions where the normalized coverage is statistically significantly different from the controls.

The algorithm carries out the analysis in several steps.

  1. Base-level coverages are analyzed for all samples, and a robust coverage baseline is generated using the control samples.
  2. Chromosome-level coverage analysis is carried out on the case sample, and any chromosomes with unexpectedly high or low coverages are identified.
  3. Sample coverages are normalized, and a global, target-level statistical model is set up for the variation in fold-change as a function of coverage in the baseline.
  4. Each chromosome is segmented into regions of similar fold-changes.
  5. The expected fold-change variation in each region is determined using the statistical model for target-level coverages. Region-level CNVs are identified as the regions with fold-changes significantly different from 1.0.
  6. If chosen in the parameter steps, gene-level CNV calls are also produced.

Based on coverage ratios and the allele ratios of putative heterozygous germline variants, the tool can also detect targets and regions affected by loss-of-heterozygosity events. The tool can handle both matched tumor-normal data and unpaired tumor data. In both cases, variants that are assumed to be heterozygous in normal tissue have to be identified.

Tumor-normal pairs: For matched tumor-normal data, a track with somatic variants and a track with germline variants are used. The variants used to detect LOH are the somatic variants overlapping heterozygous germline variants.

Tumor only: For unpaired tumor data, a somatic variant track and a database of known segregating variants (typically dbSNP common) are used. The variants used in LOH detection are the somatic variants overlapping the variants in the database.
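The selection of variants for LOH detection amounts to an overlap between two variant sets. The following Python sketch illustrates the idea under simplifying assumptions: variants are represented as hypothetical `(chromosome, position, allele_frequency)` tuples, and "overlapping" is reduced to sharing the same chromosome and position. The tool itself works on variant tracks and may define overlap differently.

```python
def select_loh_variants(somatic_variants, reference_positions):
    """Return the somatic variants that coincide with a reference position.

    somatic_variants: iterable of (chromosome, position, allele_frequency).
    reference_positions: iterable of (chromosome, position), taken either from
    heterozygous germline variants (tumor-normal pairs) or from a database of
    known segregating variants (tumor only).
    """
    reference = set(reference_positions)
    return [v for v in somatic_variants if (v[0], v[1]) in reference]


# Hypothetical example data.
somatic = [("chr1", 1000, 0.71), ("chr1", 2000, 0.10), ("chr2", 500, 0.68)]
heterozygous_germline = [("chr1", 1000), ("chr2", 500)]
loh_variants = select_loh_variants(somatic, heterozygous_germline)
```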

The model operates with a number of ploidy states, which are characterized by their numbers of paternal and maternal alleles (Table 25.1). The state together with the tumor purity (the percentage of cells in the sample originating from the tumor) determines the expected coverage ratio and the expected allele frequencies of the heterozygous variants. As an example, if a normal diploid sample would yield 200 reads, then a sample with purity 50% and copy-number 1 (deletion) would yield 150 reads (50%*200+50%*100). That means the coverage ratio is 150/200 = 75%. Table 25.2 shows the expected coverage ratios for different states and purities.
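The arithmetic behind the expected coverage ratio can be written as a small Python function. This is a sketch of the calculation in the example above, not the tool's code; the function and parameter names are illustrative.

```python
def expected_coverage_ratio(purity, tumor_copy_number, normal_copy_number=2):
    """Expected coverage of a mixed sample relative to a normal sample.

    purity is the fraction of tumor cells (0.0 to 1.0). Tumor cells contribute
    tumor_copy_number copies, normal cells contribute normal_copy_number copies.
    """
    mixed_copies = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return mixed_copies / normal_copy_number


# The example from the text: purity 50%, copy number 1 (deletion).
ratio = expected_coverage_ratio(0.5, 1)  # 0.75, i.e. 150 reads instead of 200
```

Evaluating the function for copy numbers 0, 1, 2, 3 and 4 at purities from 10% to 100% gives the values in Table 25.2.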

The state together with tumor purity also determines the expected allele frequencies of heterozygous variants. As an example, consider a sample with 60% purity where the cancer cells contain a deletion in a region with two alleles, A and B, and the deletion removes allele B. If we take 100 cells, the 60 cancer cells contribute 60 copies of allele A and no copies of B, while the 40 normal cells contribute 40 copies of each allele.

In total there will be 100 copies of allele A and 40 copies of B, so the frequency of A will be 100 / (100 + 40) = 71.4%.
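The same counting argument can be sketched in Python for any ploidy state. The function name and the `(minor, major)` tuple convention are assumptions made for illustration; the allele ratios follow the Allele-ratio column of Table 25.1.

```python
def expected_major_allele_frequency(purity, tumor_alleles, normal_alleles=(1, 1)):
    """Expected frequency of the more abundant allele in a mixed sample.

    tumor_alleles and normal_alleles are (minor, major) copy counts per cell,
    for example (0, 1) for a deletion or (0, 2) for uniparental disomy.
    Returns None when no copies remain (bi-allelic deletion at 100% purity).
    """
    minor = purity * tumor_alleles[0] + (1.0 - purity) * normal_alleles[0]
    major = purity * tumor_alleles[1] + (1.0 - purity) * normal_alleles[1]
    total = minor + major
    if total == 0:
        return None
    return major / total


# The example from the text: 60% purity, deletion of one allele.
frequency = expected_major_allele_frequency(0.6, (0, 1))  # about 0.714
```

For the states with balanced allele ratios (0:0, 1:1 and 2:2) the function returns 0.5 at any purity below 100%, which is why coverage ratios are needed in addition to allele frequencies to tell these states apart.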

The tool estimates the purity using a hidden Markov model (HMM), which is then used to predict the most probable state for each target.


Table 25.1: The ploidy states used by the model, with their allele ratios, copy numbers and loss-of-heterozygosity status.
State                Allele-ratio  Copy-number  Loss-of-heterozygosity
Bi-allelic deletion  0:0           0
Deletion             0:1           1            deletion LOH
Diploid              1:1           2
Uniparental disomy   0:2           2            copy-neutral LOH
Duplication          1:2           3
WGD                  2:2           4



Table 25.2: The expected coverage ratios given tumor purity and the ploidy state.
Purity  Bi-allelic deletion  Deletion  Diploid  Uniparental disomy  Duplication  WGD
10.0%   90.0%                95.0%     100.0%   100.0%              105.0%       110.0%
20.0%   80.0%                90.0%     100.0%   100.0%              110.0%       120.0%
30.0%   70.0%                85.0%     100.0%   100.0%              115.0%       130.0%
40.0%   60.0%                80.0%     100.0%   100.0%              120.0%       140.0%
50.0%   50.0%                75.0%     100.0%   100.0%              125.0%       150.0%
60.0%   40.0%                70.0%     100.0%   100.0%              130.0%       160.0%
70.0%   30.0%                65.0%     100.0%   100.0%              135.0%       170.0%
80.0%   20.0%                60.0%     100.0%   100.0%              140.0%       180.0%
90.0%   10.0%                55.0%     100.0%   100.0%              145.0%       190.0%
100.0%  0.0%                 50.0%     100.0%   100.0%              150.0%       200.0%



Table 25.3: The expected frequencies of variants that are heterozygous in the normal tissue given tumor purity and the ploidy state.
Purity  Bi-allelic deletion  Deletion  Diploid  Uniparental disomy  Duplication  WGD
10.0%   50.0%                52.6%     50.0%    55.0%               52.4%        50.0%
20.0%   50.0%                55.6%     50.0%    60.0%               54.5%        50.0%
30.0%   50.0%                58.8%     50.0%    65.0%               56.5%        50.0%
40.0%   50.0%                62.5%     50.0%    70.0%               58.3%        50.0%
50.0%   50.0%                66.7%     50.0%    75.0%               60.0%        50.0%
60.0%   50.0%                71.4%     50.0%    80.0%               61.5%        50.0%
70.0%   50.0%                76.9%     50.0%    85.0%               63.0%        50.0%
80.0%   50.0%                83.3%     50.0%    90.0%               64.3%        50.0%
90.0%   50.0%                90.9%     50.0%    95.0%               65.5%        50.0%
100.0%                       100.0%    50.0%    100.0%              66.7%        50.0%


Running the CNV and LOH Detection tool

To run the CNV and LOH Detection tool, go to:

        Toolbox | Resequencing Analysis | CNV and LOH Detection

Select the case read mapping and click Next.

You are now presented with choices regarding the data to use in the CNV prediction method, as shown in figure 25.4.

Image cnv_detection_step1
Figure 25.4: The first step of the CNV and LOH detection tool.

Image cnv_detection_step2
Figure 25.5: The second step of the CNV detection tool

Click Next to set the parameters related to the target-level and region-level CNV detection, as shown in figure 25.5.

Clicking Next, you are presented with options about the results (see figure 25.6). In this step, you can choose to create an algorithm report by checking the Create algorithm report box. Furthermore, you can choose to output results for every target in your input, by checking the Create target-level CNV track box.

Image cnv_detection_savestep
Figure 25.6: Specifying whether an algorithm report and a target-level CNV track should be created.

When finished with the settings, click Next to start the algorithm.


Copy number and fold change

When configuring the minimum fold change thresholds for calling CNVs, it can be useful to understand the difference between copy number and fold change and the relationship between tumor fold change, sample fold change and sample purity.

The copy number (CN) gives the number of copies of a gene. For a normal diploid sample the copy number, or ploidy, of a gene is 2.

The fold change is a measure of how much the copy number of a case sample differs from that of a normal sample. When the copy number for both the case sample and the normal sample is 2, this corresponds to a fold change of 1 (or -1).

The sample fold change can be calculated from the normal copy number and sample copy number. The formula differs for amplifications and deletions:

$\displaystyle \text{Fold change, amplifications } (\text{CN(sample)} > \text{CN(normal)}) = \frac{\text{CN(sample)}}{\text{CN(normal)}}$ (25.1)

$\displaystyle \text{Fold change, deletions } (\text{CN(sample)} < \text{CN(normal)}) = -\frac{\text{CN(normal)}}{\text{CN(sample)}}$ (25.2)

Fold change values for amplifications and deletions are asymmetric in that a 50% increase in copy number from 2 to 3 (heterozygous amplification) converts to a fold change of 1.5, whereas a 50% decrease in copy number from 2 to 1 (heterozygous deletion) gives a fold change of -2.0. The difference is even more pronounced if we consider what could be interpreted as a homozygous duplication (copy number 4) and a homozygous deletion (copy number 0). Here, the calculated fold changes land at 2 and $-\infty$, respectively.

The same percent-wise change in coverage (copy number) leads to a fold change of larger magnitude for deletions than for amplifications. Given the same fold change cutoff for amplifications and deletions, there is therefore a higher risk of calling false positive deletions than false positive amplifications: it takes less coverage fluctuation to pass the fold change cutoff for deletions.
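The two formulas and their asymmetry can be checked with a short Python sketch. The function name is illustrative, and returning 1 (rather than -1) when the copy numbers are equal is a choice made here; the text notes that both values describe the same situation.

```python
import math


def fold_change(sample_cn, normal_cn=2.0):
    """Signed fold change between a sample copy number and the normal copy number."""
    if sample_cn >= normal_cn:
        # Amplification (or no change): CN(sample) / CN(normal).
        return sample_cn / normal_cn
    if sample_cn == 0:
        # Complete loss of the region.
        return -math.inf
    # Deletion: -CN(normal) / CN(sample).
    return -normal_cn / sample_cn


gain = fold_change(3)  # 1.5: a 50% increase in copy number
loss = fold_change(1)  # -2.0: a 50% decrease in copy number
```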


Table 25.4: The relationship between copy number and fold change for amplifications and deletions.
  Copy number Fold change
Amplifications
  2 1
  3 1.5
  4 2
  6 3
  8 4
Deletions
  2 -1
  1 -2
  0.5 -4
  0.2 -10
  0.1 -20
  0 $ -\infty$



How to set the fold-change cutoff when the sample purity is not 100%

Given a sample purity of $X$%, and a desired detection level (absolute value of fold-change in 100% pure sample) of $T$, the following formula gives the required fold-change cutoff for an amplification:

$\displaystyle \text{cutoff} = \frac{X\text{\%}}{100\text{\%}} \times T + \left(1-\frac{X\text{\%}}{100\text{\%}}\right).$ (25.3)

For example, if the sample purity is 40%, and you want to detect 6-fold amplifications (e.g. 12 copies instead of 2), then the cutoff should be:

$\displaystyle \text{cutoff} = \frac{40\text{\%}}{100\text{\%}} \times 6 + \left(1-\frac{40\text{\%}}{100\text{\%}}\right) = 3.0.$ (25.4)

The following formula gives the required fold-change cutoff for a deletion:

$\displaystyle \text{cutoff} = \frac{1}{\frac{X\text{\%}}{100\text{\%}} \times \frac{1}{T} + \left(1-\frac{X\text{\%}}{100\text{\%}}\right)}.$ (25.5)

For example, if the sample purity is 40%, and you want to detect 2-fold deletions (e.g. 1 copy instead of 2), then the cutoff should be:

$\displaystyle \text{cutoff} = \frac{1}{\frac{40\text{\%}}{100\text{\%}} \times \frac{1}{2} + \left(1-\frac{40\text{\%}}{100\text{\%}}\right)} = 1.25.$ (25.6)
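Both cutoff formulas are easy to evaluate for other purities and detection levels. The Python sketch below implements equations 25.3 and 25.5 directly; the function names are illustrative, and purity is passed as a fraction (0.4 for 40%) rather than a percentage.

```python
def amplification_cutoff(purity, detection_level):
    """Fold-change cutoff needed to detect a T-fold amplification (equation 25.3)."""
    return purity * detection_level + (1.0 - purity)


def deletion_cutoff(purity, detection_level):
    """Fold-change cutoff needed to detect a T-fold deletion (equation 25.5)."""
    return 1.0 / (purity / detection_level + (1.0 - purity))


# The two worked examples from the text, both at 40% purity.
amp = amplification_cutoff(0.4, 6)    # approximately 3.0
deletion = deletion_cutoff(0.4, 2)    # approximately 1.25
```

At 100% purity both functions return the detection level itself, and as the purity approaches 0% both cutoffs approach 1, meaning that no fold-change threshold can separate the signal from a normal sample.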

Figure 25.7 and Figure 25.8 show the fold-change cutoffs required to detect a particular degree of amplification or deletion, respectively, at different sample purities.

Image sample_purity_graph_amp
Figure 25.7: The required fold-change cutoff to detect amplifications of different magnitudes as a function of sample purity.

Image sample_purity_graph_del
Figure 25.8: The required fold-change cutoff to detect deletions of different magnitudes as a function of sample purity.

The CNV and LOH Detection tool calls CNVs that are both global outliers on the target-level, and locally consistent on the region-level. The tool produces several outputs, which are described below.


