Ploidy state detection

Detect Regional Ploidy is inspired by [Beroukhim et al., 2006].

The tool predicts a ploidy state (table 10.1) for each locus from the input tracks: a CNV target or a somatic SNP that

A ploidy state can be associated with loss-of-heterozygosity (LOH), which is characterized by the loss of one allele while the other allele is present in one or more copies.


Table 10.1: The nine possible predicted ploidy states, characterized by the copy number of the minor and major alleles, and the loss-of-heterozygosity (LOH) status. The total copy number is simply the sum of the minor and major copy numbers.
Ploidy state               Copy number (minor:major)   LOH
Bi-allelic deletion        0:0                         No
Deletion                   0:1                         Loss
Uniparental disomy         0:2                         Neutral
Normal diploid             1:1                         No
(0,3)                      0:3                         Gain
Duplication                1:2                         No
(0,4)                      0:4                         Gain
(1,3)                      1:3                         No
Whole genome duplication   2:2                         No
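
The LOH column of Table 10.1 follows a simple rule: LOH is present when the minor allele has copy number 0 while the major allele is still present, and its type depends on whether the total copy number is below, equal to, or above the diploid baseline of 2. The Python sketch below encodes that rule in a hypothetical helper, loh_status. It is derived from the table alone and is not the tool's implementation.

```python
def loh_status(minor: int, major: int) -> str:
    """Return the LOH status of a minor:major copy-number pair (rule read off Table 10.1)."""
    if minor < 0 or minor > major:
        raise ValueError("expected 0 <= minor <= major")
    if minor > 0 or major == 0:
        # Both alleles present, or both lost (bi-allelic deletion): no LOH.
        return "No"
    total = minor + major
    if total < 2:
        return "Loss"      # copy-loss LOH, e.g. 0:1
    if total == 2:
        return "Neutral"   # copy-neutral LOH, e.g. uniparental disomy 0:2
    return "Gain"          # copy-gain LOH, e.g. 0:3 and 0:4


# The nine states of Table 10.1 (name -> (minor, major)).
PLOIDY_STATES = {
    "Bi-allelic deletion": (0, 0),
    "Deletion": (0, 1),
    "Uniparental disomy": (0, 2),
    "Normal diploid": (1, 1),
    "(0,3)": (0, 3),
    "Duplication": (1, 2),
    "(0,4)": (0, 4),
    "(1,3)": (1, 3),
    "Whole genome duplication": (2, 2),
}

if __name__ == "__main__":
    for name, (minor, major) in PLOIDY_STATES.items():
        print(f"{name:26s} {minor}:{major}  LOH={loh_status(minor, major)}")
```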


As evidence for copy-number changes, the tool uses the relative log coverage ratio (RLR), calculated as the signed $\log_2$ of the adjusted fold change from the CNV track.
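
A minimal sketch of the RLR calculation follows. It assumes that the adjusted fold change uses the common signed convention in which the absolute fold change is at least 1 and negative values denote a decrease, so that -2 means the coverage is halved. Under that assumption, the signed log2 equals the ordinary log2 of the underlying coverage ratio: -2 maps to log2(0.5) = -1. The function name relative_log_ratio is hypothetical.

```python
import math


def relative_log_ratio(adjusted_fold_change: float) -> float:
    """Signed log2 of a signed fold change.

    Assumed convention: |fold change| >= 1, and a negative sign marks a
    decrease (e.g. -2 means the tumor coverage is half the normal coverage).
    """
    if abs(adjusted_fold_change) < 1:
        raise ValueError("signed fold changes are expected to satisfy |fc| >= 1")
    return math.copysign(math.log2(abs(adjusted_fold_change)), adjusted_fold_change)


print(relative_log_ratio(-2.0))  # -1.0, i.e. log2(0.5)
print(relative_log_ratio(1.5))   # roughly 0.585, i.e. log2(1.5)
```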

The tool also relies on the B-allele frequencies of somatic SNPs that are assumed to be heterozygous in normal cells. For this, either matched germline variants or a variant database is needed:

If either somatic or germline variants have the "Filter" attribute set (see Filter on Custom Criteria), only variants where this attribute is set to PASS are used.
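
The variant selection can be illustrated with a small Python sketch built on a hypothetical Variant record (the field names are illustrative, not the tool's data model). It applies the PASS rule to one variant track at a time, under the interpretation that the rule applies whenever the track carries a Filter attribute. It computes the B-allele frequency as the fraction of reads supporting the alternative allele, a common definition that is assumed here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    # Hypothetical minimal record; field names are for illustration only.
    chrom: str
    pos: int
    ref_count: int                 # reads supporting the reference (A) allele
    alt_count: int                 # reads supporting the alternative (B) allele
    filter: Optional[str] = None   # None when no "Filter" attribute is set


def usable_variants(track: List[Variant]) -> List[Variant]:
    """If the track carries a Filter attribute, keep only PASS variants; else keep all."""
    if any(v.filter is not None for v in track):
        return [v for v in track if v.filter == "PASS"]
    return list(track)


def b_allele_frequency(v: Variant) -> float:
    """Fraction of reads that support the B (alternative) allele."""
    depth = v.ref_count + v.alt_count
    if depth == 0:
        raise ValueError("variant has no read coverage")
    return v.alt_count / depth


track = [
    Variant("1", 1000, 30, 28, "PASS"),
    Variant("1", 2000, 40, 5, "LowQual"),  # dropped by the PASS rule
]
print([b_allele_frequency(v) for v in usable_variants(track)])
```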

The sample purity (the proportion of cells in the sample that are tumor-derived) and the ploidy state of a locus together determine the expected RLR and the expected B-allele frequencies of heterozygous variants. For example:
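
The following sketch shows how purity and ploidy state combine under a standard two-component mixture model. The model is assumed here rather than taken from the tool: a fraction p of tumor cells has the minor:major copy numbers of the ploidy state, and the normal cells have copy numbers 1:1. The expected total copy number is T = p * (minor + major) + 2 * (1 - p), the expected RLR before any normalization is log2(T / 2), and the expected frequency of the major allele at a germline-heterozygous SNP is (p * major + (1 - p)) / T. The function expected_signals is hypothetical.

```python
import math
from typing import Optional, Tuple


def expected_signals(purity: float, minor: int, major: int) -> Tuple[float, Optional[float]]:
    """Expected RLR and major-allele frequency for one ploidy state.

    Assumed model: a fraction `purity` of tumor cells with minor:major copy
    numbers, mixed with normal cells that have copy numbers 1:1 (total 2).
    No normalization factor is applied.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    total = purity * (minor + major) + (1.0 - purity) * 2.0
    if total == 0.0:
        return float("-inf"), None  # pure bi-allelic deletion: no DNA, no allele signal
    rlr = math.log2(total / 2.0)
    major_fraction = (purity * major + (1.0 - purity) * 1.0) / total
    return rlr, major_fraction


for name, (mn, mj) in [("Deletion", (0, 1)), ("Uniparental disomy", (0, 2)), ("Duplication", (1, 2))]:
    print(name, expected_signals(0.6, mn, mj))
```

At full purity, uniparental disomy (0:2) has the same expected RLR as the normal diploid state (0), but an expected major-allele frequency of 1.0 instead of 0.5. This is why allele frequencies are needed to detect copy-neutral LOH.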

The tool jointly optimizes:

The normalization factor corrects for systematic coverage shifts. For example, if a large fraction of loci is affected by a deletion, the RLR values may be shifted relative to the true copy-number ratios, resulting in under-detection of copy-number changes. Because deletions affect both coverage and allele frequencies, the model uses the observed B-allele frequencies to adjust the RLR appropriately. In the extreme case where the normal sample has copy number 2 and the tumor sample has copy number 1 throughout, the normalization factor should ideally be 0.5.
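
The coverage-shift effect can be reproduced with a simplified model. The model assumes a pure tumor sample, equally weighted loci, tumor coverage scaled to the same total as the diploid normal, and a normalization factor that multiplies the coverage ratio (and therefore adds log2 of the factor to the RLR). The sketch computes the ideal factor from known copy numbers. The tool instead estimates the factor from the observed B-allele frequencies, which this sketch does not attempt. All function names are hypothetical.

```python
import math
from statistics import fmean
from typing import List


def globally_normalized_rlr(tumor_copy_numbers: List[float]) -> List[float]:
    """Observed RLR per locus after scaling tumor coverage to the normal total.

    With a pure tumor and a diploid normal, the ratio at locus i becomes
    c_i / mean(c) instead of the true c_i / 2. Copy numbers must be positive.
    """
    mean_cn = fmean(tumor_copy_numbers)
    return [math.log2(c / mean_cn) for c in tumor_copy_numbers]


def ideal_normalization_factor(tumor_copy_numbers: List[float]) -> float:
    """Factor that maps c_i / mean(c) back to the true ratio c_i / 2."""
    return fmean(tumor_copy_numbers) / 2.0


def apply_normalization(rlrs: List[float], factor: float) -> List[float]:
    """Multiplying the coverage ratio by `factor` adds log2(factor) to each RLR."""
    shift = math.log2(factor)
    return [r + shift for r in rlrs]


all_deleted = [1, 1, 1, 1]
observed = globally_normalized_rlr(all_deleted)         # all 0.0: deletion invisible
factor = ideal_normalization_factor(all_deleted)        # 0.5, as in the text
print(observed, factor, apply_normalization(observed, factor))
```

With every locus at copy number 1, the observed RLR is 0 everywhere, and the factor of 0.5 restores the expected RLR of -1. With half the loci at copy number 1 and half at 2, the deleted loci show an observed RLR of about -0.585 instead of -1 until the factor of 0.75 is applied.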

The optimized parameters and HMM are then used to filter loci and generate regions by the following steps:
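
The general idea of using an HMM to turn noisy per-locus evidence into contiguous regions can be sketched with Viterbi decoding followed by merging consecutive loci that share a state. This is a generic illustration, not the tool's procedure. The per-locus log-likelihoods are taken as given, and the transition model (a "sticky" chain that stays in the same ploidy state with probability stay_prob), the merging rule, and all names are assumptions. A single chromosome is assumed.

```python
import math
from typing import List, Sequence, Tuple


def viterbi(log_emissions: Sequence[Sequence[float]], stay_prob: float) -> List[int]:
    """Most likely state path; log_emissions[i][s] = log-likelihood of locus i in state s."""
    if not log_emissions:
        return []
    n_states = len(log_emissions[0])
    if n_states < 2 or not 0.0 < stay_prob < 1.0:
        raise ValueError("need >= 2 states and 0 < stay_prob < 1")
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n_states - 1))  # uniform over other states

    score = [-math.log(n_states) + e for e in log_emissions[0]]  # uniform start
    backpointers: List[List[int]] = []
    for emissions in log_emissions[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s, including the transition cost.
            prev = max(range(n_states), key=lambda p: score[p] + (log_stay if p == s else log_move))
            trans = log_stay if prev == s else log_move
            new_score.append(score[prev] + trans + emissions[s])
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)

    state = max(range(n_states), key=lambda s: score[s])  # best final state
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def merge_into_regions(positions: Sequence[int], states: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse consecutive loci with the same state into (start, end, state) regions."""
    regions: List[Tuple[int, int, int]] = []
    for pos, state in zip(positions, states):
        if regions and regions[-1][2] == state:
            regions[-1] = (regions[-1][0], pos, state)  # extend the current region
        else:
            regions.append((pos, pos, state))           # start a new region
    return regions
```

The high self-transition probability penalizes state switches, so an isolated locus with weak evidence for a different state is absorbed into its neighbors, while a run of loci with consistent evidence still produces a new region.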

Finally, loci and regions are annotated with an LOH status according to table 10.1, and are output in a locus-level ploidy track and a region-level ploidy track, respectively.

Limitations

Detect Regional Ploidy is designed for autosomal chromosomes. The underlying model does not account for the haploid baseline of sex chromosomes in male samples, and may therefore misinterpret the coverage and allele frequencies in these regions. Results for sex chromosomes should be interpreted with caution.