Detect Regional Ploidy
The Detect Regional Ploidy tool is designed to detect regional ploidy levels including loss-of-heterozygosity (LOH) from targeted research resequencing experiments.
The tool takes a target-level CNV events annotation track (from a CNV tool), a somatic variant track, and either germline variants or known segregating variants. A centromere track can optionally be provided.
To run the Detect Regional Ploidy tool, go to:
Toolbox | Resequencing Analysis | Variant Detection | Detect Regional Ploidy
Select the CNV target-level annotation track generated by a CNV tool and click Next.
You are now presented with choices regarding LOH detection.
- Somatic variants: A track containing variants in the somatic sample. Their allele frequencies must be provided.
- Type of variant track with known variants: Choose whether the track with known variants is a variant database (Variant database) or a matching germline variant track (Germline variants). This determines whether LOH detection is performed in unpaired mode or in matched tumor-normal mode.
- Known variants: If "Variant database" is chosen above, provide a variant track of known SNPs in the population annotated with allele frequencies (e.g. dbSNP). The variant track can be restricted to the target region to improve computation time. This can be done using the Filter Based on Overlap tool, see https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Filter_Based_on_Overlap.html. If "Germline variants" is chosen above, provide a variant track with matching germline variants. The variants are automatically filtered to heterozygous variants. For optimal performance, the variants should be high confidence.
- Centromeres: If provided, the centromeric regions will be excluded from the region-level ploidy track.
- Normalize coverage using allele frequencies: If enabled, allele frequencies are used to find the correct coverage normalization. If a large fraction of targets is affected by, say, a deletion, the normalization factor used for the sample will be too low, resulting in underdetection. However, a deletion is expected to affect both the coverage and the allele frequencies, and this information can be used to correct the normalization factor (see Tables 11.2 and 11.3). As an example, if the control sample has copy number 2 for all targets, but the case sample has copy number 1 for all targets, the coverage after correcting for total library size should ideally be adjusted by a factor of 0.5. Enabling this option is recommended for small panels where a large fraction of targets may be affected by CNV events.
- Minimum normalization, Maximum normalization: If `Normalize coverage using allele frequencies` is enabled, these define the limits on the amount of normalization done.
- Minimum sample purity: The lowest sample purity the model can estimate. It is hard to distinguish a sample with only a few CNV and LOH events from a sample with very low purity. Set this parameter to the lowest purity that the model is allowed to use.
- Transition factor: The transition factor controls the chance of switching state. A higher transition factor makes state switches less probable.
- HMM decoding method: Method for optimizing and decoding the hidden Markov model (VITERBI or POSTERIOR).
- Minimum merge size: When joining targets into regions, remove merged regions consisting of fewer than this number of targets.
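The effect of Minimum merge size can be illustrated with a small Python sketch. This is not the tool's implementation: the function name `merge_targets`, the state labels, and the choice to simply drop short regions (rather than re-merge their neighbors) are assumptions made for illustration.

```python
from itertools import groupby


def merge_targets(states, min_merge_size):
    """Join consecutive targets that share a state into regions.

    Returns (state, first_target_index, last_target_index) tuples and
    drops regions made of fewer than min_merge_size targets.
    """
    regions = []
    index = 0
    for state, run in groupby(states):
        length = len(list(run))
        if length >= min_merge_size:
            regions.append((state, index, index + length - 1))
        index += length
    return regions


# Hypothetical per-target states, ordered by genomic position.
states = ["normal", "normal", "normal", "deletion",
          "normal", "normal", "LOH", "LOH", "LOH"]
regions = merge_targets(states, min_merge_size=2)
for region in regions:
    print(region)
```

With a minimum of two targets, the single-target "deletion" call is discarded while the longer runs are kept as regions.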
Regional ploidy estimation
The algorithm implemented in the Detect Regional Ploidy tool is inspired by the following paper:
- Beroukhim et al. Inferring loss-of-heterozygosity from unpaired tumors using high-density oligonucleotide SNP arrays, PLoS Computational Biology. 2006, 2(5): 323-332 [Beroukhim et al., 2006]
Based on coverage ratios and the allele ratios of putative heterozygous germline variants, the tool detects targets and regions affected by loss-of-heterozygosity events. The tool can handle both matched tumor-normal data and unpaired tumor data. In both cases, variants that are assumed to be heterozygous in normal tissue have to be identified.
Tumor-normal pairs: For matched tumor-normal data, a track with somatic variants and a track with germline variants will be used. The variants used to detect LOH are simply the somatic variants overlapping heterozygous germline variants.
Tumor only: For unpaired tumor data, a somatic variant track and a database of known segregating variants are used (typically dbSNP common). The variants used in LOH calculation are the somatic variants overlapping the variants in the database.
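The selection of informative variants in the two modes can be sketched as follows. This is a minimal illustration, not the tool's code: the dictionary fields (`chrom`, `pos`, `ref`, `alt`, `zygosity`) and the function names are hypothetical, and "overlapping" is assumed here to mean an exact match on chromosome, position, and alleles.

```python
def variant_key(variant):
    """Identify a variant by chromosome, position, and alleles."""
    return (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])


def select_loh_variants(somatic, known, require_heterozygous=False):
    """Keep the somatic variants that also occur in the known-variant track.

    For matched tumor-normal data, set require_heterozygous=True so that
    only heterozygous germline variants are used as the reference set.
    """
    if require_heterozygous:
        known = [v for v in known if v.get("zygosity") == "heterozygous"]
    known_keys = {variant_key(v) for v in known}
    return [v for v in somatic if variant_key(v) in known_keys]


# Hypothetical example data.
somatic = [
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "G", "freq": 0.71},
    {"chrom": "1", "pos": 250, "ref": "C", "alt": "T", "freq": 0.12},
    {"chrom": "2", "pos": 40, "ref": "G", "alt": "A", "freq": 0.48},
]
germline = [
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "G", "zygosity": "heterozygous"},
    {"chrom": "2", "pos": 40, "ref": "G", "alt": "A", "zygosity": "homozygous"},
]

paired = select_loh_variants(somatic, germline, require_heterozygous=True)
unpaired = select_loh_variants(somatic, germline)
print([v["pos"] for v in paired])
print([(v["chrom"], v["pos"]) for v in unpaired])
```

In the paired call only the variant at position 100 survives, because the other shared variant is homozygous in the germline track; in the unpaired call both shared variants are kept.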
The model operates with a number of ploidy states, which are characterized by their numbers of paternal and maternal alleles (Table 11.1). The state together with the tumor purity (the percentage of cells in the sample originating from the tumor) determines the expected coverage ratio and the expected allele frequencies of the heterozygous variants. As an example, if a normal diploid sample would yield 200 reads, then a sample with purity 50% and copy number 1 (deletion) would yield 150 reads (50%*200 + 50%*100). That means the coverage ratio is 150/200 = 75%. Table 11.2 shows the expected coverage ratios for different states and purities.
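The coverage ratio arithmetic above can be written as a small Python function. This is a sketch of the calculation in the example, not the tool's code; the function and parameter names are hypothetical, and the normal cells are assumed to be diploid.

```python
def expected_coverage_ratio(purity, tumor_copy_number, normal_copy_number=2):
    """Expected coverage of a tumor/normal mixture relative to a normal sample.

    purity is the fraction of cells originating from the tumor (0 to 1).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed_copies = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return mixed_copies / normal_copy_number


# The example from the text: 50% purity, one copy left in the tumor cells.
ratio = expected_coverage_ratio(purity=0.5, tumor_copy_number=1)
print(ratio)  # 0.75
```

At 100% purity the same deletion gives a ratio of 0.5, and at 0% purity the ratio is 1 regardless of the tumor state, which is why low-purity samples are hard to tell apart from samples with few events.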
The state together with tumor purity also determines the expected allele frequencies of heterozygous variants. As an example, consider a sample with 60% purity where the cancer cells contain a deletion in a region with two alleles, A and B. If we take 100 cells:
- 60 cells (tumor) will contain one copy of allele A
- 40 cells (normal) will contain one copy of allele A and one copy of B
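Counting alleles across these 100 cells gives 100 copies of A and 40 copies of B, so the expected allele frequencies are 100/140 ≈ 71% for A and 40/140 ≈ 29% for B. The same calculation as a Python sketch (the function and parameter names are hypothetical, and normal cells are assumed to carry one copy of each allele):

```python
def expected_allele_frequencies(purity, tumor_a, tumor_b, normal_a=1, normal_b=1):
    """Expected frequencies of alleles A and B in a tumor/normal mixture.

    tumor_a and tumor_b are the copy numbers of each allele in tumor cells;
    purity is the fraction of cells originating from the tumor (0 to 1).
    """
    copies_a = purity * tumor_a + (1.0 - purity) * normal_a
    copies_b = purity * tumor_b + (1.0 - purity) * normal_b
    total = copies_a + copies_b
    if total == 0:
        raise ValueError("no copies of either allele in the mixture")
    return copies_a / total, copies_b / total


# The example from the text: 60% purity, allele B deleted in tumor cells.
freq_a, freq_b = expected_allele_frequencies(purity=0.6, tumor_a=1, tumor_b=0)
print(round(freq_a, 3), round(freq_b, 3))  # 0.714 0.286
```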
The tool estimates the purity using a hidden Markov model (HMM), which is then used to predict the most probable state for each target.
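To illustrate how an HMM assigns states to targets, the following Python sketch runs a generic Viterbi decoder over a series of coverage ratios. It is a textbook decoder under simplifying assumptions, not the tool's model: only two states are used, purity is fixed at 50% rather than estimated, allele frequencies are ignored, the Gaussian noise level is made up, and `switch_penalty` is a hypothetical stand-in for a parameter that makes state switches less probable.

```python
# Expected coverage ratios per state, assuming 50% purity.
EXPECTED_RATIO = {"normal": 1.0, "deletion": 0.75}
NOISE_SD = 0.1  # hypothetical measurement noise


def log_emission(state, ratio):
    """Gaussian log-likelihood (up to a constant) of a coverage ratio."""
    return -((ratio - EXPECTED_RATIO[state]) ** 2) / (2 * NOISE_SD ** 2)


def viterbi(observations, states, switch_penalty):
    """Return the most probable state path for the observations.

    switch_penalty is subtracted from the path score each time the state
    changes between consecutive targets.
    """
    if not observations:
        return []
    scores = {s: log_emission(s, observations[0]) for s in states}
    back_pointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Score of arriving in s from each possible previous state.
            arrivals = {
                p: scores[p] - (switch_penalty if p != s else 0.0)
                for p in states
            }
            best_prev = max(arrivals, key=arrivals.get)
            new_scores[s] = arrivals[best_prev] + log_emission(s, obs)
            pointers[s] = best_prev
        scores = new_scores
        back_pointers.append(pointers)
    # Trace back from the best final state.
    state = max(scores, key=scores.get)
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


ratios = [1.02, 0.98, 0.80, 1.01, 0.76, 0.74, 0.77, 0.99, 1.01]
states = ["normal", "deletion"]
print(viterbi(ratios, states, switch_penalty=0.0))
print(viterbi(ratios, states, switch_penalty=3.0))
```

Without a penalty, the isolated low ratio at the third target is called as a deletion. With the penalty, that single target is absorbed into the surrounding normal state, while the run of three low ratios is still called as a deletion.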
Limitations
Detect Regional Ploidy is designed for ploidy estimation on autosomal chromosomes. The underlying model does not take into account that the normal state of sex chromosomes in male samples is haploid, and hence may misinterpret detected allele frequencies and coverage ratios. If the tool is used to estimate ploidy for sex chromosomes, the results should be carefully assessed.