Calculate LOH and HRD (beta)

The Calculate LOH and HRD (beta) tool is designed to detect loss-of-heterozygosity (LOH) and homologous recombination deficiency (HRD) from targeted research resequencing experiments.

The tool takes a target-level CNV events annotation track (from a CNV tool), somatic variants, either germline variants or known segregating variants, and, optionally, centromeres.

To run the Calculate LOH and HRD (beta) tool, go to:

        Tools | Resequencing Analysis | Variant Detection | Calculate LOH and HRD (beta)

Select the CNV target-level annotation track generated by a CNV tool and click Next.

You are now presented with choices regarding LOH detection.

Click Next to set the parameters related to HRD.

When finished with the settings, click Next to start the algorithm.

LOH detection

The algorithm implemented in the Calculate LOH and HRD (beta) tool is inspired by the following paper:

Based on coverage ratios and the allele ratios of putative heterozygous germline variants, the tool detects targets and regions affected by loss-of-heterozygosity events. The tool can handle both matched tumor-normal data and unpaired tumor data. In both cases, variants that are assumed to be heterozygous in normal tissue have to be identified.

Tumor-normal pairs: For matched tumor normal data, a track with somatic variants and a track with germline variants will be used. The variants used to detect LOH are simply the somatic variants overlapping heterozygous germline variants.

Tumor only: For unpaired tumor data, a somatic variant track and a database of known segregating variants are used (typically dbSNP common). The variants used in LOH calculation are the somatic variants overlapping the variants in the database.

The model operates with a number of ploidy states, which are characterized by their numbers of paternal and maternal alleles (Table 25.5). The state together with the tumor purity (the percentage of cells in the sample originating from the tumor) determines the expected coverage ratio and the expected allele frequencies of the heterozygous variants. As an example, if a normal diploid sample would yield 200 reads, then a sample with purity 50% and copy-number 1 (deletion) would yield 150 reads (50% * 200 + 50% * 100). That means the coverage ratio is 150 / 200 = 75%. Table 25.6 shows the expected coverage ratios for different states and purities.
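The coverage arithmetic above can be sketched in a few lines of Python. This is only an illustration of the mixture calculation described in the text, not the tool's implementation; the function name and the dictionary of copy numbers (taken from Table 25.5) are introduced here for the example.

```python
# Copy number of each ploidy state, as listed in Table 25.5.
COPY_NUMBER = {
    "Bi-allelic deletion": 0,
    "Deletion": 1,
    "Diploid": 2,
    "Uniparental disomy": 2,
    "Duplication": 3,
    "WGD": 4,
}


def expected_coverage_ratio(purity: float, state: str) -> float:
    """Expected tumor-sample coverage relative to a normal diploid sample.

    A fraction `purity` of the cells carries the tumor copy number;
    the remaining cells are assumed to be normal diploid cells.
    """
    tumor_copies = COPY_NUMBER[state]
    mixed_copies = purity * tumor_copies + (1.0 - purity) * 2
    return mixed_copies / 2


if __name__ == "__main__":
    # The example from the text: purity 50%, deletion.
    print(f"{expected_coverage_ratio(0.5, 'Deletion'):.1%}")
```

Evaluating the function over purities 10% to 100% for each state gives the values in Table 25.6.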

The state together with tumor purity also determines the expected allele frequencies of heterozygous variants. As an example, consider a sample with 60% purity where the cancer cells contain a deletion in a region with two alleles, A and B. If we take 100 cells:

    The 40 normal cells each carry one copy of A and one copy of B (40 copies of A and 40 copies of B).
    The 60 tumor cells each carry one copy of A and have lost B (60 copies of A).

In total there will be 100 copies of allele A and 40 copies of B, so the frequency of A will be 100 / (100 + 40) = 71.4%.
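The same counting argument can be written as a small Python function. This is a sketch of the arithmetic in the example, assuming that normal cells carry exactly one copy of each allele; the function name and parameters are chosen for this illustration and are not part of the tool.

```python
from typing import Optional


def expected_major_allele_frequency(
    purity: float, minor: int, major: int
) -> Optional[float]:
    """Expected frequency of the more abundant allele at a site that is
    heterozygous in normal tissue.

    `minor` and `major` are the tumor allele counts of the state,
    for example 0 and 1 for a deletion (allele ratio 0:1).
    Returns None when no copies of the region remain in the sample.
    """
    # Normal cells contribute one copy of each allele (assumption).
    major_copies = purity * major + (1.0 - purity) * 1
    total_copies = purity * (minor + major) + (1.0 - purity) * 2
    if total_copies == 0:
        return None
    return major_copies / total_copies


if __name__ == "__main__":
    # The example from the text: purity 60%, deletion (0:1).
    print(f"{expected_major_allele_frequency(0.6, 0, 1):.1%}")
```

With the allele ratios from Table 25.5, this reproduces the values in Table 25.7, including the undefined entry for a bi-allelic deletion at 100% purity.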

The tool estimates the purity using a hidden Markov model (HMM), which is then used to predict the most probable state for each target.


Table 25.5: The ploidy states used by the model, with their allele ratios, copy numbers, and loss-of-heterozygosity status.
State                 Allele-ratio   Copy-number   Loss-of-heterozygosity
Bi-allelic deletion   0:0            0             -
Deletion              0:1            1             deletion LOH
Diploid               1:1            2             -
Uniparental disomy    0:2            2             copy-neutral LOH
Duplication           1:2            3             -
WGD                   2:2            4             -



Table 25.6: The expected coverage ratios given tumor purity and the ploidy state.
Purity   Bi-allelic deletion   Deletion   Diploid   Uniparental disomy   Duplication   WGD
10.0%    90.0%                 95.0%      100.0%    100.0%               105.0%        110.0%
20.0%    80.0%                 90.0%      100.0%    100.0%               110.0%        120.0%
30.0%    70.0%                 85.0%      100.0%    100.0%               115.0%        130.0%
40.0%    60.0%                 80.0%      100.0%    100.0%               120.0%        140.0%
50.0%    50.0%                 75.0%      100.0%    100.0%               125.0%        150.0%
60.0%    40.0%                 70.0%      100.0%    100.0%               130.0%        160.0%
70.0%    30.0%                 65.0%      100.0%    100.0%               135.0%        170.0%
80.0%    20.0%                 60.0%      100.0%    100.0%               140.0%        180.0%
90.0%    10.0%                 55.0%      100.0%    100.0%               145.0%        190.0%
100.0%   0.0%                  50.0%      100.0%    100.0%               150.0%        200.0%



Table 25.7: The expected frequencies of variants that are heterozygous in the normal tissue given tumor purity and the ploidy state.
Purity   Bi-allelic deletion   Deletion   Diploid   Uniparental disomy   Duplication   WGD
10.0%    50.0%                 52.6%      50.0%     55.0%                52.4%         50.0%
20.0%    50.0%                 55.6%      50.0%     60.0%                54.5%         50.0%
30.0%    50.0%                 58.8%      50.0%     65.0%                56.5%         50.0%
40.0%    50.0%                 62.5%      50.0%     70.0%                58.3%         50.0%
50.0%    50.0%                 66.7%      50.0%     75.0%                60.0%         50.0%
60.0%    50.0%                 71.4%      50.0%     80.0%                61.5%         50.0%
70.0%    50.0%                 76.9%      50.0%     85.0%                63.0%         50.0%
80.0%    50.0%                 83.3%      50.0%     90.0%                64.3%         50.0%
90.0%    50.0%                 90.9%      50.0%     95.0%                65.5%         50.0%
100.0%   -                     100.0%     50.0%     100.0%               66.7%         50.0%


HRD calculation

The HRD score is a count of chromosomal rearrangements that can be increased in tumors with HRD. It is calculated as the weighted sum of three scores: the number of telomeric allelic imbalances (TAI), the number of large-scale transitions (LST), and the number of long regions of loss of heterozygosity (LOH). The calculations are based on identified regions of copy number variation (CNV) as well as variant frequencies in a sample, both of which are identified beforehand.

The LOH score counts long regions with a minor allele count of zero. LOH regions spanning whole chromosomes are excluded.
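The counting rule can be illustrated with a short Python sketch. It is not the tool's implementation: the `Segment` type, the `chrom_lengths` mapping, and the `min_length` parameter are hypothetical names introduced here. The text does not state how long a region must be to count as "long", so the threshold is left as a required argument, and a region is treated as spanning a whole chromosome when it covers the full chromosome length, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Segment:
    """A region with constant allele counts (hypothetical representation)."""
    chrom: str
    start: int
    end: int
    minor: int  # minor allele count
    major: int  # major allele count


def loh_score(
    segments: Iterable[Segment],
    chrom_lengths: Dict[str, int],
    min_length: int,
) -> int:
    """Count long regions with a minor allele count of zero,
    excluding regions that span a whole chromosome."""
    count = 0
    for seg in segments:
        if seg.minor != 0:
            continue  # both alleles present: not LOH
        if seg.end - seg.start < min_length:
            continue  # too short to count (threshold is an assumption)
        if seg.start <= 0 and seg.end >= chrom_lengths[seg.chrom]:
            continue  # whole-chromosome LOH is excluded
        count += 1
    return count
```

The sketch assumes that adjacent LOH segments have already been merged into regions before counting.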

The TAI score is defined as the number of regions that:

Calculation of the three scores is inspired by:

The LST score is the number of LST events. The LST score counts large rearrangements for each arm of a chromosome. The regions are merged and short regions are removed iteratively: for each chromosome arm, as long as there are segments shorter than 3 Mb, the first segment shorter than 3 Mb is removed, and adjacent segments across the whole chromosome arm with identical allele counts are merged.
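The iterative removal and merging can be sketched as follows. This is a minimal Python illustration of the procedure as described above, not the tool's code. The `ArmSegment` type and function names are hypothetical, a merged segment is assumed to span from the start of the first segment to the end of the last, and an initial merge is applied before any removal, which is also an assumption.

```python
from typing import List, NamedTuple


class ArmSegment(NamedTuple):
    """A segment on one chromosome arm (hypothetical representation)."""
    start: int
    end: int
    minor: int
    major: int


def merge_adjacent(segments: List[ArmSegment]) -> List[ArmSegment]:
    """Merge neighboring segments that have identical allele counts."""
    merged: List[ArmSegment] = []
    for seg in segments:
        if merged and (merged[-1].minor, merged[-1].major) == (seg.minor, seg.major):
            prev = merged[-1]
            merged[-1] = ArmSegment(prev.start, seg.end, prev.minor, prev.major)
        else:
            merged.append(seg)
    return merged


def smooth_arm(
    segments: List[ArmSegment], min_length: int = 3_000_000
) -> List[ArmSegment]:
    """Remove the first short segment and merge, until no short segment is left."""
    current = merge_adjacent(sorted(segments, key=lambda s: s.start))
    while True:
        short = next(
            (i for i, s in enumerate(current) if s.end - s.start < min_length),
            None,
        )
        if short is None:
            return current
        # Drop the first short segment, then merge across the whole arm.
        current = merge_adjacent(current[:short] + current[short + 1:])
```

Each pass removes one segment, so the loop always terminates. The sketch stops at the smoothed segments and does not count LST events, since the counting criteria are not spelled out in this section.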

Calculate LOH and HRD algorithm report

Loss-of-heterozygosity

This section provides information related to LOH calculation. The first table shows the estimated purity and normalization factor along with confidence intervals. Low purity or a wide confidence interval for purity is an indication that the LOH predictions are uncertain. In the next table the number of targets predicted to be in each ploidy state is shown.

The next two subsections provide information useful for diagnosing potential problems with LOH detection. First, the expected coverage log-ratios for each ploidy state are shown along with the average coverage log-ratios for targets predicted to have this state. The expected coverage log-ratios are computed as in Table 25.6 based on the estimated purity. Below the table is a plot with coverage log-ratios plotted against the base coverage. The points are colored by their predicted state, and horizontal lines indicate the expected coverage log-ratio for each state (Figure 25.26).

Figure 25.26: Log-coverage ratios for each target with horizontal lines indicating the expected log-coverage ratio.

Second, the expected allele frequencies for each ploidy state are shown along with the average allele frequency for variants predicted to have this state. Again, the expected allele frequencies are computed as in Table 25.7 based on the estimated purity. Below the table is a plot with allele frequencies plotted against their coverage. The points are colored by their predicted state, and horizontal lines indicate the expected allele frequency for each state (Figure 25.27).

Figure 25.27: Allele frequencies for each putative heterozygous variant with horizontal lines indicating the expected allele frequencies.

HRD score

This section of the report is only included when HRD calculation is enabled in the Calculate LOH and HRD (beta) tool. The section provides a table listing the HRD score, as well as the individual LOH, LST, and TAI scores. The table also lists the events that were counted to give the individual LOH, LST, and TAI scores.

LOH regions included in the LOH score are listed in the row LOH regions. For each region, the chromosome and the start and end of the LOH region are included. As an example, the entry "2: 151M 169M" should be read as an LOH event on chromosome 2 occurring from position 151M to 169M.

Each transition included in the LST score is listed in the row LST. As an example, in the entry "S1: 1-2 0M 13M -> 1-1 13M 248M", the parts before and after the arrow describe the chromosomal states on each side of the transition. The entry should be read as: start of chromosome 1, minor allele count 1, major allele count 2, positions 0M to 13M, changes to minor allele count 1, major allele count 1, positions 13M to 248M.
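For readers who want to process these entries programmatically, the following Python sketch parses the example entry above. The pattern is inferred from that single example only, so it may not cover every entry the report can contain. The names `ChromState` and `parse_lst_entry` are introduced here for illustration.

```python
import re
from typing import NamedTuple, Tuple


class ChromState(NamedTuple):
    """One side of a transition: allele counts and positions in megabases."""
    minor: int
    major: int
    start_mb: int
    end_mb: int


# "<minor>-<major> <start>M <end>M", as in "1-2 0M 13M" (format assumed).
_STATE = r"(\d+)-(\d+)\s+(\d+)M\s+(\d+)M"
_LST_ENTRY = re.compile(rf"^\s*(\S+):\s*{_STATE}\s*->\s*{_STATE}\s*$")


def parse_lst_entry(entry: str) -> Tuple[str, ChromState, ChromState]:
    """Split an LST entry into its label and the states on each side."""
    match = _LST_ENTRY.match(entry)
    if match is None:
        raise ValueError(f"Unrecognized LST entry: {entry!r}")
    label, *numbers = match.groups()
    values = [int(n) for n in numbers]
    return label, ChromState(*values[:4]), ChromState(*values[4:])


if __name__ == "__main__":
    label, before, after = parse_lst_entry("S1: 1-2 0M 13M -> 1-1 13M 248M")
    print(label, before, after)
```

The label (here "S1") is returned unparsed, since only one label form appears in the example.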

For TAI, results are listed for each chromosome in the row TAI. As an example, "S1 TAI 2 1-2" should be read as: start of chromosome 1, TAI event, the most prevalent copy number state for the whole chromosome is 2, and for the TAI event the minor allele count is 1 and the major allele count is 2. Correspondingly, "E1 CENT 125M 248M" should be read as: end of chromosome 1, the region extends from the end of the chromosome to the centromere (positions 125M to 248M) and is not counted as TAI. Finally, "E10 NO 2 1-1" should be read as: end of chromosome 10, no TAI event, the most prevalent copy number state for the whole chromosome is 2, and for the region closest to the end of the chromosome the minor allele count is 1 and the major allele count is 1. Hence, a TAI event is only counted when TAI is part of the annotation for a given chromosome arm.