The Structural Variants and InDels output
The Structural Variants and InDels report
The report gives an overview of the numbers and types of structural variants found in the sample. It contains:
- A table listing the total number of reads in the read mapping and the number of reads that were discarded based on length.
- A table with a row for each reference sequence, and information on the number of breakpoint signatures and structural variants found.
- A table giving the total number of left and right unaligned end breakpoint signatures found, and the total number of reads supporting them. Note that paired-end reads are counted once.
- A distribution of the logarithm of the sequence complexity of the unaligned ends of the left and right breakpoint signatures (see section 29.10.5 for how the complexity is calculated).
- A distribution of the length of the unaligned ends of the left and right breakpoint signatures.
- A table giving the total number of the different types of structural variants found.
- Plots depicting the distribution of the lengths of structural variants identified.
The Breakpoint track (BP)
The breakpoint track contains a row for each called breakpoint with the following information:
- Chromosome The chromosome on which the breakpoint is located.
- Region The location on the chromosome of the breakpoint.
- Name The type of the breakpoint ('left breakpoint' or 'right breakpoint').
- p-value The p-value (in the Binomial distribution) of the unaligned end breakpoint.
- Unaligned The consensus sequence of the unaligned ends at the breakpoint.
- Unaligned length The length of the consensus sequence of the unaligned ends at the breakpoint.
- Mapped to self If the unaligned end sequence at the breakpoint was found to map back to the reference in the vicinity of the breakpoint itself, a 'Deletion' or 'Insertion' based on 'self-mapping' evidence is called. This column will contain 'Deletion' or 'Insertion' if that is the case, or be empty if the unaligned end did not map back to the reference in the vicinity of the breakpoint itself.
- Perfect mapped The number of 'perfect mapped' reads (paired-end reads count as one). This number is intended as a proxy for the number of reads that fit with the reference sequence. When calculating this number we consider all reads that extend across the breakpoint. We ignore reads that are non-specifically mapped, are in a broken pair, or have more than the maximum number of mismatches. A read is perfectly mapped if (1) it has no insertions or deletions (mismatches are allowed) and (2) it has no unaligned end.
- Not perfect mapped The number of 'not perfect mapped' reads (paired-end reads count as one). This number is intended as a proxy for the number of reads that fit with the predicted indel. When calculating this number we consider all reads that extend across the breakpoint or that have an unaligned end starting at the breakpoint. We ignore reads that are non-specifically mapped, are in a broken pair, or have more than the maximum number of mismatches. A read is not perfect mapped if (1) it has an insertion or deletion or (2) it has an unaligned end.
- Fraction non-perfectly mapped The 'Not perfect mapped' count divided by the sum of the 'Not perfect mapped' and 'Perfect mapped' counts.
- Sequence complexity The sequence complexity of the unaligned end of the breakpoint (see section 29.10.5 for how the sequence complexity is calculated).
- Reads The number of reads supporting the breakpoint (paired-end reads count as one).
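The counting rules for 'Perfect mapped', 'Not perfect mapped' and 'Fraction non-perfectly mapped' can be illustrated with a minimal sketch. It is not the tool's implementation: the `Read` record, its field names and the `max_mismatches` parameter are hypothetical, and each record is assumed to stand for a single read or an already collapsed paired-end read.

```python
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical record; one instance per read (or per collapsed read pair).
    spans_breakpoint: bool             # read extends across the breakpoint
    unaligned_end_at_breakpoint: bool  # unaligned end starts at the breakpoint
    has_indel: bool                    # read contains an insertion or deletion
    has_unaligned_end: bool            # read has any unaligned end
    specific: bool = True              # False if non-specifically mapped
    broken_pair: bool = False          # True if the read is in a broken pair
    mismatches: int = 0


def count_reads(reads, max_mismatches):
    """Return (perfect_mapped, not_perfect_mapped) for one breakpoint."""
    perfect = 0
    not_perfect = 0
    for r in reads:
        # Reads that are ignored in both counts.
        if not r.specific or r.broken_pair or r.mismatches > max_mismatches:
            continue
        is_perfect = not r.has_indel and not r.has_unaligned_end
        if is_perfect and r.spans_breakpoint:
            perfect += 1
        elif not is_perfect and (r.spans_breakpoint or r.unaligned_end_at_breakpoint):
            not_perfect += 1
    return perfect, not_perfect


def fraction_not_perfect(perfect, not_perfect):
    """'Not perfect mapped' / ('Not perfect mapped' + 'Perfect mapped')."""
    total = perfect + not_perfect
    return not_perfect / total if total else None


reads = [
    Read(True, False, False, False),                  # perfect
    Read(True, False, True, False),                   # not perfect: indel
    Read(False, True, False, True),                   # not perfect: unaligned end
    Read(True, False, False, False, specific=False),  # ignored
    Read(True, False, False, False, mismatches=5),    # ignored when max is 2
]
perfect, not_perfect = count_reads(reads, max_mismatches=2)
print(perfect, not_perfect)  # 1 2
```

Returning `None` when no reads qualify is a choice made for this sketch; how the tool reports that case is not stated here.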
Note that typically, breakpoints will be found for which it is not possible to infer a structural variant. There may be a number of reasons for that: (1) the unaligned ends from which the breakpoint signature was derived might not be caused by an underlying structural variant, but merely be due to read mapping issues or noise, or (2) the breakpoint(s) to which the detected breakpoint should have been matched was/were not detected, and therefore no matching breakpoint(s) were found. Breakpoints may go undetected either because of lack of coverage in the breakpoint region or because they are located within regions with exclusively non-uniquely mapped reads (only unaligned ends of uniquely mapping reads are used).
The InDel variant track (InDel)
The InDel variant track contains a row for each of the called insertions or deletions. These are the small to medium-sized insertions, as well as deletions up to 405 bp in length, for which the algorithm was able to identify the allele sequence, i.e., the exact inserted or deleted sequence.
For insertions, the full allele sequence is found from the unaligned ends of mapped reads. For some insertions, the length and allele sequence cannot be determined and as these do not fulfill the requirements of a 'variant', they do not qualify for representation in the InDel Variant track but instead appear in the Structural Variant track (see below).
The information provided for each of the indels in the InDel Variant track is the 'Chromosome', 'Region', 'Type', 'Reference', 'Allele', 'Reference Allele', 'Length' and 'Zygosity' columns that are provided for all variants (see section 29.6.1). Note that the Zygosity field is set to 'Homozygous' if the 'Variant ratio' is 0.80 or above, and 'Heterozygous' otherwise.
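The zygosity rule can be written as a one-line check. This is a minimal sketch of the stated threshold only; the function and constant names are hypothetical.

```python
HOMOZYGOUS_THRESHOLD = 0.80  # threshold stated for the Zygosity field


def zygosity(variant_ratio: float) -> str:
    """Classify an indel from its 'Variant ratio' (a value between 0 and 1)."""
    return "Homozygous" if variant_ratio >= HOMOZYGOUS_THRESHOLD else "Heterozygous"


print(zygosity(0.80))  # Homozygous
print(zygosity(0.79))  # Heterozygous
```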
In addition, the track provides the following information, primarily to assess the degree of evidence supporting each predicted indel:
- Evidence The mapping evidence on which the call of the indel was based. This may be either 'Self mapped', 'Paired breakpoint', 'Cross mapped breakpoint' or 'Tandem duplication', depending on the mapping signature of the unaligned ends of the breakpoint(s) from which the indel was inferred.
- Repeat The algorithm attempts to identify if the variant sequence contains perfect repeats. This is done by searching the region around the structural variant for perfect repeat sequences. The region searched is 3 times the length of the variant around the insertion/deletion point. The maximum repeat length searched for is 10. If a repeat sequence is found, the repeated sequence is given in this column. If not, the column is empty.
- Variant ratio This column contains the sum of the 'Not perfect mapped' reads for the breakpoints used to infer the indel, divided by the sum of the 'Not perfect mapped' and 'Perfect mapped' reads for the breakpoints used to infer the indel (see the description of the breakpoint track above). This fraction is intended to give a hint towards the zygosity of the indel. The closer the value is to 1, the higher the likelihood that the variant is homozygous.
- # Reads The total number of reads supporting the breakpoints from which the indel was constructed (paired-end reads count as one).
- Sequence complexity The sequence complexity of the unaligned end of the breakpoint (see section 29.10.5). Indels with higher complexity are typically more reliable than those with low complexity.
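The 'Variant ratio' is a ratio of sums over the breakpoints used to infer the indel, not an average of the per-breakpoint fractions. The following minimal sketch shows the difference; the function name and the `(perfect_mapped, not_perfect_mapped)` tuple layout are assumptions made for illustration.

```python
def variant_ratio(breakpoints):
    """Ratio of summed read counts across breakpoints.

    breakpoints: iterable of (perfect_mapped, not_perfect_mapped) tuples,
    one tuple per breakpoint used to infer the indel (hypothetical layout).
    """
    counts = list(breakpoints)
    perfect = sum(p for p, _ in counts)
    not_perfect = sum(n for _, n in counts)
    total = perfect + not_perfect
    return not_perfect / total if total else None


# Two breakpoints: 4 perfect / 16 not perfect, and 6 perfect / 14 not perfect.
print(variant_ratio([(4, 16), (6, 14)]))  # 0.75
```

With unevenly covered breakpoints the two approaches diverge: for `[(0, 2), (10, 10)]` the ratio of sums is 12/22, whereas averaging the two per-breakpoint fractions (1.0 and 0.5) would give 0.75.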
The Structural Variant track (SV)
The Structural Variant track contains a row for each of the called structural variants that are not already reported in the InDel track. It contains the following information:
- Chromosome The chromosome on which the structural variant is located.
- Region The location on the chromosome of the structural variant.
- Name The type of the structural variant ('deletion', 'insertion', 'inversion', 'replacement', 'translocation' or 'complex').
- Evidence The breakpoint mapping evidence, i.e., the 'unaligned end' signature on which the call of the structural variant was based. This may be either 'Self mapped', 'Paired breakpoint', 'Cross mapped breakpoints', 'Cross mapped breakpoints (invalid orientation)', 'Close breakpoints', 'Multiple breakpoints' or 'Tandem duplication', depending on which type of signature was found.
- Length The length of the allele sequence of the structural variant. Note that the length of variants for which the allele sequence could not be determined is reported as 0 (e.g., insertions inferred from 'Close breakpoints').
- Reference sequence The sequence of the reference in the region of the structural variant.
- Variant sequence The allele sequence of the structural variant if it is known. If not, the column will be empty.
- Repeat The same as in the InDel track.
- Variant ratio The same as in the InDel track.
- Signatures The number of unaligned breakpoints involved in the signature of the structural variant. In most cases these will be pairs of breakpoints, and the value is 2; however, some structural variants have signatures involving more than two breakpoints. Typically, structural variants of type 'complex' will be inferred from more than 2 breakpoint signatures.
- Left breakpoints The positions of the 'Left breakpoints' involved in the signature of the structural variant.
- Right breakpoints The positions of the 'Right breakpoints' involved in the signature of the structural variant.
- Mapping scores fraction The mapping scores of the unaligned ends for each of the breakpoints. These are the similarity values between the unaligned end and the region of the reference to which it was mapped. The values lie between 0 and 1. The closer the value is to 1, the better the match, suggesting better reliability of the inferred variant.
- Reads The total number of reads supporting the breakpoints from which the structural variant was constructed.
- Sequence complexity The sequence complexity of the unaligned end of the breakpoint (see section 29.10.5).
- Split group Some structural variants extend over a very large region. For these, visualization is challenging, and instead of reporting them in a single row we split them into multiple rows, one for each 'end' of the variant. To allow the user to see which of these 'split features' belong together, we give features that belong to the same structural variant a common 'split group' identifier. If the column is empty, the structural variant is not split, but contained within a single row.
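When working with the Structural Variant track outside the Workbench, rows that share a 'Split group' identifier can be reassembled into one variant. The sketch below assumes the track has been exported to a table and loaded as a list of dictionaries keyed by column name; that layout, the example values and the function name are hypothetical.

```python
from collections import defaultdict


def group_split_features(rows):
    """Separate split and unsplit structural variant rows.

    Returns (groups, unsplit): groups maps each 'Split group' identifier to
    the rows that share it; unsplit holds rows whose 'Split group' is empty.
    """
    groups = defaultdict(list)
    unsplit = []
    for row in rows:
        key = row.get("Split group")
        if key in (None, ""):
            unsplit.append(row)   # variant contained within a single row
        else:
            groups[key].append(row)
    return dict(groups), unsplit


# Made-up example rows; only the columns needed for grouping are shown.
rows = [
    {"Chromosome": "1", "Name": "inversion", "Split group": "7"},
    {"Chromosome": "1", "Name": "inversion", "Split group": "7"},
    {"Chromosome": "2", "Name": "deletion", "Split group": ""},
]
groups, unsplit = group_split_features(rows)
print(len(groups["7"]), len(unsplit))  # 2 1
```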