How the protein structures are found

For each variant in the input, the Link Variants to 3D Protein Structure tool performs the following steps to prepare the output for the "Link to 3D Protein Structure" column in the output variant track:

  1. Evaluate whether the variant lies inside a CDS region. If it does not, the following is returned for the variant: (outside CDS regions).
  2. If the variant is in a CDS region, translate the reference sequence of the impacted gene into an amino acid sequence and evaluate whether the variant can be expected to have an effect on protein structure that can be visualized. Overlapping genes with different reading frames (common in prokaryotic genomes) may cover a given variant, in which case multiple protein sequences will be considered.

    For variants that cannot be visualized, the gene name and one of the reasons given below will be listed in the output table:

    • (nonsense) - the variant would result in a stop codon being introduced in the protein sequence.
    • (synonymous) - the variant would not change the amino acid.
    • (frame shift) - the variant would introduce a frame shift.
  3. BLAST the translated amino acid sequence (the query sequence) against the protein structure sequence database (see Download 3D Protein Structure Database) to identify structural candidates. Note that if multiple splice variants exist, the protein structure search is based on the longest splice variant. BLAST hits with an E-value > 0.0001 are rejected, and a maximum of 2500 BLAST hits are retrieved. If no hits are obtained, the gene name and the message (no PDB hits) are listed.
  4. For each BLAST hit, check if the variant is covered by the structure. For a variant resulting in one amino acid being replaced by another, the affected amino acid position should be present on the structure. For a variant resulting in amino acid insertions or deletions, the amino acids on both sides of the insertion/deletion must be present on the structure.
  5. For the BLAST hits covering the variant, rank the structures considering both structure quality and homology (see Ranking structures).
  6. Add the gene name and the description of the amino acid change to the "Link Variant to 3D Protein Structure" column in the output variant track. A link on the description gives access to a 3D view of the variant effect using the best ranked protein structure (see Create 3D visualization of variant). Note that the amino acid numbering is based on the longest CDS annotation found.
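
The decision logic in steps 2 to 4 can be illustrated with a short Python sketch. This is not the tool's implementation, only a minimal illustration under stated assumptions: the names `classify_variant`, `filter_hits`, `covers_substitution`, `covers_indel` and `BlastHit` are hypothetical, the CDS is assumed to be an uppercase string on the coding strand with 0-based positions, the standard genetic code is assumed, and keeping the hits with the lowest E-values when more than 2500 are found is a guess at how the limit is applied. Only the E-value cutoff, the hit limit, the three rejection labels and the coverage rules are taken from the text above.

```python
from dataclasses import dataclass

# Thresholds quoted in step 3 of the text.
E_VALUE_CUTOFF = 1e-4
MAX_HITS = 2500

# Standard genetic code, codons enumerated in T, C, A, G order (assumption).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate every complete codon of a CDS; stop codons become '*'."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def coding_length(protein):
    """Number of amino acids before the first stop codon."""
    stop = protein.find("*")
    return len(protein) if stop == -1 else stop


def classify_variant(cds, pos, ref, alt):
    """Return a rejection label from step 2, or None if the variant
    changes the protein in a way that could be visualized."""
    if cds[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the CDS")
    if (len(alt) - len(ref)) % 3 != 0:
        return "frame shift"
    mutated = cds[:pos] + alt + cds[pos + len(ref):]
    ref_protein, alt_protein = translate(cds), translate(mutated)
    if alt_protein == ref_protein:
        return "synonymous"
    # An in-frame change should alter the protein length by exactly this
    # many residues; anything shorter means a new stop codon appeared.
    expected = coding_length(ref_protein) + (len(alt) - len(ref)) // 3
    if coding_length(alt_protein) < expected:
        return "nonsense"
    return None


@dataclass(frozen=True)
class BlastHit:
    structure_id: str   # hypothetical identifier of the structure
    e_value: float
    covered: frozenset  # 1-based query positions present on the structure


def filter_hits(hits):
    """Reject hits above the E-value cutoff and keep at most MAX_HITS.
    Sorting by E-value before truncating is an assumption."""
    kept = [h for h in hits if h.e_value <= E_VALUE_CUTOFF]
    return sorted(kept, key=lambda h: h.e_value)[:MAX_HITS]


def covers_substitution(hit, position):
    """Step 4: the replaced amino acid must be present on the structure."""
    return position in hit.covered


def covers_indel(hit, left_flank, right_flank):
    """Step 4: residues on both sides of the indel must be present."""
    return left_flank in hit.covered and right_flank in hit.covered
```

For example, with the toy CDS `ATGAAATGGTAA` (Met-Lys-Trp-stop), changing the last base of the Lys codon from A to G is reported as synonymous, changing TGG to TGA is reported as nonsense, and deleting a single base is reported as a frame shift. A structure whose residues align only to query positions 1 to 100 covers a substitution at position 50 but not an insertion between positions 100 and 101.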