The variant table output
For each variant in the input, the Link Variants to 3D Protein Structure tool performs the following steps to populate the "Link to 3D protein structure" and "Effect on drug binding site" columns of the output variant track:
- Check whether the variant lies inside a CDS region. If it does not, the following is returned for the variant: (outside CDS regions).
- If the variant is in a CDS region, translate the reference sequence of the impacted gene into an amino acid sequence and evaluate whether the variant can be expected to have an effect on protein structure that can be visualized. Overlapping genes (common in prokaryotic genomes) with different reading frames may cover a given variant, in which case multiple protein sequences will be considered.
For variants that cannot be visualized, the gene name and one of the reasons given below will be listed in the output table:
- (nonsense) - the variant would result in a stop codon being introduced in the protein sequence.
- (synonymous) - the variant would not change the amino acid.
- (frame shift) - the variant would introduce a frame shift.
- BLAST the translated amino acid sequence (the query sequence) against the protein structure sequence database (see Download 3D Protein Structure Database) to identify structural candidates. Note that if multiple splice variants exist, the protein structure search is based on the longest splice variant. BLAST hits with an E-value > 0.0001 are rejected, and a maximum of 2500 BLAST hits are retrieved. If no hits are obtained, the gene name and the message (no PDB hits) are listed.
- For each BLAST hit, check if the variant is covered by the structure. For a variant resulting in one amino acid being replaced by another, the affected amino acid position should be present on the structure. For a variant resulting in amino acid insertions or deletions, the amino acids on both sides of the insertion/deletion must be present on the structure.
- For the BLAST hits covering the variant, rank the structures considering both structure quality and homology (see Ranking structures).
- Add the gene name and the description of the amino acid change to the "Link variant to 3D protein structure" column in the output variant track. A link on the description gives access to a 3D view of the variant effect using the best-ranked protein structure from the ranking step above (see Create 3D visualization of variant). Note that the amino acid numbering is based on the longest CDS annotation found.
- From the ranked BLAST hits covering the variant, extract all hits where the affected amino acid(s) are in contact with a drug or ligand in the PDB file (heavy atoms within 5 Å). If no structures with variant-drug interaction are found, the following is returned to the "Effect on drug binding site" column: No drug hits, together with the gene name and the description of the amino acid change. If structures with variant-drug interaction are found, the number of different drugs or ligands encountered is written to the "Effect on drug binding site" column as X drug hits. From a link on "X drug hits", a list describing the drug hits in more detail can be opened. The list also has a link for each drug, to create a 3D model and visualization of the variant-drug interaction (see Visualize drug interaction).
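The decision rules in the steps above can be illustrated with a small Python sketch. This is not the tool's implementation: the names `classify_snv`, `classify_indel`, `BlastHit`, `filter_hits`, `required_positions`, `covers` and `in_contact` are hypothetical, the standard genetic code is assumed, and the aligned query range of a hit is used as a simplified stand-in for "present on the structure". Only the thresholds (E-value 0.0001, 2500 hits, 5 Å) are taken from the description.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Sequence, Tuple

E_VALUE_CUTOFF = 1e-4   # from the text: hits with E-value > 0.0001 are rejected
MAX_HITS = 2500         # from the text: at most 2500 hits are retrieved
CONTACT_CUTOFF = 5.0    # from the text: heavy atoms within 5 Å

# Standard genetic code (assumption), built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

Point = Tuple[float, float, float]


def classify_snv(ref_codon: str, alt_codon: str) -> Optional[str]:
    """Return the reason a substitution cannot be visualized, or None if it can."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return None


def classify_indel(ref_len: int, alt_len: int) -> Optional[str]:
    """Simplification: only checks whether the length change breaks the frame."""
    return "frame shift" if (alt_len - ref_len) % 3 != 0 else None


@dataclass
class BlastHit:
    structure_id: str
    e_value: float
    query_start: int  # 1-based, inclusive, on the query protein
    query_end: int


def filter_hits(hits: Sequence[BlastHit]) -> List[BlastHit]:
    """Reject weak hits, then cap the count (order of the two is an assumption)."""
    return [h for h in hits if h.e_value <= E_VALUE_CUTOFF][:MAX_HITS]


def required_positions(kind: str, first: int, last: int) -> List[int]:
    """Query positions that must be covered for the variant to be shown."""
    if kind == "substitution":
        return [first]                  # the replaced residue itself
    if kind == "deletion":
        return [first - 1, last + 1]    # residues flanking the deleted range
    if kind == "insertion":
        return [first, last]            # the two residues around the insertion
    raise ValueError(f"unknown variant kind: {kind}")


def covers(hit: BlastHit, positions: Sequence[int]) -> bool:
    """Proxy for coverage: every required position lies in the aligned range."""
    return all(hit.query_start <= p <= hit.query_end for p in positions)


def in_contact(residue_atoms: Sequence[Point], ligand_atoms: Sequence[Point]) -> bool:
    """True if any residue heavy atom is within the cutoff of any ligand heavy atom."""
    return any(dist(a, b) <= CONTACT_CUTOFF
               for a in residue_atoms for b in ligand_atoms)
```

In this sketch, a variant is passed on to the structure search only when `classify_snv` or `classify_indel` returns `None`. The hits kept by `filter_hits` and accepted by `covers` would then be ranked, and `in_contact` would be applied to the affected residues of each such structure to count drug hits.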