Generation of 3D structure on import
This section describes how CLC Drug Discovery Workbench uses the Balloon algorithm to generate 3D molecule structures on import. The algorithm is described in detail in [Vainio and Johnson, 2007].
Input: A SMILES string (or coordinates in 2D) describing the topology of a molecule.
Step 1: Generation of template structure with 3D coordinates. Minimum and maximum interatomic distances are set for all atom pairs based on the input. A special type of bound is used to specify the chirality of stereochemical centers, if these are given in the input. The sum of the bound violations is minimized to obtain an initial 3D structure. The structure is then refined by minimizing the conformational energy calculated with the MMFF94 force field [Halgren, 1996].
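The quantity minimized in Step 1 can be illustrated with a minimal sketch. The function name, the data layout, and the use of a plain (unsquared) sum of violations are assumptions made for illustration. They are not taken from the Balloon implementation, and chirality bounds are not modelled here.

```python
import math

def bounds_violation(coords, bounds):
    """Sum of distance-bound violations for a set of 3D coordinates.

    coords: list of (x, y, z) tuples, one per atom.
    bounds: dict mapping an atom pair (i, j) to (lower, upper) distances.
    Both names are hypothetical; the penalty form is an assumption.
    """
    total = 0.0
    for (i, j), (lower, upper) in bounds.items():
        d = math.dist(coords[i], coords[j])
        if d < lower:
            total += lower - d      # atoms too close together
        elif d > upper:
            total += d - upper      # atoms too far apart
    return total

# Three atoms on a line, with illustrative (made-up) bounds.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
bounds = {(0, 1): (1.2, 1.6), (0, 2): (2.0, 2.5), (1, 2): (1.0, 3.0)}
violation = bounds_violation(coords, bounds)
```

An optimizer would adjust the coordinates until this value is as small as possible. A structure that satisfies every bound scores zero.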
Step 2: Generation of a conformer ensemble. Different conformations are generated by rotation about rotatable bonds, changes to the stereochemistry of double bonds and tetrahedral chiral centers (for those not specified on input), and changes to ring conformations. A genetic algorithm is used to generate variations of the structure. A particular molecule conformation (phenotype) is defined by the values at the 'loci' (the genotype). For example, each rotatable bond has a locus specifying the rotation value, and each chiral center not defined on input has a locus specifying whether or not to invert the chirality compared to the template structure.
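The genotype described above can be pictured as a small record with one entry per locus. The sketch below is illustrative only: the class name, the field names, and the choice of degrees for rotation values are assumptions, and loci for double bonds and ring conformations are left out for brevity.

```python
import random
from dataclasses import dataclass

@dataclass
class Genotype:
    # One rotation value (degrees, assumed unit) per rotatable bond.
    torsions: list
    # One flag per chiral center not defined on input:
    # True means "invert relative to the template structure".
    invert: list

def random_genotype(n_rotatable, n_unspecified_centers, rng):
    """Draw a random value for every locus (hypothetical helper)."""
    return Genotype(
        torsions=[rng.uniform(-180.0, 180.0) for _ in range(n_rotatable)],
        invert=[rng.random() < 0.5 for _ in range(n_unspecified_centers)],
    )

# A molecule with three rotatable bonds and two unspecified chiral centers.
example = random_genotype(3, 2, random.Random(0))
```

Applying such a genotype to the template structure would yield one concrete conformation, the phenotype.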
The genetic algorithm runs for 20 generations with a population of five individuals (diverse conformations). The fitness of an individual is evaluated based on both the torsional and van der Waals terms of the MMFF94 potential energy function [Halgren, 1996].
The first generation is constructed from random 'mutations' to the template structure. The four steps below are then repeated for each generation to search through relevant conformers and produce a diverse, low-energy set.
- Parents are selected at random among the five individuals in the population, with a bias towards the fittest individuals and towards promoting geometric diversity.
- The parents' genotypes are combined (via random crossovers) to produce five offspring.
- The offspring are then subjected to mutations, which make small random changes to individual loci.
- The five offspring together with the five individuals from the parent generation are evaluated, and five individuals are selected for the next generation based on their fitness and geometric diversity.
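The generational loop above can be sketched as follows. This is a toy illustration under stated assumptions, not the Balloon implementation: the genotype is reduced to a list of torsion angles, `toy_energy` is a made-up stand-in for the MMFF94 torsional and van der Waals terms, parents are chosen by simple rank weighting, and the geometric-diversity bias is omitted for brevity. Only the population size and the number of generations come from the text.

```python
import random

POP_SIZE = 5        # population size stated in the text
GENERATIONS = 20    # number of generations stated in the text

def toy_energy(genotype):
    # Hypothetical fitness: lowest when every torsion equals 60 degrees.
    return sum((t - 60.0) ** 2 for t in genotype)

def crossover(a, b, rng):
    # Single-point crossover between two parent genotypes.
    if len(a) < 2:
        return list(a)
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genotype, rng, rate=0.3, step=30.0):
    # Perturb each locus with probability `rate` by at most `step` degrees.
    return [t + rng.uniform(-step, step) if rng.random() < rate else t
            for t in genotype]

def run_ga(template, rng):
    # First generation: random mutations of the template structure.
    population = [mutate(template, rng, rate=1.0, step=180.0)
                  for _ in range(POP_SIZE)]
    for _generation in range(GENERATIONS):
        # Rank-based weights bias parent choice towards low energy.
        ranked = sorted(population, key=toy_energy)
        weights = list(range(POP_SIZE, 0, -1))
        offspring = []
        for _child in range(POP_SIZE):
            mother, father = rng.choices(ranked, weights=weights, k=2)
            offspring.append(mutate(crossover(mother, father, rng), rng))
        # Keep the five best of parents plus offspring (fitness only here).
        population = sorted(population + offspring, key=toy_energy)[:POP_SIZE]
    return population

final_population = run_ga([0.0, 0.0, 0.0], random.Random(1))
```

Because survivors are drawn from parents and offspring together, the best energy in the population never gets worse from one generation to the next. A fuller version would also score geometric diversity when picking parents and survivors, as the text describes.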
Step 3: Post-processing. Strain introduced into the structures is relaxed using an MMFF94 force field in which the electrostatic term and the torsional term for rotatable bonds are left out. Conformational duplicates (RMSD < 0.5 Å) are removed, and so are structures whose strain energy remains above a predefined energy window relative to the minimum energy found in the set. For a molecule with no rotatable bonds, the energy window is 5 kcal/mol. The energy window is increased by 0.25 kcal/mol for each rotatable bond present in the molecule.
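The energy-window rule can be written out as a short calculation. The function names below are hypothetical, and the sketch assumes that a structure exactly at the window limit is kept, which the text does not specify. Duplicate removal by RMSD is not shown.

```python
def energy_window(n_rotatable_bonds):
    """Energy window in kcal/mol: 5 plus 0.25 per rotatable bond."""
    return 5.0 + 0.25 * n_rotatable_bonds

def filter_by_energy(energies, n_rotatable_bonds):
    """Keep energies within the window above the lowest energy in the set."""
    e_min = min(energies)
    window = energy_window(n_rotatable_bonds)
    # Assumption: a structure exactly at the limit is retained.
    return [e for e in energies if e - e_min <= window]

# A molecule with four rotatable bonds has a 6 kcal/mol window.
kept = filter_by_energy([10.0, 12.5, 16.0, 16.5], n_rotatable_bonds=4)
```

In this example the conformer at 16.5 kcal/mol lies 6.5 kcal/mol above the minimum and is discarded, while the other three are retained.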
Output: Either the lowest-energy conformer found or the final post-processed ensemble of low-energy conformers.