Import molecules
The supported file formats for importing molecular structures are:
- Mol2 (http://www.tripos.com/)
- Structure-data file (SDF) [Dalby et al., 1992]
- Protein Data Bank (PDB) (http://www.wwpdb.org/documentation/format33/v3.3.html)
Upon import, the data are converted to a CLC Molecule Project or a CLC Molecule Table.
Molecule Projects are used to work with a limited number of molecules in a 3D view, and to set up binding sites on proteins for use in docking and in visualizing molecule interactions. Molecule Tables are used to work with small molecules in a table view and can contain an unlimited number of molecules.
All importers assign basic chemical properties to the imported structures. This includes determining connectivity and bond order, assigning atom hybridization, and creating explicit hydrogens. For standard residues in proteins and nucleic acids, these properties are assigned based on a set of templates. Additionally, for proteins, secondary structure information is read from PDB files if present; otherwise, secondary structure is assigned using a built-in algorithm.
For small molecules, the following approach is used:
- Connectivity (covalent bonding) is based on any explicit bond information in the input file, but will also be automatically created for atoms sufficiently close to each other. The PDB importer will recognize and import covalently bound molecules as one single molecule.
- Assignment of atom hybridization is based on the geometry of the atom neighborhood (including any explicit hydrogens present). If Sybyl atom types are present in the input file, the importer will use the hybridization from these as a starting point.
- Bond order information may be present in the input file, but for file formats such as PDB, where bond order is not represented, the importers will assign bond orders based on atom hybridization, atom distances, and electronegativity. Note that even for file formats with explicit bond order information, the importers may change bond orders to better represent aromatic and delocalized systems.
- If no hydrogens are found on a molecule after import, explicit hydrogens will be created. Since some PDB files only contain hydrogen atoms for polar atoms, input molecules from PDB files are always checked for missing hydrogens.
- Partial atom charges are read from the input file if present. Otherwise, charges are assigned according to a set of templates recognizing common chemical motifs. Note that charges are used only for visualization purposes; the force field used for molecular docking does not consider atom charges.
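As an illustration of the connectivity step described above, the following Python sketch keeps any explicit bonds from the input file and adds bonds between atoms that are sufficiently close to each other. It is not the importer's actual implementation: the function name perceive_bonds, the covalent radii table, and the 0.4 Å tolerance are assumptions chosen for the example, and the cutoffs used by the importers are not documented here.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms (illustrative values only).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}

# Assumed slack added to the sum of radii; not the importers' actual cutoff.
TOLERANCE = 0.4


def perceive_bonds(atoms, explicit_bonds=()):
    """Return a set of bonded atom index pairs (i, j) with i < j.

    atoms: list of (element, (x, y, z)) tuples, coordinates in angstroms.
    explicit_bonds: index pairs read from the input file; these are always kept.
    """
    # Start from the bonds stated explicitly in the file.
    bonds = {tuple(sorted(pair)) for pair in explicit_bonds}

    # Add a bond for every atom pair closer than the sum of radii plus slack.
    for i, j in combinations(range(len(atoms)), 2):
        (elem_i, pos_i), (elem_j, pos_j) = atoms[i], atoms[j]
        cutoff = COVALENT_RADII[elem_i] + COVALENT_RADII[elem_j] + TOLERANCE
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.add((i, j))
    return bonds


# A water molecule without any bond records, as in a typical PDB HETATM block.
water = [
    ("O", (0.00, 0.00, 0.00)),
    ("H", (0.96, 0.00, 0.00)),
    ("H", (-0.24, 0.93, 0.00)),
]
water_bonds = perceive_bonds(water)
```

For the water example, both O-H pairs fall within the cutoff while the two hydrogens do not, so the molecule ends up with two bonds even though the input contained no bond records. The pairwise loop is quadratic in the number of atoms, which is acceptable for small molecules; a spatial grid would be the usual choice for larger structures.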
You can import molecule structures in seven different ways:
- Using the standard importer (see Using the standard importer)
- Using the Import Molecules with 3D Coordinates importer (see Using Import Molecules with 3D Coordinates)
- Using the Add molecules to Molecule Project importer (see Add Molecules to Molecule Project)
- From the Protein Data Bank (PDB) (see From the Protein Data Bank)
- Using BLAST search against the PDB database (see BLAST search against the PDB database)
- Using the Import Molecules from SMILES or 2D importer (see Using the Import Molecules from SMILES or 2D option)
- Copy-paste of SMILES strings (see Copy-paste of SMILES strings)
Subsections
- Using the standard importer
- Using Import Molecules with 3D Coordinates
- Add Molecules to Molecule Project
- From the Protein Data Bank
- BLAST search against the PDB database
- Import Molecules from SMILES or 2D
- Copy-paste of SMILES strings
- Generation of 3D structure on import
- Import issues