Table of Contents
SMILES, SMARTS, and SMIRKS are trademarks of Daylight Chemical Information Systems.
SYBYL is a registered trademark of Tripos, Inc..
CML is trademarks of P. Murray-Rust and H. S. Rzepa (http://www.xml-cml.org).
Python is a trademark of the Python Software Foundation.
MDL is a registered trademark in the United States of MDL Information Systems, Inc..
Other products and software packages referenced in this document are registered trademarks of their respective manufacturers.
Table of Contents
The basic building blocks of a molecular modeling system are atoms, bonds, and molecules. In an application NTSChem would build molecules by reading and parsing a molecule file. After this you can analyze and/or transform the molecular information and then write the result back into a file.
To start the construction of a methane molecule is shown. First you create an atom.
c = StaticAtom()
One of the most important property of a atom is the element property. You can assign the element for this atom like this:
c.setElement(getElements().fetchElement("C"))
The global function getElements gets the periodic table of the elements.
To build a complete methane molecule you have to create four more atoms and assign the element property for hydrogen.
h1 = StaticAtom()
h1.setElement(getElements().fetchElement("H"))
h2 = StaticAtom()
h2.setElement(getElements().fetchElement("H"))
h3 = StaticAtom()
h3.setElement(getElements().fetchElement("H"))
h3 = StaticAtom()
h3.setElement(getElements().fetchElement("H"))
To build a methane molecule we have to create a molecule and insert the carbon atom and hydrogen atoms.
mol = Molecule()
mol.insert(c)
mol.insert(h1)
mol.insert(h2)
mol.insert(h3)
mol.insert(h4)
At this point the methane molecule contain four atoms, but there are no bonds between the atoms. So we have to create the bonds:
b1 = Bond(o, h1)
b2 = Bond(o, h2)
b3 = Bond(o, h3)
b4 = Bond(o, h4)
Now the methane molecule is build and we can verify same properties.
print mol.countAtoms()
print c.countBonds()
print h1.countBonds()
print h2.countBonds()
print h3.countBonds()
print h4.countBonds()
At this point the molecule does not contain information of the geometry. The default value of the position properties the the null vector. So we have to set positions of the hydrogen atoms.
h1.setPosition(Vector3( 0.0000, 0.0000, -1.0836))
h2.setPosition(Vector3( 0.0000, 1.0216, 0.3612))
h3.setPosition(Vector3( 0.8848, -0.5108, 0.3612))
h4.setPosition(Vector3(-0.8848, -0.5108, 0.3612))
To print the distance between the carbon and the hydrogen atom we build first the distance vector and then evaluate the norm the resulting vector.
vec = c.getPosition() - h1.getPosition()
print vec.norm()
The complete program looks like:
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
import os
c = StaticAtom()
c.setElement(getElements().fetchElement("C"))
h1 = StaticAtom()
h1.setElement(getElements().fetchElement("H"))
h2 = StaticAtom()
h2.setElement(getElements().fetchElement("H"))
h3 = StaticAtom()
h3.setElement(getElements().fetchElement("H"))
h4 = StaticAtom()
h4.setElement(getElements().fetchElement("H"))
h1.setPosition(Vector3( 0.0000, 0.0000, -1.0836))
h2.setPosition(Vector3( 0.0000, 1.0216, 0.3612))
h3.setPosition(Vector3( 0.8848, -0.5108, 0.3612))
h4.setPosition(Vector3(-0.8848, -0.5108, 0.3612))
mol = Molecule()
mol.insert(c)
mol.insert(h1)
mol.insert(h2)
mol.insert(h3)
mol.insert(h4)
b1 = Bond(c, h1)
b2 = Bond(c, h2)
b3 = Bond(c, h3)
b4 = Bond(c, h4)
print mol.countAtoms()
print c.countBonds()
print h1.countBonds()
print h2.countBonds()
print h3.countBonds()
print h4.countBonds()
vec = c.getPosition() - h1.getPosition()
print vec.norm()
Output of firststeps.py
5
4
1
1
1
1
1.08360004425
Table of Contents
Components are atoms, bonds or molecules. Containers for components are composites. NTSChem provides two iterators for lopping over components in a composite. The one iterates over all components in a composite, the other one iterates only over components in the first level. That is components which are direct children.
For each successive iteration through the loop, these iterators return the next atom and/or bond respectively, or more generally the next component or composite type specified in the factory method of the iterator.
The following example shows how to traverse atoms and print out their atomic symbols und then shows how to traverse bonds and print their bond order.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
import os
mol = fromSmiles("c1ccccc1")
print "Atoms :"
for atom in CompositeIteratorFactory.create(Atom, mol):
print atom.getIndex(), atom.getElement().getSymbol()
print "Bonds :"
for bond in CompositeIteratorFactory.create(Bond, mol):
print bond.getOrder()
Output of looping.py:
Atoms :
0 C
1 C
2 C
3 C
4 C
5 C
Bonds :
ORDER_AROMATIC
ORDER_AROMATIC
ORDER_AROMATIC
ORDER_AROMATIC
ORDER_AROMATIC
ORDER_AROMATIC
The method getBonds of the class Atom creates a bond iterator, which can be used to loop over bonds. The example shows how this iterator loops over bonds of an atom to determine the degree of an atom.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
import os
def getExplicitDegree(atom):
result = 0
for bond in BondIterator(atom):
result += 1
return result
mol = fromSmiles("c1ccccc1")
for atom in CompositeIteratorFactory.create(Atom, mol):
print "Atom ", atom.getIndex(), getExplicitDegree(atom)
Output of loopingbonds.py:
Atom 0 2
Atom 1 2
Atom 2 2
Atom 3 2
Atom 4 2
Atom 5 2
If you only interested in the number of bonds the code can be rewritten in a very compact from with using itertools.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
from itertools import *
import os
print "Number of Bonds = ", sum(
imap(bool, CompositeIteratorFactory.create(Atom,
fromSmiles("c1ccccc1"))))
Output of loopingbonds2.py:
Number of Bonds = 6
It is frequent not the connections around an atom, that you wish to iterate, but over neighboring atoms. This can be done by the atom iterator.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
import os
def showNeighbors(atom):
for nbor in AtomIterator(atom):
print nbor.getIndex(),
print
mol = fromSmiles("C1CCCCC1")
AssignIndexProcessor()(mol)
for atom in CompositeIteratorFactory.create(Atom, mol):
print atom.getIndex(), showNeighbors(atom)
Output of loopingneighbors.py:
0 1 5
None
1 0 2
None
2 1 3
None
3 2 4
None
4 3 5
None
5 0 4
None
Iterators can not only be used to iterate over atoms or bonds. This is be shown in the following example. The program prints out the amino-acid sequence of Hemoglobin.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
import textwrap
import os
mol = PDB(os.environ['EXAMPLESPATH'] + os.sep + 'Hemoglobin.pdb')
for p in ComponentIteratorFactory.create(Protein, mol) :
print "Chain : "
s = ""
for c in ComponentIteratorFactory.create(Chain, p) :
for r in ComponentIteratorFactory.create(Residue, c) :
s = s + r.getName() + " "
print textwrap.fill(s, width=40)
print
Output of loopingresidues.py:
Chain :
VAL LEU SER GLU GLY GLU TRP GLN LEU VAL
LEU HIS VAL TRP ALA LYS VAL GLU ALA ASP
VAL ALA GLY HIS GLY GLN ASP ILE LEU ILE
ARG LEU PHE LYS SER HIS PRO GLU THR LEU
GLU LYS PHE ASP ARG PHE LYS HIS LEU LYS
THR GLU ALA GLU MET LYS ALA SER GLU ASP
LEU LYS LYS HIS GLY VAL THR VAL LEU THR
ALA LEU GLY ALA ILE LEU LYS LYS LYS GLY
HIS HIS GLU ALA GLU LEU LYS PRO LEU ALA
GLN SER HIS ALA THR LYS HIS LYS ILE PRO
ILE LYS TYR LEU GLU PHE ILE SER GLU ALA
ILE ILE HIS VAL LEU HIS SER ARG HIS PRO
GLY ASP PHE GLY ALA ASP ALA GLN GLY ALA
MET ASN LYS ALA LEU GLU LEU PHE ARG LYS
ASP ILE ALA ALA LYS TYR LYS GLU LEU GLY
TYR GLN GLY
Chain :
HEM
Table of Contents
A special kind of functions is a predicate. Predicates are functions that return a boolean value. In NTSChem, these functions can be passed into another function to specify a algorithm for example selection or searching. NTSChem provides many predefined predicates for classes that are derived from the class Object.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
from itertools import *
import os
class HasAtomicNumber:
def __init__(self, n):
self.n = n
def __call__(self, a):
return a.getElement().getAtomicNumber() == self.n
def quantify(seq, pred=bool):
return sum(imap(pred, seq))
mol = fromSmiles("c1c(C)c(O)c(N)cc1")
print "Number of Oxygens = ", quantify(
CompositeIteratorFactory.create(Atom, mol), IsOxygen())
print "Number of Carbons = ", quantify(
CompositeIteratorFactory.create(Atom, mol), HasAtomicNumber(6))
Output of count.py:
Number of Oxygens = 1
Number of Carbons = 7
The script count loops over all atoms and execute the predicate IsOxygen and HasAtomicNumber for each atom. The first predicate is predefined in NTSChem, the second is defined in the script. As you also see in the script the standard framework itertools with the quantify function for predicates is used.
In a straightforward programming stile one can do this by iterating over all atoms or bonds and by selecting the atoms or bonds of intrest in the inner loop. A more clear and better method is to use predicates and itertools.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
from itertools import *
import os
def quantify(seq, pred = bool):
return sum(imap(pred, seq))
mol = fromSmiles("c1c(C)c(O)c(N)cc1")
print "Carbon atoms:",
for atom in ComponentIteratorFactory.create(Atom, mol):
if IsCarbon()(atom):
print atom.getIndex(),
print
print "Number Carbon atoms:",
print quantify(ComponentIteratorFactory.create(Atom, mol), IsCarbon())
Output of loopingpredicate.py:
Carbon atoms: 0 1 2 3 5 7 8
Number Carbon atoms: 7
The following example shows a user defined predicate which shows all atoms whose atomic mass is greater than 15. This example use the iterator filter ifilter from itertools.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
from itertools import *
import os
class AtomWeight:
def __init__(self, weight):
self.weight = weight
def __call__(self, a):
return a.getElement().getAtomicWeight() > self.weight
mol = fromSmiles("c1c(C)c(O)c(N)cc1")
AssignIndexProcessor()(mol)
for atom in ifilter(AtomWeight(12), CompositeIteratorFactory.create(Atom, mol)):
print "Atom with index ", atom.getIndex(), " and weight ", \
atom.getElement().getAtomicWeight(), "has a weight > 12."
Output of atomweight.py:
Atom with index 4 and weight 16.0 has a weight > 12.
Atom with index 6 and weight 14.0 has a weight > 12.
This example is very simple, but as you can imagine you can build more complex user defined predicates.
For more complex inquiries it is necessarily to connect several predicates with logical operations. For this the logical predicates And, Or and Not are predefined by NTSChem.
The following example demonstrates use of the And and Not composition predicates with two of the predefined atom predicates.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
from itertools import *
import os
def quantify(seq, pred = bool):
return sum(imap(pred, seq))
mol = fromSmiles("c1c(C)c(O)c(N)cc1")
print "Number of Aromatic Oxygens = ",
print quantify(CompositeIteratorFactory.create(Atom, mol),
And(IsOxygen(), IsAromatic()))
print "Number of Non-Carbons = ",
print quantify(CompositeIteratorFactory.create(Atom, mol),
Not(IsCarbon()))
Output of composedpredicate.py:
Number of Aromatic Oxygens = 0
Number of Non-Carbons = 2
Table of Contents
Very atom can be assigned to an element. This can be done by the setElement member function. A unique instance of the table of elements can be fetched the the global defined function getElements. The following script prints the "Periodic Table of Elements" in the simple table form. At run time only one instance of the Periodic Table of Elements exists.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
import os
print '%10s%20s%10s%10s' % ('Symbol', 'AtomicNumber', 'Row', 'Column')
for element in getElements().getElements() :
print '%10s%20d%10d%10d' % (element.getSymbol(),
element.getAtomicNumber(),
element.getRow(),
element.getColumn())
Output of elements.py
Symbol AtomicNumber Row Column
H 1 1 1
He 2 18 1
Li 3 1 2
Be 4 2 2
B 5 13 2
C 6 14 2
N 7 15 2
O 8 16 2
F 9 17 2
Ne 10 18 2
Na 11 1 3
Mg 12 2 3
Al 13 13 3
Si 14 14 3
P 15 15 3
S 16 16 3
Cl 17 17 3
Ar 18 18 3
K 19 1 4
Ca 20 2 4
Sc 21 3 4
Ti 22 4 4
V 23 5 4
Cr 24 6 4
Mn 25 7 4
Fe 26 8 4
Co 27 9 4
Ni 28 10 4
Cu 29 11 4
Zn 30 12 4
Ga 31 13 4
Ge 32 14 4
As 33 15 4
Se 34 16 4
Br 35 17 4
Kr 36 18 4
Rb 37 1 5
Sr 38 2 5
Y 39 3 5
Zr 40 4 5
Nb 41 5 5
Mo 42 6 5
Tc 43 7 5
Ru 44 8 5
Rh 45 9 5
Pd 46 10 5
Ag 47 11 5
Cd 48 12 5
In 49 13 5
Sn 50 14 5
Sb 51 15 5
Te 52 16 5
I 53 17 5
Xe 54 18 5
Cs 55 1 6
Ba 56 2 6
La 57 3 6
Hf 72 4 6
Ta 73 5 6
W 74 6 6
Re 75 7 6
Os 76 8 6
Ir 77 9 6
Pt 78 10 6
Au 79 11 6
Hg 80 12 6
Tl 81 13 6
Pb 82 14 6
Bi 83 15 6
Po 84 16 6
At 85 17 6
Rn 86 18 6
Table of Contents
The following program lists all supported molecular file types. All theses file types can be readed and written from NTSChem.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
import os
print "Reader :"
for s in getStructureReaderRegistry().getFileTypes() :
print "%5s : %30s" % (s, getFileTypeToDescription().getDescription(s)),
", File extension : ",
for e in getExtToFileType().getExtensions(s) :
print "%5s" % (e),
print
print
print "Writer :"
for s in getWriterRegistry().getFileTypes() :
print "%5s : %30s" % (s, getFileTypeToDescription().getDescription(s)),
", File extension : ",
for e in getExtToFileType().getExtensions(s) :
print "%5s" % (e),
print
Output of supportedfiletypes.py:
Reader :
PDB : Brookhaven PDB ent pdb
MOL : MDL MOL mdl mol sd
MOL2 : Sybyl MOL2 mol2
XYZ : XYZ Coordinate xyz xyzr
ALC : Alchemy alc
CML : Chemical Markup Language cml xml
SMI : Smiles smi
SDF : MDL SDF sdf
Writer :
PDB : Brookhaven PDB ent pdb
MOL : MDL MOL mdl mol sd
MOL2 : Sybyl MOL2 mol2
XYZ : XYZ Coordinate xyz xyzr
ALC : Alchemy alc
CML : Chemical Markup Language cml xml
SMI : Smiles smi
Table of Contents
To process rings in NTSChem one can test on ring/cycle membership. The NTSChem processor AssignInRingProcessor is used, in order to determine, which atoms and connections of members of one or more cycles and are acyclic. As soon as AssignInRingProcessor was called, atoms or bonds can be examined for presence in a ring, by either the atom IsInRing or the BondisInRing methods.
To provide an example of how one might use the above functions, the code below returns the number of ring bonds attached to an atom. This is useful for identifying ring fusion atoms (ring bond count = 3) and potential spiro-atoms (ring bond count >= 4).
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
from itertools import *
import os
def quantify(seq, pred = bool):
"Count how many times the predicate is True in the sequence"
return sum(imap(pred, seq))
mol = fromSmiles("c1ccccc1c2ccccc2")
AssignIndexProcessor()(mol)
for atom in CompositeIteratorFactory.create(Atom, mol):
print "atom ", atom.getIndex(), atom.getInRing(),
quantify(AtomIterator(atom), IsInRing())
print
rings = BuildRingsProcessor()(mol)
print "Number of rings : ", \
quantify(ComponentIteratorFactory.create(Composite, rings))
for ring in ComponentIteratorFactory.create(Composite, rings) :
print "Ring size = ", \
quantify(ComponentIteratorFactory.create(Atom, ring))
for atom in ComponentIteratorFactory.create(Atom, ring) :
print "Atom ", atom.getIndex()
Output of countringbonds.py
atom 0 True
atom 1 True
atom 2 True
atom 3 True
atom 4 True
atom 5 True
atom 6 True
atom 7 True
atom 8 True
atom 9 True
atom 10 True
atom 11 True
Number of rings : 2
Ring size = 6
Atom 0
Atom 1
Atom 2
Atom 3
Atom 4
Atom 5
Ring size = 6
Atom 6
Atom 7
Atom 8
Atom 9
Atom 10
Atom 11
Table of Contents
In order to separate molecules that can be split up into separate connected components, e.g. a ligand from a protein, NTSChem makes processor BuildCompositesProcessor available.
The following provides a short example of how to use this processor.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
from itertools import *
import os
import textwrap
def quantify(seq, pred = bool):
return sum(imap(pred, seq))
mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'mols.xyz')
AssignIndexProcessor()(mol)
BuildBondsProcessor()(mol)
parts = BuildCompositesProcessor()(mol)
print "The molecule has %d components\n" % \
quantify(ComponentIteratorFactory.create(Composite, parts))
for part in ComponentIteratorFactory.create(Composite, parts) :
print "Molecule with Atoms :"
s = ""
for atom in ComponentIteratorFactory.create(Atom, part) :
s = s + str(atom.getIndex()) + " "
print textwrap.fill(s, width=40)
Output of connectedcomponents.py
The molecule has 4 components
Molecule with Atoms :
42 41 40 39 38 37 36 35 34 33 32 31 30
29 28 27 26 25 24 23 22 21 20 19 18 17
16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
42 41 40 39 38 37 36 35 34 33 32 31 30
29 28 27 26 25 24 23 22 21 20 19 18 17
16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Molecule with Atoms :
82 81 80 79 78 77 76 75 74 73 72 71 70
69 68 67 66 65 64 63 62 61 60 59 58 57
56 55 54 53 52 51 50 49 48 47 46 45 44
43 82 81 80 79 78 77 76 75 74 73 72 71
70 69 68 67 66 65 64 63 62 61 60 59 58
57 56 55 54 53 52 51 50 49 48 47 46 45
44 43
Molecule with Atoms :
107 106 105 104 103 102 101 100 99 98 97
96 95 94 93 92 91 90 89 88 87 86 85 84
107 106 105 104 103 102 101 100 99 98 97
96 95 94 93 92 91 90 89 88 87 86 85 84
Molecule with Atoms :
147 146 145 144 143 142 141 140 139 138
137 136 135 134 133 132 131 130 129 128
127 126 125 124 123 122 121 120 119 118
117 116 115 114 113 112 111 110 109 108
147 146 145 144 143 142 141 140 139 138
137 136 135 134 133 132 131 130 129 128
127 126 125 124 123 122 121 120 119 118
117 116 115 114 113 112 111 110 109 108
Table of Contents
Some known file formats are presenting atomic type information and bond orders as attributes for atoms and/or bonds. NTSChem maps external atomic types and bond orders of the different file formats to a type ID. This ensures, which the same atom types finds with the expenditure use.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
import os
mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'caffeine.xyz')
BuildBondsProcessor()(mol)
AssignBondOrderProcessor()(mol)
AssignTypeProcessor()(mol)
for atom in CompositeIteratorFactory.create(Atom, mol) :
print atom.getType('INT')
for bond in CompositeIteratorFactory.create(Bond, mol) :
print bond.getOrder()
Output of readwithtypes.py:
HC
Nam
C2
Nam
C2
C2
C2
Npl
O2
C3
O2
HC
C3
HC
HC
HC
HC
HC
C3
C2
Npl
HC
HC
HC
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_DOUBLE
Atoms with same environment and thus with comparable characteristics are assigned to an atomic type. If one knows the atomic types and bond orders one can determine for example the parameterizing the forces for a MD simulation.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
import os
mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'viagra.xyz')
AssignIndexProcessor()(mol)
BuildBondsProcessor()(mol)
AssignBondOrderProcessor()(mol)
for bond in CompositeIteratorFactory.create(Bond, mol) :
print bond.getFirst().getIndex(), bond.getSecond().getIndex(), \
bond.getOrder()
Output of bonds.py:
0 1 ORDER_SINGLE
0 5 ORDER_SINGLE
0 7 ORDER_SINGLE
1 2 ORDER_SINGLE
1 33 ORDER_SINGLE
1 34 ORDER_SINGLE
2 3 ORDER_SINGLE
2 35 ORDER_SINGLE
2 36 ORDER_SINGLE
3 4 ORDER_SINGLE
3 6 ORDER_SINGLE
4 5 ORDER_SINGLE
4 37 ORDER_SINGLE
4 38 ORDER_SINGLE
5 39 ORDER_SINGLE
5 40 ORDER_SINGLE
6 41 ORDER_SINGLE
6 42 ORDER_SINGLE
6 43 ORDER_SINGLE
7 8 ORDER_DOUBLE
7 9 ORDER_DOUBLE
7 10 ORDER_SINGLE
10 11 ORDER_SINGLE
10 15 ORDER_DOUBLE
11 12 ORDER_DOUBLE
11 44 ORDER_SINGLE
12 13 ORDER_SINGLE
12 45 ORDER_SINGLE
13 14 ORDER_DOUBLE
13 16 ORDER_SINGLE
14 15 ORDER_SINGLE
14 19 ORDER_SINGLE
15 46 ORDER_SINGLE
16 17 ORDER_SINGLE
17 18 ORDER_SINGLE
17 47 ORDER_SINGLE
17 48 ORDER_SINGLE
18 49 ORDER_SINGLE
18 50 ORDER_SINGLE
18 51 ORDER_SINGLE
19 20 ORDER_DOUBLE
19 24 ORDER_SINGLE
20 21 ORDER_SINGLE
21 22 ORDER_DOUBLE
21 25 ORDER_SINGLE
22 23 ORDER_SINGLE
22 27 ORDER_SINGLE
23 24 ORDER_SINGLE
23 28 ORDER_DOUBLE
24 52 ORDER_SINGLE
25 26 ORDER_DOUBLE
25 30 ORDER_SINGLE
26 27 ORDER_SINGLE
27 29 ORDER_SINGLE
29 53 ORDER_SINGLE
29 54 ORDER_SINGLE
29 55 ORDER_SINGLE
30 31 ORDER_SINGLE
30 56 ORDER_SINGLE
30 57 ORDER_SINGLE
31 32 ORDER_SINGLE
31 58 ORDER_SINGLE
31 59 ORDER_SINGLE
32 60 ORDER_SINGLE
32 61 ORDER_SINGLE
32 62 ORDER_SINGLE
With the processors AssignBondOrderProcessor and AssignTypeProcessor one can assign type information from file formats, which do not contain this type of information. In the following example also the BuildBondProcessor becomes necessary, since the xyz file format does not contain atomic bonds.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
import os
mol = MOL2(os.environ['EXAMPLESPATH'] + os.sep + 'histidine.mol2')
AssignBondOrderProcessor()(mol)
AssignTypeProcessor()(mol)
for atom in CompositeIteratorFactory.create(Atom, mol) :
print atom.getType('SYB')
for bond in CompositeIteratorFactory.create(Bond, mol) :
print bond.getOrder()
Output of types.py:
N.4
C.3
C.3
C.2
O.2
C.2
N.pl3
C.2
C.2
N.pl3
H
O.2
H
H
H
H
H
H
H
H
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
Table of Contents
The most common method describing a substructure to search for is to use a SMILES string.
The following example creates a molecule from a SMILES string and prints out the match result. The result is true if one molecule contain a benzene ring (c1ccccc1).
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.graph import *
from nanotechsoftware.chemistry.match import *
import os
mol1 = fromSmiles("c1ccccc1")
AssignIndexProcessor()(mol1)
mol2 = fromSmiles("c1ccccc1c2ccccc2")
AssignIndexProcessor()(mol2)
g1 = createGraphWithComparators(createGraphEdit(mol1))
g2 = createGraphWithComparators(createGraphEdit(mol2))
print singleMatch(g1, g2)
Output of matchsingle.py
True
NTSChem makes a rich set of functions available for discovery of sub structures. One common task is to search for all matches between one reference structure and a goal molecule.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.graph import *
from nanotechsoftware.chemistry.match import *
import os
mol1 = fromSmiles("c1c(O)cccc1")
mol2 = fromSmiles("c1c(O)cccc1c2ccccc2")
g1 = createGraphWithComparators(createGraphEdit(mol1))
g2 = createGraphWithComparators(createGraphEdit(mol2))
c = g1.matchAll(g2)
print "Number of matches ", c.size()
for mol in c:
print "Atoms ",
for a in ComponentIteratorFactory.create(Atom, mol):
print a.getIndex(),
print
print
Output of matchall.py
Number of matches 2
Atoms 0 1 2 3 4 5 6
Atoms 3 1 2 0 6 5 4
While each match is returned by the generator method, a second loop can determine either the atoms or connections of the matches.
You will possibly be surprised why there are two matches. The reason is that there is a symmetry operation that transforms the ring to itself. If you are only interested in the match (that is usaly the case) you can use the matchTotalUnique function.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.graph import *
from nanotechsoftware.chemistry.match import *
import os
def createGraphFromSMILES(s):
return createGraphWithComparators(createGraphEdit(fromSmiles(s)))
c = createGraphFromSMILES("c1c(O)cccc1").matchTotalUnique(createGraphFromSMILES("c1c(O)cccc1c2ccccc2"))
print "Number of matches ", c.size()
for mol in c:
print "Atoms ",
for a in ComponentIteratorFactory.create(Atom, mol):
print a.getIndex(),
print
Output of matchtotalunique.py
Number of matches 1
Atoms 0 1 2 3 4 5 6
One common task is to formulate a query with same properties against a molecular structure. This can be done with so called SMARTS strings. In the following example all substructures with an oxygen atom bonded to an other atom is searched.
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.match import *
from nanotechsoftware.chemistry.graph import *
import os
mol = fromSmiles("c1cc(O)ccc1")
AssignIndexProcessor()(mol)
g = createGraphWithComparators(createGraphEdit(mol))
c = SMARTS("*O").matchTotalUnique(g)
for m in c:
for a in ComponentIteratorFactory.create(Atom, m):
print "Atom", a.getIndex()
print
Output of querywithsmarts.py
Atom 2
Atom 3
SMARTS are a convenient way a create a query for searching substructures. But it is also possible to create a query directly.
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.predicate import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.match import *
from nanotechsoftware.chemistry.graph import *
import os
class IsTrue:
def __call__(self, a):
return 1
pe = PredicateGraphEdit()
i = pe.insertNode(AtomPythonWrapperPredicate(IsTrue()))
j = pe.insertNode(AtomPythonWrapperPredicate(IsOxygen()))
pe.insertEdge(i, j, BondPythonWrapperPredicate(IsTrue()))
pg = createGraphPredicateWithComparators(pe)
mol = fromSmiles("c1cc(O)ccc1")
AssignIndexProcessor()(mol)
g = createGraphWithComparators(createGraphEdit(mol))
for m in pg.matchTotalUnique(g):
for a in ComponentIteratorFactory.create(Atom, m):
print "Atom", a.getIndex()
print
Output of query.py
Atom 2
Atom 3
The following example reads a molecule, which is manufactured by the sentence set of the SMILE, and looks then for the MCS in each goal molecule.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.matchprocessor import *
import os
molref = fromSmiles("CCO")
AssignIndexProcessor()(molref)
processor = MCSProcessor(molref)
for mol in createSDFStreamIterator(os.environ['EXAMPLESPATH'] +
os.sep + 'pharmaceuticals.sdf'):
print 'Description :', mol.getDescription()
AssignIndexProcessor()(mol)
pair = processor(mol)
print "Atoms ",
for a in ComponentIteratorFactory.create(Atom, pair.first()):
print a.getIndex(),
print
print "Atoms ",
for a in ComponentIteratorFactory.create(Atom, pair.second()):
print a.getIndex(),
print
Output of MCS.py
Description : MORPHINE
Atoms 0 1 2 Atoms 6
13
14
Description : Vitamin-A
Atoms 0 1 2 Atoms 13
14
15
Description : Beta-estradiol
Atoms 0 1 2 Atoms 12
16
17
Description : Naproxen
Atoms 0 1 2 Atoms 9
22
26
Table of Contents
In the following example coordinates of each atom in a molecule become print.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
import os
mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'caffeine.xyz')
AssignIndexProcessor()(mol)
for atom in CompositeIteratorFactory.create(Atom, mol) :
print atom.getIndex(),
print atom.getPosition()
Output of listpositions.py:
0 (-3.38041 -1.12724 0.573304)
1 (0.96683 -1.07374 -0.819823)
2 (0.0567293 0.852719 0.392316)
3 (-1.37517 -1.02122 -0.0570552)
4 (-1.2615 0.259071 0.523413)
5 (-0.306834 -1.68363 -0.716934)
6 (1.13942 0.187412 -0.27009)
7 (0.560263 2.08391 0.825159)
8 (-0.49268 -2.81806 -1.20947)
9 (-2.63281 -1.7304 -0.0060953)
10 (-2.23013 0.798862 1.08997)
11 (2.5497 2.9735 0.622959)
12 (2.05274 -1.73609 -1.49313)
13 (-2.48077 -2.72695 0.488263)
14 (-3.0089 -1.90253 -1.0498)
15 (2.91761 -1.84815 -0.785787)
16 (2.37879 -1.12119 -2.37437)
17 (1.71899 -2.74899 -1.84392)
18 (-0.151845 3.097 1.53483)
19 (1.89341 2.11812 0.419319)
20 (2.28613 0.996844 -0.24403)
21 (-0.168703 4.04366 0.930109)
22 (0.353532 3.29791 2.51777)
23 (-1.20745 2.75376 1.7203)
In this example the coordinates are printed a more nicely way.
#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
import os
mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'caffeine.xyz')
AssignIndexProcessor()(mol)
for atom in CompositeIteratorFactory.create(Atom, mol) :
print atom.getIndex(),
print 'x = %6.3f y = %6.3f z = %6.3f' % (
atom.getPosition().x,
atom.getPosition().y,
atom.getPosition().z)
Output of listformatedpositions.py:
0 x = -3.380 y = -1.127 z = 0.573
1 x = 0.967 y = -1.074 z = -0.820
2 x = 0.057 y = 0.853 z = 0.392
3 x = -1.375 y = -1.021 z = -0.057
4 x = -1.262 y = 0.259 z = 0.523
5 x = -0.307 y = -1.684 z = -0.717
6 x = 1.139 y = 0.187 z = -0.270
7 x = 0.560 y = 2.084 z = 0.825
8 x = -0.493 y = -2.818 z = -1.209
9 x = -2.633 y = -1.730 z = -0.006
10 x = -2.230 y = 0.799 z = 1.090
11 x = 2.550 y = 2.973 z = 0.623
12 x = 2.053 y = -1.736 z = -1.493
13 x = -2.481 y = -2.727 z = 0.488
14 x = -3.009 y = -1.903 z = -1.050
15 x = 2.918 y = -1.848 z = -0.786
16 x = 2.379 y = -1.121 z = -2.374
17 x = 1.719 y = -2.749 z = -1.844
18 x = -0.152 y = 3.097 z = 1.535
19 x = 1.893 y = 2.118 z = 0.419
20 x = 2.286 y = 0.997 z = -0.244
21 x = -0.169 y = 4.044 z = 0.930
22 x = 0.354 y = 3.298 z = 2.518
23 x = -1.207 y = 2.754 z = 1.720
Table of Contents
The RMSD is used to measure the distance between atoms for two structures and defined by:
The following programm reads a diamond like structure an print out all Atoms, that RMSD is less than 0.1.
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.predicate import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
import os
def buildRefmol():
a = 3.57
c = StaticAtom()
c.setElement(getElements().fetchElement("C"))
c1 = StaticAtom()
c1.setElement(getElements().fetchElement("C"))
c2 = StaticAtom()
c2.setElement(getElements().fetchElement("C"))
c3 = StaticAtom()
c3.setElement(getElements().fetchElement("C"))
c4 = StaticAtom()
c4.setElement(getElements().fetchElement("C"))
c1.setPosition(Vector3(a/4,a/4,a/4))
c2.setPosition(Vector3(0,a/2,a/2))
c3.setPosition(Vector3(a/2,0,a/2))
c4.setPosition(Vector3(a/2,a/2,0))
mol = Molecule()
mol.insert(c)
mol.insert(c1)
mol.insert(c2)
mol.insert(c3)
mol.insert(c4)
b1 = Bond(c, c1)
b2 = Bond(c, c2)
b3 = Bond(c, c3)
b4 = Bond(c, c4)
return mol
+ 'connectedcomponents.py.out', 'w')
mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'diamond.xyz')
AssignIndexProcessor()(mol)
BuildBondsProcessor()(mol)
pg = createPredicateGraphFromSmartsString("[CX4]~[C](~[C])(~[C])(~[C])")
g = createGraph(createGraphEdit(mol))
p = RMSDeviationProcessor(buildRefmol())
for m in matchTotalUniqueFromPredicateGraph(pg, g):
if p(m) < 0.1:
print "Atoms ",
for a in ComponentIteratorFactory.create(Atom, m):
print a.getIndex(),
print
Output of deviation.py
Atoms 42 39 4 35 46