NTSChem Tutorial

Sat Jul 11 2009


Table of Contents

1. Front
2. First Steps
Building simple Molecules
3. Component and Composite Iterators
Looping over Components in a Composite
Looping over Bonds of an Atom
Looping over Neighbors of an Atom
Looping over Residues of an Protein
4. Predicates
Predicate Functions
Looping over Atoms or Bonds with a Predicate
User defined Predicates
Composition of Predicates in NTSChem
5. Periodic Table of Elements
Element Member Functions
6. Molecular File Formats
Molecular File Formats
7. Ring Processing
Cycle Membership
Number of Ring Bonds to an Atom
8. Connected Components Processing
Identifying Connected Components
9. Atom and Bond Typing
Reading and Writing Files with Atom Types and Bond Orders
Build Bonds Processing
Processes Atom and Bond Types
10. Pattern Matching
Using SMILES
Using SMARTS
Using Predicates for Query Molecules
Maximum Common Substructure Search (MCS Search)
11. Coordinates
Setting and Getting Coordinates
12. Structural Analysis
Root Mean Square Deviation (RMSD)

Chapter 1. Front

SMILES, SMARTS, and SMIRKS are trademarks of Daylight Chemical Information Systems.

SYBYL is a registered trademark of Tripos, Inc..

CML is trademarks of P. Murray-Rust and H. S. Rzepa (http://www.xml-cml.org).

Python is a trademark of the Python Software Foundation.

MDL is a registered trademark in the United States of MDL Information Systems, Inc..

Other products and software packages referenced in this document are registered trademarks of their respective manufacturers.

Chapter 2. First Steps

Table of Contents

Building simple Molecules

Building simple Molecules

The basic building blocks of a molecular modeling system are atoms, bonds, and molecules. In an application NTSChem would build molecules by reading and parsing a molecule file. After this you can analyze and/or transform the molecular information and then write the result back into a file.

To start the construction of a methane molecule is shown. First you create an atom.

	  c = StaticAtom()
    

One of the most important property of a atom is the element property. You can assign the element for this atom like this:

      c.setElement(getElements().fetchElement("C"))
    

The global function getElements gets the periodic table of the elements.

To build a complete methane molecule you have to create four more atoms and assign the element property for hydrogen.

      h1 = StaticAtom()
      h1.setElement(getElements().fetchElement("H"))

      h2 = StaticAtom()
      h2.setElement(getElements().fetchElement("H"))

      h3 = StaticAtom()
      h3.setElement(getElements().fetchElement("H"))

      h3 = StaticAtom()
      h3.setElement(getElements().fetchElement("H"))
    

To build a methane molecule we have to create a molecule and insert the carbon atom and hydrogen atoms.

      mol = Molecule()

      mol.insert(c)
      mol.insert(h1)
      mol.insert(h2)
      mol.insert(h3)
      mol.insert(h4)

    

At this point the methane molecule contain four atoms, but there are no bonds between the atoms. So we have to create the bonds:

      b1 = Bond(o, h1)
      b2 = Bond(o, h2)
      b3 = Bond(o, h3)
      b4 = Bond(o, h4)
    

Now the methane molecule is build and we can verify same properties.

      print mol.countAtoms()
      print c.countBonds()
      print h1.countBonds()
      print h2.countBonds()
      print h3.countBonds()
      print h4.countBonds()
    

At this point the molecule does not contain information of the geometry. The default value of the position properties the the null vector. So we have to set positions of the hydrogen atoms.

      h1.setPosition(Vector3( 0.0000,        0.0000,       -1.0836)) 
      h2.setPosition(Vector3( 0.0000,        1.0216,        0.3612)) 
      h3.setPosition(Vector3( 0.8848,       -0.5108,        0.3612)) 
      h4.setPosition(Vector3(-0.8848,       -0.5108,        0.3612)) 
    

To print the distance between the carbon and the hydrogen atom we build first the distance vector and then evaluate the norm the resulting vector.

      vec = c.getPosition() - h1.getPosition()
      print vec.norm()
    

The complete program looks like:

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *

import os



c = StaticAtom()
c.setElement(getElements().fetchElement("C"))

h1 = StaticAtom()
h1.setElement(getElements().fetchElement("H"))

h2 = StaticAtom()
h2.setElement(getElements().fetchElement("H"))

h3 = StaticAtom()
h3.setElement(getElements().fetchElement("H"))

h4 = StaticAtom()
h4.setElement(getElements().fetchElement("H"))

h1.setPosition(Vector3( 0.0000,        0.0000,       -1.0836))
h2.setPosition(Vector3( 0.0000,        1.0216,        0.3612)) 
h3.setPosition(Vector3( 0.8848,       -0.5108,        0.3612)) 
h4.setPosition(Vector3(-0.8848,       -0.5108,        0.3612)) 

mol = Molecule()

mol.insert(c)
mol.insert(h1)
mol.insert(h2)
mol.insert(h3)
mol.insert(h4)

b1 = Bond(c, h1)
b2 = Bond(c, h2)
b3 = Bond(c, h3)
b4 = Bond(c, h4)

print mol.countAtoms()
print c.countBonds()
print h1.countBonds()
print h2.countBonds()
print h3.countBonds()
print h4.countBonds()

vec = c.getPosition() - h1.getPosition()

print vec.norm()



    

Output of firststeps.py

5
4
1
1
1
1
1.08360004425

    

Chapter 3. Component and Composite Iterators

Looping over Components in a Composite

Components are atoms, bonds or molecules. Containers for components are composites. NTSChem provides two iterators for lopping over components in a composite. The one iterates over all components in a composite, the other one iterates only over components in the first level. That is components which are direct children.

For each successive iteration through the loop, these iterators return the next atom and/or bond respectively, or more generally the next component or composite type specified in the factory method of the iterator.

The following example shows how to traverse atoms and print out their atomic symbols und then shows how to traverse bonds and print their bond order.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *

import os

mol = fromSmiles("c1ccccc1")



print "Atoms :"
for atom in CompositeIteratorFactory.create(Atom, mol):
    print atom.getIndex(), atom.getElement().getSymbol()

print "Bonds :"
for bond in CompositeIteratorFactory.create(Bond, mol):
    print bond.getOrder()



    

Output of looping.py:

Atoms :
0 C
1 C
2 C
3 C
4 C
5 C
Bonds :
ORDER_AROMATIC
ORDER_AROMATIC
ORDER_AROMATIC
ORDER_AROMATIC
ORDER_AROMATIC
ORDER_AROMATIC

    

Looping over Bonds of an Atom

The method getBonds of the class Atom creates a bond iterator, which can be used to loop over bonds. The example shows how this iterator loops over bonds of an atom to determine the degree of an atom.

#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *

import os

def getExplicitDegree(atom):
    result = 0
    for bond in BondIterator(atom):
        result += 1
    return result



mol = fromSmiles("c1ccccc1")

for atom in CompositeIteratorFactory.create(Atom, mol):
	print "Atom ", atom.getIndex(), getExplicitDegree(atom)



    

Output of loopingbonds.py:

Atom  0 2
Atom  1 2
Atom  2 2
Atom  3 2
Atom  4 2
Atom  5 2

    

If you only interested in the number of bonds the code can be rewritten in a very compact from with using itertools.

#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *

from itertools import *

import os



print "Number of Bonds = ", sum(
    imap(bool, CompositeIteratorFactory.create(Atom,
                   fromSmiles("c1ccccc1"))))



    

Output of loopingbonds2.py:

Number of Bonds =  6

    

Looping over Neighbors of an Atom

It is frequent not the connections around an atom, that you wish to iterate, but over neighboring atoms. This can be done by the atom iterator.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *

import os




def showNeighbors(atom):
    for nbor in AtomIterator(atom):
        print nbor.getIndex(),
    print


mol = fromSmiles("C1CCCCC1")

AssignIndexProcessor()(mol)

for atom in CompositeIteratorFactory.create(Atom, mol):
    print atom.getIndex(), showNeighbors(atom)



    

Output of loopingneighbors.py:

0 1 5
None
1 0 2
None
2 1 3
None
3 2 4
None
4 3 5
None
5 0 4
None

    

Looping over Residues of an Protein

Iterators can not only be used to iterate over atoms or bonds. This is be shown in the following example. The program prints out the amino-acid sequence of Hemoglobin.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *

import textwrap
import os



mol = PDB(os.environ['EXAMPLESPATH'] + os.sep + 'Hemoglobin.pdb')

for p in ComponentIteratorFactory.create(Protein, mol) :
    print "Chain : "
    s = ""
    for c in ComponentIteratorFactory.create(Chain, p) :
        for r in ComponentIteratorFactory.create(Residue, c) :
            s = s + r.getName() + " "
        print textwrap.fill(s, width=40)
    print



    

Output of loopingresidues.py:

Chain : 
VAL LEU SER GLU GLY GLU TRP GLN LEU VAL
LEU HIS VAL TRP ALA LYS VAL GLU ALA ASP
VAL ALA GLY HIS GLY GLN ASP ILE LEU ILE
ARG LEU PHE LYS SER HIS PRO GLU THR LEU
GLU LYS PHE ASP ARG PHE LYS HIS LEU LYS
THR GLU ALA GLU MET LYS ALA SER GLU ASP
LEU LYS LYS HIS GLY VAL THR VAL LEU THR
ALA LEU GLY ALA ILE LEU LYS LYS LYS GLY
HIS HIS GLU ALA GLU LEU LYS PRO LEU ALA
GLN SER HIS ALA THR LYS HIS LYS ILE PRO
ILE LYS TYR LEU GLU PHE ILE SER GLU ALA
ILE ILE HIS VAL LEU HIS SER ARG HIS PRO
GLY ASP PHE GLY ALA ASP ALA GLN GLY ALA
MET ASN LYS ALA LEU GLU LEU PHE ARG LYS
ASP ILE ALA ALA LYS TYR LYS GLU LEU GLY
TYR GLN GLY

Chain : 
HEM


    

Chapter 4. Predicates

Predicate Functions

A special kind of functions is a predicate. Predicates are functions that return a boolean value. In NTSChem, these functions can be passed into another function to specify a algorithm for example selection or searching. NTSChem provides many predefined predicates for classes that are derived from the class Object.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *

from itertools import *

import os

class HasAtomicNumber:
      def __init__(self, n):
        self.n = n
      def __call__(self, a):
        return a.getElement().getAtomicNumber() == self.n 

def quantify(seq, pred=bool):
    return sum(imap(pred, seq))



mol = fromSmiles("c1c(C)c(O)c(N)cc1")

print "Number of Oxygens = ", quantify(
      CompositeIteratorFactory.create(Atom, mol), IsOxygen())
print "Number of Carbons = ", quantify(
      CompositeIteratorFactory.create(Atom, mol), HasAtomicNumber(6))



    

Output of count.py:

Number of Oxygens =  1
Number of Carbons =  7

    

The script count loops over all atoms and execute the predicate IsOxygen and HasAtomicNumber for each atom. The first predicate is predefined in NTSChem, the second is defined in the script. As you also see in the script the standard framework itertools with the quantify function for predicates is used.

Looping over Atoms or Bonds with a Predicate

In a straightforward programming stile one can do this by iterating over all atoms or bonds and by selecting the atoms or bonds of intrest in the inner loop. A more clear and better method is to use predicates and itertools.

#!/usr/bin/env python
from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *

from itertools import *

import os

def quantify(seq, pred = bool):
    return sum(imap(pred, seq))



mol = fromSmiles("c1c(C)c(O)c(N)cc1")

print "Carbon atoms:",
for atom in ComponentIteratorFactory.create(Atom, mol):
    if IsCarbon()(atom):
        print atom.getIndex(),
print
print "Number Carbon atoms:",
print quantify(ComponentIteratorFactory.create(Atom, mol), IsCarbon())




    

Output of loopingpredicate.py:

Carbon atoms: 0 1 2 3 5 7 8
Number Carbon atoms: 7

    

User defined Predicates

The following example shows a user defined predicate which shows all atoms whose atomic mass is greater than 15. This example use the iterator filter ifilter from itertools.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *

from itertools import *

import os

class AtomWeight:
      def __init__(self, weight):
        self.weight = weight
      def __call__(self, a):
        return a.getElement().getAtomicWeight() > self.weight 
                



mol = fromSmiles("c1c(C)c(O)c(N)cc1")

AssignIndexProcessor()(mol)

for atom in ifilter(AtomWeight(12), CompositeIteratorFactory.create(Atom, mol)):
    print "Atom with index ", atom.getIndex(), " and weight ", \
          atom.getElement().getAtomicWeight(), "has a weight > 12."
    


    

Output of atomweight.py:

Atom with index  4  and weight  16.0 has a weight > 12.
Atom with index  6  and weight  14.0 has a weight > 12.

    

This example is very simple, but as you can imagine you can build more complex user defined predicates.

Composition of Predicates in NTSChem

For more complex inquiries it is necessarily to connect several predicates with logical operations. For this the logical predicates And, Or and Not are predefined by NTSChem.

The following example demonstrates use of the And and Not composition predicates with two of the predefined atom predicates.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *

from itertools import *

import os

def quantify(seq, pred = bool):
    return sum(imap(pred, seq))



mol = fromSmiles("c1c(C)c(O)c(N)cc1")

print "Number of Aromatic Oxygens = ",
print quantify(CompositeIteratorFactory.create(Atom, mol),
               And(IsOxygen(), IsAromatic()))

print "Number of Non-Carbons = ",
print quantify(CompositeIteratorFactory.create(Atom, mol),
               Not(IsCarbon()))



    

Output of composedpredicate.py:

Number of Aromatic Oxygens =  0
Number of Non-Carbons =  2

    

Chapter 5. Periodic Table of Elements

Table of Contents

Element Member Functions

Element Member Functions

Very atom can be assigned to an element. This can be done by the setElement member function. A unique instance of the table of elements can be fetched the the global defined function getElements. The following script prints the "Periodic Table of Elements" in the simple table form. At run time only one instance of the Periodic Table of Elements exists.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *

import os



print '%10s%20s%10s%10s' % ('Symbol', 'AtomicNumber', 'Row', 'Column')
for element in getElements().getElements() :
    print '%10s%20d%10d%10d' % (element.getSymbol(),
                                element.getAtomicNumber(),
                                element.getRow(),
                                element.getColumn())
    


    

Output of elements.py

    Symbol        AtomicNumber       Row    Column
         H                   1         1         1
        He                   2        18         1
        Li                   3         1         2
        Be                   4         2         2
         B                   5        13         2
         C                   6        14         2
         N                   7        15         2
         O                   8        16         2
         F                   9        17         2
        Ne                  10        18         2
        Na                  11         1         3
        Mg                  12         2         3
        Al                  13        13         3
        Si                  14        14         3
         P                  15        15         3
         S                  16        16         3
        Cl                  17        17         3
        Ar                  18        18         3
         K                  19         1         4
        Ca                  20         2         4
        Sc                  21         3         4
        Ti                  22         4         4
         V                  23         5         4
        Cr                  24         6         4
        Mn                  25         7         4
        Fe                  26         8         4
        Co                  27         9         4
        Ni                  28        10         4
        Cu                  29        11         4
        Zn                  30        12         4
        Ga                  31        13         4
        Ge                  32        14         4
        As                  33        15         4
        Se                  34        16         4
        Br                  35        17         4
        Kr                  36        18         4
        Rb                  37         1         5
        Sr                  38         2         5
         Y                  39         3         5
        Zr                  40         4         5
        Nb                  41         5         5
        Mo                  42         6         5
        Tc                  43         7         5
        Ru                  44         8         5
        Rh                  45         9         5
        Pd                  46        10         5
        Ag                  47        11         5
        Cd                  48        12         5
        In                  49        13         5
        Sn                  50        14         5
        Sb                  51        15         5
        Te                  52        16         5
         I                  53        17         5
        Xe                  54        18         5
        Cs                  55         1         6
        Ba                  56         2         6
        La                  57         3         6
        Hf                  72         4         6
        Ta                  73         5         6
         W                  74         6         6
        Re                  75         7         6
        Os                  76         8         6
        Ir                  77         9         6
        Pt                  78        10         6
        Au                  79        11         6
        Hg                  80        12         6
        Tl                  81        13         6
        Pb                  82        14         6
        Bi                  83        15         6
        Po                  84        16         6
        At                  85        17         6
        Rn                  86        18         6

    

Chapter 6. Molecular File Formats

Table of Contents

Molecular File Formats

Molecular File Formats

The following program lists all supported molecular file types. All theses file types can be readed and written from NTSChem.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *

import os



print "Reader :"
for s in getStructureReaderRegistry().getFileTypes() :
    print "%5s : %30s" % (s, getFileTypeToDescription().getDescription(s)),
    ", File extension : ",
    for e in getExtToFileType().getExtensions(s) :
        print "%5s" % (e),
    print

print

print "Writer :"
for s in getWriterRegistry().getFileTypes() :
    print "%5s : %30s" % (s, getFileTypeToDescription().getDescription(s)),
    ", File extension : ",
    for e in getExtToFileType().getExtensions(s) :
        print "%5s" % (e),
    print


    

Output of supportedfiletypes.py:

Reader :
  PDB :                 Brookhaven PDB   ent   pdb
  MOL :                        MDL MOL   mdl   mol    sd
 MOL2 :                     Sybyl MOL2  mol2
  XYZ :                 XYZ Coordinate   xyz  xyzr
  ALC :                        Alchemy   alc
  CML :       Chemical Markup Language   cml   xml
  SMI :                         Smiles   smi
  SDF :                        MDL SDF   sdf

Writer :
  PDB :                 Brookhaven PDB   ent   pdb
  MOL :                        MDL MOL   mdl   mol    sd
 MOL2 :                     Sybyl MOL2  mol2
  XYZ :                 XYZ Coordinate   xyz  xyzr
  ALC :                        Alchemy   alc
  CML :       Chemical Markup Language   cml   xml
  SMI :                         Smiles   smi

    

Chapter 7. Ring Processing

Cycle Membership

To process rings in NTSChem one can test on ring/cycle membership. The NTSChem processor AssignInRingProcessor is used, in order to determine, which atoms and connections of members of one or more cycles and are acyclic. As soon as AssignInRingProcessor was called, atoms or bonds can be examined for presence in a ring, by either the atom IsInRing or the BondisInRing methods.

Number of Ring Bonds to an Atom

To provide an example of how one might use the above functions, the code below returns the number of ring bonds attached to an atom. This is useful for identifying ring fusion atoms (ring bond count = 3) and potential spiro-atoms (ring bond count >= 4).

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *

from itertools import *

import os

def quantify(seq, pred = bool):
    "Count how many times the predicate is True in the sequence"
    return sum(imap(pred, seq))



mol = fromSmiles("c1ccccc1c2ccccc2")

AssignIndexProcessor()(mol)

for atom in CompositeIteratorFactory.create(Atom, mol):
    print "atom ", atom.getIndex(), atom.getInRing(),
    quantify(AtomIterator(atom), IsInRing())
    print

rings = BuildRingsProcessor()(mol)
print "Number of rings : ", \
      quantify(ComponentIteratorFactory.create(Composite, rings))
for ring in ComponentIteratorFactory.create(Composite, rings) :
    print "Ring size = ", \
          quantify(ComponentIteratorFactory.create(Atom, ring))
    for atom in ComponentIteratorFactory.create(Atom, ring) :
        print "Atom ", atom.getIndex()



    

Output of countringbonds.py

atom  0 True
atom  1 True
atom  2 True
atom  3 True
atom  4 True
atom  5 True
atom  6 True
atom  7 True
atom  8 True
atom  9 True
atom  10 True
atom  11 True
Number of rings :  2
Ring size =  6
Atom  0
Atom  1
Atom  2
Atom  3
Atom  4
Atom  5
Ring size =  6
Atom  6
Atom  7
Atom  8
Atom  9
Atom  10
Atom  11

    

Chapter 8. Connected Components Processing

Identifying Connected Components

In order to separate molecules that can be split up into separate connected components, e.g. a ligand from a protein, NTSChem makes processor BuildCompositesProcessor available.

The following provides a short example of how to use this processor.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *

from itertools import *

import os
import textwrap

def quantify(seq, pred = bool):
    return sum(imap(pred, seq))



mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'mols.xyz')

AssignIndexProcessor()(mol)
BuildBondsProcessor()(mol)

parts = BuildCompositesProcessor()(mol)

print "The molecule has %d components\n" % \
      quantify(ComponentIteratorFactory.create(Composite, parts))

for part in ComponentIteratorFactory.create(Composite, parts) :
    print "Molecule with Atoms :"
    s = ""
    for atom in ComponentIteratorFactory.create(Atom, part) :
        s = s + str(atom.getIndex()) + " "
    print textwrap.fill(s, width=40)
    



    

Output of connectedcomponents.py

The molecule has 4 components

Molecule with Atoms :
42 41 40 39 38 37 36 35 34 33 32 31 30
29 28 27 26 25 24 23 22 21 20 19 18 17
16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
42 41 40 39 38 37 36 35 34 33 32 31 30
29 28 27 26 25 24 23 22 21 20 19 18 17
16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Molecule with Atoms :
82 81 80 79 78 77 76 75 74 73 72 71 70
69 68 67 66 65 64 63 62 61 60 59 58 57
56 55 54 53 52 51 50 49 48 47 46 45 44
43 82 81 80 79 78 77 76 75 74 73 72 71
70 69 68 67 66 65 64 63 62 61 60 59 58
57 56 55 54 53 52 51 50 49 48 47 46 45
44 43
Molecule with Atoms :
107 106 105 104 103 102 101 100 99 98 97
96 95 94 93 92 91 90 89 88 87 86 85 84
107 106 105 104 103 102 101 100 99 98 97
96 95 94 93 92 91 90 89 88 87 86 85 84
Molecule with Atoms :
147 146 145 144 143 142 141 140 139 138
137 136 135 134 133 132 131 130 129 128
127 126 125 124 123 122 121 120 119 118
117 116 115 114 113 112 111 110 109 108
147 146 145 144 143 142 141 140 139 138
137 136 135 134 133 132 131 130 129 128
127 126 125 124 123 122 121 120 119 118
117 116 115 114 113 112 111 110 109 108

    

Chapter 9. Atom and Bond Typing

Reading and Writing Files with Atom Types and Bond Orders

Some known file formats are presenting atomic type information and bond orders as attributes for atoms and/or bonds. NTSChem maps external atomic types and bond orders of the different file formats to a type ID. This ensures, which the same atom types finds with the expenditure use.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *

import os


 
mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'caffeine.xyz')

BuildBondsProcessor()(mol)
AssignBondOrderProcessor()(mol)
AssignTypeProcessor()(mol)

for atom in CompositeIteratorFactory.create(Atom, mol) :
    print atom.getType('INT')

for bond in CompositeIteratorFactory.create(Bond, mol) :
    print bond.getOrder()



    

Output of readwithtypes.py:

HC
Nam
C2
Nam
C2
C2
C2
Npl
O2
C3
O2
HC
C3
HC
HC
HC
HC
HC
C3
C2
Npl
HC
HC
HC
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_DOUBLE

    

Build Bonds Processing

Atoms with same environment and thus with comparable characteristics are assigned to an atomic type. If one knows the atomic types and bond orders one can determine for example the parameterizing the forces for a MD simulation.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *

import os
 


mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'viagra.xyz')

AssignIndexProcessor()(mol)
BuildBondsProcessor()(mol)
AssignBondOrderProcessor()(mol)

for bond in CompositeIteratorFactory.create(Bond, mol) :
    print bond.getFirst().getIndex(), bond.getSecond().getIndex(), \
          bond.getOrder()




    

Output of bonds.py:

0 1 ORDER_SINGLE
0 5 ORDER_SINGLE
0 7 ORDER_SINGLE
1 2 ORDER_SINGLE
1 33 ORDER_SINGLE
1 34 ORDER_SINGLE
2 3 ORDER_SINGLE
2 35 ORDER_SINGLE
2 36 ORDER_SINGLE
3 4 ORDER_SINGLE
3 6 ORDER_SINGLE
4 5 ORDER_SINGLE
4 37 ORDER_SINGLE
4 38 ORDER_SINGLE
5 39 ORDER_SINGLE
5 40 ORDER_SINGLE
6 41 ORDER_SINGLE
6 42 ORDER_SINGLE
6 43 ORDER_SINGLE
7 8 ORDER_DOUBLE
7 9 ORDER_DOUBLE
7 10 ORDER_SINGLE
10 11 ORDER_SINGLE
10 15 ORDER_DOUBLE
11 12 ORDER_DOUBLE
11 44 ORDER_SINGLE
12 13 ORDER_SINGLE
12 45 ORDER_SINGLE
13 14 ORDER_DOUBLE
13 16 ORDER_SINGLE
14 15 ORDER_SINGLE
14 19 ORDER_SINGLE
15 46 ORDER_SINGLE
16 17 ORDER_SINGLE
17 18 ORDER_SINGLE
17 47 ORDER_SINGLE
17 48 ORDER_SINGLE
18 49 ORDER_SINGLE
18 50 ORDER_SINGLE
18 51 ORDER_SINGLE
19 20 ORDER_DOUBLE
19 24 ORDER_SINGLE
20 21 ORDER_SINGLE
21 22 ORDER_DOUBLE
21 25 ORDER_SINGLE
22 23 ORDER_SINGLE
22 27 ORDER_SINGLE
23 24 ORDER_SINGLE
23 28 ORDER_DOUBLE
24 52 ORDER_SINGLE
25 26 ORDER_DOUBLE
25 30 ORDER_SINGLE
26 27 ORDER_SINGLE
27 29 ORDER_SINGLE
29 53 ORDER_SINGLE
29 54 ORDER_SINGLE
29 55 ORDER_SINGLE
30 31 ORDER_SINGLE
30 56 ORDER_SINGLE
30 57 ORDER_SINGLE
31 32 ORDER_SINGLE
31 58 ORDER_SINGLE
31 59 ORDER_SINGLE
32 60 ORDER_SINGLE
32 61 ORDER_SINGLE
32 62 ORDER_SINGLE

    

Processes Atom and Bond Types

With the processors AssignBondOrderProcessor and AssignTypeProcessor one can assign type information from file formats, which do not contain this type of information. In the following example also the BuildBondProcessor becomes necessary, since the xyz file format does not contain atomic bonds.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *

import os



mol = MOL2(os.environ['EXAMPLESPATH'] + os.sep + 'histidine.mol2')

AssignBondOrderProcessor()(mol)
AssignTypeProcessor()(mol)

for atom in CompositeIteratorFactory.create(Atom, mol) :
    print atom.getType('SYB')

for bond in CompositeIteratorFactory.create(Bond, mol) :
    print bond.getOrder()




    

Output of types.py:

N.4
C.3
C.3
C.2
O.2
C.2
N.pl3
C.2
C.2
N.pl3
H
O.2
H
H
H
H
H
H
H
H
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_DOUBLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE
ORDER_SINGLE

    

Chapter 10. Pattern Matching

Using SMILES

The most common method describing a substructure to search for is to use a SMILES string.

The following example creates a molecule from a SMILES string and prints out the match result. The result is true if one molecule contain a benzene ring (c1ccccc1).

#!/usr/bin/env python


from nanotechsoftware.base import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.graph import *
from nanotechsoftware.chemistry.match import *

import os



mol1 = fromSmiles("c1ccccc1")
AssignIndexProcessor()(mol1)
        
mol2 = fromSmiles("c1ccccc1c2ccccc2")
AssignIndexProcessor()(mol2)

g1 = createGraphWithComparators(createGraphEdit(mol1))
g2 = createGraphWithComparators(createGraphEdit(mol2))

print singleMatch(g1, g2)



    

Output of matchsingle.py

True

    

NTSChem makes a rich set of functions available for discovery of sub structures. One common task is to search for all matches between one reference structure and a goal molecule.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.graph import *
from nanotechsoftware.chemistry.match import *

import os



mol1 = fromSmiles("c1c(O)cccc1")
mol2 = fromSmiles("c1c(O)cccc1c2ccccc2")

g1 = createGraphWithComparators(createGraphEdit(mol1))
g2 = createGraphWithComparators(createGraphEdit(mol2))

c = g1.matchAll(g2)

print "Number of matches ", c.size()

for mol in c:
    print "Atoms ",
    for a in ComponentIteratorFactory.create(Atom, mol):
        print a.getIndex(),
    print
print



    

Output of matchall.py

Number of matches  2
Atoms  0 1 2 3 4 5 6
Atoms  3 1 2 0 6 5 4


    

While each match is returned by the generator method, a second loop can determine either the atoms or connections of the matches.

You will possibly be surprised why there are two matches. The reason is that there is a symmetry operation that transforms the ring to itself. If you are only interested in the match (that is usaly the case) you can use the matchTotalUnique function.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.graph import *
from nanotechsoftware.chemistry.match import *

import os

def createGraphFromSMILES(s):
    return createGraphWithComparators(createGraphEdit(fromSmiles(s)))



c = createGraphFromSMILES("c1c(O)cccc1").matchTotalUnique(createGraphFromSMILES("c1c(O)cccc1c2ccccc2"))

print "Number of matches ", c.size()

for mol in c:
    print "Atoms ",
    for a in ComponentIteratorFactory.create(Atom, mol):
        print a.getIndex(),
    print



    

Output of matchtotalunique.py

Number of matches  1
Atoms  0 1 2 3 4 5 6

    

Using SMARTS

One common task is to formulate a query with same properties against a molecular structure. This can be done with so called SMARTS strings. In the following example all substructures with an oxygen atom bonded to an other atom is searched.

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.match import *
from nanotechsoftware.chemistry.graph import *

import os



mol = fromSmiles("c1cc(O)ccc1")

AssignIndexProcessor()(mol)
g = createGraphWithComparators(createGraphEdit(mol))

c = SMARTS("*O").matchTotalUnique(g)

for m in c:
    for a in ComponentIteratorFactory.create(Atom, m):
        print "Atom", a.getIndex()
    print



    

Output of querywithsmarts.py

Atom 2
Atom 3


    

Using Predicates for Query Molecules

SMARTS are a convenient way a create a query for searching substructures. But it is also possible to create a query directly.

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.predicate import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.match import *
from nanotechsoftware.chemistry.graph import *

import os

class IsTrue:
      def __call__(self, a):
        return 1 



pe = PredicateGraphEdit()
i = pe.insertNode(AtomPythonWrapperPredicate(IsTrue()))
j = pe.insertNode(AtomPythonWrapperPredicate(IsOxygen()))
pe.insertEdge(i, j, BondPythonWrapperPredicate(IsTrue()))
pg = createGraphPredicateWithComparators(pe)

mol = fromSmiles("c1cc(O)ccc1")
AssignIndexProcessor()(mol)
g = createGraphWithComparators(createGraphEdit(mol))

for m in pg.matchTotalUnique(g):
    for a in ComponentIteratorFactory.create(Atom, m):
        print "Atom", a.getIndex()
    print



    

Output of query.py

Atom 2
Atom 3


    

Maximum Common Substructure Search (MCS Search)

The following example reads a molecule, which is manufactured by the sentence set of the SMILE, and looks then for the MCS in each goal molecule.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.io import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.matchprocessor import *

import os



molref = fromSmiles("CCO")
AssignIndexProcessor()(molref)
processor = MCSProcessor(molref)


for mol in createSDFStreamIterator(os.environ['EXAMPLESPATH'] + 
                                os.sep + 'pharmaceuticals.sdf'):
    print 'Description :', mol.getDescription()
    AssignIndexProcessor()(mol)
    pair = processor(mol)
    print "Atoms ",
    for a in ComponentIteratorFactory.create(Atom, pair.first()):
        print a.getIndex(),
    print
    print "Atoms ",
    for a in ComponentIteratorFactory.create(Atom, pair.second()):
        print a.getIndex(),
        print



    

Output of MCS.py

Description : MORPHINE
Atoms  0 1 2 Atoms  6
13
14
Description : Vitamin-A
Atoms  0 1 2 Atoms  13
14
15
Description : Beta-estradiol
Atoms  0 1 2 Atoms  12
16
17
Description : Naproxen
Atoms  0 1 2 Atoms  9
22
26

    

Chapter 11. Coordinates

Setting and Getting Coordinates

In the following example coordinates of each atom in a molecule become print.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *

import os



mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'caffeine.xyz')

AssignIndexProcessor()(mol)

for atom in CompositeIteratorFactory.create(Atom, mol) :
    print atom.getIndex(),
    print atom.getPosition()



    

Output of listpositions.py:

0 (-3.38041 -1.12724 0.573304)
1 (0.96683 -1.07374 -0.819823)
2 (0.0567293 0.852719 0.392316)
3 (-1.37517 -1.02122 -0.0570552)
4 (-1.2615 0.259071 0.523413)
5 (-0.306834 -1.68363 -0.716934)
6 (1.13942 0.187412 -0.27009)
7 (0.560263 2.08391 0.825159)
8 (-0.49268 -2.81806 -1.20947)
9 (-2.63281 -1.7304 -0.0060953)
10 (-2.23013 0.798862 1.08997)
11 (2.5497 2.9735 0.622959)
12 (2.05274 -1.73609 -1.49313)
13 (-2.48077 -2.72695 0.488263)
14 (-3.0089 -1.90253 -1.0498)
15 (2.91761 -1.84815 -0.785787)
16 (2.37879 -1.12119 -2.37437)
17 (1.71899 -2.74899 -1.84392)
18 (-0.151845 3.097 1.53483)
19 (1.89341 2.11812 0.419319)
20 (2.28613 0.996844 -0.24403)
21 (-0.168703 4.04366 0.930109)
22 (0.353532 3.29791 2.51777)
23 (-1.20745 2.75376 1.7203)

    

In this example the coordinates are printed a more nicely way.

#!/usr/bin/env python

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *

import os



mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'caffeine.xyz')
AssignIndexProcessor()(mol)

for atom in CompositeIteratorFactory.create(Atom, mol) :
    print atom.getIndex(),
    print 'x = %6.3f   y = %6.3f   z = %6.3f' % (
        atom.getPosition().x,
        atom.getPosition().y,
        atom.getPosition().z)



    

Output of listformatedpositions.py:

0 x = -3.380   y = -1.127   z =  0.573
1 x =  0.967   y = -1.074   z = -0.820
2 x =  0.057   y =  0.853   z =  0.392
3 x = -1.375   y = -1.021   z = -0.057
4 x = -1.262   y =  0.259   z =  0.523
5 x = -0.307   y = -1.684   z = -0.717
6 x =  1.139   y =  0.187   z = -0.270
7 x =  0.560   y =  2.084   z =  0.825
8 x = -0.493   y = -2.818   z = -1.209
9 x = -2.633   y = -1.730   z = -0.006
10 x = -2.230   y =  0.799   z =  1.090
11 x =  2.550   y =  2.973   z =  0.623
12 x =  2.053   y = -1.736   z = -1.493
13 x = -2.481   y = -2.727   z =  0.488
14 x = -3.009   y = -1.903   z = -1.050
15 x =  2.918   y = -1.848   z = -0.786
16 x =  2.379   y = -1.121   z = -2.374
17 x =  1.719   y = -2.749   z = -1.844
18 x = -0.152   y =  3.097   z =  1.535
19 x =  1.893   y =  2.118   z =  0.419
20 x =  2.286   y =  0.997   z = -0.244
21 x = -0.169   y =  4.044   z =  0.930
22 x =  0.354   y =  3.298   z =  2.518
23 x = -1.207   y =  2.754   z =  1.720

    

Chapter 12. Structural Analysis

Root Mean Square Deviation (RMSD)

The RMSD is used to measure the distance between atoms for two structures and defined by:

The following programm reads a diamond like structure an print out all Atoms, that RMSD is less than 0.1.

from nanotechsoftware.base import *
from nanotechsoftware.chemistry import *
from nanotechsoftware.chemistry.framework import *
from nanotechsoftware.chemistry.core import *
from nanotechsoftware.chemistry.predicate import *
from nanotechsoftware.chemistry.processor import *
from nanotechsoftware.chemistry.io import *

import os

def buildRefmol():

    a = 3.57
    
    c = StaticAtom()
    c.setElement(getElements().fetchElement("C"))
    
    c1 = StaticAtom()
    c1.setElement(getElements().fetchElement("C"))
    
    c2 = StaticAtom()
    c2.setElement(getElements().fetchElement("C"))
    
    c3 = StaticAtom()
    c3.setElement(getElements().fetchElement("C"))
    
    c4 = StaticAtom()
    c4.setElement(getElements().fetchElement("C"))
    
    c1.setPosition(Vector3(a/4,a/4,a/4))
    c2.setPosition(Vector3(0,a/2,a/2))
    c3.setPosition(Vector3(a/2,0,a/2))
    c4.setPosition(Vector3(a/2,a/2,0))
    
    mol = Molecule()
    
    mol.insert(c)
    mol.insert(c1)
    mol.insert(c2)
    mol.insert(c3)
    mol.insert(c4)
    
    b1 = Bond(c, c1)
    b2 = Bond(c, c2)
    b3 = Bond(c, c3)
    b4 = Bond(c, c4)

    return mol


         + 'connectedcomponents.py.out', 'w')

mol = XYZ(os.environ['EXAMPLESPATH'] + os.sep + 'diamond.xyz')

AssignIndexProcessor()(mol)
BuildBondsProcessor()(mol)

pg = createPredicateGraphFromSmartsString("[CX4]~[C](~[C])(~[C])(~[C])")
g = createGraph(createGraphEdit(mol))

p = RMSDeviationProcessor(buildRefmol())
for m in matchTotalUniqueFromPredicateGraph(pg, g):
    if p(m) < 0.1:
        print "Atoms ",
        for a in ComponentIteratorFactory.create(Atom, m):
            print a.getIndex(),
        print



    

Output of deviation.py

Atoms  42 39 4 35 46