Appendix C: Substructure Notation
Maestro 10.2 User Manual
433
In this pattern specification, the bond orders remain explicit. If the first bond is generalized
by using a wildcard symbol,
C0*C0-O0-C0-C0
then, because * can match a single, double, or triple bond, the more general pattern can match
diethyl ether, ethyl vinyl ether, or ethoxyacetylene.
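The effect of the wildcard can be sketched in a few lines of Python. This is an illustration only, not mmsubs code: the BOND_ORDERS table and the bond_matches function are hypothetical names, and only the three bond symbols that appear in this section are covered (the symbol for an explicit triple bond is not shown here, so it is left out).

```python
# Bond orders that each pattern bond symbol is taken to accept.
# Hypothetical table for illustration; only symbols used in this section.
BOND_ORDERS = {
    "-": {1},        # single bond only
    "=": {2},        # double bond only
    "*": {1, 2, 3},  # wildcard: single, double, or triple
}


def bond_matches(symbol, order):
    """Return True if a bond of the given order satisfies the pattern symbol."""
    return order in BOND_ORDERS[symbol]


# Order of the first carbon-carbon bond in each of the three example molecules.
first_bond_order = {
    "diethyl ether": 1,
    "ethyl vinyl ether": 2,
    "ethoxyacetylene": 3,
}

# Molecules whose first bond is accepted by "*" and by "-" respectively.
matches_wildcard = [name for name, order in first_bond_order.items()
                    if bond_matches("*", order)]
matches_single = [name for name, order in first_bond_order.items()
                  if bond_matches("-", order)]
```

With the wildcard, all three molecules pass the first-bond test; with an explicit single bond, only diethyl ether does.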
C.2 Ring closure
Ring-forming bonds are expressed as a bond symbol followed by a number, where the number
is the pattern position of the symbol for the other atom in the ring bond. For example,
cyclopentane can be expressed as
C0-C0-C0-C0-C0-1
The final -1 indicates that the fifth carbon has a single bond to the atom specified at position 1
in the pattern.
Pattern positions are numbered from 1 at the leftmost atom type symbol and increase to the
right. Bond symbols (and any other punctuation) do not affect the numbering. A ring closure
must refer from right to left, that is, from an atom specified later in the pattern to an atom
specified earlier. The ring-closure indications themselves are not counted as positions,
because a closure introduces no additional atom.
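The numbering rules can be made concrete with a small parser for unbranched patterns. This is a minimal sketch, not the mmsubs implementation: the token shapes in TOKEN (two-character atom types, the bond symbols - = *, and closures written as a bond symbol plus a positive integer) are assumptions based on the examples in this appendix, and parse_chain is a hypothetical name.

```python
import re

# Assumed token shapes (illustrative, not taken from mmsubs itself):
# an atom type is two characters such as C0, O2, or 00; a bond symbol is
# one of - = *; a ring closure is a bond symbol followed by a positive integer.
TOKEN = re.compile(r"([-=*])?(?:([A-Z][A-Za-z0-9]|00)|([1-9][0-9]*))")


def parse_chain(pattern):
    """Parse an unbranched pattern into (atoms, bonds).

    atoms[i] is the atom type at position i + 1; bonds is a list of
    (from_position, to_position, bond_symbol) tuples.
    """
    atoms, bonds = [], []
    index = 0
    while index < len(pattern):
        match = TOKEN.match(pattern, index)
        if match is None:
            raise ValueError(f"unexpected text at offset {index}")
        bond, atom, closure = match.groups()
        if atom is not None:
            if atoms and bond is None:
                raise ValueError("missing bond symbol before atom")
            if not atoms and bond is not None:
                raise ValueError("pattern must start with an atom")
            atoms.append(atom)  # only atoms advance the position counter
            if bond is not None:
                bonds.append((len(atoms) - 1, len(atoms), bond))
        else:
            target = int(closure)
            # A closure must carry a bond symbol and point to an earlier atom.
            if bond is None or not atoms or target >= len(atoms):
                raise ValueError("ring closure must point to an earlier atom")
            bonds.append((len(atoms), target, bond))
        index = match.end()
    return atoms, bonds


# Cyclopentane: five atoms, four chain bonds, and one closure from 5 back to 1.
atoms, bonds = parse_chain("C0-C0-C0-C0-C0-1")
```

Note that the closure adds a bond but no atom, so the pattern still has five positions.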
C.3 Chain branching
To allow specification of branched structures, mmsubs accepts parenthesized pattern sections.
An opening parenthesis initiates a branch, and the corresponding closing parenthesis ends the
branch. A bond symbol after the closing parenthesis is tied to the atom symbol just before the
opening parenthesis.
For example, methylcyclobutane could be expressed like this:
C0(-C0)-C0-C0-C0-1
The parentheses indicate that the 2nd carbon is bonded to the 1st carbon, and that the 3rd
carbon is also bonded to the 1st carbon. The 5th carbon is bonded to the 1st carbon, forming a
ring of 4 (not 5) carbons.
Branches off branches can be specified by nesting parenthesized sections. For example,
isopropylbenzene can be expressed as
C0=C0(-C0(-C0)-C0)-C0=C0-C0=C0-1
Schrödinger Software Release 2015-2
and threonine can be expressed as
N0-C0(-C0(-C0)-O0-H0)-C0=O0
Arbitrarily deep nesting of branches is allowed.
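The way parentheses redirect bonds can be illustrated with a stack: an opening parenthesis remembers the current atom, and the closing parenthesis returns to it. The following is a minimal sketch under assumed token shapes (two-character atom types, bond symbols - = *), not the mmsubs parser; parse_branched is a hypothetical name, and optional-atom brackets are not handled.

```python
import re

# Assumed token shapes (illustrative): a bond symbol followed by either a
# two-character atom type or a positive integer ring-closure position.
TOKEN = re.compile(r"([-=*])(?:([A-Z][A-Za-z0-9]|00)|([1-9][0-9]*))")
FIRST_ATOM = re.compile(r"[A-Z][A-Za-z0-9]|00")


def parse_branched(pattern):
    """Return (atoms, bonds) for a pattern that may contain branches.

    atoms[i] is the atom type at position i + 1; bonds is a list of
    (from_position, to_position, bond_symbol) tuples.
    """
    first = FIRST_ATOM.match(pattern)
    if first is None:
        raise ValueError("pattern must start with an atom type")
    atoms, bonds = [first.group()], []
    current = 1   # position that the next bond symbol attaches to
    stack = []    # attachment points saved at each opening parenthesis
    index = first.end()
    while index < len(pattern):
        char = pattern[index]
        if char == "(":
            stack.append(current)
            index += 1
            continue
        if char == ")":
            if not stack:
                raise ValueError("excess closing parenthesis")
            current = stack.pop()  # resume from the atom before the "("
            index += 1
            continue
        match = TOKEN.match(pattern, index)
        if match is None:
            raise ValueError(f"unexpected text at offset {index}")
        bond, atom, closure = match.groups()
        if atom is not None:
            atoms.append(atom)
            bonds.append((current, len(atoms), bond))
            current = len(atoms)
        else:
            target = int(closure)
            if target >= current:
                raise ValueError("ring closure must point to an earlier atom")
            bonds.append((current, target, bond))
        index = match.end()
    return atoms, bonds


# Methylcyclobutane: atom 2 is the methyl branch, atoms 1, 3, 4, 5 form the ring.
atoms, bonds = parse_branched("C0(-C0)-C0-C0-C0-1")
```

For methylcyclobutane the resulting bond list shows both the 2nd and the 3rd carbon attached to position 1, and the closure from position 5 back to position 1.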
In patterns with more complex branching, the chance of error due to unbalanced parentheses is
greater. The mmsubs software treats excess closing parentheses as erroneous, but quietly
accepts patterns with excess opening parentheses, and implicitly inserts the matching closing
parentheses at the end of the pattern. This may not be what you intended, so you should check
carefully that the parentheses match.
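Because unclosed opening parentheses are accepted silently, it can help to check the balance before submitting a pattern. The helper below is a hypothetical pre-check written for this purpose, not part of mmsubs; it only counts parentheses and does not validate anything else in the pattern.

```python
def count_unclosed(pattern):
    """Return the number of opening parentheses that are never closed.

    Raises ValueError on a closing parenthesis with no matching opener.
    """
    depth = 0
    for offset, char in enumerate(pattern):
        if char == "(":
            depth += 1
        elif char == ")":
            if depth == 0:
                raise ValueError(f"excess closing parenthesis at offset {offset}")
            depth -= 1
    return depth


# The threonine pattern from above, and a variant with one ")" left out.
balanced = count_unclosed("N0-C0(-C0(-C0)-O0-H0)-C0=O0")
unbalanced = count_unclosed("N0-C0(-C0(-C0-O0-H0)-C0=O0")
```

A nonzero result means that closing parentheses would be inserted implicitly at the end of the pattern, which may attach later atoms to a different branch point than intended.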
C.4 Optional atoms
Optional atoms can be specified in mmsubs notation so that the substructure will match
whether or not the optional atom is found. The syntax is to put the optional atom and its
preceding bond symbol in square brackets. With this pattern
N0([-H0])-C0(-C0-O0[-H0])-C0=O0
serine would be identified whether or not the optional hydrogens were present in the structure
searched. Note that the first optional hydrogen in that pattern is also the only atom in a chain
branch. In cases like this, the parentheses should go outside the square brackets, as above.
The square bracket syntax can only be applied to individual atoms. It cannot be applied to
chains, and the usage cannot be nested. It is not permitted to specify the first atom in a pattern
as optional.
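One way to picture what an optional atom means is to list every concrete pattern obtained by keeping or dropping each bracketed atom. The sketch below does this by plain text substitution; expand_optional is a hypothetical helper, not an mmsubs function. It assumes that no ring closure refers to a position at or after an optional atom, since this appendix does not say how optional atoms affect position numbering.

```python
import itertools
import re

# A bracketed optional group: a bond symbol and atom type inside [ ].
OPTIONAL = re.compile(r"\[([^\[\]]*)\]")


def expand_optional(pattern):
    """Return every pattern formed by keeping or dropping each optional atom."""
    parts = OPTIONAL.split(pattern)
    fixed = parts[0::2]     # text outside the brackets
    optional = parts[1::2]  # text inside each pair of brackets
    variants = []
    for choice in itertools.product([True, False], repeat=len(optional)):
        pieces = [fixed[0]]
        for keep, group, tail in zip(choice, optional, fixed[1:]):
            pieces.append(group if keep else "")
            pieces.append(tail)
        # Dropping the only atom in a branch leaves "()", which is removed.
        variants.append("".join(pieces).replace("()", ""))
    return variants


variants = expand_optional("N0([-H0])-C0(-C0-O0[-H0])-C0=O0")
for variant in variants:
    print(variant)
```

For the serine pattern this yields four variants, from the fully hydrogenated form down to the form with neither optional hydrogen.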
C.5 Special Cases
The MacroModel atom type C2 needs special attention. This type covers sp² carbon as it occurs
both in aromatic rings and in carbonyl groups. In cases where you want to match only one C2
subtype, you need to specify an atom attached to the C2, to exclude matches on the other
subtype.
For example, if an aromatic ring pattern contains C2*C2*C2 and you do not want it to
match a structure containing C2-C2(=O2)-C2, specify an attached atom that is not O2
to exclude the unwanted matches. The pattern C2*C2(-H0)*C2 might suffice.
The most general way to exclude unwanted carbonyl matches on a C2 in an aromatic ring
pattern is to require that the C2 have a single bond to an atom of any type. In this example, the
pattern to use is C2*C2(-00)*C2.
C.6 Examples
In this section, some more complex examples are presented. The first is norbornane, which is a
bicyclic compound, and can be represented as follows:
C0-C0-C0-C0(-C0-1)-C0-C0-1
Two ring closures are indicated, as the compound is bicyclic. In this pattern, both closures, of
the bridge and of the main ring, refer to the same position. This is not required. For example,
these patterns are valid and equivalent to the above:
C0-C0-C0-C0-C0(-C0-2)-C0-1
C0-C0-C0-C0-C0-C0(-C0-3)-1
Now note something about branches, which is actually independent of the ring closures: If you
swap a parenthesized branch with the part after it, and then parenthesize the new front part, the
connectivity expressed is exactly the same. For example, these two forms are equivalent:
C0-C0-C0-C0-C0(-C0-2)-C0-1
C0-C0-C0-C0-C0(-C0-1)-C0-2
Norbornane also provides an opportunity to emphasize one fundamental point about linear
substructure representations. The appearance of some patterns may seem to imply geometric
properties. The norbornane patterns above, for example, imply a 6-membered ring bridged by
one atom with two bonds on either side of the ring. It is important to keep in mind that the
pattern specifies connectivity only, and not geometry.
Though it would be somewhat unconventional, it is in fact possible to specify norbornane as a
5-membered ring bridged by a chain of 2 atoms, which is topologically equivalent:
C0-C0-C0(-C0-C0-1)-C0-C0-1
C0-C0-C0-C0(-C0-C0-1)-C0-1
C0-C0-C0-C0(-C0-C0-2)-C0-1
C0-C0-C0-C0-C0(-C0-C0-2)-1
C0-C0-C0-C0-C0(-C0-C0-3)-1
Any of these patterns matches exactly the same structures as the more conventional
representations. The point is that these patterns specify topology, not geometry.
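The claim that all of these patterns describe the same connectivity can be checked mechanically. The sketch below parses each pattern into a bond table and then searches for a renumbering of the atoms that maps one table onto the other. It is an illustration under assumed token shapes, not mmsubs code: parse and same_connectivity are hypothetical names, bond symbols are compared literally, and the brute-force search over permutations is only practical for small patterns such as these.

```python
import itertools
import re

# Assumed token shapes (illustrative): a bond symbol followed by either a
# two-character atom type or a positive integer ring-closure position.
TOKEN = re.compile(r"([-=*])(?:([A-Z][A-Za-z0-9]|00)|([1-9][0-9]*))")


def parse(pattern):
    """Return (atoms, bonds); bonds maps frozenset({a, b}) to a bond symbol.

    Atom indices are 0-based here; the first two characters are taken
    to be the first atom type.
    """
    atoms = [pattern[:2]]
    bonds, stack, current, index = {}, [], 0, 2
    while index < len(pattern):
        char = pattern[index]
        if char == "(":
            stack.append(current)   # remember the branch point
            index += 1
        elif char == ")":
            current = stack.pop()   # return to the branch point
            index += 1
        else:
            match = TOKEN.match(pattern, index)
            if match is None:
                raise ValueError(f"unexpected text at offset {index}")
            bond, atom, closure = match.groups()
            if atom is not None:
                atoms.append(atom)
                bonds[frozenset((current, len(atoms) - 1))] = bond
                current = len(atoms) - 1
            else:
                bonds[frozenset((current, int(closure) - 1))] = bond
            index = match.end()
    return atoms, bonds


def same_connectivity(first, second):
    """Return True if some renumbering of atoms maps one pattern onto the other."""
    atoms_a, bonds_a = parse(first)
    atoms_b, bonds_b = parse(second)
    if len(atoms_a) != len(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    for perm in itertools.permutations(range(len(atoms_a))):
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(len(atoms_a))):
            continue
        mapped = {frozenset(perm[i] for i in pair): symbol
                  for pair, symbol in bonds_a.items()}
        if mapped == bonds_b:
            return True
    return False


norbornane = [
    "C0-C0-C0-C0(-C0-1)-C0-C0-1",
    "C0-C0-C0-C0-C0(-C0-2)-C0-1",
    "C0-C0-C0-C0-C0-C0(-C0-3)-1",
    "C0-C0-C0(-C0-C0-1)-C0-C0-1",
    "C0-C0-C0-C0-C0(-C0-C0-3)-1",
]
all_equivalent = all(same_connectivity(norbornane[0], other)
                     for other in norbornane[1:])
```

Both the "6-membered ring with a one-atom bridge" forms and the "5-membered ring with a two-atom bridge" forms pass this check against the first pattern, which is what specifying topology rather than geometry implies.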
It should be clear that there can be many valid mmsubs expressions to match a single,
non-trivial substructure. The mmsubs notation does not have a canonical representation; that is,
mmsubs has no counterpart to Unique SMILES (USMILES).
The next example is of a compound containing fused heterocyclic aromatic rings. One way to
express quinoline is the following: