Maestro User Manual




Appendix C: Substructure Notation

Maestro 10.2 User Manual


In this pattern specification, the bond orders remain explicit. If the first bond is generalized by using a wildcard symbol,

C0*C0-O0-C0-C0

then, since * can match a single, double, or triple bond, the more general pattern can match diethyl ether, ethyl vinyl ether, or ethoxyacetylene.



C.2 Ring closure

Ring-forming bonds are expressed as a bond symbol followed by a number, where the number is the pattern position of the symbol for the other atom in the ring bond. For example, cyclopentane can be expressed as

C0-C0-C0-C0-C0-1

The final -1 indicates that the fifth carbon has a single bond to the atom specified at position 1 in the pattern.

Pattern positions are numbered from 1 at the leftmost atom type symbol and increase to the right. Bond symbols (and any other punctuation) do not affect the numbering. A ring closure must refer from right to left, that is, from an atom specified later in the pattern to an atom specified earlier. The ring-closure numbers themselves are not counted as positions, because a closure introduces no additional atom.
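The numbering rule can be illustrated with a short Python sketch. It is not part of mmsubs: the function name parse_linear is hypothetical, the tokenizer assumes that atom type symbols are two characters long (a capital letter plus one more character, or 00), and only the bond symbols -, = and * that appear in this appendix are recognized. Branches are not handled here.

```python
import re

# Assumed token shapes: a two-character atom type, a bond symbol, or a
# ring-closure number. These are illustration-only assumptions.
TOKEN = re.compile(r"([A-Z][A-Za-z0-9]|00)|([-=*])|([1-9][0-9]*)")


def parse_linear(pattern):
    """Return (atoms, bonds) for an unbranched pattern.

    bonds holds (from_position, to_position, bond_symbol) tuples that use
    1-based pattern positions.
    """
    atoms, bonds = [], []
    pending = None  # bond symbol still waiting for its second atom
    offset = 0
    while offset < len(pattern):
        match = TOKEN.match(pattern, offset)
        if match is None:
            raise ValueError(f"unexpected character at offset {offset}")
        atom, bond, closure = match.groups()
        if atom:
            atoms.append(atom)  # only atom symbols advance the position
            if pending is not None:
                bonds.append((len(atoms) - 1, len(atoms), pending))
                pending = None
        elif bond:
            pending = bond
        else:
            target = int(closure)
            # A closure needs a bond symbol and must point at an earlier atom.
            if pending is None or target >= len(atoms):
                raise ValueError("invalid ring closure")
            bonds.append((len(atoms), target, pending))
            pending = None
        offset = match.end()
    return atoms, bonds


atoms, bonds = parse_linear("C0-C0-C0-C0-C0-1")
print(len(atoms))  # 5
print(bonds[-1])   # (5, 1, '-')
```

The closing -1 adds a bond but no atom, so the cyclopentane pattern yields five positions and five bonds.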



C.3 Chain branching

To allow specification of branched structures, mmsubs accepts parenthesized pattern sections. An opening parenthesis initiates a branch, and the corresponding closing parenthesis ends the branch. A bond symbol after the closing parenthesis is tied to the atom symbol just before the opening parenthesis.

For example, methylcyclobutane could be expressed like this:

C0(-C0)-C0-C0-C0-1

The parentheses indicate that the 2nd carbon is bonded to the 1st carbon, and that the 3rd carbon is also bonded to the 1st carbon. The 5th carbon is bonded to the 1st carbon, forming a ring of 4 (not 5) carbons.

Branches off branches can be specified by nesting parenthesized sections. For example, isopropylbenzene can be expressed as

C0=C0(-C0(-C0)-C0)-C0=C0-C0=C0-1




Schrödinger Software Release 2015-2


Similarly, threonine can be expressed as

N0-C0(-C0(-C0)-O0-H0)-C0=O0

Arbitrarily deep nesting of branches is allowed. 
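The branching rule can be sketched in Python with a stack of branch points. This is an illustration only, not the mmsubs implementation: the name parse_branched is hypothetical, atom type symbols are assumed to be two characters long (a capital letter plus one more character, or 00), and only the bond symbols -, = and * are recognized.

```python
import re

# Assumed token shapes: atom type, ring-closure number, bond symbol, parenthesis.
TOKEN = re.compile(r"[A-Z][A-Za-z0-9]|00|[1-9][0-9]*|[-=*()]")


def parse_branched(pattern):
    """Return (atoms, bonds); bonds are (from, to, symbol), positions 1-based."""
    tokens = TOKEN.findall(pattern)
    if "".join(tokens) != pattern:
        raise ValueError("pattern contains characters this sketch does not handle")
    atoms, bonds = [], []
    branch_points = []  # atoms to return to when a branch closes
    current = 0         # position of the atom the next bond starts from
    pending = None      # bond symbol waiting for its second atom
    for token in tokens:
        if token == "(":
            branch_points.append(current)
        elif token == ")":
            if not branch_points:
                raise ValueError("excess closing parenthesis")
            current = branch_points.pop()  # next bond ties to the branch point
        elif token in ("-", "=", "*"):
            pending = token
        elif token[0].isdigit() and token != "00":
            target = int(token)  # ring closure to an existing position
            if pending is None or target > len(atoms):
                raise ValueError("invalid ring closure")
            bonds.append((current, target, pending))
            pending = None
        else:
            atoms.append(token)
            if pending is not None:
                bonds.append((current, len(atoms), pending))
                pending = None
            current = len(atoms)
    return atoms, bonds


atoms, bonds = parse_branched("C0(-C0)-C0-C0-C0-1")
print(bonds)
# [(1, 2, '-'), (1, 3, '-'), (3, 4, '-'), (4, 5, '-'), (5, 1, '-')]
```

For the methylcyclobutane pattern, the output shows the 2nd and 3rd carbons both bonded to the 1st, and the ring closed through carbons 1, 3, 4 and 5.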

In patterns with more complex branching, the chance of error due to unbalanced parentheses is greater. The mmsubs software treats excess closing parentheses as an error, but quietly accepts patterns with excess opening parentheses and implicitly inserts the matching closing parentheses at the end of the pattern. This may not be what you intended, so check carefully that the parentheses match.
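Because unclosed branches are accepted silently, it can help to check the balance before a pattern is used. The following Python sketch is a hypothetical helper, not an mmsubs function. It mirrors the behavior described above by rejecting an excess closing parenthesis and reporting how many closing parentheses would be added implicitly.

```python
def unclosed_parentheses(pattern):
    """Return how many opening parentheses are left unclosed.

    Raises ValueError at the first closing parenthesis that has no partner.
    """
    depth = 0
    for offset, char in enumerate(pattern):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"excess closing parenthesis at offset {offset}")
    return depth


# The inner branch of this threonine-like pattern is missing its ")".
missing = unclosed_parentheses("N0-C0(-C0(-C0-O0-H0)-C0=O0")
print(missing)  # 1
```

A nonzero result means the pattern would still be accepted, but with a branch structure that may differ from the intended one.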

C.4 Optional atoms

Optional atoms can be specified in mmsubs notation so that the substructure will match whether or not the optional atom is found. The syntax is to put the optional atom and its preceding bond symbol in square brackets. With this pattern

N0([-H0])-C0(-C0-O0[-H0])-C0=O0

serine would be identified whether or not the optional hydrogens were present in the structure searched. Note that the first optional hydrogen in that pattern is also in a chain branch, but alone. In cases like this, the parentheses should go on the outside, as above.

The square bracket syntax can only be applied to individual atoms. It cannot be applied to chains, and the usage cannot be nested. It is not permitted to specify the first atom in a pattern as optional.
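The three restrictions on square brackets can be checked mechanically. The Python sketch below is illustrative only: the function name optional_atoms is hypothetical, and the regular expression assumes two-character atom type symbols and the bond symbols -, = and *.

```python
import re

# Assumed shape of an atom type symbol: capital letter plus one character, or 00.
ATOM = r"(?:[A-Z][A-Za-z0-9]|00)"
# A well-formed optional group: one bond symbol and one atom in square brackets.
OPTIONAL = re.compile(rf"\[([-=*])({ATOM})\]")


def optional_atoms(pattern):
    """Return the (bond, atom) pairs marked optional, or raise ValueError."""
    if pattern.startswith("["):
        raise ValueError("the first atom in a pattern cannot be optional")
    # After removing every well-formed group, any bracket that remains was
    # nested, enclosed a chain, or was unmatched.
    leftover = OPTIONAL.sub("", pattern)
    if "[" in leftover or "]" in leftover:
        raise ValueError("brackets must enclose exactly one bond symbol and one atom")
    return OPTIONAL.findall(pattern)


print(optional_atoms("N0([-H0])-C0(-C0-O0[-H0])-C0=O0"))
# [('-', 'H0'), ('-', 'H0')]
```

The serine pattern passes the check and reports its two optional hydrogens. A bracketed chain such as [-O0-H0] or a nested bracket would raise an error.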

C.5 Special Cases

The MacroModel atom type C2 needs special attention. This type covers sp² carbon as it occurs both in aromatic rings and in carbonyl groups. In cases where you want to match only one C2 subtype, you need to specify an atom attached to the C2, to exclude matches on the other subtype.

For example, if an aromatic ring pattern contains C2*C2*C2 and it should not match a structure containing C2-C2(=O2)-C2, then specifying an attached atom that is not O2 will exclude the undesirable matches: for example, C2*C2(-H0)*C2 might suffice.

The most general way to exclude unwanted carbonyl matches on a C2 in an aromatic ring pattern is to require that the C2 have a single bond to an atom of any type. In this example, the pattern to use is C2*C2(-00)*C2. Note that the attached symbol 00 is written with two zeros and is not the oxygen type O0.




C.6 Examples

In this section, some more complex examples are presented. The first is norbornane, which is a bicyclic compound and can be represented as follows:

C0-C0-C0-C0(-C0-1)-C0-C0-1

Two ring closures are indicated, as the compound is bicyclic. In this pattern, both closures, of the bridge and of the main ring, refer to the same position. This is not required. For example, these patterns are valid and equivalent to the above:

C0-C0-C0-C0-C0(-C0-2)-C0-1

C0-C0-C0-C0-C0-C0(-C0-3)-1

Now note a property of branches that is independent of the ring closures: if you swap a parenthesized branch with the part after it, and then parenthesize the new front part, the connectivity expressed is exactly the same. For example, these two forms are equivalent:

C0-C0-C0-C0-C0(-C0-2)-C0-1

C0-C0-C0-C0-C0(-C0-1)-C0-2

Norbornane also provides an opportunity to emphasize one fundamental point about linear substructure representations. The appearance of some patterns may seem to imply geometric properties. The norbornane patterns above, for example, imply a 6-membered ring bridged by one atom with two bonds on either side of the ring. It is important to keep in mind that the pattern specifies connectivity only, and not geometry.

Though it would be somewhat unconventional, it is in fact possible to specify norbornane as a 5-membered ring bridged by a chain of 2 atoms, which is topologically equivalent:

C0-C0-C0(-C0-C0-1)-C0-C0-1 

C0-C0-C0-C0(-C0-C0-1)-C0-1 

C0-C0-C0-C0(-C0-C0-2)-C0-1 

C0-C0-C0-C0-C0(-C0-C0-2)-1 

C0-C0-C0-C0-C0(-C0-C0-3)-1 

Any of these patterns will give exactly the same matching as one of the more conventional representations. The point is that these patterns specify topology, not geometry.
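The equivalence of these patterns can be checked by reducing each one to its set of bonded position pairs and testing whether some renumbering of positions maps one set onto the other. The Python sketch below is an illustration under stated assumptions, not how mmsubs matches structures. The names connectivity and same_connectivity are hypothetical, atom types and bond orders are ignored (which is adequate for these all-carbon, single-bond patterns), parentheses are assumed to be balanced, and the brute-force search over renumberings is practical only for small patterns.

```python
import re
from itertools import permutations

# Assumed token shapes: atom type, ring-closure number, bond symbol, parenthesis.
TOKEN = re.compile(r"[A-Z][A-Za-z0-9]|00|[1-9][0-9]*|[-=*()]")


def connectivity(pattern):
    """Return (atom_count, set of bonded position pairs), ignoring bond order."""
    tokens = TOKEN.findall(pattern)
    if "".join(tokens) != pattern:
        raise ValueError("pattern contains characters this sketch does not handle")
    pairs = set()
    branch_points, current, count = [], 0, 0
    for token in tokens:
        if token == "(":
            branch_points.append(current)
        elif token == ")":
            current = branch_points.pop()
        elif token in ("-", "=", "*"):
            continue  # bond order does not matter for this comparison
        elif token[0].isdigit() and token != "00":
            pairs.add(frozenset((current, int(token))))  # ring closure
        else:
            count += 1
            if current:  # every atom after the first bonds to the current atom
                pairs.add(frozenset((current, count)))
            current = count
    return count, pairs


def same_connectivity(first, second):
    """True if some renumbering of positions maps one bond set onto the other."""
    n1, pairs1 = connectivity(first)
    n2, pairs2 = connectivity(second)
    if n1 != n2 or len(pairs1) != len(pairs2):
        return False
    positions = range(1, n1 + 1)
    for image in permutations(positions):
        relabel = dict(zip(positions, image))
        if {frozenset(relabel[p] for p in pair) for pair in pairs1} == pairs2:
            return True
    return False


reference = "C0-C0-C0-C0(-C0-1)-C0-C0-1"
variants = [
    "C0-C0-C0(-C0-C0-1)-C0-C0-1",
    "C0-C0-C0-C0(-C0-C0-1)-C0-1",
    "C0-C0-C0-C0(-C0-C0-2)-C0-1",
    "C0-C0-C0-C0-C0(-C0-C0-2)-1",
    "C0-C0-C0-C0-C0(-C0-C0-3)-1",
]
print(all(same_connectivity(reference, v) for v in variants))  # True
```

All five unconventional patterns describe the same connectivity as the first norbornane pattern in this section.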

It should be clear that there can be many valid mmsubs expressions to match a single, nontrivial substructure. The mmsubs notation does not have a canonical representation, that is, mmsubs has no counterpart to Unique SMILES (USMILES).



The next example is of a compound containing fused heterocyclic aromatic rings. One way to

express quinoline is the following:



