Simplified Molecular Input Line Entry Specification (SMILES)



Date: 26.05.2018



  • The simplified molecular input line entry specification or SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings

  • SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules



SMILES

  • Simplified Molecular Input Line Entry System (SMILES)

  • Widely used and computationally efficient

  • Uses atomic symbols and a set of intuitive rules

  • Uses hydrogen-suppressed molecular graphs (HSMG)
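Because the graph is hydrogen-suppressed, a reader has to restore the hydrogens from each heavy atom's normal valence. The sketch below assumes the default organic-subset valences from the Daylight SMILES rules and ignores charges and aromaticity; `implicit_hydrogens` and `hydrogen_counts` are hypothetical helper names, not part of any toolkit.

```python
from collections import defaultdict

# Normal valences of the organic subset (assumed from the Daylight SMILES rules).
DEFAULT_VALENCES = {
    "B": [3], "C": [4], "N": [3, 5], "O": [2], "P": [3, 5],
    "S": [2, 4, 6], "F": [1], "Cl": [1], "Br": [1], "I": [1],
}

def implicit_hydrogens(symbol, bond_order_sum):
    """Hydrogens needed to reach the lowest normal valence >= bond_order_sum."""
    for valence in DEFAULT_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # already at or above the highest normal valence

def hydrogen_counts(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j, order) tuples."""
    order_sum = defaultdict(int)
    for i, j, order in bonds:
        order_sum[i] += order
        order_sum[j] += order
    return [implicit_hydrogens(sym, order_sum[k]) for k, sym in enumerate(atoms)]

# Ethanol (CCO) as a hydrogen-suppressed graph: three heavy atoms, two bonds.
ethanol = hydrogen_counts(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
print(ethanol, sum(ethanol))  # [3, 2, 1] 6
```

Only the three heavy atoms are stored, yet all six hydrogens of C2H6O are recoverable.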



  • The term Canonical SMILES refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation

    • A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database
  • The term Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality, and configuration about double bonds

    • A notable feature of these rules is that they allow rigorous partial specification of chirality.
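To see why a canonical form makes database lookup possible, here is a brute-force sketch (not the algorithm any real toolkit uses): enumerate every depth-first string of a small, acyclic, single-bonded molecule and keep the lexicographically smallest. Because that minimum depends only on the molecular graph, any atom numbering yields the same key. The function names are hypothetical, and the cost grows factorially with branching, so this only suits toy molecules.

```python
from itertools import permutations

def subtree_strings(atoms, adj, node, parent):
    """Every SMILES string for the subtree at `node` (trees with single bonds only)."""
    children = [n for n in adj[node] if n != parent]
    child_sets = {c: subtree_strings(atoms, adj, c, node) for c in children}
    results = set()
    for order in permutations(children):
        partials = {atoms[node]}
        for k, child in enumerate(order):
            last = k == len(order) - 1
            # Every child except the last is written as a parenthesized branch.
            partials = {p + (s if last else f"({s})")
                        for p in partials for s in child_sets[child]}
        results |= partials
    return results

def canonical_smiles(atoms, edges):
    """Brute-force canonical form: the smallest DFS string over all start atoms."""
    adj = {i: [] for i in range(len(atoms))}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return min(min(subtree_strings(atoms, adj, start, None))
               for start in range(len(atoms)))

# 2-Butanol entered with two different atom numberings.
a = canonical_smiles(["C", "C", "O", "C", "C"], [(0, 1), (1, 2), (1, 3), (3, 4)])
b = canonical_smiles(["O", "C", "C", "C", "C"], [(0, 1), (1, 2), (1, 3), (3, 4)])
print(a == b)  # True
```

The resulting string need not match what other toolkits print; it is only guaranteed to be unique for this scheme.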


  • In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph

  • The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree

  • Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes

  • Parentheses are used to indicate points of branching on the tree
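The traversal can be sketched in a few dozen lines. This is a minimal, non-canonical writer under stated assumptions: `to_smiles` is a hypothetical name, the input is one connected hydrogen-suppressed graph given as atom symbols plus `(i, j, order)` bond tuples, aromaticity, charges and stereochemistry are ignored, and at most nine rings are closed. Depth-first search splits the bonds into spanning-tree edges and ring-closing edges; the writer then prints atoms in visit order, puts every child except the last in parentheses, and writes a matching digit at both ends of each broken ring bond.

```python
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}

def to_smiles(atoms, bonds, start=0):
    """Write SMILES by depth-first traversal of a hydrogen-suppressed graph."""
    adj = {i: [] for i in range(len(atoms))}
    order = {}
    for i, j, o in bonds:
        adj[i].append(j)
        adj[j].append(i)
        order[frozenset((i, j))] = o

    visited, seen = set(), set()
    children = {i: [] for i in adj}   # spanning-tree children of each atom
    closures = {i: [] for i in adj}   # atom -> [(partner, opens_ring)]

    def explore(u):
        visited.add(u)
        for v in adj[u]:
            key = frozenset((u, v))
            if key in seen:
                continue              # edge to the parent, or ring bond already recorded
            seen.add(key)
            if v in visited:
                # v is an ancestor on the DFS path: break the cycle at this edge.
                closures[v].append((u, True))
                closures[u].append((v, False))
            else:
                children[u].append(v)
                explore(v)

    digits = {}

    def write(u):
        out = atoms[u]
        for v, opens in closures[u]:
            key = frozenset((u, v))
            if opens:
                digits[key] = len(digits) + 1      # next unused ring-closure digit
                out += BOND_SYMBOL[order[key]] + str(digits[key])
            else:
                out += str(digits[key])
        kids = children[u]
        for k, v in enumerate(kids):
            piece = BOND_SYMBOL[order[frozenset((u, v))]] + write(v)
            # All children except the last become parenthesized branches.
            out += piece if k == len(kids) - 1 else "(" + piece + ")"
        return out

    explore(start)
    return write(start)

print(to_smiles(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)]))  # CC(=O)O
print(to_smiles(["C"] * 6, [(i, (i + 1) % 6, 1) for i in range(6)]))      # C1CCCCC1
```

Starting from another atom, or visiting neighbors in another order, gives a different but equivalent string, which is why canonicalization rules are needed.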



SMILES Branches

  • Represented by enclosure in parentheses

  • Can be nested or stacked

  • Examples:

      • CC(O)CC is 2-Butanol
      • OCC(C)C is Isobutanol
      • OC(C)(C)C is tert-Butanol
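Branch handling amounts to a stack: an opening parenthesis remembers the current atom, and the closing parenthesis returns to it, so the atom after `)` bonds to the atom before `(`. A minimal sketch, assuming input made only of organic-subset atoms and parentheses (no bond symbols, ring digits or brackets); `parse_branches` is a hypothetical name. Stacked branches such as `OC(C)(C)C` simply push and pop the same atom twice, and nested branches push deeper.

```python
import re

# Organic-subset atoms (two-letter symbols first) and branch parentheses.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|\(|\)")

def parse_branches(smiles):
    """Return (atoms, bonds) for SMILES made only of organic-subset atoms and branches."""
    atoms, bonds, stack, prev = [], [], [], None
    for tok in TOKEN.findall(smiles):
        if tok == "(":
            stack.append(prev)   # remember the atom the branch hangs from
        elif tok == ")":
            prev = stack.pop()   # resume the main chain at that atom
        else:
            atoms.append(tok)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1))
            prev = len(atoms) - 1
    return atoms, bonds

for name, smi in [("2-Butanol", "CC(O)CC"), ("Isobutanol", "OCC(C)C"),
                  ("tert-Butanol", "OC(C)(C)C")]:
    print(name, parse_branches(smi)[1])
```

In the tert-butanol result, atom 1 takes part in four bonds: the quaternary carbon carrying the OH group and three methyls.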


SMILES Bonds

  • Ethene: C=C

  • Chloroethene: C=CCl

  • 1,1-Dichloroethene: C=C(Cl)Cl

  • cis-1,2-Dichloroethene: Cl/C=C\Cl

  • Trichloroethene: ClC=C(Cl)Cl

  • Perchloroethene: ClC(Cl)=C(Cl)Cl
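Bond symbols change how many implicit hydrogens each carbon carries, which can be checked by computing molecular formulas for the chloroethene series. A minimal sketch, assuming acyclic organic-subset input with `-`, `=`, `#` and the directional single bonds `/` and `\`, and one default valence per element; `formula` is a hypothetical helper and the SMILES strings in the loop are one valid way to write each molecule.

```python
import re
from collections import Counter

# Organic-subset atoms, bond symbols and branch parentheses.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#/\\]|\(|\)")
# '/' and '\' are single bonds that also carry cis/trans direction.
ORDER = {"-": 1, "=": 2, "#": 3, "/": 1, "\\": 1}
# One assumed default valence per element (enough for these examples).
VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "P": 3, "S": 2,
           "F": 1, "Cl": 1, "Br": 1, "I": 1}

def formula(smiles):
    """Hill-order molecular formula for acyclic organic-subset SMILES."""
    atoms, used, stack, prev, pending = [], [], [], None, 1
    for tok in TOKEN.findall(smiles):
        if tok in ORDER:
            pending = ORDER[tok]            # order of the next bond formed
        elif tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        else:
            atoms.append(tok)
            used.append(0)
            cur = len(atoms) - 1
            if prev is not None:            # bond to the previous atom
                used[prev] += pending
                used[cur] += pending
            prev, pending = cur, 1          # bonds default to single
    counts = Counter(atoms)
    # Implicit hydrogens fill each atom up to its default valence.
    counts["H"] = sum(VALENCE[a] - u for a, u in zip(atoms, used))
    parts = []
    for el in ["C", "H"] + sorted(e for e in counts if e not in ("C", "H")):
        if counts[el]:
            parts.append(el if counts[el] == 1 else f"{el}{counts[el]}")
    return "".join(parts)

for smi in ["C=C", "C=CCl", "C=C(Cl)Cl", r"Cl/C=C\Cl", "ClC=C(Cl)Cl", "ClC(Cl)=C(Cl)Cl"]:
    print(smi, formula(smi))
```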



SMILES Symbols

  • String of alphanumeric characters and certain punctuation symbols

  • Terminates at the first space encountered when read left to right

  • The ORGANIC SUBSET, whose atoms may be written without square brackets:

    • B, C, N, O, P, S, F, Cl, Br, I
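When reading atoms left to right, a parser must try the two-letter symbols Cl and Br before C and B, and it must stop at the first space. A minimal sketch, assuming input that uses only organic-subset atoms outside square brackets (a bracket atom such as `[Na+]` would be misread); `atom_symbols` is a hypothetical name.

```python
# Two-letter symbols come first so "Cl" is not read as "C" followed by "l".
ORGANIC_SUBSET = ("Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I")

def atom_symbols(smiles):
    """List organic-subset atom symbols, reading left to right up to the first space."""
    smiles = smiles.split(" ", 1)[0]   # text after a space is not part of the SMILES
    symbols, i = [], 0
    while i < len(smiles):
        for sym in ORGANIC_SUBSET:
            if smiles.startswith(sym, i):
                symbols.append(sym)
                i += len(sym)
                break
        else:
            i += 1                     # skip bond symbols, parentheses, digits, lowercase
    return symbols

print(atom_symbols("CCBr Bromoethane"))  # ['C', 'C', 'Br']
```

Without the split at the space, the "Br" in the trailing name would be picked up as an extra atom.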



Other SMILES Atoms

  • Aliphatic or nonaromatic carbon: C

  • Atom in aromatic ring: lowercase letter

  • Designate ring closure with pairs of matching digits, e.g.

      • c1ccccc1 is Benzene, whereas
      • C1CCCCC1 is Cyclohexane
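Ring-closure digits turn the linear string back into a cycle: the first occurrence of a digit marks an atom, and the second occurrence bonds the current atom back to it. A minimal sketch, assuming unbranched strings of single-letter atoms and single-digit labels only; `ring_bonds` is a hypothetical name, and its lowercase check is only a crude aromaticity flag, not a perception algorithm.

```python
def ring_bonds(smiles):
    """Atoms and bonds of an unbranched ring SMILES such as c1ccccc1.

    Handles single-letter atoms and single-digit ring labels only.
    """
    atoms, bonds, open_rings = [], [], {}
    for ch in smiles:
        if ch.isdigit():
            if ch in open_rings:
                # Second occurrence: bond the current atom back to the opening atom.
                bonds.append((open_rings.pop(ch), len(atoms) - 1))
            else:
                # First occurrence: remember which atom opened this ring.
                open_rings[ch] = len(atoms) - 1
        else:
            if atoms:
                bonds.append((len(atoms) - 1, len(atoms)))  # chain bond to the previous atom
            atoms.append(ch)
    aromatic = all(a.islower() for a in atoms)  # crude flag: lowercase means aromatic
    return atoms, bonds, aromatic

for smi in ["c1ccccc1", "C1CCCCC1"]:
    atoms, bonds, aromatic = ring_bonds(smi)
    print(smi, len(atoms), "atoms,", len(bonds), "bonds, aromatic:", aromatic)
```

Both strings give six atoms and six bonds; only the case of the letters distinguishes benzene from cyclohexane.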


SMILES Charges

  • Specify attached hydrogens and charges in square brackets

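  • The number of attached hydrogens is the symbol H followed by an optional digit; the charge is + or - followed by an optional digit (repeated signs such as ++ are also accepted)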



SMILES Charges

  • [H+]

  • [OH-]

  • [OH3+]

  • [Fe++]

  • [NH4+]
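The bracket syntax above can be read with one regular expression. A minimal sketch, assuming bracket atoms of the form element, optional H count, optional charge, without isotopes, chirality marks or lowercase aromatic symbols; `parse_bracket_atom` is a hypothetical name. Both `++` and `+2` are accepted for a charge of +2.

```python
import re

# [ element, optional H-count, optional charge ]; charge as +, ++, +2, -, --, -1 ...
BRACKET = re.compile(r"\[(?P<symbol>[A-Z][a-z]?)(?P<h>H\d*)?(?P<charge>[+-]\d+|\++|-+)?\]")

def parse_bracket_atom(text):
    """Return (element, attached hydrogens, formal charge) for a bracket atom."""
    m = BRACKET.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported bracket atom: {text}")
    h = m.group("h")
    hydrogens = 0 if h is None else int(h[1:] or "1")   # "H" alone means one hydrogen
    c = m.group("charge") or ""
    sign = -1 if c.startswith("-") else 1
    if not c:
        charge = 0
    elif c[1:].isdigit():
        charge = sign * int(c[1:])      # "+2", "-1"
    else:
        charge = sign * len(c)          # "+", "++", "--"
    return m.group("symbol"), hydrogens, charge

for atom in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]:
    print(atom, parse_bracket_atom(atom))
```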



Cyclic Structures

  • Numbers indicate start and stop of ring

  • Same number indicates start and end of the ring, entered immediately following the start/end atoms

  • Only the single digits 1–9 are used

  • A number should appear only twice: once to open the ring and once to close it

  • An atom can carry two consecutive ring-closure numbers, e.g., naphthalene: c12ccccc1cccc2
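Since each ring-closure pair adds exactly one bond to the spanning tree, counting closed pairs gives the number of rings of one connected molecule (bonds minus atoms plus one). Beyond the 1–9 convention above, the general SMILES specification also allows two-digit labels written as %nn and reuse of a label once its ring is closed; this sketch accepts both. It assumes there are no bracket atoms, whose digits are hydrogen counts or charges; `count_ring_closures` is a hypothetical name.

```python
import re

RING_LABEL = re.compile(r"%\d\d|\d")   # %nn two-digit labels or single digits

def count_ring_closures(smiles):
    """Number of ring closures; equals the ring count for one connected molecule."""
    open_labels, closures = set(), 0
    for label in RING_LABEL.findall(smiles):
        if label in open_labels:
            open_labels.remove(label)   # second occurrence closes the ring...
            closures += 1
        else:
            open_labels.add(label)      # ...first occurrence opens it (label reusable later)
    if open_labels:
        raise ValueError(f"unclosed ring label(s): {sorted(open_labels)}")
    return closures

print(count_ring_closures("c12ccccc1cccc2"))  # 2 (naphthalene)
```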



SMILES Conventions

  • Avoid two consecutive left parentheses if possible

  • Use as few branches as possible

  • Tautomeric bonds are not designated; enter the appropriate form



Further Restrictions

  • A branch cannot begin a SMILES notation

  • A branch cannot immediately follow a double- or triple-bond symbol

  • Example: C=(CC)C is invalid, but

  • C(=CC)C and C(CC)=C are valid SMILES (they describe different molecules, 2-butene and 1-butene)
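The two branch restrictions are simple enough to check character by character. A minimal sketch that tests only these two rules (it does not check parenthesis balance or any other syntax); `check_branch_rules` is a hypothetical name.

```python
def check_branch_rules(smiles):
    """Report violations of the two branch restrictions on this slide."""
    problems = []
    if smiles.startswith("("):
        problems.append("a branch cannot begin a SMILES notation")
    for i in range(1, len(smiles)):
        # A branch may not open directly after a double- or triple-bond symbol.
        if smiles[i] == "(" and smiles[i - 1] in "=#":
            problems.append(f"branch follows bond symbol {smiles[i - 1]!r} at index {i}")
    return problems

for smi in ["C=(CC)C", "C(=CC)C", "C(CC)=C", "(C)CC"]:
    print(smi, check_branch_rules(smi) or "ok")
```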



SMILES Fragments

  • Nitro

  • Nitrate

  • Nitrite

  • Sulfonic acid

  • Cyanide/Nitrile

  • Azide

  • Azido





Disconnected Structures

  • Tetramethylammonium bromide, with a period (.) separating the two ions:

  • C[N+](C)(C)C.[Br-]
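Because the period simply separates components, splitting on it recovers the individual ions, and summing the bracket charges checks that a salt is neutral overall. A minimal sketch, assuming charges appear only at the end of bracket atoms; `components` and `net_charge` are hypothetical names.

```python
import re

CHARGE = re.compile(r"(\++|-+)(\d*)$")  # trailing charge inside a bracket atom

def components(smiles):
    """Split a SMILES string into its disconnected parts."""
    return smiles.split(".")

def net_charge(smiles):
    """Sum the charges written in bracket atoms (organic-subset atoms are neutral)."""
    total = 0
    for inside in re.findall(r"\[([^\]]+)\]", smiles):
        m = CHARGE.search(inside)
        if m:
            signs, digits = m.groups()
            sign = 1 if signs[0] == "+" else -1
            total += sign * (int(digits) if digits else len(signs))
    return total

salt = "C[N+](C)(C)C.[Br-]"
print(components(salt), net_charge(salt))  # ['C[N+](C)(C)C', '[Br-]'] 0
```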



Isomeric and Chiral SMILES

  • Double-bond (cis/trans) configuration is indicated by forward and backward slashes: / \

  • Examples:

    • trans-1,2-dibromoethene: Br/C=C/Br
    • cis-1,2-dibromoethene: Br/C=C\Br
  • Chirality is indicated by the “@” symbol: @ and @@ denote anticlockwise and clockwise ordering of the neighbors around the stereocenter
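For the simple pattern used in these examples, the slash rule reduces to comparing the two slashes: identical slashes on both sides of C=C mean trans, different slashes mean cis. A minimal sketch, assuming exactly the form X/C=C/Y with organic-subset substituents written outside any branch; other layouts are rejected rather than interpreted, chirality marks (@) are not handled, and `double_bond_geometry` is a hypothetical name.

```python
import re

# X/C=C/Y style: organic-subset substituent, slash, C=C, slash, substituent.
SIMPLE = re.compile(r"(Cl|Br|[BCNOPSFI])([/\\])C=C([/\\])(Cl|Br|[BCNOPSFI])")

def double_bond_geometry(smiles):
    """Return 'cis' or 'trans' for the simple X/C=C/Y form only."""
    m = SIMPLE.fullmatch(smiles)
    if m is None:
        raise ValueError("only the simple X/C=C/Y form is handled")
    # Same slash on both sides: substituents on opposite sides (trans).
    # Different slashes: substituents on the same side (cis).
    return "trans" if m.group(2) == m.group(3) else "cis"

print(double_bond_geometry("Br/C=C/Br"))   # trans
print(double_bond_geometry(r"Br/C=C\Br"))  # cis
```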



Another Application

  • SMILESCAS Database

    • http://esc.syrres.com/interkow/smilecas.htm
  • Over 103,000 SMILES notations

  • Input CAS Registry Number

  • Leads to SMILES and thence to a structure search