"how to develop sdf from smiles using babel"




How to create SD/SDF files from SMILES using freely available software tools

MDL (Molecular Design Limited) MOL and SD files, i.e. files with the extension *.mol (molfile) or *.sd/*.sdf (structure-data file), contain structural information and associated data for a single molecule (*.mol) or for any number of molecules (*.sd/*.sdf). The SD/SDF format is currently the most common standard for exchanging information about chemicals. Hence, one of the most important steps in completing a QSAR Model Reporting Format (QMRF) is to generate and provide SDF files that include all necessary information about the training/test set molecules (i.e. identifiers of all compounds, e.g. CAS/InChI/name/formula; 3D structures; experimental and predicted values of the target properties/parameters; and the values of the molecular descriptors used).



The following step-by-step description of the SDF file generation procedure is intended as a practical guide on how to obtain, quickly and easily and starting from SMILES, the attachments needed to make each QMRF a reliable source of information about the reported QSAR/QSPR modelling.

  1. The first step is to create an SMI file (i.e. a file with the extension *.smi), which must be a text file containing one or more molecular structures. Each structure, represented by its SMILES, should be placed in a separate row (e.g. a file containing 50 structures should have 50 rows). The SMI file should begin directly with the list of compounds (SMILES), without any heading or empty lines. It can contain the IDs of the molecules (in columns), but they have to be separated from the SMILES by a TAB or SPACE. Other information can optionally be added (e.g. which set (training/test) the compound belongs to, whether the compound is active or inactive, the descriptor values, etc.); each column must be TAB/SPACE delimited.
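The layout rules above can be checked mechanically before conversion. The following Python sketch is only an illustration and not part of the original procedure; the function name `check_smi_lines` and the sample rows are hypothetical. It flags empty lines and lines that do not begin with a SMILES. It cannot tell whether the first row is a heading, so that still has to be checked by eye.

```python
def check_smi_lines(lines):
    """Return (line_number, problem) tuples for lines that break the layout rules."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text.strip():
            problems.append((number, "empty line"))
        elif text[0].isspace():
            # The SMILES must come first; IDs and other columns follow it.
            problems.append((number, "line does not start with a SMILES"))
    return problems


# Hypothetical sample: two valid rows (TAB and SPACE delimited) and one empty line.
example = ["CCO\tethanol\ttraining\n", "\n", "c1ccccc1 benzene test\n"]
print(check_smi_lines(example))
```

For a real file, the lines could be read with `open("example.smi").readlines()` and passed to the same function.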

The most common starting point is a table with compounds listed in separate rows and the associated data in columns (e.g. an example.xls table). First, the table should be saved, without the heading row, as a TAB-delimited text file (e.g. example.txt). Subsequently, the extension of the resulting TXT file should be changed to SMI (by simply opening the example.txt file and saving it again as example.smi). The content of any SMI file can be browsed by opening it with Notepad, WordPad, etc.
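The same table-to-SMI step can also be scripted. The sketch below is a minimal illustration, assuming the table has already been loaded into a list of rows with the SMILES in the first column; the function name `rows_to_smi_text` and the sample data are hypothetical. It drops the heading row, skips rows without a SMILES, and joins the columns with TABs.

```python
def rows_to_smi_text(rows, has_header=True):
    """Turn table rows (SMILES in the first column) into TAB-delimited SMI text."""
    data = rows[1:] if has_header else rows
    lines = []
    for row in data:
        cells = [str(cell).strip() for cell in row]
        if not cells or not cells[0]:
            continue  # skip rows that carry no SMILES
        lines.append("\t".join(cells))
    # One structure per line, no heading and no empty lines.
    return "".join(line + "\n" for line in lines)


# Hypothetical table with a heading row, as it might come from a spreadsheet.
table = [
    ["SMILES", "Name", "Set"],
    ["CCO", "ethanol", "training"],
    ["c1ccccc1", "benzene", "test"],
]
print(rows_to_smi_text(table), end="")
```

The returned text can be saved with, for example, `open("example.smi", "w").write(rows_to_smi_text(table))`.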

  2. A properly prepared SMI file can then be converted. Several software tools can currently perform the SMI to SD/SDF transformation; two freely available ones are discussed here.

a) OpenBabel version 2.2.3 for Windows, freely available at http://openbabel.org/

  1. The prepared SMI file has to be selected as the OpenBabel input; its content is displayed if the file is recognized correctly.

  2. OpenBabel does not generate coordinates unless the "Generate 3D coordinates" box is ticked. This option must be selected, because SDF files without calculated coordinates (x, y and z all equal to 0) cannot be read by the overwhelming majority of software, including the SDF file browser recommended in point 4.

  3. Before the conversion, the path and name of the output file must be specified so that the results are saved.

  4. The conversion can be somewhat time-consuming because of the calculation of 3D coordinates (about 3 minutes for 60 relatively small molecules).
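The steps above describe the graphical interface. As a rough command-line equivalent, the Python sketch below builds and runs a call to the `obabel` program. This is an assumption that goes beyond the original guide: `obabel` ships with Open Babel 2.3 and later (not with the 2.2.3 release discussed here) and must be on the system path. The function names and file names are hypothetical; `--gen3d` is the command-line counterpart of the "Generate 3D coordinates" box.

```python
import subprocess


def build_obabel_command(smi_path, sdf_path, gen3d=True):
    """Build the argument list for an SMI to SDF conversion with obabel."""
    command = ["obabel", "-ismi", smi_path, "-osdf", "-O", sdf_path]
    if gen3d:
        # Without this option the output coordinates stay at zero.
        command.append("--gen3d")
    return command


def convert_smi_to_sdf(smi_path, sdf_path):
    """Run the conversion; raises CalledProcessError if obabel reports failure."""
    subprocess.run(build_obabel_command(smi_path, sdf_path), check=True)


print(" ".join(build_obabel_command("example.smi", "example.sdf")))
```

Calling `convert_smi_to_sdf("example.smi", "example.sdf")` would then perform the conversion, provided Open Babel is installed.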

The disadvantage of OpenBabel is that it does not allow various compound attributes (different IDs, descriptor values, etc.) to be added to the output file. The only way to include all the information in the SDF seems to be to prepare an input SMI file containing TAB/SPACE-delimited columns with all necessary information about the molecules. Thus, the SMI file should consist of all rows (except the heading row) and all columns of the initial (XLS) data table. However, in the output file all attributes are placed in a single row without headings, which makes it difficult to recognize the meaning of the individual textual/numerical items. Hence, OpenBabel can be recommended for producing SDF files that contain 3D structures with only one associated ID (e.g. the name). As far as QMRFs are concerned, the remaining data would then have to be provided in a separate attachment (e.g. an XLS file).
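For comparison, the SDF format itself stores each attribute as a named data item: a header line such as `>  <CAS>`, the value, and a blank line, placed before the `$$$$` line that ends each record. The sketch below is not part of the original procedure; it only illustrates how named attributes could be appended to existing records with a short script. The function name `add_data_fields` and the shortened sample records are hypothetical.

```python
def add_data_fields(sdf_text, field_rows):
    """Append named data items to every record of an SDF text.

    field_rows holds one dict of attributes per record, in file order.
    """
    text = sdf_text.replace("\r\n", "\n")
    records = [r for r in text.split("$$$$\n") if r.strip()]
    if len(records) != len(field_rows):
        raise ValueError("number of records and attribute rows differ")
    output = []
    for record, fields in zip(records, field_rows):
        if not record.endswith("\n"):
            record += "\n"
        for name, value in fields.items():
            # Data item layout: header line, value line, blank line.
            record += f">  <{name}>\n{value}\n\n"
        output.append(record + "$$$$\n")
    return "".join(output)


# Hypothetical, heavily shortened records (real ones contain full atom blocks).
sdf = "mol1\nM  END\n$$$$\nmol2\nM  END\n$$$$\n"
print(add_data_fields(sdf, [{"CAS": "64-17-5"}, {"CAS": "71-43-2"}]))
```

Each attribute then sits under its own heading in the output instead of sharing one unlabelled row.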

b) Accelrys Discovery Studio Visualizer version 2.5, freely available at http://accelrys.com/



  1. The input SMI file should include the SMILES and, optionally, one structure ID (preferably the name, since the software will by default recognize this initial ID as the name).

  2. Other attributes (such as CAS, InChI, descriptor values, etc.) can easily be added (Edit → Add Attribute…) and copy-pasted from the XLS data table. They are placed clearly in separate rows in the output SDF file.

  3. The file can subsequently be saved as SDF (File → Save as… → MDL MOL/SD files). The software generates 3D coordinates, and the procedure is much faster than the one performed by OpenBabel.

Accelrys Discovery Studio Visualizer can produce SDF files that include 3D structures as well as all other textual/numerical information about the studied compounds (additional attachments are not necessary).

4. The content of the final SDF files can be browsed either with Accelrys Discovery Studio or with the easy-to-use SDF file browser Hyleos (currently version 0.2.9.3), freely available at http://www.hyleos.net/

With the Hyleos application it is possible to view the content of SDF files (3D structures and all associated data) as well as to merge/split the files.
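A quick scripted check can complement the browsers mentioned above. The Python sketch below is an illustration only, assuming V2000-style records (atom count in the first three characters of the fourth line, coordinates in three 10-character columns); the function names and the sample record are hypothetical. For each record it reports the title, the atom count, the named data items, and whether any coordinate differs from zero.

```python
def split_records(sdf_text):
    """Split SDF text into records (lists of lines) at the $$$$ delimiter."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append(current)  # last record without a closing delimiter
    return records


def summarize_sdf(sdf_text):
    """Summarize each V2000 record: title, atom count, coordinates, data items."""
    summaries = []
    for lines in split_records(sdf_text):
        atom_count = int(lines[3][0:3])
        atom_lines = lines[4:4 + atom_count]
        coords = [float(a[i:i + 10]) for a in atom_lines for i in (0, 10, 20)]
        fields = {}
        rest = lines[4 + atom_count:]
        for index, line in enumerate(rest):
            if line.startswith(">") and "<" in line:
                name = line[line.index("<") + 1:line.rindex(">")]
                value = []
                for value_line in rest[index + 1:]:
                    if not value_line.strip():
                        break  # a blank line ends the data item
                    value.append(value_line)
                fields[name] = "\n".join(value)
        summaries.append({
            "title": lines[0].strip(),
            "atoms": atom_count,
            "has_coordinates": any(c != 0.0 for c in coords),
            "fields": fields,
        })
    return summaries


# Hypothetical two-atom record used only to exercise the parser.
sample = (
    "ethanol\n  demo\n\n"
    "  2  1  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0\n"
    "    1.5000    0.0000    0.0000 O   0  0\n"
    "  1  2  1  0\nM  END\n>  <CAS>\n64-17-5\n\n$$$$\n"
)
print(summarize_sdf(sample))
```

A record for which `has_coordinates` is false was most likely written without the 3D coordinate generation described earlier.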

 

JRC Computational Toxicology Group



14 December 2009



