The Molecular Modeling Pro Database Help
Contents:
Viewing molecules and reactions in a database (including with ChemSite)
Creating databases of mixtures and reactions
Adding records (molecules)
Editing records (values for a molecule)
Adding fields (properties)
Menu Descriptions
The File Menu
- Open
-- MDL SD Files
-- XML files
- Save
-- MDL SD Files
-- XML files
-- HTML files
-- EXCEL
- Close
The Edit Menu
- Copy
- Paste
- Records (Molecules)
- Fields (Properties)
- Find
- Sort
- Subset
The Graphics Menu
- XY Plot
- 3-D Plot
Creation of data bases will be a familiar task for past users of Molecular Modeling Pro. It is still done from the file menu (Save Database) of the Drawing window. However, underlying the similarity of appearance are many changes and additions that make the program more powerful, flexible and consistent.
Saving the drawn molecule(s) to a database: Draw in the molecule or molecules. Then, while still in the Drawing window, choose the File menu, then Save Database, then choose Save this molecule to the database.
If you have only drawn one molecule: The program will first ask you for the data base name. You have a choice of four file formats at this point:
You will probably not be able to save data to an ACCESS data base created by some other program from the Save Database feature of the DRAWING window (unless your fields have the same data types as expected by MMP for fields with the same name - MMP uses the memo, text and single data types from ACCESS 97). You may view and modify such databases from the DATABASE window (MMP will use the data types contained in the existing database).
Next the program will ask for the name of the file to save the structural information.
If you are saving to an ACCESS database you have the choice of saving the structural data internally in the .mdb file (by selecting the same file name that you chose for the database file), or saving to an external connection table file. If you are saving to a MAP 3 csv file or a tab-delimited text file you will automatically create external connection table files in one of four formats (see below). SD Files automatically save internally, and if you chose this option for data base storage you will not be asked how to save the structural information.
The file formats for external connection tables are:
Next you will be asked to choose the calculated fields to save. Note there is a check box in the lower right part of the screen that allows you to check off most of the properties. The properties not checked off are the ones that require interaction between the user and the program and the properties calculated by CNDO (which take a little longer to calculate). There is one change from MMP 4 that should be noted, and that is that the 3-D solubility parameters are now calculated with the van Krevelen method by default, instead of the empirically based Hansen method used previously. There are a few new properties that can be calculated too.
If you chose to save to an ACCESS data base, the program next will ask you if you would like to create some additional fields for experimental or literature values. This would be a good time to say yes if you plan to add some of your own data to the data base. Fields can be added later, but it is easiest to do it now.
At this point, you will be asked if you want to view the data base. If yes, it will open up in the data base window, where you can further edit it or view the data and structures it contains. Or you can say no, and proceed with adding more molecules from the Drawing window. Note that there is a major difference between adding molecules from the Drawing window and the Database window: Saving from the drawing window saves directly to the data base causing a permanent addition there. Adding a molecule in the Database window only adds it to the database kept in temporary memory. You must Save the database for the change to become permanent (similar to how the program MAP works).
Note on Adding more records to the database from the Drawing window: You can add more molecules to the data base using the same menus. MMP will no longer give you the option of saving structures internally or externally, but will figure out what you did before and do the same again. The same applies to saving mixtures and reactions as one record or multiple records (see below). If you wish to create a data base of mixtures or reactions with each molecule stored as a separate record (with its own properties) that also contains some individual molecules, create the data base with a mixture or reaction, not a single molecule. Otherwise, on subsequent additions to the database, MMP will save everything to a single record.
Creating databases of mixtures and reactions - if you have drawn more than one molecule:
Read over the information above for saving single molecules, for it all still applies.
When creating a new database, you also have a decision to make on how to store the mixture and reaction information. When the program asks you if you want to store the molecules in a reaction or mixture:
Choosing yes has the added advantage of being able to reproduce the reaction drawings from the MMP Database window. Check the View molecules as mixture or reaction menu item in the Options menu.
Viewing Molecules and Reactions in a data base:
Special note: Double clicking on any line will display and load the structure into the MMP drawing window. Some of the viewing options listed below will cause MMP to update the drawing window automatically (which will cause the molecule currently displayed to vanish).
Editing Records (Molecules): Just type the new values into the spreadsheet or the file card text boxes. The changes do not become permanent until you save the data.
Note that the column width of the spreadsheet column can be changed. Click down on the place at the top of the spreadsheet between two field names and drag the column margin to the desired location.
If you have an incorrect structure and want to edit the line this can be done. If your structures are stored as MACROMODEL files or in some other connection table file, then edit the structure, then save to this file name. If the structures are stored as memo fields in the data base itself, create the correct structure, then copy it to the clipboard and paste it into the cell containing the incorrect structural table. After editing the structure, you will want to update all the calculated fields since they will not have correct values. Do this with the following menu selections: Edit & Records & Update all fields for this record. Remember to save the data base.
Reactions: Two types of databases can display reactions. One is a database created with MMP 5 when you have chosen to say yes to the question "do you want to save the molecules to a mixture or reaction?" The second is an MMP/MAP reaction connection table (*.rct). The rct connection tables will automatically display as reactions or mixtures with the molar ratios of the components listed. To get a database created with the "Yes" reaction option to display reactions, in the Options menu of the database window, check the menu item "View structures as mixtures or reactions." Then double-clicking as normal will display the reactions instead of the individual molecules. Unchecking this option will return to individual molecule viewing.
The File Menu
Open - opens a data base file
Use the standard field names with these files to make life simpler. Here are some of the standard names:
Standard Name | What it is (type)
Name | Chemical name (memo field)
Smiles | SMILES notation (memo field)
CAS_Registry_Number | CAS Registry number (char * 256)
ID | Index number (32 bit integer - in VB this is type "long")
Connection_Table_Name | Connection table file name (memo field)
Chemistry | An MDL Molfile embedded as text in the data base (memo)
Information | A memo field containing information about the molecule
Formula | Molecular formula (e.g. butane = C4 H10) (char * 256)
Molecular_weight | Molecular weight (single, floating point)
Molecular_volume | Volume contained in spheres of van der Waals radius
Log_Kow | Log 10 of the octanol/water partition coefficient
Boiling_Point | Boiling point in degrees C
Etc.
A complete list can be found in the file "fieldnames.txt".
Note that there are two ways to store connection tables (as a separate file or as text in the database). For most uses we recommend separate files. Performance will be somewhat faster with text in the database, but out of memory errors will also be more frequent. The exception is the Microsoft ACCESS data format, which should handle molecular structures stored as memo fields in the database well.
Formats supported:
Molecular Analysis Pro .csv text file. This file format can be used with Molecular Analysis Pro. It has its own peculiar structure. The first line is a header that tells the programs what type of data the numeric fields have (e.g. molecular weight = 1). The second line has the field names. Each subsequent line has the data for one record. The format of the file is comma-separated value ASCII text. This file format always stores 80 fields, no more, no less. The first seven fields are string fields up to 64 KB in length. The next 72 fields are single precision floating point numbers (they can contain integers too). This format always stores the molecules as separate files and stores the name of the connection table file in the last field of each record. MAP databases typically contain numerous unused fields, which the user can feel free to fill with new values. This format will probably be phased out slowly, as future versions of Molecular Analysis Pro are likely to emphasize the ACCESS data base format and some client-server data bases like SQL Server and Oracle. The limit on the number of fields (7 text, 72 numeric) is seen as a limitation to overcome in the future.
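The layout described above can be checked mechanically. The following Python sketch is not part of MMP or Molecular Analysis Pro. It assumes only what the paragraph states: a type header line, a field-name line, then one record per line, with 80 fields made up of 7 text fields, 72 numeric fields and the connection table file name last. The function name check_map_csv is hypothetical, and the content of the type header line is not checked because its exact layout is not described here.

```python
import csv
import io

# Field counts taken from the format description above.
N_TEXT, N_NUMERIC, N_FIELDS = 7, 72, 80


def check_map_csv(text):
    """Return (field_names, records) after checking the 80-field layout."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        raise ValueError("expected a type header line and a field-name line")
    field_names, records = rows[1], rows[2:]
    # Line 1 is the type header, so the field names are on line 2.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != N_FIELDS:
            raise ValueError(
                f"line {line_no}: {len(row)} fields, expected {N_FIELDS}"
            )
    for record in records:
        # Assumption: an empty numeric cell means the field is unused.
        for cell in record[N_TEXT:N_TEXT + N_NUMERIC]:
            if cell.strip():
                float(cell)  # raises ValueError if the cell is not a number
    return field_names, records
```

A check like this is mainly useful before handing a hand-edited file back to a program that expects the fixed 80-field layout.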
Microsoft ACCESS databases (.mdb). This is the most versatile format for use with Molecular Modeling Pro as it gives you features not available with other formats. Be careful with using this format with data bases shared with other people, as the changes you make will become permanent after you save the data or hit the Update button. The extra features available with this format are a result of the fact that the Microsoft Jet Engine Relational Data Base software underlies all that happens in MMP when using this format. Performance with large databases will be best using this format. There is also an opportunity for those who know SQL to interact directly with the database in an almost unlimited fashion. (See the SQL editor in the Edit menu, described below). Use of SQL is the only way to add new fields using this format.
MDL SD File. Standard chemical text format used for exchanging data between otherwise incompatible programs. It is not recommended to use this as your standard data storage format. Performance with databases containing large numbers of molecules or very large molecules will be problematical. For instance, MMP will set aside 64 KB for every structure in the database, which may quickly run your computer out of memory. However, you are welcome to try it out. On smaller databases, with smaller (<150 atoms) molecules there should be no problem. Note that the SD Files include the Molfiles (structural connection table) for all of the molecules in the database.
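To get a feel for how large an SD File is before opening it, the records can be counted from the text itself. The sketch below is a rough illustration, not MMP code. It relies on the SD File convention that each record ends with a line containing only $$$$, and it uses the 64 KB per structure figure quoted above. The function names are hypothetical.

```python
def split_sd_records(text):
    """Split SD File text into records, each terminated by a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def estimated_buffer_kb(n_records, kb_per_structure=64):
    """Memory set aside if 64 KB is reserved per structure, as stated above."""
    return n_records * kb_per_structure
```

For example, a file with 10,000 records would imply 640,000 KB (roughly 625 MB) of reserved structure storage under this assumption.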
Tab delimited text file that can be input by most any program. This is the standard text format used by EXCEL and many other programs. MMP expects the first line to contain the field names and each subsequent line to contain one record.
Comma separated values text (.csv). This is the alternative to tab-delimitation for saving text files, and has been used by some programs. Again, MMP expects the first line to be the field names and each subsequent line to be a record (one molecule with data). It is recommended to use the option of storing each molecule in its own file with the file name being included in the .csv file.
XML files (Extensible Markup Language). Microsoft and others are pushing this format as the internet standard for data base transfer. MMP reads and writes these files using the ADO (ActiveX Data Objects) 2.6 DLL from Microsoft. Structures can be stored as a text field in the xml file containing the Molfiles. The files created are text files and can be linked to or embedded into HTML files, if you know how.
MMP molecular fragment files (.ssf). MMP creates these files as keys for the substructure searches, but they may be quite useful for building methods for calculating physical properties from structure. They contain information on the fragments and atom types for each molecule. For instance, typical fields include the number of carbons in aromatic rings, the number of amide groups, the number of N-N groups, etc. To create this file for a database, run a substructure search. Do not add new things to this database, as MMP will automatically update it when you add new molecules.
Paste tab-delimited text from the Clipboard: This is a way to get data from EXCEL or other programs which copy data to the clipboard as tab-delimited text. Select the area in EXCEL containing the field names and all the data. Copy it to the clipboard. Then paste the data into MMP. MMP expects the first row it encounters to contain the field names and each subsequent row to contain a record (one molecule). Combined with MMP's Save/Excel option this procedure allows you to easily add new fields to a data base, or manipulate the column order or rename fields.
Save - saves the data base with changes
Molecular Analysis Pro .csv text file. A description of this file type is under Open above. Use this file type when you expect to analyze the data with Molecular Analysis Pro.
Microsoft ACCESS database (.mdb). This is the best general-purpose data base format supported by MMP. For a description see the Open method above. Note that at present, no other program except MMP will read the structural Molfiles embedded in the database as structures. When passing structures and data to other chemical programs, the MDL SD File may be your only choice.
MDL SD File. Standard chemical text format for exchanging data between otherwise incompatible programs. Not recommended as a general-purpose database file format.
Tab delimited text file that can be input by most any program. This is the standard form that EXCEL and the clipboard use for passing tables of data.
Comma separated values text (.csv). This is another standard format for passing data between MMP and many other programs.
XML files (.xml) for later use in HTML pages. These files can be used to paste XML tables into HTML documents. MMP also opens this data type, so you could use this as a means of storing the MMP data bases.
HTML files (.htm) creates an html page with a data table that can be viewed with a browser or amended further by another program.
EXCEL - opens EXCEL with the current data displayed. Note that to actually save the data, you must do this with EXCEL.
Copy tab-delimited text: This places the entire data base on the clipboard where it can be copied into any program that supports this format. The first line is the field names and each subsequent line is one molecule (record). The fields in the records are separated by the tab character and the end of the line is a single line feed character (ASCII character 10).
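The clipboard layout just described is simple enough to produce or read from a script. The following Python sketch shows one way to write and parse it. It is an illustration under stated assumptions, not MMP's own routine: the function names are hypothetical, and values are assumed to contain no tab or line feed characters.

```python
def to_tab_delimited(field_names, records):
    """Serialize rows as tab-separated fields, each line ending in one LF."""
    lines = ["\t".join(field_names)]
    for record in records:
        lines.append("\t".join(str(value) for value in record))
    return "".join(line + "\n" for line in lines)


def from_tab_delimited(text):
    """Parse the same layout: the first line is field names, the rest records."""
    rows = [line.split("\t") for line in text.split("\n") if line]
    return rows[0], rows[1:]
```

Note that parsing returns every value as text, so numeric fields need to be converted by the caller.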
Close - Closes the data window. MMP remains open, with the Drawing window still there.
The Edit Menu
Copy - Copies the data in the cell selected in the spread sheet to the clipboard. To copy an entire database, use the File/Save/Copy tab-delimited database to the clipboard menu items instead of this one.
Paste - Copies the data on the clipboard into the current cell in the spread sheet. To paste an entire database, use the File/Open/Paste tab-delimited text from the clipboard menu items instead of this one.
Records (molecules)
-- Add a molecule: The molecule currently displayed in the drawing window will be automatically added to the record set in memory. Note that it does not become a permanent part of the data base until you have saved the data base. The structure will be saved if the data base saves structures to connection table files. All recognizable calculated fields will be given values. There must be a recognizable chemical structure associated with the record, or MMP will be unable to calculate any physical properties.
Remember that if you have specified interatomic distances, angles, dihedral angles, sterimol parameters or Hammett sigma values, they require user input on the Drawing Window. If the data base appears to have gone away and the program is just sitting there, it may be awaiting your input on the Drawing Window.
Note that there is a second way of adding a record. At the bottom of the spreadsheet is a row with an asterisk at the left. Typing in data in this row, then moving to a different row will cause a new record to be made. Make sure you add the structure to the appropriate column. If the structural field contains embedded Molfiles, you may copy a Molfile from the clipboard (e.g. from ChemSite) into the column containing the structural information. If the structural field contains connection table file names, then put the file name in the column instead. After inserting the structure and moving to a different row, move back to the newly created row to calculate the fields that MMP calculates. Do this with the Update record's field menu item.
IMPORTANT NOTE: You can also save molecules to the database from the Drawing window. This works in a totally different way and can get one into a little trouble. For instance, if you have not saved data typed into the spreadsheet, then use the Drawing window's data base save method, it will write the new record directly to the data base saved on disk. If you then open this data base, the changes made in the spreadsheet will be lost. Conversely, if you don't open it, then save the changes made in the spreadsheet to the data base, the new molecule added will be lost. In other words, save your work on the spreadsheet before adding a new molecule with the menu on the drawing window.
-- Delete a molecule: Deletes a molecule from the record set in memory. This change does not become permanent until the data base is saved. The molecule deleted is the one currently selected.
-- Update molecule's calculated fields: This will recalculate all the fields that MMP recognizes with the latest algorithms of MMP. This is included to give consistency to data bases created years ago or with a different tool. MMP recognizes fields based on their field names. To see the list of field names that MMP recognizes, examine the file fldnames.txt. Note that the changes do not become permanent until you save the data base.
-- Replace with the current molecule drawn. This will replace the molecule in the selected row and its associated calculated properties with the structure and calculated properties of the molecule displayed in the drawing window.
Fields (Properties)
-- Add Field (property): See the description later in this document.
-- Update a property's values for all records: This will recalculate the values of one field for all the molecules in the data base. This is included to give consistency to data bases created years ago or with a different tool. MMP recognizes fields based on their field names. To see a list of field names that MMP recognizes, examine the file fldnames.txt. Changes do not become permanent until you save the data base.
Find - Finds a string of text or numbers anywhere in the spreadsheet.
Sort - Sorts all rows by the currently selected column from low to high. Note that you should click on a column in the spreadsheet before selecting Sort from the menu.
Update All Calculated Properties for all molecules - This updates every calculated field for every molecule (record) in the database. It updates molecular formula and SMILES notation as well as all the calculated numeric fields. You might want to do this for two reasons:
1. Create new fields with names that MMP recognizes as calculated properties. Fill them automatically with this option.
2. Over time, methods for calculating properties with MMP have changed. For instance, the old methods of calculating water solubility and the solubility parameters have been replaced. To make current calculations consistent with older calculations in existing databases, update the old database with this option.
WARNING: MMP knows which fields to update by their name. If you have named a field of literature values or experimental values with the same name that MMP gives to a calculated field, then it will replace these values too, and if you save the changes, the data previously there will be overwritten! Rename these fields before using this routine so they won't be overwritten. To change the name of a field, use ACCESS or EXCEL or some other program that reads in one of the formats supported by MMP (for instance in tab-delimited text format). If you have an ACCESS reaction database and use a program other than ACCESS to change the name, be careful not to close MMP while you do this, as you will want to keep the reaction table when you read the database back in. For a list of the field names that MMP recognizes as calculated fields see the file fieldnames.txt.
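Before running the update it can help to list which of your fields share a name with a calculated field. The sketch below shows the idea in Python. It is not an MMP feature: the function name is hypothetical, the list of calculated names must be supplied by you from the field name list file mentioned above, and the case-insensitive comparison is an assumption about how names are matched.

```python
def fields_at_risk(database_fields, calculated_names):
    """Return database fields whose names match a calculated-field name."""
    # Assumption: names are compared without regard to case or padding.
    recognized = {name.strip().lower() for name in calculated_names}
    return [
        field for field in database_fields
        if field.strip().lower() in recognized
    ]
```

For instance, with database fields Name, Boiling_Point and Boiling_Point_lit, and calculated names Boiling_Point and Log_Kow, only Boiling_Point is reported, so a column of literature boiling points stored under that name should be renamed first.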
Databases containing large molecules and CNDO calculated fields may take a considerable time to update. Hitting the Escape key will get you out of the calculation loops and return control to you (with only some of the molecules having updated records). There may be a delay after hitting the escape key as CNDO calculations will run to completion for the molecule being analyzed.
Substructure Search - Draw in the substructure that you wish to use for the search. Then select this off the Edit menu of the data base window. If there is no keys file for the database, then one will be created and this can take some time. The keys file contains lots of structural information on all the molecules in the database. Once it is created, subsequent searches will be quite fast, even on large databases. The keys file has the same file name as the data base, except that it has the file extension .ssf. An option to do a more exhaustive search after the key search is given, but this is usually not needed. The Records will be modified with the structures found. Take care when saving this data, as the records not in the subset will be deleted if you save to the same file name.
After running the substructure search a new menu item will appear on the Edit menu entitled "Restore after substructure search". This will restore the database to how it was before the search, including any subsetting done prior to the substructure search.
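The reason a keys file makes searches fast can be illustrated with a generic fragment-count screen. The Python sketch below is a common screening technique and is only assumed to resemble what MMP does; the actual algorithm and key names are not documented here. The fragment names in the example are hypothetical.

```python
def passes_key_screen(query_keys, molecule_keys):
    """True if the molecule has at least as many of each fragment as the query."""
    return all(
        molecule_keys.get(fragment, 0) >= count
        for fragment, count in query_keys.items()
    )


def key_search(query_keys, database_keys):
    """Return the names of molecules that survive the key screen."""
    return [
        name for name, keys in database_keys.items()
        if passes_key_screen(query_keys, keys)
    ]
```

A screen of this kind only compares counts, so it can let through a molecule that has the right fragments in the wrong arrangement. That is the situation in which a more exhaustive search after the key search would matter.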
Subset - This is accomplished with the Active X Data Objects Filter command, so the record set is not actually altered. Changing the subset criteria to a more generous span will retrieve records made "invisible" by a previous subset. Saves made after subsetting will apply to the saved data base, so you may want to give the saved database a new file name.
After running the subset a new menu item will appear entitled "Restore after subset". Running this routine will restore the database to how it was before the last subset was done.
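The behavior of a subset (records hidden rather than removed) can be pictured with a small Python class. This is only an analogy for filter semantics and does not use or reproduce the ADO interface. The class and method names are hypothetical.

```python
class RecordSet:
    """Holds all records and exposes only those passing the current filter."""

    def __init__(self, records):
        self._records = list(records)  # full data, never altered by filtering
        self._predicate = None

    def set_filter(self, predicate):
        """Hide records for which predicate(record) is false."""
        self._predicate = predicate

    def clear_filter(self):
        """Make every record visible again."""
        self._predicate = None

    def visible(self):
        """Return the records that pass the current filter."""
        if self._predicate is None:
            return list(self._records)
        return [r for r in self._records if self._predicate(r)]
```

Setting a wider filter after a narrow one brings hidden records back, because the filter is always applied to the full record list and not to the previous result.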
The View Menu
Note that memo sized text cannot always be viewed easily in the spreadsheet or even in the file card text boxes. To see them, select the page and click on the gray line at the bottom of the Database window. This will display the text over the entire database window. Click on the text again and the gray line will return to its normal size.
SpreadSheet - Displays the data base in a spreadsheet format. To display the molecule, double click on its row in the spreadsheet.
Structure in Drawing Window - Displays the data base molecule names in the left column and places the MMP Drawing Window over the top of the other spread sheet columns. Single clicking on a molecule name will bring the spreadsheet back. Hitting the < and > buttons at the lower corners of the screen will scroll through the pictures of the molecules. Double clicking will display the molecules. If you want to only see the molecule structures and do not care about the underlying data in the spreadsheet, then resize the spreadsheet window so that only the column of names is showing. Then clicking on the name will cause the molecule structure in the drawing window to change to the selected molecule. Clicking on the left gray margin of the spreadsheet will also cause the molecule to appear in front of the spreadsheet window with the selected molecule drawn.
Selected Molecule with Chemsite - Open Chemsite. Place the Chemsite window where you want it and size it to taste. Selecting a molecule from the spreadsheet will place a molfile on the clipboard that will be picked up by CHEMSITE. Note that this requires version 3 or better of CHEMSITE. Single clicking on a compound name brings the CHEMSITE window to the fore, while double clicking brings the MMP Drawing window to the fore.
File Card - Displays the data for the selected molecule or record in a more readable format, by placing a file card containing its data over the spreadsheet. Supports a maximum of 237 fields.
Reaction Table - This will display the reaction conditions that appear above and below the reaction arrows and allow you to edit them, as well as the solvent volume used in the reaction. It also contains the file name used by the next menu item "Edit Reaction Description." Going to this view will not lose any information in the main data table. If you save after going to this view, the changes made in the main table will still be remembered and stored. This menu item is only available to ACCESS data bases containing a reaction_table.
Edit Reaction Description - This allows you to store a full description of the reaction, molecule or mixture in rtf (rich text format). You can store pictures in this format. You can drag and drop items into the document. The document will remember and will display formatting such as several fonts in the same document, underlining and bold fonts. You can create this document with a different program, then tie it to the MMP database (this file name goes into a field in the reaction table). This menu item is only available to ACCESS data bases containing a reaction table.
The Graphics Menu
XY Plot - draws an x-y plot of two fields selected off a menu in a separate window. Two different graphics windows alternately display the plots, so that two plots can be viewed simultaneously and compared. The regression statistics are displayed in a separate window usually located near the bottom of the screen. Double clicking on a data point reveals the point's identity.
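For readers who want to reproduce regression statistics outside the program, the sketch below computes an ordinary least-squares line and its r-squared value in Python. Which statistics MMP actually reports is not stated here, so treat the choice of slope, intercept and r-squared as an assumption. The function name is hypothetical.

```python
def linear_regression(xs, ys):
    """Return (slope, intercept, r_squared) for a least-squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

The two input lists would typically be the values of the two plotted fields, taken in the same record order.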
3-D Plot - a rotatable 3-D plot of the points for three fields is generated. Selecting menu items from the Rotate menu in the Graphics window initiates rotations. Holding down the left mouse button will temporarily stop the rotations. This plot can be valuable for looking for clusters of data. Planes indicate linear relationships between 2 of the variables. Lines indicate relationships between all three variables.
The Options Menu
Create Back-up on Opening - when this is checked, MMP will create a back up of the database opened automatically. The file name will be like the file name opened, but with the file extension .bak.
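The same precaution can be taken by hand before opening a database with another tool. This Python sketch copies a file to a name with the .bak extension. It assumes the backup name is formed by replacing the original extension, which is one reading of the description above. The function names are hypothetical.

```python
import shutil
from pathlib import Path


def backup_path(database_file):
    """Return the backup name: same file name with the extension .bak."""
    return Path(database_file).with_suffix(".bak")


def create_backup(database_file):
    """Copy the database file to its .bak name and return the new path."""
    source = Path(database_file)
    target = backup_path(source)
    shutil.copy2(source, target)  # copies the contents and file timestamps
    return target
```

Note that an existing .bak file with the same name is overwritten by the copy.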
View structures as mixtures or reactions: This menu item is only available to ACCESS data bases with a reaction_table (molecules stored as reactions during database creation). If checked, the molecules will display as reactions. If unchecked the molecules will display individually.
Handling Fields
Adding new fields from the Edit menu (Fields/Add fields) is simple enough. A field creation window appears and you type in the name of the field and select the type of data from the list (text, memo, single precision number, integer etc.). Hit the OK button on the bottom left after typing in the field name and type. You can then type in additional field names with data types (add as many new fields as you like). Hit the 'Done' button at the lower right when finished and the fields will be created. If you use field names that MMP understands as the names of fields it calculates then you can use the Edit/Fields/Update all records in a field menu item to automatically populate a field. A list of names that MMP understands is in the file fieldnames.txt. Remember to save the database or the new field names and any new data typed into the fields' records will not be saved!
Alternative method for adding new fields: Open the data base. Save it to the EXCEL option. This opens the data base in EXCEL. Type in a new field name in the first unused column. Type in any data that you want (you can do this in MMP too). Select the area in EXCEL that contains the field names and all the data. Copy it to the clipboard. In the MMP DATABASE window, Open the data base by Pasting tab-delimited text from the clipboard (the last item on the Open menu). If you are adding fields to an ACCESS reaction data base, click on the "Yes" button when the program asks if you would like to keep the current reaction table. Otherwise this table will be deleted (ACCESS data bases created by MMP contain three tables: Data_Table, Pindex_Table and Reaction_Table). The new fields and data do not become permanent until you save the data.
Note that you can also use EXCEL to delete fields, change the field names and the order in which the columns are displayed.
There are many other ways to add fields to the data base, but all involve going to another program to do it. For instance, you could copy the database to the clipboard (File/Save/Copy tab-delimited database to the clipboard menu items) and paste it into another program. Then proceed as above (the program has mainly been tested with EXCEL though). If you are using the ACCESS data base format, note that you cannot copy the reaction table, only the data table.
Need more help? You can call ChemSW at 707-864-0845 for product support. Visit our web site at www.norgwyn.com or www.chemsw.com for information about our products. For suggestions about improving the product send e-mail to Dr. James Quinn, President of Norgwyn Montgomery Software Inc.