##############################################################
# $Id: ionizer.ini,v 1.40 2004/02/06 19:46:18 reboul Exp $
# Control specs for ionization state expander
# Copyright 2002-2003 Schrodinger, LLC.
# All rights reserved.
##############################################################

##############################################################
#
# Ionization pattern spec syntax
# ==============================
#
# In the Ionizer's processing of this data file, blank lines
# are ignored, and any text to the right of a '#' is treated
# as a comment.
#
# All other text is treated as an ionization spec. Each valid
# ionization spec consists of a command name, followed by the
# command's required parameters. These ion spec commands are
# supported --
#
#   Acid:     Specifies an acid group, and how to ionize it
#
#   Base:     Specifies a base group, and how to ionize it
#
#   Exclude:  Specifies a group to exclude from consideration
#             as ionizable; used to selectively "forget" group
#             matches based on prior Acid/Base specs
#
# Ionizable groups are specified using Schrodinger's linear
# substructure notation, which goes by the name "mmsubs". That
# syntax is described in this installed document --
#
#   $SCHRODINGER/services-v#####/doc/mmsubs_syntax.txt
#
##############################################################
#
# Note Well
# =========
#
# Every input CT is tested for any substructures matching
# the specified ionizable groups, in the order in which they
# appear below. The effect of a later pattern match will
# supersede the effect of an earlier pattern match. Thus,
# anyone crafting pattern specs like these must think very
# carefully about the proper order for the desired effects.
#
# For example -- look below -- the sulfonic pattern has to
# come after the sulfinic pattern, because the sulfonic
# pattern is a specialization of the sulfinic pattern. Any
# group which matches sulfonic will also match sulfinic, but
# the opposite assertion is not necessarily true.
#
# If the order of the two patterns were reversed below, then
# a sulfonic match would always be superseded by the sulfinic
# match, thereby losing the intended distinction between the
# two.
#
# Because of the great generality of this syntax, the
# inter-pattern precedences can be much less obvious than for
# the sulfinic vs. sulfonic case. Be careful!
#
##############################################################
#
# We now describe the command parameter syntax....
#
#   Acid     <pattern>  <fragment>  <pKa>
#   Base     <pattern>  <fragment>  <pKa>
#
# For each input CT, in any substructure found to match the
# specified mmsubs pattern, the atom matching the pattern's
# leading atom will be considered for ionization via fusion
# of the specified ionized fragment. The ion group will be
# treated as acid or base according to the command used.
#
# The fragment named must be in this program's custom
# fragment library. The library is named "ionized", and it
# currently contains fragments named Ammonium, Hydroxide,
# N-minus, and Thiolate.
#
# The Acid/Base spec's 3rd parameter, a pKa value for the
# matching group, is used in deciding which ion combinations
# to actually generate in the output, based on pK and/or
# pH considerations at run time. Note that pKa is used for
# both acids and bases; for a base, the value given is the
# pKa of its protonated (conjugate acid) form.
#
# You cannot specify an Acid/Base pattern in this file
# without supplying a pKa value for it. It is assumed that
# a pKa value is at least somewhat accurate, as it is going
# to be used in the program's restriction determinations.
#
# When one Acid/Base pattern is a specialization of another
# Acid/Base pattern, it is probably the case that the pKa
# value for the more specialized pattern can be given more
# precisely. Assuming that the patterns are presented in
# order of increasing specificity, the more precise pKa will
# be the one ultimately assigned.
#
# Exclude specs don't specify a fragment or pKa....
#
#   Exclude  <pattern>
#
# For each input CT, in any substructure matching the
# specified pattern, the atom matching the pattern's
# leading atom will be _excluded_ from consideration for
# ionization.
#
# An Exclude pattern match causes an effect only if the
# leading atom coincides with a prior Acid/Base pattern
# match's leading atom. If there is no corresponding prior
# Acid/Base match, the Exclude match is simply ignored.
#
# Because of the Ionizer's top-to-bottom processing of these
# pattern specs, each Exclude pattern must be placed after
# all the Acid/Base specs whose matches it might negate.
#
# It is possible for a given input CT atom to be matched as
# an ionization center due to some Acid/Base pattern spec,
# then excluded from consideration due to a later Exclude
# pattern, then re-matched due to an even later Acid/Base
# spec.
#
##############################################################
#
# Important note
# ==============
#
# The following specifications are _not_ an encyclopedic list
# of ionizable groups!
#
# Some users will wish to prepare their own customized data
# file, presumably by adapting a copy of this one, and then
# running with the customized data, specified via a
# command-line option.
#
# Such users must understand Schrodinger's "mmsubs" linear
# substructure notation. Correct use of the syntax is not
# trivial. Users may need to contact Schrodinger for
# assistance.
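#
##############################################################
#
# Illustration: resolving the specs
# =================================
#
# The top-to-bottom rules above can be summarized as a minimal
# Python sketch. This is a hypothetical model for illustration,
# not the Ionizer's actual implementation. It assumes mmsubs
# matching has already been done, so that each spec is reduced
# to the set of leading atoms it matched in one input CT. The
# names Spec, resolve() and fraction_ionized() are made up.
# This file does not say how the Ionizer turns a pKa into a
# decision. The Henderson-Hasselbalch fraction is only one
# plausible basis for such a decision.
#
#   from dataclasses import dataclass
#   from typing import Optional
#
#   @dataclass
#   class Spec:
#       command: str                    # "Acid", "Base" or "Exclude"
#       leading_atoms: set              # leading atoms matched in one CT
#       fragment: Optional[str] = None  # e.g. "Hydroxide" (Acid/Base only)
#       pka: Optional[float] = None     # required for Acid/Base
#
#   def resolve(specs):
#       """Apply specs in file order; later matches supersede."""
#       centers = {}
#       for spec in specs:
#           for atom in spec.leading_atoms:
#               if spec.command == "Exclude":
#                   # No effect unless a prior Acid/Base matched here.
#                   centers.pop(atom, None)
#               else:
#                   centers[atom] = (spec.command, spec.fragment, spec.pka)
#       return centers
#
#   def fraction_ionized(command, pka, ph):
#       """Henderson-Hasselbalch estimate. For a Base, pka is
#       that of the protonated (conjugate acid) form."""
#       if command == "Acid":
#           return 1.0 / (1.0 + 10 ** (pka - ph))
#       return 1.0 / (1.0 + 10 ** (ph - pka))
#
# For a sulfonic acid whose OH oxygen is atom 1, the sulfinic
# spec matches first, and the later sulfonic spec supersedes it:
#
#   specs = [Spec("Acid", {1}, "Hydroxide", 2.0),    # sulfinic
#            Spec("Acid", {1}, "Hydroxide", -1.0)]   # sulfonic
#   resolve(specs)   # {1: ('Acid', 'Hydroxide', -1.0)}
#
# Appending Spec("Exclude", {1}) would drop atom 1 again, while
# Spec("Exclude", {2}) would change nothing. A later Acid/Base
# spec matching atom 1 would re-add it.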
#
##############################################################################


##############################################################
# Specs for acids, to be deprotonated

# carboxylic
Acid O0(-H0)-C2(=O0)-C0 Hydroxide 4.0

# phosphoric
Acid O0(-H0)-P0(=O0)(-O0)-O0 Hydroxide 2.1

# phosphorylamide
Acid O0(-H0)-P0(=O0)(-N0)-O0 Hydroxide 2.6

# phosphonic
Acid O0(-H0)-P0(=O0)(-O0)-C0 Hydroxide 2.5

# phosphonamide
Acid O0(-H0)-P0(=O0)(-N0)-C0 Hydroxide 2.9

# sulfuric
Acid O0(-H0)-S0(=O0)(=O0)-O0 Hydroxide -2.0

# sulfinic
Acid O0(-H0)-S0(=O0)-C0 Hydroxide 2.0

# sulfonic
#   (supersedes match on sulfinic above)
Acid O0(-H0)-S0(=O0)(=O0)-C0 Hydroxide -1.0

# hydroxamic
Acid O0(-H0)-N0(-H0)-C0=O0 Hydroxide 8.5
Acid O0(-H0)-N0(-C3)-C0=O0 Hydroxide 8.5 # estimated

# sulfonamides
#   aromatic or alkene
Acid N0(-H0)(-C2=C2)-S0(=O0)(=O0)-C0 N-minus 8.2
#   pyridyl
Acid N0(-H0)(-C2=N2)-S0(=O0)(=O0)-C0 N-minus 8.2
#   carbonyl
Acid N0(-H0)(-C2=O2)-S0(=O0)(=O0)-C0 N-minus 4.5
#   alkyl
Acid N0(-H0)(-C3)-S0(=O0)(=O0)-C0 N-minus 11.6
#   Chlorine
Acid N0(-H0)(-Cl)-S0(=O0)(=O0)-C0 N-minus 4.5

# tetrazole
Acid N0(-H0)-N0=N0-N0=C2-1 N-minus 4.5
Acid N0(-H0)-N0=N0-C2=N0-1 N-minus 4.5

# phenol
Acid O3(-H0)-C2*C2(-00)*C2(-00)*C2(-00)*C2(-00)*C2(-00)*3 Hydroxide 10.0

# Regarding the phenol pattern, one co-worker said --
#
#   "The phenol pattern above correctly works with most of my tested
#   phenols and non-phenols. It is great for single carbon aromatic
#   rings, and avoids greedy matches with compounds it should not.
#   However, the pattern above does not match some canonical resonance
#   structures of polycyclic benzenoid aromatics."
#
#   "My opinion is that the pattern is pretty good, and certainly
#   better than not having it at all. The polycyclic benzenoid
#   compounds are vexing."

# 2-nitrosophenol/2-nitrophenol
Acid O0(-H0)-C2*C2(-N0=O0)*C2(-00)*C2(-00)*C2(-00)*C2(-00)*3 Hydroxide 6.5

# 4-nitrosophenol/4-nitrophenol
Acid O0(-H0)-C2*C2(-00)*C2(-00)*C2(-N0=O0)*C2(-00)*C2(-00)*3 Hydroxide 6.5

# 3,5-dinitrosophenol/3,5-dinitrophenol
Acid O0(-H0)-C2*C2(-00)*C2(-N0=O0)*C2(-00)*C2(-N0=O0)*C2(-00)*3 Hydroxide 6.7

# alkylthiol
Acid S0(-H0)-C3 Thiolate 9.5

# thiophenol
Acid S0(-H0)-C2*C2(-00)*C2(-00)*C2(-00)*C2(-00)*C2(-00)*3 Thiolate 6.6
Acid S0(-H0)-C2*N2*C2(-00)*C2(-00)*C2(-00)*C2(-00)*3 Thiolate 6.6
Acid S0(-H0)-C2*C2(-00)*N2*C2(-00)*C2(-00)*C2(-00)*3 Thiolate 6.6
Acid S0(-H0)-C2*C2(-00)*C2(-00)*N2*C2(-00)*C2(-00)*3 Thiolate 6.6
Acid S0(-H0)-C2*N2*C2(-00)*N2*C2(-00)*C2(-00)*3 Thiolate 6.6

# Don't match sulfinic acid tautomer
Exclude S0(-H0)(=O0)(=O0)

##############################################################
# In many ring patterns, both above and below here, note the
# use of *C2(-00)*, where -00 (two zeroes) signifies a single
# bond to any atom. This restricts the matches to only those
# substructures with aromatic carbons. We have to do this in
# light of our atom types. As one co-worker explained --
#
#   "[B]ecause of our broad definition of the aromatic C (*C2*),
#   our patterns would [otherwise] match quinone or uracil type
#   compounds. Hence we [...] have to impose a single-bonded
#   substituent on every aromatic C to distinguish between them
#   and carbonyl type C2."
##############################################################


##############################################################
# Specs for bases, to be protonated

# We require Base pattern leading Nitrogens to be uncharged,
# to ignore some input molecules' N+ atoms otherwise matching
# these patterns; hence the "{0}" qualifiers, which are the
# mmsubs-extension syntax for zero formal charge.
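#
# Illustration: the valence arithmetic behind the "{0}"
# qualifiers and the alternative bond-order Exclude patterns
# further down. For a Nitrogen with a full octet, formal
# charge = (total bond order) - 3, so a total bond order of 4
# means N+. Below is a minimal Python sketch. The names
# BOND_ORDER, total_bond_order() and nitrogen_formal_charge()
# are hypothetical. The bond symbols are the ones used in this
# file's patterns: "-" single, "=" double, "%" triple.
#
#   BOND_ORDER = {"-": 1, "=": 2, "%": 3}
#
#   def total_bond_order(bonds):
#       """bonds: string of bond symbols, bonds to H included."""
#       return sum(BOND_ORDER[b] for b in bonds)
#
#   def nitrogen_formal_charge(bonds):
#       # N has 5 valence electrons; assuming an octet, the
#       # nonbonding electrons number 8 - 2*order, so
#       # FC = 5 - (8 - 2*order) - order = order - 3
#       return total_bond_order(bonds) - 3
#
#   nitrogen_formal_charge("---")    # 0: neutral amine N
#   nitrogen_formal_charge("----")   # 1: ammonium N+
#   nitrogen_formal_charge("-=")     # 0: imine N, counting its H
#   nitrogen_formal_charge("--")     # -1: N anion, cf. N-minus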

# dialkylaniline
Base N0{0}(-C3)(-C3)-C2*C2(-00)*C2(-00)*C2(-00)*C2(-00)*C2(-00)*4 Ammonium 4.5

# amine
Base N0{0}(-H0)(-H0)-C3 Ammonium 10.5 # primary
Base N0{0}(-H0)(-C3)-C3 Ammonium 11.0 # secondary
Base N0{0}(-C3)(-C3)-C3 Ammonium 10.0 # tertiary

# hydrazine
Base N0{0}(-H0)(-H0)-N3 Ammonium 10.0 # primary
Base N0{0}(-H0)(-C3)-N3 Ammonium 10.0 # secondary
Base N0{0}(-C3)(-C3)-N3 Ammonium 10.0 # tertiary

# imine
Base N0{0}(-H0)=C2(-H0)-C3 Ammonium 11.5
Base N0{0}(-H0)=C2(-C3)-C3 Ammonium 11.5
Base N0{0}(-C3)=C2(-H0)-C3 Ammonium 11.5
Base N0{0}(-C3)=C2(-C3)-C3 Ammonium 11.5

# amidine
Base N0{0}(-H0)=C2(-H0)-N0-H0 Ammonium 12.0
Base N0{0}(-H0)=C2(-H0)-N0-C3 Ammonium 12.0
Base N0{0}(-H0)=C2(-C0)-N0-H0 Ammonium 12.0
Base N0{0}(-H0)=C2(-C0)-N0-C3 Ammonium 12.0
Base N0{0}(-C3)=C2(-H0)-N0-H0 Ammonium 12.0
Base N0{0}(-C3)=C2(-H0)-N0-C3 Ammonium 12.0
Base N0{0}(-C3)=C2(-C0)-N0-H0 Ammonium 12.0
Base N0{0}(-C3)=C2(-C0)-N0-C3 Ammonium 12.0

# guanidine
Base N0{0}(-H0)=C2(-N0-H0)-N0-H0 Ammonium 12.5
Base N0{0}(-H0)=C2(-N0-H0)-N0-C3 Ammonium 12.5
Base N0{0}(-H0)=C2(-N0-C3)-N0-H0 Ammonium 12.5
Base N0{0}(-H0)=C2(-N0-C3)-N0-C3 Ammonium 12.5
Base N0{0}(-C3)=C2(-N0-H0)-N0-H0 Ammonium 12.5
Base N0{0}(-C3)=C2(-N0-H0)-N0-C3 Ammonium 12.5
Base N0{0}(-C3)=C2(-N0-C3)-N0-H0 Ammonium 12.5
Base N0{0}(-C3)=C2(-N0-C3)-N0-C3 Ammonium 12.5

# enamine
Base N0{0}(-H0)(-H0)-C2(-H0)=C2 Ammonium 10.5 # primary
Base N0{0}(-H0)(-H0)-C2(-C3)=C2 Ammonium 10.5 # primary
Base N0{0}(-H0)(-C3)-C2(-H0)=C2 Ammonium 10.5 # secondary
Base N0{0}(-H0)(-C3)-C2(-C3)=C2 Ammonium 10.5 # secondary
Base N0{0}(-C3)(-C3)-C2(-H0)=C2 Ammonium 10.5 # tertiary
Base N0{0}(-C3)(-C3)-C2(-C3)=C2 Ammonium 10.5 # tertiary
#
# Screen out some matches on enamines above --
#
Exclude N0{0}(-H0)(-H0)-C2=C2-C2=O0
Exclude N0{0}(-H0)(-C3)-C2=C2-C2=O0
Exclude N0{0}(-C3)(-C3)-C2=C2-C2=O0
#
Exclude N0{0}(-H0)(-H0)-C2=C2-C1%N0
Exclude N0{0}(-H0)(-C3)-C2=C2-C1%N0
Exclude N0{0}(-C3)(-C3)-C2=C2-C1%N0

# aniline
Base N0{0}(-H0)(-H0)-C2*C2(-00)*C2(-00)*C2(-00)*C2(-00)*C2(-00)*4 Ammonium 4.5 # primary
Base N0{0}(-H0)(-C0)-C2*C2(-00)*C2(-00)*C2(-00)*C2(-00)*C2(-00)*4 Ammonium 4.5 # secondary
# The problem with the rule above is that it also protonates
# compounds in which the N is part of a ring; exclude those:
Exclude N0*C2*C2*C2*C2*1 # annelated (fused) pyrrole
Exclude N0*C2*N0*C2*C2*1 # annelated (fused) imidazole
Exclude N0*N0*C2*C2*C2*1 # annelated (fused) pyrazole (?)
#Base N0{0}(-C0)(-C0)-C2*C2(-00)*C2(-00)*C2(-00)*C2(-00)*C2(-00)*4 Ammonium 4.5 # tertiary

# 1,4-diaminobenzene
Base N0{0}(-H0)(-H0)-C2*C2(-00)*C2(-00)*C2(-N0(-H0)(-H0))*C2(-00)*C2(-00)*4 Ammonium 6.2
Base N0{0}(-H0)(-H0)-C2*C2(-00)*C2(-00)*C2(-N0(-H0)(-C3))*C2(-00)*C2(-00)*4 Ammonium 6.2
Base N0{0}(-H0)(-H0)-C2*C2(-00)*C2(-00)*C2(-N0(-C3)(-C3))*C2(-00)*C2(-00)*4 Ammonium 6.2
Base N0{0}(-H0)(-C3)-C2*C2(-00)*C2(-00)*C2(-N0(-H0)(-H0))*C2(-00)*C2(-00)*4 Ammonium 6.0
Base N0{0}(-H0)(-C3)-C2*C2(-00)*C2(-00)*C2(-N0(-H0)(-C3))*C2(-00)*C2(-00)*4 Ammonium 6.0
Base N0{0}(-H0)(-C3)-C2*C2(-00)*C2(-00)*C2(-N0(-C3)(-C3))*C2(-00)*C2(-00)*4 Ammonium 6.0
Base N0{0}(-C3)(-C3)-C2*C2(-00)*C2(-00)*C2(-N0(-H0)(-H0))*C2(-00)*C2(-00)*4 Ammonium 6.0
Base N0{0}(-C3)(-C3)-C2*C2(-00)*C2(-00)*C2(-N0(-H0)(-C3))*C2(-00)*C2(-00)*4 Ammonium 6.0
Base N0{0}(-C3)(-C3)-C2*C2(-00)*C2(-00)*C2(-N0(-C3)(-C3))*C2(-00)*C2(-00)*4 Ammonium 6.0

# N-heterocycles....

# imidazole
Base N0{0}=C2-N0-C2=C2-1 Ammonium 7.0

# pyridine
Base N2{0}*C2(-00)*C2(-00)*C2(-00)*C2(-00)*C2(-00)*1 Ammonium 5.5

# 2-aminopyridine (ACD/Labs gives 6.7)
Base N2{0}*C2(-N0(-H0)(-H0))*C2(-00)*C2(-00)*C2(-00)*C2(-00)*1 Ammonium 7.5
Base N2{0}*C2(-N0(-H0)(-C3))*C2(-00)*C2(-00)*C2(-00)*C2(-00)*1 Ammonium 7.5
Base N2{0}*C2(-N0(-C3)(-C3))*C2(-00)*C2(-00)*C2(-00)*C2(-00)*1 Ammonium 7.5

# 3-aminopyridine has a pKa of 6.2 according to ACD/Labs #RB
Base N2{0}*C2(-00)*C2(-N0(-H0)(-H0))*C2(-00)*C2(-00)*C2(-00)*1 Ammonium 6.2

# 4-aminopyridine
Base N2{0}*C2(-00)*C2(-00)*C2(-N0(-H0)(-H0))*C2(-00)*C2(-00)*1 Ammonium 9.0
Base N2{0}*C2(-00)*C2(-00)*C2(-N0(-H0)(-C3))*C2(-00)*C2(-00)*1 Ammonium 9.0
Base N2{0}*C2(-00)*C2(-00)*C2(-N0(-C3)(-C3))*C2(-00)*C2(-00)*1 Ammonium 9.0

# 4-methoxypyridine
Base N2{0}*C2(-00)*C2(-00)*C2(-O0-C3)*C2(-00)*C2(-00)*1 Ammonium 6.5

# new rule for 3-aminopyridazine (pyridazine itself has a pKa of about 2.5)
Base N2{0}*N2*C2(-N0(-H0)(-H0))*C2*C2*C2*1 Ammonium 5.0

# new rule for 4-aminopyridazine (pyridazine itself has a pKa of about 2.5)
Base N2{0}*N2*C2*C2(-N0(-H0)(-H0))*C2*C2*1 Ammonium 6.5

# new rule for 2-aminopyrazine (pyrazine itself has a pKa of about -0.5)
Base N2{0}*C2(-N0(-H0)(-H0))*C2*N2*C2*C2*1 Ammonium 3.0

# new rule for 4-aminopyrimidine
Base N2{0}*C2(-00)*N0*C2(-N0(-H0)(-H0))*C2*C2*1 Ammonium 5.5

# new rule for 2,4-diaminopyrimidine (2-aminopyrimidine has a pKa of about 2.6)
Base N2{0}*C2(-N0(-H0)(-H0))*N2*C2(-N0(-H0)(-H0))*C2*C2*1 Ammonium 6.5

# new rule for 2-aminothiazole
Base N2{0}*C2(-N0(-H0)(-H0))*S0*C2(-00)*C2(-00)*1 Ammonium 5.5

# purine
#   (supersedes match on imidazole above) and other heterocycles (new placement, RB)
#Exclude N0=C2-N0-C2*N0*C2(-00)*N0*C2*C2(-1)(*5)
#Exclude N2*C2*N2*C2*C2*1
Exclude N0*C0*N0*C0*N0*C0*N0*C0*C0(*1)(*4) # Why is this rule not applied?

##############################################################
# Alternative method for not matching N+ atoms
#
# Don't specify "{0}" in the Base N* patterns above, but
# then, after all the Base N* patterns, Exclude matches on
# Nitrogens with total bond order 4, which must be N+ atoms.
#
# With "{0}" qualifiers still in place above, the following
# is redundant, that is, it has no effect.
#
# Because the following patterns are so simple, they may hit
# some Nitrogens not matched by the Base patterns above. Any
# such matches are simply ignored, since there are no prior
# matches to undo.

Exclude N0(-00)(-00)(-00)(-00) # 4 single bonds
Exclude N0(-00)(-00)(=00) # 2 single + 1 double
Exclude N0(=00)(=00) # 2 double
Exclude N0(-00)(%00) # 1 single + 1 triple

# exclude amides
Exclude N0*C0-O0-H0 # aromatic N-C bond, C bears OH (amide tautomer)
Exclude N0-C0=O0 # N-C=O (amide)

##############################################################
# Please do not alter or remove the comments below capturing
# the RCS revision info on this data file!
# The Ionizer's handling of the -v|-ver|-version option expects to find
# these here, in exactly this form --
#
# VERSION $Revision: 1.40 $
# VERSION $Date: 2004/02/06 19:46:18 $
##############################################################


############################# TI - Section ###################################
# Syntax reference: /usr/local2/schrodinger/services-v10034/doc/mmsubs_syntax.txt
# Own rules (joha) for complete deprotonation of TIs (tetrahedral intermediates;
# peptide TI: C with 2x O and N).
# These must be at the end to ensure that they override everything above.

#Acid O0(-H0)-C3(-O0{-})-N0 Hydroxide -5.0 # deprotonate OH-attacked amides
#Acid O0(-H0)-C3(-O0-H0)-N0 Hydroxide -5.0 # deprotonate OH-attacked amides
#Acid O0(-H0)-C3(-N0)-N0 Hydroxide -5.0 # deprotonate OH-attacked amidines
#Acid O0(-H0)-C3(-O0{-})-N0 Hydroxide -5.0 # deprotonate OH-attacked amides
#Acid O0(-H0)-C3(-O0-H0)-N0 Hydroxide -5.0 # deprotonate OH-attacked amides (if twice OH)
Acid O0(-H0)-C3(-O0{-})-O0 Hydroxide 10.0 # leave O-H on OH-attacked esters, lactones (if once OH and once O-)
#Acid O0(-H0)-C3(-O0-H0)-O0 Hydroxide 10.0 # leave O-H on OH-attacked esters, lactones
#Acid O0(-H0)-C3(-O0-H0)-O0 Hydroxide -5.0 # deprotonate OH-attacked esters, lactones
#Acid O0(-H0)-C3(-O0{-})-O0 Hydroxide -5.0 # deprotonate OH-attacked esters, lactones
Acid O3(-H0)-C3-N0 Hydroxide -5.0 # leave O- on attacked C-N bonds
Acid O3(-H0)-C3-S0 Hydroxide -5.0 # leave O- on attacked C-S bonds
# No protonation of O- on attacked guanidinium, or rather deprotonation of the OH
Acid O0(-H0)-C3(-N0)(-N0)-N0 Hydroxide -5

############## Overriding #########################
# keep carboxylates deprotonated
Acid O0(-H0)-C2(=O0)-C0 Hydroxide 0.0
# keep all phosphates deprotonated
Acid O0(-H0)-P0-O0 Hydroxide -5.0
# except TIs
Acid O0(-H0)-P0(-O0{-})(-00)(-00)-00 Hydroxide 14.0 # leave OH on P-TIs

#################### Easiest way, just EXCLUDE TI structures!
# Caution: only the leading atom is excluded.
# pKa values set above need not be commented out, because the matched atoms are excluded.
# O3 is not recognized here for some reason, so O0 is used instead.
#### 1. TI - O-negative (On)
Exclude O0{-}-C3-O0 # for On-C-O
Exclude O0{-}-C3-N0 # for On-C-N
Exclude O0{-}-C3-S0 # for On-C-S
Exclude O0{-}-P0(-O0-H0)(-00)(-00)-00 # for On-P
#### 2. TI - OH (carbonyl - O)
Exclude O0(-H0)-C3-O0{-} # for On-C-OH
Exclude O0(-H0)-P0(-O0{-})(-00)(-00)-00 # leave OH on P-TIs
#### 3. Leaving groups (artificially neutral or protonated)
Exclude O0{+}(-H0)-C3-O0{-} # protonated O-leaving groups
Exclude N3{+}(-H0)-C3-O0{-} # protonated N-leaving groups
Exclude S0{+}(-H0)-C3-O0{-} # protonated S-leaving groups
Exclude N0{+}(-H0)-C3-O0{-} # protonated N-leaving groups
#Exclude N3{0}-C3-O0{-} # neutral N-leaving groups
Exclude N3{0}-C3-O0{-} # neutral N-leaving groups
###############################################################################