Background:
Cheminformatics models can predict different outputs (activity, property, chemical
reactivity) for single molecules or complex molecular systems (catalyzed organic synthesis, metabolic
reactions, nanoparticles, etc.).
Objective:
Cheminformatics prediction of complex catalytic enantioselective reactions is a major goal
in organic synthesis research and the chemical industry. Markov Chain Molecular Descriptors (MCDs)
have been widely used to solve Cheminformatics problems. There are different types of Markov chain
descriptors, such as Markov-Shannon entropies (Shk), Markov Means (Mk), and Markov Moments (πk).
However, other possible MCDs have not been used before. In addition, MCDs are very often calculated
with specialized software that is not always available to general users, and no R library is publicly
available for calculating them. This limits the availability of MCD-based
Cheminformatics procedures.
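As a rough illustration of what such descriptors look like, the sketch below computes k-th order Markov-Shannon entropies, Markov means, and Markov moments for a small molecular graph in plain Python. The formulas follow one commonly used formulation and are an assumption here, not necessarily the definitions implemented by the authors: transition probabilities are taken proportional to an atomic weight (for example, electronegativity) of the destination atom, the entropy is the Shannon entropy of the k-step atom distribution, the mean is the expected weight under that distribution, and the moment is the trace of the k-th power of the transition matrix. All function names, the self-loop convention, and the example weights are hypothetical.

```python
import math


def transition_matrix(adjacency, weights):
    """Row-stochastic matrix: P[i][j] is proportional to weights[j] for each
    atom j bonded to atom i (self-loops included, which is an assumption)."""
    n = len(weights)
    rows = []
    for i in range(n):
        raw = [weights[j] if (adjacency[i][j] or i == j) else 0.0
               for j in range(n)]
        total = sum(raw)
        rows.append([x / total for x in raw])
    return rows


def matmul(a, b):
    """Plain-Python product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def markov_descriptors(adjacency, weights, k_max=3):
    """Return a list of dicts with Sh_k, M_k and pi_k for k = 0..k_max."""
    n = len(weights)
    P = transition_matrix(adjacency, weights)
    total = sum(weights)
    p = [w / total for w in weights]  # assumed initial atom distribution
    Pk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    results = []
    for k in range(k_max + 1):
        entropy = -sum(x * math.log(x) for x in p if x > 0)  # Sh_k
        mean = sum(x * w for x, w in zip(p, weights))        # M_k
        moment = sum(Pk[i][i] for i in range(n))             # pi_k = Tr(P^k)
        results.append({"k": k, "Sh": entropy, "M": mean, "pi": moment})
        # Advance the atom distribution and the matrix power by one step.
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
        Pk = matmul(Pk, P)
    return results


# Toy C-C-O chain weighted by Pauling electronegativities (illustrative only).
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
for row in markov_descriptors(chain, [2.55, 2.55, 3.44], k_max=2):
    print(row)
```

In a modeling workflow, descriptors of this kind would be computed separately for each reaction component and concatenated into one feature vector per reaction.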
Methods:
We studied the enantiomeric excess ee(%)[Rcat] for 324 α-amidoalkylation reactions. These
reactions have a complex mechanism that depends on several factors. The model includes MCDs of the
substrate, solvent, chiral catalyst, and product, along with the reaction time, temperature, catalyst
loading, etc. We tested several Machine Learning regression algorithms. The Random Forest regression
model achieved R2 > 0.90 in both the training and test sets. Second, the biological activity of 5644
compounds against colorectal cancer was studied.
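For reference, the R2 statistic quoted for the regression model is the coefficient of determination, R2 = 1 - SS_res / SS_tot. The following is a minimal sketch of that calculation in plain Python; the function name and the toy ee(%) values are hypothetical and are not taken from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up observed and predicted ee(%) values, for illustration only.
observed = [10.0, 50.0, 90.0]
predicted = [12.0, 48.0, 91.0]
print(r_squared(observed, predicted))
```

Computing the statistic separately on the training and test sets, as the abstract reports, is what allows overfitting to be detected: a large gap between the two values would indicate a model that does not generalize.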
Results:
We developed a model able to predict preclinical assay outcomes with Specificity and Sensitivity
of 70-82% in both the training and validation series.
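Sensitivity and Specificity are the fractions of active and inactive cases, respectively, that a classifier labels correctly. The sketch below shows the standard calculation from binary labels in plain Python; the function name and the toy labels are hypothetical and do not come from the study data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only: 4 active and 5 inactive cases.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
pred_labels = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(true_labels, pred_labels)
print(f"Sensitivity = {sens:.2f}, Specificity = {spec:.2f}")
```

Reporting both values matters for imbalanced screening data, where overall accuracy alone can hide poor recognition of the minority class.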
Conclusion:
The work shows the potential of the new tool for computational studies in organic and medicinal
chemistry.