Predictive Power of Time-Series Based Machine Learning Models for DMPK Measurements in Drug Discovery

The ability to predict corruption is crucial to policy. Using rich micro-data from Brazil, we show that multiple machine learning models display high levels of performance in predicting municipality-level corruption in public spending. We then quantify which individual municipality features and groups of similar characteristics have the highest predictive power. We find that measures of private sector activity, financial development, and human capital are the strongest predictors of corruption, while public sector and political features play a secondary role. Our findings have implications for the design and cost-effectiveness of various anti-corruption policies.

Multivariate Classification of Drugs using Parametric and Nonparametric Machine Learning Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8740.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2021-2027

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Biological Activities ◽

Biological Effects ◽

Recursive Feature Elimination ◽

Drug Candidate ◽

Learning Models ◽

Machine Learning Models ◽

Non Parametric

In pharmaceutical research, the traditional drug discovery process is time-consuming and expensive: many compounds are experimentally tested for their biological activities, and series of lab experiments are conducted to analyze a newly synthesized drug’s pharmaceutical activity and its biological effects on humans. For each new drug candidate, the required clinical properties can instead be predicted with machine learning models, which greatly reduces experimental cost. This paper explores parametric and non-parametric machine learning models for classifying the administration properties of drugs and their toxicity. The multinomial classification of drugs was based on their physicochemical and ADMET properties. Balanced data samples were drawn from ChEMBL and pre-processed. Features were reduced using Recursive Feature Elimination, and the attributes were ranked by importance to remove highly correlated attributes. The performance of parametric and non-parametric machine learning models was analyzed on cheminformatics data covering physicochemical, biological and pharmaceutical properties of the drug molecules. Selecting a potent drug candidate along with its administration properties greatly reduces wet-lab experimental time and cost. The results indicate that multiclass classification can be performed efficiently with a non-parametric machine learning model. Optimal feature engineering, hyperparameter tuning and hybrid algorithms could yield more accurate predictions for cheminformatics data in the future.
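
The Recursive Feature Elimination step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data below are synthetic stand-ins rather than ChEMBL samples, the ranking model is an ordinary least-squares fit chosen for brevity (the paper's estimator is not stated here), and the function name `recursive_feature_elimination` is hypothetical.

```python
import numpy as np

def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly fit a linear model and drop the feature with the smallest weight."""
    # Standardize columns so coefficient magnitudes are comparable across features.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    remaining = list(range(X.shape[1]))  # column indices still in play
    eliminated = []                      # column indices in the order they were dropped
    while len(remaining) > n_keep:
        coef, *_ = np.linalg.lstsq(Xs[:, remaining], yc, rcond=None)
        weakest = int(np.argmin(np.abs(coef)))  # position within `remaining`
        eliminated.append(remaining.pop(weakest))
    return remaining, eliminated

# Synthetic data (an assumption for illustration): only columns 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

selected, dropped = recursive_feature_elimination(X, y, n_keep=2)
print("kept features:", sorted(selected))
print("elimination order:", dropped)
```

The elimination order doubles as an importance ranking: features dropped last mattered most to the fitted model. A classifier with feature importances could replace the least-squares fit without changing the loop.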

Discriminating Postural Control Behaviors from Posturography with Statistical Tests and Machine Learning Models: Does Time Series Length Matter?

Lecture Notes in Computer Science - Computational Science – ICCS 2018 ◽

10.1007/978-3-319-93713-7_28 ◽

2018 ◽

pp. 350-357

Author(s):

Luiz H. F. Giovanini ◽

Elisangela F. Manffra ◽

Julio C. Nievola

Keyword(s):

Machine Learning ◽

Time Series ◽

Postural Control ◽

Statistical Tests ◽

Learning Models ◽

Series Length ◽

Machine Learning Models

Intra-domain and cross-domain transfer learning for time series

10.5194/egusphere-egu21-12142 ◽

2021 ◽

Author(s):

Erik Otović ◽

Marko Njirjak ◽

Dario Jozinović ◽

Goran Mauša ◽

Alberto Michelini ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Transfer Learning ◽

Time Series Data ◽

Series Data ◽

Sound Recognition ◽

Transfer Of Knowledge ◽

Learning Models ◽

Machine Learning Models

In this study, we compared the performance of machine learning models trained using transfer learning with that of models trained from scratch on time series data. Four machine learning models were used in the experiment: two taken from the field of seismology, and two general-purpose models for time series data. The accuracy of the selected models was systematically observed and analyzed when switching within the same domain of application (seismology), as well as between mutually different domains of application (seismology, speech, medicine, finance). In seismology, we used two databases of local earthquakes (one in counts, and the other with the instrument response removed) and a database of global earthquakes for predicting earthquake magnitude; the other datasets targeted classifying spoken words (speech), predicting stock prices (finance) and classifying muscle movement from EMG signals (medicine). In practice, it is very demanding and sometimes impossible to collect labeled datasets large enough to successfully train a machine learning model. Therefore, in our experiment, we used reduced datasets of 1,500 and 9,000 data instances to mimic such conditions. Using the same scaled-down datasets, we trained two sets of machine learning models: those that used transfer learning and those that were trained from scratch. We compared the performance of pairs of models in order to draw conclusions about the utility of transfer learning. To confirm the validity of the obtained results, we repeated the experiments several times and applied statistical tests to assess their significance. The study shows when, within the set experimental framework, the transfer of knowledge brought improvements in model accuracy and in model convergence rate.
Our results show that it is possible to achieve better performance and faster convergence by transferring knowledge from the domain of global earthquakes to the domain of local earthquakes, and sometimes also vice versa. However, improvements in seismology can sometimes also be achieved by transferring knowledge from the medical and audio domains. The results show that the transfer of knowledge between other domains brought even more significant improvements than those within the field of seismology. For example, models in the field of sound recognition achieved much better performance than classical models, and the domain of sound recognition proved very compatible with knowledge from other domains. We came to similar conclusions for the domains of medicine and finance. Ultimately, the paper offers suggestions on when transfer learning is useful, and the explanations offered can provide a good starting point for knowledge transfer using time series data.
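
The comparison described above (fine-tuning a pretrained model versus training from scratch on the same small dataset) can be sketched as follows. This is an illustrative toy under strong assumptions, not the study's setup: a linear model stands in for the neural networks, the "source" and "target" domains are synthetic regression tasks with similar true weights, and all names (`make_data`, `train`, `w_pretrained`, and so on) are hypothetical.

```python
import numpy as np

def make_data(rng, weights, n_samples, noise=0.1):
    """Draw a synthetic linear-regression dataset with the given true weights."""
    X = rng.normal(size=(n_samples, weights.size))
    y = X @ weights + noise * rng.normal(size=n_samples)
    return X, y

def train(X, y, w_init, steps, lr):
    """Plain gradient descent on mean squared error, starting from w_init."""
    w = w_init.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
w_source = np.array([2.0, -1.0, 0.5, 3.0])               # assumed source-domain task
w_target = w_source + np.array([0.1, -0.1, 0.05, 0.0])   # related target-domain task

X_src, y_src = make_data(rng, w_source, 2000)  # large source dataset
X_tgt, y_tgt = make_data(rng, w_target, 40)    # small labeled target dataset
X_test, y_test = make_data(rng, w_target, 500)

# Pretrain on the source domain until (approximate) convergence.
w_pretrained = train(X_src, y_src, np.zeros(4), steps=300, lr=0.05)

# Give both variants the same small training budget on the target data.
budget = dict(steps=10, lr=0.02)
w_transfer = train(X_tgt, y_tgt, w_pretrained, **budget)  # warm start
w_scratch = train(X_tgt, y_tgt, np.zeros(4), **budget)    # cold start

mse_transfer = mse(X_test, y_test, w_transfer)
mse_scratch = mse(X_test, y_test, w_scratch)
print(f"transfer: {mse_transfer:.3f}  scratch: {mse_scratch:.3f}")
```

Because the two tasks were constructed to be similar, the warm start reaches a lower test error within the same budget; with unrelated source weights the advantage would shrink or vanish, which is the kind of question the study examines empirically.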

Structure–Property Relationships and Machine Learning Models for Addressing CYP3A4-Mediated Victim Drug–Drug Interaction Risk in Drug Discovery

Molecular Pharmaceutics ◽

10.1021/acs.molpharmaceut.0c00637 ◽

2020 ◽

Vol 17 (9) ◽

pp. 3600-3608

Author(s):

Bingjie Hu ◽

Xin Zhou ◽

Michael A. Mohutsky ◽

Prashant V. Desai

Keyword(s):

Machine Learning ◽

Drug Interaction ◽

Drug Discovery ◽

Structure Property ◽

Learning Models ◽

Structure Property Relationships ◽

Drug Drug Interaction ◽

Machine Learning Models

An intelligent hybridization of ARIMA with machine learning models for time series forecasting

Knowledge-Based Systems ◽

10.1016/j.knosys.2019.03.011 ◽

2019 ◽

Vol 175 ◽

pp. 72-86 ◽

Cited By ~ 23

Author(s):

Domingos S. de O. Santos Júnior ◽

João F.L. de Oliveira ◽

Paulo S.G. de Mattos Neto

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Forecasting ◽

Learning Models ◽

Machine Learning Models

Multi-step Time Series Forecasting of Electric Load Using Machine Learning Models

Artificial Intelligence and Soft Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-91253-0_15 ◽

2018 ◽

pp. 148-159 ◽

Cited By ~ 7

Author(s):

Shamsul Masum ◽

Ying Liu ◽

John Chiverton

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Forecasting ◽

Electric Load ◽

Learning Models ◽

Machine Learning Models

Predicting Benzene Concentration Using Machine Learning and Time Series Algorithms

Mathematics ◽

10.3390/math8122205 ◽

2020 ◽

Vol 8 (12) ◽

pp. 2205

Author(s):

Luis Alfonso Menéndez García ◽

Fernando Sánchez Lasheras ◽

Paulino José García Nieto ◽

Laura Álvarez de Prado ◽

Antonio Bernardo Sánchez

Keyword(s):

Machine Learning ◽

Time Series ◽

Moving Average ◽

Environmental Pollutants ◽

Multivariate Adaptive Regression Splines ◽

Support Vector ◽

Learning Models ◽

Vector Autoregressive ◽

Benzene Concentration ◽

Machine Learning Models

Benzene is a pollutant that is very harmful to health, so models are needed to predict its concentration and its relationship with other air pollutants. The data collected by eight stations in Madrid (Spain) over nine years were analyzed using the following regression-based machine learning models: multivariate linear regression (MLR), multivariate adaptive regression splines (MARS), multilayer perceptron neural network (MLP), support vector machines (SVM), autoregressive integrated moving-average (ARIMA) and vector autoregressive moving-average (VARMA) models. Benzene concentration predictions were made from the concentrations of four environmental pollutants: nitrogen dioxide (NO2), nitrogen oxides (NOx), particulate matter (PM10) and toluene (C7H8), and the performance measures of the proposed models were compared. In general, regression-based machine learning models are more effective at predicting than time series models.
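
One way such a comparison might be set up is sketched below: a multivariate linear regression on covariates against a one-step-ahead AR(1) baseline, scored by RMSE on a hold-out period. The data are synthetic (not the Madrid measurements), the four covariates are merely placeholders for the pollutant series, and the AR(1) model is a deliberately simple substitute for ARIMA/VARMA; the outcome on this toy data says nothing about the paper's results.

```python
import numpy as np

def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def predict_linear(coef, features):
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

rng = np.random.default_rng(1)
n, split = 500, 400  # first 400 steps for training, last 100 for testing

# Four autocorrelated covariate series (synthetic placeholders).
covariates = np.zeros((n, 4))
for t in range(1, n):
    covariates[t] = 0.8 * covariates[t - 1] + rng.normal(size=4)
# Target is an assumed linear mix of the covariates plus measurement noise.
target = covariates @ np.array([0.5, 0.3, 0.1, 0.8]) + 0.1 * rng.normal(size=n)

# Regression model: predict the target from same-time covariates.
mlr_coef = fit_linear(covariates[:split], target[:split])
mlr_rmse = rmse(target[split:], predict_linear(mlr_coef, covariates[split:]))

# Time series baseline: AR(1), predicting each value from the previous one.
ar_coef = fit_linear(target[:split - 1], target[1:split])
ar_rmse = rmse(target[split:], predict_linear(ar_coef, target[split - 1:-1]))

print(f"MLR RMSE: {mlr_rmse:.3f}  AR(1) RMSE: {ar_rmse:.3f}")
```

Here the regression wins by construction, since the target is generated directly from the covariates; on real data the ranking depends on how much information the covariates carry beyond the target's own history.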
