Comparison of Multi-class and Binary Classification Machine Learning Models in Identifying Strong Gravitational Lenses

Background. Machine learning models have proven to be useful tools for the analysis of genetic data. However, with the availability of a wide variety of such methods, model selection has become increasingly difficult, both from the human and the computational perspective. Results. We present the R package FRESA.CAD Binary Classification Benchmarking, which performs systematic comparisons between a collection of representative machine learning methods for solving binary classification problems on genetic datasets. Conclusions. FRESA.CAD Binary Benchmarking proves to be a useful tool across a variety of binary classification problems involving the analysis of genetic data, showing both quantitative and qualitative advantages over similar packages.


Predicting Anesthetic Infusion Events Using Machine Learning

10.21203/rs.3.rs-783161/v1 ◽

2021 ◽

Author(s):

Naoki Miyaguchi ◽

Koh Takeuchi ◽

Hisashi Kashima ◽

Mizuki Morita ◽

Hiroshi Morimatsu

Keyword(s):

Machine Learning ◽

Flow Rate ◽

Short Term Memory ◽

Binary Classification ◽

Classification Problem ◽

Clinical Findings ◽

Support Vector ◽

Learning Models ◽

Continuous Administration ◽

Machine Learning Models

Recently, research has been conducted on automatically controlling anesthesia using machine learning, with the aim of alleviating the shortage of anesthesiologists. In this study, we address the problem of predicting decisions made by anesthesiologists during surgery using machine learning; specifically, we formulate the decision of whether to increase the flow rate at each time point in the continuous administration of the analgesic remifentanil as a supervised binary classification problem. Experiments were conducted to evaluate the prediction performance of six machine learning models: logistic regression, support vector machine, random forest, LightGBM, artificial neural network, and long short-term memory (LSTM), using data from 210 cases collected during actual surgeries. The results demonstrated that, when predicting a future increase in the flow rate of remifentanil after 1 min, the LSTM model achieved scores of 0.659 for sensitivity, 0.732 for specificity, and 0.753 for ROC-AUC; this demonstrates the potential to predict the decisions made by anesthesiologists using machine learning. Furthermore, we examined the importance and contribution of the features of each model using Shapley additive explanations, a method for interpreting predictions made by machine learning models. The trends indicated by the results were partially consistent with known clinical findings.
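The three scores reported in this abstract (sensitivity, specificity, ROC-AUC) can be illustrated with a minimal sketch. The labels and scores below are made-up toy values, not data from the study, and the 0.5 decision threshold is an assumption; ROC-AUC is computed here as the fraction of positive/negative pairs ranked correctly, with ties counted as half.

```python
def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Return (sensitivity, specificity) for scores thresholded at `threshold`."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = "flow rate increased", 0 = "not increased".
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(y_true, scores)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} ROC-AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds, which is why the three numbers are usually reported together.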


Detecting Arsenic Contamination Using Satellite Imagery and Machine Learning

Toxics ◽

10.3390/toxics9120333 ◽

2021 ◽

Vol 9 (12) ◽

pp. 333

Author(s):

Ayush Agrawal ◽

Mark R. Petersen

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Mean Squared Error ◽

Binary Classification ◽

Arsenic Concentration ◽

Arsenic Contamination ◽

Hyperspectral Data ◽

Detection Methods ◽

Learning Models ◽

Machine Learning Models

Arsenic, a potent carcinogen and neurotoxin, affects over 200 million people globally. Current detection methods are laborious, expensive, and unscalable, making them difficult to implement in developing regions and during crises such as COVID-19. This study attempts to determine whether a relationship exists between soil hyperspectral data and arsenic concentration using NASA’s Hyperion satellite. It is the first arsenic study to use satellite-based hyperspectral data and apply a classification approach. Four regression machine learning models are tested to determine this correlation in soil with bare land cover. Raw data are converted to reflectance, problematic atmospheric influences are removed, characteristic wavelengths are selected, and four noise reduction algorithms are tested. The combination of data augmentation, Genetic Algorithm, Second Derivative Transformation, and Random Forest regression (R² = 0.840; normalized root mean squared error, re-scaled to [0,1], = 0.122) shows strong correlation, performing better than past models despite using noisier satellite data (versus lab-processed samples). Three binary classification machine learning models are then applied to identify high-risk shrub-covered regions in ten U.S. states, achieving an accuracy of 0.693 and an F1-score of 0.728. Overall, these results suggest that such a methodology is practical and can provide a sustainable alternative approach to arsenic contamination detection.
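The two regression scores quoted above can be sketched as follows. The abstract does not say how the root mean squared error was normalized; dividing by the range of the observed values is an assumption made here for illustration, and the numbers are toy values rather than data from the study.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the observed range (assumed normalization)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))


# Toy values (hypothetical), standing in for measured vs. predicted concentration.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(observed, predicted)
err = nrmse(observed, predicted)
print(f"R^2={r2:.3f} NRMSE={err:.3f}")
```

Other normalizations (by the mean or the standard deviation of the observations) are also in use, so NRMSE values are only comparable when the same convention is applied.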


An Explainable Machine Learning Model for Material Backorder Prediction in Inventory Management

Sensors ◽

10.3390/s21237926 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7926

Author(s):

Charis Ntakolia ◽

Christos Kokkotis ◽

Patrik Karlsson ◽

Serafeim Moustakidis

Keyword(s):

Machine Learning ◽

Supply Chain ◽

Inventory Management ◽

Historical Data ◽

Binary Classification ◽

Production Costs ◽

Correct Prediction ◽

Learning Models ◽

Future Production ◽

Machine Learning Models

Global competition among businesses demands a more effective, low-cost supply chain that allows firms to provide products at the desired quality, quantity, and time, with lower production costs. The latter include holding cost, ordering cost, and backorder cost. A backorder occurs when a product is temporarily unavailable or out of stock and the customer places an order for future production and shipment. Therefore, stock unavailability and prolonged delays in product delivery lead to additional production costs and unsatisfied customers, respectively. Thus, it is highly important to develop models that effectively predict the backorder rate in an inventory system, with the aim of improving the effectiveness of the supply chain and, consequently, the performance of the company. However, traditional approaches in the literature are based on stochastic approximation and do not incorporate information from historical data. To this end, machine learning models should be employed to extract knowledge from large historical datasets and develop predictive models. To address this need, this study tackles the backorder prediction problem. Specifically, various machine learning models were compared for solving the binary classification problem of backorder prediction, followed by model calibration and a post-hoc explainability analysis based on the SHAP model to identify and interpret the most important features that contribute to material backorder. The results showed that the RF, XGB, LGBM, and BB models reached an AUC score of 0.95, while the best-performing model was the LGBM model after calibration with the Isotonic Regression method. The explainability analysis showed that the inventory stock of a product, the volume of products that can be delivered, the imminent demand (sales), and the accurate prediction of future demand can significantly contribute to the correct prediction of backorders.
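The idea behind calibration with isotonic regression can be sketched in a few lines. This is not the authors' pipeline: it is a minimal pure-Python version of the pool-adjacent-violators algorithm, with hypothetical function names and toy scores, and it assumes the raw scores are distinct (ties are not pooled specially).

```python
import bisect


def isotonic_fit(scores, labels):
    """Fit a non-decreasing map from raw scores to calibrated probabilities."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block holds [sum_of_labels, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    xs = [x for x, _ in pairs]
    return xs, fitted


def isotonic_predict(xs, fitted, score):
    """Step-function lookup: value of the nearest fitted point at or below score."""
    i = max(bisect.bisect_right(xs, score) - 1, 0)
    return fitted[i]


# Toy classifier scores and true labels (hypothetical).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
true_labels = [0, 1, 0, 0, 1, 1]

xs, fitted = isotonic_fit(raw_scores, true_labels)
print(fitted)
print(isotonic_predict(xs, fitted, 0.35))
```

The fitted values are the observed positive rates within each pooled block, so the calibrated output can be read as a probability while preserving the ranking produced by the original model.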


The use of machine learning methods in the development of nasal dosage forms with cerebroprotective action

Current issues in pharmacy and medicine science and practice ◽

10.14739/2409-2932.2021.2.232053 ◽

2021 ◽

Vol 14 (2) ◽

pp. 232-238

Author(s):

B. S. Burlaka ◽

I. F. Bielenichev

Keyword(s):

Machine Learning ◽

In Silico ◽

High Reliability ◽

Binary Classification ◽

Dosage Forms ◽

Training Dataset ◽

Learning Models ◽

Pharmaceutical Ingredients ◽

Machine Learning Models ◽

Rational Composition

To conserve active pharmaceutical ingredients and excipients in the early stages of research, it is advisable when planning an experiment to use predicted and experimental physicochemical property data stored in various aggregation databases. The information found reduces the time needed for composition development and technology processing. However, the variety of characteristics of active compounds and excipients is not always reflected in these services. Recently, machine learning models have been widely used in various scientific fields; they make it possible to obtain predictions with high reliability. Given the above, it is relevant and promising to develop machine learning models to predict the presence of pharmaceutical incompatibilities in the formulation of nasal dosage forms. The aim of the study is to develop machine learning models for in silico prediction of the rational composition of nasal dosage forms with cerebroprotective action. Materials and methods. A dataset containing data on compounds (active and auxiliary) and on the presence or absence of interaction (pharmaceutical incompatibility) was used as material. Training datasets were filled manually by content analysis of PubMed library data (pubmed.ncbi.nlm.nih.gov) for the last 10 years, using the keywords “pharmaceutical incompatibilities”, “physico-chemical compatibility”, and “incompatible excipients”. The resulting dataset comprises 1185 rows. The methods employed were a set of machine learning methods for binary classification (pycaret.org), using the programming language Python 3.8 (python.org) in the package management environment Miniconda (conda.io). Pipeline programming was performed using the Jupyter notebook package (jupyter.org). MACCS (Molecular ACCess System) keys for the training dataset were generated using the RDKit package (rdkit.org). SMILES (simplified molecular-input line-entry system) specifications were retrieved automatically using the PubChem service (pubchem.ncbi.nlm.nih.gov). Results. The obtained data made it possible to select two promising binary classification machine learning models, whose quality was checked on a verification dataset. Statistical evaluations of the selected models indicate a high probability of correct in silico prediction of the presence or absence of pharmaceutical incompatibilities in the development of nasal formulations of cerebroprotective dosage forms. The models are posted on the web server of the expert system ExpSys Nasalia (nasalia.zsmu.zp.ua) in the calculations section. Conclusions. As a result of our research, we have developed machine learning models for in silico prediction of the rational composition of nasal dosage forms with cerebroprotective action. The quality of the pharmaceutical incompatibility predictions made with the developed models was confirmed on a verification dataset. The following statistical indicators were obtained: tree_blender (AUC 0.9521, F1 0.9747, MCC 0.9094) and boost_blender (AUC 0.9593, F1 0.9821, MCC 0.9352). The use of machine learning models in pharmaceutical development will contribute to resource conservation and optimization of formulation composition.


Using a Binary Classification Approach to Assess the Accuracy of Hand Posture and Force Estimation with Machine Learning Models

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/1071181321651205 ◽

2021 ◽

Vol 65 (1) ◽

pp. 1248-1249

Author(s):

Mengcheng Wang ◽

Chuan Zhao ◽

Alan Barr ◽

Suihuai Yu ◽

Jay Kapellusch ◽

...

Keyword(s):

Machine Learning ◽

Binary Classification ◽

Confusion Matrix ◽

Strong Relationship ◽

Performance Comparison ◽

Learning Models ◽

Force Estimation ◽

Classification Approach ◽

Future Performance ◽

Machine Learning Models

Recent studies have reported high accuracy when using artificial neural networks to predict grip force in controlled settings. However, relying only on accuracy to evaluate machine learning models may lead to overoptimistic results, especially on imbalanced datasets. The Matthews correlation coefficient (MCC) has shown an advantage in capturing all the data characteristics in the confusion matrix. Therefore, a binary classification approach and the MCC value were introduced to assess the performance of previously proposed machine learning models. Our results show overall correlations ranging between 0.48 and 0.59, indicating a strong relationship between predictions and actual scenarios. The binary classification approach and the MCC values could be used for future performance comparison with other machine learning models.
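The contrast between accuracy and MCC on imbalanced data can be made concrete with a minimal sketch. The confusion-matrix counts below are made-up toy values, not results from the study; returning 0 when the MCC denominator is zero is a common convention and is an assumption here.

```python
import math


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Toy case A: 5 positives, 95 negatives, classifier always predicts "negative".
acc_a = accuracy(tp=0, fp=0, fn=5, tn=95)
mcc_a = mcc(tp=0, fp=0, fn=5, tn=95)

# Toy case B: the classifier finds 4 of the 5 positives with 5 false alarms.
acc_b = accuracy(tp=4, fp=5, fn=1, tn=90)
mcc_b = mcc(tp=4, fp=5, fn=1, tn=90)

print(f"A: accuracy={acc_a:.2f} MCC={mcc_a:.2f}")
print(f"B: accuracy={acc_b:.2f} MCC={mcc_b:.2f}")
```

In case A the accuracy looks high although the classifier never detects a positive, while MCC is zero; in case B the accuracy is slightly lower but MCC is clearly positive, which is the kind of distinction the abstract attributes to MCC.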


Predicting anesthetic infusion events using machine learning

Scientific Reports ◽

10.1038/s41598-021-03112-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Naoki Miyaguchi ◽

Koh Takeuchi ◽

Hisashi Kashima ◽

Mizuki Morita ◽

Hiroshi Morimatsu

Keyword(s):

Machine Learning ◽

Flow Rate ◽

Short Term Memory ◽

Binary Classification ◽

Classification Problem ◽

Clinical Findings ◽

Support Vector ◽

Learning Models ◽

Continuous Administration ◽

Machine Learning Models

Recently, research has been conducted on automatically controlling anesthesia using machine learning, with the aim of alleviating the shortage of anesthesiologists. In this study, we address the problem of predicting decisions made by anesthesiologists during surgery using machine learning; specifically, we formulate the decision of whether to increase the flow rate at each time point in the continuous administration of the analgesic remifentanil as a supervised binary classification problem. Experiments were conducted to evaluate the prediction performance of six machine learning models: logistic regression, support vector machine, random forest, LightGBM, artificial neural network, and long short-term memory (LSTM), using data from 210 cases collected during actual surgeries. The results demonstrated that, when predicting a future increase in the flow rate of remifentanil after 1 min, the LSTM model achieved scores of 0.659 for sensitivity, 0.732 for specificity, and 0.753 for ROC-AUC; this demonstrates the potential to predict the decisions made by anesthesiologists using machine learning. Furthermore, we examined the importance and contribution of the features of each model using Shapley additive explanations, a method for interpreting predictions made by machine learning models. The trends indicated by the results were partially consistent with known clinical findings.


Improving XGBoost with Imagination Sampling

Communications of the Blyth Institute ◽

10.33014/issn.2640-5652.2.1.holloway.1 ◽

2020 ◽

Vol 2 (1) ◽

pp. 3-6

Author(s):

Eric Holloway

Keyword(s):

Machine Learning ◽

General System ◽

Learning Models ◽

Starting Point ◽

Machine Learning Models

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.
