Discovery of novel Li SSE and anode coatings using interpretable machine learning and high-throughput multi-property screening

Abstract All-solid-state batteries with Li metal anode can address the safety issues surrounding traditional Li-ion batteries as well as the demand for higher energy densities. However, the development of solid electrolytes and protective anode coatings possessing high ionic conductivity and good stability with Li metal has proven to be a challenge. Here, we present our informatics approach to explore the Li compound space for promising electrolytes and anode coatings using high-throughput multi-property screening and interpretable machine learning. To do this, we generate a database of battery-related materials properties by computing $\mathrm{Li}^+$ migration barriers and stability windows for over 15,000 Li-containing compounds from Materials Project. We screen through the database for candidates with good thermodynamic and electrochemical stabilities, and low $\mathrm{Li}^+$ migration barriers, identifying promising new candidates such as $\mathrm{Li}_9\mathrm{S}_3\mathrm{N}$, $\mathrm{LiAlB}_2\mathrm{O}_5$, $\mathrm{LiYO}_2$, $\mathrm{LiSbF}_4$, and $\mathrm{Sr}_4\mathrm{Li}(\mathrm{BN}_2)_3$, among others. We train machine learning models, using ensemble methods, to predict migration barriers and oxidation and reduction potentials of these compounds by engineering input features that ensure accuracy and interpretability. Using only a small number of features, our gradient boosting regression models achieve $R^2$ values of 0.95 and 0.92 on the oxidation and reduction potential prediction tasks, respectively, and 0.86 on the migration barrier prediction task. Finally, we use Shapley additive explanations and permutation feature importance analyses to interpret our machine learning predictions and identify materials properties with the largest impact on predictions in our models. We show that our approach has the potential to enable rapid discovery and design of novel solid electrolytes and anode coatings.
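The abstract reports $R^2$ scores and uses permutation feature importance to interpret its models. As a rough illustration of that idea only, the sketch below computes $R^2$ and permutation importance in plain Python. The model `toy_model`, the random data, and the `n_repeats` setting are all hypothetical stand-ins and are not the authors' gradient boosting models or features.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in R^2 when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def toy_model(row):
    """Hypothetical fitted model: depends strongly on feature 0,
    weakly on feature 1, and not at all on feature 2."""
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]

baseline, importances = permutation_importance(toy_model, X, y)
print("baseline R^2:", baseline)
print("importances:", importances)
```

A feature the model ignores gets an importance of zero, because shuffling it leaves every prediction unchanged; features the model relies on more heavily produce a larger drop in $R^2$.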


Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of three attempts. It was found that GBM performed the best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.
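The three scores quoted above (RMSE, NSE, and the index of agreement d) can be sketched in a few lines of Python. The abstract does not give the formulas, so this assumes the standard definitions, with d taken to be Willmott's index of agreement; the observation and prediction values are made up for illustration and are not erosion pin data.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nash_sutcliffe(obs, pred):
    """NSE = 1 - sum((o - p)^2) / sum((o - mean(o))^2); 1 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def index_of_agreement(obs, pred):
    """Willmott's d (assumed definition):
    1 - sum((o - p)^2) / sum((|p - mean(o)| + |o - mean(o)|)^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


# Hypothetical values (e.g. erosion depth in mm/year), for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

rmse_value = rmse(observed, predicted)
nse_value = nash_sutcliffe(observed, predicted)
d_value = index_of_agreement(observed, predicted)
print(rmse_value, nse_value, d_value)
```

Lower RMSE is better, while NSE and d both approach 1 as predictions approach the observations, which is why the abstract ranks GBM first on all three.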


Interpretable Machine Learning for Early Neurological Deterioration Prediction in Atrial Fibrillation-Related Stroke

10.21203/rs.3.rs-446890/v1 ◽

2021 ◽

Author(s):

Seong Hwan Kim ◽

Eun-Tae Jeon ◽

Sungwook Yu ◽

Kyungmi O ◽

Chi Kyung Kim ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Neurological Deterioration ◽

Gradient Boosting ◽

Support Vector ◽

Light Gradient ◽

Interpretable Machine Learning ◽

Extreme Gradient Boosting ◽

Early Neurological Deterioration ◽

Feature Importance

Abstract We aimed to develop a novel prediction model for early neurological deterioration (END) based on an interpretable machine learning (ML) algorithm for atrial fibrillation (AF)-related stroke and to evaluate the prediction accuracy and feature importance of ML models. Data from multi-center prospective stroke registries in South Korea were collected. After stepwise data preprocessing, we utilized logistic regression, support vector machine, extreme gradient boosting, light gradient boosting machine (LightGBM), and multilayer perceptron models. We used the Shapley additive explanations (SHAP) method to evaluate feature importance. Of the 3,623 stroke patients, the 2,363 who had arrived at the hospital within 24 hours of symptom onset and had available information regarding END were included. Of these, 318 (13.5%) had END. The LightGBM model showed the highest area under the receiver operating characteristic curve (0.778; 95% CI, 0.726–0.830). The feature importance analysis revealed that fasting glucose level and the National Institutes of Health Stroke Scale score were the most influential factors. Among ML algorithms, the LightGBM model was particularly useful for predicting END, as it revealed new and diverse predictors. Additionally, the SHAP method can be adjusted to individualize the features’ effects on the predictive power of the model.
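The models above are compared by area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that number measures, the Python below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The labels and scores are invented for illustration and are not the registry data; the function name `roc_auc` is likewise an assumption.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes (1 = event, e.g. END occurred)
    scores: model outputs, where higher means more likely positive
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted risks, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.778 means a patient with END outranks a patient without END in roughly 78% of pairs.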


Building more accurate decision trees with the additive tree

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1816748116 ◽

2019 ◽

Vol 116 (40) ◽

pp. 19887-19893 ◽

Cited By ~ 15

Author(s):

José Marcio Luna ◽

Efstathios D. Gennatas ◽

Lyle H. Ungar ◽

Eric Eaton ◽

Eric S. Diffenderfer ◽

...

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Ensemble Methods ◽

Predictive Performance ◽

Additive Models ◽

Gradient Boosting ◽

Clear Understanding ◽

High Stakes ◽

Additive Tree ◽

Full Interaction

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.
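One end of the spectrum described above is gradient boosted stumps, where each boosting round adds a one-split tree on a single feature. The sketch below shows that idea for squared-error regression in plain Python. It is not the additive tree algorithm from the paper; the function names, the learning rate, the number of rounds, and the step-function data are all assumptions chosen for illustration.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, left_mean, right_mean)
    if best is None:
        raise ValueError("no valid split: every feature is constant")
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals (squared loss)."""
    init = sum(y) / len(y)
    pred = [init] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        j, thr, left_mean, right_mean = fit_stump(X, residuals)
        stumps.append((j, thr, left_mean, right_mean))
        pred = [p + learning_rate * (left_mean if row[j] <= thr else right_mean)
                for row, p in zip(X, pred)]
    return init, stumps, learning_rate


def predict(model, row):
    """Sum of the initial mean and the shrunken stump outputs."""
    init, stumps, learning_rate = model
    return init + learning_rate * sum(
        left_mean if row[j] <= thr else right_mean
        for j, thr, left_mean, right_mean in stumps)


# Hypothetical one-feature step function, for illustration only.
X = [[float(x)] for x in range(10)]
y = [0.0] * 5 + [1.0] * 5
model = fit_boosted_stumps(X, y)
print(predict(model, [2.0]), predict(model, [7.0]))
```

Because every stump looks at a single feature, the resulting model is a sum of one-feature functions, which is the additive structure the abstract contrasts with the full interaction structure of a deep CART tree.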


Garnet-type solid electrolyte: Advances of ionic transport performance and its application in all-solid-state batteries

Journal of Advanced Ceramics ◽

10.1007/s40145-021-0489-7 ◽

2021 ◽

Author(s):

P. M. Gonzalez Puente ◽

Shangbin Song ◽

Shiyu Cao ◽

Leana Ziwen Rannalter ◽

Ziwen Pan ◽

...

Keyword(s):

Solid State ◽

Solid Electrolyte ◽

Solid Electrolytes ◽

High Energy ◽

Environmental Stability ◽

Preparation Methods ◽

Metal Anode ◽

Safety Issues ◽

Electrochemical Window ◽

Increasing Demand

Abstract All-solid-state lithium batteries (ASSLBs), which use solid electrolytes instead of liquid ones, have become a hot research topic due to their high energy and power density, ability to solve battery safety issues, and capabilities to fulfill the increasing demand for energy storage in electric vehicles and smart grid applications. Garnet-type solid electrolytes have attracted considerable interest as they meet all the properties of an ideal solid electrolyte for ASSLBs. The garnet-type $\mathrm{Li}_7\mathrm{La}_3\mathrm{Zr}_2\mathrm{O}_{12}$ (LLZO) has excellent environmental stability; experiments and computational analyses showed that this solid electrolyte has a high lithium (Li) ionic conductivity ($10^{-4}$–$10^{-3}$ S·cm$^{-1}$), an electrochemical window as wide as 6 V, stability against Li metal anode, and compatibility with most of the cathode materials. In this review, we present the fundamentals of garnet-type solid electrolytes, preparation methods, air stability, some strategies for improving the conductivity based on experimental and computational results, interfacial issues, and finally applications and challenges for future developments of LLZO solid electrolytes for ASSLBs.


Prediction of Employee Attrition Using Machine Learning and Ensemble Methods

International Journal of Machine Learning and Computing ◽

10.18178/ijmlc.2021.11.2.1022 ◽

2021 ◽

Vol 11 (2) ◽

pp. 110-114

Author(s):

Aseel Qutub ◽

Asmaa Al-Mehmadi ◽

Munirah Al-Hssan ◽

Ruyan Aljohani ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Professional Training ◽

Ensemble Methods ◽

Gradient Boosting ◽

Learning Models ◽

Retention Strategies ◽

Employee Attrition ◽

The Cost ◽

Machine Learning Models

Employees are the most valuable resources for any organization. The cost of professional training, the loyalty developed over the years, and the sensitivity of some organizational positions all make it essential to identify who might leave the organization. Many reasons can lead to employee attrition. In this paper, several machine learning models are developed to automatically and accurately predict employee attrition. The IBM attrition dataset is used in this work to train and evaluate machine learning models, namely Decision Tree, Random Forest Regressor, Logistic Regressor, Adaboost Model, and Gradient Boosting Classifier models. The ultimate goal is to accurately detect attrition to help any company improve its retention strategies for crucial employees and boost those employees' satisfaction.


Interpretable machine learning for demand modeling with high-dimensional data using Gradient Boosting Machines and Shapley values

Journal of Revenue and Pricing Management ◽

10.1057/s41272-020-00236-4 ◽

2020 ◽

Vol 19 (5) ◽

pp. 355-364

Author(s):

Evgeny A. Antipov ◽

Elena B. Pokryshevskaya

Keyword(s):

Machine Learning ◽

High Dimensional Data ◽

High Dimensional ◽

Gradient Boosting ◽

Demand Modeling ◽

Interpretable Machine Learning ◽

Shapley Values


Prediction of amyloid β PET positivity using machine learning in patients with suspected cerebral amyloid angiopathy markers

Scientific Reports ◽

10.1038/s41598-020-75664-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Young Hee Jung ◽

Hyejoo Lee ◽

Hee Jin Kim ◽

Duk L. Na ◽

Hyun Jeong Han ◽

...

Keyword(s):

Machine Learning ◽

Cerebral Amyloid Angiopathy ◽

Characteristic Curve ◽

Amyloid Β ◽

Machine Learning Algorithms ◽

Vascular Pathology ◽

Gradient Boosting ◽

Amyloid Angiopathy ◽

Interpretable Machine Learning ◽

Cerebral Amyloid

Abstract Amyloid-β (Aβ) PET positivity in patients with suspected cerebral amyloid angiopathy (CAA) MRI markers is predictive of a worse cognitive trajectory, and it provides insights into the underlying vascular pathology (CAA vs. hypertensive angiopathy) to facilitate prognostic prediction and appropriate treatment decisions. In this study, we applied two interpretable machine learning algorithms, gradient boosting machine (GBM) and random forest (RF), to predict Aβ PET positivity in patients with CAA MRI markers. In the GBM algorithm, the number of lobar cerebral microbleeds (CMBs), deep CMBs, lacunes, CMBs in dentate nuclei, and age were ranked as the most influential to predict Aβ positivity. In the RF algorithm, the absence of diabetes was additionally chosen. Cut-off values of the above variables predictive of Aβ positivity were as follows: (1) the number of lobar CMBs > 16.4 (GBM) / 14.3 (RF), (2) no deep CMBs (GBM/RF), (3) the number of lacunes > 7.4 (GBM/RF), (4) age > 74.3 (GBM) / 64 (RF), (5) no CMBs in dentate nucleus (GBM/RF). The classification performances based on the area under the receiver operating characteristic curve were 0.83 in GBM and 0.80 in RF. Our study demonstrates the utility of interpretable machine learning in the clinical setting by quantifying the relative importance and cutoff values of predictive variables for Aβ positivity in patients with suspected CAA markers.


Interpretable machine learning with an ensemble of gradient boosting machines

Knowledge-Based Systems ◽

10.1016/j.knosys.2021.106993 ◽

2021 ◽

pp. 106993

Author(s):

Andrei V. Konstantinov ◽

Lev V. Utkin

Keyword(s):

Machine Learning ◽

Gradient Boosting ◽

Interpretable Machine Learning


MrIML: Multi-response interpretable machine learning to map genomic landscapes

10.22541/au.160855820.09604024/v1 ◽

2020 ◽

Author(s):

Nichola Fountain-Jones ◽

Christopher Kozakiewicz ◽

Brenna Forester ◽

Erin Landguth ◽

Scott Carver ◽

...

Keyword(s):

Machine Learning ◽

Landscape Genetics ◽

Environmental Gradients ◽

Population Level ◽

Genetic Change ◽

Lynx Rufus ◽

Gradient Boosting ◽

Interpretable Machine Learning ◽

Wide Range ◽

Extreme Gradient Boosting

We introduce a new R package ‘MrIML’ (Multi-response Interpretable Machine Learning). MrIML provides a powerful and interpretable framework that enables users to harness recent advances in machine learning to map multi-locus genomic relationships, to identify loci of interest for future landscape genetics studies and to gain new insights into adaptation across environmental gradients. Relationships between genetic change and environment are often non-linear, interactive and autocorrelated. Our package helps capture this complexity and offers functions that construct, fit and conduct inference on a wide range of highly flexible models that are routinely used for single-locus landscape genetics studies but are rarely extended to estimate response functions for multiple loci. To demonstrate the package’s broad functionality, we test its ability to recover landscape relationships from simulated genomic data. We also apply the package to two empirical case studies. In the first, we estimate variation in the population-level genetic composition of North American balsam poplar (Populus balsamifera, Salicaceae), and in the second, we recover individual-level landscapes while estimating host drivers of feline immunodeficiency virus genetic spread in bobcats (Lynx rufus). The ability to model thousands of loci collectively and compare models from linear regression to extreme gradient boosting, within the same analytical framework, has the potential to be transformative. The MrIML framework is also extendable and not limited to mapping genetic change; for example, it can be used to quantify the environmental drivers of microbiomes and coinfection dynamics.


A Novel Machine Learning Strategy for Prediction of Antihypertensive Peptides Derived from Food with High Efficiency

10.1101/2020.08.12.248955 ◽

2020 ◽

Author(s):

Liyang Wang ◽

Dantong Niu ◽

Xiaoya Wang ◽

Qun Shen ◽

Yong Xue

Keyword(s):

Machine Learning ◽

High Throughput ◽

High Efficiency ◽

Characteristic Curve ◽

Bovine Milk ◽

Structural Features ◽

Protein Docking ◽

Gradient Boosting ◽

Extreme Gradient Boosting ◽

Antihypertensive Peptides

Abstract Strategies to screen antihypertensive peptides with high throughput and rapid speed will doubtless contribute to the treatment of hypertension. Food-derived antihypertensive peptides can reduce blood pressure without side effects. In the present study, a novel model based on the Extreme Gradient Boosting (XGBoost) algorithm was developed using the primary structural features of food-derived peptides, and its performance in the prediction of antihypertensive peptides was compared with that of the dominant machine learning models. To further assess the reliability of the method in real situations, the optimized XGBoost model was used to predict the antihypertensive degree of k-mer peptides cut from 6 key proteins in bovine milk, and peptide-protein docking technology was introduced to verify the findings. The results showed that the XGBoost model achieved outstanding performance, with an accuracy of 0.9841 and an area under the receiver operating characteristic curve of 0.9428, which were better than those of the other models. Using the XGBoost model, the prediction of antihypertensive peptides derived from milk protein was consistent with the peptide-protein docking results and was more efficient. Our results indicate that using the XGBoost algorithm as a novel auxiliary tool is feasible for screening food-derived antihypertensive peptides with high throughput and high efficiency.
