Determining Usefulness of Machine Learning in Materials Discovery Using Simulated Research Landscapes

Author(s):  
Marcos del Cueto ◽  
Alessandro Troisi

When existing experimental data are combined with machine learning (ML) to predict the performance of new materials, the data acquisition bias determines ML usefulness and the prediction accuracy. In this...

Author(s):  
Anik Das ◽  
Mohamed M. Ahmed

Accurate lane-change prediction information in real time is essential to safely operate Autonomous Vehicles (AVs) on the roadways, especially at the early stage of AV deployment, where there will be an interaction between AVs and human-driven vehicles. This study proposed reliable lane-change prediction models considering features from vehicle kinematics, machine vision, driver, and roadway geometric characteristics using the trajectory-level SHRP2 Naturalistic Driving Study and Roadway Information Database. Several machine learning algorithms were trained, validated, tested, and comparatively analyzed based on six different sets of features, including Classification And Regression Trees (CART), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K Nearest Neighbor (KNN), and Naïve Bayes (NB). In each feature set, relevant features were extracted through a wrapper-based algorithm named Boruta. The results showed that, when all features were considered, the XGBoost model outperformed all other models, with an overall prediction accuracy of 97% and an F1-score of 95.5%. However, the highest overall prediction accuracy of 97.3% and F1-score of 95.9% were observed in the XGBoost model based on vehicle kinematics features. Moreover, it was found that XGBoost was the only model that achieved a reliable and balanced prediction performance across all six feature sets. Furthermore, a simplified XGBoost model was developed for each feature set with practical implementation in mind. The proposed prediction model could help in trajectory planning for AVs and could be used to develop more reliable advanced driver assistance systems (ADAS) in a cooperative connected and automated vehicle environment.
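The abstract above compares models by overall prediction accuracy and F1-score. As a minimal sketch of how those two metrics are computed for a binary classifier, the following uses hypothetical labels (1 = lane change, 0 = lane keeping); the function name and the data are illustrative assumptions, not taken from the study.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; F1 is computed for `positive`."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical ground truth and predictions (not data from the study).
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Reporting both metrics matters when the classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.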


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1503
Author(s):  
Minsu Kim ◽  
Hongmyeong Kim ◽  
Jae Hak Jung

Various equations are being developed and applied to predict photovoltaic (PV) module generation. Quite diverse methods for predicting module generation are currently available, with most equations accurate to within 5% error. However, this accuracy is achieved only when the module temperature and the irradiation reaching the module surface are precisely known. In practice, the prediction accuracy of outdoor generation is extremely low because methods for predicting outdoor module temperature are themselves inaccurate: the module temperature cannot be predicted accurately because of the real-time change of irradiation and air temperature outdoors. Calculations using conventional equations from other studies show a mean error of temperature difference of 4.23 °C. In this study, an equation was developed and verified that can predict the module temperature to within 1.64 °C, based on the experimental data obtained after installing an actual outdoor module.
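The abstract does not say which conventional equations were evaluated. As an illustration only, the sketch below uses one widely known estimate, the NOCT (nominal operating cell temperature) model, and scores it against measurements with a mean absolute error. The choice of model, the NOCT value of 45 °C, the error metric, and the sample readings are all assumptions, not values from the study.

```python
def noct_module_temperature(air_temp_c, irradiance_w_m2, noct_c=45.0):
    """Estimate module temperature (°C) with the NOCT model.

    NOCT is defined at 800 W/m^2 irradiance and 20 °C ambient temperature,
    so the temperature rise scales linearly with irradiance.
    """
    return air_temp_c + (noct_c - 20.0) / 800.0 * irradiance_w_m2


def mean_absolute_error(predicted, measured):
    """Average absolute difference between predictions and measurements."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


# Hypothetical readings: (air temperature °C, irradiance W/m^2, measured module °C).
samples = [(25.0, 800.0, 48.0), (20.0, 400.0, 34.5), (30.0, 1000.0, 59.0)]
predictions = [noct_module_temperature(t, g) for t, g, _ in samples]
mae = mean_absolute_error(predictions, [m for _, _, m in samples])
print(f"mean absolute error = {mae:.2f} °C")
```

A static model of this kind ignores wind and thermal lag, which is consistent with the abstract's point that rapidly changing outdoor conditions make module temperature hard to predict.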


SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A166-A166
Author(s):  
Ankita Paul ◽  
Karen Wong ◽  
Anup Das ◽  
Diane Lim ◽  
Miranda Tan

Abstract
Introduction: Cancer patients are at an increased risk of moderate-to-severe obstructive sleep apnea (OSA). The STOP-Bang score is a commonly used screening questionnaire to assess risk of OSA in the general population. We hypothesize that cancer-relevant features, like radiation therapy (RT), may be used to determine the risk of OSA in cancer patients. Machine learning (ML) with non-parametric regression is applied to increase the prediction accuracy of OSA risk.
Methods: Ten features, namely STOP-Bang score, history of RT to the head/neck/thorax, cancer type, cancer stage, metastasis, hypertension, diabetes, asthma, COPD, and chronic kidney disease, were extracted from a database of cancer patients with a sleep study. The ML technique, K-Nearest-Neighbor (KNN), with a range of k values (5 to 20), was chosen because, unlike Logistic Regression (LR), KNN makes no assumptions about the data distribution or mapping function and supports non-linear relationships among features. A correlation heatmap was computed to identify features having high correlation with OSA. Principal Component Analysis (PCA) was performed on the correlated features and then KNN was applied on the components to predict the risk of OSA. Receiver Operating Characteristic (ROC) - Area Under Curve (AUC) and Precision-Recall curves were computed to compare and validate performance for different test sets and majority class scenarios.
Results: In our cohort of 174 cancer patients, the accuracy in determining OSA among cancer patients using STOP-Bang score was 82.3% (LR) and 90.69% (KNN), but it fell to 89.9% in KNN using all 10 features mentioned above. Applying PCA + KNN with STOP-Bang score and RT as features increased prediction accuracy to 94.1%. We validated our ML approach using a separate cohort of 20 cancer patients; the accuracies in OSA prediction were 85.57% (LR), 91.1% (KNN), and 92.8% (PCA + KNN).
Conclusion: STOP-Bang score and history of RT can be useful to predict risk of OSA in cancer patients with the PCA + KNN approach. This ML technique can refine screening tools to improve prediction accuracy of OSA in cancer patients. Larger studies investigating additional features using ML may improve OSA screening accuracy in various populations.
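The methods above sweep KNN over k values from 5 to 20. A minimal sketch of such a sweep, scored with leave-one-out accuracy, might look as follows. The PCA step is omitted for brevity, the two features stand in for STOP-Bang score and RT history, and the data, the labeling rule, the tie-breaking rule, and the validation scheme are synthetic assumptions rather than the study's cohort or protocol.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # Assumption: ties are broken toward the larger label (1 = at risk).
    return max(votes, key=lambda label: (votes[label], label))


def leave_one_out_accuracy(xs, ys, k):
    """Hold out each sample in turn, predict it from the rest, and average hits."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return hits / len(xs)


# Synthetic stand-in data: (screening score 0..8, RT history 0/1), made-up label rule.
xs = [(float(i % 9), float(i % 3 == 0)) for i in range(30)]
ys = [1 if score + 2 * rt >= 4 else 0 for score, rt in xs]

accuracy_by_k = {k: leave_one_out_accuracy(xs, ys, k) for k in range(5, 21)}
best_k = max(accuracy_by_k, key=accuracy_by_k.get)
print(f"best k = {best_k}, accuracy = {accuracy_by_k[best_k]:.3f}")
```

With a cohort as small as the one described, leave-one-out evaluation uses nearly all samples for training at each step; a held-out validation cohort, as in the abstract, remains the stronger check.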


Proceedings ◽  
2020 ◽  
Vol 78 (1) ◽  
pp. 5
Author(s):  
Raquel de Melo Barbosa ◽  
Fabio Fonseca de Oliveira ◽  
Gabriel Bezerra Motta Câmara ◽  
Tulio Flavio Accioly de Lima e Moura ◽  
Fernanda Nervo Raffin ◽  
...  

Nano-hybrid formulations combine organic and inorganic materials in self-assembled platforms for drug delivery. Laponite is a synthetic clay, biocompatible, and a guest of compounds. Poloxamines are amphiphilic four-armed compounds and have pH-sensitive and thermosensitive properties. The association of Laponite and Poloxamine can be used to improve attachment to drugs and to increase the solubility of β-Lapachone (β-Lap). β-Lap has antiviral, antiparasitic, antitumor, and anti-inflammatory properties. However, the low water solubility of β-Lap limits its clinical and medical applications. All samples were prepared by mixing Tetronic 1304 and LAP in a range of 1–20% (w/w) and 0–3% (w/w), respectively. The β-Lap solubility was analyzed by UV-vis spectrophotometry, and physical behavior was evaluated across a range of temperatures. The analysis of data consisted of response surface methodology (RSM), and two kinds of machine learning (ML): multilayer perceptron (MLP) and support vector machine (SVM). The ML techniques, generated from a training process based on experimental data, obtained the best correlation coefficient adjustment for drug solubility and adequate physical classifications of the systems. The SVM method presented the best fit results of β-Lap solubilization. In silico tools promoted fine-tuning, and near-experimental data show β-Lap solubility and classification of physical behavior to be an excellent strategy for use in developing new nano-hybrid platforms.


Author(s):  
Siwei Song ◽  
Fang Chen ◽  
Yi Wang ◽  
Kangcai Wang ◽  
Mi Yan ◽  
...  

With the growth of chemical data, computation power and algorithms, machine learning-assisted high-throughput virtual screening (ML-assisted HTVS) is revolutionizing the research paradigm of new materials. Herein, a combined ML-assisted HTVS...


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Andrew K. C. Wong ◽  
Pei-Yuan Zhou ◽  
Zahid A. Butt

Abstract Machine Learning has made impressive advances in many applications akin to human cognition for discernment. However, success has been limited in the areas of relational datasets, particularly for data with low volume, imbalanced groups, and mislabeled cases, with outputs that typically lack transparency and interpretability. The difficulties arise from the subtle overlapping and entanglement of functional and statistical relations at the source level. Hence, we have developed the Pattern Discovery and Disentanglement System (PDD), which is able to discover explicit patterns from data of various sizes and with imbalanced groups, and to screen out anomalies. We present herein four case studies on biomedical datasets to substantiate the efficacy of PDD. It improves prediction accuracy and facilitates transparent interpretation of discovered knowledge in an explicit representation framework, the PDD Knowledge Base, that links the sources, the patterns, and individual patients. Hence, PDD promises broad and ground-breaking applications in genomic and biomedical machine learning.


2021 ◽  
Author(s):  
Hussain AlBahrani ◽  
Nobuo Morita

Abstract In many drilling scenarios that include deep wells and highly stressed environments, the mud weight required to completely prevent wellbore instability can be impractically high. In such cases, what is known as a risk-controlled wellbore stability criterion is introduced. This criterion allows for a certain level of wellbore instability to take place. This means that the mud weight calculated using this criterion will only constrain wellbore instability to a certain manageable level, hence the name risk-controlled. Conventionally, the allowable level of wellbore instability in this type of model has always been based on the magnitude of the breakout angle. However, wellbore enlargements, as seen in calipers and image logs, can be highly irregular in terms of their distribution around the wellbore. This irregularity means that risk-controlling the wellbore instability through the breakout angle might not always be sufficient. Instead, the total volume of cavings is introduced as the risk control parameter for wellbore instability. Unlike the breakout angle, the total volume of cavings can be coupled with a suitable hydraulics model to determine the threshold of manageable instability. The expected total volume of cavings is determined using a machine learning (ML) assisted 3D elasto-plastic finite element model (FEM). The FEM works to model the interval of interest, which eventually provides a description of the stress distribution around the wellbore. The ML algorithm works to learn the patterns and limits of rock failure in a supervised training manner based on the wellbore enlargement seen in calipers and image logs from nearby offset wells. Combining the FEM output with the ML algorithm leads to an accurate prediction of shear failure zones. The model is able to predict both the radial and circumferential distribution of enlargements at any mud weight and stress regime, which leads to a determination of the expected total volume of cavings.
The model implementation is first validated through experimental data. The experimental data is based on true-triaxial tests of bored core samples. Next, a full dataset from offset wells is used to populate and train the model. The trained model is then used to produce estimations of risk-controlled stability mud weights for different drilling scenarios. The model results are compared against those produced by conventional methods. Finally, both the FEM-ML model and the conventional methods results are compared against the drilling experience of the offset wells. This methodology provides a more comprehensive and new solution to risk controlling wellbore instability. It relies on a novel process which learns rock failure from calipers and image logs.

