Improving prediction accuracy of high-performance materials via modified machine learning strategy

2022 ◽  
Vol 204 ◽  
pp. 111181
Wei Yong ◽  
Hongtao Zhang ◽  
Huadong Fu ◽  
Yaliang Zhu ◽  
Jie He ◽  
2020 ◽  
Vol 6 (1) ◽  
Zhichao Lu ◽  
Xin Chen ◽  
Xiongjun Liu ◽  
Deye Lin ◽  
Yuan Wu ◽  

Abstract: Fe-based metallic glasses (MGs) have been extensively investigated for their unique properties, especially their outstanding soft-magnetic properties. However, conventional design of soft-magnetic Fe-based MGs relies heavily on "trial and error" experiments, making it difficult to balance saturation flux density (Bs) and thermal stability because of the strong interplay between glass formation and magnetic interaction. Herein, we report an eXtreme Gradient Boosting (XGBoost) machine-learning (ML) model for developing advanced Fe-based MGs with a good combination of Bs and thermal stability. As an early attempt to apply ML to exploring soft-magnetic properties and thermal stability, the developed XGBoost model, based on intrinsic elemental properties (i.e., atomic size and electronegativity), predicts Bs and Tx (the onset crystallization temperature) with accuracies of 93.0% and 94.3%, respectively. More importantly, we derived from the ML model the key features that primarily dictate Bs and Tx of Fe-based MGs, which reveals the physical origins underlying high Bs and thermal stability. As a proof of concept, several Fe-based MGs with high Tx (>800 K) and high Bs (>1.4 T) were successfully developed guided by the ML model. This work demonstrates that the XGBoost ML approach is interpretable and feasible for extracting the decisive parameters for the properties of Fe-based magnetic MGs, which may allow us to efficiently design high-performance glassy materials.
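The feature construction this abstract mentions, composition-weighted averages of intrinsic elemental properties such as atomic size and electronegativity, can be sketched in a few lines of Python. The property values and the Fe80B20 composition below are illustrative assumptions, not data from the study:

```python
# Sketch of composition-weighted elemental descriptors of the kind used
# as ML inputs. Property values below are approximate reference numbers;
# the Fe80B20 composition is a hypothetical example alloy.
ATOMIC_RADIUS_PM = {"Fe": 126, "B": 85, "Si": 111, "P": 107}
ELECTRONEGATIVITY = {"Fe": 1.83, "B": 2.04, "Si": 1.90, "P": 2.19}

def weighted_feature(composition, table):
    """Composition-weighted mean of an elemental property."""
    total = sum(composition.values())
    return sum(frac * table[el] for el, frac in composition.items()) / total

alloy = {"Fe": 80, "B": 20}  # hypothetical Fe80B20 glass former (at.%)
mean_radius = weighted_feature(alloy, ATOMIC_RADIUS_PM)   # 117.8 pm
mean_chi = weighted_feature(alloy, ELECTRONEGATIVITY)     # 1.872
```

Descriptors like these, rather than raw compositions, are what make the model's feature importances physically interpretable.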

Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
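The Pareto-optimal trade-off search this abstract refers to can be illustrated with a minimal stdlib sketch: keep only the (runtime, energy) configurations not dominated by any other. The measurements below are hypothetical:

```python
def pareto_front(points):
    """Return the points not dominated by any other (minimizing both).

    Each point is (runtime_s, energy_j); q dominates p if q is no worse
    in both objectives and is a different point.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (runtime, energy) measurements for candidate configurations.
runs = [(10.0, 500.0), (12.0, 400.0), (11.0, 450.0), (13.0, 470.0)]
print(pareto_front(runs))  # -> [(10.0, 500.0), (12.0, 400.0), (11.0, 450.0)]
```

In the study's setting, the model predicts these (runtime, energy) pairs from a handful of training runs instead of measuring every configuration, which is where the 90% reduction in measurement time comes from.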

Diagnostics ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 574
Gennaro Tartarisco ◽  
Giovanni Cicceri ◽  
Davide Di Pietro ◽  
Elisa Leonardi ◽  
Stefania Aiello ◽  

In the past two decades, several screening instruments have been developed to detect toddlers who may be autistic, in both clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative, normally distributed measure of autistic traits that demonstrates good psychometric properties across settings and cultures. Recently, machine learning (ML) has been applied in behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from non-autistic ones. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, with the SVM being the most effective, able to classify autism with 95% accuracy. Furthermore, using the SVM-recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the 3 best discriminating items common to ours and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT and supports the application of ML to create shorter and faster versions of the instrument that maintain high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.
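The RFE item selection can be sketched as a simple elimination loop. Note that real SVM-RFE refits the model and recomputes importances after each elimination; in this sketch the per-item importances are fixed, hypothetical numbers, purely for illustration:

```python
# Minimal sketch of recursive feature elimination: repeatedly drop the
# least useful item until the target subset size is reached. A real
# SVM-RFE would retrain and re-score after every drop.
def rfe(items, importance, keep):
    """Drop the lowest-importance item until `keep` items remain."""
    selected = list(items)
    while len(selected) > keep:
        worst = min(selected, key=importance)
        selected.remove(worst)
    return selected

# Hypothetical per-item importances for 5 Q-CHAT items.
scores = {"q1": 0.30, "q2": 0.05, "q3": 0.22, "q4": 0.10, "q5": 0.18}
print(rfe(scores, scores.get, 3))  # -> ['q1', 'q3', 'q5']
```

The same loop run down to 14 (or 3) retained items mirrors how the study produced its shortened versions of the instrument.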

Anik Das ◽  
Mohamed M. Ahmed

Accurate lane-change prediction information in real time is essential for safely operating Autonomous Vehicles (AVs) on roadways, especially in the early stage of AV deployment, when AVs will interact with human-driven vehicles. This study proposed reliable lane-change prediction models considering features from vehicle kinematics, machine vision, the driver, and roadway geometric characteristics, using the trajectory-level SHRP2 Naturalistic Driving Study and Roadway Information Database. Several machine learning algorithms were trained, validated, tested, and comparatively analyzed, including Classification And Regression Trees (CART), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naïve Bayes (NB), based on six different sets of features. In each feature set, relevant features were extracted through a wrapper-based algorithm named Boruta. The results showed that the XGBoost model outperformed all other models, with the highest overall prediction accuracy (97%) and F1-score (95.5%) considering all features. Moreover, an even higher overall prediction accuracy of 97.3% and F1-score of 95.9% were observed for the XGBoost model based on vehicle kinematics features alone. XGBoost was also the only model that achieved reliable and balanced prediction performance across all six feature sets. Furthermore, a simplified XGBoost model was developed for each feature set with practical implementation of the model in mind. The proposed prediction model could help in trajectory planning for AVs and could be used to develop more reliable advanced driver assistance systems (ADAS) in a cooperative connected and automated vehicle environment.
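As one concrete example of the tree-based models compared above, the split criterion at the heart of CART can be sketched in plain Python: choose the feature threshold that minimizes weighted Gini impurity. The lateral-speed feature and lane-change labels below are hypothetical:

```python
# Minimal sketch of CART's split search on one feature: pick the
# threshold with the lowest weighted Gini impurity. Boosted ensembles
# like XGBoost and AdaBoost build many such trees.
def gini(labels):
    """Gini impurity of a binary label list."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of positive (lane-change) samples
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Return (threshold, impurity) of the best binary split on xs."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical lateral-speed feature (m/s) vs. lane-change label.
speeds = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
print(best_split(speeds, labels))  # -> (0.2, 0.0)
```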

Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 656
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security in IoT networks is currently mandatory, due to the high volume of data that has to be handled. These systems are vulnerable to a variety of cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques have to be developed that are as accurate as possible for these scenarios. Intrusion detection systems based on machine learning algorithms have already shown high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. For its evaluation, this research uses two benchmark datasets, UGR16 and UNSW-NB15, as well as one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated using scaling and normalization functions. All of these preprocessing models were applied to different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics, and finally a group composed of traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, accuracy can be enhanced by up to 45%. The preprocessing of a specific group of characteristics allows for greater accuracy, enabling the machine learning algorithm to correctly classify the parameters related to possible attacks.
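Two preprocessing functions of the kind the study evaluates, min-max normalization and z-score standardization applied per feature, can be sketched as follows; the duration values are illustrative, not taken from the datasets:

```python
# Two common per-feature preprocessing transforms: min-max scales a
# feature into [0, 1]; z-score centers it and divides by the sample
# standard deviation.
from statistics import mean, stdev

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

durations = [2.0, 4.0, 6.0, 8.0]  # hypothetical connection durations (s)
scaled = min_max(durations)       # endpoints map to 0.0 and 1.0
standardized = z_score(durations) # zero mean, unit sample std
```

Which transform works best depends on the feature group, which is exactly the comparison the study carries out across its four categories of characteristics.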

2021 ◽  
Vol 209 ◽  
pp. 104493
Haili Liao ◽  
Hanyu Mei ◽  
Gang Hu ◽  
Bo Wu ◽  
Qi Wang

2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A166-A166
Ankita Paul ◽  
Karen Wong ◽  
Anup Das ◽  
Diane Lim ◽  
Miranda Tan

Abstract
Introduction: Cancer patients are at increased risk of moderate-to-severe obstructive sleep apnea (OSA). The STOP-Bang score is a commonly used screening questionnaire to assess the risk of OSA in the general population. We hypothesize that cancer-relevant features, such as radiation therapy (RT), may be used to determine the risk of OSA in cancer patients. Machine learning (ML) with non-parametric regression is applied to increase the prediction accuracy of OSA risk.
Methods: Ten features, namely STOP-Bang score, history of RT to the head/neck/thorax, cancer type, cancer stage, metastasis, hypertension, diabetes, asthma, COPD, and chronic kidney disease, were extracted from a database of cancer patients with a sleep study. The ML technique K-Nearest-Neighbor (KNN), with a range of k values (5 to 20), was chosen because, unlike Logistic Regression (LR), KNN makes no assumptions about the data distribution or mapping function and supports non-linear relationships among features. A correlation heatmap was computed to identify features highly correlated with OSA. Principal Component Analysis (PCA) was performed on the correlated features, and KNN was then applied to the components to predict the risk of OSA. Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) and Precision-Recall curves were computed to compare and validate performance for different test sets and majority-class scenarios.
Results: In our cohort of 174 cancer patients, the accuracy in determining OSA using the STOP-Bang score was 82.3% (LR) and 90.69% (KNN), but dropped to 89.9% in KNN using all 10 features mentioned above. Applying PCA + KNN with STOP-Bang score and RT as features increased prediction accuracy to 94.1%. We validated our ML approach on a separate cohort of 20 cancer patients; the accuracies in OSA prediction were 85.57% (LR), 91.1% (KNN), and 92.8% (PCA + KNN).
Conclusion: STOP-Bang score and history of RT can be useful to predict the risk of OSA in cancer patients with the PCA + KNN approach. This ML technique can refine screening tools to improve the prediction accuracy of OSA in cancer patients. Larger studies investigating additional features using ML may improve OSA screening accuracy in various populations.
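The KNN classifier this abstract describes can be sketched with the standard library alone: predict a patient's OSA risk by majority vote among the k nearest patients in feature space. The feature vectors (STOP-Bang score, RT flag) and labels below are hypothetical:

```python
# Minimal KNN sketch: classify a query patient by majority vote among
# the k nearest training patients (Euclidean distance). In the study,
# PCA components would replace the raw features shown here.
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); returns majority label."""
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical patients: (STOP-Bang score, RT to head/neck/thorax flag).
patients = [
    ((6, 1), "high"),
    ((5, 1), "high"),
    ((7, 0), "high"),
    ((2, 0), "low"),
    ((1, 0), "low"),
]
print(knn_predict(patients, (6, 1), k=3))  # -> high
```

With features on very different scales, the distances would be dominated by the larger-valued feature, so in practice the inputs are standardized (or projected by PCA, as in the study) before the vote.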
