Machine Learning Assisted Design for Active Cathode Materials

Volume 3: Advanced Materials: Design, Processing, Characterization, and Applications ◽

10.1115/imece2020-23963 ◽

2020 ◽

Author(s):

Sihan Yong ◽

Zhuoyuan Zheng ◽

Pingfeng Wang ◽

Yumeng Li

Keyword(s):

Machine Learning ◽

Random Forest ◽

Mean Squared Error ◽

Computational Simulation ◽

Material Design ◽

Machine Learning Algorithms ◽

Training Data ◽

Coefficient Of Determination ◽

Crystal System ◽

Wide Range

Abstract Traditional approaches to materials design, including experimental measurement and computational simulation, are not efficient. In recent years, machine learning has been considered a promising solution for material design. By observing previous data, machine learning finds patterns, learns from them, and predicts material properties. In this study, machine learning methods are used to discover new cathodes with better properties, covering both crystal system learning and property prediction. K-fold cross-validation is used to find the best training data within a limited dataset; nevertheless, increasing the percentage of training data ultimately results in better prediction performance. Random forest is found to give the highest average accuracy in crystal system classification, while the extremely randomized trees algorithm provides a higher average coefficient of determination and a lower mean squared error in the regression model predicting the electrical properties of cathodes. The random forest algorithm is chosen from a wide range of machine learning algorithms by means of Monte Carlo validation. Based on the feature importance evaluation, oxygen content is found to have the greatest effect on gravimetric capacity and volume change in property prediction.
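The K-fold cross-validation mentioned in the abstract above can be sketched as follows. This is a minimal, standard-library illustration of how the folds are formed, not the authors' implementation; the function name `k_fold_indices`, the seed, and the sample count are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Toy usage: 10 samples split into 3 folds.
folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Each sample lands in exactly one test fold, so every data point is used for both training and evaluation, which is the usual motivation for the technique when the dataset is small.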

PSIX-15 Assessment of machine learning algorithms for prediction of Aleutian disease in American mink

Journal of Animal Science ◽

10.1093/jas/skab235.484 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 264-265

Author(s):

Duy Ngoc Do ◽

Guoyu Hu ◽

Younes Miar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Models ◽

American Mink ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Enzyme Linked Immunosorbent Assay ◽

Linear Discriminant ◽

Machine Learning Classification

Abstract American mink (Neovison vison) is the major source of fur for the fur industry worldwide, and Aleutian disease (AD) causes severe financial losses to the mink industry. Different methods have been used to diagnose AD in mink, but a combination of several methods may be the most appropriate approach for selecting AD-resilient mink. The iodine agglutination test (IAT) and counterimmunoelectrophoresis (CIEP) are commonly employed in the test-and-remove strategy, while the enzyme-linked immunosorbent assay (ELISA) and packed-cell volume (PCV) methods are complementary. However, using multiple methods is expensive, which hinders the correct use of AD tests in selection. This research assessed AD classification based on machine learning algorithms. Aleutian disease was tested in 1,830 individuals using these tests on an AD-positive mink farm (Canadian Centre for Fur Animal Research, NS, Canada). The accuracy of classification for CIEP was evaluated based on sex information and the IAT, ELISA, and PCV test results, implemented in seven machine learning classification algorithms (Random Forest, Artificial Neural Networks, C50Tree, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) using the Caret package in R. The accuracy of prediction varied among the methods. Overall, Random Forest was the best-performing algorithm for the current dataset, with an accuracy of 0.89 on the training data and 0.94 on the testing data. Our work demonstrated the utility and relative ease of using machine learning algorithms to assess the CIEP information and, consequently, to reduce the cost of AD tests. However, further work requires the inclusion of production and reproduction information in the models and an extension of phenotype collection to increase the accuracy of current methods.

Random forest and extreme gradient boosting algorithms for streamflow modeling using vessel features, and tree-rings

10.21203/rs.3.rs-303081/v1 ◽

2021 ◽

Author(s):

Hossein Sahour ◽

Vahid Gholami ◽

Javad Torkman ◽

Mehdi Vazifedan ◽

Sirwe Saeedi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Tree Rings ◽

Test Site ◽

Machine Learning Algorithms ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Growing Seasons ◽

Extreme Gradient Boosting ◽

Streamflow Modeling

Abstract Monitoring the temporal variation of streamflow is necessary for many water resources management plans, yet such practices are constrained by the absence or paucity of data for many rivers around the world. Using a permanent river in the north of Iran as a test site, a machine learning framework was proposed to model the streamflow data in the three periods of growing seasons based on tree-rings and vessel features of the Zelkova carpinifolia species. First, full-disc samples were taken from 30 trees near the river, and the samples went through preprocessing, cross-dating, standardization, and time series analysis. Two machine learning algorithms, namely random forest (RF) and extreme gradient boosting (XGB), were used to model the relationships between dendrochronology variables (tree-rings and vessel features in the three periods of growing seasons) and the corresponding streamflow rates. The performance of each model was evaluated using statistical coefficients: the coefficient of determination (R-squared), Nash-Sutcliffe efficiency (NSE), and normalized root-mean-square error (NRMSE). The findings suggest that the XGB model deserves consideration in streamflow modeling given its apparently better performance (R-squared: 0.87; NSE: 0.81; and NRMSE: 0.43) compared with the RF model (R-squared: 0.82; NSE: 0.71; and NRMSE: 0.52). Further, the results showed that the models perform better in modeling normal and low flows than extremely high flows. Finally, the tested models were used to reconstruct the temporal streamflow during the past decades (1970–1981).
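The three evaluation coefficients named in the abstract above can be written out in a few lines. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: R-squared is taken as the squared Pearson correlation (which would explain why it is reported separately from NSE), and NRMSE is normalized by the population standard deviation of the observations. The toy observed and simulated values are invented for the example.

```python
import math


def pearson_r2(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / (spread of the observations)."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / sst


def nrmse(obs, sim):
    """RMSE divided by the standard deviation of the observations (assumed)."""
    n = len(obs)
    mean_o = sum(obs) / n
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    return rmse / std_o


# Toy data: the simulation tracks the trend perfectly but is biased by +1.
observed = [2.0, 4.0, 6.0, 8.0]
simulated = [3.0, 5.0, 7.0, 9.0]
r2_value = pearson_r2(observed, simulated)
nse_value = nse(observed, simulated)
nrmse_value = nrmse(observed, simulated)
print(r2_value, nse_value, nrmse_value)
```

In the toy data the correlation-based R-squared is perfect while NSE is penalized by the constant bias, which shows why the two coefficients can disagree for the same model. Under the normalization assumed here, NRMSE equals the square root of (1 - NSE).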

Prediction of Lung Cancer Risk using Random Forest Algorithm Based on Kaggle Data Set

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f7879.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1623-1630

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Random Forest ◽

Naive Bayes ◽

Early Stage ◽

Naïve Bayes ◽

Training Data ◽

Random Forest Algorithm ◽

Data Set ◽

Wide Range

As huge amounts of data accumulate, extracting the required data from the available information has become a challenge. Machine learning contributes to various fields. The fast-growing population has led to the emergence of a wide range of diseases. This in turn has created the need for machine learning models that use patients' datasets. According to analyses of datasets from different sources, cancer is the most hazardous disease and may cause the death of the patient. The conducted surveys state that cancer can be nearly cured in its initial stages, whereas in later stages it may cause the death of the affected person. One of the major types of cancer is lung cancer. Prediction depends heavily on past data, and detection is required in the early stages. The proposed work uses a machine learning algorithm to group individual details into categories in order to predict, at an early stage, whether a person is likely to develop cancer. The random forest algorithm is implemented and reaches an efficiency of 97%, higher than KNN and Naive Bayes. Further, the KNN algorithm does not learn anything from the training data but only uses it for classification, and Naive Bayes results in inaccurate predictions. The proposed system predicts the chance of lung cancer at three levels, namely low, medium, and high. Thus, mortality rates can be reduced significantly.

A Novel GIS-Based Random Forest Machine Algorithm for the Spatial Prediction of Shallow Landslide Susceptibility

Forests ◽

10.3390/f11010118 ◽

2020 ◽

Vol 11 (1) ◽

pp. 118 ◽

Cited By ~ 6

Author(s):

Viet-Hung Dang ◽

Nhat-Duc Hoang ◽

Le-Mai-Duyen Nguyen ◽

Dieu Tien Bui ◽

Pijush Samui

Keyword(s):

Machine Learning ◽

Random Forest ◽

Landslide Susceptibility ◽

Spatial Prediction ◽

Shallow Landslide ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

Conditioning Factors ◽

Susceptibility Modeling

This study developed and verified a new hybrid machine learning model, named random forest machine (RFM), for the spatial prediction of shallow landslides. RFM is a hybridization of two state-of-the-art machine learning algorithms, random forest classifier (RFC) and support vector machine (SVM), in which RFC is used to generate subsets from training data and SVM is used to build decision functions for these subsets. To construct and verify the hybrid RFM model, a shallow landslide database of the Lang Son area (northern Vietnam) was prepared. The database consisted of 101 shallow landslide polygons and 14 conditioning factors. The relevance of these factors for shallow landslide susceptibility modeling was assessed using the ReliefF method. Experimental results showed that the proposed RFM can achieve the desired prediction with an F1 score of roughly 0.96. The performance of the RFM was better than that of the benchmark approaches, including the SVM, RFC, and logistic regression. Thus, the newly developed RFM is a promising tool to help local authorities in shallow landslide hazard mitigation.
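For readers unfamiliar with the F1 score quoted above, the following sketch shows how it is computed from binary labels as the harmonic mean of precision and recall. The helper name `f1_score_binary` and the toy labels are assumptions for illustration; they are not data from the study.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = landslide cell, 0 = non-landslide cell.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
f1_value = f1_score_binary(y_true, y_pred)
print(f1_value)
```

Because F1 ignores true negatives, it is often preferred over plain accuracy when the positive class is the one of interest.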

Energy Audit System for Households using Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.g8895.0510721 ◽

2021 ◽

Vol 10 (7) ◽

pp. 33-36

Author(s):

Nagesh A.

Keyword(s):

Machine Learning ◽

Energy Consumption ◽

Random Forest ◽

Energy Demand ◽

Predictive Accuracy ◽

Machine Learning Algorithms ◽

Training Data ◽

Energy Audit ◽

Household Level ◽

Audit System

With the growth in population and the economy, the global demand for energy has increased considerably. A large share of this energy demand comes from houses. Because of this, energy efficiency in houses is considered one of the most important aspects of global sustainability. Machine learning algorithms have contributed heavily to predicting the amount of energy consumed at the household level. In this paper, energy audit systems using machine learning are developed to estimate the amount of energy consumed at the household level in order to identify probable areas where energy wastage in the household can be plugged. Each energy audit system is trained using one machine learning algorithm, with previous power consumption history as training data. By converting this data into knowledge, a satisfactory analysis of energy consumption is attained. In predicting energy consumption, the Linear Regression energy audit system achieves 82%, the Decision Tree system 86%, and the Random Forest system 91%; the learning methods were evaluated based on their predictive accuracy, ease of learning, and user-friendliness. The Random Forest energy audit system is superior to the other energy audit systems.

Design and Manufacture of a Multiband Rectangular Spiral-Shaped Microstrip Antenna Using EM-Driven and Machine Learning

Elektronika ir Elektrotechnika ◽

10.5755/j02.eie.27583 ◽

2021 ◽

Vol 27 (1) ◽

pp. 29-40

Author(s):

Ashrf Aoad

Keyword(s):

Machine Learning ◽

Microstrip Antenna ◽

Machine Learning Algorithms ◽

Mobile Systems ◽

Training Data ◽

Learning Models ◽

Prediction Ability ◽

Antenna Structure ◽

Wide Range ◽

Machine Learning Models

This paper presents a multiband rectangular microstrip antenna using spiral-shaped configurations. The antenna has been designed by combining microstrip and spiral configurations, with careful selection of the substrate material, the dimensions of the rectangular microstrip, the distance between the spiral turns, and the number of turns of the spiral. The efficiency and accuracy have also been improved using machine learning algorithms. Machine learning has been studied to model the proposed antenna based on the performance requirements, which requires sufficient training data to improve the accuracy. Three different machine learning models are applied to improve the accuracy and generalization performance and are compared with simulation and measurement results. Simulation, measurement, and machine learning results confirm that the proposed antenna is a new electrically small antenna operating over a wide range of high-frequency bands between 1 GHz–4 GHz. The machine learning models with the best prediction ability achieve mean square errors (MSE) of 0.03 and 0.05. The antenna structure and size are compatible with and suitable for several multi-band wireless mobile systems operating in the L-band and S-band. The results, such as directivity, Half-Power Beamwidth, Voltage Standing Wave Ratio (VSWR), and S-parameter curves, are analysed and compared with the numerical formulation for both spiral and microstrip antennas.

Predicting Limit-Setting Behavior of Gamblers Using Machine Learning Algorithms: A Real-World Study of Norwegian Gamblers Using Account Data

International Journal of Mental Health and Addiction ◽

10.1007/s11469-019-00166-2 ◽

2019 ◽

Cited By ~ 1

Author(s):

Michael Auer ◽

Mark D. Griffiths

Keyword(s):

Machine Learning ◽

Random Forest ◽

Test Data ◽

Predictive Analytics ◽

Learning Algorithm ◽

Responsible Gambling ◽

Machine Learning Algorithms ◽

Training Data ◽

Training Dataset ◽

Limit Setting

Abstract Player protection and harm minimization have become increasingly important in the gambling industry along with the promotion of responsible gambling (RG). Among the most widespread RG tools that gaming operators provide are limit-setting tools that help players limit the amount of time and/or money they spend gambling. Research suggests that limit-setting significantly reduces the amount of money that players spend. If limit-setting is to be encouraged as a way of facilitating responsible gambling, it is important to know what variables are important in getting individuals to set and change limits in the first place. In the present study, 33 variables assessing the player behavior among Norsk Tipping clientele (N = 70,789) from January to March 2017 were computed. The 33 variables which reflect the players’ behavior were then used to predict the likelihood of gamblers changing their monetary limit between April and June 2017. The 70,789 players were randomly split into a training dataset of 56,532 and an evaluation set of 14,157 players (corresponding to an 80/20 split). The results demonstrated that it is possible to predict future limit-setting based on player behavior. The random forest algorithm appeared to predict limit-changing behavior much better than the other algorithms. However, on the independent test data, the random forest algorithm’s accuracy dropped significantly. The best performance on the test data along with a small decrease in accuracy in comparison to the training data was delivered by the gradient boost machine learning algorithm. The most important variables predicting future limit-setting using the gradient boost machine algorithm were players receiving feedback that they had reached 80% of their personal monthly global loss limit, personal monthly loss limit, the amount bet, theoretical loss, and whether the players had increased their limits in the past. With the help of predictive analytics, players with a high likelihood of changing their limits can be proactively approached.
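The random 80/20 split and the comparison of training and test accuracy described above can be sketched as follows. This is a generic illustration, not the study's pipeline: the helper names, the seed, the 1,000 stand-in player IDs, and the toy labels are all assumptions.

```python
import random


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of the items and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical player IDs standing in for the real account data.
player_ids = range(1000)
train_ids, test_ids = train_test_split(player_ids)
print(len(train_ids), len(test_ids))

# Toy labels (1 = changed limit) showing an accuracy drop on held-out data.
train_accuracy = accuracy([1, 0, 1, 1, 0], [1, 0, 1, 1, 0])
test_accuracy = accuracy([1, 0, 1, 0], [1, 1, 0, 0])
generalization_gap = train_accuracy - test_accuracy
print(train_accuracy, test_accuracy, generalization_gap)
```

A large gap between training and test accuracy, as reported for the random forest in the abstract, is the usual sign that a model has fitted the training set more closely than it generalizes.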

Predicting Future Products Rate using Machine Learning Algorithms

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2020.05.04 ◽

2020 ◽

Vol 12 (5) ◽

pp. 41-51

Author(s):

Shaimaa Mahmoud ◽

Mahmoud Hussein ◽

Arabi Keshk

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Mean Squared Error ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Regression ◽

Data Set ◽

Squared Error

Opinion mining in social network data is considered one of the most important research areas because a large number of users interact with different topics on these networks. This paper discusses the problem of predicting the future rating of products from users' comments. Researchers have approached this problem using machine learning algorithms (e.g. Logistic Regression, Random Forest Regression, Support Vector Regression, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression and Decision Tree). However, the accuracy of these techniques still needs to be improved. In this study, we introduce an approach for predicting future product ratings using LR, RFR, and SVR. Our data set consists of tweets and their ratings from 1 to 5. The main goal of our approach is to improve the prediction accuracy over existing techniques. SVR predicts the future product rating with a Mean Squared Error (MSE) of 0.4122, the Linear Regression model with an MSE of 0.4986, and Random Forest Regression with an MSE of 0.4770. This is better than the accuracy of the existing approaches.
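As a small worked illustration of the metric used above, the sketch below defines the mean squared error and ranks the three models by the MSE values quoted in the abstract. The toy ratings and predictions are invented for the example; only the three reported MSE values come from the text.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy ratings on the 1-to-5 scale (hypothetical values).
toy_mse = mean_squared_error([5, 3, 4, 1], [4.5, 3.0, 4.0, 2.0])
print(toy_mse)

# MSE values as quoted in the abstract; a lower value is better.
reported_mse = {
    "SVR": 0.4122,
    "Linear Regression": 0.4986,
    "Random Forest Regression": 0.4770,
}
best_model = min(reported_mse, key=reported_mse.get)
print(best_model)
```

Because the errors are squared, a single prediction that is off by one full rating point contributes as much as four predictions that are each off by half a point.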

Prediction of Potential Future IT Personnel in Bangladesh using Machine Learning Classifier

Global Disclosure of Economics and Business ◽

10.18034/gdeb.v6i1.112 ◽

2017 ◽

Vol 6 (1) ◽

pp. 7-18

Author(s):

Md. Hasnat Parvez ◽

Most. Moriom Khatun ◽

Sayed Mohsin Reza ◽

Md. Mahfujur Rahman ◽

Md. Fazlul Karim Patwary

Keyword(s):

Machine Learning ◽

Random Forest ◽

Direct Analysis ◽

Machine Learning Algorithms ◽

Training Data ◽

Accuracy Measurement ◽

Learning Classifier ◽

Future Potential ◽

Roc Area ◽

It Personnel

Bangladesh is one of the most promising developing countries in the IT sector, where people from several disciplines and with varied experience are involved. However, no direct analysis of this sector has yet been published that provides proper guidance for predicting future IT personnel. Because this is not a simple problem, training data from the real IT sector are needed, and several classifiers must be trained to obtain good results. Machine learning algorithms can be used for predicting potential future IT personnel. In this paper, four different classifiers, namely Naive Bayes, J48, Bagging, and Random Forest, are evaluated in five different folds for that prediction. The results show that Random Forest achieves better accuracy than the other classifiers tested for future IT personnel prediction. Standard accuracy measures such as Precision, Recall, F-Measure, and ROC Area are used to evaluate the results.

Use of machine learning techniques for predicting the bearing capacity of piles

Soils and Rocks ◽

10.28927/sr.2021.074921 ◽

2021 ◽

Vol 44 (4) ◽

pp. 1-14

Author(s):

Gomes Yago ◽

Filipe Verri ◽

Dimas Ribeiro

Keyword(s):

Machine Learning ◽

Bearing Capacity ◽

Mean Squared Error ◽

Precast Concrete ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Coefficient Of Determination ◽

Empirical Methods ◽

Learning Techniques ◽

Semi Empirical

Geotechnical engineers frequently rely on semi-empirical methods like Décourt-Quaresma’s and Meyerhof’s to estimate the bearing capacity of piles. This paper proposes alternatives to these methods, presenting an approach using machine learning models for predicting the bearing capacity of precast concrete piles. It uses data samples including 165 load tests, each one accompanied by an SPT sounding. This study proposes two types of analysis using two separate datasets, one based on the Décourt-Quaresma method and the other based on the Meyerhof method. Six machine learning algorithms of distinct biases are trained and tested with a leave-one-out cross-validation procedure, and the models’ predictive performance is assessed through two metrics: root mean squared error (RMSE) and coefficient of determination (R²). The best performing technique was random forest (RF) using the Décourt-Quaresma dataset, with an RMSE of 642.38. All other machine learning techniques obtained an RMSE below 710, outperforming Meyerhof’s and Décourt-Quaresma’s semi-empirical methods, which both obtained RMSE values close to 900. This study proposes 95% and 90% confidence intervals for the best technique employing a graphical interpretation, so that geotechnical engineers can choose which level of safety they wish to work with. Finally, the study presents a case study showing that the best performing models achieve a reasonable accuracy, surpassing the semi-empirical methods in two of the three piles considered. The representativity of the new examples within the datasets used explains the accuracy of the techniques.
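The leave-one-out cross-validation procedure with an RMSE metric, as described above, can be sketched in a few lines. The example uses a one-variable least-squares line purely as a stand-in model, since the abstract does not list the six algorithms' settings; the function names and the toy data are assumptions and do not come from the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_rmse(xs, ys):
    """Hold out each sample once, refit on the rest, and pool the errors."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((ys[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical data: one input feature and a measured capacity per test.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
exact_capacity = [2.0, 4.0, 6.0, 8.0, 10.0]
noisy_capacity = [2.1, 3.9, 6.2, 7.8, 10.1]
rmse_exact = loocv_rmse(feature, exact_capacity)
rmse_noisy = loocv_rmse(feature, noisy_capacity)
print(rmse_exact, rmse_noisy)
```

With n samples the model is refitted n times, which is affordable for a dataset of the size reported here (165 load tests) and makes use of every sample for testing.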