Predicting Fitness Centre Dropout

The phenomenon of dropout is often found among customers of sports services. In this study we intend to evaluate the performance of machine learning algorithms in predicting dropout using available data about their historic use of facilities. The data relating to a sample of 5209 members was taken from a Portuguese fitness centre and included the variables registration data, payments and frequency, age, sex, non-attendance days, amount billed, average weekly visits, total number of visits, visits hired per week, number of registration renewals, number of members referrals, total monthly registrations, and total member enrolment time, which may be indicative of members’ commitment. Whilst the Gradient Boosting Classifier had the best performance in predicting dropout (sensitivity = 0.986), the Random Forest Classifier was the best at predicting non-dropout (specificity = 0.790); the overall performance of the Gradient Boosting Classifier was superior to the Random Forest Classifier (accuracy 0.955 against 0.920). The most relevant variables predicting dropout were “non-attendance days”, “total length of stay”, and “total amount billed”. The use of decision trees provides information that can be readily acted upon to identify member profiles of those at risk of dropout, giving also guidelines for measures and policies to reduce it.

Download Full-text

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Extreme Gradient Boosting

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detect malware like packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers in order to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyper parameters of the algorithms, in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.

Download Full-text

Development and Evaluation of the Combined Machine Learning Models for the Prediction of Dam Inflow

Water ◽

10.3390/w12102927 ◽

2020 ◽

Vol 12 (10) ◽

pp. 2927

Author(s):

Jiyeong Hong ◽

Seoro Lee ◽

Joo Hyun Bae ◽

Jimin Lee ◽

Woon Ji Park ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Multilayer Perceptron ◽

Short Term Memory ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Dam Inflow

Predicting dam inflow is necessary for effective water management. This study created machine learning algorithms to predict the amount of inflow into the Soyang River Dam in South Korea, using weather and dam inflow data for 40 years. A total of six algorithms were used, as follows: decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), recurrent neural network–long short-term memory (RNN–LSTM), and convolutional neural network–LSTM (CNN–LSTM). Among these models, the multilayer perceptron model showed the best results in predicting dam inflow, with the Nash–Sutcliffe efficiency (NSE) value of 0.812, root mean squared errors (RMSE) of 77.218 m3/s, mean absolute error (MAE) of 29.034 m3/s, correlation coefficient (R) of 0.924, and determination coefficient (R2) of 0.817. However, when the amount of dam inflow is below 100 m3/s, the ensemble models (random forest and gradient boosting models) performed better than MLP for the prediction of dam inflow. Therefore, two combined machine learning (CombML) models (RF_MLP and GB_MLP) were developed for the prediction of the dam inflow using the ensemble methods (RF and GB) at precipitation below 16 mm, and the MLP at precipitation above 16 mm. The precipitation of 16 mm is the average daily precipitation at the inflow of 100 m3/s or more. The results show the accuracy verification results of NSE 0.857, RMSE 68.417 m3/s, MAE 18.063 m3/s, R 0.927, and R2 0.859 in RF_MLP, and NSE 0.829, RMSE 73.918 m3/s, MAE 18.093 m3/s, R 0.912, and R2 0.831 in GB_MLP, which infers that the combination of the models predicts the dam inflow the most accurately. CombML algorithms showed that it is possible to predict inflow through inflow learning, considering flow characteristics such as flow regimes, by combining several machine learning algorithms.

Download Full-text

A Comparative Study using Feature Selection to Predict the Behaviour of Bank Customers

E3S Web of Conferences ◽

10.1051/e3sconf/202018401011 ◽

2020 ◽

Vol 184 ◽

pp. 01011

Author(s):

Sreethi Musunuru ◽

Mahaalakshmi Mukkamala ◽

Latha Kunaparaju ◽

N V Ganapathi Raju

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Random Forest Classifier ◽

Customer Behavior ◽

Machine Learning Algorithms ◽

The Status ◽

Personal Level ◽

Near Future ◽

Structure Communication

Though banks hold an abundance of data on their customers in general, it is not unusual for them to track the actions of the creditors regularly to improve the services they offer to them and understand why a lot of them choose to exit and shift to other banks. Analyzing customer behavior can be highly beneficial to the banks as they can reach out to their customers on a personal level and develop a business model that will improve the pricing structure, communication, advertising, and benefits for their customers and themselves. Features like the amount a customer credits every month, his salary per annum, the gender of the customer, etc. are used to classify them using machine learning algorithms like K Neighbors Classifier and Random Forest Classifier. On classifying the customers, banks can get an idea of who will be continuing with them and who will be leaving them in the near future. Our study determines to remove the features that are independent but are not influential to determine the status of the customers in the future without the loss of accuracy and to improve the model to see if this will also increase the accuracy of the results.

Download Full-text

Random forest and extreme gradient boosting algorithms for streamflow modeling using vessel features, and tree-rings

10.21203/rs.3.rs-303081/v1 ◽

2021 ◽

Author(s):

Hossein Sahour ◽

Vahid Gholami ◽

Javad Torkman ◽

Mehdi Vazifedan ◽

Sirwe Saeedi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Tree Rings ◽

Test Site ◽

Machine Learning Algorithms ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Growing Seasons ◽

Extreme Gradient Boosting ◽

Streamflow Modeling

Abstract Monitoring temporal variation of streamflow is necessary for many water resources management plans, yet, such practices are constrained by the absence or paucity of data in many rivers around the world. Using a permanent river in the north of Iran as a test site, a machine learning framework was proposed to model the streamflow data in the three periods of growing seasons based on tree-rings and vessel features of the Zelkova carpinifolia species. First, full-disc samples were taken from 30 trees near the river, and the samples went through preprocessing, cross-dating, standardization, and time series analysis. Two machine learning algorithms, namely random forest (RF) and extreme gradient boosting (XGB), were used to model the relationships between dendrochronology variables (tree-rings and vessel features in the three periods of growing seasons) and the corresponding streamflow rates. The performance of each model was evaluated using statistical coefficients (coefficient of determination (R-squared), Nash-Sutcliffe efficiency (NSE), and root-mean-square error (NRMSE)). Findings demonstrate that consideration should be given to the XGB model in streamflow modeling given its apparent enhanced performance (R-squared: 0.87; NSE: 0.81; and NRMSE: 0.43) over the RF model (R-squared: 0.82; NSE: 0.71; and NRMSE: 0.52). Further, the results showed that the models perform better in modeling the normal and low flows compared to extremely high flows. Finally, the tested models were used to reconstruct the temporal streamflow during the past decades (1970–1981).

Download Full-text

Research on dairy products detection based on machine learning algorithm

MATEC Web of Conferences ◽

10.1051/matecconf/202235503008 ◽

2022 ◽

Vol 355 ◽

pp. 03008

Author(s):

Yang Zhang ◽

Lei Zhang ◽

Yabin Ma ◽

Jinsen Guan ◽

Zhaoxia Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Electronic Nose ◽

Milk Fat ◽

Dairy Products ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

In this study, an electronic nose model composed of seven kinds of metal oxide semiconductor sensors was developed to distinguish the milk source (the dairy farm to which milk belongs), estimate the content of milk fat and protein in milk, to identify the authenticity and evaluate the quality of milk. The developed electronic nose is a low-cost and non-destructive testing equipment. (1) For the identification of milk sources, this paper uses the method of combining the electronic nose odor characteristics of milk and the component characteristics to distinguish different milk sources, and uses Principal Component Analysis (PCA) and Linear Discriminant Analysis , LDA) for dimensionality reduction analysis, and finally use three machine learning algorithms such as Logistic Regression (LR), Support Vector Machine (SVM) and Random Forest (RF) to build a milk source (cow farm) Identify the model and evaluate and compare the classification effects. The experimental results prove that the classification effect of the SVM-LDA model based on the electronic nose odor characteristics is better than other single feature models, and the accuracy of the test set reaches 91.5%. The RF-LDA and SVM-LDA models based on the fusion feature of the two have the best effect Set accuracy rate is as high as 96%. (2) The three algorithms, Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost) and Random Forest (RF), are used to construct the electronic nose odor data for milk fat rate and protein rate. The method of estimating the model, the results show that the RF model has the best estimation performance( R2 =0.9399 for milk fat; R2=0.9301for milk protein). And it prove that the method proposed in this study can improve the estimation accuracy of milk fat and protein, which provides a technical basis for predicting the quality of dairy products.

Download Full-text

Abstract 13455: Predicting Emergency Department Disposition at Triage for Suspected Patients With Acute Coronary Syndrome Using Machine Learning Algorithms

Circulation ◽

10.1161/circ.142.suppl_3.13455 ◽

2020 ◽

Vol 142 (Suppl_3) ◽

Author(s):

Stephanie O Frisch ◽

Zeineb Bouzid ◽

Jessica Zègre-Hemsey ◽

Clifton W CALLAWAY ◽

Holli A Devon ◽

...

Keyword(s):

Machine Learning ◽

Acute Coronary Syndrome ◽

Critical Care ◽

Random Forest ◽

Characteristic Curve ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Coronary Syndrome ◽

Critical Care Admission

Introduction: Overcrowded emergency departments (ED) and undifferentiated patients make the provision of care and resources challenging. We examined whether machine learning algorithms could identify ED patients’ disposition (hospitalization and critical care admission) using readily available objective triage data among patients with symptoms suggestive of acute coronary syndrome (ACS). Methods: This was a retrospective observational cohort study of adult patients who were triaged at the ED for a suspected coronary event. A total of 162 input variables (k) were extracted from the electronic health record: demographics (k=3), mode of transportation (k=1), past medical/surgical history (k=57), first ED vital signs (k=7), home medications (k=31), symptomology (k=40), and the computer generated automatic interpretation of 12-lead electrocardiogram (k=23). The primary outcomes were hospitalization and critical care admission (i.e., admission to intensive or step-down care unit). We used 10-fold stratified cross validation to evaluate the performance of five machine learning algorithms to predict the study outcomes: logistic regression, naïve Bayes, random forest, gradient boosting and artificial neural network classifiers. We determined the best model by comparing the area under the receiver operating characteristic curve (AUC) of all models. Results: Included were 1201 patients (age 64±14, 39% female; 10% Black) with a total of 956 hospitalizations, and 169 critical care admissions. The best performing machine learning classifier for the outcome of hospitalization was gradient boosting machine with an AUC of 0.85 (95% CI, 0.82–0.89), 89% sensitivity, and F-score of 0.83; random forest classifier performed the best for the outcome of critical care admission with an AUC of 0.73 (95% CI, 0.70–0.77), 76% sensitivity, and F-score of 0.56. Conclusion: Predictive machine learning algorithms demonstrate excellent to good discriminative power to predict hospitalization and critical care admission, respectively. Administrators and clinicians could benefit from machine learning approaches to predict hospitalization and critical care admission, to optimize and allocate scarce ED and hospital resources and provide optimal care.

Download Full-text

RESEARCH OF APPLICATIONS OF MACHINE LEARNING ALGORITHMS IN IMPROVING OPC SOLUTIONS

International Forum “Microelectronics – 2020”. Joung Scientists Scholarship “Microelectronics – 2020”. XIII International conference «Silicon – 2020». XII young scientists scholarship for silicon nanostructures and devices physics, material science, process and analysis ◽

10.29003/m1647.silicon-2020/350-354 ◽

2020 ◽

Author(s):

Pavel Tryasoguzov ◽

Georgiy Teplov ◽

Alexey Kuzovkov

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Network Models ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Neural Network Models ◽

Machine Learning Methods ◽

Applications Of Machine Learning ◽

Topological Drawing

In this paper the effectiveness of machine learning methods for solving OPC problems was consider. The task was to determine the direction of displacement and the amount of displacement of the boundary of the segment of the topological drawing. The generated training database was used to train regression, random forest, gradient boosting, and feedforward convolutional neural network models.

Download Full-text

Heart Failure Detection Using Quantum-Enhanced Machine Learning and Traditional Machine Learning Techniques for Internet of Artificially Intelligent Medical Things

Wireless Communications and Mobile Computing ◽

10.1155/2021/1616725 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Yogesh Kumar ◽

Apeksha Koul ◽

Pushpendra Singh Sisodia ◽

Jana Shafi ◽

Verma Kavita ◽

...

Keyword(s):

Machine Learning ◽

Heart Failure ◽

Random Forest ◽

Learning Algorithms ◽

Failure Detection ◽

Random Forest Classifier ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Research Progress ◽

Record Management

Quantum-enhanced machine learning plays a vital role in healthcare because of its robust application concerning current research scenarios, the growth of novel medical trials, patient information and record management, procurement of chronic disease detection, and many more. Due to this reason, the healthcare industry is applying quantum computing to sustain patient-oriented attention to healthcare patrons. The present work summarized the recent research progress in quantum-enhanced machine learning and its significance in heart failure detection on a dataset of 14 attributes. In this paper, the number of qubits in terms of the features of heart failure data is normalized by using min-max, PCA, and standard scalar, and further, has been optimized using the pipelining technique. The current work verifies that quantum-enhanced machine learning algorithms such as quantum random forest (QRF), quantum K nearest neighbour (QKNN), quantum decision tree (QDT), and quantum Gaussian Naïve Bayes (QGNB) are better than traditional machine learning algorithms in heart failure detection. The best accuracy rate is (0.89), which the quantum random forest classifier attained. In addition to this, the quantum random forest classifier also incurred the best results in F 1 score, recall and, precision by (0.88), (0.93), and (0.89), respectively. The computation time taken by traditional and quantum-enhanced machine learning algorithms has also been compared where the quantum random forest has the least execution time by 150 microseconds. Hence, the work provides a way to quantify the differences between standard and quantum-enhanced machine learning algorithms to select the optimal method for detecting heart failure.

Download Full-text

A Study of Machine Learning Algorithms for DDoS Detection

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.34922 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 174-178

Author(s):

Sheikh Shehzad Ahmed

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithms ◽

Random Forest Classifier ◽

Attack Detection ◽

Machine Learning Algorithms ◽

The Internet ◽

Ddos Attacks ◽

Decision Tree Classifier ◽

Tree Classifier

The Internet is used practically everywhere in today's digital environment. With the increased use of the Internet comes an increase in the number of threats. DDoS attacks are one of the most popular types of cyber-attacks nowadays. With the fast advancement of technology, the harm caused by DDoS attacks has grown increasingly severe. Because DDoS attacks may readily modify the ports/protocols utilized or how they function, the basic features of these attacks must be examined. Machine learning approaches have also been used extensively in intrusion detection research. Still, it is unclear what features are applicable and which approach would be better suited for detection. With this in mind, the research presents a machine learning-based DDoS attack detection approach. To train the attack detection model, we employ four Machine Learning algorithms: Decision Tree classifier (ID3), k-Nearest Neighbors (k-NN), Logistic Regression, and Random Forest classifier. The results of our experiments show that the Random Forest classifier is more accurate in recognizing attacks.

Download Full-text

Classification of hazelnut cultivars: comparison of DL4J and ensemble learning algorithms

Notulae Botanicae Horti Agrobotanici Cluj-Napoca ◽

10.15835/nbha48412041 ◽

2020 ◽

Vol 48 (4) ◽

pp. 2316-2327

Author(s):

Caner KOC ◽

Dilara GERDAN ◽

Maksut B. EMİNOĞLU ◽

Uğur YEGÜL ◽

Bulent KOC ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Ensemble Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Performance Criteria ◽

Gradient Boosting ◽

Data Set

Classification of hazelnuts is one of the values adding processes that increase the marketability and profitability of its production. While traditional classification methods are used commonly, machine learning and deep learning can be implemented to enhance the hazelnut classification processes. This paper presents the results of a comparative study of machine learning frameworks to classify hazelnut (Corylus avellana L.) cultivars (‘Sivri’, ‘Kara’, ‘Tombul’) using DL4J and ensemble learning algorithms. For each cultivar, 50 samples were used for evaluations. Maximum length, width, compression strength, and weight of hazelnuts were measured using a caliper and a force transducer. Gradient boosting machine (Boosting), random forest (Bagging), and DL4J feedforward (Deep Learning) algorithms were applied in traditional machine learning algorithms. The data set was partitioned into a 10-fold-cross validation method. The classifier performance criteria of accuracy (%), error percentage (%), F-Measure, Cohen’s Kappa, recall, precision, true positive (TP), false positive (FP), true negative (TN), false negative (FN) values are provided in the results section. The results showed classification accuracies of 94% for Gradient Boosting, 100% for Random Forest, and 94% for DL4J Feedforward algorithms.

Download Full-text