Prediction of Problematic Smartphone Use: A Machine Learning Approach

Smartphone addiction has become a growing concern as the number of smartphone users has increased exponentially, yet it is difficult to predict which users are problematic from their individual usage characteristics. This study aimed to explore whether smartphone addiction level can be predicted from mobile phone log data. Through the Korea Internet and Security Agency (KISA), 29,712 respondents completed the Smartphone Addiction Scale developed in 2017. Combining basic personal characteristics with smartphone usage information, the data were analyzed using hypothesis tests and machine learning techniques (decision tree, random forest, and XGBoost). In total, 27 variables were used to predict smartphone addiction; accuracy was highest for the random forest model (82.59%) and lowest for the decision tree model (74.56%). The results showed that users’ general information, such as age group, job classification, and sex, contributed little to predicting their smartphone addiction level. The study provides directions for future work on detecting smartphone addiction from log data and suggests that more detailed smartphone log data may enable more accurate results.
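
All three models compared here are built from decision trees, which grow by choosing the split that most reduces class impurity. As a minimal sketch, assuming the CART-style Gini criterion (the abstract does not state which criterion or software was used) and a single hypothetical feature such as daily usage hours, the best threshold on one feature can be found like this:

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best split 'value <= t'.

    Candidate thresholds are the distinct observed values; each split is scored
    by the size-weighted Gini impurity of its left and right child nodes.
    """
    n = len(values)
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # a split must produce two non-empty children
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: daily usage hours and an at-risk (1) / normal (0) label.
hours = [1.0, 2.0, 2.5, 3.0, 6.0, 7.5, 8.0, 9.0]
at_risk = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_threshold(hours, at_risk))  # (3.0, 0.0)
```

A full tree repeats this search over every feature at every node. Random forests and gradient-boosted models such as XGBoost combine many trees (XGBoost scores splits with a gradient-based gain rather than Gini), which is one reason ensembles can outperform a single tree.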

Comparative Analysis of Machine Learning Techniques to Identify Churn for Telecom Data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.34.19210 ◽

2018 ◽

Vol 7 (3.34) ◽

pp. 291

Author(s):

M Malleswari ◽

R.J Manira ◽

Praveen Kumar ◽

Murugan .

Keyword(s):

Machine Learning ◽

Big Data ◽

Random Forest ◽

Decision Tree ◽

Apache Spark ◽

Machine Learning Techniques ◽

Churn Prediction ◽

Learning Techniques ◽

Boosted Tree ◽

Customer Attrition

Big data analytics has been the focus of large-scale data processing, and machine learning combined with big data has a great future in prediction. Churn prediction is one sub-domain of big data. Its advantage, especially in telecom, is preventing customer attrition. Churn prediction is a day-to-day affair involving millions, so a solution that prevents customer attrition can save a great deal. This paper compares three machine learning techniques, the Decision Tree, Random Forest and Gradient Boosted Tree algorithms, using Apache Spark. Apache Spark is a big data processing engine that provides in-memory processing, which increases processing speed. The analysis is performed by extracting features from the data set and training the models. Scala is a programming language that combines object-oriented and functional programming, which makes it a powerful language. The analysis is implemented using Apache Spark, and modelling is done using Scala ML. The accuracy was 86% for the Decision Tree model, 87% for the Random Forest model and 85% for the Gradient Boosted Tree model.
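
The main difference between a Random Forest and a single Decision Tree is that the forest trains many trees on bootstrap resamples of the data and combines their predictions by majority vote. The sketch below shows just those two ingredients in plain Python. It is not the Spark MLlib or Scala implementation used in the paper; the per-tree learner and per-split feature subsampling are omitted, and the customer IDs are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, i.e. one bootstrap sample."""
    return [rng.choice(rows) for _ in range(len(rows))]


def majority_vote(predictions):
    """Combine per-tree class predictions; ties go to the label seen first."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


rng = random.Random(42)  # fixed seed so the sample is reproducible
customers = ["c1", "c2", "c3", "c4", "c5"]  # hypothetical customer IDs
sample = bootstrap_sample(customers, rng)
print(len(sample))  # 5
print(majority_vote(["churn", "stay", "churn"]))  # churn
```

Each tree in a forest would be trained on its own bootstrap sample, and `majority_vote` would then merge the trees' predictions for a customer.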

Classification of Agriculture Farm Machinery Using Machine Learning and Internet of Things

Symmetry ◽

10.3390/sym13030403 ◽

2021 ◽

Vol 13 (3) ◽

pp. 403

Author(s):

Muhammad Waleed ◽

Tai-Won Um ◽

Tariq Kamal ◽

Syed Muhammad Usman

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Farm Machinery ◽

Learning Techniques

In this paper, we apply multi-class supervised machine learning techniques to classify agriculture farm machinery. Classifying farm machinery is important for automatically authenticating field activity in a remote setup. Without a sound machine recognition system, fraudulent activity may well take place. To address this need, we classify the machinery using five machine learning techniques: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and Gradient Boosting (GB). To train the models, we use the vibration and tilt of the machinery, recorded with accelerometer and gyroscope sensors, respectively. The machinery included a leveler, a rotavator and a cultivator. Preliminary analysis of the collected data revealed that the farm machinery (when in operation) showed large variations in vibration and tilt but had similar means. Additionally, vibration-based and tilt-based classification each achieve good accuracy when used alone (with vibration giving slightly better results than tilt). However, accuracy improves further when tilt and vibration are used together. Furthermore, all five machine learning algorithms achieve classification accuracy above 82%, with random forest performing best. Gradient boosting and random forest show slight over-fitting (about 9%), but both algorithms produce high testing accuracy. In terms of execution time, the decision tree takes the least time to train, while gradient boosting takes the most.
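
To illustrate why combining the two sensor streams can help, the sketch below concatenates hypothetical vibration and tilt summary features into one vector and classifies with k-nearest neighbours, one of the five techniques the paper compares. The feature values, k = 3, Euclidean distance, and the train-minus-test definition of over-fitting are all assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def combine(vibration, tilt):
    """Concatenate vibration and tilt feature vectors into one feature vector."""
    return list(vibration) + list(tilt)


def knn_predict(train, query, k=3):
    """Classify `query` by majority label among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def overfit_gap(train_acc, test_acc):
    """Train-minus-test accuracy, one simple way to quantify over-fitting."""
    return train_acc - test_acc


# Hypothetical (vibration_rms, tilt_std) readings per machine.
train = [
    (combine([0.9], [2.0]), "leveler"),
    (combine([1.0], [2.2]), "leveler"),
    (combine([2.1], [5.0]), "rotavator"),
    (combine([2.0], [5.3]), "rotavator"),
    (combine([3.2], [1.0]), "cultivator"),
    (combine([3.0], [1.2]), "cultivator"),
]
print(knn_predict(train, combine([2.05], [5.1])))  # rotavator
```

Because vibration and tilt are measured on different scales, standardizing each feature before computing distances would usually be advisable; the sketch skips this for brevity.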

Machine Learning (Neuronal Net, Random Forest, and C5.0 single decision tree) based on pXRF data as a tool to date sediment layers of the Nile Delta

10.5194/egusphere-egu21-15296 ◽

2021 ◽

Author(s):

Martin Seeliger ◽

Marina Altmeyer ◽

Andreas Ginau ◽

Robert Schiestl ◽

Jürgen Wunderlich

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Nile Delta ◽

Sediment Cores ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Surrounding Areas ◽

Sediment Layers

This paper presents the application of machine-learning techniques to pXRF data to establish a chronology for sediment cores around Tell Buto (Tell el-Fara´in) in the northwestern Nile Delta. As modern laboratories for dating techniques such as OSL or 14C are rare in Egypt and sample export is restricted, there are few opportunities to create a robust chronology, which is indispensable in modern Geoarchaeology. We therefore present a new approach to transfer archaeological age information gained at the excavation at Buto to corings of the wider Buto area. Sediments from archaeological outcrops and pits of known age are measured using pXRF to create a geochemical “fingerprint” for several historic eras. These “fingerprints” are then transferred to corings of the surrounding areas using machine-learning algorithms. This paper presents 1) the application of three different machine-learning approaches (Neuronal Net, Random Forest, and C5.0 decision tree) to check whether archaeological age information can be transferred to sediments far from the settlement mounds using pXRF data, 2) a comparison of all approaches and an evaluation of whether the easily comprehensible decision tree and Random Forest show results similar to the “black-box system” Neuronal Net, and finally 3) a case study that provides the results of Altmeyer et al. (in review) for Kom el-Gir, a further settlement mound a little north of Buto, with a chronostratigraphic framework based on this approach.

Reference: Altmeyer, M., Seeliger, M., Ginau, A., Schiestl, R. & J. Wunderlich (in review): Reconstruction of former channel systems in the northwestern Nile Delta (Egypt) based on corings and electrical resistivity tomography (ERT). (Submitted to E & G Quaternary Science Journal.)
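
The transfer step can be pictured, in a deliberately simplified form, as building a mean geochemical profile per era from dated samples and assigning each undated core sample to the closest profile. This nearest-centroid rule is a stand-in chosen for brevity. It is not the Neuronal Net, Random Forest or C5.0 models the authors used, and the element names and concentrations below are hypothetical.

```python
import math


def era_fingerprints(dated_samples):
    """Average element concentrations per era.

    `dated_samples` maps era -> list of dicts {element: concentration}.
    Returns era -> {element: mean concentration}.
    """
    fingerprints = {}
    for era, samples in dated_samples.items():
        elements = samples[0].keys()
        fingerprints[era] = {
            el: sum(s[el] for s in samples) / len(samples) for el in elements
        }
    return fingerprints


def assign_era(sample, fingerprints):
    """Return the era whose fingerprint is closest (Euclidean) to `sample`."""
    def distance(fp):
        return math.sqrt(sum((sample[el] - fp[el]) ** 2 for el in fp))

    return min(fingerprints, key=lambda era: distance(fingerprints[era]))


# Hypothetical pXRF readings (ppm) from dated archaeological contexts.
dated = {
    "Era A": [{"Sr": 300.0, "Zr": 150.0}, {"Sr": 320.0, "Zr": 160.0}],
    "Era B": [{"Sr": 450.0, "Zr": 90.0}, {"Sr": 470.0, "Zr": 110.0}],
}
fps = era_fingerprints(dated)
print(assign_era({"Sr": 440.0, "Zr": 100.0}, fps))  # Era B
```

Real pXRF data would need element selection and scaling before distances between samples become meaningful; the tree-based and neural models in the paper learn such weighting from the data instead.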

Network Intrusion Detection System Using Random Forest and Decision Tree Machine Learning Techniques

First International Conference on Sustainable Technologies for Computational Intelligence - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-15-0029-9_50 ◽

2019 ◽

pp. 637-643

Author(s):

T. Tulasi Bhavani ◽

M. Kameswara Rao ◽

A. Manohar Reddy

Keyword(s):

Machine Learning ◽

Random Forest ◽

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection System ◽

Detection System ◽

Machine Learning Techniques ◽

Network Intrusion Detection ◽

Network Intrusion ◽

Learning Techniques

Machine Learning Based Indoor Localisation Using Wi-Fi And Smartphone

Journal of Independent Studies and Research - Computing ◽

10.31645/06 ◽

2020 ◽

Author(s):

Zulqarnain Khokhar ◽

◽

Murtaza Ahmed Siddiqi ◽

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Indoor Localization ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Smart Devices ◽

Gradient Boosting ◽

Learning Techniques ◽

Indoor Localisation

Wi-Fi-based indoor positioning using access points and smart devices has become an integral part of finding a device's or a person's location. Wi-Fi-based indoor localization has been among the most attractive fields for researchers for a number of years. In this paper, we present Wi-Fi-based indoor localization using three different machine learning techniques. The three algorithms implemented and compared are Decision Tree, Random Forest and Gradient Boosting classifier. After a fingerprint of the floor was created from Wi-Fi signals, these algorithms were used to identify the device location at thirty different positions on the floor. Random Forest and Gradient Boosting classifier identified the device location with accuracy higher than 90%, while Decision Tree achieved accuracy slightly higher than 80%.
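
A Wi-Fi fingerprint of the kind described is typically a vector of received signal strengths (RSSI), one entry per access point. The sketch below shows one common way to turn raw scans into fixed-length feature vectors that a Decision Tree, Random Forest or Gradient Boosting classifier could consume. The access-point identifiers, the -100 dBm fill value for unheard access points, and the averaging of repeated scans are assumptions, not details from the paper.

```python
MISSING_RSSI = -100.0  # assumed dBm value for an access point not heard in a scan


def scan_to_vector(scan, ap_ids):
    """Map one scan {ap_id: rssi_dbm} to a feature vector in fixed AP order."""
    return [float(scan.get(ap, MISSING_RSSI)) for ap in ap_ids]


def build_fingerprint(scans_by_position, ap_ids):
    """Average repeated scans at each surveyed position.

    Returns {position: mean RSSI vector}, a simple radio map of the floor.
    """
    radio_map = {}
    for position, scans in scans_by_position.items():
        vectors = [scan_to_vector(s, ap_ids) for s in scans]
        radio_map[position] = [sum(col) / len(col) for col in zip(*vectors)]
    return radio_map


aps = ["ap1", "ap2", "ap3"]  # hypothetical access-point identifiers
scans = {
    "P1": [{"ap1": -40, "ap2": -70}, {"ap1": -44, "ap2": -72}],
    "P2": [{"ap2": -50, "ap3": -60}],
}
print(build_fingerprint(scans, aps))
# {'P1': [-42.0, -71.0, -100.0], 'P2': [-100.0, -50.0, -60.0]}
```

For training, one would more likely keep each individual scan as a labelled row (position as the class) rather than only the averages; the averaged map is shown here because it matches the idea of a floor fingerprint.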

Differentiating Thrombotic Microangiopathies Based on Laboratory Tests Other Than ADAMTS13 Using Machine Learning Technology

Blood ◽

10.1182/blood.v128.22.3749.3749 ◽

2016 ◽

Vol 128 (22) ◽

pp. 3749-3749

Author(s):

Youngil Koh ◽

SuYeon Lee ◽

Hong-Seok Yun ◽

Sung-Soo Yoon ◽

Inho Kim ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Correlation Coefficient ◽

Machine Learning Techniques ◽

Learning Technology ◽

Thrombotic Microangiopathies ◽

Random Forest Method ◽

Learning Techniques

Abstract Introduction: ADAMTS13 activity level is crucial for differentiating thrombotic microangiopathies (TMAs). However, ADAMTS13 testing is not readily available on site in many parts of the world. Hence, we developed an innovative algorithm that allows thrombotic thrombocytopenic purpura (TTP) to be differentiated from other TMAs based on laboratory results other than ADAMTS13, using machine learning. Methods: Two hundred eight adult patients with either TTP (N=64) or a TMA other than TTP (N=144) (ADAMTS13 cutoff level of 10%) were classified using three machine learning techniques (decision tree, random forest, and neural network), based on a set of 19 easily measured clinical variables such as fever, Hb and ALT. No single clinical variable was strongly correlated with TTP (the absolute values of all correlation coefficients were below 0.5), so we applied machine learning algorithms. Patient data were first divided into three parts: training, test and validation sets. The three machine learning techniques (decision tree, random forest and neural network) were then applied. Principal component analysis was also performed. Results: As single variables, platelet count, BUN and total bilirubin were the three most important predictors for differentiating TTP from other TMAs, with an accuracy of 82%. The random forest method increased accuracy to 85%, with precision and recall of 0.828 and 0.832, respectively. Without optimization, the neural network did not perform better than the random forest method. Conclusion: Machine learning technology seems promising for differentiating TTP from other TMAs when the ADAMTS13 value is not available. These algorithms could support physicians in tailoring the management of TMA.
[Figure: Correlation coefficient in our study]
[Figure: Scheme of Random Forest method used in our study]
Disclosures: Lee: Samsung SDS: Employment. Yun: Samsung SDS: Employment.
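
Two quantities in this abstract are easy to compute directly: the correlation between one clinical variable and the TTP label (used to argue that no single variable suffices) and the precision and recall of a classifier. A minimal sketch, assuming the Pearson coefficient against a binary TTP indicator (1 = TTP), since the abstract does not name the coefficient; all values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    Assumes at least one predicted positive and one actual positive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fp), tp / (tp + fn)


ttp = [1, 1, 0, 0, 1, 0]               # hypothetical labels (1 = TTP)
fever = [1, 0, 0, 1, 1, 0]             # hypothetical binary fever indicator
print(round(pearson_r(fever, ttp), 2))  # screen: is |r| below 0.5?
print(precision_recall(ttp, [1, 0, 0, 1, 1, 0]))  # both 2/3 here
```

A screen over all 19 variables would simply loop `pearson_r` over each column and report the largest absolute value.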

153 Creation of a feed composition database: Machine learning techniques for automated classification of corn grain products, preliminary results

Journal of Animal Science ◽

10.1093/jas/skz258.303 ◽

2019 ◽

Vol 97 (Supplement_3) ◽

pp. 148-148

Author(s):

Andres A Schlageter-Tello ◽

Phil S Miller

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Large Datasets ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Feed Composition ◽

Corn Gluten ◽

Grain Products ◽

Corn Grain

Abstract Feed composition tables are commonly used to develop research projects and animal diets. Currently, the National Animal Nutrition Program aims to create a living database containing feed composition information, using large datasets provided by commercial laboratories. Using large datasets should ensure representative nutritional values for the feeds included in the database; however, managing large datasets requires computer code to manage and classify feeds correctly. Thus, the objective of this project was to develop two models based on supervised machine learning techniques for the automated classification of corn grain product samples. The database used in the study contained 88,057 samples of corn grain products resulting from the screening procedure previously described by Tran et al. (2016). Two types of supervised machine learning models were developed: decision tree and random forest. The parameters used for feed classification were dry matter, crude protein, neutral detergent fiber, ash, fat, and starch. Models were trained and validated using 70% and 30% of the dataset, respectively. The decision tree and random forest correctly classified 98.3% and 98.8% of the validation dataset, respectively. For each corn grain product, the performance of the decision tree and random forest was: corn germ = 91 and 91%; corn germ meal = 97 and 95%; corn gluten feed, dry = 99 and 100%; corn gluten feed, wet = 100 and 100%; corn gluten meal = 99 and 100%; corn grain, dry = 99 and 99%; corn grain, high moisture = 100 and 100%; corn grain, steam-flaked = 34 and 53%; corn hominy feed = 83 and 88%; and corn screenings = 44 and 60%, respectively. In conclusion, the random forest was superior to the decision tree for classifying corn grain products. Further development is required to improve model performance for classifying corn grain, steam-flaked and corn screenings.
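
The 70/30 train/validation split and the per-product accuracies reported above can be computed for any classifier's output with a few lines. The sketch assumes a simple shuffled split with a fixed seed (the abstract does not say whether the split was stratified by product) and uses hypothetical product labels.

```python
import random
from collections import defaultdict


def train_validation_split(rows, train_frac=0.7, seed=0):
    """Shuffle a copy of `rows` and split it into training and validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def per_class_accuracy(y_true, y_pred):
    """Share of validation samples of each true class that were classified correctly."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {label: correct[label] / total[label] for label in total}


train, valid = train_validation_split(range(10))
print(len(train), len(valid))  # 7 3
truth = ["germ", "germ", "screenings", "screenings"]
pred = ["germ", "germ", "screenings", "germ"]
print(per_class_accuracy(truth, pred))  # {'germ': 1.0, 'screenings': 0.5}
```

Per-class figures like these expose weak classes (here, the hypothetical "screenings") that a single overall accuracy near 98% would hide, which is the pattern the abstract reports for steam-flaked corn and corn screenings.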

Estimating the BIS Capital Adequacy Ratio for Korean Banks Using Machine Learning: Predicting by Variable Selection Using Random Forest Algorithms

Risks ◽

10.3390/risks9020032 ◽

2021 ◽

Vol 9 (2) ◽

pp. 32

Author(s):

Jaewon Park ◽

Minsoo Shin ◽

Wookjae Heo

Keyword(s):

Machine Learning ◽

Random Forest ◽

South Korea ◽

Ordinary Least Squares ◽

General Information ◽

Capital Adequacy ◽

Machine Learning Techniques ◽

Recursive Feature Elimination ◽

Learning Techniques ◽

Capital Adequacy Ratio

The purpose of this study is to find the most important variables for projecting the Bank for International Settlements’ (BIS) capital adequacy ratio, an index of a bank's financial soundness and a comprehensive, important measure of capital adequacy. This study analyzed the past 12 years of data from all domestic banks in South Korea. The research data include all financial information, such as key operating indicators, major business activities, and general information, from the Financial Supervisory Service of South Korea for 2008 to 2019. The study used the machine learning techniques Random Forest Boruta, Random Forest Recursive Feature Elimination, and Bayesian Regularization Neural Networks (BRNN). From 1929 variables, the study identified the 38 most important for representing the BIS capital adequacy ratio. An additional comparison was made to confirm the statistical validity of future prediction performance between BRNN and ordinary least squares (OLS) models. BRNN predicted the BIS capital adequacy ratio more robustly and accurately than the OLS models. The findings may interest policymakers, managers and practitioners in bank-related fields, because the study highlights key findings from data-driven approaches using machine learning techniques.
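
Recursive feature elimination, one of the selection methods named above, repeatedly fits a model, ranks features by importance and drops the weakest until the desired number remains. In the minimal sketch below, the model fit is replaced by a caller-supplied importance function (in Random Forest RFE this role would be played by the forest's importance scores, refitted each round); the feature names and scores are hypothetical.

```python
def recursive_feature_elimination(features, importance_fn, n_keep, step=1):
    """Drop the least important features until `n_keep` remain.

    `importance_fn(feature_list)` must return {feature: importance} for the
    current subset. It is re-evaluated after every elimination round, which
    is what makes the procedure recursive rather than a one-shot ranking.
    """
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)
        n_drop = min(step, len(remaining) - n_keep)
        weakest = sorted(remaining, key=lambda f: scores[f])[:n_drop]
        remaining = [f for f in remaining if f not in weakest]
    return remaining


# Hypothetical fixed importances standing in for a refitted model.
static = {"tier1_capital": 0.40, "npl_ratio": 0.25, "roa": 0.20, "branches": 0.05}
print(recursive_feature_elimination(static, lambda fs: {f: static[f] for f in fs}, 2))
# ['tier1_capital', 'npl_ratio']
```

With a real model the importances usually shift as correlated features are removed, so re-scoring each round can give a different subset than ranking once and truncating.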

Predicting Absenteeism at Work Using Machine Learning Algorithms

Muthanna Journal of Pure Science ◽

10.52113/2/07.01.2020/1-12 ◽

2019 ◽

Vol 7 (1) ◽

pp. 1-12

Author(s):

Samir Qaisar Ajmi

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Decision Tree ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Tree Model ◽

Learning Techniques ◽

Business Market ◽

Commercial Environment

To operate in a commercial environment, a company needs to be a major competitor in the business market, which depends mainly on its resources. One of the most important resources is its employees. Employee absence from work therefore leads to deterioration and reduced production in institutions, which in turn leads to heavy losses. Employees may be absent for many reasons, including health problems and social occasions. The purpose of this paper was to apply machine learning techniques to predict absenteeism at work. Four methods were used in this research: neural network (NN), decision tree (DT), support vector machine (SVM) and logistic regression (LR). The decision tree model had the highest accuracy, 83.33%, with an AUC of 0.834, and the support vector machine had the lowest accuracy, 68.47%, with an AUC of 0.760.

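The AUC values quoted above summarize how well each model ranks absent versus non-absent employees across all decision thresholds. As a sketch, AUC can be computed from predicted scores with the pairwise (Mann-Whitney) formulation, counting ties as one half; the labels and scores below are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one, with ties
    counted as 0.5 (the Mann-Whitney U statistic divided by n_pos * n_neg).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


absent = [1, 0, 1, 0, 1, 0]             # hypothetical: 1 = absent
score = [0.9, 0.3, 0.6, 0.6, 0.4, 0.1]  # hypothetical model scores
print(auc(absent, score))  # 7.5 / 9, about 0.833
```

The double loop is quadratic in the number of examples, which is fine for a sketch; a sort-based rank computation scales better. Because AUC does not depend on a single decision threshold, accuracy and AUC need not rank models identically.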

Analysis of Machine Learning Techniques for Anomaly-Based Intrusion Detection

International Journal of Distributed Artificial Intelligence ◽

10.4018/ijdai.2020010102 ◽

2020 ◽

Vol 12 (1) ◽

pp. 20-38

Author(s):

Winfred Yaokumah ◽

Isaac Wiafe

Keyword(s):

Machine Learning ◽

Random Forest ◽

Intrusion Detection ◽

Decision Tree ◽

Naive Bayes ◽

Weighted Average ◽

Absolute Error ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Machine Learning Techniques

Determining which machine learning (ML) technique performs best on new datasets is an important factor in designing effective anomaly-based intrusion detection systems. This study therefore evaluated four machine learning algorithms (naive Bayes, k-nearest neighbors, decision tree, and random forest) on the UNSW-NB15 dataset for intrusion detection. The experimental results showed that random forest and decision tree classifiers are effective for detecting intrusions. Random forest had the highest weighted average accuracy, 89.66%, with a mean absolute error (MAE) of 0.0252, whereas decision tree recorded 89.20% and 0.0242, respectively. The naive Bayes classifier had the worst results on the dataset, with 56.43% accuracy and an MAE of 0.0867. However, contrary to existing knowledge, naive Bayes was observed to be potent in classifying backdoor attacks. Notably, naive Bayes performed relatively well in classes where the tree-based classifiers performed very poorly.
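
Reporting a mean absolute error for a classifier, as above, usually means comparing predicted class-probability vectors with one-hot true labels. The sketch below assumes a Weka-style definition (absolute differences averaged over all instances and all classes), which the abstract does not state explicitly, and reads "weighted average accuracy" as per-class recall weighted by class frequency, which for single-label data equals overall accuracy. Class names and probabilities are hypothetical.

```python
def mean_absolute_error(y_true, prob_rows, classes):
    """MAE between predicted class-probability vectors and one-hot truth.

    Absolute differences are summed over classes and then averaged over
    both instances and classes (assumed Weka-style definition).
    """
    total = 0.0
    for label, probs in zip(y_true, prob_rows):
        for c, p in zip(classes, probs):
            target = 1.0 if c == label else 0.0
            total += abs(p - target)
    return total / (len(y_true) * len(classes))


def weighted_average_recall(y_true, y_pred):
    """Per-class recall weighted by class frequency (equals overall accuracy)."""
    n = len(y_true)
    result = 0.0
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recall_c = sum(y_pred[i] == c for i in idx) / len(idx)
        result += (len(idx) / n) * recall_c
    return result


classes = ["normal", "backdoor", "exploits"]  # hypothetical subset of classes
truth = ["normal", "exploits"]
probs = [[0.8, 0.1, 0.1], [0.2, 0.0, 0.8]]
print(mean_absolute_error(truth, probs, classes))
print(weighted_average_recall(["a", "a", "b"], ["a", "b", "b"]))
```

Because this MAE averages over every class column, it can stay small even when accuracy differs noticeably between models, which is consistent with the close MAE values reported for random forest and decision tree.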
