Comparing machine learning and ensemble learning in the field of football

Football has been one of the most popular and loved sports since its birth on November 6th, 1869. The main reason is that it is highly unpredictable in nature. Predicting football match results seems like the perfect problem for machine learning models, but there are various caveats, such as picking the right features from an enormous number of available ones. Many models have been applied to various football-related datasets. This paper compares Support Vector Machines, a machine learning model, with XGBoost, an ensemble learning model, and examines how ensemble learning can greatly improve the accuracy of the predictions.


Exploration of physiological sensors, features, and machine learning models for pain intensity estimation

PLoS ONE ◽

10.1371/journal.pone.0254108 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0254108

Author(s):

Fatemeh Pouromran ◽

Srinivasan Radhakrishnan ◽

Sagar Kamarthi

Keyword(s):

Machine Learning ◽

Pain Intensity ◽

Learning Model ◽

Support Vector ◽

Heat Pain ◽

Learning Models ◽

Intensity Estimation ◽

Machine Learning Model ◽

Physiological Sensors ◽

Machine Learning Models

In current clinical settings, pain is typically measured by a patient’s self-reported information. This subjective pain assessment results in suboptimal treatment plans, over-prescription of opioids, and drug-seeking behavior among patients. In the present study, we explored machine learning models for automatic, objective pain intensity estimation using inputs from physiological sensors. This study uses the BioVid Heat Pain Dataset. We extracted features from Electrodermal Activity (EDA), Electrocardiogram (ECG), and Electromyogram (EMG) signals collected from study participants subjected to heat pain. We built different machine learning models, including Linear Regression, Support Vector Regression (SVR), Neural Networks, and Extreme Gradient Boosting, for continuous-value pain intensity estimation. Then we identified the physiological sensor, feature set, and machine learning model that give the best predictive performance. We found that EDA is the most information-rich sensor for continuous pain intensity prediction. A set of only 3 features from EDA signals using the SVR model gave an average performance of 0.93 mean absolute error (MAE) and 1.16 root mean square error (RMSE) for the subject-independent model and 0.92 MAE and 1.13 RMSE for the subject-dependent model. The MAE achieved with this signal-feature-model combination is less than 1 unit on a 0 to 4 continuous pain scale, which is smaller than the MAE achieved by the methods reported in the literature. These results demonstrate that it is possible to estimate the pain intensity of a patient using a computationally inexpensive machine learning model with 3 statistical features from the EDA signal, which can be collected from a wrist biosensor. This method paves the way for developing a wearable pain measurement device.
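
The two error metrics quoted in this abstract can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and are not taken from the study; they only show how MAE and RMSE would be computed for predictions on a 0 to 4 pain scale.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical pain levels on the 0 to 4 scale (illustrative values only).
true_levels = [0, 1, 2, 3, 4]
predicted_levels = [0.5, 1.0, 3.0, 2.5, 4.0]

mae = mean_absolute_error(true_levels, predicted_levels)
rmse = root_mean_square_error(true_levels, predicted_levels)
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

RMSE penalizes large individual errors more heavily than MAE, so it is never smaller than MAE on the same predictions, which matches the pattern of the figures reported above.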


TOPICAL ISSUES OF APPLICATION OF MACHINE LEARNING METHODS IN ECONOMY

Innovative Aspects of the Development of Science and Technology. Collected Papers of the VIII International Scientific and Practical Conference: collected papers [online electronic edition] / Ed. by N.V. Emelyanov. – Moscow: “KDU”, “Dobrosvet”, 2021. – 149 pp. ◽

10.31453/kdu.ru.978-5-7913-1176-4-2021-28-33 ◽

2021 ◽

Author(s):

Natalia Pavlovna Persteneva ◽

Darya Dmitrievn Skryleva ◽

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Learning Model ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Machine Learning Model ◽

Popular Classes ◽

Machine Learning Models

The article discusses machine learning methods using the example of two popular classes: supervised learning and unsupervised learning. Variants of the main types of machine learning models for each method are presented. A generalized algorithm for building any machine learning model is formulated.


Improving Logging Prediction on Imbalanced Datasets

International Journal of Open Source Software and Processes ◽

10.4018/ijossp.2016040103 ◽

2016 ◽

Vol 7 (2) ◽

pp. 43-71 ◽

Cited By ~ 3

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Machine Learning ◽

Open Source ◽

Class Imbalance ◽

Learning Model ◽

Learning Models ◽

Class Imbalance Problem ◽

Imbalanced Datasets ◽

Imbalance Problem ◽

Machine Learning Model ◽

Machine Learning Models

Logging is an important yet tough decision for OSS developers. Machine-learning models are useful in improving several steps of OSS development, including logging. Several recent studies propose machine-learning models to predict logged code constructs. The prediction performance of these models is limited by the class-imbalance problem, since the number of logged code constructs is small compared to the number of non-logged code constructs. No previous study analyzes the class-imbalance problem for logged code construct prediction. The authors first analyze the performance of J48, RF, and SVM classifiers for predicting logged catch-blocks and if-blocks on imbalanced datasets. Second, the authors propose LogIm, an ensemble and threshold-based machine-learning model. Third, the authors evaluate the performance of LogIm on three open-source projects. On average, the LogIm model improves the performance of the baseline classifiers J48, RF, and SVM by 7.38%, 9.24%, and 4.6% for catch-blocks, and by 12.11%, 14.95%, and 19.13% for if-blocks logging prediction.
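
The abstract does not describe how LogIm combines its classifiers or chooses its threshold, so the following is only a generic sketch of the broader idea of an ensemble with an adjusted decision threshold for imbalanced data. It is not the authors' algorithm. The function name `ensemble_predict`, the probability values, and both threshold values are hypothetical.

```python
def ensemble_predict(prob_lists, threshold=0.5):
    """Average per-classifier positive-class probabilities, then threshold.

    prob_lists: one list of probabilities per base classifier, all of the
    same length (one entry per sample).
    """
    if not prob_lists or not prob_lists[0]:
        raise ValueError("need at least one classifier and one sample")
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    averaged = [
        sum(probs[i] for probs in prob_lists) / n_models
        for i in range(n_samples)
    ]
    return [1 if avg >= threshold else 0 for avg in averaged]


# Hypothetical "is logged" probabilities from three base classifiers
# for four code blocks (illustrative values only).
probs = [
    [0.10, 0.40, 0.35, 0.80],
    [0.20, 0.30, 0.45, 0.70],
    [0.05, 0.50, 0.40, 0.90],
]

default_labels = ensemble_predict(probs, threshold=0.5)
lowered_labels = ensemble_predict(probs, threshold=0.35)
print(default_labels, lowered_labels)
```

Lowering the threshold makes the ensemble flag more samples as the minority (logged) class, trading precision for recall. In practice the threshold would be tuned on validation data rather than fixed by hand.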


Learning to Identify At-Risk Students in Distance Education Using Interaction Counts

Revista de Informática Teórica e Aplicada ◽

10.22456/2175-2745.62211 ◽

2016 ◽

Vol 23 (2) ◽

pp. 124 ◽

Cited By ~ 2

Author(s):

Douglas Detoni ◽

Cristian Cechinel ◽

Ricardo Araujo Matsumura ◽

Daniela Francisco Brauner

Keyword(s):

Machine Learning ◽

At Risk ◽

At Risk Students ◽

Drop Out ◽

Support Vector ◽

Learning Models ◽

Data Set ◽

Student Dropout ◽

Vector Machines ◽

Machine Learning Models

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail their courses. Machine learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes, and AdaBoost decision trees) under different scenarios. We provide evidence that patterns from interaction counts can provide useful information for classifying at-risk students. This classification allows the customization of the activities presented to at-risk students (automatically or through tutors) in an attempt to prevent student dropout.


Hybrid Machine Learning Model for Body Fat Percentage Prediction Based on Support Vector Regression and Emotional Artificial Neural Networks

Applied Sciences ◽

10.3390/app11219797 ◽

2021 ◽

Vol 11 (21) ◽

pp. 9797

Author(s):

Solaf A. Hussain ◽

Nadire Cavus ◽

Boran Sekeroglu

Keyword(s):

Machine Learning ◽

Body Fat ◽

Support Vector ◽

Body Fat Percentage ◽

Learning Models ◽

Fat Percentage ◽

Machine Learning Model ◽

Proposed Model ◽

Hybrid Machine ◽

Machine Learning Models

Obesity, or excessive body fat, causes multiple health problems and diseases. However, obesity treatment and control need an accurate determination of body fat percentage (BFP). The existing methods for BFP estimation require several procedures, which reduces their cost-effectiveness and generalization. Therefore, developing cost-effective models for BFP estimation is vital for obesity treatment. Machine learning models, particularly hybrid models, have a strong ability to analyze challenging data and perform predictions by combining different characteristics of the models. This study proposed a hybrid machine learning model based on support vector regression and emotional artificial neural networks (SVR-EANNs) for accurate recent BFP prediction using a primary BFP dataset. SVR was applied as a consistent attribute selection model on seven properties and measurements, using the left-out sensitivity analysis, and the regression ability of the EANN was used in the prediction phase. The proposed model was compared to seven benchmark machine learning models. The results show that the proposed hybrid model (SVR-EANN) outperformed the other machine learning models, achieving superior results in the three evaluation metrics considered. Furthermore, the proposed model suggested that abdominal circumference is a significant factor in BFP prediction, while age has a minor effect.


Technical note: how to rationally compare the performances of different machine learning models?

10.7287/peerj.preprints.26714 ◽

2018 ◽

Cited By ~ 1

Author(s):

Terazima Maeda

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Learning Model ◽

Technical Note ◽

Learning Models ◽

Specific Data ◽

Machine Learning Model ◽

Specific Prediction ◽

Data Group ◽

Machine Learning Models

Nowadays, there are a large number of machine learning models that can be used in various areas. However, different research targets are usually sensitive to the type of model. For a specific prediction target, the predictive accuracy of a machine learning model always depends on the data features, the data size, and the intrinsic relationship between inputs and outputs. Therefore, for a specific data group and a fixed prediction task, how to rationally compare the predictive accuracy of different machine learning models is a big question. In this brief note, we show how to compare the performance of different machine learning models using some typical examples.


Daily Cryptocurrency Returns Forecasting and Trading via Machine Learning

Journal of Student Research ◽

10.47611/jsrhs.v10i4.2217 ◽

2021 ◽

Vol 10 (4) ◽

Author(s):

Andrew Falcon ◽

Tianshu Lyu

Keyword(s):

Machine Learning ◽

Support Vector ◽

Learning Models ◽

Investor Attention ◽

Vector Machines ◽

Price Trends ◽

Sharpe Ratios ◽

Returns Forecasting ◽

Machine Learning Models ◽

Significant Factors

We conduct a comparative analysis of machine learning models for the time-series forecasting of the sign of next-day cryptocurrency returns. We begin by compiling a proprietary dataset that encompasses a wide array of potential cryptocurrency valuation factors (price trends, liquidity, volatility, network, production, investor attention), subsequently identifying and evaluating the most significant factors. We apply eight machine learning models to the dataset, using them as classifiers to predict the sign of next-day price returns for the three largest cryptocurrencies by market capitalization: bitcoin, ethereum, and ripple. We show that the most significant valuation factors for cryptocurrency returns are price trend variables, specifically seven-day and thirty-day reversal. We conclude that support vector machines produce the most accurate classifications for all three cryptocurrencies. Additionally, we find that boosted models like AdaBoost and XGBoost have the poorest classification accuracy. Finally, we construct a probability-based trading strategy that takes either a daily long or short position in one of the three examined cryptocurrencies. The strategy yields a Sharpe ratio of 2.8 and a cumulative log return of 3.72. On average, the strategy’s log returns outperformed standalone investments in all three cryptocurrencies by a factor of 5.64, and its Sharpe ratios were more than three times higher.
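
The two performance figures quoted above, cumulative log return and Sharpe ratio, can be illustrated with a short sketch. The abstract does not state the annualization convention or risk-free rate the authors used, so the 365 periods per year and the zero risk-free rate below are assumptions, and the daily returns are hypothetical values.

```python
import math
import statistics


def cumulative_log_return(daily_log_returns):
    """Log returns are additive, so the cumulative value is their sum."""
    return sum(daily_log_returns)


def annualized_sharpe(daily_returns, periods_per_year=365, risk_free_daily=0.0):
    """Mean excess return over its sample standard deviation, annualized.

    periods_per_year=365 is an assumption (crypto markets trade every day).
    """
    excess = [r - risk_free_daily for r in daily_returns]
    spread = statistics.stdev(excess)  # sample standard deviation
    if spread == 0:
        raise ValueError("returns have zero variance")
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


# Hypothetical daily log returns of a strategy (illustrative values only).
returns = [0.01, -0.02, 0.03, 0.00, 0.02]

total_log_return = cumulative_log_return(returns)
simple_return = math.exp(total_log_return) - 1  # convert to a simple return
sharpe = annualized_sharpe(returns)
print(f"log return={total_log_return:.4f} simple={simple_return:.4f} sharpe={sharpe:.2f}")
```

A Sharpe ratio computed from only a handful of days, as in this toy input, is not meaningful; the sketch only shows the arithmetic.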


Development and Validation of a Quick Sepsis-Related Organ Failure Assessment-Based Machine-Learning Model for Mortality Prediction in Patients with Suspected Infection in the Emergency Department

Journal of Clinical Medicine ◽

10.3390/jcm9030875 ◽

2020 ◽

Vol 9 (3) ◽

pp. 875

Author(s):

Young Suk Kwon ◽

Moon Seong Baek

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Learning Model ◽

Gradient Boosting ◽

Learning Models ◽

Suspected Infection ◽

Machine Learning Model ◽

Failure Assessment ◽

Qsofa Score ◽

Machine Learning Models

The quick sepsis-related organ failure assessment (qSOFA) score has been introduced to predict the likelihood of organ dysfunction in patients with suspected infection. We hypothesized that machine-learning models using qSOFA variables for predicting three-day mortality would provide better accuracy than the qSOFA score in the emergency department (ED). Between January 2016 and December 2018, the medical records of patients aged over 18 years with suspected infection were retrospectively obtained from four EDs in Korea. Data from three hospitals (n = 19,353) were used as training-validation datasets and data from one (n = 4234) as the test dataset. Machine-learning algorithms including extreme gradient boosting, light gradient boosting machine, and random forest were used. We assessed the prediction ability of machine-learning models using the area under the receiver operating characteristic curve (AUROC), and DeLong’s test was used to compare AUROCs between the qSOFA scores and qSOFA-based machine-learning models. A total of 447,926 patients visited EDs during the study period. We analyzed 23,587 patients with suspected infection who were admitted to the EDs. The median age of the patients was 63 years (interquartile range: 43–78 years) and in-hospital mortality was 4.0% (n = 941). For predicting three-day mortality among patients with suspected infection in the ED, the AUROC of the qSOFA-based machine-learning model (0.86 [95% CI 0.85–0.87]) was higher than that of the qSOFA scores (0.78 [95% CI 0.77–0.79], p < 0.001). The qSOFA-based machine-learning model was therefore found to be superior to the conventional qSOFA scores for this task.
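
The AUROC used to compare models in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following minimal sketch computes it by direct pairwise comparison; the function name `auroc`, the labels, and the scores are hypothetical and unrelated to the study data, and DeLong's test is not covered here.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: 1 for the positive class (e.g. died within three days), else 0.
    scores: predicted risk for each case, higher meaning more likely positive.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted risks (illustrative values only).
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

area = auroc(labels, scores)
print(f"AUROC={area:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; for datasets of the size described above, a rank-based computation or a library routine would be the usual choice.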


Evaluation of Driver's Cognitive Distracted State Considering the Ambient State of a Car

International Journal of Cognitive Informatics and Natural Intelligence ◽

10.4018/ijcini.2019010102 ◽

2019 ◽

Vol 13 (1) ◽

pp. 13-24

Author(s):

Hiroaki Koma ◽

Taku Harada ◽

Akira Yoshizawa ◽

Hirotoshi Iwasaki

Keyword(s):

Machine Learning ◽

Driving Simulator ◽

Support Vector ◽

Biometric Data ◽

State Data ◽

The Road ◽

Machine Learning Model ◽

Vector Machines ◽

Car Driving ◽

Machine Learning Models

This article evaluates the effectiveness of considering the ambient state of a moving car when assessing the driver's cognitive distracted state. Support Vector Machines and Random Forest, which are representative machine learning models, are applied. As input data for the machine learning models, ambient state data of the car are used in addition to the driver's biometric data and car driving data. The ambient state data considered in this study are the state of the preceding car and the shape of the road. Experiments using a driving simulator are conducted to evaluate the effectiveness of considering the ambient state of the car.


Machine learning models for Hg prospecting in the Almadén mining district

10.5194/egusphere-egu21-7339 ◽

2021 ◽

Author(s):

Julio Alberto López-Gómez ◽

Daniel Carrasco Pardo ◽

Pablo Higueras ◽

Jose María Esbrí ◽

Saturnino Lorenzo

Keyword(s):

Machine Learning ◽

Learning Model ◽

Mining District ◽

Learning Models ◽

Geological Features ◽

Machine Learning Model ◽

Mineral Prospectivity ◽

Data Point ◽

The One ◽

Machine Learning Models

Traditionally, prospectivity models were designed using approaches mainly based on expert judgement. These models have been widely applied and are also known as knowledge-driven prospectivity models (see Harris et al., 2015). Currently, artificial intelligence approaches, especially machine learning models, are being applied to build prospectivity models, since they have proven successful in many other domains (see Sun et al., 2019 and Guerra Prado et al., 2020). These are also known as data-driven prospectivity models. Machine learning models learn from data repositories in order to extract and detect relationships in the data and predict new instances. In this work, a geological dataset was collected by a team of expert geologists. The data collected include the geographical coordinates as well as several geological features of points belonging to seventy-seven different mercury deposits in the Almadén mining district. The resulting dataset is composed of a total of 24798 points and 24 attributes for each point. In particular, we have collected geological and mining-related data regarding the Almadén mercury (Hg) mining district; these data include the location of the several Hg mineralizations, including their typology, size, mineralogy, and stratigraphic position, as well as other information associated with the metallogenetic model set up by Hernández et al. (1999). Then, a few machine learning models are built in order to select the one that offers the best results. The aim of this work is twofold: on the one hand, to build a machine learning model capable of determining, given the geological features of a data point, the mercury deposit to which it belongs; on the other hand, to build a machine learning model capable of identifying, given the geological features of a data point, the kind of deposit to which it belongs. The experiments conducted in this work have been properly designed, and the results obtained were validated using statistical techniques. Finally, the models built in this work will make it possible to generate mercury prospectivity maps. The final aim of this process is to obtain and train a system able to perform antimony prospection in the nearby Guadalmez syncline. This work was funded by the ANR (ANR-19-MIN2-0002-01), the AEI (MICIU/AEI/REF.: PCI2019-103779) and the authors’ institutions in the framework of the ERA-MIN2 AUREOLE project.

References

Guerra Prado, E.M.; de Souza Filho, C.R.; Carranza, E.M.; Motta, J.G. (2020). Modeling of Cu-Au prospectivity in the Carajás mineral province (Brasil) through machine learning: Dealing with imbalanced training data.

Harris, J.R.; Grunsky, E.; Corrigan, D. (2015). Data- and knowledge-driven mineral prospectivity maps for Canada’s North.

Hernández, A.; Jébrak, M.; Higueras, P.; Oyarzun, R.; Morata, D.; Munhá, J. (1999). The Almadén mercury mining district, Spain. Mineralium Deposita, 34: 539-548.

Sun, T.; Chen, F.; Zhong, L.; Liu, W.; Wang, Y. (2019). GIS-based mineral prospectivity mapping using machine learning methods: A case study from Tongling ore district, eastern China.
