Sport analytics for cricket game results using machine learning: An experimental study

The Indian Premier League (IPL) is one of the more popular cricket tournaments in the world: its finances are growing each season, its viewership has increased markedly, and the betting market for the IPL grows significantly every year. Because cricket is a very dynamic game that changes ball by ball, bettors and bookies are incentivised to bet on match results. This paper investigates machine learning technology for predicting cricket match results from historical IPL match data. Influential features of the dataset were identified using filter-based methods including Correlation-based Feature Selection, Information Gain (IG), ReliefF and Wrapper. More importantly, machine learning techniques including Naïve Bayes, Random Forest, K-Nearest Neighbour (KNN) and Model Trees (classification via regression) were adopted to generate predictive models from the distinct feature sets derived by the filter-based methods. Two feature subsets were formulated, one based on home team advantage and the other on the toss decision. The selected machine learning techniques were applied to both feature sets to determine a predictive model. Experimental tests show that tree-based models, particularly Random Forest, performed better in terms of accuracy, precision and recall than probabilistic and statistical models. However, on the toss feature subset, none of the considered machine learning algorithms produced accurate predictive models.

Machine Learning Algorithms For Understanding The Determinants of Under-Five Mortality

10.21203/rs.3.rs-1021040/v1 ◽

2021 ◽

Author(s):

Rakesh Kumar Saroj ◽

Pawan Kumar Yadav ◽

Rajneesh Singh ◽

Obvious Nchimunya Chilyabanyama

Keyword(s):

Machine Learning ◽

Random Forest ◽

Information Gain ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Mortality Data ◽

Mortality Factors ◽

Under Five ◽

Learning Techniques

Abstract Background: The death rate of under-five children in India has declined over the last few decades, but a few larger states still perform poorly. This is a matter of serious concern for child health as well as social development. Machine learning techniques now play a crucial role in smart health care systems by capturing hidden factors and patterns in outcomes. This study aims to explore the value of machine learning techniques for predicting under-five mortality and to find the important factors that cause it. Methods: The data were taken from the National Family Health Survey-IV of Uttar Pradesh. We used four machine learning techniques (decision tree, support vector machine, random forest, and logistic regression) to predict under-five mortality factors and compared the accuracy of each model. We also used information gain to rank the variables that matter most for accurate prediction in under-five mortality data. Result: Random Forest (RF) predicts the child mortality factors with the highest accuracy, 97.5%, and the number of living children, births in the last five years, educational level, birth order, total children ever born, currently breastfeeding, and size of child at birth were identified as essential factors for under-five mortality. Conclusion: The study focuses on machine learning techniques to predict and identify important factors for under-five mortality. The random forest model provides an excellent predictive result for estimating the risk factors of under-five mortality. Based on these results, policymakers can make policies and plans to reduce under-five mortality.
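The information gain ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy records, the feature names `currently_breastfeeding` and `birth_order_high`, and the helper functions are all hypothetical, and the sketch assumes categorical features and a binary outcome.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up toy records (an assumption for illustration): 1 = died before age five.
died = [1, 1, 0, 0, 0, 0, 1, 0]
features = {
    "currently_breastfeeding": ["no", "no", "yes", "yes", "yes", "yes", "no", "yes"],
    "birth_order_high": ["yes", "no", "yes", "no", "yes", "no", "no", "yes"],
}

# Rank feature names from most to least informative about the outcome.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], died),
    reverse=True,
)
print(ranking)
```

In this toy data the first feature separates the outcome perfectly, so its gain equals the full label entropy and it ranks first; real survey variables would rarely be this clean.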

Clinical Data Analysis for Prediction of Cardiovascular Disease Using Machine Learning Techniques

Computational Intelligence and Neuroscience ◽

10.1155/2022/2973324 ◽

2022 ◽

Vol 2022 ◽

pp. 1-13

Author(s):

Rajkumar Gangappa Nadakinamani ◽

A. Reyana ◽

Sandeep Kautish ◽

A. S. Vibith ◽

Yogita Gupta ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Cardiac Risk ◽

Machine Learning Algorithms ◽

Random Tree ◽

Machine Learning Techniques ◽

Learning Technology ◽

Tree Model ◽

Learning Techniques ◽

Machine Learning Model

Cardiovascular disease is difficult to detect due to several risk factors, including high blood pressure, cholesterol, and an abnormal pulse rate. Accurate decision-making and optimal treatment are required to address cardiac risk. As machine learning technology advances, the healthcare industry’s clinical practice is likely to change. As a result, researchers and clinicians must recognize the importance of machine learning techniques. The main objective of this research is to recommend a highly accurate machine learning-based cardiovascular disease prediction system (CDPS). To that end, modern machine learning algorithms such as REP Tree, M5P Tree, Random Tree, Linear Regression, Naive Bayes, J48, and JRIP are used to classify popular cardiovascular datasets. The proposed CDPS’s performance was evaluated using a variety of metrics to identify the most suitable machine learning model. In predicting cardiovascular disease patients, the Random Tree model performed admirably, with the highest accuracy of 100%, the lowest MAE of 0.0011, the lowest RMSE of 0.0231, and the fastest prediction time of 0.01 seconds.
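For readers unfamiliar with the two error metrics quoted above, the following is a minimal sketch of how MAE and RMSE are computed. The labels and predicted scores are made-up values for illustration, not data from the study, and the function names are hypothetical.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference; penalises large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up example: 1 = disease present, predictions are model scores in [0, 1].
y_true = [1, 0, 1, 1]
y_score = [0.9, 0.2, 0.7, 1.0]

mae = mean_absolute_error(y_true, y_score)
rmse = root_mean_squared_error(y_true, y_score)
print(f"MAE={mae:.4f} RMSE={rmse:.4f}")
```

RMSE is never smaller than MAE on the same data, which is consistent with the reported pair of values (0.0011 and 0.0231).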

Machine Learning Modeling of Horizontal Photovoltaics Using Weather and Location Data

Energies ◽

10.3390/en13102570 ◽

2020 ◽

Vol 13 (10) ◽

pp. 2570

Author(s):

Christil Pasion ◽

Torrey Wagner ◽

Clay Koschnick ◽

Steven Schuldt ◽

Jada Williams ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Ambient Temperature ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Solar Panels ◽

Power Prediction ◽

Location Data ◽

Learning Techniques ◽

Prior Literature

Solar energy is a key renewable energy source; however, its intermittent nature and potential for use in distributed systems make power prediction an important aspect of grid integration. This research analyzed a variety of machine learning techniques to predict power output for horizontal solar panels using 14 months of data collected from 12 northern-hemisphere locations. We performed our data collection and analysis in the absence of irradiation data, an approach not commonly found in prior literature. Using latitude, month, hour, ambient temperature, pressure, humidity, wind speed, and cloud ceiling as independent variables, a distributed random forest regression algorithm modeled the combined dataset with an R² value of 0.94. As a comparative measure, other machine learning algorithms resulted in R² values of 0.50–0.94. Additionally, the data from each location was modeled separately with R² values ranging from 0.91 to 0.97, indicating a range of consistency across all sites. Using an input variable permutation approach with the random forest algorithm, we found that the three most important variables for power prediction were ambient temperature, humidity, and cloud ceiling. The analysis showed that machine learning potentially allowed for accurate power prediction while avoiding the challenges associated with modeled irradiation data.
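The goodness-of-fit figures above are coefficients of determination. A minimal sketch of that calculation follows; the observed and predicted power values are made up for illustration, and `r_squared` is a hypothetical helper rather than anything from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Made-up panel power readings and model predictions (arbitrary units).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

score = r_squared(observed, predicted)
print(f"R^2 = {score:.2f}")
```

A value of 1 means the predictions match the observations exactly, while a model that always predicts the observed mean scores 0.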

Real Time Efficient Accident Predictor System using Machine Learning Techniques (kNN, RF, LR, DT)

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d6910.1210220 ◽

2020 ◽

Vol 10 (2) ◽

pp. 108-111

Keyword(s):

Machine Learning ◽

Random Forest ◽

Real Time ◽

Classification Accuracy ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Classification Methods ◽

K Nearest Neighbors ◽

Learning Techniques

A real-time crash predictor system determines the frequency and the severity of crashes. Machine learning based methods are now used to predict the total number of crashes. In this project, the prediction accuracy of machine learning algorithms such as Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), and Logistic Regression (LR) is evaluated. The performance of these classification methods is assessed in terms of accuracy. The dataset for this project was obtained from 49 states of the US and 27 states of India, and contains 2.25 million US crash records and 1.16 million Indian crash records, respectively. The results show a classification accuracy of 96% for Random Forest (RF) in comparison with the other classification methods.

A Novel Approach for Detecting DGA-Based Botnets in DNS Queries Using Machine Learning Techniques

Journal of Computer Networks and Communications ◽

10.1155/2021/4767388 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Ali Soleymani ◽

Fatemeh Arabgol

Keyword(s):

Machine Learning ◽

Random Forest ◽

Text Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Detection Accuracy ◽

Domain Name ◽

Botnet Detection ◽

Learning Techniques

In today’s security landscape, advanced threats are becoming increasingly difficult to detect as the pattern of attacks expands. Classical approaches that rely heavily on static matching, such as blacklisting or regular expression patterns, may be inflexible or unreliable when detecting malicious content in system data. This is where machine learning techniques can show their value and provide new insights and higher detection rates. This research investigated the behavior of botnets that use domain-flux techniques to hide command and control channels. It also describes the machine learning algorithm and text mining used to analyze the network DNS protocol and identify botnets. For this purpose, extracted and labeled domain name datasets containing healthy and infected DGA botnet data were used. Data preprocessing techniques based on a text-mining approach were applied to explore domain name strings with n-gram analysis and PCA. Model performance is improved by extracting statistical features with principal component analysis. The performance of the proposed model was evaluated using different machine learning classifiers such as decision tree, support vector machine, random forest, and logistic regression. Experimental results show that the random forest algorithm can be used effectively in botnet detection and has the best botnet detection accuracy.
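As a rough illustration of the n-gram step described above, the sketch below counts character n-grams in a domain name. It is an assumption about what such preprocessing might look like, not the authors' pipeline: the helper `char_ngrams` is hypothetical, and it simply takes the text before the first dot, so it expects inputs of the form `name.tld`.

```python
from collections import Counter


def char_ngrams(domain, n=2):
    """Count overlapping character n-grams in the first label of a domain name."""
    # Assumption: the interesting part is the text before the first dot.
    label = domain.lower().split(".")[0]
    return Counter(label[i:i + n] for i in range(len(label) - n + 1))


# A dictionary-like name repeats common bigrams; a DGA-style name usually does not.
print(char_ngrams("banana.org"))
print(char_ngrams("xkqzjvwp.net"))
```

Counts like these can be turned into fixed-length numeric vectors, which is the kind of input a dimensionality reduction step such as PCA and the downstream classifiers would need.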

Machine Learning Based Indoor Localisation Using Wi-Fi And Smartphone

Journal of Independent Studies and Research - Computing ◽

10.31645/06 ◽

2020 ◽

Author(s):

Zulqarnain Khokhar ◽

Murtaza Ahmed Siddiqi ◽

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Indoor Localization ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Smart Devices ◽

Gradient Boosting ◽

Learning Techniques ◽

Indoor Localisation

Wi-Fi based indoor positioning with the help of access points and smart devices has become an integral part of finding the location of a device or a person. Wi-Fi based indoor localization technology has been among the most attractive fields for researchers for a number of years. In this paper, we present Wi-Fi based indoor localization using three different machine learning techniques. The three machine learning algorithms implemented and compared are Decision Tree, Random Forest and Gradient Boosting classifier. After making a fingerprint of the floor based on Wi-Fi signals, the mentioned algorithms were used to identify the device location at thirty different positions on the floor. The Random Forest and Gradient Boosting classifiers identified the location of the device with accuracy higher than 90%, while Decision Tree identified the location with accuracy slightly higher than 80%.

Predicting tile drainage discharge using machine learning algorithms

10.5194/hess-2019-650 ◽

2020 ◽

Author(s):

Saghar Khodadad Motarjemi ◽

Anders Bjørn Møller ◽

Finn Plauborg ◽

Bo Vangsø Iversen

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Models ◽

Clay Content ◽

Drainage Water ◽

Tile Drainage ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Agricultural Fields ◽

Drainage Systems

Abstract. Drainage systems can significantly improve water management in agricultural fields. However, they may transport contaminants originating from fertilizers and pesticides and threaten ecosystems. Determining the quantity of drainage water is an important factor for constructed wetlands and other drainage mitigation techniques. This study was carried out in Denmark, where tile drainage systems are implemented in more than half of the agricultural fields. The first aim of the study was to predict the annual discharge of tile drainage systems using machine-learning methods, which have been highly popular in recent years. The second objective was to assess the importance of the parameters and their impact on the predictions. Data from 53 drainage stations distributed in different regions of Denmark were collected and used for the analysis. The covariates contained 35 parameters including the calculated percolation and geographic variables such as drainage probability, clay content in different depth intervals, and elevation, all extracted from existing national maps. Random Forest and Cubist were selected as predictive models. Both models were trained on the dataset and used to predict yearly drainage discharge. Results highlighted the importance of the cross-validation methods and indicated that both Random Forest and Cubist can perform as predictive models with low complexity and good correlation between predicted and observed discharge. Covariate importance analysis showed that, among all of the predictors used, percolation and elevation have the largest effect on the prediction of tile drainage discharge. This work opens the way to a better understanding of the dynamics of tile drainage discharge and shows that machine-learning techniques can perform as predictive models in this specific context. The developed models can be used for a national mapping of expected tile drain discharge.

Stance detection using diverse feature sets based on machine learning techniques

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202269 ◽

2021 ◽

pp. 1-20

Author(s):

Kashif Ayyub ◽

Saqib Iqbal ◽

Muhammad Wasif Nisar ◽

Saima Gulzar Ahmad ◽

Ehsan Ullah Munir

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Information Gain ◽

Real Life ◽

Machine Learning Techniques ◽

Base Line ◽

Feature Sets ◽

Part Of Speech ◽

Learning Techniques

Sentiment analysis is the field that analyzes the sentiments and opinions of people about entities such as products, businesses, and events. As opinions influence people’s behavior, it has numerous applications in real life, such as marketing, politics, and social media. Stance detection is a sub-field of sentiment analysis. Stance classification aims to automatically identify, from the source text, whether the source is in favor of, neutral toward, or opposed to the target. This research study proposes a framework to explore the performance of conventional (NB, DT, SVM), ensemble learning (RF, AdaBoost) and deep learning-based (DBN, CNN-LSTM, and RNN) machine learning techniques. The proposed method is feature centric and extracts sentiment, content, tweet-specific and part-of-speech features from both the SemEval2016 and SemEval2017 datasets. The study also explores the role of deep features such as GloVe and Word2Vec, which have not yet received attention for stance detection. Some baseline features such as bag of words, N-gram, and TF-IDF are also extracted from both datasets for comparison with the proposed features and the deep features. The proposed features are ranked using feature ranking methods (information gain, gain ratio and ReliefF). Further, the results are evaluated against existing studies using standard performance measures for stance classification. The results show that the proposed feature sets (sentiment, part-of-speech, content, and tweet specific) are helpful for stance classification when applied with SVM, and that the deep feature GloVe gives the best results when applied with the deep learning method RNN.

Differentiating Thrombotic Microangiopathies Based on Laboratory Tests Other Than ADAMTS13 Using Machine Learning Technology

Blood ◽

10.1182/blood.v128.22.3749.3749 ◽

2016 ◽

Vol 128 (22) ◽

pp. 3749-3749

Author(s):

Youngil Koh ◽

SuYeon Lee ◽

Hong-Seok Yun ◽

Sung-Soo Yoon ◽

Inho Kim ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Correlation Coefficient ◽

Machine Learning Techniques ◽

Learning Technology ◽

Thrombotic Microangiopathies ◽

Random Forest Method ◽

Learning Techniques

Abstract Introduction: ADAMTS13 activity level is crucial for differentiating thrombotic microangiopathies (TMAs). However, ADAMTS13 testing is not readily available on site in many parts of the world. Hence, we developed an innovative algorithm that allows differentiation of thrombotic thrombocytopenic purpura (TTP) from other TMAs based on laboratory results other than ADAMTS13 using machine learning. Methods: Two hundred eight adult patients with either TTP (N=64) or TMA other than TTP (N=144) (ADAMTS13 cutoff level of 10%) were classified using three machine learning techniques (decision tree, random forest, and neural network) and a set of 19 easily measured clinical variables such as fever, Hb, ALT and so on. No single clinical variable was strongly correlated with TTP (absolute values of the correlation coefficients were below 0.5), so we applied machine learning algorithms. First, we divided the patient data into three parts: train, test and validation sets. We then applied the three machine learning techniques. Principal component analysis was also performed. Results: As single variables, platelet count, BUN and total bilirubin were the three most important predictors for differentiating TTP from other TMAs, with an accuracy of 82%. The random forest method increased accuracy to 85%, with precision and recall of 0.828 and 0.832, respectively. Without optimization, the neural network did not do better than the random forest method. Conclusion: Machine learning technology seems promising for differentiating TTP from other TMAs when the ADAMTS13 value is not available. These algorithms could support the physician in tailoring the management of TMA. Figures: Correlation coefficient in our study; Scheme of Random Forest method used in our study. Disclosures: Lee: Samsung SDS: Employment. Yun: Samsung SDS: Employment.
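The accuracy, precision and recall figures reported above are standard confusion-matrix statistics. The sketch below shows how they are computed for a two-class TTP versus other-TMA problem; the eight labels are made up for illustration and the helper `binary_metrics` is hypothetical, not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up diagnoses for eight hypothetical patients.
y_true = ["TTP", "TTP", "TTP", "TMA", "TMA", "TMA", "TMA", "TMA"]
y_pred = ["TTP", "TTP", "TMA", "TTP", "TMA", "TMA", "TMA", "TMA"]

accuracy, precision, recall = binary_metrics(y_true, y_pred, positive="TTP")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

With imbalanced classes such as 64 TTP versus 144 other TMA cases, precision and recall for the minority class are worth reading alongside accuracy, since accuracy alone can look good even when many TTP cases are missed.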

An Approach for Variable Selection and Prediction Model for Estimating the Risk-Based Capital (RBC) Based on Machine Learning Algorithms

Risks ◽

10.3390/risks10010013 ◽

2022 ◽

Vol 10 (1) ◽

pp. 13

Author(s):

Jaewon Park ◽

Minsoo Shin

Keyword(s):

Machine Learning ◽

Random Forest ◽

Business Performance ◽

Ordinary Least Squares ◽

Machine Learning Algorithms ◽

Capital Adequacy ◽

Machine Learning Techniques ◽

Insurance Companies ◽

Least Squares Regression ◽

Learning Techniques

The risk-based capital (RBC) ratio, an insurance company’s financial soundness system, evaluates the capital adequacy needed to withstand unexpected losses. Continuous institutional improvements have therefore been made to monitor the financial solvency of companies and protect consumers’ rights, and the improvement of solvency systems has been researched. The primary purpose of this study is to find a set of important predictors for estimating the RBC ratio of life insurance companies among a large number of variables (1891), which include crucial finance and management indices collected quarterly from all Korean insurers under regulation for transparent management information. This study employs a combination of machine learning techniques: Random Forest algorithms and the Bayesian Regulatory Neural Network (BRNN). The combination of Random Forest algorithms and BRNN predicts the next period’s RBC ratio better than the conventional statistical method, which uses ordinary least-squares regression (OLS). Based on the findings from the machine learning techniques, a set of important predictors is found within three categories: liabilities and expenses, other financial predictors, and predictors from business performance. The dataset used in this study covers 23 companies and 1891 variables from March 2008 to December 2018, with quarterly updates for each year.
