Predicting Limit-Setting Behavior of Gamblers Using Machine Learning Algorithms: A Real-World Study of Norwegian Gamblers Using Account Data

Author(s):  
Michael Auer ◽  
Mark D. Griffiths

Abstract Player protection and harm minimization have become increasingly important in the gambling industry, along with the promotion of responsible gambling (RG). Among the most widespread RG tools that gaming operators provide are limit-setting tools that help players limit the amount of time and/or money they spend gambling. Research suggests that limit-setting significantly reduces the amount of money that players spend. If limit-setting is to be encouraged as a way of facilitating responsible gambling, it is important to know which variables are important in getting individuals to set and change limits in the first place. In the present study, 33 variables assessing player behavior among Norsk Tipping clientele (N = 70,789) from January to March 2017 were computed. These 33 variables, which reflect the players' behavior, were then used to predict the likelihood of gamblers changing their monetary limit between April and June 2017. The 70,789 players were randomly split into a training dataset of 56,532 players and an evaluation set of 14,157 players (corresponding to an 80/20 split). The results demonstrated that it is possible to predict future limit-setting based on player behavior. The random forest algorithm appeared to predict limit-changing behavior much better than the other algorithms on the training data. However, on the independent test data, the random forest algorithm's accuracy dropped significantly. The best performance on the test data, along with only a small decrease in accuracy compared to the training data, was delivered by the gradient boosting machine algorithm. The most important variables predicting future limit-setting with the gradient boosting machine algorithm were: receiving feedback on having reached 80% of the personal monthly global loss limit, the personal monthly loss limit, the amount bet, theoretical loss, and whether the player had increased limits in the past. With the help of predictive analytics, players with a high likelihood of changing their limits can be proactively approached.
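A minimal sketch of the kind of pipeline the abstract describes, using scikit-learn: an 80/20 train/test split, a gradient boosting classifier, and a variable-importance ranking over the behavioural features. The file name, column names, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: 80/20 split, gradient boosting classifier, and a
# variable-importance ranking. Column names and hyperparameters are
# illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("player_behaviour_q1_2017.csv")    # hypothetical file
features = df.drop(columns=["player_id", "changed_limit_q2"])
target = df["changed_limit_q2"]                     # 1 = changed limit in Apr-Jun

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.20, random_state=42, stratify=target
)

gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05)
gbm.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, gbm.predict(X_test)))

# Rank the behavioural variables by importance, as in the study's analysis.
importances = pd.Series(gbm.feature_importances_, index=features.columns)
print(importances.sort_values(ascending=False).head(10))
```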

2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
R Hariharan ◽  
P He ◽  
C Hickman ◽  
J Chambost ◽  
C Jacques ◽  
...  

Abstract
Study question: Is a pre-trained machine learning algorithm able to accurately detect cellular arrangement in 4-cell embryos from a different continent?
Summary answer: Artificial Intelligence (AI) analysis of 4-cell embryo classification is transferable across clinics globally with 79% accuracy.
What is known already: Previous studies observing four-cell human embryo configurations have demonstrated that non-tetrahedral embryos (embryos in which cells make contact with fewer than 3 other cells) are associated with compromised blastulation and implantation potential. Previous research by this study group has indicated the efficacy of AI models in the classification of tetrahedral and non-tetrahedral embryos with 87% accuracy, using a database comprising 2 clinics, both from the same country (Brazil). This study aims to evaluate the transferability and robustness of this model on blind test data from a different country (France).
Study design, size, duration: The study was a retrospective cohort analysis in which 909 4-cell embryo images ("tetrahedral", n = 749; "non-tetrahedral", n = 160) were collected from 3 clinics (2 Brazilian, 1 French). All embryos were captured at the central focal plane using Embryoscope™ time-lapse incubators. The training data consisted solely of embryo images captured in Brazil (586 tetrahedral; 87 non-tetrahedral) and the test data consisted exclusively of embryo images captured in France (163 tetrahedral; 72 non-tetrahedral).
Participants/materials, setting, methods: The embryo images were labelled as either "tetrahedral" or "non-tetrahedral" at their respective clinics. Annotations were then validated by three operators. A ResNet-50 neural network model pretrained on ImageNet was fine-tuned on the training dataset to predict the correct annotation for each image. We used the cross-entropy loss function and the RMSprop optimiser (lr = 1e-5). Simple data augmentations (flips and rotations) were used during the training process to help counteract class imbalances.
Main results and the role of chance: Our model was capable of classifying embryos in the blind French test set with 79% accuracy when trained with the Brazilian data. The model had a sensitivity of 91% and 51% for tetrahedral and non-tetrahedral embryos respectively; precision was 81% and 73%; F1 score was 86% and 60%; and AUC was 0.61 and 0.64. This represents a 10% decrease in accuracy compared to when the model was trained and tested on different subsets of data from the same clinics.
Limitations, reasons for caution: Although strict inclusion and exclusion criteria were used, inter-operator variability may affect the pre-processing stage of the algorithm. Moreover, as only one focal plane was used, ambiguous cases were interpolated and further annotated. Analysing embryos at multiple focal planes may prove crucial in improving the accuracy of the model.
Wider implications of the findings: Though the use of machine learning models in the analysis of embryo imagery has grown in recent years, there has been concern over their robustness and transferability. While previous results have demonstrated the utility of locally-trained models, our results highlight the potential for models to be implemented across different clinics.
Trial registration number: Not applicable
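A sketch of how the reported training setup might look in PyTorch: an ImageNet-pretrained ResNet-50 fine-tuned with cross-entropy loss, the RMSprop optimiser at lr = 1e-5, and flip/rotation augmentation. The directory layout, batch size, and epoch count are assumptions.

```python
# Sketch of the reported setup: pretrained ResNet-50 fine-tuned with
# cross-entropy loss, RMSprop (lr = 1e-5), and flip/rotation augmentation.
# Directory layout, batch size, and epochs are assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

train_tfms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=180),
    transforms.ToTensor(),
])

# Expects subfolders "tetrahedral/" and "non_tetrahedral/" of embryo images.
train_ds = datasets.ImageFolder("embryos/train", transform=train_tfms)
train_dl = DataLoader(train_ds, batch_size=16, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)   # two-class head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-5)

for epoch in range(10):
    for images, labels in train_dl:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```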


2021 ◽  
Vol 6 (2) ◽  
pp. 213
Author(s):  
Nadya Intan Mustika ◽  
Bagus Nenda ◽  
Dona Ramadhan

This study aims to implement machine learning algorithms to detect fraud based on a historical dataset from a retail consumer financing company. The output of the machine learning models is used as samples for the fraud detection team. Data analysis is performed through data processing, feature selection, hold-out methods, and accuracy testing. Five machine learning methods are applied in this study: Logistic Regression, K-Nearest Neighbor (KNN), Decision Tree, Random Forest, and Support Vector Machine (SVM). The historical data are divided into two groups: training data and test data. The results show that the Random Forest algorithm has the highest accuracy, with a training score of 0.994999 and a test score of 0.745437, meaning that Random Forest is the most accurate of the tested methods for detecting fraud. Further research is suggested to add more predictor variables to increase the accuracy and to apply this method to different financial institutions and industries.
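A hedged scikit-learn sketch of the comparison described: the five classifiers trained on a hold-out split of the historical data and scored on both the training and test sets. The file name, label column, and split ratio are illustrative assumptions.

```python
# Sketch of the comparison described: five classifiers on a hold-out split,
# ranked by train/test accuracy. File and column names are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

df = pd.read_csv("historical_financing_data.csv")   # hypothetical file
X, y = df.drop(columns=["is_fraud"]), df["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "SVM": SVC(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "train:", model.score(X_train, y_train), "test:", model.score(X_test, y_test))
```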


2020 ◽  
Vol 13 (1) ◽  
pp. 10
Author(s):  
Andrea Sulova ◽  
Jamal Jokar Arsanjani

Recent studies have suggested that, due to climate change, the number of wildfires across the globe has been increasing and will continue to grow. The recent massive wildfires, which hit Australia during the 2019–2020 summer season, raised questions as to what extent the risk of wildfires can be linked to various climate, environmental, topographical, and social factors, and how fire occurrences can be predicted so that preventive measures can be taken. Hence, the main objective of this study was to develop an automated and cloud-based workflow for generating a training dataset of fire events at a continental level, using freely available remote sensing data at a reasonable computational expense, for feeding into machine learning models. As a result, a data-driven model was set up in the Google Earth Engine platform, which is publicly accessible and open for further adjustments. The training dataset was applied to different machine learning algorithms, i.e., Random Forest, Naïve Bayes, and Classification and Regression Tree. The findings show that Random Forest outperformed the other algorithms, and hence it was used further to explore the driving factors using variable importance analysis. The study indicates the probability of fire occurrence across Australia and identifies the potential driving factors of Australian wildfires for the 2019–2020 summer season. The methodological approach, results, and conclusions can be of great importance to policymakers, environmentalists, and climate change researchers, among others.
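For illustration, a scikit-learn sketch of the algorithm comparison and variable-importance analysis on an exported training dataset; the study's actual workflow ran in the Google Earth Engine platform, and all file and column names here are assumptions.

```python
# Illustrative sketch of the algorithm comparison and variable-importance
# analysis; the study's workflow itself ran in Google Earth Engine.
# File and column names are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("australia_fire_training_points.csv")   # hypothetical export
X, y = df.drop(columns=["fire"]), df["fire"]              # fire = 1/0 label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

for name, clf in [("Random Forest", RandomForestClassifier(n_estimators=200)),
                  ("Naive Bayes", GaussianNB()),
                  ("CART", DecisionTreeClassifier())]:
    clf.fit(X_train, y_train)
    print(name, "test accuracy:", clf.score(X_test, y_test))

# Driving-factor exploration via the random forest's variable importances.
rf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
print(pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False))
```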


Author(s):  
M. Esfandiari ◽  
S. Jabari ◽  
H. McGrath ◽  
D. Coleman

Abstract. Flooding is one of the most damaging natural hazards in urban areas in many places around the world, including the city of Fredericton, New Brunswick, Canada. Fredericton was recently flooded in two consecutive years, 2018 and 2019. Due to the complicated behaviour of water when a river overflows its banks, estimating the flood extent is challenging. The issue becomes even more challenging when several different factors, such as land texture or surface flatness, affect the water flow with varying degrees of intensity. Recently, machine learning algorithms and statistical methods have been used in many studies to generate flood susceptibility maps using topographical, hydrological, and geological conditioning factors. One of the major issues researchers have faced is the complexity and the number of features required as input to a machine learning algorithm to produce acceptable results. In this research, we used Random Forest to model the 2018 flood in Fredericton and analyzed the effect of several combinations of 12 different flood conditioning factors. The factors were tested against a Sentinel-2 optical satellite image available around the flood peak day. The highest accuracy was obtained using only five factors, namely altitude, slope, aspect, distance from the river, and land use/cover, with 97.57% overall accuracy and a 95.14% kappa coefficient.
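A minimal sketch, assuming a tabular sample set labelled from the Sentinel-2 flood extent, of the final model described: a random forest trained on the five best-performing conditioning factors and scored with overall accuracy and Cohen's kappa. File and column names and the split are illustrative.

```python
# Sketch of the final model described: a random forest on the five
# best-performing conditioning factors, scored with overall accuracy and
# Cohen's kappa. File and column names are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("fredericton_2018_flood_samples.csv")   # hypothetical file
factors = ["altitude", "slope", "aspect", "dist_to_river", "landuse"]
X, y = df[factors], df["flooded"]                         # label from Sentinel-2 extent

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)
rf = RandomForestClassifier(n_estimators=500).fit(X_train, y_train)

pred = rf.predict(X_test)
print("overall accuracy:", accuracy_score(y_test, pred))
print("kappa:", cohen_kappa_score(y_test, pred))
```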


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 264-265
Author(s):  
Duy Ngoc Do ◽  
Guoyu Hu ◽  
Younes Miar

Abstract American mink (Neovison vison) is the major source of fur for the fur industries worldwide, and Aleutian disease (AD) is causing severe financial losses to the mink industry. Different methods have been used to diagnose AD in mink, but a combination of several methods may be the most appropriate approach for the selection of AD-resilient mink. The iodine agglutination test (IAT) and counterimmunoelectrophoresis (CIEP) methods are commonly employed in the test-and-remove strategy, while the enzyme-linked immunosorbent assay (ELISA) and packed-cell volume (PCV) methods are complementary. However, using multiple methods is expensive and therefore hinders the correct use of AD tests in selection. This research presents an assessment of AD classification based on machine learning algorithms. Aleutian disease was tested in 1,830 individuals using these tests on an AD-positive mink farm (Canadian Centre for Fur Animal Research, NS, Canada). The accuracy of classification for CIEP was evaluated based on sex information and the IAT, ELISA, and PCV test results, implemented in seven machine learning classification algorithms (Random Forest, Artificial Neural Networks, C50Tree, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) using the caret package in R. The accuracy of prediction varied among the methods. Overall, Random Forest was the best-performing algorithm for the current dataset, with an accuracy of 0.89 on the training data and 0.94 on the testing data. Our work demonstrates the utility and relative ease of using machine learning algorithms to assess the CIEP information and, consequently, reduce the cost of AD tests. However, further work requires the inclusion of production and reproduction information in the models and an extension of phenotypic collection to increase the accuracy of the current methods.
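An analogous sketch in Python/scikit-learn (the study itself used the caret package in R): predicting the CIEP result from sex and the IAT, ELISA, and PCV test results with a few of the listed classifiers. The file name, column names, and model settings are assumptions.

```python
# Analogous scikit-learn sketch (the study used caret in R): predict the CIEP
# result from sex, IAT, ELISA, and PCV. Names and settings are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

df = pd.read_csv("mink_ad_tests.csv")                  # hypothetical file
X = pd.get_dummies(df[["sex", "iat", "elisa", "pcv"]]) # one-hot encode categoricals
y = df["ciep"]                                         # positive / negative label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)
for name, clf in [("Random Forest", RandomForestClassifier()),
                  ("Naive Bayes", GaussianNB()),
                  ("Boosting", GradientBoostingClassifier()),
                  ("LDA", LinearDiscriminantAnalysis())]:
    clf.fit(X_train, y_train)
    print(name, "train:", clf.score(X_train, y_train), "test:", clf.score(X_test, y_test))
```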


2021 ◽  
Author(s):  
Marian Popescu ◽  
Rebecca Head ◽  
Tim Ferriday ◽  
Kate Evans ◽  
Jose Montero ◽  
...  

Abstract This paper presents advancements in machine learning and cloud deployment that enable rapid and accurate automated lithology interpretation. A supervised machine learning technique is described that enables rapid, consistent, and accurate lithology prediction alongside quantitative uncertainty from large wireline or logging-while-drilling (LWD) datasets. To leverage supervised machine learning, a team of geoscientists and petrophysicists made detailed lithology interpretations of wells to generate a comprehensive training dataset. Lithology interpretations were based on deterministic cross-plotting, utilizing and combining various raw logs. This training dataset was used to develop a model and test a machine learning pipeline. The pipeline was applied to a dataset previously unseen by the algorithm to predict lithology. A quality-checking process was performed by a petrophysicist to validate the new predictions delivered by the pipeline against human interpretations. Confidence in the interpretations was assessed in two ways. First, the prior probability was calculated, a measure of confidence that the input data are recognized by the model. Second, the posterior probability was calculated, which quantifies the likelihood that a specified depth interval comprises a given lithology. The supervised machine learning algorithm ensured that the wells were interpreted consistently by removing interpreter biases and inconsistencies. The scalability of cloud computing enabled a large log dataset to be interpreted rapidly; >100 wells were interpreted consistently in five minutes, yielding a >70% lithological match to the human petrophysical interpretation. Supervised machine learning methods have strong potential for classifying lithology from log data because: 1) they can automatically define complex, non-parametric, multi-variate relationships across several input logs; and 2) they allow the confidence of classifications to be quantified. Furthermore, this approach captures the knowledge and nuances of an interpreter's decisions by training the algorithm on human-interpreted labels. In the hydrocarbon industry, the quantity of generated data is predicted to increase by >300% between 2018 and 2023 (IDC, Worldwide Global DataSphere Forecast, 2019–2023). Additionally, the industry holds vast legacy data. This supervised machine learning approach can unlock the potential of some of these datasets by providing consistent lithology interpretations rapidly, allowing resources to be used more effectively.
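A minimal sketch of how per-depth posterior probabilities could be produced with a generic classifier, in the spirit of the pipeline described; the choice of random forest, the log curves, and the file names are assumptions rather than the authors' cloud implementation.

```python
# Minimal sketch of per-depth lithology prediction with posterior
# probabilities; the model choice, log curves, and file names are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

train = pd.read_csv("labelled_wells.csv")        # hypothetical human-labelled set
logs = ["GR", "RHOB", "NPHI", "DT", "RDEP"]      # assumed input log curves
clf = RandomForestClassifier(n_estimators=300).fit(train[logs], train["lithology"])

new_well = pd.read_csv("unseen_well.csv")        # hypothetical unseen well
proba = clf.predict_proba(new_well[logs])        # posterior per lithology class
best = proba.argmax(axis=1)

result = new_well[["depth"]].copy()
result["lithology"] = clf.classes_[best]
result["posterior"] = proba.max(axis=1)          # confidence per depth interval
print(result.head())
```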


Author(s):  
Kazuko Fuchi ◽  
Eric M. Wolf ◽  
David S. Makhija ◽  
Nathan A. Wukie ◽  
Christopher R. Schrock ◽  
...  

Abstract A machine learning algorithm that performs multifidelity domain decomposition is introduced. While the design of complex systems can be facilitated by numerical simulations, determining appropriate physics couplings and levels of model fidelity can be challenging. The proposed method automatically divides the computational domain into subregions and assigns the required fidelity level, using a small number of high-fidelity simulations to generate training data and low-fidelity solutions as input data. Unsupervised and supervised machine learning algorithms are used to correlate features from low-fidelity solutions with fidelity assignments. The effectiveness of the method is demonstrated on a problem of viscous fluid flow around a cylinder at Re ≈ 20. Ling et al. built physics-informed invariance and symmetry properties into machine learning models and demonstrated improved model generalizability. Along these lines, we avoid using problem-dependent features such as coordinates of sample points, object geometry, or flow conditions as explicit inputs to the machine learning model. The use of pointwise flow features generates large datasets from only one or two high-fidelity simulations, and the fidelity predictor model achieved 99.5% accuracy at training points. The trained model was shown to be capable of predicting a fidelity map for a problem with an altered cylinder radius. A significant improvement in prediction performance was seen when the inputs were expanded to include multiscale features that incorporate neighborhood information.
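A conceptual sketch of the fidelity-predictor step: pointwise features from a low-fidelity solution are used to classify which points need high fidelity, and the trained predictor is then applied to a modified case. How the labels are derived (here, thresholding the discrepancy against a high-fidelity reference) and all names are assumptions, not the authors' method.

```python
# Conceptual sketch of the fidelity-predictor step. The labelling rule
# (thresholded discrepancy vs. a high-fidelity reference) and all file names
# are assumptions, not the authors' implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Pointwise features from the low-fidelity solution (e.g. velocity components,
# pressure, and local gradients), one row per sample point.
features = np.load("low_fidelity_pointwise_features.npy")   # hypothetical file
discrepancy = np.load("pointwise_discrepancy.npy")          # vs. high-fidelity run

# Label a point as "needs high fidelity" when the low-fidelity error is large.
labels = (discrepancy > np.percentile(discrepancy, 75)).astype(int)

predictor = RandomForestClassifier(n_estimators=200).fit(features, labels)

# Apply the trained predictor to a modified problem (e.g. an altered cylinder
# radius) to obtain a fidelity map over its sample points.
new_features = np.load("modified_case_features.npy")        # hypothetical file
fidelity_map = predictor.predict(new_features)
```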


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Peter Appiahene ◽  
Yaw Marfo Missah ◽  
Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 raised various issues with respect to the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers' confidence, efficiency and performance analysis in the banking industry has become a pressing issue, because stakeholders need to detect the underlying causes of inefficiencies within the banking industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks' efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate the efficiency and performance of 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggested that the decision tree (DT) and its C5.0 algorithm provided the best predictive model, with 100% accuracy in predicting the 134-sample holdout dataset (30% of the bank branches) and a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally the neural network (86.6% accuracy) with a P value of 0.66. The study concluded that banks in Ghana can use the results of this study to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in RStudio using R code.
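An illustrative Python sketch of the prediction step (the study itself used C5.0 and related models in R): classifying branches as efficient or inefficient from DEA-derived labels with a 30% holdout. File names, columns, and model choices are assumptions.

```python
# Illustrative sketch of the prediction step (the study used C5.0 and related
# models in R): classify branches from DEA-derived labels with a 30% holdout.
# File names, columns, and model choices are assumptions.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("branch_dea_results.csv")       # hypothetical DEA output
X = df.drop(columns=["branch_id", "efficient"])  # branch inputs/outputs
y = df["efficient"]                              # DEA-derived efficiency class

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=5)
for name, clf in [("Decision Tree", DecisionTreeClassifier()),
                  ("Random Forest", RandomForestClassifier()),
                  ("Neural Network", MLPClassifier(max_iter=2000))]:
    clf.fit(X_train, y_train)
    print(name, "holdout accuracy:", clf.score(X_test, y_test))
```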


In a large distributed virtualized environment, predicting the alerting source from its text seems to be a daunting task. This paper explores the option of using machine learning algorithms to solve this problem. Unfortunately, our training dataset is highly imbalanced: 96% of alerting data is reported by 24% of alerting sources. This is the expected dataset in any live distributed virtualized environment, where newer device versions will have relatively fewer alerts compared to older devices. Any classification effort with such an imbalanced dataset presents a different set of challenges compared to binary classification. This type of skewed data distribution makes conventional machine learning less effective, especially when predicting the minority device type alerts. Our challenge is to build a robust model which can cope with this imbalanced dataset and achieve a relatively high level of prediction accuracy. This research work started with traditional regression and classification algorithms using a bag-of-words model. Then word2vec and doc2vec models were used to represent the words in vector formats, which preserve the semantic meaning of the sentence. With this, alerting texts with similar messages have similar vector representations. This vectorized alerting text was used with Logistic Regression for model building. This yields better accuracy, but the model is relatively complex and demands more computational resources. Finally, a simple neural network was used for this multi-class text classification problem using the Keras and TensorFlow libraries. A simple two-layer neural network yielded 99% accuracy, even though our training dataset was not balanced. This paper goes through a qualitative evaluation of the different machine learning algorithms and their respective results. Finally, the two-layer deep learning model was selected as the final solution, since it takes relatively fewer resources and less time while achieving better accuracy.
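A sketch of the final approach described: a small two-layer neural network for multi-class alert-source classification built with Keras/TensorFlow, fed with pre-computed document vectors. The vector dimension, number of classes, layer sizes, and file names are illustrative assumptions.

```python
# Sketch of the final approach: a two-layer Keras network for multi-class
# alert-source classification over pre-computed document vectors.
# Vector dimension, class count, layer sizes, and file names are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

num_sources = 50                # assumed number of alerting-source classes
vector_dim = 300                # e.g. doc2vec vector size for each alert text

# X: one vector per alert message; y: integer id of the alerting source.
X = np.load("alert_vectors.npy")        # hypothetical vectorised alert texts
y = np.load("alert_source_ids.npy")

model = models.Sequential([
    layers.Input(shape=(vector_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_sources, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=64, validation_split=0.2)
```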


2021 ◽  
Author(s):  
Yiqi Jack Gao ◽  
Yu Sun

The start of 2020 marked the beginning of the deadly COVID-19 pandemic caused by the novel SARS-CoV-2 virus originating in Wuhan, China. As of the time of writing, the virus had infected over 150 million people worldwide and resulted in more than 3.5 million global deaths. Accurate future predictions made through machine learning algorithms can be very useful as a guide for hospitals and policy makers to make adequate preparations and enact effective policies to combat the pandemic. This paper carries out a two-pronged approach to analyzing COVID-19. First, the model utilizes the feature importance of a random forest regressor to select eight of the most significant predictors (date, new tests, weekly hospital admissions, population density, total tests, total deaths, location, and total cases) for predicting daily increases in COVID-19 cases, highlighting potential target areas in order to achieve efficient pandemic responses. Second, it utilizes machine learning algorithms such as linear regression, polynomial regression, and random forest regression to predict daily COVID-19 cases using this diverse range of predictors, and these models proved competent at generating predictions with reasonable accuracy.
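A sketch of the two-step approach described, in scikit-learn: rank predictors with a random forest regressor's feature importances, then fit the candidate regression models on the top eight. The data file and the assumption that all predictors are already numerically encoded are illustrative simplifications.

```python
# Sketch of the two-step approach: rank predictors by random forest feature
# importance, then fit regression models on the selected predictors.
# File and column names are assumptions; predictors are assumed numeric.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

df = pd.read_csv("covid_country_daily.csv")      # hypothetical daily dataset
X, y = df.drop(columns=["new_cases"]), df["new_cases"]

# Step 1: feature selection via random forest importance.
rf = RandomForestRegressor(n_estimators=200).fit(X, y)
top8 = pd.Series(rf.feature_importances_, index=X.columns).nlargest(8).index

# Step 2: fit candidate regressors on the selected predictors.
X_train, X_test, y_train, y_test = train_test_split(X[top8], y, random_state=0)
for name, reg in [("Linear", LinearRegression()),
                  ("Polynomial (deg 2)", make_pipeline(PolynomialFeatures(2), LinearRegression())),
                  ("Random Forest", RandomForestRegressor(n_estimators=200))]:
    reg.fit(X_train, y_train)
    print(name, "R^2:", reg.score(X_test, y_test))
```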

