Churn Prediction and Fraud Detection in Dairy Sector Using Machine Learning

Author(s):  
Hitarth Deepak Shah ◽  
Chintan M. Bhatt ◽  
Shubham Mitul Patel ◽  
Jayshil Bhavin Khajanchi ◽  
Jaimin Narendrakumar Makwana

India has been the world's largest milk-producing country for two decades, producing about 400 million litres of milk every day. The dairy sector is responsible for looking after farmers by providing various services for their livelihood. The growing financial capital of the dairy industry has attracted various kinds of fraudulent behaviour. The majority of suspicious activities occur during collection at local collection centres; fake farmer entries, manually tampered quantity and fat entries, and adulteration are the most prominent malpractices. In this research work, the authors present a detailed study of the most popular machine learning methods applied to the problems of farmer churn prediction and fraud detection in dairies. They applied a range of machine learning algorithms to obtain accurate results for churn and fraud detection. The XGBoost classifier performed best for churn prediction with 93% accuracy, while the random forest classifier proved most effective for fraud detection with 94% accuracy.
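A minimal sketch of such a two-model pipeline follows, assuming a tabular export of collection-centre records with binary churn and fraud labels; the file, column, and feature names are illustrative placeholders, not the authors' actual schema:

```python
# Sketch: churn prediction with XGBoost and fraud detection with a random forest.
# "dairy_collections.csv", "churned", and "fraudulent" are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("dairy_collections.csv")          # hypothetical per-farmer feature export
X = df.drop(columns=["churned", "fraudulent"])     # engineered numeric features
y_churn, y_fraud = df["churned"], df["fraudulent"]

# Churn model: XGBoost classifier (the paper reports ~93% accuracy for this task).
X_tr, X_te, y_tr, y_te = train_test_split(X, y_churn, test_size=0.2, random_state=42)
churn_model = XGBClassifier(n_estimators=300, max_depth=6).fit(X_tr, y_tr)
print("churn accuracy:", accuracy_score(y_te, churn_model.predict(X_te)))

# Fraud model: random forest classifier (the paper reports ~94% accuracy).
X_tr, X_te, y_tr, y_te = train_test_split(X, y_fraud, test_size=0.2, random_state=42)
fraud_model = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_tr, y_tr)
print("fraud accuracy:", accuracy_score(y_te, fraud_model.predict(X_te)))
```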

2022 ◽  
pp. 383-393
Author(s):  
Lokesh M. Giripunje ◽  
Tejas Prashant Sonar ◽  
Rohit Shivaji Mali ◽  
Jayant C. Modhave ◽  
Mahesh B. Gaikwad

Risk due to heart disease is increasing throughout the world. According to the World Health Organization, the number of deaths caused by heart disease is rising drastically compared with other diseases. Multiple factors are responsible for causing heart-related issues. Many approaches have been suggested for the prediction of heart disease, but none of them has been satisfactory in clinical terms. Available heart disease therapies and operations are costly, and post-treatment care is also expensive. This chapter provides a comprehensive survey of existing machine learning algorithms and presents a comparison in terms of accuracy; the authors found the random forest classifier to be the most accurate model and therefore use it for further processing. The machine learning model was deployed as a web application with the help of Flask, HTML, GitHub, and Heroku servers. The web pages take input attributes from the user and output the patient's heart condition, i.e., the likelihood of developing coronary heart disease within the next ten years.
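A minimal sketch of the deployment step described above, assuming a pre-trained random forest saved with joblib; the model file, feature list, routes, and template names are assumptions, since the chapter's exact attributes are not given:

```python
# Sketch: serving a pre-trained random-forest heart-disease model with Flask.
# Model file, feature names, and templates are illustrative placeholders.
import joblib
from flask import Flask, render_template, request

app = Flask(__name__)
model = joblib.load("random_forest_heart.pkl")       # hypothetical pre-trained model
FEATURES = ["age", "sex", "cigsPerDay", "totChol", "sysBP", "glucose"]  # placeholder order

@app.route("/", methods=["GET", "POST"])
def predict():
    if request.method == "POST":
        values = [[float(request.form[f]) for f in FEATURES]]
        risk = model.predict_proba(values)[0][1]      # probability of CHD within 10 years
        return render_template("result.html", risk=round(risk * 100, 1))
    return render_template("form.html")

if __name__ == "__main__":
    app.run(debug=True)
```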


2020 ◽  
Vol 184 ◽  
pp. 01011
Author(s):  
Sreethi Musunuru ◽  
Mahaalakshmi Mukkamala ◽  
Latha Kunaparaju ◽  
N V Ganapathi Raju

Though banks hold an abundance of data on their customers, it is not unusual for them to track customers' actions regularly to improve the services they offer and to understand why many choose to leave and shift to other banks. Analyzing customer behavior can be highly beneficial to banks, as they can reach out to their customers on a personal level and develop a business model that improves pricing structure, communication, advertising, and benefits for their customers and themselves. Features such as the amount a customer credits every month, annual salary, and gender are used to classify customers using machine learning algorithms such as the K Neighbors classifier and random forest classifier. By classifying customers, banks can estimate who will continue with them and who will leave in the near future. Our study aims to remove features that are independent but not influential in determining the customers' future status, without loss of accuracy, and to improve the model to see whether doing so also increases the accuracy of the results.
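A minimal sketch of this feature-removal comparison, assuming column names similar to the widely used public bank-churn dataset; the candidate features to drop are illustrative, not the study's final selection:

```python
# Sketch: comparing KNN and random-forest churn models before and after dropping
# candidate non-influential features. Column names mirror the common public
# bank-churn dataset and are assumptions, not the paper's exact schema.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("bank_churn.csv")                                     # hypothetical file
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"], errors="ignore")
df = pd.get_dummies(df, columns=["Geography", "Gender"], drop_first=True)
y = df["Exited"]

for drop_cols in ([], ["HasCrCard", "EstimatedSalary"]):               # candidate drops
    X = df.drop(columns=["Exited"] + drop_cols)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    for name, clf in [("KNeighbors", KNeighborsClassifier()),
                      ("RandomForest", RandomForestClassifier(random_state=0))]:
        clf.fit(X_tr, y_tr)
        print(f"dropped={drop_cols or 'none'} {name}: {clf.score(X_te, y_te):.3f}")
```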


2020 ◽  
Vol 9 (2) ◽  
pp. 1049-1054

In this paper, we predict flight delays using different machine learning and deep learning techniques. With such a model, it becomes easier to predict whether a flight will be delayed or not. Factors such as 'WeatherDelay', 'NASDelay', 'Destination', and 'Origin' play a vital role in this model. Using machine learning algorithms such as random forest, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), the F1-score, precision, recall, support, and accuracy have been computed. In addition, a Long Short-Term Memory (LSTM) RNN architecture has also been employed. The dataset for Pittsburgh from the Bureau of Transportation Statistics (BTS) is used. The results computed from the above-mentioned algorithms have been compared. Further, the results were visualized for various airlines to find the maximum delay, and the AUC-ROC curve was plotted for the random forest algorithm. The aim of our research work is to predict delays so as to minimize losses and increase customer satisfaction.
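A minimal sketch of the random forest branch of this workflow, computing the classification report and plotting the AUC-ROC curve; the BTS file name, delay threshold, and column names are assumptions rather than the paper's exact setup:

```python
# Sketch: delayed vs. on-time classification with a random forest, plus ROC curve.
# File name, "ArrDelay" threshold, and airport columns are placeholders.
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import RocCurveDisplay, classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("bts_pittsburgh.csv")                 # hypothetical BTS on-time export
df = pd.get_dummies(df, columns=["Origin", "Dest"])    # encode airport codes
y = (df.pop("ArrDelay") > 15).astype(int)              # label: delayed if arrival delay > 15 min
X = df.select_dtypes("number")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
print(classification_report(y_te, rf.predict(X_te)))   # precision, recall, F1-score, support
RocCurveDisplay.from_estimator(rf, X_te, y_te)         # AUC-ROC curve for the random forest
plt.show()
```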


Parkinson's disease is among the most common neurodegenerative disorders, affecting more than ten million people across the world. There is no single test that can be administered to diagnose Parkinson's disease. Our aim is to analyse machine learning-based techniques for identifying Parkinson's disease in patients. The machine learning-based technique is used to predict the disease accurately from patients' speech and handwriting patterns, to compare the performance of various machine learning algorithms on the given hospital dataset through analysis and a classification report, and to report the results in terms of accuracy, precision, recall, F1-score, specificity, and sensitivity.
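As an illustration of the evaluation described above, the following sketch trains a classifier (a random forest is used here purely as an example, since no specific algorithm is named) on speech-derived features and reports sensitivity and specificity alongside the standard classification report; the dataset layout loosely follows the public UCI Parkinson's voice data and is an assumption:

```python
# Sketch: evaluating a classifier on speech-derived Parkinson's features and reporting
# sensitivity/specificity. File name and "status" label column are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

df = pd.read_csv("parkinsons_voice.csv")               # hypothetical file
y = df.pop("status")                                   # 1 = Parkinson's, 0 = healthy
X = df.select_dtypes("number")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)
clf = RandomForestClassifier(random_state=7).fit(X_tr, y_tr)
pred = clf.predict(X_te)

tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print(classification_report(y_te, pred))               # accuracy, precision, recall, F1
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```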


Author(s):  
Pedro Sobreiro ◽  
Pedro Guedes-Carvalho ◽  
Abel Santos ◽  
Paulo Pinheiro ◽  
Celina Gonçalves

The phenomenon of dropout is often found among customers of sports services. In this study we intend to evaluate the performance of machine learning algorithms in predicting dropout using available data about members' historical use of facilities. The data relating to a sample of 5209 members were taken from a Portuguese fitness centre and included the variables: registration data, payments and frequency, age, sex, non-attendance days, amount billed, average weekly visits, total number of visits, visits hired per week, number of registration renewals, number of member referrals, total monthly registrations, and total member enrolment time, which may be indicative of members’ commitment. Whilst the Gradient Boosting Classifier had the best performance in predicting dropout (sensitivity = 0.986), the Random Forest Classifier was the best at predicting non-dropout (specificity = 0.790); the overall performance of the Gradient Boosting Classifier was superior to that of the Random Forest Classifier (accuracy 0.955 against 0.920). The most relevant variables predicting dropout were “non-attendance days”, “total length of stay”, and “total amount billed”. The use of decision trees provides information that can be readily acted upon to identify member profiles of those at risk of dropout, and also provides guidelines for measures and policies to reduce it.
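A minimal sketch of the sensitivity/specificity comparison between the two tree-based classifiers, assuming the member data have been exported to a flat file with a binary dropout label; the file and column names are placeholders:

```python
# Sketch: comparing gradient-boosting and random-forest dropout models on
# sensitivity and specificity. CSV layout and "dropout" label are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

df = pd.read_csv("fitness_members.csv")               # hypothetical member export
y = df.pop("dropout")                                 # 1 = dropped out
X = df.select_dtypes("number")
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

for name, clf in [("GradientBoosting", GradientBoostingClassifier()),
                  ("RandomForest", RandomForestClassifier(random_state=3))]:
    tn, fp, fn, tp = confusion_matrix(y_te, clf.fit(X_tr, y_tr).predict(X_te)).ravel()
    print(f"{name}: sensitivity={tp / (tp + fn):.3f}  specificity={tn / (tn + fp):.3f}")
```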


Recycling ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 65
Author(s):  
Ali Hewiagh ◽  
Kannan Ramakrishnan ◽  
Timothy Tzen Vun Yap ◽  
Ching Seong Tan

Online fraud has pernicious impacts on different system domains, including waste management systems. Fraudsters illegally obtain rewards for their recycling activities or evade the penalties imposed on those who are required to recycle their own waste. Although some approaches have been introduced to prevent such fraudulent activities, fraudsters continuously seek new ways to commit illegal actions. Machine learning technology has shown significant and impressive results in identifying new online fraud patterns in different system domains such as e-commerce, insurance, and banking. The purpose of this paper, therefore, is to analyze a waste management system and develop a machine learning model to detect fraud in the system. The system under study allows consumers, individuals, and organizations to track, monitor, and update their performance in their recycling activities. The data set provided by a waste management organization is used for the analysis and model training; it contains transactions of users' recycling activities and behaviors. Three machine learning algorithms, random forest, support vector machine, and multi-layer perceptron, are used in the experiments, and the best detection model is selected based on its performance. Results show that each of these algorithms can be used for fraud detection in waste management with high accuracy. The random forest algorithm produces the optimal model, with an accuracy of 96.33%, an F1-score of 95.20%, and a ROC of 98.92%.
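A minimal sketch of training the three candidate detectors and selecting the best by F1-score, assuming a flat export of labelled recycling transactions; the file name and features are placeholders, since the organization's real schema is not public:

```python
# Sketch: fitting the three candidate fraud detectors and keeping the best by F1.
# Transaction file and "is_fraud" label are illustrative placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("recycling_transactions.csv")         # hypothetical transaction export
y = df.pop("is_fraud")
X = df.select_dtypes("number")
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "mlp": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
}
scores = {name: f1_score(y_te, m.fit(X_tr, y_tr).predict(X_te)) for name, m in models.items()}
print("best model:", max(scores, key=scores.get), scores)
```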


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Yogesh Kumar ◽  
Apeksha Koul ◽  
Pushpendra Singh Sisodia ◽  
Jana Shafi ◽  
Verma Kavita ◽  
...  

Quantum-enhanced machine learning plays a vital role in healthcare because of its robust applications in current research scenarios, the growth of novel medical trials, patient information and record management, chronic disease detection, and much more. For this reason, the healthcare industry is applying quantum computing to sustain patient-oriented attention to healthcare patrons. The present work summarizes recent research progress in quantum-enhanced machine learning and its significance in heart failure detection on a dataset of 14 attributes. In this paper, the number of qubits, in terms of the features of the heart failure data, is normalized using min-max scaling, PCA, and the standard scaler, and is further optimized using a pipelining technique. The current work verifies that quantum-enhanced machine learning algorithms such as the quantum random forest (QRF), quantum K nearest neighbour (QKNN), quantum decision tree (QDT), and quantum Gaussian Naïve Bayes (QGNB) perform better than traditional machine learning algorithms in heart failure detection. The best accuracy rate (0.89) was attained by the quantum random forest classifier, which also achieved the best F1 score, recall, and precision, at 0.88, 0.93, and 0.89, respectively. The computation time taken by traditional and quantum-enhanced machine learning algorithms was also compared, with the quantum random forest having the lowest execution time of 150 microseconds. Hence, the work provides a way to quantify the differences between standard and quantum-enhanced machine learning algorithms and to select the optimal method for detecting heart failure.
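The quantum classifiers themselves are framework-specific, but the classical normalization and pipelining step described above can be sketched as follows, reducing the 14 attributes to one component per qubit; the file name, target column, and qubit budget are assumptions:

```python
# Sketch: classical preprocessing pipeline (standardize -> PCA -> min-max) that maps
# the 14 heart-failure attributes to a small number of components, one per qubit.
# The downstream quantum models (QRF, QKNN, QDT, QGNB) are not shown here.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.read_csv("heart_failure.csv")                  # hypothetical 14-attribute dataset
X = df.drop(columns=["target"])                        # "target" column is an assumption

n_qubits = 4                                           # assumed qubit budget
prep = Pipeline([
    ("standardize", StandardScaler()),                 # zero mean, unit variance
    ("pca", PCA(n_components=n_qubits)),               # one principal component per qubit
    ("minmax", MinMaxScaler(feature_range=(0, 1))),    # rescale for angle/amplitude encoding
])
X_encoded = prep.fit_transform(X)
print(X_encoded.shape)                                 # (n_samples, n_qubits)
```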


2021 ◽  
Author(s):  
Meng Ji ◽  
Pierrette Bouillon

BACKGROUND: Linguistic accessibility has an important impact on the reception and utilization of translated health resources among multicultural and multilingual populations. The linguistic understandability of health translation has been under-studied.
OBJECTIVE: Our study aimed to develop novel machine learning models for studying the linguistic accessibility of health translations, comparing Chinese translations of World Health Organization health materials with original Chinese health resources developed by the Chinese health authorities.
METHODS: Using natural language processing tools for assessing the readability of Chinese materials, we explored and compared the readability of Chinese health translations from the World Health Organization with original Chinese materials from the China Centre for Disease Control and Prevention.
RESULTS: A pairwise adjusted t test showed that three new machine learning models achieved statistically significant improvements over the baseline logistic regression in terms of AUC: the C5.0 decision tree (p=0.000, 95% CI: -0.249, -0.152), random forest (p=0.000, 95% CI: 0.139, 0.239), and XGBoost tree (p=0.000, 95% CI: 0.099, 0.193). There was, however, no significant difference between the C5.0 decision tree and random forest (p=0.513). The extreme gradient boosting tree was the best model, having achieved statistically significant improvements over the C5.0 model (p=0.003) and the random forest model (p=0.006) at the Bonferroni-adjusted p value of 0.008.
CONCLUSIONS: The machine learning algorithms developed significantly improved the accuracy and reliability of current approaches to evaluating the linguistic accessibility of Chinese health information, especially Chinese health translations in relation to original health resources. Although the new algorithms were developed on Chinese health resources, they can be adapted to other languages to advance current research in accessible health translation, communication, and promotion.
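A minimal sketch of a pairwise, Bonferroni-adjusted comparison of cross-validated AUCs in the spirit of this analysis, using synthetic stand-in data and sklearn/xgboost models; the paper's C5.0 implementation and its readability feature matrix are not reproduced here:

```python
# Sketch: pairwise paired t-tests on cross-validated AUCs with a Bonferroni-adjusted
# significance threshold. Data are synthetic stand-ins, not the study's features.
from itertools import combinations
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # stand-in data
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "xgboost": XGBClassifier(),
}
aucs = {name: cross_val_score(m, X, y, cv=10, scoring="roc_auc") for name, m in models.items()}

pairs = list(combinations(models, 2))
alpha = 0.05 / len(pairs)                               # Bonferroni-adjusted threshold
for a, b in pairs:
    t, p = ttest_rel(aucs[a], aucs[b])
    print(f"{a} vs {b}: p={p:.4f}  significant={p < alpha}")
```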


Author(s):  
Sheikh Shehzad Ahmed

The Internet is used practically everywhere in today's digital environment. With the increased use of the Internet comes an increase in the number of threats. DDoS attacks are one of the most popular types of cyber-attacks nowadays. With the fast advancement of technology, the harm caused by DDoS attacks has grown increasingly severe. Because DDoS attacks may readily modify the ports/protocols utilized or how they function, the basic features of these attacks must be examined. Machine learning approaches have also been used extensively in intrusion detection research. Still, it is unclear what features are applicable and which approach would be better suited for detection. With this in mind, the research presents a machine learning-based DDoS attack detection approach. To train the attack detection model, we employ four Machine Learning algorithms: Decision Tree classifier (ID3), k-Nearest Neighbors (k-NN), Logistic Regression, and Random Forest classifier. The results of our experiments show that the Random Forest classifier is more accurate in recognizing attacks.
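A minimal sketch of comparing the four named classifiers on labelled network-flow records, assuming a CSV export with a 'Label' column in the style of public DDoS datasets; the file and columns are placeholders:

```python
# Sketch: comparing the four classifiers on labelled flow records.
# "ddos_flows.csv" and its "Label" column are assumptions, not the paper's dataset.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("ddos_flows.csv")                     # hypothetical flow-feature export
y = (df.pop("Label") != "BENIGN").astype(int)          # 1 = attack traffic
X = df.select_dtypes("number")
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

classifiers = {
    "decision_tree_id3": DecisionTreeClassifier(criterion="entropy"),  # ID3-style splits
    "knn": KNeighborsClassifier(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200),
}
for name, clf in classifiers.items():
    print(name, clf.fit(X_tr, y_tr).score(X_te, y_te))
```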


2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Qiaochu Chen ◽  
Lauren E Charles

Objective: The objective is to develop an ensemble of machine learning algorithms to identify multilingual, online articles that are relevant to biosurveillance. Language morphology varies widely across languages and must be accounted for when designing algorithms. Here, we compare the performance of a word embedding-based approach and a topic modeling approach combined with machine learning algorithms to determine the best method for the Chinese, Arabic, and French languages.

Introduction: Global biosurveillance is an extremely important, yet challenging task. One form of global biosurveillance comes from harvesting open-source online data (e.g., news, blogs, reports, RSS feeds). The information derived from these data can be used for timely detection and identification of biological threats all over the world. However, the more inclusive the data harvesting procedure is, to ensure that all potentially relevant articles are collected, the more irrelevant data also gets harvested. This issue can become even more complex when the online data is in a non-native language. Foreign language articles not only create language-specific issues for Natural Language Processing (NLP) but also add significant translation costs. Previous work shows success in the use of combinatory monolingual classifiers in specific applications, e.g., the legal domain [1]. A critical component of a comprehensive, online harvesting biosurveillance system is the capability to distinguish relevant foreign language articles from irrelevant ones based on the initial article information collected, without the additional cost of full text retrieval and translation.

Methods: The analysis text dataset contains the title and brief description of 3506 online articles in the Chinese, Arabic, and French languages from the date range of August 17, 2016 to July 5, 2017. The NLP article pre-processing steps are language-specific tokenization and stop-word removal. We compare two different approaches: word embeddings and topic modeling (Fig. 1). For word embeddings, we first generate word vectors for the data using a pretrained Word2Vec (W2V) model [2]. Subsequently, the word vectors within a document are averaged to produce a single feature vector for the document. Then, we fit a machine learning algorithm (random forest classifier or Support Vector Machine (SVM)) to the training vectors and get predictions for the test documents. For topic modeling, we used a Latent Dirichlet Allocation (LDA) model to generate five topics for all relevant documents [3]. For each new document, the output is the probability measure of the document belonging to these five topics. Here, we classify the new document by comparing the probability measure with a relevancy threshold.

Results: The Word2Vec model combined with a random forest classifier outperformed the other approaches across the three languages (Fig. 2); the Chinese model has an 89% F1-score, the Arabic model 86%, and the French model 94%. To decrease the chance of calling a potentially relevant article irrelevant, high recall was more important than high precision. In the Chinese model, Word2Vec with a random forest had the highest recall at 98% (Table 1).

Conclusions: We present research findings on different approaches to biosurveillance relevance identification in non-English texts and identify the best performing methods for implementation into a biosurveillance online article harvesting system. Our initial results suggest that the word embeddings model has an advantage over topic modeling, and that the random forest classifier outperforms the SVM. Future work will aim to further expand the list of languages and methods to be compared, e.g., n-grams and non-negative matrix factorization. In addition, we will fine-tune the Arabic and French models for better accuracy.
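A minimal sketch of the word-embedding arm of this pipeline, averaging pretrained Word2Vec vectors per document and fitting a random forest; the vector file, toy documents, and labels are placeholders rather than the project's assets:

```python
# Sketch: average pretrained Word2Vec vectors per document, then classify with a
# random forest. Vector file, documents, and labels are illustrative placeholders.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.ensemble import RandomForestClassifier

kv = KeyedVectors.load_word2vec_format("pretrained_vectors.bin", binary=True)  # assumed file

def doc_vector(tokens):
    """Average the vectors of in-vocabulary tokens; zeros if none are known."""
    vecs = [kv[t] for t in tokens if t in kv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

docs = [["vaccine", "outbreak", "province"], ["concert", "tickets", "tour"]]  # toy tokenised docs
labels = [1, 0]                                                               # 1 = relevant

X = np.vstack([doc_vector(d) for d in docs])
clf = RandomForestClassifier(random_state=0).fit(X, labels)
print(clf.predict(X))
```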

