Customer Behavior Analysis using Machine Learning

RFM (Recency, Frequency, Monetary) analysis is a proven marketing model for behavior-based customer segmentation. It groups customers by their transaction history: how recently, how often, and how much they buy. RFM helps divide customers into categories or clusters to identify which customers will respond to promotions, and how. The analysis rests on a combination of three parameters. For example, we might say that the people who spend the most on products are our best customers, and most of us would agree. But what if they bought only once, or a very long time ago? What if they no longer use our product? Can they still be considered our best customers? Probably not. Judging customer value from a single perspective gives an inaccurate picture of the customer base and its lifetime value. That is why the RFM model combines three different customer attributes to rank customers. Customers who bought recently get a higher score, customers who buy often get a higher score, and customers who spend more get a higher score. We combine these three scores into the RFM score. Finally, we can segment the customer database into groups based on this RFM score.
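
The scoring idea can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the text above: the transactions, the reference date, the rank-based scoring (1 = worst), and the helper names `rfm_table` and `rfm_scores` are all hypothetical.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
TRANSACTIONS = [
    ("A", date(2024, 6, 25), 120.0),
    ("A", date(2024, 6, 1), 80.0),
    ("A", date(2024, 5, 10), 100.0),
    ("B", date(2024, 1, 15), 900.0),
    ("C", date(2024, 6, 20), 15.0),
    ("C", date(2024, 4, 2), 20.0),
]


def rfm_table(transactions, today):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, amount in transactions:
        recency, freq, money = table.get(customer, (None, 0, 0.0))
        days_ago = (today - day).days
        # Recency is the number of days since the most recent purchase.
        recency = days_ago if recency is None else min(recency, days_ago)
        table[customer] = (recency, freq + 1, money + amount)
    return table


def rfm_scores(table):
    """Return {customer: (R, F, M)} as rank-based scores, 1 = worst."""
    def rank(index, lower_is_better=False):
        ordered = sorted({row[index] for row in table.values()},
                         reverse=lower_is_better)
        return {c: ordered.index(row[index]) + 1 for c, row in table.items()}

    r = rank(0, lower_is_better=True)  # fewer days since last purchase is better
    f = rank(1)                        # more purchases is better
    m = rank(2)                        # higher total spend is better
    return {c: (r[c], f[c], m[c]) for c in table}


table = rfm_table(TRANSACTIONS, today=date(2024, 7, 1))
scores = rfm_scores(table)
for customer in sorted(scores):
    print(customer, table[customer], scores[customer], sum(scores[customer]))
```

In this toy data, customer B spends the most but bought only once and long ago, so B's combined score ends up below that of customer A, which is the situation the paragraph above describes.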

Customer Segment Prognostic System by Machine Learning using Principal Component and Linear Discriminant Analysis

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2290.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 6198-6203

Keyword(s):

Machine Learning ◽

Discriminant Analysis ◽

Dimensionality Reduction ◽

Linear Discriminant Analysis ◽

Principal Component ◽

Customer Behavior ◽

Machine Learning Algorithms ◽

Data Set ◽

Linear Discriminant ◽

Customer Group

Recently, the manufacturing industry has faced many problems in predicting customer behavior and customer groups in order to match its output with profit. Organizations find it difficult to identify customer behavior for the purpose of predicting product design so as to increase profit. Predicting the customer group is a challenging task for every organization because of the growing number of entrepreneurs. This has led to the use of machine learning algorithms to cluster customer groups and predict customer demand, which helps in the decision-making process of manufacturing products. This paper attempts to predict the customer group for the wine data set extracted from the UCI Machine Learning Repository. The wine data set is subjected to dimensionality reduction with principal component analysis and linear discriminant analysis (LDA). A performance analysis is carried out with various classification algorithms, and a comparative study is made using performance metrics such as accuracy, precision, recall, and F-score. Experimental results show that, after dimensionality reduction, the wine data set reduced with 2-component LDA and classified with the kernel SVM and Random Forest classifiers is effective, with an accuracy of 100% compared to other classifiers.
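
The two-component limit for linear discriminant analysis follows from the number of classes. As a reminder that is not taken from the paper itself: the UCI wine data set has three classes, and linear discriminant analysis looks for projection directions that maximize the Fisher criterion below. The symbols (between-class scatter S_B, within-class scatter S_W, class means, class sizes n_c, number of classes C) are standard textbook notation assumed here. Because S_B has rank at most C - 1, three classes yield at most two discriminant components.

```latex
J(\mathbf{w}) = \frac{\mathbf{w}^{\top} S_B \, \mathbf{w}}{\mathbf{w}^{\top} S_W \, \mathbf{w}},
\qquad
S_B = \sum_{c=1}^{C} n_c \, (\boldsymbol{\mu}_c - \boldsymbol{\mu})(\boldsymbol{\mu}_c - \boldsymbol{\mu})^{\top},
\qquad
S_W = \sum_{c=1}^{C} \sum_{i \in c} (\mathbf{x}_i - \boldsymbol{\mu}_c)(\mathbf{x}_i - \boldsymbol{\mu}_c)^{\top},
\qquad
\operatorname{rank}(S_B) \le C - 1 .
```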

An Integrated Framework with Machine Learning and Radiomics for Accurate and Rapid Early Diagnosis of COVID-19 from Chest X-ray

10.1101/2020.10.01.20205146 ◽

2020 ◽

Author(s):

Mahbubunnabi Tamal ◽

Maha Alshammari ◽

Meernah Alabdullah ◽

Rana Hourani ◽

Hossain Abu Alola ◽

...

Keyword(s):

Machine Learning ◽

Early Diagnosis ◽

Cost Effective ◽

The Other ◽

Support Vector ◽

Rt Pcr ◽

Data Set ◽

X Ray ◽

Other Hand ◽

Chest X Ray

Abstract: Early diagnosis of COVID-19 is considered the first key action to prevent spread of the virus. Currently, reverse transcription-polymerase chain reaction (RT-PCR) is considered the gold standard point-of-care diagnostic tool. However, several limitations of RT-PCR have been identified, e.g., low sensitivity, cost, long delay in getting results, and the need for a professional technician to collect samples. On the other hand, chest X-ray (CXR) is routinely used as a cost-effective test for diagnosing and monitoring different respiratory abnormalities and is currently being used as a discriminating tool for COVID-19. However, visual assessment of CXR is not able to distinguish COVID-19 from other lung conditions. Several machine learning algorithms have been proposed to detect COVID-19 directly from CXR images with reasonably good accuracy on a data set that was randomly split into two subsets for training and test. Since these methods require a huge number of images for training, data augmentation with geometric transformation was applied to increase the number of images. It is highly likely that images of the same patients are present in both the training and test sets, resulting in higher accuracies in detection of COVID-19. It is, therefore, vital to assess the performance of a COVID-19 detection algorithm on an independent data set with different degrees of the disease before it is employed in clinical settings. On the other hand, machine learning techniques that depend on handcrafted feature extraction and selection approaches can be trained with a smaller data set. The features can also be analyzed separately for various lung conditions. Radiomics features are handcrafted features of this kind: they represent the heterogeneous appearance of the lung on CXR quantitatively and can be used to distinguish COVID-19 from other lung conditions.
Based on this hypothesis, a machine learning based technique is proposed here that is trained on a set of 71 suitable radiomics features to detect COVID-19. It is found that Support Vector Machine (SVM) and Ensemble Bagging Model Trees (EBM) trained on these 71 radiomics features can distinguish between COVID-19 and other diseases with an overall sensitivity of 99.6% and 87.8% and specificity of 85% and 97% respectively. Though the performance is comparable for both methods, EBM is more robust across severity levels. Severity, in this case, was scored from 0 to 4 by two experienced radiologists for each lung segment of each CXR image and represents the degree of severity of the disease. For the case of 0 severity, sensitivity and specificity of the EBM method are 91.7% and 100% respectively, indicating that there are certain radiomics patterns that are not visibly distinguishable. Since the proposed method does not require any manual intervention (e.g., sample collection), it can be integrated with any standard X-ray reporting system to be used as an efficient, cost-effective and rapid early diagnosis device. It can also be deployed in places where quick results of the COVID-19 test are required, e.g., airports, seaports, hospitals, and health clinics.
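
For readers less familiar with the two metrics reported above, the following minimal sketch shows how sensitivity and specificity are computed from binary labels. The labels are invented for illustration and have nothing to do with the study's data; the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Sensitivity = TP / (TP + FN): share of true positives that are detected.
    Specificity = TN / (TN + FP): share of true negatives that are rejected.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels: 1 = disease present, 0 = other condition.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

A classifier can trade one metric for the other, which is why the abstract reports both for each model.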

Identifying Ransomware Actors in the Bitcoin Network

10.5121/csit.2021.111201 ◽

2021 ◽

Author(s):

Siddhartha Dalal ◽

Zihe Wang ◽

Siddhanth Sabharwal

Keyword(s):

Machine Learning ◽

Test Data ◽

Prediction Accuracy ◽

The Other ◽

Common Pattern ◽

Data Set ◽

Local Clustering ◽

Illegal Activities ◽

New Algorithms

Due to the pseudo-anonymity of the Bitcoin network, users can hide behind their bitcoin addresses, which can be generated in unlimited quantity, on the fly, without any formal links between them. Thus, the network is being used for payment transfer by actors involved in ransomware and other illegal activities. The other activity we consider is gambling, since gambling is often used for transferring illegal funds. The question addressed here is: given temporally limited graphs of Bitcoin transactions, to what extent can one identify common patterns associated with these fraudulent activities and apply them to find other ransomware actors? The problem is rather complex, given that thousands of addresses can belong to the same actor without any obvious links between them or any common pattern of behavior. The main contribution of this paper is to introduce and apply new algorithms for local clustering and supervised graph machine learning for identifying malicious actors. We show that very local subgraphs of the known actors are sufficient to differentiate between ransomware, random, and gambling actors with 85% prediction accuracy on the test data set.
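
To make the notion of a "very local subgraph" concrete, here is a small sketch that extracts the transactions within a fixed number of hops of a seed address using breadth-first search. It is not the paper's algorithm, which the abstract does not specify; the edge list, the address names, and the function `local_subgraph` are hypothetical.

```python
from collections import deque


def local_subgraph(edges, seed, hops=2):
    """Return the edges whose endpoints both lie within `hops` steps of `seed`.

    Edge direction is ignored when measuring distance, so both senders
    and receivers count as neighbors of an address.
    """
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)

    return [(s, d) for s, d in edges if s in depth and d in depth]


# Hypothetical payment edges: (sending address, receiving address).
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("x", "a")]
print(local_subgraph(EDGES, seed="a", hops=2))
```

Features computed on such a neighborhood (for example, counts of incoming and outgoing payments) could then be fed to a supervised classifier, though which features the authors use is not stated in the abstract.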

Customer Behavior Analysis and Revenue Prediction System

International Journal for Research in Engineering Application & Management ◽

10.35291/2454-9150.2020.0245 ◽

2020 ◽

pp. 05-08

Keyword(s):

Machine Learning ◽

Behavior Analysis ◽

Learning Algorithm ◽

Customer Behavior ◽

Gradient Boosting ◽

Machine Learning Algorithm ◽

Prediction System ◽

Customer Behaviour ◽

Boosting Method ◽

Potential Customers

In the era of e-commerce, many organizations have implemented customer behaviour analytics to grow their business. Studying and analysing the behaviour of online buyers is a crucial challenge for organizations in the e-commerce world. The success of every organization depends on satisfying its existing customers and gaining new ones, which is done by targeting the potential customers who can generate revenue for the organization. RFM analysis is used to identify customers who bought recently, customers who buy frequently, and customers who spend heavily. It is one of the best methods for separating an organization’s revenue-generating customers from other customers. The 80/20 rule is also applied, focusing on the 20 percent of customers who generate 80 percent of the organization’s revenue. The model is developed using LightGBM (Light Gradient Boosting Machine), a machine learning algorithm.
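
The 80/20 rule mentioned above can be checked directly on revenue data. The sketch below finds the smallest group of top-spending customers whose combined revenue reaches a target share. The revenue figures and the function name `top_customers_for_share` are hypothetical, and real data will rarely split exactly 80/20.

```python
def top_customers_for_share(revenue_by_customer, share=0.8):
    """Return (customers, achieved_share) for the smallest top-spender group
    whose combined revenue reaches `share` of total revenue."""
    total = sum(revenue_by_customer.values())
    ranked = sorted(revenue_by_customer.items(),
                    key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for customer, revenue in ranked:
        if running / total >= share:
            break  # target share already reached
        selected.append(customer)
        running += revenue
    return selected, running / total


# Hypothetical revenue per customer.
REVENUE = {"c1": 500, "c2": 300, "c3": 50, "c4": 40, "c5": 30,
           "c6": 25, "c7": 20, "c8": 15, "c9": 10, "c10": 10}

top, achieved = top_customers_for_share(REVENUE)
print(top, f"{len(top) / len(REVENUE):.0%} of customers",
      f"{achieved:.0%} of revenue")
```

Here two of the ten customers account for 80 percent of revenue, so a campaign aimed at that group covers most of the income.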

The Determinants of Capital Adequacy Ratio: The Case of the Vietnamese Banking System in the Period 2011-2015

VNU Journal of Science Economics and Business ◽

10.25073/2588-1108/vnueab.4070 ◽

2017 ◽

Vol 33 (2) ◽

Author(s):

Ngoc Anh Nguyen

Keyword(s):

Banking System ◽

Precious Metals ◽

Capital Adequacy ◽

The Other ◽

Total Asset ◽

Data Set ◽

Hand Size ◽

Net Interest Margin ◽

Capital Adequacy Ratio ◽

Positive Effect

The analysis of a data set of observations for Vietnamese banks in the period 2011-2015 shows how the Capital Adequacy Ratio (CAR) is influenced by selected factors: bank asset size (SIZE), loans in total assets (LOA), leverage (LEV), net interest margin (NIM), loan loss reserve (LLR), and cash and precious metals in total assets (LIQ). The results indicate that NIM and LIQ have a significant effect on CAR. On the other hand, SIZE and LEV do not appear to have a significant effect on CAR. NIM and LIQ have a positive effect on CAR, while LLR and LOA are negatively related to CAR.

Exchange Spin Coupling from Gaussian Process Regression

10.26434/chemrxiv.12589541.v3 ◽

2020 ◽

Author(s):

Marc Philipp Bahlke ◽

Natnael Mogos ◽

Jonny Proppe ◽

Carmen Herrmann

Keyword(s):

Machine Learning ◽

Gaussian Process ◽

Gaussian Process Regression ◽

Molecular Magnets ◽

Molecular Structures ◽

Spin Coupling ◽

Structure Property ◽

Data Set ◽

Uncertainty Estimates

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets for potential application in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger experimentally relevant complexes. Exchange spin coupling is similarly easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor, and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and material science.
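
The "error bars" mentioned above come from the standard Gaussian process predictive equations, shown here as a reminder. The notation is textbook notation assumed for illustration, not taken from the paper: a zero prior mean, kernel k, training inputs x_i with targets y, noise variance sigma_n^2, and a test input x_*.

```latex
\mu(\mathbf{x}_*) = \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y},
\qquad
\sigma^2(\mathbf{x}_*) = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_*,
\qquad
K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j),
\quad
(\mathbf{k}_*)_i = k(\mathbf{x}_i, \mathbf{x}_*).
```

The predictive variance grows as the test input moves away from the training inputs, which is what makes the uncertainty estimate useful when extrapolating to larger complexes.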

Random Forest Refinement of Pairwise Potentials for Protein-ligand Decoy Detection

10.26434/chemrxiv.8047820.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jun Pei ◽

Zheng Zheng ◽

Hyunji Kim ◽

Lin Song ◽

Sarah Walworth ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Probability Function ◽

Pair Potential ◽

Scoring Function ◽

Stable Structure ◽

Scoring Functions ◽

Atom Pair ◽

Data Set ◽

Atom Pairs

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found that our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, which was obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because it can find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, namely multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R² = 0.84 and MSE = 0.55 for the training set and R² = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
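
The two figures of merit quoted in the results, the coefficient of determination and the mean squared error, can be computed as in the sketch below. The numbers are invented and are not Ki values from the study; the function names `mse` and `r_squared` are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(f"MSE={mse(observed, predicted):.3f} R2={r_squared(observed, predicted):.3f}")
```

Reporting both metrics on the training and test sets, as the abstract does, helps show whether a model generalizes or merely memorizes the training compounds.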

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200904164539 ◽

2020 ◽

Vol 13 ◽

Author(s):

Ritu Khandelwal ◽

Hemlata Goyal ◽

Rajveer Singh Shekhawat

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Data Science ◽

Training Data ◽

Machine Learning Techniques ◽

Future Trends ◽

Data Set ◽

Learning Stage ◽

Learning Techniques ◽

Different Types

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal is to obtain valuable insights from the available data. Bollywood, a multi-million dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average or Flop by applying machine learning techniques (classification and prediction). To build a classifier or prediction model, the first step is the learning stage, in which a training data set is used to train the model with some technique or algorithm; rules are then generated that help to build a model and predict future trends in different types of organizations. Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN are applied to find efficient and effective results. All of these functionalities can be applied with GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: To build a classifier or prediction model, the first step is the learning stage, in which a training data set is used to train the model with some technique or algorithm; rules are then generated that help to build a model and predict future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. By using advertisement propaganda, production houses can plan the best time to release the movie according to the predicted success rate to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets; the relationships discovered help to solve business problems and to predict forthcoming trends. This prediction can help production houses with advertisement propaganda and cost planning, and by taking these factors into account they can make the movie more profitable.

AN EFFICIENT MACHINE LEARNING MODEL FOR PREDICTION OF ACUTE MYOCARDIAL INFARCTION

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200325104317 ◽

2020 ◽

Vol 13 ◽

Author(s):

Dhilsath Fathima.M ◽

S. Justin Samuel ◽

R. Hari Haran

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Decision Tree ◽

Learning Model ◽

Training Dataset ◽

Data Set ◽

Machine Learning Model ◽

Proposed Model

Aim: This work develops an improved and robust machine learning model for predicting Myocardial Infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in the missing values in the data set, then applies principal component analysis (PCA) to extract the optimal features from the data set and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data is given as input to four popular classifiers (support vector machine, k-nearest neighbor, logistic regression and decision tree) to train them, and the remaining 30% is used to evaluate the output of the machine learning model using performance metrics such as the confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve. Results: The outputs of the classifiers are evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model will act as an optimal decision-making system for predicting acute myocardial infarction at an earlier stage than existing machine learning based prediction models. It is capable of predicting the presence of acute myocardial infarction in humans using heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
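
Two of the preprocessing steps named in the Methods, mean imputation and a 70/30 train/test partition, can be sketched as follows. This is an illustration only: the rows are invented rather than drawn from the Framingham data, the helper names `mean_impute` and `train_test_split` are hypothetical, and the PCA and classifier steps are omitted.

```python
import random


def mean_impute(rows):
    """Replace None in each column with the mean of that column's observed values.

    Assumes every column has at least one observed (non-None) value.
    """
    means = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up rows with missing values marked as None.
RAW = [[1.0, None], [3.0, 4.0], [None, 8.0]]
imputed = mean_impute(RAW)
print(imputed)

rows = [[float(i), float(i * 2)] for i in range(10)]
train, test = train_test_split(rows)
print(len(train), len(test))
```

One design point worth noting: statistics such as column means are often computed on the training portion only and then applied to the test portion, so that the evaluation does not see information from the test rows. The abstract does not say which order the authors used.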
