Data Mining Approach for Customer Segmentation

Anshumala Jaiswal

doi:10.22214/ijraset.2021.35140

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35140 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 1008-1012

Author(s):

Anshumala Jaiswal

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Customer Segmentation ◽

Machine Learning Algorithms ◽

Clustering Methods ◽

Clustering Technique ◽

Data Mining Approach ◽

Huge Data ◽

Or Groups ◽

Data Points

In the marketing world, rapidly increasing competition makes it difficult to survive, and marketers have to make decisions that satisfy their customers. The growth of an organization depends heavily on the organization making the right decisions. For that, it has to gather deep knowledge about its customers' needs. A substantial amount of customer data is collected daily, and managing such a large volume of data is not easy. One idea is to segment customers into different groups, go through each group, and find the potential group among the pool of customers. Done manually, this requires a lot of human effort and time. Machine learning plays an important role in reducing that effort. Machine learning algorithms can find various patterns that are used to analyze customer databases. Using a clustering technique, customers can be segmented on the basis of similarities. One of the best procedures for clustering is the K-means algorithm. The k-means clustering algorithm is one of the most widely used data clustering methods, in which a dataset of “n” data points is partitioned into “k” groups or clusters [1]. In this paper, K is the number of clusters, groups, or segments, and the elbow method is used to determine the value of K.
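As a rough illustration of the k-means and elbow-method combination described in this abstract, the sketch below clusters a small made-up customer table and picks K where the inertia curve bends most sharply. It is not the paper's code: the `customers` data, the deterministic farthest-first seeding, and the second-difference rule used to locate the elbow are all assumptions chosen to keep the example short and reproducible.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def elbow_k(inertias):
    """Heuristic elbow (an assumption): the k with the largest second
    difference of inertia, i.e. the sharpest bend in the curve."""
    ks = sorted(inertias)
    return max(
        ks[1:-1],
        key=lambda k: inertias[k - 1] - 2 * inertias[k] + inertias[k + 1],
    )


# Hypothetical customers as (annual spend, visits) in arbitrary units.
customers = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 1), (20, 2), (21, 1), (21, 2),
]
inertias = {k: kmeans(customers, k)[2] for k in range(1, 7)}
best_k = elbow_k(inertias)
print(best_k, inertias[best_k])
```

In practice the elbow is often read off a plot of inertia against K; the second-difference rule here is just one simple way to automate that reading.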

Smooth input preparation for quantum and quantum-inspired machine learning

Quantum Machine Intelligence ◽

10.1007/s42484-021-00045-x ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Zhikuan Zhao ◽

Jack K. Fitzsimons ◽

Patrick Rebentrost ◽

Vedran Dunjko ◽

Joseph F. Fitzsimons

Keyword(s):

Machine Learning ◽

Quantum Algorithms ◽

Machine Learning Algorithms ◽

Low Rank ◽

Smoothed Analysis ◽

Machine Learning Applications ◽

Data Points ◽

Input Model ◽

The Cost ◽

Input Perturbation

Abstract: Machine learning has recently emerged as a fruitful area for finding potential quantum computational advantage. Many of the quantum-enhanced machine learning algorithms critically hinge upon the ability to efficiently produce states proportional to high-dimensional data points stored in a quantum accessible memory. Even given query access to exponentially many entries stored in a database, the construction of which is considered a one-off overhead, it has been argued that the cost of preparing such amplitude-encoded states may offset any exponential quantum advantage. Here we prove using smoothed analysis that if the data analysis algorithm is robust against small entry-wise input perturbation, state preparation can always be achieved with constant queries. This criterion is typically satisfied in realistic machine learning applications, where input data is subject to moderate noise. Our results are equally applicable to the recent seminal progress in quantum-inspired algorithms, where specially constructed databases suffice for polylogarithmic classical algorithms in low-rank cases. The consequence of our finding is that for the purpose of practical machine learning, polylogarithmic processing time is possible under a general and flexible input model with quantum algorithms or quantum-inspired classical algorithms in the low-rank cases.

Scalable hierarchical clustering by composition rank vector encoding and tree structure

10.1101/2020.04.12.038026 ◽

2020 ◽

Author(s):

Xiao Lai ◽

Pu Tian

Keyword(s):

Machine Learning ◽

Hierarchical Clustering ◽

Clustering Algorithm ◽

High Dimensional Data ◽

Machine Learning Algorithms ◽

Tree Structure ◽

Supervised Machine Learning ◽

High Dimensional ◽

Rank Vector ◽

Nonlinear Correlations

Abstract: Supervised machine learning, especially deep learning based on a wide variety of neural network architectures, has contributed tremendously to fields such as marketing, computer vision and natural language processing. However, development of unsupervised machine learning algorithms has been a bottleneck of artificial intelligence. Clustering is a fundamental unsupervised task in many different subjects. Unfortunately, no present algorithm is satisfactory for clustering of high dimensional data with strong nonlinear correlations. In this work, we propose a simple and highly efficient hierarchical clustering algorithm based on encoding by composition rank vectors and tree structure, and demonstrate its utility with clustering of protein structural domains. No record comparison, which is an expensive and essential common step in all present clustering algorithms, is involved. Consequently, it achieves hierarchical clustering with linear time and space complexity, and is thus applicable to arbitrarily large datasets. The key factor in this algorithm is the definition of composition, which depends on the physical nature of the target data and therefore needs to be constructed case by case. Nonetheless, the algorithm is general and applicable to any high dimensional data with strong nonlinear correlations. We hope this algorithm will inspire a rich research field of encoding-based clustering well beyond composition rank vector trees.

A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2020.2.3811 ◽

2020 ◽

Vol 15 (2) ◽

Author(s):

Jia Luo ◽

Dongwen Yu ◽

Zong Dai

Keyword(s):

Machine Learning ◽

Fuzzy Clustering ◽

Latent Dirichlet Allocation ◽

Learning Model ◽

Machine Learning Algorithms ◽

Text Data ◽

Huge Data ◽

Machine Learning Model ◽

N Gram ◽

Dirichlet Allocation

It is hardly possible to process huge amounts of structured and semi-structured data with manual methods. This study aims to solve the problem of processing huge data through machine learning algorithms. We collected text data on public opinion about the company through crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and used fuzzy clustering to cluster the keywords into different topics. The topic keywords are used as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-Gram, PMI, and Word2vec were used for comparative testing of new word discovery. The experimental results show that the Word2vec algorithm, based on a machine learning model, has the highest accuracy, recall and F-value indicators.
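Of the baselines named in this abstract, pointwise mutual information (PMI) is the easiest to show in a few lines. The sketch below scores adjacent token pairs with the textbook definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))), estimating probabilities from raw counts. The toy sentence, the function name `bigram_pmi`, and the choice of count-based estimates without smoothing are assumptions for illustration; the study's actual pipeline is not described at this level of detail.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """PMI for every adjacent token pair, using unsmoothed count estimates."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)      # number of token positions
    n_bi = len(tokens) - 1   # number of adjacent pairs
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus; a real run would use the crawled text.
tokens = "new york is a new city and new york is big".split()
pmi = bigram_pmi(tokens)
for pair, score in sorted(pmi.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Unsmoothed PMI tends to favor rare pairs, which is one reason new word discovery systems usually combine it with frequency thresholds or other signals.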

Predicting Obstetric Disease With Machine Learning Applied to Patient-Reported Data (Preprint)

10.2196/preprints.11766 ◽

2018 ◽

Cited By ~ 1

Author(s):

Danielle Bradley ◽

Erin Landau ◽

Adam Wolfberg ◽

Alex Baron

Keyword(s):

Machine Learning ◽

At Risk ◽

Mobile Apps ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Obstetric Outcomes ◽

Patient Reported ◽

Data Points ◽

Reported Data

BACKGROUND: The rise of highly engaging digital health mobile apps over the past few years has created repositories containing billions of patient-reported data points that have the potential to inform clinical research and advance medicine. OBJECTIVE: To determine if self-reported data could be leveraged to create machine learning algorithms to predict the presence of, or risk for, obstetric outcomes and related conditions. METHODS: More than 10 million women have downloaded Ovia Health’s three mobile apps (Ovia Fertility, Ovia Pregnancy, and Ovia Parenting). Data points logged by app users can include information about menstrual cycle, health history, current health status, nutrition habits, exercise activity, symptoms, or moods. Machine learning algorithms were developed using supervised machine learning methodologies, specifically, Gradient Boosting Decision Tree algorithms. Each algorithm was developed and trained using anywhere from 385 to 5770 features and data from 77,621 to 121,740 app users. RESULTS: Algorithms were created to detect the risk of developing preeclampsia, gestational diabetes, and preterm delivery, as well as to identify the presence of existing preeclampsia. The positive predictive value (PPV) was set to 0.75 for all of the models, as this was the threshold where the researchers felt a clinical response (additional screening or testing) would be reasonable, due to the likelihood of a positive outcome. Sensitivity ranged from 24% to 75% across all models. When PPV was adjusted from 0.75 to 0.52, the sensitivity of the preeclampsia prediction algorithm rose from 24% to 85%. When PPV was adjusted from 0.75 to 0.65, the sensitivity of the preeclampsia detection or diagnostic algorithm increased from 37% to 79%. CONCLUSIONS: Algorithms based on patient-reported data can predict serious obstetric conditions with accuracy levels sufficient to guide clinical screening by health care providers and health plans. Further research is needed to determine whether such an approach can improve outcomes for at-risk patients and reduce the cost of screening those not at risk. Presenting the results of these models to patients themselves could also provide important insight into otherwise unknown health risks.
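The trade-off reported in the results, where accepting a lower PPV raises sensitivity, follows from how both metrics depend on the classification threshold. The sketch below shows the mechanics on invented scores and labels; the numbers are hypothetical and unrelated to the study's models, and `ppv_and_sensitivity` is an illustrative helper name.

```python
def ppv_and_sensitivity(scores, labels, threshold):
    """Return (PPV, sensitivity) when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity


# Hypothetical model scores and true outcomes (1 = condition present).
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

strict = ppv_and_sensitivity(scores, labels, threshold=0.8)
relaxed = ppv_and_sensitivity(scores, labels, threshold=0.5)
print("strict threshold:  PPV=%.2f sensitivity=%.2f" % strict)
print("relaxed threshold: PPV=%.2f sensitivity=%.2f" % relaxed)
```

With these toy values the stricter threshold gives a higher PPV but misses some true cases, while the relaxed threshold catches every true case at the price of more false alarms.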

Data Analytics for Monitoring the Satisfactory Parameters of Airline Passengers using Machine Learning Algorithms in Python

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8677.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 1231-1235

Keyword(s):

Machine Learning ◽

Data Analytics ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Complex Information ◽

Huge Data ◽

Gradient Boosting Machine ◽

Airline Passengers ◽

Effective Representation

Machine learning algorithms offer an effective way to obtain results, especially with Big Data, where numerous applications can produce outcomes. Random Forest (RF), Gradient Boosting Machine (GBM), and Decision Tree (DT) algorithms in Python can give higher accuracy when classifying the various parameters of airline passenger satisfaction levels. The complex information about airline passengers provides a large volume of data for interpretation through different satisfaction parameters. An algorithm has to classify these data accurately. Some methods may provide less precision, and with conventional techniques there is a risk of information being cancelled or lost. RF and GBM are therefore used to overcome the unpredictability and improve the exactness of the information provided. The aim of this study is to identify an algorithm suitable for classifying the satisfaction level of airline passengers through data analytics in Python, given known outputs. Optimizing and implementing the independent variables through training and testing for accuracy in Python determined the variation between the parameters and identified RF and GBM as better algorithms than the other classification algorithms.

Comparison of Machine Learning Algorithms in the Interpolation and Extrapolation of Flame Describing Functions

Volume 4B: Combustion, Fuels, and Emissions ◽

10.1115/gt2019-91319 ◽

2019 ◽

Author(s):

Michael McCartney ◽

Matthias Haeringer ◽

Wolfgang Polifke

Keyword(s):

Machine Learning ◽

Gaussian Processes ◽

Spline Interpolation ◽

Learning Algorithms ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Test Time ◽

Minimal Amount ◽

Data Points ◽

The Impact

Abstract: This paper examines and compares commonly used Machine Learning algorithms in their performance in interpolation and extrapolation of FDFs, based on experimental and simulation data. Algorithm performance is evaluated by interpolating and extrapolating FDFs, and then the impact of errors on the limit cycle amplitudes is evaluated using the xFDF framework. The best algorithms in interpolation and extrapolation were found to be the widely used cubic spline interpolation, as well as the Gaussian Processes regressor. The data itself was found to be an important factor in defining the predictive performance of a model; therefore a method of optimally selecting data points at test time using Gaussian Processes was demonstrated. The aim of this is to allow a minimal amount of data points to be collected while still providing enough information to model the FDF accurately. The extrapolation performance was shown to decay very quickly with distance from the domain, and so emphasis should be put on selecting measurement points in order to expand the covered domain. Gaussian Processes also give an indication of confidence in their predictions and are used to carry out uncertainty quantification, in order to understand model sensitivities. This was demonstrated through application to the xFDF framework.

A Novel Semi-Supervised Fuzzy C-Means Clustering Algorithm Using Multiple Fuzzification Coefficients

Algorithms ◽

10.3390/a14090258 ◽

2021 ◽

Vol 14 (9) ◽

pp. 258

Author(s):

Tran Dinh Khang ◽

Manh-Kien Tran ◽

Michael Fowler

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Machine Learning Techniques ◽

Unsupervised Machine Learning ◽

Practical Applications ◽

Fuzzy C Means ◽

Learning Techniques ◽

Fuzzy C Means Clustering ◽

Data Points ◽

Data Elements

Clustering is an unsupervised machine learning method with many practical applications that has gathered extensive research interest. It is a technique of dividing data elements into clusters such that elements in the same cluster are similar. Clustering belongs to the group of unsupervised machine learning techniques, meaning that there is no information about the labels of the elements. However, when some information about the data points is known in advance, it is beneficial to use a semi-supervised algorithm. Among the many clustering techniques available, fuzzy C-means clustering (FCM) is a common one. To make the FCM algorithm a semi-supervised method, it was proposed in the literature to use an auxiliary matrix to adjust the membership grade of the elements to force them into certain clusters during the computation. In this study, instead of using the auxiliary matrix, we proposed to use multiple fuzzification coefficients to implement the semi-supervision component. After deriving the proposed semi-supervised fuzzy C-means clustering algorithm with multiple fuzzification coefficients (sSMC-FCM), we demonstrated the convergence of the algorithm and validated the efficiency of the method through a numerical example.
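For orientation, the sketch below implements the standard, fully unsupervised FCM baseline with a single fuzzification coefficient `m`, alternating the usual membership and center updates. It is not the sSMC-FCM algorithm proposed in the paper, which assigns multiple fuzzification coefficients; the data, the initial centers, and all function names are assumptions made for the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def memberships(points, centers, m):
    """Standard FCM membership update: u_i = 1 / sum_k (d_i^2 / d_k^2)^(1/(m-1))."""
    power = 1.0 / (m - 1.0)
    rows = []
    for p in points:
        d2 = [sq_dist(p, c) for c in centers]
        if 0.0 in d2:
            # The point sits exactly on a center: full membership there.
            hit = d2.index(0.0)
            rows.append([1.0 if i == hit else 0.0 for i in range(len(centers))])
        else:
            rows.append([
                1.0 / sum((d2[i] / d2[k]) ** power for k in range(len(centers)))
                for i in range(len(centers))
            ])
    return rows


def update_centers(points, U, m):
    """Each center is the mean of all points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for i in range(len(U[0])):
        weights = [row[i] ** m for row in U]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, max_iters=100, tol=1e-12):
    """Alternate the two updates until the centers stop moving."""
    centers = list(init_centers)
    for _ in range(max_iters):
        U = memberships(points, centers, m)
        updated = update_centers(points, U, m)
        shift = max(sq_dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


# Hypothetical 1-D data with two obvious groups near 1.0 and 5.0.
points = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]
centers, U = fuzzy_c_means(points, init_centers=[(1.0,), (4.8,)])
print(centers)
print([[round(u, 3) for u in row] for row in U])
```

Each row of `U` sums to 1, so a point between two groups would be shared between them rather than forced into one, which is the property the semi-supervised variants build on.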

Using Machine Learning Algorithms to Recognize Shuttlecock Movements

Wireless Communications and Mobile Computing ◽

10.1155/2021/9976306 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Wei Wang

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Learning Algorithm ◽

Learning Algorithms ◽

Recognition Rate ◽

Machine Learning Algorithms ◽

Research Strategies ◽

Image Capture ◽

Motion Recognition ◽

Motion Characteristics

Shuttlecock is an excellent traditional national sport in China. Because of its simplicity, convenience, and fun, it is loved by the broad masses of people, especially teenagers and children. Shuttlecock has only recently developed into a confrontational event, and it takes a period of research to master its tactics and strategies. Based on this, this article proposes using machine learning algorithms to recognize shuttlecock movements, aiming to provide more theoretical and technical support for shuttlecock competitions by identifying the features of actions with the assistance of algorithms. This paper uses literature research, modeling, comparative analysis, and other methods to study in depth the motion characteristics of shuttlecock, the key machine learning algorithms, and other theories, and constructs a shuttlecock motion recognition model based on a multiview clustering algorithm. The paper analyzes the robustness and accuracy of the machine learning algorithm against other algorithms through a variety of performance comparisons, along with the shuttlecock motion recognition image results. For the key movements of shuttlecock (disk, stretch, hook, wipe, knock, and abduction), the proposed algorithm has a good movement recognition rate, which can reach 91.2%. Even for several similar actions, the average recognition accuracy can exceed 75%, and through continuous image capture the number of occurrences of an action can be analyzed automatically, which helps athletes and coaches better analyze tactics and research strategies.

A Machine Learning Approach to Predict Hypotensive Events in ICU Settings

10.1101/794768 ◽

2019 ◽

Author(s):

Mina Chookhachizadeh Moghadam ◽

Ehsan Masoumi ◽

Nader Bagherzadeh ◽

Davinder Ramsingh ◽

Guann-Pyng Li ◽

...

Keyword(s):

Machine Learning ◽

Real Time ◽

Evaluation Method ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Physiological Status ◽

Time Prediction ◽

Evaluation Approach ◽

High Positive Predictive Value ◽

Data Points

Abstract. Purpose: Predicting hypotension well in advance provides physicians with enough time to respond with proper therapeutic measures. However, the real-time prediction of hypotension with high positive predictive value (PPV) is a challenge due to the dynamic changes in patients’ physiological status under drug administration, which limits the amount of useful data available for the algorithm. Methods: To mimic real-time monitoring, we developed a machine learning algorithm that uses most of the available data points from patients’ records to train and test the algorithm. The algorithm predicts hypotension up to 30 minutes in advance based on only 5 minutes of a patient’s physiological history. A novel evaluation method is proposed to assess the algorithm performance as a function of time at every timestamp within 30 minutes prior to hypotension. This evaluation approach provides statistical tools to find the best possible prediction window. Results: During 181,000 minutes of monitoring of about 400 patients, the algorithm demonstrated 94% accuracy, 85% sensitivity and 96% specificity in predicting hypotension within 30 minutes of the events. A high PPV of 81% was obtained, and the algorithm predicted 80% of the events 25 minutes prior to their onset. It was shown that choosing a classification threshold that maximizes the F1 score during the training phase contributes to a high PPV and sensitivity. Conclusion: This study reveals the promising potential of machine learning algorithms in real-time prediction of hypotensive events in the ICU setting based on short-term physiological history.
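The threshold selection mentioned in the results, picking the cut-off that maximizes F1 on training data, can be sketched as a simple scan over candidate thresholds. The scores and labels below are invented, and `best_f1_threshold` is a hypothetical helper; the study's own procedure may differ in how candidates are generated and how ties are resolved.

```python
def f1_at(scores, labels, threshold):
    """F1 score when scores >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(scores, labels):
    """Try every distinct score as a threshold; return (threshold, F1).
    Ties go to the highest threshold because candidates are scanned
    from high to low and only strict improvements replace the best."""
    best_threshold, best_f1 = None, -1.0
    for t in sorted(set(scores), reverse=True):
        f1 = f1_at(scores, labels, t)
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1


# Hypothetical training scores and outcomes (1 = hypotensive event).
train_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
train_labels = [1, 1, 0, 1, 0, 0, 1, 0]

threshold, f1 = best_f1_threshold(train_scores, train_labels)
print(threshold, f1)
```

Because F1 is the harmonic mean of PPV and sensitivity, a threshold chosen this way avoids the extremes where one of the two collapses.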

COVID-19 effect on supply and demand of essential commodities using unsupervised learning method

10.21203/rs.3.rs-110010/v1 ◽

2020 ◽

Author(s):

P. Anitha ◽

Malini M. Patil ◽

Rekha B Venkatapur

Keyword(s):

Machine Learning ◽

Supply Chain ◽

Clustering Algorithm ◽

Supply And Demand ◽

Machine Learning Algorithms ◽

Learning Method ◽

The People ◽

Future Data ◽

Predicted Values ◽

Yearly Data

Abstract: The affliction caused by the Covid-19 pandemic differs from other disasters seen so far. Supply chain industries are facing unique challenges in fulfilling the essential needs of the people. The objective of the paper is to analyse the supply and demand of essentials during pre-pandemic and post-pandemic lockdowns using machine learning algorithms. This helps supply chain industries forecast and manage the supply and demand of essential stocks for the future. Data is analyzed using prediction algorithms to compare the actual and predicted values. A clustering algorithm along with a rolling mean is applied to half-yearly data from 2019 and 2020 to identify the sales of different categories of essential commodities. This paper aims at applying intelligence to predict sales in various categories, providing timely information for B2B industries during disasters.
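The rolling mean mentioned in this abstract smooths a sales series by averaging each value with its recent predecessors. A minimal sketch follows; the monthly figures, the three-month window, and the choice to emit `None` until the window is full are assumptions, since the abstract does not give these details.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing moving average; positions without a full window yield None."""
    buffer = deque(maxlen=window)
    running_sum = 0.0
    result = []
    for v in values:
        if len(buffer) == window:
            # The oldest value is about to be evicted by append().
            running_sum -= buffer[0]
        buffer.append(v)
        running_sum += v
        result.append(running_sum / window if len(buffer) == window else None)
    return result


# Hypothetical monthly sales of one commodity category over half a year.
monthly_sales = [10, 20, 30, 40, 50, 60]
smoothed = rolling_mean(monthly_sales, window=3)
print(smoothed)
```

The smoothed series could then be fed to a clustering step, for example to group categories whose demand curves move together.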