The Classification of Customers’ Sentiment using Data Mining Approaches

2019 ◽  
Vol IV (IV) ◽  
pp. 146-156
Author(s):  
Dost Muhammad Khan ◽  
Tariq Aziz Rao ◽  
Faisal Shahzad

Data mining is the process of extracting required information from unprocessed records using specific methodologies and techniques. Data containing customers’ sentiments is of utmost importance to managers and decision-makers who want to monitor progress, maintain the quality of their products or services, and observe the latest market trends for business support. Billions of customers use micro-blogging websites and social media to share their opinions on different topics on a daily basis. These platforms have therefore become a source of information, but identifying a particular feature of a product remains difficult because the information is retrieved from varied sources. We propose a framework for data acquisition, preprocessing, and feature extraction, and use three supervised machine-learning algorithms to classify customers’ sentiments. The proposed framework was also tested to evaluate the system’s performance. Our proposed methodology will be helpful for researchers, service providers, and decision-makers.
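
The abstract does not name its three supervised algorithms, so the following is only a minimal sketch of the preprocessing, feature extraction, and classification stages, assuming a bag-of-words representation and a multinomial Naive Bayes classifier with add-one smoothing. The function names and the four training sentences are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Preprocessing: lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(samples):
    """Feature extraction and training: count words per sentiment label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Classification: return the label with the highest log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled reviews, for illustration only.
samples = [
    ("great product, love it", "positive"),
    ("excellent service and fast delivery", "positive"),
    ("terrible quality, very disappointed", "negative"),
    ("bad service, will not buy again", "negative"),
]
model = train_naive_bayes(samples)
print(predict(model, "love the fast delivery"))
```

A real system would train on a much larger labelled corpus and compare several classifiers on held-out data, as the abstract describes.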

Diagnostics ◽  
2020 ◽  
Vol 10 (3) ◽  
pp. 162 ◽  
Author(s):  
Julieta G. Rodríguez-Ruiz ◽  
Carlos E. Galván-Tejada ◽  
Laura A. Zanella-Calzada ◽  
José M. Celaya-Padilla ◽  
Jorge I. Galván-Tejada ◽  
...  

Major Depression Disease has been increasing in the last few years, affecting around 7 percent of the world population, but current techniques to diagnose it are outdated and inefficient. Over the last decade, motor activity data has been presented as a better way to diagnose, treat, and monitor patients suffering from this illness through the use of machine learning algorithms. Disturbances in the circadian rhythm of mental illness patients increase the effectiveness of the data mining process. In this paper, a comparison of motor activity data from the night, day, and full day is carried out through a data mining process using the Random Forest classifier to identify depressive and non-depressive episodes. Data from the Depressjon dataset is split into three different subsets, and 24 features in the time and frequency domains are extracted to select the best model for the classification of depressive episodes. The results showed that the best dataset and model for classifying depressive episodes is the night motor activity data, with 99.37% sensitivity and 99.91% specificity.
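
For readers unfamiliar with the two reported metrics, the sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The function name and the counts are hypothetical; they are not the paper's data and do not reproduce its figures.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    # Sensitivity: share of depressive episodes that were correctly detected.
    sensitivity = tp / (tp + fn)
    # Specificity: share of non-depressive episodes correctly left unflagged.
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=95, fp=5)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```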


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1701
Author(s):  
Theodor Panagiotakopoulos ◽  
Sotiris Kotsiantis ◽  
Georgios Kostopoulos ◽  
Omiros Iatrellis ◽  
Achilles Kameas

Over recent years, massive open online courses (MOOCs) have gained increasing popularity in the field of online education. Students with different needs and learning specificities are able to attend a wide range of specialized online courses offered by universities and educational institutions. As a result, large amounts of data regarding students’ demographic characteristics, activity patterns, and learning performances are generated and stored in institutional repositories on a daily basis. Unfortunately, a key issue in MOOCs is low completion rates, which directly affect student success. Therefore, it is of utmost importance for educational institutions and faculty members to find more effective practices and reduce non-completer ratios. In this context, the main purpose of the present study is to employ a plethora of state-of-the-art supervised machine learning algorithms for predicting student dropout in a MOOC for smart city professionals at an early stage. The experimental results show that accuracy exceeds 96% based on data collected during the first week of the course, thus enabling effective intervention strategies and support actions.


Author(s):  
K. G. Yashchenkov ◽  
K. S. Dymko ◽  
N. O. Ukhanov ◽  
A. V. Khnykin

The use of data analysis methods to find and correct errors in reports issued by meteorologists is considered. The features of processing various types of meteorological messages are studied. The advantages and disadvantages of existing methods for classifying text information are considered. The classification methods are compared in order to identify the optimal method for use in the developed algorithm for analyzing meteorological messages. The prospects of using each of the methods in the developed algorithm are described. An algorithm for processing the source data is proposed, which uses syntactic and logical analysis to pre-clean the data of various kinds of noise and to determine format errors for each type of message. After preliminary preparation, the classification method correlates the received set of message characteristics with the previously trained model to determine the error in the current weather report and output the corresponding message to the operator in real time. The software tools used in developing and implementing the algorithm are described. A complete description of how a meteorological message is processed is presented, from the moment the message is entered in a text editor until it is sent to the international weather message exchange service. The developed software, which implements the proposed algorithm, is demonstrated; it improves the quality of messages and, as a result, the quality of meteorological forecasts. The results of implementing the new algorithm are described by comparing the number of messages containing various types of errors before and after its implementation.


Proceedings ◽  
2018 ◽  
Vol 2 (19) ◽  
pp. 1217
Author(s):  
Teresa Cristóbal ◽  
Gabino Padrón ◽  
Alexis Quesada ◽  
Francisco Alayón ◽  
Gabriel de Blasio ◽  
...  

Travel time plays a key role in the quality of service in road-based mass transit systems. In this type of system, the travel time of a public transport line is the sum of the dwell time at each bus stop and the nonstop running time between each pair of consecutive bus stops on the line. The aim of the methodology presented in this paper is to obtain the behavior patterns of these times. Knowing these patterns, it would be possible to reduce travel time or its variability in order to make more reliable travel time predictions. To achieve this goal, the methodology uses data on passengers’ check-in and check-out movements and vehicles’ GPS positions, processing this data with data mining techniques. To illustrate the validity of the proposal, the results obtained in a use case are presented.
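
The travel-time definition in this abstract can be written as a short calculation. This is a minimal sketch, assuming a line with n stops has n dwell times and n - 1 running-time segments; the function name and the sample values in seconds are hypothetical.

```python
def line_travel_time(dwell_times, running_times):
    """Sum the dwell time at each stop and the nonstop running time
    between each pair of consecutive stops (all values in seconds)."""
    # Assumption: one running segment between each pair of consecutive stops.
    if len(running_times) != len(dwell_times) - 1:
        raise ValueError("expected one running time per pair of consecutive stops")
    return sum(dwell_times) + sum(running_times)


# Hypothetical line with four stops, for illustration only.
dwell = [30, 45, 20, 25]    # seconds spent at each bus stop
running = [180, 240, 150]   # seconds between consecutive stops
total = line_travel_time(dwell, running)
print(total)  # 690
```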


2013 ◽  
Vol 5 (2) ◽  
pp. 136-143 ◽  
Author(s):  
Astha Mehra ◽  
Sanjay Kumar Dubey

In today’s world, data is produced every day at a phenomenal rate, and we are required to store this ever-growing data on an almost daily basis. Even though our ability to store this huge volume of data has grown, the problem arises when users expect sophisticated information from it. This can be achieved by uncovering the hidden information in the raw data, which is the purpose of data mining. Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting meaning from them. The raw, unlabeled data present in large databases can be classified initially in an unsupervised manner by means of cluster analysis. Cluster analysis is the process of finding groups of objects such that the objects in a group are similar to one another and dissimilar from the objects in other groups. These groups are known as clusters. In other words, clustering is the process of organizing data objects into groups whose members are similar in some way. Applications of clustering include marketing (finding groups of customers with similar behavior), biology (classification of plants and animals given their features), data analysis, earthquake studies (observing earthquake epicenters to identify dangerous zones), and the WWW (document classification). The results and efficiency of the clustering process are generally determined through various clustering algorithms. The aim of this research paper is to compare two important clustering algorithms, namely centroid-based K-means and X-means. The performance of the algorithms is evaluated over different program executions on the same input dataset and is analyzed and compared on the basis of the quality of clustering outputs, the number of iterations, and cut-off factors.
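
As an illustration of the centroid-based approach, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the authors' implementation and does not cover X-means; the function signature, the seed, and the sample points are assumptions made for this example. The function also returns the number of iterations, one of the comparison criteria the abstract mentions.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of floats) into k groups.

    Returns (centroids, clusters, iterations_used).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters, iteration


# Hypothetical 2-D data with two well-separated groups, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters, iterations = kmeans(points, k=2)
print(centroids, iterations)
```

Because the result depends on the initial centroids, a comparison like the one in the abstract would typically repeat the run several times on the same dataset.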


10.2196/20995 ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. e20995
Author(s):  
Debbie Rankin ◽  
Michaela Black ◽  
Bronac Flanagan ◽  
Catherine F Hughes ◽  
Adrian Moore ◽  
...  

Background: Machine learning techniques, specifically classification algorithms, may be effective to help understand key health, nutritional, and environmental factors associated with cognitive function in aging populations. Objective: This study aims to use classification techniques to identify the key patient predictors that are considered most important in the classification of poorer cognitive performance, which is an early risk factor for dementia. Methods: Data were used from the Trinity-Ulster and Department of Agriculture study, which included detailed information on sociodemographic, clinical, biochemical, nutritional, and lifestyle factors in 5186 older adults recruited from the Republic of Ireland and Northern Ireland, a proportion of whom (987/5186, 19.03%) were followed up 5-7 years later for reassessment. Cognitive function at both time points was assessed using a battery of tests, including the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS), with a score <70 classed as poorer cognitive performance. This study trained 3 classifiers (decision trees, Naïve Bayes, and random forests) to classify the RBANS score and to identify key health, nutritional, and environmental predictors of cognitive performance and cognitive decline over the follow-up period. It assessed their performance, taking note of the variables that were deemed important for the optimized classifiers for their computational diagnostics. Results: In the classification of a low RBANS score (<70), our models performed well (F1 score range 0.73-0.93), all highlighting the individual’s score from the Timed Up and Go (TUG) test, the age at which the participant stopped education, and whether or not the participant’s family reported memory concerns to be of key importance. The classification models performed well in classifying a greater rate of decline in the RBANS score (F1 score range 0.66-0.85), also indicating the TUG score to be of key importance, followed by blood indicators: plasma homocysteine, vitamin B6 biomarker (plasma pyridoxal-5-phosphate), and glycated hemoglobin. Conclusions: The results suggest that it may be possible for a health care professional to make an initial evaluation, with a high level of confidence, of the potential for cognitive dysfunction using only a few short, noninvasive questions, thus providing a quick, efficient, and noninvasive way to help them decide whether or not a patient requires a full cognitive evaluation. This approach has the potential benefits of making time and cost savings for health service providers and avoiding stress created through unnecessary cognitive assessments in low-risk patients.
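
The labelling rule and the reported metric can be sketched in a few lines. The RBANS cut-off of 70 comes from the abstract; the function names and the confusion-matrix counts are hypothetical and do not reproduce the study's results.

```python
RBANS_CUTOFF = 70  # from the abstract: a score below 70 is classed as poorer performance


def is_poorer_performance(rbans_score):
    """Binary target label derived from the RBANS score."""
    return rbans_score < RBANS_CUTOFF


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
f1 = f1_score(tp=80, fp=20, fn=20)
print(f"F1 = {f1:.2f}")
```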


Author(s):  
Roma Sahani ◽  
Shatabdinalini ◽  
Chinmayee Rout ◽  
J. Chandrakanta Badajena ◽  
Ajay Kumar Jena ◽  
...  
