Automatic Categorization of Reviews and Opinions of Internet E-Shopping Customers

Author(s):  
Jan Žižka ◽  
Vadim Rukavitsyn

E-shopping customers, blog authors, reviewers, and other web contributors can express their opinions of a purchased item, film, book, and so forth. Typically, various opinions are centered around one topic (e.g., a commodity, film, etc.). From the Business Intelligence viewpoint, such entries are very valuable; however, they are difficult to automatically process because they are in a natural language. Human beings can distinguish the various opinions. Because of the very large data volumes, could a machine do the same? The suggested method uses the machine-learning (ML) based approach to this classification problem, demonstrating via real-world data that a machine can learn from examples relatively well. The classification accuracy is better than 70%; it is not perfect because of typical problems associated with processing unstructured textual items in natural languages. The data characteristics and experimental results are shown.

2011 ◽  
Vol 1 (2) ◽  
pp. 68-77 ◽  
Author(s):  
Jan Žižka ◽  
Vadim Rukavitsyn

E-shopping customers, blog authors, reviewers, and other web contributors can express their opinions of a purchased item, film, book, and so forth. Typically, various opinions are centered around one topic (e.g., a commodity, film, etc.). From the Business Intelligence viewpoint, such entries are very valuable; however, they are difficult to automatically process because they are in a natural language. Human beings can distinguish the various opinions. Because of the very large data volumes, could a machine do the same? The suggested method uses the machine-learning (ML) based approach to this classification problem, demonstrating via real-world data that a machine can learn from examples relatively well. The classification accuracy is better than 70%; it is not perfect because of typical problems associated with processing unstructured textual items in natural languages. The data characteristics and experimental results are shown.


Author(s):  
Xueru Zhang ◽  
Mohammad Mahdi Khalili ◽  
Mingyan Liu

Machine learning models developed from real-world data can inherit potential, preexisting bias in the dataset. When these models are used to inform decisions involving human beings, fairness concerns inevitably arise. Imposing certain fairness constraints in the training of models can be effective only if appropriate criteria are applied. However, a fairness criterion can be defined/assessed only when the interaction between the decisions and the underlying population is well understood. We introduce two feedback models describing how people react when receiving machine-aided decisions and illustrate that some commonly used fairness criteria can end with undesirable consequences while reinforcing discrimination.


2021 ◽  
pp. 1-24
Author(s):  
Avidit Acharya ◽  
Kirk Bansak ◽  
Jens Hainmueller

Abstract We introduce a constrained priority mechanism that combines outcome-based matching from machine learning with preference-based allocation schemes common in market design. Using real-world data, we illustrate how our mechanism could be applied to the assignment of refugee families to host country locations, and kindergarteners to schools. Our mechanism allows a planner to first specify a threshold $\bar g$ for the minimum acceptable average outcome score that should be achieved by the assignment. In the refugee matching context, this score corresponds to the probability of employment, whereas in the student assignment context, it corresponds to standardized test scores. The mechanism is a priority mechanism that considers both outcomes and preferences by assigning agents (refugee families and students) based on their preferences, but subject to meeting the planner’s specified threshold. The mechanism is both strategy-proof and constrained efficient in that it always generates a matching that is not Pareto dominated by any other matching that respects the planner’s threshold.


2014 ◽  
Vol 5 (3) ◽  
pp. 82-96 ◽  
Author(s):  
Marijana Zekić-Sušac ◽  
Sanja Pfeifer ◽  
Nataša Šarlija

Abstract Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour on the same dataset in order to compare their efficiency in the sense of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure in order to assess computing sensitivity and specificity of each model. Results: The artificial neural network model based on multilayer perceptron yielded a higher classification rate than the models produced by other methods. The pairwise t-test showed a statistical significance between the artificial neural network and the k-nearest neighbour model, while the difference among other methods was not statistically significant. Conclusions: Tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.


Author(s):  
Jan Žižka ◽  
František Dařena

Gaining new and keeping existing clients or customers can be well-supported by creating and monitoring feedbacks: “Are the customers satisfied? Can we improve our services?” One of possible feedbacks is allowing the customers to freely write their reviews using a simple textual form. The more reviews that are available, the better knowledge can be acquired and applied to improving the service. However, very large data generated by collecting the reviews has to be processed automatically as humans usually cannot manage it within an acceptable time. The main question is “Can a computer reveal an opinion core hidden in text reviews?” It is a challenging task because the text is written in a natural language. This chapter presents a method based on the automatic extraction of expressions that are significant for specifying a review attitude to a given topic. The significant expressions are composed using significant words revealed in the documents. The significant words are selected by a decision-tree generator based on the entropy minimization. Words included in branches represent kernels of the significant expressions. The full expressions are composed of the significant words and words surrounding them in the original documents. The results are here demonstrated using large real-world multilingual data representing customers' opinions concerning hotel accommodation booked on-line, and Internet shopping. Knowledge discovered in the reviews may subsequently serve for various marketing tasks.


2021 ◽  
pp. 338-354
Author(s):  
Ute Schmid

With the growing number of applications of machine learning in complex real-world domains machine learning research has to meet new requirements to deal with the imperfections of real world data and the legal as well as ethical obligations to make classifier decisions transparent and comprehensible. In this contribution, arguments for interpretable and interactive approaches to machine learning are presented. It is argued that visual explanations are often not expressive enough to grasp critical information which relies on relations between different aspects or sub-concepts. Consequently, inductive logic programming (ILP) and the generation of verbal explanations from Prolog rules is advocated. Interactive learning in the context of ILP is illustrated with the Dare2Del system which helps users to manage their digital clutter. It is shown that verbal explanations overcome the explanatory one-way street from AI system to user. Interactive learning with mutual explanations allows the learning system to take into account not only class corrections but also corrections of explanations to guide learning. We propose mutual explanations as a building-block for human-like computing and an important ingredient for human AI partnership.


Sensors ◽  
2019 ◽  
Vol 19 (10) ◽  
pp. 2266 ◽  
Author(s):  
Nikolaos Sideris ◽  
Georgios Bardis ◽  
Athanasios Voulodimos ◽  
Georgios Miaoulis ◽  
Djamchid Ghazanfarpour

The constantly increasing amount and availability of urban data derived from varying sources leads to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation prospects of the aforementioned data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either commercial or common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach to address these challenges availed with machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher level semantic information and subsequently feeds them to the random forests classifier, as well as other supervised machine learning models for comparisons. Our experimental evaluation on multiple real-world data sets comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).


Sign in / Sign up

Export Citation Format

Share Document