Automatic Categorization of Reviews and Opinions of Internet E-Shopping Customers

E-shopping customers, blog authors, reviewers, and other web contributors can express their opinions of a purchased item, film, book, and so forth. Typically, various opinions are centered around one topic (e.g., a commodity or a film). From the Business Intelligence viewpoint, such entries are very valuable; however, they are difficult to process automatically because they are written in a natural language. Human beings can distinguish the various opinions, but given the very large data volumes, could a machine do the same? The suggested method takes a machine-learning (ML) approach to this classification problem, demonstrating on real-world data that a machine can learn from examples relatively well. The classification accuracy is better than 70%; it is not perfect because of typical problems associated with processing unstructured textual items in natural languages. The data characteristics and experimental results are shown.
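The abstract does not say which learning algorithm was used, so the following is only a minimal sketch of the general idea of learning review categories from labelled examples. It assumes a bag-of-words representation and a multinomial naive Bayes classifier with add-one smoothing; the function names and the toy reviews are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training reviews
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under naive Bayes."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reviews, invented for illustration only.
TRAIN = [
    ("great product fast delivery", "positive"),
    ("excellent quality very happy", "positive"),
    ("terrible quality broke quickly", "negative"),
    ("slow delivery very disappointed", "negative"),
]
model = train(TRAIN)
print(predict(model, "happy with great quality"))
print(predict(model, "terrible slow broke"))
```

On real review data the vocabulary is large and noisy (misspellings, negation, mixed languages), which is one plausible reason why accuracy stays well below 100% for approaches of this kind.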

Long-Term Impacts of Fair Machine Learning

Ergonomics in Design: The Quarterly of Human Factors Applications ◽

10.1177/1064804619884160 ◽

2019 ◽

Vol 28 (3) ◽

pp. 7-11

Author(s):

Xueru Zhang ◽

Mohammad Mahdi Khalili ◽

Mingyan Liu

Keyword(s):

Machine Learning ◽

Real World ◽

Human Beings ◽

Learning Models ◽

Real World Data ◽

World Data ◽

Fairness Concerns ◽

Fairness Constraints ◽

Machine Learning Models

Machine learning models developed from real-world data can inherit potential, preexisting bias in the dataset. When these models are used to inform decisions involving human beings, fairness concerns inevitably arise. Imposing certain fairness constraints in the training of models can be effective only if appropriate criteria are applied. However, a fairness criterion can be defined and assessed only when the interaction between the decisions and the underlying population is well understood. We introduce two feedback models describing how people react when receiving machine-aided decisions and illustrate that some commonly used fairness criteria can lead to undesirable consequences while reinforcing discrimination.
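The abstract does not name the fairness criteria it examines. As a hedged illustration of what a "commonly used fairness criterion" can look like in practice, the sketch below measures two widely discussed ones, demographic parity and equal opportunity, on a small set of binary decisions. The choice of criteria, the function names, and the data are assumptions made for illustration; the paper's feedback models are not reproduced here.

```python
def _mean(values):
    """Average of a list of 0/1 values, or None if the list is empty."""
    return sum(values) / len(values) if values else None


def group_rates(decisions, groups, mask=None):
    """Positive-decision rate per group, optionally restricted by a 0/1 mask."""
    by_group = {}
    for i, (decision, group) in enumerate(zip(decisions, groups)):
        if mask is None or mask[i]:
            by_group.setdefault(group, []).append(decision)
    return {group: _mean(vals) for group, vals in by_group.items()}


def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rates between groups."""
    rates = group_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


def equal_opportunity_gap(decisions, labels, groups):
    """Largest difference in true-positive rates between groups.

    Only individuals whose true label is 1 are considered.
    """
    rates = group_rates(decisions, groups, mask=labels)
    return max(rates.values()) - min(rates.values())


# Hypothetical decisions (1 = accepted) for two groups, "a" and "b".
DECISIONS = [1, 1, 0, 1, 1, 0, 0, 0]
LABELS = [1, 1, 0, 0, 1, 1, 0, 0]  # 1 = actually qualified
GROUPS = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(demographic_parity_gap(DECISIONS, GROUPS))
print(equal_opportunity_gap(DECISIONS, LABELS, GROUPS))
```

Both quantities are static snapshots of one round of decisions. The abstract's point is that constraining such a snapshot says little about long-term effects unless the population's reaction to the decisions is also modelled.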

Combining Outcome-Based and Preference-Based Matching: A Constrained Priority Mechanism

Political Analysis ◽

10.1017/pan.2020.48 ◽

2021 ◽

pp. 1-24

Author(s):

Avidit Acharya ◽

Kirk Bansak ◽

Jens Hainmueller

Keyword(s):

Machine Learning ◽

Host Country ◽

Real World ◽

Test Scores ◽

Standardized Test ◽

Market Design ◽

Standardized Test Scores ◽

Real World Data ◽

Refugee Families ◽

Strategy Proof

We introduce a constrained priority mechanism that combines outcome-based matching from machine learning with preference-based allocation schemes common in market design. Using real-world data, we illustrate how our mechanism could be applied to the assignment of refugee families to host country locations, and kindergarteners to schools. Our mechanism allows a planner to first specify a threshold $\bar g$ for the minimum acceptable average outcome score that should be achieved by the assignment. In the refugee matching context, this score corresponds to the probability of employment, whereas in the student assignment context, it corresponds to standardized test scores. The mechanism is a priority mechanism that considers both outcomes and preferences by assigning agents (refugee families and students) based on their preferences, but subject to meeting the planner’s specified threshold. The mechanism is both strategy-proof and constrained efficient in that it always generates a matching that is not Pareto dominated by any other matching that respects the planner’s threshold.

Estimation of the epidemiology of dementia and associated neuropsychiatric symptoms by applying machine learning to real-world data

Revista de Psiquiatría y Salud Mental ◽

10.1016/j.rpsm.2021.03.001 ◽

2021 ◽

Author(s):

Javier Mar ◽

Ania Gorostiza ◽

Arantzazu Arrospide ◽

Igor Larrañaga ◽

Ane Alberdi ◽

...

Keyword(s):

Machine Learning ◽

Real World ◽

Neuropsychiatric Symptoms ◽

Real World Data ◽

World Data ◽

Epidemiology Of Dementia

Cellular Bandwidth Prediction for Highly Automated Driving - Evaluation of Machine Learning Approaches based on Real-World Data

Proceedings of the 4th International Conference on Vehicle Technology and Intelligent Transport Systems ◽

10.5220/0006692501210132 ◽

2018 ◽

Cited By ~ 8

Author(s):

Florian Jomrich ◽

Alexander Herzberger ◽

Tobias Meuser ◽

Björn Richerzhagen ◽

Ralf Steinmetz ◽

...

Keyword(s):

Machine Learning ◽

Real World ◽

Learning Approaches ◽

Automated Driving ◽

Real World Data ◽

World Data ◽

Highly Automated Driving

A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

Business Systems Research Journal ◽

10.2478/bsrj-2014-0021 ◽

2014 ◽

Vol 5 (3) ◽

pp. 82-96 ◽

Cited By ~ 3

Author(s):

Marijana Zekić-Sušac ◽

Sanja Pfeifer ◽

Nataša Šarlija

Keyword(s):

Neural Network ◽

Machine Learning ◽

Classification Accuracy ◽

Classification Problem ◽

High Dimensional ◽

Nearest Neighbour ◽

Learning Methods ◽

Machine Learning Methods ◽

Dimensional Classification ◽

Artificial Neural

Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested on the same dataset: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour, in order to compare their efficiency in terms of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure, assessing the sensitivity and specificity of each model. Results: The artificial neural network model based on a multilayer perceptron yielded a higher classification rate than the models produced by the other methods. The pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.
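The evaluation procedure named in the abstract, 10-fold cross-validation with sensitivity and specificity computed per fold, can be sketched as follows. This is an illustrative sketch under stated assumptions: the classifier is a plain 1-nearest-neighbour rule, the data are synthetic two-cluster points generated here, and all names are hypothetical. None of it is the paper's dataset or model configuration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbour_predict(train_x, train_y, x):
    """Label of the training point closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); None where a class is absent."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else None
    spec = tn / (tn + fp) if tn + fp else None
    return sens, spec


def cross_validate(xs, ys, k=10):
    """Hold out each fold in turn, train on the rest, score the held-out fold."""
    results = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        preds = [nearest_neighbour_predict(train_x, train_y, xs[i]) for i in test_idx]
        truth = [ys[i] for i in test_idx]
        results.append(sensitivity_specificity(truth, preds))
    return results


# Synthetic, well-separated clusters (an assumption made for illustration).
rng = random.Random(1)
XS = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
XS += [(10 + rng.uniform(-1, 1), 10 + rng.uniform(-1, 1)) for _ in range(50)]
YS = [0] * 50 + [1] * 50

RESULTS = cross_validate(XS, YS, k=10)
for fold_number, (sens, spec) in enumerate(RESULTS, start=1):
    print(fold_number, sens, spec)
```

Per-fold scores like these are what a pairwise t-test between two methods would be run on, since both methods are evaluated on the same ten held-out subsamples.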

Discovering Opinions from Customers' Unstructured Textual Reviews Written in Different Natural Languages

Hospitality, Travel, and Tourism ◽

10.4018/978-1-4666-6543-9.ch049 ◽

2014 ◽

pp. 834-859

Author(s):

Jan Žižka ◽

František Dařena

Keyword(s):

Decision Tree ◽

Natural Language ◽

Real World ◽

Large Data ◽

Main Question ◽

Internet Shopping ◽

Natural Languages ◽

On Line ◽

Multilingual Data ◽

Textual Form

Gaining new clients or customers and keeping existing ones can be well supported by collecting and monitoring feedback: “Are the customers satisfied? Can we improve our services?” One possible form of feedback is allowing customers to freely write their reviews using a simple textual form. The more reviews that are available, the more knowledge can be acquired and applied to improving the service. However, the very large volume of data generated by collecting the reviews has to be processed automatically, as humans usually cannot manage it within an acceptable time. The main question is “Can a computer reveal the core of an opinion hidden in text reviews?” It is a challenging task because the text is written in a natural language. This chapter presents a method based on the automatic extraction of expressions that are significant for specifying a review's attitude to a given topic. The significant expressions are composed using significant words revealed in the documents. The significant words are selected by a decision-tree generator based on entropy minimization. Words included in branches represent kernels of the significant expressions. The full expressions are composed of the significant words and the words surrounding them in the original documents. The results are demonstrated here using large real-world multilingual data representing customers' opinions concerning hotel accommodation booked on-line and Internet shopping. Knowledge discovered in the reviews may subsequently serve various marketing tasks.
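A decision-tree generator based on entropy minimization picks, at each node, the word whose presence or absence best separates the review classes. The sketch below shows a single such split using information gain, followed by a simple context-window step that builds an expression around the selected word. It is an assumption-laden illustration: the chapter's actual tree generator, window size, and data are not given in the abstract, and the names and toy reviews here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, word):
    """Entropy reduction obtained by splitting on presence of `word`."""
    with_word = [lab for doc, lab in zip(docs, labels) if word in doc]
    without_word = [lab for doc, lab in zip(docs, labels) if word not in doc]
    n = len(labels)
    remainder = (len(with_word) / n) * entropy(with_word) \
        + (len(without_word) / n) * entropy(without_word)
    return entropy(labels) - remainder


def best_split_word(docs, labels):
    """Word with the highest information gain (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    return max(vocab, key=lambda w: information_gain(docs, labels, w))


def expressions_around(tokens, word, width=1):
    """Join `word` with up to `width` neighbouring tokens on each side."""
    return [
        " ".join(tokens[max(0, i - width): i + width + 1])
        for i, tok in enumerate(tokens)
        if tok == word
    ]


# Hypothetical toy reviews, invented for illustration only.
REVIEWS = [
    ("the room was clean and quiet", "positive"),
    ("clean room and friendly staff", "positive"),
    ("the room was dirty and noisy", "negative"),
    ("dirty bathroom and rude staff", "negative"),
]
TOKENS = [text.split() for text, _ in REVIEWS]
DOCS = [set(tokens) for tokens in TOKENS]
LABELS = [label for _, label in REVIEWS]

word = best_split_word(DOCS, LABELS)
print(word, information_gain(DOCS, LABELS, word))
print(expressions_around(TOKENS[0], word))
```

A full tree would repeat the split recursively on each resulting subset, and the words appearing along its branches would play the role of the "kernels" described above.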

Interactive Learning with Mutual Explanations in Relational Domains

10.1093/oso/9780198862536.003.0017 ◽

2021 ◽

pp. 338-354

Author(s):

Ute Schmid

Keyword(s):

Machine Learning ◽

Real World ◽

Inductive Logic Programming ◽

Interactive Learning ◽

Inductive Logic ◽

Learning System ◽

Real World Data ◽

Relational Domains ◽

Learning Research ◽

Applications Of Machine Learning

With the growing number of applications of machine learning in complex real-world domains, machine learning research has to meet new requirements to deal with the imperfections of real-world data and the legal as well as ethical obligations to make classifier decisions transparent and comprehensible. In this contribution, arguments for interpretable and interactive approaches to machine learning are presented. It is argued that visual explanations are often not expressive enough to grasp critical information which relies on relations between different aspects or sub-concepts. Consequently, inductive logic programming (ILP) and the generation of verbal explanations from Prolog rules are advocated. Interactive learning in the context of ILP is illustrated with the Dare2Del system, which helps users to manage their digital clutter. It is shown that verbal explanations overcome the explanatory one-way street from AI system to user. Interactive learning with mutual explanations allows the learning system to take into account not only class corrections but also corrections of explanations to guide learning. We propose mutual explanations as a building block for human-like computing and an important ingredient for human-AI partnership.

Using Random Forests on Real-World City Data for Urban Planning in a Visual Semantic Decision Support System

Sensors ◽

10.3390/s19102266 ◽

2019 ◽

Vol 19 (10) ◽

pp. 2266 ◽

Cited By ~ 1

Author(s):

Nikolaos Sideris ◽

Georgios Bardis ◽

Athanasios Voulodimos ◽

Georgios Miaoulis ◽

Djamchid Ghazanfarpour

Keyword(s):

Machine Learning ◽

Urban Planning ◽

Random Forests ◽

Real World ◽

Performance Metrics ◽

World City ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Real World Data

The constantly increasing amount and availability of urban data derived from varying sources lead to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation prospects of the aforementioned data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either commercial or common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach that addresses these challenges with the aid of machine learning techniques. The proposed system combines, fuses, and merges various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher-level semantic information, and subsequently feeds them to the random forests classifier, as well as to other supervised machine learning models for comparison. Our experimental evaluation on multiple real-world data sets, comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).
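The six performance metrics listed in the abstract can all be derived from a confusion matrix. The sketch below uses the standard binary-classification definitions, taking F-measure as the F1 score and G-mean as the geometric mean of recall and specificity. Whether the paper used exactly these variants (for example, a multi-class averaging scheme) is not stated in the abstract, so that is an assumption, and the counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common metrics from binary confusion-matrix counts.

    Assumes every denominator is non-zero (both classes occur and at
    least one positive prediction is made).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity, true-positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Hypothetical counts, invented for illustration only.
METRICS = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in METRICS.items():
    print(f"{name}: {value:.4f}")
```

Reporting G-mean alongside accuracy is useful when classes are imbalanced, because a classifier that ignores the minority class can score high accuracy while its G-mean drops to zero.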
