INVESTIGATING INTER-RATER RELIABILITY OF QUALITATIVE TEXT ANNOTATIONS IN MACHINE LEARNING DATASETS

Proceedings of the Design Society: DESIGN Conference ◽

10.1017/dsd.2020.153 ◽

2020 ◽

Vol 1 ◽

pp. 21-30

Author(s):

N. El Dehaibi ◽

E. F. MacDonald

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

User Generated Content ◽

Product Reviews ◽

Learning Models ◽

Reliability Measures ◽

Rater Reliability ◽

Machine Learning Models

An important step when designers use machine learning models is annotating user-generated content. In this study we investigate inter-rater reliability measures of qualitative annotations for supervised learning. We work with previously annotated product reviews from Amazon where phrases related to sustainability are highlighted. We measure inter-rater reliability of the annotations using four variations of Krippendorff's U-alpha. Based on the results we propose suggestions to designers on measuring reliability of qualitative annotations for machine learning datasets.

TOPICAL ISSUES OF APPLICATION OF MACHINE LEARNING METHODS IN ECONOMY

Innovative Aspects of the Development of Science and Technology. Collected Papers of the VIII International Scientific and Practical Conference: collected papers [online electronic edition] / Ed. by N.V. Emelyanov. – Moscow: "KDU", "Dobrosvet", 2021. – 149 pp. ◽

10.31453/kdu.ru.978-5-7913-1176-4-2021-28-33 ◽

2021 ◽

Author(s):

Natalia Pavlovna Persteneva ◽

Darya Dmitrievn Skryleva ◽

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Learning Model ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Machine Learning Model ◽

Popular Classes ◽

Machine Learning Models

The article discusses machine learning methods, using two popular classes as examples: supervised learning and unsupervised learning. Variants of the main types of machine learning models for each class are presented, and a generalized algorithm for building any machine learning model is formulated.

Predicting the e-Signing Likelihood of Loan Using Machine Learning Models Combining Clustering with Supervised Learning

Advances on Smart and Soft Computing - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-16-5559-3_11 ◽

2021 ◽

pp. 121-131

Author(s):

Abdellatif Aattouf ◽

Said El Kafhali ◽

Youssef Saadi

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Learning Models ◽

Machine Learning Models

Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning

Chemical Science ◽

10.1039/c9sc00616h ◽

2019 ◽

Vol 10 (35) ◽

pp. 8154-8163 ◽

Cited By ~ 14

Author(s):

Yao Zhang ◽

Alpha A. Lee

Keyword(s):

Machine Learning ◽

Active Learning ◽

Supervised Learning ◽

Molecular Properties ◽

Learning Models ◽

Molecular Properties Prediction ◽

Design Experiments ◽

Machine Learning Models

We report a statistically principled method to quantify the uncertainty of machine learning models for molecular properties prediction. We show that this uncertainty estimate can be used to judiciously design experiments.

ML-CB: Machine Learning Canvas Block

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2021-0056 ◽

2021 ◽

Vol 2021 (3) ◽

pp. 453-473

Author(s):

Nathan Reitinger ◽

Michelle L. Mazurek

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Semantic Representation ◽

Source Code ◽

Online Privacy ◽

Learning Approach ◽

Learning Models ◽

One Step ◽

The Web ◽

Machine Learning Models

With the aim of increasing online privacy, we present a novel, machine-learning based approach to blocking one of the three main ways website visitors are tracked online: canvas fingerprinting. Because the act of canvas fingerprinting uses, at its core, a JavaScript program, and because many of these programs are reused across the web, we are able to fit several machine learning models around a semantic representation of a potentially offending program, achieving accurate and robust classifiers. Our supervised learning approach is trained on a dataset we created by scraping roughly half a million websites using a custom Google Chrome extension storing information related to the canvas. Classification leverages our key insight that the images drawn by canvas fingerprinting programs have a facially distinct appearance, allowing us to manually classify files based on the images drawn; we take this approach one step further and train our classifiers not on the malleable images themselves, but on the more-difficult-to-change, underlying source code generating the images. As a result, ML-CB allows for more accurate tracker blocking.

Comparative Analysis of Machine Learning Models for Day-Ahead Photovoltaic Power Production Forecasting

Energies ◽

10.3390/en14041081 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1081

Author(s):

Spyros Theocharides ◽

Marios Theristis ◽

George Makrides ◽

Marios Kynigos ◽

Chrysovalantis Spanias ◽

...

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Regression Tree ◽

Training Data ◽

Support Vector ◽

Learning Models ◽

Bayesian Neural Network ◽

Production Forecasting ◽

Main Challenge ◽

Machine Learning Models

A main challenge for integrating the intermittent photovoltaic (PV) power generation remains the accuracy of day-ahead forecasts and the establishment of robust performing methods. The purpose of this work is to address these technological challenges by evaluating the day-ahead PV production forecasting performance of different machine learning models under different supervised learning regimes and minimal input features. Specifically, the day-ahead forecasting capability of Bayesian neural network (BNN), support vector regression (SVR), and regression tree (RT) models was investigated by employing the same dataset for training and performance verification, thus enabling a valid comparison. The training regime analysis demonstrated that the performance of the investigated models was strongly dependent on the timeframe of the train set, training data sequence, and application of irradiance condition filters. Furthermore, accurate results were obtained utilizing only the measured power output and other calculated parameters for training. Consequently, useful information is provided for establishing a robust day-ahead forecasting methodology that utilizes calculated input parameters and an optimal supervised learning approach. Finally, the obtained results demonstrated that the optimally constructed BNN outperformed all other machine learning models, achieving forecasting errors lower than 5%.

On the Spillover Effects of Online Product Reviews on Purchases: Evidence from Clickstream Data

Information Systems Research ◽

10.1287/isre.2021.0998 ◽

2021 ◽

Author(s):

Young Kwark ◽

Gene Moo Lee ◽

Paul A. Pavlou ◽

Liangfei Qiu

Keyword(s):

Machine Learning ◽

Spillover Effect ◽

Spillover Effects ◽

Product Reviews ◽

Learning Models ◽

Complementary Products ◽

Clickstream Data ◽

Online Product Reviews ◽

The Mean ◽

Machine Learning Models

We study the spillover effects of the online reviews of other covisited products on the purchases of a focal product using clickstream data from a large retailer. The proposed spillover effects are moderated by (a) whether the related (covisited) products are complementary or substitutive, (b) the choice of media channel (mobile or personal computer (PC)) used, (c) whether the related products are from the same or a different brand, (d) consumer experience, and (e) the variance of the review ratings. To identify complementary and substitutive products, we develop supervised machine-learning models based on product characteristics, such as product category and brand, and novel text-based similarity measures. We train and validate the machine-learning models using product pair labels from Amazon Mechanical Turk. Our results show that the mean rating of substitutive (complementary) products has a negative (positive) effect on purchasing of the focal product. Interestingly, the magnitude of the spillover effects of the mean ratings of covisited (substitutive and complementary) products is significantly larger than the effects on the focal product, especially for complementary products. The spillover effect of ratings is stronger for consumers who use mobile devices versus PCs. We find the negative effect of the mean ratings of substitutive products across different brands on purchasing of a focal product to be significantly higher than within the same brand. Lastly, the effect of the mean ratings is stronger for less experienced consumers and for ratings with lower variance. We discuss implications on leveraging the spillover effect of the online product reviews of related products to encourage online purchases.

Customer Loan Approval Classification by Supervised Learning Model

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d9275.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 9898-9901

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Learning Model ◽

High Accuracy ◽

Learning Models ◽

Loan Repayment ◽

Financial Status ◽

Loan Approval ◽

Prediction Problems ◽

Machine Learning Models

Loans are one of a bank's most important schemes. Banks are usually willing to give loans to customers based on their requirements. Unfortunately, some customers delay repayment or are unable to repay their loans because of their financial status. To address this problem, banks need techniques for predicting loan repayment status. Machine learning models are known to achieve high accuracy on prediction problems, so in this paper we use several machine learning models for loan default prediction.

Improving XGBoost with Imagination Sampling

Communications of the Blyth Institute ◽

10.33014/issn.2640-5652.2.1.holloway.1 ◽

2020 ◽

Vol 2 (1) ◽

pp. 3-6

Author(s):

Eric Holloway

Keyword(s):

Machine Learning ◽

General System ◽

Learning Models ◽

Starting Point ◽

Machine Learning Models

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.
