INVESTIGATING INTER-RATER RELIABILITY OF QUALITATIVE TEXT ANNOTATIONS IN MACHINE LEARNING DATASETS

2020 ◽  
Vol 1 ◽  
pp. 21-30
Author(s):  
N. El Dehaibi ◽  
E. F. MacDonald

Abstract An important step when designers use machine learning models is annotating user-generated content. In this study, we investigate inter-rater reliability measures of qualitative annotations for supervised learning. We work with previously annotated product reviews from Amazon in which phrases related to sustainability are highlighted. We measure inter-rater reliability of the annotations using four variations of Krippendorff's U-alpha. Based on the results, we propose suggestions to designers on measuring the reliability of qualitative annotations for machine learning datasets.
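As a rough illustration of what an inter-rater reliability coefficient computes, the sketch below implements the basic nominal form of Krippendorff's alpha, alpha = 1 - D_o / D_e (observed over expected disagreement). This is not the U-alpha used in the study, which handles highlighted spans of text; it is a simpler variant for whole-item category labels. The function name `krippendorff_alpha_nominal` and the input layout (one list of labels per item) are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Nominal Krippendorff's alpha (illustrative sketch, not U-alpha).

    `units` is a list of lists: each inner list holds the labels that
    raters assigned to one item, with missing ratings simply omitted.
    """
    # Build the coincidence matrix: every ordered pair of labels given
    # to the same item by different raters counts 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single rating cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable ratings")

    # Disagreement sums; the common factor 1 / n cancels in the ratio.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Two hypothetical raters labeling three items.
    ratings = [["a", "a"], ["a", "b"], ["b", "b"]]
    print(krippendorff_alpha_nominal(ratings))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.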

2019 ◽  
Vol 10 (35) ◽  
pp. 8154-8163 ◽  
Author(s):  
Yao Zhang ◽  
Alpha A. Lee

We report a statistically principled method to quantify the uncertainty of machine learning models for molecular properties prediction. We show that this uncertainty estimate can be used to judiciously design experiments.


2021 ◽  
Vol 2021 (3) ◽  
pp. 453-473
Author(s):  
Nathan Reitinger ◽  
Michelle L. Mazurek

Abstract With the aim of increasing online privacy, we present a novel, machine-learning based approach to blocking one of the three main ways website visitors are tracked online—canvas fingerprinting. Because the act of canvas fingerprinting uses, at its core, a JavaScript program, and because many of these programs are reused across the web, we are able to fit several machine learning models around a semantic representation of a potentially offending program, achieving accurate and robust classifiers. Our supervised learning approach is trained on a dataset we created by scraping roughly half a million websites using a custom Google Chrome extension storing information related to the canvas. Classification leverages our key insight that the images drawn by canvas fingerprinting programs have a facially distinct appearance, allowing us to manually classify files based on the images drawn; we take this approach one step further and train our classifiers not on the malleable images themselves, but on the more-difficult-to-change, underlying source code generating the images. As a result, ML-CB allows for more accurate tracker blocking.


Energies ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 1081
Author(s):  
Spyros Theocharides ◽  
Marios Theristis ◽  
George Makrides ◽  
Marios Kynigos ◽  
Chrysovalantis Spanias ◽  
...  

A main challenge for integrating the intermittent photovoltaic (PV) power generation remains the accuracy of day-ahead forecasts and the establishment of robust performing methods. The purpose of this work is to address these technological challenges by evaluating the day-ahead PV production forecasting performance of different machine learning models under different supervised learning regimes and minimal input features. Specifically, the day-ahead forecasting capability of Bayesian neural network (BNN), support vector regression (SVR), and regression tree (RT) models was investigated by employing the same dataset for training and performance verification, thus enabling a valid comparison. The training regime analysis demonstrated that the performance of the investigated models was strongly dependent on the timeframe of the train set, training data sequence, and application of irradiance condition filters. Furthermore, accurate results were obtained utilizing only the measured power output and other calculated parameters for training. Consequently, useful information is provided for establishing a robust day-ahead forecasting methodology that utilizes calculated input parameters and an optimal supervised learning approach. Finally, the obtained results demonstrated that the optimally constructed BNN outperformed all other machine learning models achieving forecasting accuracies lower than 5%.


Author(s):  
Young Kwark ◽  
Gene Moo Lee ◽  
Paul A. Pavlou ◽  
Liangfei Qiu

We study the spillover effects of the online reviews of other covisited products on the purchases of a focal product using clickstream data from a large retailer. The proposed spillover effects are moderated by (a) whether the related (covisited) products are complementary or substitutive, (b) the choice of media channel (mobile or personal computer (PC)) used, (c) whether the related products are from the same or a different brand, (d) consumer experience, and (e) the variance of the review ratings. To identify complementary and substitutive products, we develop supervised machine-learning models based on product characteristics, such as product category and brand, and novel text-based similarity measures. We train and validate the machine-learning models using product pair labels from Amazon Mechanical Turk. Our results show that the mean rating of substitutive (complementary) products has a negative (positive) effect on purchasing of the focal product. Interestingly, the magnitude of the spillover effects of the mean ratings of covisited (substitutive and complementary) products is significantly larger than the effects on the focal product, especially for complementary products. The spillover effect of ratings is stronger for consumers who use mobile devices versus PCs. We find the negative effect of the mean ratings of substitutive products across different brands on purchasing of a focal product to be significantly higher than within the same brand. Lastly, the effect of the mean ratings is stronger for less experienced consumers and for ratings with lower variance. We discuss implications on leveraging the spillover effect of the online product reviews of related products to encourage online purchases.


2019 ◽  
Vol 8 (4) ◽  
pp. 9898-9901

Loans are among the most important products a bank offers. Banks are usually willing to lend to customers based on their requirements. However, some customers delay repayment or are unable to repay because of their financial status. To address this problem, banks need techniques for predicting loan repayment status. Machine learning models are known to achieve high accuracy on prediction problems, so in this paper we apply several machine learning models to loan default prediction.


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the usage of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling for obtaining multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.

