Establishing Ground Truth on Pyschophysiological Models for Training Machine Learning Algorithms: Options for Ground Truth Proxies

Assessing biases, relaxing moralism: On ground-truthing practices in machine learning design and application

Big Data & Society ◽

10.1177/20539517211013569 ◽

2021 ◽

Vol 8 (1) ◽

pp. 205395172110135

Author(s):

Florian Jaton

Keyword(s):

Machine Learning ◽

William James ◽

A Priori ◽

Learning Algorithms ◽

Three Dimensional ◽

Ground Truth ◽

Machine Learning Algorithms ◽

Ground Truthing ◽

Set Up ◽

The Moment

This theoretical paper considers the morality of machine learning algorithms and systems in the light of the biases that ground their correctness. It begins by presenting biases not as a priori negative entities but as contingent external referents—often gathered in benchmarked repositories called ground-truth datasets—that define what needs to be learned and allow for performance measures. I then argue that ground-truth datasets and their concomitant practices—that fundamentally involve establishing biases to enable learning procedures—can be described by their respective morality, here defined as the more or less accounted experience of hesitation when faced with what pragmatist philosopher William James called “genuine options”—that is, choices to be made in the heat of the moment that engage different possible futures. I then stress three constitutive dimensions of this pragmatist morality, as far as ground-truthing practices are concerned: (I) the definition of the problem to be solved (problematization), (II) the identification of the data to be collected and set up (databasing), and (III) the qualification of the targets to be learned (labeling). I finally suggest that this three-dimensional conceptual space can be used to map machine learning algorithmic projects in terms of the morality of their respective and constitutive ground-truthing practices. Such techno-moral graphs may, in turn, serve as equipment for greater governance of machine learning algorithms and systems.

Download Full-text

Integrating hierarchical statistical models and machine-learning algorithms for ground-truthing drone images of the vegetation: taxonomy, abundance and population ecological models

10.1101/491381 ◽

2018 ◽

Cited By ~ 1

Author(s):

Christian Damgaard

Keyword(s):

Machine Learning ◽

Statistical Models ◽

Learning Algorithms ◽

Plant Competition ◽

Image Data ◽

Ground Truth ◽

Ecological Models ◽

Machine Learning Algorithms ◽

Ground Truth Data ◽

Ground Truthing

AbstractIn order to fit population ecological models, e.g. plant competition models, to new drone-aided image data, we need to develop statistical models that may take the new type of measurement uncertainty when applying machine-learning algorithms into account and quantify its importance for statistical inferences and ecological predictions. Here, it is proposed to quantify the uncertainty and bias of image predicted plant taxonomy and abundance in a hierarchical statistical model that is linked to ground-truth data obtained by the pin-point method. It is critical that the error rate in the species identification process is minimized when the image data are fitted to the population ecological models, and several avenues for reaching this objective are discussed. The outlined method to statistically model known sources of uncertainty when applying machine-learning algorithms may be relevant for other applied scientific disciplines.

Download Full-text

Improving Accelerometry-Based Measurement of Functional Use of the Upper Extremity After Stroke: Machine Learning Versus Counts Threshold Method

Neurorehabilitation and Neural Repair ◽

10.1177/1545968320962483 ◽

2020 ◽

Vol 34 (12) ◽

pp. 1078-1087

Author(s):

Peter S. Lum ◽

Liqi Shu ◽

Elaine M. Bochniewicz ◽

Tan Tran ◽

Lin-Ching Chang ◽

...

Keyword(s):

Machine Learning ◽

Upper Extremity ◽

Functional Activity ◽

Learning Algorithms ◽

Ground Truth ◽

Machine Learning Algorithms ◽

Average Error ◽

Test Duration ◽

Accelerometer Data ◽

Paretic Limb

Background Wrist-worn accelerometry provides objective monitoring of upper-extremity functional use, such as reaching tasks, but also detects nonfunctional movements, leading to ambiguity in monitoring results. Objective Compare machine learning algorithms with standard methods (counts ratio) to improve accuracy in detecting functional activity. Methods Healthy controls and individuals with stroke performed unstructured tasks in a simulated community environment (Test duration = 26 ± 8 minutes) while accelerometry and video were synchronously recorded. Human annotators scored each frame of the video as being functional or nonfunctional activity, providing ground truth. Several machine learning algorithms were developed to separate functional from nonfunctional activity in the accelerometer data. We also calculated the counts ratio, which uses a thresholding scheme to calculate the duration of activity in the paretic limb normalized by the less-affected limb. Results The counts ratio was not significantly correlated with ground truth and had large errors ( r = 0.48; P = .16; average error = 52.7%) because of high levels of nonfunctional movement in the paretic limb. Counts did not increase with increased functional movement. The best-performing intrasubject machine learning algorithm had an accuracy of 92.6% in the paretic limb of stroke patients, and the correlation with ground truth was r = 0.99 ( P < .001; average error = 3.9%). The best intersubject model had an accuracy of 74.2% and a correlation of r =0.81 ( P = .005; average error = 5.2%) with ground truth. Conclusions In our sample, the counts ratio did not accurately reflect functional activity. Machine learning algorithms were more accurate, and future work should focus on the development of a clinical tool.

Download Full-text

Integrating Hierarchical Statistical Models and Machine-Learning Algorithms for Ground-Truthing Drone Images of the Vegetation: Taxonomy, Abundance and Population Ecological Models

Remote Sensing ◽

10.3390/rs13061161 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1161

Author(s):

Christian Damgaard

Keyword(s):

Machine Learning ◽

Statistical Models ◽

Learning Algorithms ◽

Plant Competition ◽

Image Data ◽

Ground Truth ◽

Ecological Models ◽

Machine Learning Algorithms ◽

Ground Truth Data ◽

Ground Truthing

In order to fit population ecological models, e.g., plant competition models, to new drone-aided image data, we need to develop statistical models that may take the new type of measurement uncertainty when applying machine-learning algorithms into account and quantify its importance for statistical inferences and ecological predictions. Here, it is proposed to quantify the uncertainty and bias of image predicted plant taxonomy and abundance in a hierarchical statistical model that is linked to ground-truth data obtained by the pin-point method. It is critical that the error rate in the species identification process is minimized when the image data are fitted to the population ecological models, and several avenues for reaching this objective are discussed. The outlined method to statistically model known sources of uncertainty when applying machine-learning algorithms may be relevant for other applied scientific disciplines.

Download Full-text

G-PARE: A visual analytic tool for comparative analysis of uncertain graphs

10.31219/osf.io/3sbvz ◽

2020 ◽

Author(s):

awalin sopan

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Learning Algorithms ◽

Ground Truth ◽

Citation Network ◽

Viral Marketing ◽

Machine Learning Algorithms ◽

Analytic Tool ◽

Uncertain Graphs ◽

Exploration Environment

There are a growing number of machine learning algorithms which operate on graphs. Example applications for these algorithms include predicting which customers will recommend products to their friends in a viral marketing campaign using a customer network, predicting the topics of publications in a citation network, or predicting the political affiliations of people in a social network. It is important for an analyst to have tools to help compare the output of these machine learning algorithms. In this work, we present G-PARE, a visual analytic tool for comparing two uncertain graphs, where each uncertain graph is produced by a machine learning algorithm which outputs probabilities over node labels. G-PARE provides several different views which allow users to obtain a global overview of the algorithms output, as well as focused views that show subsets of nodes of interest. By providing an adaptive exploration environment, G-PARE guides the users to places in the graph where two algorithms predictions agree and places where they disagree. This enables the user to follow cascades of misclassifications by comparing the algorithms outcome with the ground truth. After describing the features of G-PARE, we illustrate its utility through several use cases based on networks from different domains.

Download Full-text

Supplemental Material for One Model to Rule Them All? Using Machine Learning Algorithms to Determine the Number of Factors in Exploratory Factor Analysis

Psychological Methods ◽

10.1037/met0000262.supp ◽

2020 ◽

Keyword(s):

Machine Learning ◽

Factor Analysis ◽

Exploratory Factor Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Number Of Factors

Download Full-text

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189120 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6579-6590

Author(s):

Sandy Çağlıyor ◽

Başar Öztayşi ◽

Selime Sezgin

Keyword(s):

Machine Learning ◽

Global Economy ◽

Learning Algorithms ◽

Forecast Model ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

High Stakes ◽

Box Office ◽

Industry Forecast ◽

The Impact

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performances and the industry still struggles to predict box office performances in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristic. The number of attendances is discretized into three classes. Four popular machine learning algorithms, artificial neural networks, decision tree regression and gradient boosting tree and random forest are employed, and the impact of each group is observed by compared by the performance models. Then the number of target classes is increased into five and eight and results are compared with the previously developed models in the literature.

Download Full-text

Intelligent system of English composition scoring model based on improved machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189235 ◽

2020 ◽

pp. 1-11

Author(s):

Jie Liu ◽

Lin Lin ◽

Xiufang Liang

Keyword(s):

Machine Learning ◽

Evaluation System ◽

Intelligent System ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Assessment System ◽

English Composition ◽

Region Extraction ◽

Constraint Model

The online English teaching system has certain requirements for the intelligent scoring system, and the most difficult stage of intelligent scoring in the English test is to score the English composition through the intelligent model. In order to improve the intelligence of English composition scoring, based on machine learning algorithms, this study combines intelligent image recognition technology to improve machine learning algorithms, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, in order to verify whether the algorithm model proposed in this paper meets the requirements of the group text, that is, to verify the feasibility of the algorithm, the performance of the model proposed in this study is analyzed through design experiments. Moreover, the basic conditions for composition scoring are input into the model as a constraint model. The research results show that the algorithm proposed in this paper has a certain practical effect, and it can be applied to the English assessment system and the online assessment system of the homework evaluation system algorithm system.

Download Full-text

The Unlearnable Checkerboard Pattern

Communications of the Blyth Institute ◽

10.33014/issn.2640-5652.1.2.holloway.1 ◽

2019 ◽

Vol 1 (2) ◽

pp. 78-80

Author(s):

Eric Holloway

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Checkerboard Pattern ◽

Simple Task

Detecting some patterns is a simple task for humans, but nearly impossible for current machine learning algorithms. Here, the "checkerboard" pattern is examined, where human prediction nears 100% and machine prediction drops significantly below 50%.

Download Full-text