Establishing Ground Truth on Pyschophysiological Models for Training Machine Learning Algorithms: Options for Ground Truth Proxies

Author(s):  
Keith Brawner ◽  
Michael W. Boyce
2021 ◽  
Vol 8 (1) ◽  
pp. 205395172110135
Author(s):  
Florian Jaton

This theoretical paper considers the morality of machine learning algorithms and systems in the light of the biases that ground their correctness. It begins by presenting biases not as a priori negative entities but as contingent external referents—often gathered in benchmarked repositories called ground-truth datasets—that define what needs to be learned and allow for performance measures. I then argue that ground-truth datasets and their concomitant practices—that fundamentally involve establishing biases to enable learning procedures—can be described by their respective morality, here defined as the more or less accounted experience of hesitation when faced with what pragmatist philosopher William James called “genuine options”—that is, choices to be made in the heat of the moment that engage different possible futures. I then stress three constitutive dimensions of this pragmatist morality, as far as ground-truthing practices are concerned: (I) the definition of the problem to be solved (problematization), (II) the identification of the data to be collected and set up (databasing), and (III) the qualification of the targets to be learned (labeling). I finally suggest that this three-dimensional conceptual space can be used to map machine learning algorithmic projects in terms of the morality of their respective and constitutive ground-truthing practices. Such techno-moral graphs may, in turn, serve as equipment for greater governance of machine learning algorithms and systems.


2018 ◽  
Author(s):  
Christian Damgaard

AbstractIn order to fit population ecological models, e.g. plant competition models, to new drone-aided image data, we need to develop statistical models that may take the new type of measurement uncertainty when applying machine-learning algorithms into account and quantify its importance for statistical inferences and ecological predictions. Here, it is proposed to quantify the uncertainty and bias of image predicted plant taxonomy and abundance in a hierarchical statistical model that is linked to ground-truth data obtained by the pin-point method. It is critical that the error rate in the species identification process is minimized when the image data are fitted to the population ecological models, and several avenues for reaching this objective are discussed. The outlined method to statistically model known sources of uncertainty when applying machine-learning algorithms may be relevant for other applied scientific disciplines.


2020 ◽  
Vol 34 (12) ◽  
pp. 1078-1087
Author(s):  
Peter S. Lum ◽  
Liqi Shu ◽  
Elaine M. Bochniewicz ◽  
Tan Tran ◽  
Lin-Ching Chang ◽  
...  

Background Wrist-worn accelerometry provides objective monitoring of upper-extremity functional use, such as reaching tasks, but also detects nonfunctional movements, leading to ambiguity in monitoring results. Objective Compare machine learning algorithms with standard methods (counts ratio) to improve accuracy in detecting functional activity. Methods Healthy controls and individuals with stroke performed unstructured tasks in a simulated community environment (Test duration = 26 ± 8 minutes) while accelerometry and video were synchronously recorded. Human annotators scored each frame of the video as being functional or nonfunctional activity, providing ground truth. Several machine learning algorithms were developed to separate functional from nonfunctional activity in the accelerometer data. We also calculated the counts ratio, which uses a thresholding scheme to calculate the duration of activity in the paretic limb normalized by the less-affected limb. Results The counts ratio was not significantly correlated with ground truth and had large errors ( r = 0.48; P = .16; average error = 52.7%) because of high levels of nonfunctional movement in the paretic limb. Counts did not increase with increased functional movement. The best-performing intrasubject machine learning algorithm had an accuracy of 92.6% in the paretic limb of stroke patients, and the correlation with ground truth was r = 0.99 ( P < .001; average error = 3.9%). The best intersubject model had an accuracy of 74.2% and a correlation of r =0.81 ( P = .005; average error = 5.2%) with ground truth. Conclusions In our sample, the counts ratio did not accurately reflect functional activity. Machine learning algorithms were more accurate, and future work should focus on the development of a clinical tool.


2021 ◽  
Vol 13 (6) ◽  
pp. 1161
Author(s):  
Christian Damgaard

In order to fit population ecological models, e.g., plant competition models, to new drone-aided image data, we need to develop statistical models that may take the new type of measurement uncertainty when applying machine-learning algorithms into account and quantify its importance for statistical inferences and ecological predictions. Here, it is proposed to quantify the uncertainty and bias of image predicted plant taxonomy and abundance in a hierarchical statistical model that is linked to ground-truth data obtained by the pin-point method. It is critical that the error rate in the species identification process is minimized when the image data are fitted to the population ecological models, and several avenues for reaching this objective are discussed. The outlined method to statistically model known sources of uncertainty when applying machine-learning algorithms may be relevant for other applied scientific disciplines.


2020 ◽  
Author(s):  
awalin sopan

There are a growing number of machine learning algorithms which operate on graphs. Example applications for these algorithms include predicting which customers will recommend products to their friends in a viral marketing campaign using a customer network, predicting the topics of publications in a citation network, or predicting the political affiliations of people in a social network. It is important for an analyst to have tools to help compare the output of these machine learning algorithms. In this work, we present G-PARE, a visual analytic tool for comparing two uncertain graphs, where each uncertain graph is produced by a machine learning algorithm which outputs probabilities over node labels. G-PARE provides several different views which allow users to obtain a global overview of the algorithms output, as well as focused views that show subsets of nodes of interest. By providing an adaptive exploration environment, G-PARE guides the users to places in the graph where two algorithms predictions agree and places where they disagree. This enables the user to follow cascades of misclassifications by comparing the algorithms outcome with the ground truth. After describing the features of G-PARE, we illustrate its utility through several use cases based on networks from different domains.


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performances and the industry still struggles to predict box office performances in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristic. The number of attendances is discretized into three classes. Four popular machine learning algorithms, artificial neural networks, decision tree regression and gradient boosting tree and random forest are employed, and the impact of each group is observed by compared by the performance models. Then the number of target classes is increased into five and eight and results are compared with the previously developed models in the literature.


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system has certain requirements for the intelligent scoring system, and the most difficult stage of intelligent scoring in the English test is to score the English composition through the intelligent model. In order to improve the intelligence of English composition scoring, based on machine learning algorithms, this study combines intelligent image recognition technology to improve machine learning algorithms, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, in order to verify whether the algorithm model proposed in this paper meets the requirements of the group text, that is, to verify the feasibility of the algorithm, the performance of the model proposed in this study is analyzed through design experiments. Moreover, the basic conditions for composition scoring are input into the model as a constraint model. The research results show that the algorithm proposed in this paper has a certain practical effect, and it can be applied to the English assessment system and the online assessment system of the homework evaluation system algorithm system.


2019 ◽  
Vol 1 (2) ◽  
pp. 78-80
Author(s):  
Eric Holloway

Detecting some patterns is a simple task for humans, but nearly impossible for current machine learning algorithms.  Here, the "checkerboard" pattern is examined, where human prediction nears 100% and machine prediction drops significantly below 50%.


Sign in / Sign up

Export Citation Format

Share Document