Assessing biases, relaxing moralism: On ground-truthing practices in machine learning design and application

2021 ◽  
Vol 8 (1) ◽  
pp. 205395172110135
Author(s):  
Florian Jaton

This theoretical paper considers the morality of machine learning algorithms and systems in the light of the biases that ground their correctness. It begins by presenting biases not as a priori negative entities but as contingent external referents—often gathered in benchmarked repositories called ground-truth datasets—that define what needs to be learned and allow for performance measures. I then argue that ground-truth datasets and their concomitant practices—that fundamentally involve establishing biases to enable learning procedures—can be described by their respective morality, here defined as the more or less accounted experience of hesitation when faced with what pragmatist philosopher William James called “genuine options”—that is, choices to be made in the heat of the moment that engage different possible futures. I then stress three constitutive dimensions of this pragmatist morality, as far as ground-truthing practices are concerned: (I) the definition of the problem to be solved (problematization), (II) the identification of the data to be collected and set up (databasing), and (III) the qualification of the targets to be learned (labeling). I finally suggest that this three-dimensional conceptual space can be used to map machine learning algorithmic projects in terms of the morality of their respective and constitutive ground-truthing practices. Such techno-moral graphs may, in turn, serve as equipment for greater governance of machine learning algorithms and systems.

2018 ◽  
Author(s):  
Christian Damgaard

In order to fit population ecological models, e.g. plant competition models, to new drone-aided image data, we need to develop statistical models that may take the new type of measurement uncertainty when applying machine-learning algorithms into account and quantify its importance for statistical inferences and ecological predictions. Here, it is proposed to quantify the uncertainty and bias of image predicted plant taxonomy and abundance in a hierarchical statistical model that is linked to ground-truth data obtained by the pin-point method. It is critical that the error rate in the species identification process is minimized when the image data are fitted to the population ecological models, and several avenues for reaching this objective are discussed. The outlined method to statistically model known sources of uncertainty when applying machine-learning algorithms may be relevant for other applied scientific disciplines.


2021 ◽  
Vol 13 (6) ◽  
pp. 1161
Author(s):  
Christian Damgaard

In order to fit population ecological models, e.g., plant competition models, to new drone-aided image data, we need to develop statistical models that may take the new type of measurement uncertainty when applying machine-learning algorithms into account and quantify its importance for statistical inferences and ecological predictions. Here, it is proposed to quantify the uncertainty and bias of image predicted plant taxonomy and abundance in a hierarchical statistical model that is linked to ground-truth data obtained by the pin-point method. It is critical that the error rate in the species identification process is minimized when the image data are fitted to the population ecological models, and several avenues for reaching this objective are discussed. The outlined method to statistically model known sources of uncertainty when applying machine-learning algorithms may be relevant for other applied scientific disciplines.


2020 ◽  
Vol 6 ◽  
pp. e253
Author(s):  
Nafees Sadique ◽  
Al Amin Neaz Ahmed ◽  
Md Tajul Islam ◽  
Md. Nawshad Pervage ◽  
Swakkhar Shatabda

Proteins are the building blocks of all cells in humans and all other living creatures. Most of the work in a living organism is performed by proteins. Proteins are biological macromolecules: polymers of amino acid monomers. The tertiary structure of a protein represents its three-dimensional shape. A protein's functions, classification and binding sites are governed by its tertiary structure. If two protein structures are alike, then the two proteins can be of the same kind, implying similar structural class and ligand binding properties. In this paper, we have used the protein tertiary structure to generate effective features for applications in structural similarity to detect structural class and ligand binding. Firstly, we have analyzed the effectiveness of a group of image-based features to predict the structural class of a protein. These features are derived from the image generated by the distance matrix of the tertiary structure of a given protein. They include local binary pattern (LBP) histogram, Gabor filtered LBP histogram, separate row multiplication matrix with uniform LBP histogram, neighbor block subtraction matrix with uniform LBP histogram and atom bond. The separate row multiplication matrix and neighbor block subtraction matrix filters, as well as atom bond, are our novel contributions. The experiments were done on a standard benchmark dataset. We have demonstrated the effectiveness of these features over a large variety of supervised machine learning algorithms. Experiments suggest that the support vector machine is the best-performing classifier on the selected dataset using this set of features. We believe the excellent performance of Hybrid LBP in terms of accuracy will motivate researchers and practitioners to use it to identify protein structural class. To facilitate that, a classification model using Hybrid LBP is readily available for use at http://brl.uiu.ac.bd/PL/.
Protein-ligand binding governs the tasks of biological receptors, which matters for curing diseases among other applications. Therefore, binding prediction between protein and ligand is important for understanding a protein’s activity or for accelerating docking computations in virtual screening-based drug design. Protein-ligand binding prediction requires the three-dimensional tertiary structure of the target protein, which is searched for ligand binding. In this paper, we have proposed a supervised learning algorithm for predicting protein-ligand binding, which is a similarity-based clustering approach using the same set of features. Our algorithm works better than the most popular and widely used machine learning algorithms.
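To make the feature pipeline in this abstract more concrete, the following is a minimal sketch of its first two steps: building a distance matrix from atom coordinates and summarizing it with a basic 8-neighbor LBP histogram. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy coordinates, and the choice of the plain (non-uniform, unfiltered) LBP variant are all hypothetical, and the paper's Gabor-filtered, row-multiplication, and block-subtraction variants are not reproduced.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (hypothetical atom coordinates)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(image):
    """256-bin histogram of basic LBP codes over the interior pixels of a 2D grid."""
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as large as the centre.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist

# Toy example: four made-up points standing in for atom positions.
toy_coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
toy_image = distance_matrix(toy_coords)
toy_features = lbp_histogram(toy_image)
```

The resulting 256-element histogram could then be passed as a feature vector to any supervised classifier; which classifier and which preprocessing to use are left open here.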


2020 ◽  
Vol 34 (12) ◽  
pp. 1078-1087
Author(s):  
Peter S. Lum ◽  
Liqi Shu ◽  
Elaine M. Bochniewicz ◽  
Tan Tran ◽  
Lin-Ching Chang ◽  
...  

Background: Wrist-worn accelerometry provides objective monitoring of upper-extremity functional use, such as reaching tasks, but also detects nonfunctional movements, leading to ambiguity in monitoring results. Objective: Compare machine learning algorithms with standard methods (counts ratio) to improve accuracy in detecting functional activity. Methods: Healthy controls and individuals with stroke performed unstructured tasks in a simulated community environment (test duration = 26 ± 8 minutes) while accelerometry and video were synchronously recorded. Human annotators scored each frame of the video as being functional or nonfunctional activity, providing ground truth. Several machine learning algorithms were developed to separate functional from nonfunctional activity in the accelerometer data. We also calculated the counts ratio, which uses a thresholding scheme to calculate the duration of activity in the paretic limb normalized by the less-affected limb. Results: The counts ratio was not significantly correlated with ground truth and had large errors (r = 0.48; P = .16; average error = 52.7%) because of high levels of nonfunctional movement in the paretic limb. Counts did not increase with increased functional movement. The best-performing intrasubject machine learning algorithm had an accuracy of 92.6% in the paretic limb of stroke patients, and the correlation with ground truth was r = 0.99 (P < .001; average error = 3.9%). The best intersubject model had an accuracy of 74.2% and a correlation of r = 0.81 (P = .005; average error = 5.2%) with ground truth. Conclusions: In our sample, the counts ratio did not accurately reflect functional activity. Machine learning algorithms were more accurate, and future work should focus on the development of a clinical tool.
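The counts ratio described in this abstract can be sketched roughly as follows. This is only an illustration of the general idea (threshold each limb's activity counts, sum the active time, divide paretic by less-affected); the threshold value, the epoch length, the function names, and the sample data are assumptions, and the study's actual parameters are not given in the abstract.

```python
def active_duration(counts, threshold, epoch_seconds=1.0):
    """Total time in seconds during which the activity count exceeds the threshold."""
    return sum(epoch_seconds for c in counts if c > threshold)

def counts_ratio(paretic_counts, less_affected_counts, threshold=2.0, epoch_seconds=1.0):
    """Active duration of the paretic limb divided by that of the less-affected limb.

    The default threshold and epoch length are placeholders, not values from the study.
    """
    paretic = active_duration(paretic_counts, threshold, epoch_seconds)
    less_affected = active_duration(less_affected_counts, threshold, epoch_seconds)
    if less_affected == 0:
        raise ValueError("less-affected limb shows no activity above the threshold")
    return paretic / less_affected

# Made-up per-epoch activity counts for each wrist.
paretic = [0, 3, 5, 1, 0, 4]
less_affected = [6, 7, 0, 8, 9, 3]
ratio = counts_ratio(paretic, less_affected)
```

Note that a measure of this kind counts any supra-threshold movement as activity, which is consistent with the abstract's observation that nonfunctional movement in the paretic limb inflates the result.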


Author(s):  
А.Н. ВИНОГРАДОВ ◽  
А.С. СУРМАЧЕВ

A method is proposed for detecting characteristic distortions of the speech signal in mobile radio communication systems under a priori uncertainty about signal reception conditions and signal quality. The proposed method is based on machine learning algorithms, in particular the construction of decision trees and their ensembles. The signal features used for classification, as well as the characteristics of the training and test samples, are described in detail. Program code fragments illustrating the key stages of operation are given, together with experimentally obtained results.


2020 ◽  
Author(s):  
Awalin Sopan

There is a growing number of machine learning algorithms that operate on graphs. Example applications for these algorithms include predicting which customers will recommend products to their friends in a viral marketing campaign using a customer network, predicting the topics of publications in a citation network, or predicting the political affiliations of people in a social network. It is important for an analyst to have tools to help compare the output of these machine learning algorithms. In this work, we present G-PARE, a visual analytic tool for comparing two uncertain graphs, where each uncertain graph is produced by a machine learning algorithm which outputs probabilities over node labels. G-PARE provides several different views which allow users to obtain a global overview of the algorithms' output, as well as focused views that show subsets of nodes of interest. By providing an adaptive exploration environment, G-PARE guides the users to places in the graph where the two algorithms' predictions agree and places where they disagree. This enables the user to follow cascades of misclassifications by comparing the algorithms' outcomes with the ground truth. After describing the features of G-PARE, we illustrate its utility through several use cases based on networks from different domains.
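The comparison underlying this abstract, two algorithms that each output a probability distribution over labels for every node, checked against each other and against the ground truth, might be sketched as below. This is not G-PARE itself, which is a visual tool; the data layout, function names, and sample values are hypothetical and only illustrate the agree/disagree/misclassified partition.

```python
def most_likely_label(distribution):
    """Return the label with the highest probability in a {label: probability} dict."""
    return max(distribution, key=distribution.get)

def compare_predictions(output_a, output_b, ground_truth):
    """Partition nodes by whether two algorithms agree, and list each one's errors."""
    result = {"agree": [], "disagree": [], "errors_a": [], "errors_b": []}
    for node in sorted(ground_truth):
        label_a = most_likely_label(output_a[node])
        label_b = most_likely_label(output_b[node])
        result["agree" if label_a == label_b else "disagree"].append(node)
        if label_a != ground_truth[node]:
            result["errors_a"].append(node)
        if label_b != ground_truth[node]:
            result["errors_b"].append(node)
    return result

# Made-up label distributions for three nodes from two hypothetical algorithms.
output_a = {"n1": {"x": 0.9, "y": 0.1}, "n2": {"x": 0.4, "y": 0.6}, "n3": {"x": 0.7, "y": 0.3}}
output_b = {"n1": {"x": 0.8, "y": 0.2}, "n2": {"x": 0.55, "y": 0.45}, "n3": {"x": 0.2, "y": 0.8}}
truth = {"n1": "x", "n2": "y", "n3": "x"}
summary = compare_predictions(output_a, output_b, truth)
```

Taking only the most likely label discards the uncertainty that the tool is designed to show; a fuller comparison would also look at how far apart the two distributions are for each node.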


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Ijaz Khan ◽  
Abdul Rahim Ahmad ◽  
Nafaa Jabeur ◽  
Mohammed Najah Mahdi

A major problem an instructor experiences is the systematic monitoring of students’ academic progress in a course. Once students with unsatisfactory academic progress are identified, the instructor can take measures to offer them additional support. Modern educational institutes tend to collect enormous amounts of data about their students from various sources; however, they still seek novel procedures for using these data to enhance their prestige and improve the quality of education. This research evaluates the effectiveness of machine learning algorithms in monitoring students’ academic progress and informing the instructor about students at risk of ending up with an unsatisfactory result in a course. In addition, the prediction model is transformed into a readily interpretable form to make it easy for the instructor to prepare the necessary precautionary procedures. We developed a set of prediction models with distinct machine learning algorithms. The decision tree outperforms the other models and is therefore further transformed into an easily explicable format. The final output of the research is a set of supportive measures to carefully monitor students’ performance from the very start of the course and a set of preventive measures to offer additional attention to struggling students.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
A. A. Guda ◽  
S. A. Guda ◽  
A. Martini ◽  
A. N. Kravtsova ◽  
A. Algasov ◽  
...  

X-ray absorption near-edge structure (XANES) spectra are the fingerprint of the local atomic and electronic structures around the absorbing atom. However, the quantitative analysis of these spectra is not straightforward. Even with the most recent advances in this area, for a given spectrum, it is not clear a priori which structural parameters can be refined and how uncertainties should be estimated. Here, we present an alternative concept for the analysis of XANES spectra, which is based on machine learning algorithms and establishes the relationship between intuitive descriptors of spectra, such as edge position, intensities, positions, and curvatures of minima and maxima on the one hand, and those related to the local atomic and electronic structure, which are the coordination numbers, bond distances and angles, and oxidation state on the other hand. This approach overcomes the problem of the systematic difference between theoretical and experimental spectra. Furthermore, the numerical relations can be expressed in analytical formulas providing a simple and fast tool to extract structural parameters based on the spectral shape. The methodology was successfully applied to experimental data for the multicomponent Fe:SiO2 system and reference iron compounds, demonstrating the high prediction quality for both the theoretical validation sets and experimental data.

