Machine Learning and Irresponsible Inference: Morally Assessing the Training Data for Image Recognition Systems

Author(s):  
Owen C. King
2020 ◽  
Vol 6 ◽  
pp. 237802312096717
Author(s):  
Carsten Schwemmer ◽  
Carly Knight ◽  
Emily D. Bello-Pardo ◽  
Stan Oklobdzija ◽  
Martijn Schoonvelde ◽  
...  

Image recognition systems offer the promise of learning from images at scale without requiring expert knowledge. However, past research suggests that machine learning systems often produce biased output. In this article, we evaluate potential gender biases of commercial image recognition platforms using photographs of U.S. members of Congress and a large number of Twitter images posted by these politicians. Our crowdsourced validation shows that commercial image recognition systems can produce labels that are correct and biased at the same time, because they selectively report a subset of many possible true labels. We find that images of women received three times more annotations related to physical appearance. Moreover, women in images are recognized at substantially lower rates than men. We discuss how encoded biases such as these affect the visibility of women, reinforce harmful gender stereotypes, and limit the validity of the insights that can be gathered from such data.
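A comparison such as "three times more appearance annotations" comes down to counting labels by category and normalizing by the number of images in each group. The Python sketch below shows one plausible way to compute such a rate ratio; the keyword set standing in for the appearance category and the toy data are hypothetical, not the study's coding scheme or results.

```python
from collections import Counter

# Hypothetical keyword set standing in for an "appearance" label category;
# the study's actual category scheme is not reproduced here.
APPEARANCE_LABELS = {"hairstyle", "smile", "beauty", "skin", "chin"}


def appearance_share(images):
    """Return, per group, the mean number of appearance-related labels per image.

    `images` is a list of (group, labels) pairs, where `labels` is the list of
    annotations a recognition service returned for one image.
    """
    label_counts = Counter()
    image_counts = Counter()
    for group, labels in images:
        image_counts[group] += 1
        label_counts[group] += sum(1 for label in labels if label in APPEARANCE_LABELS)
    return {group: label_counts[group] / image_counts[group] for group in image_counts}


# Toy data only; these are not results from the study.
toy = [
    ("women", ["person", "smile", "hairstyle", "official"]),
    ("women", ["suit", "beauty"]),
    ("men", ["person", "suit", "official"]),
    ("men", ["smile", "chin", "businessperson"]),
]
rates = appearance_share(toy)
ratio = rates["women"] / rates["men"]  # rate ratio between the two groups
```

Normalizing per image matters because the groups can contain very different numbers of images; raw label counts would otherwise mostly reflect group size.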


In recent years, huge amounts of data in the form of images have been created and accumulated at extraordinary rates. The high volume and velocity of these data pose the problem of finding practical and effective ways to classify them for analysis. Existing classification systems cannot fully meet this demand or cope with the difficulty of classifying such data accurately. In this paper, we build a Convolutional Neural Network (CNN), one of the most powerful and popular machine learning tools used in image recognition systems, to classify images from CIFAR-10, one of the most widely used image datasets. The paper also gives a thorough overview of how our CNN architecture works, including its parameters and the difficulties involved.
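A CNN stacks convolution, nonlinearity, and pooling stages. The pure-Python sketch below runs one such stage on a tiny input to show what each operation does; it is an illustration only, not the paper's architecture (whose layers and hyperparameters are not given here). A real CIFAR-10 model would use a framework such as PyTorch or TensorFlow, learn its kernels from data, and operate on 32×32 RGB images.

```python
def conv2d(image, kernel, bias=0.0):
    """Valid (no padding), stride-1 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def conv2d_multichannel(channels, kernels, bias=0.0):
    """Sum per-channel cross-correlations (e.g. R, G, B) into one feature map."""
    maps = [conv2d(ch, k) for ch, k in zip(channels, kernels)]
    h, w = len(maps[0]), len(maps[0][0])
    return [[bias + sum(m[i][j] for m in maps) for j in range(w)] for i in range(h)]


def relu(feature_map):
    """Elementwise max(0, x) nonlinearity."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 4x4 grayscale input with a vertical edge, and a hand-set 2x2 edge kernel
# (in a trained CNN the kernel weights are learned, not chosen by hand).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1], [-1, 1]]
feature_map = relu(conv2d(image, kernel))  # 3x3 map, strongest where the edge is
pooled = max_pool(feature_map)             # 1x1 after 2x2 pooling (last row/column dropped)
```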


2021 ◽  
Vol 40 (1) ◽  
Author(s):  
Tuomas Koskinen ◽  
Iikka Virkkunen ◽  
Oskar Siljama ◽  
Oskari Jessen-Juhler

Previous research (Li et al., "Understanding the disharmony between dropout and batch normalization by variance shift," CoRR abs/1801.05134, 2018, http://arxiv.org/abs/1801.05134) has shown the plausibility of using a modern deep convolutional neural network to detect flaws in phased-array ultrasonic data. This brings the repeatability and effectiveness of automated systems to complex ultrasonic signal evaluation, which was previously done exclusively by human inspectors. The major breakthrough was the use of virtual flaws to generate ample flaw data for training the algorithm. This made it possible to use raw ultrasonic scan data for detection and to leverage some of the approaches used in machine learning for image recognition. Unlike in traditional image recognition, training data for ultrasonic inspection are scarce. While virtual flaws allow us to broaden the data considerably, original flaws with a proper flaw-size distribution are still required. The same holds for training human inspectors, which is usually done with easily manufactured flaws such as side-drilled holes and EDM notches. Although the difference between these artificial flaws and real flaws is obvious, human inspectors still manage to train with them and perform well in real inspection scenarios. In the present work, we use a modern deep convolutional neural network to detect flaws in phased-array ultrasonic data and compare the results obtained with training data from various artificial flaws. The model demonstrated good generalization to flaw sizes larger than those in the original training data, and the minimum flaw size in the data set affects the $a_{90/95}$ value. This work also demonstrates how different artificial flaws (solidification cracks, EDM notches, and simple simulated flaws) generalize differently.
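The $a_{90/95}$ value is read off a probability-of-detection (POD) curve: it is the flaw size detected with 90% probability at 95% confidence. The Python sketch below assumes the common log-logistic hit/miss POD model with hypothetical coefficients (the study's exact POD procedure is not stated here) and computes only the point estimate $a_{90}$; obtaining $a_{90/95}$ additionally requires a 95% confidence bound on the fitted curve, as described in MIL-HDBK-1823A, which is omitted.

```python
import math


def pod(a, b0, b1):
    """Hit/miss POD model: logistic in log flaw size, POD(a) = 1 / (1 + exp(-(b0 + b1 ln a)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * math.log(a))))


def flaw_size_at_pod(p, b0, b1):
    """Invert the POD model: flaw size a_p at which the fitted POD equals p."""
    logit = math.log(p / (1.0 - p))
    return math.exp((logit - b0) / b1)


# Hypothetical fitted coefficients (flaw size in mm); not values from the study.
b0, b1 = -2.0, 4.0
a50 = flaw_size_at_pod(0.5, b0, b1)  # size detected half the time
a90 = flaw_size_at_pod(0.9, b0, b1)  # point estimate of a_90 (no confidence bound)
```

In practice $b_0$ and $b_1$ would be fitted by maximum likelihood to hit/miss outcomes from the model's detections on flaws of known size.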


2020 ◽  
Vol 14 (2) ◽  
pp. 27-44
Author(s):  
Benjamin M. Abdel-Karim

Mandelbrot's work develops a basic understanding of fractals, and the artwork of Jackson Pollock is used to reveal the beauty of fractal geometry. This pattern of recurring structures is also reflected in share prices; Mandelbrot himself speaks of the fractal heart of the financial markets. Previous research has shown the potential of image recognition. This paper explores using the structure-recognition capability of modern machine learning methods to make forecasts based on fractal price information. We generate training data from real and simulated data and render these data as images to train a specialized artificial neural network. Real data are then presented to the network for prediction. The results show that forecasting time series from stock-price images delivers promising results compared with a benchmark. This paper makes two essential contributions to research. From a theoretical point of view, it shows that fractal geometry can serve as a justification for technical analysis. From a practical point of view, it shows that highly developed machine learning methods can recognize patterns in data given an appropriate data transformation, and that models such as the random walk have informational content that can be used to train machine learning models.
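The pipeline turns price series, both real and simulated, into images before training. The Python sketch below shows one simple version of that transformation, assuming a Gaussian random walk for the simulated data and a one-pixel-per-time-step line rendering; the paper's actual image encoding and network are not specified here, and the function names are hypothetical.

```python
import random


def random_walk(n_steps, start=100.0, step_sd=1.0, seed=None):
    """Simulated price path: cumulative sum of Gaussian increments."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n_steps - 1):
        prices.append(prices[-1] + rng.gauss(0.0, step_sd))
    return prices


def series_to_image(prices, height=16):
    """Rasterize a price series into a height x len(prices) binary grid.

    Each column is one time step; the single set pixel marks the price,
    min-max scaled so the lowest price lands in the bottom row.
    """
    lo, hi = min(prices), max(prices)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat series
    grid = [[0] * len(prices) for _ in range(height)]
    for t, p in enumerate(prices):
        level = round((p - lo) / span * (height - 1))
        grid[height - 1 - level][t] = 1  # row 0 is the top of the image
    return grid


img = series_to_image(random_walk(32, seed=7))  # one 16x32 training image
```

Min-max scaling each window makes the image depend on the shape of the price path rather than its absolute level, which matches the idea of recognizing recurring structures.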


2018 ◽  
Author(s):  
Carsten Schwemmer ◽  
Carly Knight ◽  
Emily Bello-Pardo ◽  
Stan Oklobdzija ◽  
Martijn Schoonvelde ◽  
...  



Author(s):  
Barlian Khasoggi ◽  
Ermatita Ermatita ◽  
Samsuryadi Samsuryadi

Modern image recognition models have millions of parameters and require large amounts of training data as well as high, energy-hungry computing power, which makes them inefficient for everyday use. Machine learning has changed the computing paradigm, from complex calculations that require high computational power toward environmentally friendly technologies that can efficiently meet daily needs. To obtain the best trained model, many studies use large datasets. However, large datasets require large devices and high computing power, and such large computational resources lack the flexibility demanded by everyday human interaction, which prioritizes efficient and effective computer vision. This study uses Convolutional Neural Networks (CNN) with the MobileNet architecture for image recognition on resource-limited mobile and embedded devices with ARM-based CPUs, working with a moderate amount of training data (thousands of labeled images). As a result, the MobileNet v1 architecture on the ms8pro device classifies the Caltech101 dataset with an accuracy of 92.4% at a power draw of 2.1 W. Given this level of accuracy and resource efficiency, MobileNet's architecture is expected to shift the machine learning paradigm toward the flexibility that everyday human interaction demands, prioritizing efficient and effective computer vision.
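MobileNet's efficiency comes from replacing standard convolutions with depthwise separable ones: a per-channel depthwise convolution followed by a 1×1 pointwise convolution. The Python sketch below compares the multiply-add and weight counts of the two for a single layer; the layer shape is an illustrative assumption, and the counts ignore biases, batch normalization, and activations.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds of a k x k convolution, m input and n output channels, f x f output map."""
    return k * k * m * n * f * f


def depthwise_separable_cost(k, m, n, f):
    """Depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * m * f * f
    pointwise = m * n * f * f
    return depthwise + pointwise


def params(k, m, n):
    """Weight counts (biases ignored) for the standard and the separable layer."""
    return k * k * m * n, k * k * m + m * n


# Example layer shape (assumed for illustration): 3x3 kernels, 32 -> 64 channels, 112x112 output.
k, m, n, f = 3, 32, 64, 112
ratio = depthwise_separable_cost(k, m, n, f) / standard_conv_cost(k, m, n, f)
# The ratio equals 1/n + 1/k**2, i.e. roughly an 8x reduction for 3x3 kernels here.
```

Because the ratio is $1/N + 1/D_K^2$, the savings are dominated by the kernel size: with 3×3 kernels a separable layer costs roughly 8 to 9 times less than a standard one, whatever the channel counts.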


2019 ◽  
Author(s):  
Andrew Medford ◽  
Shengchun Yang ◽  
Fuzhu Liu

Understanding the interaction of multiple types of adsorbate molecules on solid surfaces is crucial to establishing the stability of catalysts under various chemical environments. Computational studies of high and mixed coverages of reaction intermediates are still challenging, especially for transition-metal compounds. In this work, we present a framework to predict differential adsorption energies and identify low-energy structures under high- and mixed-adsorbate coverages on oxide materials. The approach uses Gaussian process machine-learning models with quantified uncertainty in conjunction with an iterative training algorithm that actively identifies the training set. The framework is demonstrated for the mixed adsorption of CH$_x$, NH$_x$, and OH$_x$ species on the oxygen-vacancy and pristine sites of the rutile TiO$_2$(110) surface. The results indicate that the proposed algorithm is highly efficient at identifying the most valuable training data and can predict differential adsorption energies with a mean absolute error of ~0.3 eV using <25% of the total DFT data. The algorithm also identifies 76% of the low-energy structures using <30% of the total DFT data, enabling the construction of surface phase diagrams that account for high and mixed coverage as a function of the chemical potentials of C, H, O, and N. Furthermore, the algorithm's computational cost scales nearly linearly (N$^{1.12}$) with the number of adsorbates. This framework can be directly extended to metals, metal oxides, and other materials, providing a practical route toward investigating the behavior of catalysts under high-coverage conditions.
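An uncertainty-driven training loop of this kind can be sketched as Gaussian process (GP) regression plus an acquisition rule that picks the next structure to compute. This pure-Python version assumes scalar inputs, a squared-exponential kernel with fixed hyperparameters, and plain uncertainty sampling (label the candidate with the largest predictive variance); the paper's actual descriptors, kernel, and selection criterion may differ, and `math.sin` merely stands in for an expensive DFT calculation.

```python
import math


def rbf(x1, x2, length=1.0, var=1.0):
    """Squared-exponential kernel on scalar inputs."""
    return var * math.exp(-((x1 - x2) ** 2) / (2.0 * length ** 2))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_predict(xs, ys, x_star, noise=1e-6):
    """GP posterior mean and variance at x_star given training data (xs, ys)."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(x_star, a) for a in xs]
    alpha = solve(k, ys)
    v = solve(k, k_star)
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, max(var, 0.0)


def active_learn(candidates, oracle, n_init=2, n_queries=3):
    """Uncertainty sampling: repeatedly label the candidate with the largest predictive variance."""
    labeled = list(candidates[:n_init])
    ys = [oracle(x) for x in labeled]
    pool = list(candidates[n_init:])
    for _ in range(n_queries):
        best = max(pool, key=lambda x: gp_predict(labeled, ys, x)[1])
        pool.remove(best)
        labeled.append(best)
        ys.append(oracle(best))  # stand-in for an expensive DFT calculation
    return labeled, ys


# Hypothetical candidate pool and oracle, for illustration only.
candidates = [0.5 * i for i in range(11)]
labeled, ys = active_learn(candidates, math.sin)
```

Each query goes where the model is least certain, so the labeled set spreads out across the candidate space instead of clustering; this is the sense in which only a fraction of all DFT calculations needs to be run.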


2018 ◽  
Vol 6 (2) ◽  
pp. 283-286
Author(s):  
M. Samba Siva Rao ◽
M. Yaswanth ◽
K. Raghavendra Swamy ◽
...  

Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that acts as a bridge between business and data science. With data science, the business goal is to extract valuable insights from available data. Bollywood, a multi-million-dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop using machine learning techniques (classification and prediction). Building a classifier or prediction model starts with a learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage form a model that can predict future trends in different types of organizations. Methods: Classification and prediction techniques, namely Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN, will be applied to find efficient and effective results. All of these functionalities are available through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best model for predicting movie success. Using advertisement propaganda, producers can then plan the best time to release a movie according to its predicted success rate to gain higher profits.
Discussion: Data mining is the process of discovering patterns in large data sets; the relationships it uncovers help solve business problems and predict upcoming trends. Such predictions can help production houses plan their advertisement propaganda and their costs, and by accounting for these factors they can make a movie more profitable.
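The comparative analysis ranks models by accuracy and inspects their confusion matrices. The Python sketch below computes both for the five success classes using hypothetical toy labels; in practice each candidate model (SVM, Random Forest, and so on) would be scored this way on the same held-out test set.

```python
CLASSES = ["Blockbuster", "Superhit", "Hit", "Average", "Flop"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of predictions on the diagonal (correct class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels for illustration only; not results from the paper.
y_true = ["Hit", "Flop", "Average", "Hit", "Blockbuster", "Flop"]
y_pred = ["Hit", "Flop", "Hit", "Average", "Blockbuster", "Flop"]
cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
```

The confusion matrix adds information that accuracy hides: here the two errors are confusions between neighboring classes (Hit and Average), which may matter less to a producer than mistaking a Flop for a Blockbuster. Accuracy alone can also mislead when some classes, such as Blockbuster, are rare.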


2019 ◽  
Vol 11 (3) ◽  
pp. 284 ◽  
Author(s):  
Linglin Zeng ◽  
Shun Hu ◽  
Daxiang Xiang ◽  
Xiang Zhang ◽  
Deren Li ◽  
...  

Soil moisture mapping at a regional scale is commonplace, since these data are required in many applications, such as hydrological and agricultural analyses. The use of remotely sensed data to estimate deep soil moisture at a regional scale has received far less emphasis. The objective of this study was to map 500-m, 8-day average and daily soil moisture at different soil depths in Oklahoma from remotely sensed and ground-measured data using the random forest (RF) method, a machine-learning approach. To investigate the estimation accuracy of the RF method at both spatial and temporal scales, two independent soil moisture estimation experiments were conducted using data from 2010 to 2014: a year-to-year experiment (root mean square error (RMSE) ranging from 0.038 to 0.050 m³/m³) and a station-to-station experiment (RMSE ranging from 0.044 to 0.057 m³/m³). The data requirements, the importance of input factors, and the spatial and temporal variations in estimation accuracy were then discussed based on results obtained with training data selected by iterated random sampling. The highly accurate estimates of both surface and deep soil moisture in the study area reveal the potential of RF methods for mapping soil moisture at a regional scale, especially given the high heterogeneity of land-cover types and topography in the study area.
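The two experiments differ mainly in what is held out: whole years or whole stations. The Python sketch below shows the RMSE metric and a leave-one-group-out split, which is one plausible reading of those designs; it assumes each record is a dict with hypothetical "year" and "station" fields. An RF model (for example scikit-learn's `RandomForestRegressor`) would be fit on each training split and scored on the held-out group.

```python
import math
from collections import defaultdict


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (here m^3/m^3)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def leave_one_group_out(records, key):
    """Yield (held_out_group, train, test) splits, holding out one group at a time.

    With key="year" this mimics a year-to-year experiment; with key="station",
    a station-to-station experiment.
    """
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    for g in sorted(groups):
        test = groups[g]
        train = [r for r in records if r[key] != g]
        yield g, train, test


# Hypothetical records; real ones would also carry predictors and measured soil moisture.
records = [
    {"year": 2010, "station": "A"},
    {"year": 2011, "station": "A"},
    {"year": 2010, "station": "B"},
]
```

Holding out entire groups prevents the model from being scored on data from a year or station it has already seen, so the RMSE reflects transfer to new times or new locations.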

