On the relationship between research parasites and fairness in machine learning: challenges and opportunities

Nicolás Nieto; Agostina Larrazabal; Victoria Peterson; Diego H Milone; Enzo Ferrante

doi:10.1093/gigascience/giab086

On the relationship between research parasites and fairness in machine learning: challenges and opportunities

GigaScience ◽

10.1093/gigascience/giab086 ◽

2021 ◽

Vol 10 (12) ◽

Author(s):

Nicolás Nieto ◽

Agostina Larrazabal ◽

Victoria Peterson ◽

Diego H Milone ◽

Enzo Ferrante

Keyword(s):

Machine Learning ◽

Data Model ◽

Learning Systems ◽

Training Data ◽

Model Construction ◽

Daily Lives ◽

The Past ◽

Learning Challenges ◽

Challenges And Opportunities ◽

The Relationship

Abstract Machine learning systems influence our daily lives in many different ways. Hence, it is crucial to ensure that the decisions and recommendations made by these systems are fair, equitable, and free of unintended biases. Over the past few years, the field of fairness in machine learning has grown rapidly, investigating how, when, and why these models capture, and even potentiate, biases that are deeply rooted not only in the training data but also in our society. In this Commentary, we discuss challenges and opportunities for rigorous posterior analyses of publicly available data to build fair and equitable machine learning systems, focusing on the importance of training data, model construction, and diversity in the team of developers. The thoughts presented here have grown out of the work we did, which resulted in our winning the annual Research Parasite Award that GigaSciencesponsors.

Download Full-text

The first 10 years of the international coordination network for standards in systems and synthetic biology (COMBINE)

Journal of Integrative Bioinformatics ◽

10.1515/jib-2020-0005 ◽

2020 ◽

Vol 17 (2-3) ◽

Cited By ~ 1

Author(s):

Dagmar Waltemath ◽

Martin Golebiewski ◽

Michael L Blinov ◽

Padraig Gleeson ◽

Henning Hermjakob ◽

...

Keyword(s):

Synthetic Biology ◽

Data Model ◽

Computational Models ◽

Life Sciences ◽

The Past ◽

Challenges And Opportunities ◽

Stakeholder Workshop ◽

Software Engineers ◽

Future Work ◽

Modeling In Biology

AbstractThis paper presents a report on outcomes of the 10th Computational Modeling in Biology Network (COMBINE) meeting that was held in Heidelberg, Germany, in July of 2019. The annual event brings together researchers, biocurators and software engineers to present recent results and discuss future work in the area of standards for systems and synthetic biology. The COMBINE initiative coordinates the development of various community standards and formats for computational models in the life sciences. Over the past 10 years, COMBINE has brought together standard communities that have further developed and harmonized their standards for better interoperability of models and data. COMBINE 2019 was co-located with a stakeholder workshop of the European EU-STANDS4PM initiative that aims at harmonized data and model standardization for in silico models in the field of personalized medicine, as well as with the FAIRDOM PALs meeting to discuss findable, accessible, interoperable and reusable (FAIR) data sharing. This report briefly describes the work discussed in invited and contributed talks as well as during breakout sessions. It also highlights recent advancements in data, model, and annotation standardization efforts. Finally, this report concludes with some challenges and opportunities that this community will face during the next 10 years.

Download Full-text

Zero-Shot Feature Selection via Transferring Supervised Knowledge

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2021040101 ◽

2021 ◽

Vol 17 (2) ◽

pp. 1-20

Author(s):

Zheng Wang ◽

Qiao Wang ◽

Tingzhang Zhao ◽

Chaokun Wang ◽

Xiaojun Ye

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Dimensionality Reduction ◽

Real World ◽

Rapid Growth ◽

Learning Systems ◽

Training Data ◽

Effective Technique ◽

Supervised Methods ◽

Real World Datasets

Feature selection, an effective technique for dimensionality reduction, plays an important role in many machine learning systems. Supervised knowledge can significantly improve the performance. However, faced with the rapid growth of newly emerging concepts, existing supervised methods might easily suffer from the scarcity and validity of labeled data for training. In this paper, the authors study the problem of zero-shot feature selection (i.e., building a feature selection model that generalizes well to “unseen” concepts with limited training data of “seen” concepts). Specifically, they adopt class-semantic descriptions (i.e., attributes) as supervision for feature selection, so as to utilize the supervised knowledge transferred from the seen concepts. For more reliable discriminative features, they further propose the center-characteristic loss which encourages the selected features to capture the central characteristics of seen concepts. Extensive experiments conducted on various real-world datasets demonstrate the effectiveness of the method.

Download Full-text

Machine Learning Interpretability: A Survey on Methods and Metrics

Electronics ◽

10.3390/electronics8080832 ◽

2019 ◽

Vol 8 (8) ◽

pp. 832 ◽

Cited By ~ 36

Author(s):

Diogo V. Carvalho ◽

Eduardo M. Pereira ◽

Jaime S. Cardoso

Keyword(s):

Machine Learning ◽

Social Impact ◽

Research Field ◽

Learning Systems ◽

Future Directions ◽

The Past ◽

Current State ◽

Black Boxes ◽

Interpretable Models

Machine learning systems are becoming increasingly ubiquitous. These systems’s adoption has been expanding, accelerating the shift towards a more algorithmic society, meaning that algorithmically informed decisions have greater potential for significant social impact. However, most of these accurate decision support systems remain complex black boxes, meaning their internal logic and inner workings are hidden to the user and even experts cannot fully understand the rationale behind their predictions. Moreover, new regulations and highly regulated domains have made the audit and verifiability of decisions mandatory, increasing the demand for the ability to question, understand, and trust machine learning systems, for which interpretability is indispensable. The research community has recognized this interpretability problem and focused on developing both interpretable models and explanation methods over the past few years. However, the emergence of these methods shows there is no consensus on how to assess the explanation quality. Which are the most suitable metrics to assess the quality of an explanation? The aim of this article is to provide a review of the current state of the research field on machine learning interpretability while focusing on the societal impact and on the developed methods and metrics. Furthermore, a complete literature review is presented in order to identify future directions of work on this field.

Download Full-text

How relevant is linguistics to computational linguistics?

Linguistic Issues in Language Technology ◽

10.33011/lilt.v6i.1249 ◽

2011 ◽

Vol 6 ◽

Author(s):

Mark Johnson

Keyword(s):

Machine Learning ◽

Computational Linguistics ◽

Linguistic Theory ◽

Statistical Techniques ◽

Engineering Applications ◽

The Past ◽

Statistical Parsing ◽

Types Of Information ◽

The Relationship ◽

Scientific Fields

I start by explaining what I take computational linguistics to be, and discuss the relationship between its scientific side and its engineering applications. Statistical techniques have revolutionised many scientific fields in the past two decades, including computational linguistics. I describe the evolution of my own research in statistical parsing and how that lead me away from focusing on the details of any specific linguistic theory, and to concentrate instead on discovering which types of information (i.e., features) are important for specific linguistic processes, rather than on the details of exactly how this information should be formalised. I end by describing some of the ways that ideas from computational linguistics, statistics and machine learning may have an impact on linguistics in the future.

Download Full-text

Understanding the Relationship between Interactions and Outcomes in Human-in-the-Loop Machine Learning

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/599 ◽

2021 ◽

Author(s):

Yuchen Cui ◽

Pallavi Koppol ◽

Henny Admoni ◽

Scott Niekum ◽

Reid Simmons ◽

...

Keyword(s):

Machine Learning ◽

Human Performance ◽

Autonomous Agents ◽

Training Data ◽

Open Problems ◽

Human In The Loop ◽

Efficiency And Effectiveness ◽

Interaction Types ◽

The Relationship ◽

And Training

Human-in-the-loop Machine Learning (HIL-ML) is a widely adopted paradigm for instilling human knowledge in autonomous agents. Many design choices influence the efficiency and effectiveness of such interactive learning processes, particularly the interaction type through which the human teacher may provide feedback. While different interaction types (demonstrations, preferences, etc.) have been proposed and evaluated in the HIL-ML literature, there has been little discussion of how these compare or how they should be selected to best address a particular learning problem. In this survey, we propose an organizing principle for HIL-ML that provides a way to analyze the effects of interaction types on human performance and training data. We also identify open problems in understanding the effects of interaction types.

Download Full-text

Privacy Preserving Machine Learning Challenges and Solution Approach for Training Data in ERP Systems

SSRN Electronic Journal ◽

10.2139/ssrn.3679275 ◽

2020 ◽

Author(s):

Mithun Gaur

Keyword(s):

Machine Learning ◽

Privacy Preserving ◽

Training Data ◽

Erp Systems ◽

Solution Approach ◽

Learning Challenges

Download Full-text

Hardware for machine learning: Challenges and opportunities

2017 IEEE Custom Integrated Circuits Conference (CICC) ◽

10.1109/cicc.2017.7993626 ◽

2017 ◽

Cited By ~ 30

Author(s):

Vivienne Sze ◽

Yu-Hsin Chen ◽

Joel Emer ◽

Amr Suleiman ◽

Zhengdong Zhang

Keyword(s):

Machine Learning ◽

Learning Challenges ◽

Challenges And Opportunities

Download Full-text

Identifying and Analyzing Data Model Requirements and Technology Potentials of Machine Learning Systems in the Manufacturing Industry of the Future

2020 61st International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS) ◽

10.1109/itms51158.2020.9259303 ◽

2020 ◽

Author(s):

Gunter Schuh ◽

Paul Scholz ◽

Thomas Leich ◽

Richard May

Keyword(s):

Machine Learning ◽

Data Model ◽

Manufacturing Industry ◽

Learning Systems ◽

The Future

Download Full-text

Data-Centric Explanations: Explaining Training Data of Machine Learning Systems to Promote Transparency

Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems ◽

10.1145/3411764.3445736 ◽

2021 ◽

Author(s):

Ariful Islam Anik ◽

Andrea Bunt

Keyword(s):

Machine Learning ◽

Learning Systems ◽

Training Data

Download Full-text

Rationale Discovery and Explainable AI

10.3233/faia210341 ◽

2021 ◽

Author(s):

Cor Steging ◽

Silja Renooij ◽

Bart Verheij

Keyword(s):

Machine Learning ◽

Feature Detection ◽

State Of The Art ◽

High Accuracy ◽

Learning Systems ◽

Training Data ◽

Relevant Feature ◽

Explainable Ai ◽

The Right ◽

The Impact

The justification of an algorithm’s outcomes is important in many domains, and in particular in the law. However, previous research has shown that machine learning systems can make the right decisions for the wrong reasons: despite high accuracies, not all of the conditions that define the domain of the training data are learned. In this study, we investigate what the system does learn, using state-of-the-art explainable AI techniques. With the use of SHAP and LIME, we are able to show which features impact the decision making process and how the impact changes with different distributions of the training data. However, our results also show that even high accuracy and good relevant feature detection are no guarantee for a sound rationale. Hence these state-of-the-art explainable AI techniques cannot be used to fully expose unsound rationales, further advocating the need for a separate method for rationale evaluation.

Download Full-text