scholarly journals A Human-AI Loop Approach for Joint Keyword Discovery and Expectation Estimation in Micropost Event Detection

2020 ◽  
Vol 34 (03) ◽  
pp. 2451-2458
Author(s):  
Akansha Bhardwaj ◽  
Jie Yang ◽  
Philippe Cudré-Mauroux

Microblogging platforms such as Twitter are increasingly being used in event detection. Existing approaches mainly use machine learning models and rely on event-related keywords to collect the data for model training. These approaches make strong assumptions on the distribution of the relevant microposts containing the keyword – referred to as the expectation of the distribution – and use it as a posterior regularization parameter during model training. Such approaches are, however, limited as they fail to reliably estimate the informativeness of a keyword and its expectation for model training. This paper introduces a Human-AI loop approach to jointly discover informative keywords for model training while estimating their expectation. Our approach iteratively leverages the crowd to estimate both keyword-specific expectation and the disagreement between the crowd and the model in order to discover new keywords that are most beneficial for model training. These keywords and their expectation not only improve the resulting performance but also make the model training process more transparent. We empirically demonstrate the merits of our approach, both in terms of accuracy and interpretability, on multiple real-world datasets and show that our approach improves the state of the art by 24.3%.

Author(s):  
Yongkai Wu ◽  
Lu Zhang ◽  
Xintao Wu

Fairness-aware learning studies the problem of building machine learning models that are subject to fairness requirements. Counterfactual fairness is a notion of fairness derived from Pearl's causal model, which considers a model is fair if for a particular individual or group its prediction in the real world is the same as that in the counterfactual world where the individual(s) had belonged to a different demographic group. However, an inherent limitation of counterfactual fairness is that it cannot be uniquely quantified from the observational data in certain situations, due to the unidentifiability of the counterfactual quantity. In this paper, we address this limitation by mathematically bounding the unidentifiable counterfactual quantity, and develop a theoretically sound algorithm for constructing counterfactually fair classifiers. We evaluate our method in the experiments using both synthetic and real-world datasets, as well as compare with existing methods. The results validate our theory and show the effectiveness of our method.


2022 ◽  
Vol 54 (9) ◽  
pp. 1-36
Author(s):  
Dylan Chou ◽  
Meng Jiang

Data-driven network intrusion detection (NID) has a tendency towards minority attack classes compared to normal traffic. Many datasets are collected in simulated environments rather than real-world networks. These challenges undermine the performance of intrusion detection machine learning models by fitting machine learning models to unrepresentative “sandbox” datasets. This survey presents a taxonomy with eight main challenges and explores common datasets from 1999 to 2020. Trends are analyzed on the challenges in the past decade and future directions are proposed on expanding NID into cloud-based environments, devising scalable models for large network data, and creating labeled datasets collected in real-world networks.


2021 ◽  
Vol 6 ◽  
pp. 309
Author(s):  
Paul Mwaniki ◽  
Timothy Kamanu ◽  
Samuel Akech ◽  
M. J. C Eijkemans

Introduction: Epidemiological studies that involve interpretation of chest radiographs (CXRs) suffer from inter-reader and intra-reader variability. Inter-reader and intra-reader variability hinder comparison of results from different studies or centres, which negatively affects efforts to track the burden of chest diseases or evaluate the efficacy of interventions such as vaccines. This study explores machine learning models that could standardize interpretation of CXR across studies and the utility of incorporating individual reader annotations when training models using CXR data sets annotated by multiple readers. Methods: Convolutional neural networks were used to classify CXRs from seven low to middle-income countries into five categories according to the World Health Organization's standardized methodology for interpreting paediatric CXRs. We compared models trained to predict the final/aggregate classification with models trained to predict how each reader would classify an image and then aggregate predictions for all readers using unweighted mean. Results: Incorporating individual reader's annotations during model training improved classification accuracy by 3.4% (multi-class accuracy 61% vs 59%). Model accuracy was higher for children above 12 months of age (68% vs 58%). The accuracy of the models in different countries ranged between 45% and 71%. Conclusions: Machine learning models can annotate CXRs in epidemiological studies reducing inter-reader and intra-reader variability. In addition, incorporating individual reader annotations can improve the performance of machine learning models trained using CXRs annotated by multiple readers.


2021 ◽  
Author(s):  
Chih-Kuan Yeh ◽  
Been Kim ◽  
Pradeep Ravikumar

Understanding complex machine learning models such as deep neural networks with explanations is crucial in various applications. Many explanations stem from the model perspective, and may not necessarily effectively communicate why the model is making its predictions at the right level of abstraction. For example, providing importance weights to individual pixels in an image can only express which parts of that particular image is important to the model, but humans may prefer an explanation which explains the prediction by concept-based thinking. In this work, we review the emerging area of concept based explanations. We start by introducing concept explanations including the class of Concept Activation Vectors (CAV) which characterize concepts using vectors in appropriate spaces of neural activations, and discuss different properties of useful concepts, and approaches to measure the usefulness of concept vectors. We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats. Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.


2021 ◽  
pp. 164-184
Author(s):  
Saiph Savage ◽  
Carlos Toxtli ◽  
Eber Betanzos-Torres

The artificial intelligence (AI) industry has created new jobs that are essential to the real world deployment of intelligent systems. Part of the job focuses on labelling data for machine learning models or having workers complete tasks that AI alone cannot do. These workers are usually known as ‘crowd workers’—they are part of a large distributed crowd that is jointly (but separately) working on the tasks although they are often invisible to end-users, leading to workers often being paid below minimum wage and having limited career growth. In this chapter, we draw upon the field of human–computer interaction to provide research methods for studying and empowering crowd workers. We present our Computational Worker Leagues which enable workers to work towards their desired professional goals and also supply quantitative information about crowdsourcing markets. This chapter demonstrates the benefits of this approach and highlights important factors to consider when researching the experiences of crowd workers.


2021 ◽  
Author(s):  
Bruno Barbosa Miranda de Paiva ◽  
Polianna Delfino Pereira ◽  
Claudio Moises Valiense de Andrade ◽  
Virginia Mara Reis Gomes ◽  
Maria Clara Pontello Barbosa Lima ◽  
...  

Objective: To provide a thorough comparative study among state ofthe art machine learning methods and statistical methods for determining in-hospital mortality in COVID 19 patients using data upon hospital admission; to study the reliability of the predictions of the most effective methods by correlating the probability of the outcome and the accuracy of the methods; to investigate how explainable are the predictions produced by the most effective methods. Materials and Methods: De-identified data were obtained from COVID 19 positive patients in 36 participating hospitals, from March 1 to September 30, 2020. Demographic, comorbidity, clinical presentation and laboratory data were used as training data to develop COVID 19 mortality prediction models. Multiple machine learning and traditional statistics models were trained on this prediction task using a folded cross validation procedure, from which we assessed performance and interpretability metrics. Results: The Stacking of machine learning models improved over the previous state of the art results by more than 26% in predicting the class of interest (death), achieving 87.1% of AUROC and macroF1 of 73.9%. We also show that some machine learning models can be very interpretable and reliable, yielding more accurate predictions while providing a good explanation for the why. Conclusion: The best results were obtained using the meta learning ensemble model Stacking. State of the art explainability techniques such as SHAP values can be used to draw useful insights into the patterns learned by machine-learning algorithms. Machine learning models can be more explainable than traditional statistics models while also yielding highly reliable predictions. Key words: COVID-19; prognosis; prediction model; machine learning


2020 ◽  
Vol 39 (3) ◽  
pp. 713-756 ◽  
Author(s):  
A. Chatzimparmpas ◽  
R. M. Martins ◽  
I. Jusufi ◽  
K. Kucher ◽  
F. Rossi ◽  
...  

Author(s):  
Xueru Zhang ◽  
Mohammad Mahdi Khalili ◽  
Mingyan Liu

Machine learning models developed from real-world data can inherit potential, preexisting bias in the dataset. When these models are used to inform decisions involving human beings, fairness concerns inevitably arise. Imposing certain fairness constraints in the training of models can be effective only if appropriate criteria are applied. However, a fairness criterion can be defined/assessed only when the interaction between the decisions and the underlying population is well understood. We introduce two feedback models describing how people react when receiving machine-aided decisions and illustrate that some commonly used fairness criteria can end with undesirable consequences while reinforcing discrimination.


Sign in / Sign up

Export Citation Format

Share Document