Fair Engineering of Machine Learning Systems – Lessons Learned from a Literature Review

Federated learning is an emerging machine learning paradigm where clients train models locally and formulate a global model based on the local model updates. To identify the state-of-the-art in federated learning and explore how to develop federated learning systems, we perform a systematic literature review from a software engineering perspective, based on 231 primary studies. Our data synthesis covers the lifecycle of federated learning system development that includes background understanding, requirement analysis, architecture design, implementation, and evaluation. We highlight and summarise the findings from the results and identify future trends to encourage researchers to advance their current work.

Platform for Analysing and Encouraging Student Activity on Contest and E-learning Systems

OLYMPIADS IN INFORMATICS ◽

10.15388/ioi.2018.07 ◽

2018 ◽

Vol 12 ◽

pp. 85-98

Author(s):

Bojan Kostadinov ◽

Mile Jovanov ◽

Emil Stankov

Keyword(s):

Machine Learning ◽

Data Collection ◽

Educational Policy ◽

Learning Systems ◽

Data Sources ◽

Or Education ◽

Student Activity ◽

The World ◽

E Learning ◽

Analyse Data

Data collection and machine learning are changing the world. Whether it is medicine, sports or education, companies and institutions are investing a lot of time and money in systems that gather, process and analyse data. Likewise, to improve competitiveness, many countries are changing their educational policy to support STEM disciplines. Therefore, it’s important to put effort into using various data sources to help students succeed in STEM. In this paper, we present a platform that can analyse students’ activity on various contest and e-learning systems, combine and process the data, and then present it in various ways that are easy to understand. This in turn enables teachers and organizers to recognize talented and hardworking students, identify issues, and motivate students to practice and work on areas where they’re weaker.

A Systematic and Comprehensive Literature Review on the Application of Machine Learning in Software Estimation

SSRN Electronic Journal ◽

10.2139/ssrn.3447006 ◽

2019 ◽

Author(s):

Pooja Jayaprakash ◽

Pradeep Kumar Kalampukatt

Keyword(s):

Machine Learning ◽

Literature Review ◽

Comprehensive Literature Review ◽

Software Estimation

Paper2Wire – A Case Study of User-Centred Development of Machine Learning Tools for UX Designers

i-com ◽

10.1515/icom-2021-0002 ◽

2021 ◽

Vol 20 (1) ◽

pp. 19-32

Author(s):

Daniel Buschek ◽

Charlotte Anlauff ◽

Florian Lachner

Keyword(s):

Machine Learning ◽

Development Process ◽

User Study ◽

Concept Development ◽

Lessons Learned ◽

Design Tool ◽

Learning Tools ◽

Interface Elements ◽

Industry Partner

This paper reflects on a case study of a user-centred concept development process for a Machine Learning (ML) based design tool, conducted at an industry partner. The resulting concept uses ML to match graphical user interface elements in sketches on paper to their digital counterparts to create consistent wireframes. A user study (N=20) with a working prototype shows that designers prefer this concept to the previous manual procedure. Reflecting on our process and findings, we discuss lessons learned for developing ML tools that respect practitioners’ needs and practices.

The graph neural networking challenge

ACM SIGCOMM Computer Communication Review ◽

10.1145/3477482.3477485 ◽

2021 ◽

Vol 51 (3) ◽

pp. 9-16

Author(s):

José Suárez-Varela ◽

Miquel Ferriol-Galmés ◽

Albert López ◽

Paul Almasan ◽

Guillermo Bernárdez ◽

...

Keyword(s):

Machine Learning ◽

Computer Networks ◽

Real World ◽

Large Scale ◽

Lessons Learned ◽

Educational Resources ◽

Global Competition ◽

International Telecommunication Union ◽

International Telecommunication ◽

Broad Audience

During the last decade, Machine Learning (ML) has increasingly become a hot topic in the field of Computer Networks and is expected to be gradually adopted for a plethora of control, monitoring and management tasks in real-world deployments. This creates a need for new generations of students, researchers and practitioners with a solid background in ML applied to networks. During 2020, the International Telecommunication Union (ITU) organized the "ITU AI/ML in 5G challenge", an open global competition that introduced a broad audience to some of the current main challenges in ML for networks. This large-scale initiative gathered 23 different challenges proposed by network operators, equipment manufacturers and academia, and attracted a total of 1300+ participants from 60+ countries. This paper narrates our experience organizing one of the proposed challenges: the "Graph Neural Networking Challenge 2020". We describe the problem presented to participants, the tools and resources provided, some organization aspects and participation statistics, an outline of the top-3 awarded solutions, and a summary of some lessons learned along the way. As a result, this challenge leaves a curated set of educational resources openly available to anyone interested in the topic.

Federated Learning in a Medical Context: A Systematic Literature Review

ACM Transactions on Internet Technology ◽

10.1145/3412357 ◽

2021 ◽

Vol 21 (2) ◽

pp. 1-31

Author(s):

Bjarne Pfitzner ◽

Nico Steckhan ◽

Bert Arnrich

Keyword(s):

Machine Learning ◽

Literature Review ◽

Systematic Literature Review ◽

Data Privacy ◽

Research Area ◽

Learning Models ◽

Related Data ◽

Private Data ◽

Large Databases ◽

Machine Learning Models

Data privacy is a very important issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients’ anonymity. However, data is required for research and training machine learning models that could help gain insight into complex correlations or personalised treatments that may otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. So it would be beneficial to be able to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution for this, because it relies on the sharing of machine learning models, instead of the raw data itself. That means private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of those methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability for confidential healthcare datasets.
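
The idea of sharing models instead of raw data can be illustrated with a minimal sketch of weighted model averaging. This is not the method of any study covered by the review; the names `local_update`, `federated_average` and `train`, the one-dimensional linear model, and the learning rate are all assumptions chosen for illustration. Each site fits the model on its own private data, and only the resulting parameters are sent to the server.

```python
from typing import List, Tuple

# One site's private data: (x, y) pairs that never leave the site.
Dataset = List[Tuple[float, float]]
Params = Tuple[float, float]  # (weight, bias) of a 1-D linear model


def local_update(w: float, b: float, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Params:
    """Run a few epochs of gradient descent on one site's own data."""
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error on the local data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Params], sizes: List[int]) -> Params:
    """Combine site models, weighting each by its number of samples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return w, b


def train(sites: List[Dataset], rounds: int = 200) -> Params:
    """Server loop: broadcast the global model, collect and average updates."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        # Only model parameters cross the site boundary, never the data.
        updates = [local_update(w, b, data) for data in sites]
        w, b = federated_average(updates, [len(d) for d in sites])
    return w, b
```

For example, calling `train` with two hypothetical sites whose points both lie on the line y = 2x + 1 should recover parameters close to that line, even though neither site sees the other's data. Real deployments add concerns that this sketch omits, such as secure aggregation and communication cost.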

Artificial Intelligence and Behavioral Science Through the Looking Glass: Challenges for Real-World Application

Annals of Behavioral Medicine ◽

10.1093/abm/kaaa095 ◽

2020 ◽

Vol 54 (12) ◽

pp. 942-947

Author(s):

Pol Mac Aonghusa ◽

Susan Michie

Keyword(s):

Climate Change ◽

Artificial Intelligence ◽

Machine Learning ◽

Behavior Change ◽

Behavioral Science ◽

Lessons Learned ◽

Learning Approaches ◽

Intervention Evaluation ◽

Research Activities ◽

Behavior Change Interventions

Background: Artificial Intelligence (AI) is transforming the process of scientific research. AI, coupled with availability of large datasets and increasing computational power, is accelerating progress in areas such as genetics, climate change and astronomy [NeurIPS 2019 Workshop Tackling Climate Change with Machine Learning, Vancouver, Canada; Hausen R, Robertson BE. Morpheus: A deep learning framework for the pixel-level analysis of astronomical image data. Astrophys J Suppl Ser. 2020;248:20; Dias R, Torkamani A. AI in clinical and genomic diagnostics. Genome Med. 2019;11:70.]. The application of AI in behavioral science is still in its infancy and realizing the promise of AI requires adapting current practices. Purposes: By using AI to synthesize and interpret behavior change intervention evaluation report findings at a scale beyond human capability, the Human Behaviour-Change Project (HBCP) seeks to improve the efficiency and effectiveness of research activities. We explore challenges facing AI adoption in behavioral science through the lens of lessons learned during the HBCP. Methods: The project used an iterative cycle of development and testing of AI algorithms. Using a corpus of published research reports of randomized controlled trials of behavioral interventions, behavioral science experts annotated occurrences of interventions and outcomes. AI algorithms were trained to recognize natural language patterns associated with interventions and outcomes from the expert human annotations. Once trained, the AI algorithms were used to predict outcomes for interventions that were checked by behavioral scientists. Results: Intervention reports contain many items of information needing to be extracted, and these are expressed in hugely variable and idiosyncratic language, which makes developing algorithms to extract all the information with near-perfect accuracy impractical. However, statistical matching algorithms combined with advanced machine learning approaches created reasonably accurate outcome predictions from incomplete data. Conclusions: AI holds promise for achieving the goal of predicting outcomes of behavior change interventions, based on information that is automatically extracted from intervention evaluation reports. This information can be used to train knowledge systems using machine learning and reasoning algorithms.

Data Mining-based Financial Statement Fraud Detection: Systematic Literature Review and Meta-analysis to Estimate Data Sample Mapping of Fraudulent Companies Against Non-fraudulent Companies

Global Business Review ◽

10.1177/0972150920984857 ◽

2021 ◽

pp. 097215092098485

Author(s):

Sonika Gupta ◽

Sushil Kumar Mehta

Keyword(s):

Machine Learning ◽

Data Mining ◽

Literature Review ◽

Systematic Literature Review ◽

Classification Accuracy ◽

Meta Analysis ◽

Financial Statement ◽

Research Articles ◽

Financial Statement Fraud ◽

Data Mining Techniques

Data mining techniques have proven quite effective not only in detecting financial statement frauds but also in discovering other financial crimes, such as credit card frauds, loan and security frauds, corporate frauds, bank and insurance frauds, etc. Classification of data mining techniques, in recent years, has been accepted as one of the most credible methodologies for the detection of symptoms of financial statement frauds through scanning the published financial statements of companies. The retrieved literature that has used data mining classification techniques can be broadly categorized on the basis of the type of technique applied, as statistical techniques and machine learning techniques. The biggest challenge in executing the classification process using data mining techniques lies in collecting the data sample of fraudulent companies and mapping the sample of fraudulent companies against non-fraudulent companies. In this article, a systematic literature review (SLR) of studies from the area of financial statement fraud detection has been conducted. The review has considered research articles published between 1995 and 2020. Further, a meta-analysis has been performed to establish the effect of data sample mapping of fraudulent companies against non-fraudulent companies on the classification methods through comparing the overall classification accuracy reported in the literature. The retrieved literature indicates that a fraudulent sample can either be equally paired with non-fraudulent sample (1:1 data mapping) or be unequally mapped using 1:many ratio to increase the sample size proportionally. Based on the meta-analysis of the research articles, it can be concluded that machine learning approaches, in comparison to statistical approaches, can achieve better classification accuracy, particularly when the availability of sample data is low. High classification accuracy can be obtained with even a 1:1 mapping data set using machine learning classification approaches.
