Knowledge-Driven Data Ecosystems Toward Data Transparency

Sandra Geisler; Maria-Esther Vidal; Cinzia Cappiello; Bernadette Farias Lóscio; Avigdor Gal; Matthias Jarke; Maurizio Lenzerini; Paolo Missier; Boris Otto; Elda Paja; Barbara Pernici; Jakob Rehof

doi:10.1145/3467022

Knowledge-Driven Data Ecosystems Toward Data Transparency

Journal of Data and Information Quality ◽

10.1145/3467022 ◽

2022 ◽

Vol 14 (1) ◽

pp. 1-12

Author(s):

Sandra Geisler ◽

Maria-Esther Vidal ◽

Cinzia Cappiello ◽

Bernadette Farias Lóscio ◽

Avigdor Gal ◽

...

Keyword(s):

Real World ◽

Organizational Management ◽

Data Driven ◽

Data Governance ◽

Ethical Considerations ◽

Shared Data ◽

Data Transparency ◽

Data Ecosystem

A data ecosystem (DE) offers a keystone-player or alliance-driven infrastructure that enables the interaction of different stakeholders and the resolution of interoperability issues among shared data. However, despite years of research in data governance and management, trustability is still affected by the absence of transparent and traceable data-driven pipelines. In this work, we focus on requirements and challenges that DEs face when ensuring data transparency. Requirements are derived from the data and organizational management, as well as from broader legal and ethical considerations. We propose a novel knowledge-driven DE architecture, providing the pillars for satisfying the analyzed requirements. We illustrate the potential of our proposal in a real-world scenario. Last, we discuss and rate the potential of the proposed architecture in the fulfillment of these requirements.


DaLiF: a data lifecycle framework for data-driven governments

Journal Of Big Data ◽

10.1186/s40537-021-00481-3 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Syed Iftikhar Hussain Shah ◽

Vassilios Peristeras ◽

Ioannis Magnisalis

Keyword(s):

Big Data ◽

Data Management ◽

Good Governance ◽

Business Community ◽

Data Driven ◽

Governmental Organizations ◽

Data Lifecycle ◽

The Government ◽

Data Ecosystem ◽

Governance Transparency

The public sector, private firms, business community, and civil society are generating data that is high in volume, veracity, and velocity and that comes from a diversity of sources. This kind of data is known as big data. Public Administrations (PAs) pursue big data as “new oil” and implement data-centric policies to transform data into knowledge, to promote good governance, transparency, innovative digital services, and citizens’ engagement in public policy. From the above, the Government Big Data Ecosystem (GBDE) emerges. Managing big data throughout its lifecycle becomes a challenging task for governmental organizations. Despite the vast interest in this ecosystem, appropriate big data management is still a challenge. This study intends to fill the above-mentioned gap by proposing a data lifecycle framework for data-driven governments. Through a Systematic Literature Review, we identified and analysed 76 data lifecycle models to propose a data lifecycle framework for data-driven governments (DaLiF). In this way, we contribute to the ongoing discussion around big data management, which attracts researchers’ and practitioners’ interest.


Research on a novel data-driven aging estimation method for battery systems in real-world electric vehicles

Advances in Mechanical Engineering ◽

10.1177/16878140211027735 ◽

2021 ◽

Vol 13 (7) ◽

pp. 168781402110277

Author(s):

Yankai Hou ◽

Zhaosheng Zhang ◽

Peng Liu ◽

Chunbao Song ◽

Zhenpo Wang

Keyword(s):

Electric Vehicles ◽

Real World ◽

Regression Models ◽

Estimation Method ◽

Recursive Least Squares ◽

Data Driven ◽

Accurate Estimation ◽

Support Vector ◽

Battery Degradation ◽

Operational Data

Accurate estimation of the degree of battery aging is essential to ensure safe operation of electric vehicles. In this paper, using real-world vehicles and their operational data, a battery aging estimation method is proposed based on a dual-polarization equivalent circuit (DPEC) model and multiple data-driven models. The DPEC model and the forgetting factor recursive least-squares method are used to determine the battery system’s ohmic internal resistance, with outliers being filtered using boxplots. Furthermore, eight common data-driven models are used to describe the relationship between battery degradation and the factors influencing this degradation, and these models are analyzed and compared in terms of both estimation accuracy and computational requirements. The results show that the gradient descent tree regression, XGBoost regression, and light GBM regression models are more accurate than the other methods, with root mean square errors of less than 6.9 mΩ. The AdaBoost and random forest regression models are regarded as alternative groups because of their relative instability. The linear regression, support vector machine regression, and k-nearest neighbor regression models are not recommended because of poor accuracy or excessively high computational requirements. This work can serve as a reference for subsequent battery degradation studies based on real-time operational data.


Data-driven Energy Management Strategy for Plug-in Hybrid Electric Vehicles with Real-World Trip Information

IFAC-PapersOnLine ◽

10.1016/j.ifacol.2020.12.1070 ◽

2020 ◽

Vol 53 (2) ◽

pp. 14224-14229

Author(s):

Yongkeun Choi ◽

Jacopo Guanetti ◽

Scott Moura ◽

Francesco Borrelli

Keyword(s):

Energy Management ◽

Electric Vehicles ◽

Real World ◽

Management Strategy ◽

Hybrid Electric Vehicles ◽

Data Driven ◽

Energy Management Strategy ◽

Hybrid Electric


Data-driven graph drawing techniques with applications for conveyor systems

Journal of Mathematics in Industry ◽

10.1186/s13362-020-00092-2 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Simone Göttlich ◽

Sven Spieckermann ◽

Stephan Stauber ◽

Andrea Storck

Keyword(s):

Real World ◽

Connected Graph ◽

Stress Function ◽

Graph Drawing ◽

Point Of View ◽

Data Driven ◽

Challenging Problem ◽

Conveyor System ◽

System Graph ◽

Real World Problems

The visualization of conveyor systems in the sense of a connected graph is a challenging problem. Starting from communication data provided by the IT system, graph drawing techniques are applied to generate an appealing layout of the conveyor system. From a mathematical point of view, the key idea is to use the concept of stress majorization to minimize a stress function over the positions of the nodes in the graph. In contrast to the existing literature, we have to account for special features arising from real-world problems.


A Survey on Data-driven Network Intrusion Detection

ACM Computing Surveys ◽

10.1145/3472753 ◽

2022 ◽

Vol 54 (9) ◽

pp. 1-36

Author(s):

Dylan Chou ◽

Meng Jiang

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Real World ◽

Data Driven ◽

Network Intrusion Detection ◽

Large Network ◽

Learning Models ◽

Simulated Environments ◽

Network Intrusion ◽

Machine Learning Models

Data-driven network intrusion detection (NID) has a tendency towards minority attack classes compared to normal traffic. Many datasets are collected in simulated environments rather than real-world networks. These challenges undermine the performance of intrusion detection machine learning models by fitting machine learning models to unrepresentative “sandbox” datasets. This survey presents a taxonomy with eight main challenges and explores common datasets from 1999 to 2020. Trends are analyzed on the challenges in the past decade and future directions are proposed on expanding NID into cloud-based environments, devising scalable models for large network data, and creating labeled datasets collected in real-world networks.


Against generalisation: Data-driven decisions need context to be human-compatible

Business Information Review ◽

10.1177/02663821211061986 ◽

2021 ◽

pp. 026638212110619

Author(s):

Sharon Richardson

Keyword(s):

Artificial Intelligence ◽

Real World ◽

Data Science ◽

Machine Learning Algorithms ◽

Data Driven ◽

Theoretical Frameworks ◽

The Past ◽

Real World Applications ◽

Contextual Framework ◽

Data Driven Decisions

During the past two decades, there have been a number of breakthroughs in the fields of data science and artificial intelligence, made possible by advanced machine learning algorithms trained through access to massive volumes of data. However, their adoption and use in real-world applications remains a challenge. This paper posits that a key limitation in making AI applicable has been a failure to modernise the theoretical frameworks needed to evaluate and adopt outcomes. Such a need was anticipated with the arrival of the digital computer in the 1950s but has remained unrealised. This paper reviews how the field of data science emerged and led to rapid breakthroughs in algorithms underpinning research into artificial intelligence. It then discusses the contextual framework now needed to advance the use of AI in real-world decisions that impact human lives and livelihoods.


A data-driven strategy for predicting greenness scores, rationally comparing synthetic routes and benchmarking PMI outcomes for the synthesis of molecules in the pharmaceutical industry

Green Chemistry ◽

10.1039/c6gc02359b ◽

2017 ◽

Vol 19 (1) ◽

pp. 127-139 ◽

Cited By ~ 26

Author(s):

Jun Li ◽

Eric M. Simmons ◽

Martin D. Eastgate

Keyword(s):

Pharmaceutical Industry ◽

Real World ◽

Predictive Analytics ◽

Prior Experience ◽

Data Driven ◽

Synthetic Route ◽

Real World Data ◽

World Data ◽

Synthetic Routes

A predictive analytics approach to understanding process mass intensity (PMI) is described. This method leverages real-world data to predict probable PMI outcomes for a potential synthetic route and to compare PMI outcomes to the summation of prior experience.


Domain Driven Data Mining

Data Mining and Knowledge Discovery Technologies ◽

10.4018/978-1-59904-960-1.ch008 ◽

2008 ◽

pp. 196-223 ◽

Cited By ~ 1

Author(s):

Longbing Cao ◽

Chengqi Zhang

Keyword(s):

Data Mining ◽

Complex Systems ◽

Real World ◽

Domain Knowledge ◽

Pattern Mining ◽

Iterative Refinement ◽

User Preference ◽

Data Driven ◽

Real World Data ◽

Hidden Knowledge

Traditional data mining based on quantitative intelligence is facing grand challenges from real-world enterprise and cross-organization applications. For instance, the usual demonstration of specific algorithms cannot help business users take actions that serve their advantage and needs. We think this is due to a data-driven philosophy focused on quantitative intelligence. It either views data mining as an autonomous, data-driven, trial-and-error process, or only analyzes business issues in an isolated, case-by-case manner. Based on experience and lessons learnt from real-world data mining and complex systems, this article proposes a practical data mining methodology referred to as Domain-Driven Data Mining. On top of quantitative intelligence and hidden knowledge in data, domain-driven data mining aims to meta-synthesize quantitative intelligence and qualitative intelligence in mining complex applications in which a human is in the loop. It targets actionable knowledge discovery in a constrained environment for satisfying user preference. The domain-driven methodology consists of key components including understanding the constrained environment, business-technical questionnaire, representing and involving domain knowledge, human-mining cooperation and interaction, constructing next-generation mining infrastructure, in-depth pattern mining and postprocessing, business interestingness and actionability enhancement, and loop-closed human-cooperated iterative refinement. Domain-driven data mining complements the data-driven methodology; the metasynthesis of qualitative intelligence and quantitative intelligence has the potential to discover knowledge from complex systems and to enhance knowledge actionability for practical use by industry and business.


Correlating Espresso Quality with Coffee-Machine Parameters by Means of Association Rule Mining

Electronics ◽

10.3390/electronics9010100 ◽

2020 ◽

Vol 9 (1) ◽

pp. 100

Author(s):

Daniele Apiletti ◽

Eliana Pastor

Keyword(s):

Real World ◽

Association Rule ◽

Association Rule Mining ◽

Data Driven ◽

Rule Mining ◽

Electronic Noses ◽

Data Driven Approach ◽

The Many ◽

Coffee Machine

Coffee is among the most popular beverages in many cities all over the world, being both at the core of the busiest shops and a long-standing tradition of recreational and social value for many people. Among the many coffee variants, espresso attracts the interest of different stakeholders: from citizens consuming espresso around the city, to local business activities, coffee-machine vendors and international coffee industries. The quality of espresso is one of the most discussed and investigated issues. So far, it has been addressed by means of human experts, electronic noses, and chemical approaches. The current work, instead, proposes a data-driven approach exploiting association rule mining. We analyze a real-world dataset of espresso brewing by professional coffee-making machines, and extract all correlations among external quality-influencing variables and actual metrics determining the quality of the espresso. Thanks to the application of association rule mining, a powerful, exhaustive, and explainable data-driven approach, results are expressed in the form of human-readable rules combining the variables of interest, such as the grinder settings, the extraction time, and the dose amount. Novel insights from real-world coffee extractions collected in the field are presented, together with a data-driven approach able to uncover insights into the espresso quality and its impact on both the life of consumers and the choices of coffee-making industries.


Product-level profitability

Journal of Enterprise Information Management ◽

10.1108/jeim-05-2019-0127 ◽

2019 ◽

Vol 33 (1) ◽

pp. 214-237

Author(s):

Hannu Hannila ◽

Joni Koskinen ◽

Janne Harkonen ◽

Harri Haapasalo

Keyword(s):

Business Processes ◽

Data Driven ◽

Product Portfolio ◽

Data Governance ◽

Content Type ◽

Technical Product ◽

Product Level ◽

Level Data ◽

Corporate Level

Purpose: The purpose of this paper is to analyse current challenges and to articulate the preconditions for data-driven, fact-based product portfolio management (PPM) based on commercial and technical product structures, critical business processes, corporate business IT and company data assets. Here, data assets were classified from a PPM perspective in terms of (product/customer/supplier) master data, transaction data and Internet of Things data. The study also addresses the supporting role of corporate-level data governance.
Design/methodology/approach: The study combines a literature review and qualitative analysis of empirical data collected from eight international companies of varying size.
Findings: Companies’ current inability to analyse products effectively based on existing data is surprising. The present findings identify a number of preconditions for data-driven, fact-based PPM, including mutual understanding of company products (to establish a consistent commercial and technical product structure), product classification as strategic, supportive or non-strategic (to link commercial and technical product structures with product strategy) and a holistic, corporate-level data model for adjusting the company’s business IT (to support product portfolio visualisation).
Practical implications: The findings provide a logical and empirical basis for fact-based, product-level analysis of product profitability and analysis of the product portfolio over the product life cycle, supporting a data-driven approach to the optimisation of commercial and technical product structure, business IT systems and company product strategy. As a virtual representation of reality, the company data model facilitates product visualisation. The findings are of great practical value, as they demonstrate the significance of corporate-level data assets, data governance and business-critical data for managing a company’s products and portfolio.
Originality/value: The study contributes to the existing literature by specifying the preconditions for data-driven, fact-based PPM as a basis for product-level analysis and decision making, emphasising the role of company data assets and clarifying the links between business processes, information systems and data assets for PPM.
