Knowledge-Driven Data Ecosystems Toward Data Transparency

2022
Vol 14 (1)
pp. 1-12
Sandra Geisler
Maria-Esther Vidal
Cinzia Cappiello
Bernadette Farias Lóscio
Avigdor Gal

A data ecosystem (DE) offers a keystone-player or alliance-driven infrastructure that enables the interaction of different stakeholders and the resolution of interoperability issues among shared data. However, despite years of research in data governance and management, trustability is still affected by the absence of transparent and traceable data-driven pipelines. In this work, we focus on requirements and challenges that DEs face when ensuring data transparency. Requirements are derived from data and organizational management, as well as from broader legal and ethical considerations. We propose a novel knowledge-driven DE architecture, providing the pillars for satisfying the analyzed requirements. We illustrate the potential of our proposal in a real-world scenario. Last, we discuss and rate the potential of the proposed architecture in the fulfillment of these requirements.

2021
Vol 8 (1)
Syed Iftikhar Hussain Shah
Vassilios Peristeras
Ioannis Magnisalis

The public sector, private firms, the business community, and civil society are generating data that is high in volume, veracity, and velocity and that comes from a diversity of sources. This kind of data is known as big data. Public Administrations (PAs) pursue big data as the “new oil” and implement data-centric policies to transform data into knowledge, to promote good governance, transparency, innovative digital services, and citizens’ engagement in public policy. From this, the Government Big Data Ecosystem (GBDE) emerges. Managing big data throughout its lifecycle is a challenging task for governmental organizations, and despite the vast interest in this ecosystem, appropriate big data management remains a challenge. This study intends to fill this gap by proposing a data lifecycle framework for data-driven governments. Through a Systematic Literature Review, we identified and analysed 76 data lifecycle models to propose a data lifecycle framework for data-driven governments (DaliF). In this way, we contribute to the ongoing discussion around big data management, which attracts the interest of researchers and practitioners.

2021
Vol 13 (7)
pp. 168781402110277
Yankai Hou
Zhaosheng Zhang
Peng Liu
Chunbao Song
Zhenpo Wang

Accurate estimation of the degree of battery aging is essential to ensure the safe operation of electric vehicles. In this paper, using real-world vehicles and their operational data, a battery aging estimation method is proposed based on a dual-polarization equivalent circuit (DPEC) model and multiple data-driven models. The DPEC model and the forgetting factor recursive least-squares method are used to determine the battery system’s ohmic internal resistance, and outliers are filtered using boxplots. Furthermore, eight common data-driven models are used to describe the relationship between battery degradation and the factors influencing it, and these models are analyzed and compared in terms of both estimation accuracy and computational requirements. The results show that the gradient descent tree regression, XGBoost regression, and LightGBM regression models are more accurate than the other methods, with root mean square errors of less than 6.9 mΩ. The AdaBoost and random forest regression models are regarded as second-tier alternatives because of their relative instability. The linear regression, support vector machine regression, and k-nearest neighbor regression models are not recommended because of poor accuracy or excessively high computational requirements. This work can serve as a reference for subsequent battery degradation studies based on real-time operational data.
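The resistance-identification step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a deliberately simplified model in which the terminal voltage is U = U_oc − R0·I (the two polarization branches of the DPEC model are omitted). It fits θ = [U_oc, R0] with forgetting-factor recursive least squares and then applies the standard 1.5×IQR boxplot rule to the resulting resistance estimates. The function names, the forgetting factor λ = 0.99, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def ffrls(phi, y, lam=0.99, delta=1e3):
    """Forgetting-factor RLS; returns the parameter estimate after each sample.

    phi: (N, n) regressor rows, y: (N,) measurements, lam: forgetting factor in (0, 1].
    """
    n = phi.shape[1]
    theta = np.zeros(n)
    P = delta * np.eye(n)                      # large initial covariance = weak prior
    history = np.empty((len(y), n))
    for k in range(len(y)):
        x = phi[k]
        Px = P @ x
        K = Px / (lam + x @ Px)                # gain vector
        theta = theta + K * (y[k] - x @ theta)  # correct by the prediction error
        P = (P - np.outer(K, Px)) / lam         # covariance update with forgetting
        history[k] = theta
    return history


def boxplot_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey boxplot whiskers)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return values[(values >= q1 - k * iqr) & (values <= q3 + k * iqr)]


# Synthetic cell data (assumed): U_oc = 3.7 V, R0 = 2 mOhm, small voltage noise.
rng = np.random.default_rng(0)
current = rng.uniform(-50.0, 50.0, size=500)                 # A, discharge positive
voltage = 3.7 - 0.002 * current + rng.normal(0.0, 1e-4, size=500)
phi = np.column_stack([np.ones_like(current), -current])     # theta = [U_oc, R0]
estimates = ffrls(phi, voltage)
r0 = boxplot_filter(estimates[50:, 1])                       # drop burn-in, then filter
print(f"R0 ~ {r0.mean() * 1e3:.3f} mOhm")
```

The forgetting factor discounts old samples geometrically, so the estimate can follow slow resistance drift as the battery ages. The boxplot rule then removes occasional spikes caused by poorly excited segments.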

2020
Vol 10 (1)
Simone Göttlich
Sven Spieckermann
Stephan Stauber
Andrea Storck

The visualization of conveyor systems as connected graphs is a challenging problem. Starting from communication data provided by the IT system, graph drawing techniques are applied to generate an appealing layout of the conveyor system. From a mathematical point of view, the key idea is to use stress majorization to minimize a stress function over the positions of the nodes in the graph. Unlike the existing literature, we have to account for special features arising from real-world problems.
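The stress-majorization idea can be illustrated with the classic SMACOF iteration. This is a generic sketch under stated assumptions, not the authors' method. The target distances d_ij are graph-theoretic shortest-path lengths, the weights are w_ij = d_ij^-2, and the graph is assumed to be connected. The conveyor-specific features the abstract mentions are not modelled. Each step applies the Guttman transform X ← L_w^+ B(X) X, which never increases the stress Σ_{i<j} w_ij (‖x_i − x_j‖ − d_ij)². The edge list in the usage line is a hypothetical topology.

```python
from collections import deque

import numpy as np


def shortest_paths(n, edges):
    """All-pairs hop distances by BFS on an undirected, unweighted graph."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    D = np.full((n, n), np.inf)
    for s in range(n):
        D[s, s] = 0.0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if np.isinf(D[s, v]):
                    D[s, v] = D[s, u] + 1.0
                    queue.append(v)
    return D


def stress(X, D, W):
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return float(np.sum(W[iu] * (dist[iu] - D[iu]) ** 2))


def stress_majorization(n, edges, iters=100, seed=0):
    D = shortest_paths(n, edges)              # assumes the graph is connected
    W = np.zeros_like(D)
    off = D > 0
    W[off] = D[off] ** -2.0                   # w_ij = d_ij^-2
    Lw = -W.copy()
    np.fill_diagonal(Lw, W.sum(axis=1))       # weighted graph Laplacian
    Lw_pinv = np.linalg.pinv(Lw)              # L_w is singular, so use the pseudo-inverse
    X = np.random.default_rng(seed).normal(size=(n, 2))
    history = [stress(X, D, W)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        B = np.zeros_like(D)
        nz = off & (dist > 1e-12)
        B[nz] = -W[nz] * D[nz] / dist[nz]
        np.fill_diagonal(B, -B.sum(axis=1))   # rows of B(X) sum to zero
        X = Lw_pinv @ B @ X                   # Guttman transform
        history.append(stress(X, D, W))
    return X, history


# Usage: a small loop with one spur (hypothetical conveyor topology).
layout, hist = stress_majorization(5, [(0, 1), (1, 2), (2, 3), (3, 0), (2, 4)])
```

Inverse-square weights emphasize short-range distances, which tends to keep neighbouring nodes evenly spaced. The `history` list makes the monotone decrease of the stress easy to check.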

2022
Vol 54 (9)
pp. 1-36
Dylan Chou
Meng Jiang

Data-driven network intrusion detection (NID) suffers from attack classes that are minorities compared to normal traffic. Moreover, many datasets are collected in simulated environments rather than real-world networks. These challenges undermine the performance of intrusion detection machine learning models, which end up fitted to unrepresentative “sandbox” datasets. This survey presents a taxonomy of eight main challenges and explores common datasets from 1999 to 2020. Trends in these challenges over the past decade are analyzed, and future directions are proposed: expanding NID into cloud-based environments, devising scalable models for large network data, and creating labeled datasets collected in real-world networks.

2021
pp. 026638212110619
Sharon Richardson

During the past two decades, there have been a number of breakthroughs in the fields of data science and artificial intelligence, made possible by advanced machine learning algorithms trained through access to massive volumes of data. However, their adoption and use in real-world applications remains a challenge. This paper posits that a key limitation in making AI applicable has been a failure to modernise the theoretical frameworks needed to evaluate and adopt outcomes. Such a need was anticipated with the arrival of the digital computer in the 1950s but has remained unrealised. This paper reviews how the field of data science emerged and led to rapid breakthroughs in algorithms underpinning research into artificial intelligence. It then discusses the contextual framework now needed to advance the use of AI in real-world decisions that impact human lives and livelihoods.

2017
Vol 19 (1)
pp. 127-139
Jun Li
Eric M. Simmons
Martin D. Eastgate

A predictive analytics approach to understanding process mass intensity (PMI) is described. This method leverages real-world data to predict probable PMI outcomes for a potential synthetic route and to compare PMI outcomes to the summation of prior experience.
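For reference, PMI is conventionally defined as the total mass of all materials entering a process (reactants, reagents, solvents, and water) per unit mass of isolated product, so lower values indicate a greener route. The worked numbers below are illustrative only and are not taken from the paper.

```latex
\mathrm{PMI} = \frac{\sum_i m_i^{\text{input}}}{m_{\text{product}}},
\qquad \text{e.g.}\quad
\frac{1200\ \text{kg of inputs}}{15\ \text{kg of product}} = 80 .
```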

Longbing Cao
Chengqi Zhang

Traditional data mining based on quantitative intelligence faces grand challenges from real-world enterprise and cross-organization applications. For instance, the usual demonstration of specific algorithms cannot help business users take actions that serve their advantage and needs. We attribute this to a data-driven philosophy focused on quantitative intelligence, which either views data mining as an autonomous, trial-and-error process or analyzes business issues only in an isolated, case-by-case manner. Based on experience and lessons learnt from real-world data mining and complex systems, this article proposes a practical data mining methodology referred to as Domain-Driven Data Mining. On top of quantitative intelligence and hidden knowledge in data, domain-driven data mining aims to meta-synthesize quantitative and qualitative intelligence in mining complex applications with humans in the loop. It targets actionable knowledge discovery in constrained environments to satisfy user preferences. The domain-driven methodology consists of key components including understanding the constrained environment, a business-technical questionnaire, representing and involving domain knowledge, human-mining cooperation and interaction, constructing next-generation mining infrastructure, in-depth pattern mining and postprocessing, business interestingness and actionability enhancement, and loop-closed, human-cooperated iterative refinement. Domain-driven data mining complements the data-driven methodology: the metasynthesis of qualitative and quantitative intelligence has the potential to discover knowledge from complex systems and to enhance knowledge actionability for practical use by industry and business.

Electronics
2020
Vol 9 (1)
pp. 100
Daniele Apiletti
Eliana Pastor

Coffee is among the most popular beverages in many cities all over the world, being both at the core of the busiest shops and a long-standing tradition of recreational and social value for many people. Among the many coffee variants, espresso attracts the interest of different stakeholders: from citizens consuming espresso around the city, to local businesses, coffee-machine vendors and international coffee industries. The quality of espresso is one of the most discussed and investigated issues. So far, it has been addressed by means of human experts, electronic noses, and chemical approaches. The current work, instead, proposes a data-driven approach exploiting association rule mining. We analyze a real-world dataset of espresso brewing by professional coffee-making machines, and extract all correlations between external quality-influencing variables and the actual metrics determining the quality of the espresso. Thanks to association rule mining, a powerful, exhaustive and explainable data-driven approach, results are expressed as human-readable rules combining the variables of interest, such as the grinder settings, the extraction time, and the dose amount. Novel insights from real-world coffee extractions collected in the field are presented, together with a data-driven approach able to uncover insights into espresso quality and its impact on both the lives of consumers and the choices of coffee-making industries.
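A minimal Apriori-style sketch of the rule-mining step follows. The abstract does not specify the algorithm, thresholds, or discretization used, so all of these are assumptions. Each brewing record is treated as a set of discretized `attribute=value` items, and a rule is kept if its support and confidence exceed hypothetical thresholds (0.3 and 0.7). The five records below are invented toy data, not the paper's dataset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori) search; returns {itemset: support}."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    level = [frozenset([i]) for i in sorted(set().union(*transactions))]
    level = [s for s in level if support(s) >= min_support]
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))]
        level = [c for c in candidates if support(c) >= min_support]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence), confidence = supp(lhs | rhs) / supp(lhs)."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


# Hypothetical, discretized brewing records (toy data).
records = [
    {"grind=fine", "time=long", "quality=good"},
    {"grind=fine", "time=long", "quality=good"},
    {"grind=fine", "time=short", "quality=good"},
    {"grind=coarse", "time=short", "quality=poor"},
    {"grind=coarse", "time=short", "quality=poor"},
]
rules = association_rules(frequent_itemsets(records, 0.3), 0.7)
for lhs, rhs, s, c in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} confidence={c:.2f}")
```

Because the search is exhaustive above the support threshold, every qualifying rule is reported. This matches the "exhaustive and explainable" property the abstract attributes to the approach.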

2019
Vol 33 (1)
pp. 214-237
Hannu Hannila
Joni Koskinen
Janne Harkonen
Harri Haapasalo

Purpose: The purpose of this paper is to analyse current challenges and to articulate the preconditions for data-driven, fact-based product portfolio management (PPM) based on commercial and technical product structures, critical business processes, corporate business IT and company data assets. Here, data assets were classified from a PPM perspective in terms of (product/customer/supplier) master data, transaction data and Internet of Things data. The study also addresses the supporting role of corporate-level data governance.
Design/methodology/approach: The study combines a literature review and qualitative analysis of empirical data collected from eight international companies of varying size.
Findings: Companies’ current inability to analyse products effectively based on existing data is surprising. The present findings identify a number of preconditions for data-driven, fact-based PPM, including mutual understanding of company products (to establish a consistent commercial and technical product structure), product classification as strategic, supportive or non-strategic (to link commercial and technical product structures with product strategy) and a holistic, corporate-level data model for adjusting the company’s business IT (to support product portfolio visualisation).
Practical implications: The findings provide a logical and empirical basis for fact-based, product-level analysis of product profitability and analysis of the product portfolio over the product life cycle, supporting a data-driven approach to the optimisation of commercial and technical product structure, business IT systems and company product strategy. As a virtual representation of reality, the company data model facilitates product visualisation. The findings are of great practical value, as they demonstrate the significance of corporate-level data assets, data governance and business-critical data for managing a company’s products and portfolio.
Originality/value: The study contributes to the existing literature by specifying the preconditions for data-driven, fact-based PPM as a basis for product-level analysis and decision making, emphasising the role of company data assets and clarifying the links between business processes, information systems and data assets for PPM.
