Parameter tuning: Exposing the gap between data curation and effective data analytics

This paper proposes data-analytics-based factory operation strategies for the quality enhancement of die-casting. We first define the four main problems of die casting that result in lower quality: [P1] gaps between the input and output casting parameter values, [P2] occurrence of preheat shots, [P3] lateness of defect distinction, and [P4] worker-experience-based casting parameter tuning. To address these four problems, we derived seven tasks that should be conducted during factory operation: [T1] implementation of exploratory data analysis (EDA) for investigating the trends and correlations between data, [T2] deduction of the optimal casting parameter output values for the production of fair-quality products, [T3] deduction of the upper and lower control limits for casting parameter input–output gap management, [T4] development of a preheat shot diagnosis algorithm, [T5] development of a defect prediction algorithm, [T6] development of a defect cause diagnosis algorithm, and [T7] development of a casting parameter tuning algorithm. The details of the proposed data-analytics-based factory operation strategies with regard to the casting parameter input and output data, data preprocessing, data analytics method used, and implementation are presented and discussed. Finally, a case study of a die-casting factory in South Korea that has adopted the proposed strategies is introduced.
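Task [T3] can be illustrated with a small sketch. The abstract does not say how the control limits are derived, so the version below assumes a conventional mean ± k·sigma rule applied to the gap between the set (input) value and the measured (output) value of one casting parameter. The function names, the choice k = 3, and the sample numbers are all hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def control_limits(setpoints, measured, k=3.0):
    """Return (lower, center, upper) limits for the input-output gap.

    Assumption: limits are center +/- k sample standard deviations,
    a common control-chart convention, not the paper's stated method.
    """
    if len(setpoints) != len(measured):
        raise ValueError("setpoints and measured must have the same length")
    if len(setpoints) < 2:
        raise ValueError("need at least two shots to estimate the spread")
    # Gap = measured output value minus the value the operator set.
    gaps = [m - s for s, m in zip(setpoints, measured)]
    center = mean(gaps)
    spread = stdev(gaps)
    return center - k * spread, center, center + k * spread


def out_of_control(setpoints, measured, limits):
    """Return indices of shots whose gap falls outside the limits."""
    lower, _, upper = limits
    return [
        i
        for i, (s, m) in enumerate(zip(setpoints, measured))
        if not lower <= (m - s) <= upper
    ]


# Hypothetical reference shots for one parameter (arbitrary units).
reference_set = [100, 100, 100, 100, 100]
reference_out = [101, 99, 100, 102, 98]
limits = control_limits(reference_set, reference_out)

# Hypothetical new shots checked against the reference limits.
flagged = out_of_control([100, 100], [100.5, 110], limits)
```

Limits are estimated from a reference batch and then applied to new shots, so a single extreme shot in production does not widen the limits it is tested against.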

Data Curation in Practice: Extract Tabular Data from PDF Files Using a Data Analytics Tool

Journal of eScience Librarianship ◽

10.7191/jeslib.2021.1209 ◽

2021 ◽

Vol 10 (3) ◽

Author(s):

Allis J. Choi ◽

Xuying Xin

Keyword(s):

Data Analytics ◽

Research Data ◽

Data Curation ◽

Data Repository ◽

State University ◽

Tabular Data ◽

Data Discovery ◽

Penn State ◽

Efficient Data ◽

General Data

Data curation is the process of managing data to make it available for reuse and preservation and to allow FAIR (findable, accessible, interoperable, reusable) uses. It is an important part of the research lifecycle as researchers are often either required by funders or generally encouraged to preserve the dataset and make it discoverable and reusable. This has been especially important as the Open Access (OA) policy is being implemented in many institutions across the nation. In facilitating research data discovery and enhancing its easier reuse, an efficient data repository and its data curation play key roles. In this article, we briefly discuss the local institutional repository at Penn State University and the general data curation practices we adopt for the deposited files and datasets, then we focus on a data analytics tool that has recently been applied to extract tabular data from PDF files. This is an enhancement to the existing data curation practices as it adds additional tabular data to deposits with PDF files where tables are often embedded and not easily reused.

Accessible data curation and analytics for international-scale citizen science datasets

Scientific Data ◽

10.1038/s41597-021-01071-x ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Benjamin Murray ◽

Eric Kerfoot ◽

Liyuan Chen ◽

Jie Deng ◽

Mark S. Graham ◽

...

Keyword(s):

Citizen Science ◽

Open Source Software ◽

Data Analytics ◽

Data Curation ◽

Reproducible Research ◽

Self Assessment ◽

Commodity Hardware ◽

Alternative Technologies ◽

International Research Group ◽

Open Source Software Package

The Covid Symptom Study, a smartphone-based surveillance study on COVID-19 symptoms in the population, is an exemplar of big data citizen science. As of May 23rd, 2021, over 5 million participants have collectively logged over 360 million self-assessment reports since its introduction in March 2020. The success of the Covid Symptom Study creates significant technical challenges around effective data curation. The primary issue is scale. The size of the dataset means that it can no longer be readily processed using standard Python-based data analytics software such as Pandas on commodity hardware. Alternative technologies exist but carry a higher technical complexity and are less accessible to many researchers. We present ExeTera, a Python-based open source software package designed to provide Pandas-like data analytics on datasets that approach terabyte scales. We present its design and capabilities, and show how it is a critical component of a data curation pipeline that enables reproducible research across an international research group for the Covid Symptom Study.
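The scale problem described above comes from loading a whole table into memory at once. The sketch below shows the general alternative of streaming records one at a time, using only the Python standard library. It is not ExeTera's API and makes no claim about how ExeTera works internally; the column names `participant_id` and `symptom` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def stream_rows(handle):
    """Yield one CSV record at a time instead of loading the whole file."""
    yield from csv.DictReader(handle)


def reports_per_participant(rows):
    """Count reports per participant while holding only the counts in memory."""
    counts = Counter()
    for row in rows:
        counts[row["participant_id"]] += 1
    return counts


# Hypothetical in-memory stand-in for a very large reports file.
sample = io.StringIO(
    "participant_id,symptom\n"
    "a,cough\n"
    "b,fever\n"
    "a,fatigue\n"
)
counts = reports_per_participant(stream_rows(sample))
```

Memory use here grows with the number of distinct participants rather than the number of reports, which is the property that makes this kind of aggregation feasible on commodity hardware.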

Big Data Analytics: The Next Big Thing

The Management Accountant Journal ◽

10.33516/maj.v54i5.20-24p ◽

2019 ◽

Vol 54 (5) ◽

pp. 20

Author(s):

Dheeraj Kumar Pradhan

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics

Strategy in Discovery Mode - Wie Big Data & Analytics strategisches Denken verdrängen kann [How Big Data & Analytics Can Displace Strategic Thinking]

WiSt - Wirtschaftswissenschaftliches Studium ◽

10.15358/0340-1650-2020-5-11 ◽

2020 ◽

Vol 49 (5) ◽

pp. 11-17

Author(s):

Thomas Wrona ◽

Pauline Reinecke

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics

Big Data & Analytics (BDA) has become a barely questioned institution for corporate efficiency and competitive advantage. Too many prominent examples, such as the success of Google or Amazon, seem to confirm the importance of data and algorithms for achieving long-term competitive advantage. Both practitioners and academics seem to be jumping on the "data train" almost euphorically. When risks are addressed, they mostly concern ethical questions. What is often overlooked is that the advantages under discussion arise primarily from an operational efficiency perspective. Strategic effects are discussed, at most, in relation to business model innovations, whose actual degree of innovation remains to be assessed. The article aims to show that BDA can indeed create competitive advantages, but that it also entails major strategic risks that currently receive little attention.
