Data cleaning in the process industries

2015 ◽  
Vol 31 (5) ◽  
Author(s):  
Shu Xu ◽  
Bo Lu ◽  
Michael Baldea ◽  
Thomas F. Edgar ◽  
Willy Wojsznis ◽  
...  

In the past decades, process engineers have faced increasingly demanding data analytics challenges and have had difficulty extracting valuable information from the wealth of process variable data trends. Raw data of different formats stored in databases are not useful until they are cleaned and transformed. Generally, data cleaning consists of four steps: missing data imputation, outlier detection, noise removal, and time alignment and delay estimation. This paper discusses available data cleaning methods that can be used in data pre-processing to help overcome the challenges of “Big Data”.
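The four cleaning steps listed in the abstract can be illustrated on a toy process trend. This is a minimal sketch using pandas; the signal, thresholds, window size, and the 3-sample delay are all hypothetical and not taken from the paper:

```python
import numpy as np
import pandas as pd

# Hypothetical process-variable trend with a gap, a gross outlier, and noise.
rng = np.random.default_rng(0)
t = pd.date_range("2015-01-01", periods=100, freq="min")
series = pd.Series(np.sin(np.linspace(0, 6, 100)) + rng.normal(0, 0.05, 100),
                   index=t)
series.iloc[10:13] = np.nan   # missing data
series.iloc[50] = 10.0        # gross outlier

# Step 1 -- missing data imputation: interpolate linearly in time.
clean = series.interpolate(method="time")

# Step 2 -- outlier detection: flag points beyond 3 robust (MAD-based)
# standard deviations, then re-impute them.
mad = (clean - clean.median()).abs().median()
outliers = (clean - clean.median()).abs() > 3 * 1.4826 * mad
clean[outliers] = np.nan
clean = clean.interpolate(method="time")

# Step 3 -- noise removal: centered rolling-mean filter.
clean = clean.rolling(window=5, center=True, min_periods=1).mean()

# Step 4 -- time alignment / delay estimation: pick the lag that maximizes
# cross-correlation with a reference signal (here, a 3-sample delayed copy).
reference = clean.shift(3).bfill()
best_lag = max(range(-10, 11), key=lambda k: clean.corr(reference.shift(-k)))
aligned = clean.shift(best_lag)
```

Real plant data would of course require tuning each step to the sensor in question; the point is only that the four steps compose into a single pre-processing pipeline.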

Computers ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 37 ◽  
Author(s):  
Luca Cappelletti ◽  
Tommaso Fontana ◽  
Guido Walter Di Donato ◽  
Lorenzo Di Tucci ◽  
Elena Casiraghi ◽  
...  

Missing data imputation has been a hot topic in the past decade, and many state-of-the-art works have proposed novel, interesting solutions that have been applied in a variety of fields. In the same period, the successful results achieved by deep learning techniques have opened the way to their application to difficult problems where human skill cannot provide a reliable solution. Not surprisingly, some deep learners, mainly exploiting encoder-decoder architectures, have also been designed and applied to the task of missing data imputation. However, most of the proposed imputation techniques have not been designed to tackle “complex data”, that is, high-dimensional data belonging to datasets with huge cardinality and describing complex problems. Specifically, they often need critical parameters to be set manually, or exploit complex architectures and/or training phases that make their computational load impractical. In this paper, after clustering the state-of-the-art imputation techniques into three broad categories, we briefly review the most representative methods and then describe our data imputation proposals, which exploit deep learning techniques specifically designed to handle complex data. Comparative tests on genome sequences show that our deep learning imputers outperform the state-of-the-art KNN-imputation method when filling gaps in human genome sequences.
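The KNN-imputation baseline used in the comparison can be reproduced with scikit-learn's `KNNImputer`. A minimal sketch on a toy matrix; the feature values and neighbor count are illustrative, not the paper's genome encoding:

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy stand-in for encoded sequence features; the paper's real inputs are
# far higher-dimensional genome data.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Fill each gap with the average of the 2 nearest rows, measured with a
# NaN-aware Euclidean distance over the observed entries.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```

KNN imputation scales poorly with dataset cardinality (distances are computed against all rows), which is one reason the encoder-decoder alternatives discussed in the paper are attractive for complex data.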


2021 ◽  
Author(s):  
Yuan Luo

The Data Analytics Challenge on Missing data Imputation (DACMI) presented a shared clinical dataset with ground truth for evaluating and advancing the state-of-the-art in imputing missing data for clinical time series. The challenge attracted 12 international teams spanning three continents, from both industry and academia. The participating systems advanced the state-of-the-art by considerable margins, and their design principles will inform future efforts to better model clinical missing data.
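As context for what such systems improve upon, a common non-learning baseline for clinical time-series imputation is per-variable interpolation with edge filling. A minimal sketch with hypothetical lab values; this is not the DACMI data or any participant's method:

```python
import numpy as np
import pandas as pd

# Hypothetical clinical time series: rows are chart times, columns are labs.
df = pd.DataFrame({
    "creatinine": [1.0, np.nan, 1.2, np.nan, 1.4],
    "potassium":  [np.nan, 4.1, np.nan, 4.3, np.nan],
})

# Per-variable linear interpolation for interior gaps, then forward/backward
# fill for the edges, with the column mean as a last-resort fallback for
# variables never observed at all.
imputed = df.interpolate().ffill().bfill()
imputed = imputed.fillna(df.mean())
```

Baselines of this kind ignore cross-variable correlations (e.g. between related labs), which is precisely where the challenge systems gained their margins.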


2020 ◽  
pp. 027347532096050 ◽  
Author(s):  
Eileen Bridges

This article looks back over the past two decades to describe how teaching of undergraduate marketing research has (or has not) changed. Sweeping changes in technology and society have certainly affected how marketing research is designed and implemented—but how has this affected teaching of this important topic? Although the purpose of marketing research is still to better understand target customer needs, the tools are different now: customer data are typically collected using technology-based interfaces in place of such instruments as mailed, telephone, or in-person surveys. Observational techniques collect more data electronically rather than requiring a human recorder. Similarly, sampling has changed: sample frames are no longer widely used. Many of these changes are not yet fully discussed in marketing research courses. On the other hand, there is increasing interest in and availability of courses and programs in marketing data analytics, which teach specialized skills related to analysis and interpretation of electronic databases. Perhaps even more importantly, new technology-based tools permit greater automation of data collection and analysis, and presentation of findings. A critical gap is identified in this article; specifically, effort is needed to better integrate the perspectives of data collection and data analysis given current research conditions.

