Supplementary material to "Adaptive Baseline Finder, a statistical data selection strategy to identify atmospheric CO2 baseline levels and its application to European elevated mountain stations"

Author(s):  
Ye Yuan ◽  
Ludwig Ries ◽  
Hannes Petermeier ◽  
Martin Steinbacher ◽  
Angel J. Gómez-Peláez ◽  
...  
2017 ◽  

Abstract. Critical data selection is essential for determining representative baseline levels of atmospheric trace gas measurements, even at remote measuring sites. Different data selection techniques have been used around the world, which could lead to bias when comparing data from different stations. This paper presents a novel statistical data selection method based on the CO2 diurnal pattern that typically occurs at elevated mountain stations. Its capability and applicability were studied for atmospheric CO2 measurement records from 2010 to 2016 at six Global Atmosphere Watch (GAW) stations in Europe, namely Zugspitze-Schneefernerhaus (Germany), Sonnblick (Austria), Jungfraujoch (Switzerland), Izaña (Spain), Schauinsland (Germany) and Hohenpeissenberg (Germany). Three other frequently applied statistical data selection methods were implemented for comparison. Among all selection routines, the new method, named Adaptive Baseline Finder (ABF), resulted in lower selection percentages, with lower maxima during winter and higher minima during summer in the selected data. To investigate long-term trend and seasonality, the seasonal decomposition technique STL was applied. Compared with the unselected data, mean annual growth rates of all selected data sets were not significantly different, except for Schauinsland. However, clear differences were found in the annual amplitudes as well as in the seasonal time structure. Based on correlation analysis, the results of ABF selection better represented lower free tropospheric conditions.


Algorithms ◽  
2018 ◽  
Vol 12 (1) ◽  
pp. 4 ◽  
Author(s):  
Marcele O. K. Mendonça ◽  
Jonathas O. Ferreira ◽  
Christos G. Tsinos ◽  
Paulo S R Diniz ◽  
Tadeu N. Ferreira

The amount of information generated in the world has been increasing exponentially, raising the question of whether all acquired data are relevant to the learning process. If a subset of the data does not bring enough innovation, data-selection strategies can be employed to reduce the computational cost and, in many cases, improve the estimation accuracy. In this paper, we explore some adaptive filtering algorithms whose characteristic features are fast convergence and data selection. These algorithms incorporate a prescribed data-selection strategy and are compared in distinct application environments. The simulation results include both synthetic and real data.
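As an illustration of a data-selective adaptive filter, the sketch below uses set-membership NLMS (SM-NLMS), a well-known algorithm of this kind. The abstract does not say which algorithms the paper studies, so the choice of SM-NLMS, the names `sm_nlms` and `demo`, the error bound `gamma_bar`, and the synthetic three-tap system are all assumptions made for this example.

```python
import random


def sm_nlms(inputs, desired, num_taps, gamma_bar, eps=1e-8):
    """Set-membership NLMS: update the weights only when |error| > gamma_bar."""
    w = [0.0] * num_taps
    updates = 0
    for k in range(num_taps - 1, len(inputs)):
        # Regressor: the most recent num_taps input samples, newest first.
        x = [inputs[k - i] for i in range(num_taps)]
        y = sum(wi * xi for wi, xi in zip(w, x))
        e = desired[k] - y
        if abs(e) > gamma_bar:
            # Data-dependent step size that brings the a posteriori
            # error back to the bound gamma_bar.
            mu = 1.0 - gamma_bar / abs(e)
            norm = sum(xi * xi for xi in x) + eps
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
            updates += 1
        # Otherwise the sample brings too little innovation and is skipped.
    return w, updates


def demo(seed=0, n=2000):
    """Identify a hypothetical 3-tap system from noisy synthetic data."""
    rng = random.Random(seed)
    true_w = [0.5, -0.3, 0.2]  # assumed system, for illustration only
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    d = [0.0] * n
    for k in range(len(true_w) - 1, n):
        clean = sum(true_w[i] * x[k - i] for i in range(len(true_w)))
        d[k] = clean + rng.gauss(0.0, 0.01)
    w, updates = sm_nlms(x, d, num_taps=3, gamma_bar=0.05)
    return true_w, w, updates, n
```

Calling `demo()` returns the assumed true weights, the estimated weights, and the number of samples that triggered an update; in this setup only a fraction of the samples do, which is where the computational saving comes from.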


2020 ◽  
Vol 1342 ◽  
pp. 012110
Author(s):  
S Caprioli ◽  
C Ghiano ◽  
A Re ◽  
M Redchuck ◽  

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Catherine Cheung ◽  
Calista Biondic ◽  
Zouhair Hamaimou ◽  
Julio Valdes

Rapid developments in sensor technology, data processing tools and data storage capability have helped fuel an increased appetite for equipment health monitoring in mechanical systems. As a result, the number of sensors and the amount of data collected for health monitoring have grown tremendously. It is hoped that by collecting large quantities of operational data, predictive tools can be developed that will provide operational, maintenance and safety benefits. Data mining and machine learning techniques are important tools in addressing the ensuing challenge of extracting useful results from the data collected. In this work, the sensor data from a gas turbine system was analyzed with the objective of failure modeling and prediction. Previous efforts had used a two-class approach for this problem, to distinguish healthy and failed states of the system. In this work, a third class labelled as deteriorated data is added prior to each failure event to explore the ability of machine learning models to provide early warning of upcoming incidents. Several maintenance incidents were recorded by the sensor system in two separate vehicles. Three approaches to selecting training data were used. The first followed a traditional method of randomly selecting data points from all data according to a desired percentage of failed data to include in training, target ratios between failed and healthy data in each data set, and target ratios between training and testing data. The second data selection strategy was to consider data related to failure incidents as a whole, selecting certain incidents to include in training and leaving the remaining ones unseen for testing. The third approach was cross-validation, which is typically used to evaluate how a classifier will perform on unseen data while still using the entirety of the data to train the final classifier.
In addition to investigating training and data selection strategies, the effect of hyperparameter optimization was explored, as well as the effect of varying the time period of the deteriorated class. Using the gas turbine data, which included 7 failure incidents and 76 predictor variables, a variety of classifier models of the system were developed in a three-class problem to differentiate healthy, deteriorated and failed system states. The classifier methods included support vector machines, Gaussian Naïve Bayes, random forest, AdaBoost, multilayer perceptron, k-nearest neighbors, and XGBoost. Ensemble models were also created to leverage all the individual classifier models that were developed. This paper describes the comprehensive results obtained using the various approaches and combinations, highlighting their respective benefits and limitations.
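The second selection strategy, holding out whole incidents, can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the record layout (`incident`, `step`, `label`), the function names, the 30% test fraction, and the toy data are hypothetical; only the count of 7 incidents and the three class names come from the abstract.

```python
import random


def split_by_incident(samples, test_fraction=0.3, seed=0):
    """Hold out whole incidents so no incident appears in both sets."""
    incident_ids = sorted({s["incident"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(incident_ids)
    # Always keep at least one incident unseen for testing.
    n_test = max(1, round(len(incident_ids) * test_fraction))
    test_ids = set(incident_ids[:n_test])
    train = [s for s in samples if s["incident"] not in test_ids]
    test = [s for s in samples if s["incident"] in test_ids]
    return train, test


def make_toy_samples():
    """Build 7 hypothetical incidents with 5 labelled samples each."""
    labels = ["healthy", "healthy", "deteriorated", "deteriorated", "failed"]
    return [
        {"incident": i, "step": t, "label": lab}
        for i in range(7)
        for t, lab in enumerate(labels)
    ]


train_set, test_set = split_by_incident(make_toy_samples())
```

Unlike a random per-sample split, this keeps every sample of a held-out incident out of training, so the test score reflects performance on failure events the classifier has never seen.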


Cloud computing, a service-oriented model, has emerged as a cost-effective paradigm for distributed applications because of its unique features. The distributed data centers of cloud service providers allow researchers to work in a cloud environment with goals such as increasing availability, reducing response time, fault tolerance, and performance. This paper addresses three problems for cloud-hosted databases: selecting data from the database for partitioning, partitioning the database relations, and placing the partitions in different data centers of the cloud. The scope of this work is read-intensive data operations and queries. We focus on data selection and data partitioning techniques that reduce overall network latency for a cloud-hosted database. The general log is used to find the frequently accessed rows in the database. The experiment was carried out on Amazon RDS.
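Finding frequently accessed rows from logged queries might look like the sketch below. It is only an illustration under strong assumptions: the input is a plain list of SQL strings rather than an actual general log file, rows are identified by a primary-key lookup of the form `WHERE id = N`, and the table names, `ROW_LOOKUP`, and `hot_rows` are hypothetical.

```python
import re
from collections import Counter

# Matches simple primary-key lookups such as "... FROM orders WHERE id = 7".
ROW_LOOKUP = re.compile(r"FROM\s+(\w+)\s+WHERE\s+id\s*=\s*(\d+)", re.IGNORECASE)


def hot_rows(statements, top_n=2):
    """Count (table, id) lookups and return the top_n most frequent ones."""
    counts = Counter()
    for sql in statements:
        match = ROW_LOOKUP.search(sql)
        if match:
            table = match.group(1).lower()
            row_id = int(match.group(2))
            counts[(table, row_id)] += 1
    return counts.most_common(top_n)


# Hypothetical read queries, standing in for entries taken from a query log.
statements = [
    "SELECT * FROM orders WHERE id = 7",
    "SELECT total FROM orders WHERE id = 7",
    "SELECT * FROM orders WHERE id = 3",
    "SELECT * FROM customers WHERE id = 7",
    "select * from orders where id=7",
    "SELECT * FROM orders WHERE id = 3",
]
top = hot_rows(statements)
```

Rows that rank highest in such a count are natural candidates for selection into a partition placed close to the clients that read them.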


2009 ◽  
Vol 6 (2) ◽  
pp. 127-140 ◽  
Author(s):  
Miroslav Hudec

Although the Structured Query Language (SQL) is a very powerful tool, it cannot satisfy the need for data selection based on linguistic expressions and degrees of truth. The goal of the research presented in this paper is to capture these expressions and make them suitable for queries. For this purpose, a fuzzy generalized logical condition for the WHERE clause of SQL was developed. In this way, queries based on linguistic expressions are supported and access relational databases in the same way as SQL does. A fuzzy query is not only a querying tool; it improves the meaning of a query and extracts additional valuable information. Statistical data about districts of the Slovak Republic are used in the case study. The fuzzy approach has some limitations that can appear in the querying process. These limitations, and ideas for how to solve them, are outlined in this paper.
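The idea of selecting rows by degree of truth can be sketched as follows. The abstract does not define the paper's generalized logical condition, so this example assumes standard linear ramp membership functions and the minimum operator for AND; the district names, column names, thresholds, and figures are invented for illustration and are not the Slovak data.

```python
def ramp_up(value, low, high):
    """Degree to which value is 'high': 0 at or below low, 1 at or above high."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def ramp_down(value, low, high):
    """Degree to which value is 'small': the complement of ramp_up."""
    return 1.0 - ramp_up(value, low, high)


def fuzzy_select(rows, conditions, threshold=0.0):
    """Return (name, degree) for rows whose AND-combined degree exceeds threshold."""
    result = []
    for row in rows:
        # Fuzzy AND, taken here as the minimum of the individual degrees.
        degree = min(fn(row[col]) for col, fn in conditions)
        if degree > threshold:
            result.append((row["name"], round(degree, 3)))
    return sorted(result, key=lambda item: item[1], reverse=True)


# Hypothetical districts and figures.
districts = [
    {"name": "A", "unemployment": 12.0, "density": 60.0},
    {"name": "B", "unemployment": 8.0, "density": 150.0},
    {"name": "C", "unemployment": 15.0, "density": 90.0},
]
# "unemployment is high AND population density is small"
conditions = [
    ("unemployment", lambda v: ramp_up(v, 5.0, 15.0)),
    ("density", lambda v: ramp_down(v, 50.0, 100.0)),
]
selected = fuzzy_select(districts, conditions)
```

A crisp WHERE clause would return each district as either matching or not; here each selected row also carries a degree between 0 and 1, which is the additional information a fuzzy query provides.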

