Estimating the parameters of a dependent model and applying it to an environmental data set

Author(s):  
V. Mohtashami-Borzadaran ◽  
M. Amini ◽  
J. Ahmadi


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addresses some advanced techniques for dealing with missing values in an air quality data set using a multiple imputation (MI) approach. The MCAR, MAR, and NMAR missingness mechanisms are applied to the data set at five missing-data levels: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is missForest, an iterative imputation method based on the random forest approach. Air quality data were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithm transformation was applied to all pollutant data in order to normalize their distributions and minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%). Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR mechanism yielded the lowest RMSE and MAE. We conclude that MI using the missForest approach estimates missing values with a high level of accuracy: missForest had the lowest imputation error (RMSE and MAE) among the imputation methods compared and can therefore be considered appropriate for analyzing air quality data.
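As an illustration of the approach described above, the following minimal Python sketch applies a missForest-style imputation, here approximated with scikit-learn's IterativeImputer wrapping random-forest regressors as a stand-in for the original R missForest implementation; the file and column names are hypothetical.

```python
# Illustrative sketch: missForest-style imputation of air-quality data.
# File path and column names are hypothetical; the actual data come from
# five Kuwaiti monitoring stations aggregated to daily values.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("air_quality_daily.csv")           # hypothetical file
pollutants = ["NO2", "CO", "PM10", "SO2", "O3"]     # assumed column names
climate = ["temp", "rh", "wind_dir", "wind_speed"]  # control variables

# Log-transform pollutants to normalize distributions and reduce skewness,
# as described in the abstract (small offset guards against log(0)).
df[pollutants] = np.log(df[pollutants] + 1e-6)

# Iterative imputation with random-forest base learners approximates the
# missForest algorithm; climate variables support the estimation.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    max_iter=10,
    random_state=0,
)
df[pollutants + climate] = imputer.fit_transform(df[pollutants + climate])
```

In the paper's evaluation setting, one would additionally mask a known fraction (5-40%) of observed values under each missingness mechanism and score the imputed values against the held-back truth by RMSE and MAE.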


Author(s):  
Rohit Shankaran ◽  
Alexander Rimmer ◽  
Alan Haig

In recent years, the use of drilling risers with larger and heavier BOP/LMRP stacks has increased fatigue loading on subsea wellheads, which poses potential restrictions on the duration of drilling operations. To track wellhead and conductor fatigue capacity consumption in support of safe drilling operations, a range of methods have been applied:

• Analytical riser model and measured environmental data;
• BOP motion measurement and transfer functions;
• Strain gauge data.

Strain gauge monitoring is considered the most accurate method for measuring fatigue capacity consumption. To compare the three approaches and establish recommendations for an optimal method of estimating wellhead fatigue accumulation, a monitoring data set was obtained on a well offshore West of Shetland. This paper presents an analysis of measured strain, motions, and analytical predictions, with the objective of better understanding the accuracy, limitations, and conservatism of each of the three methods defined above. Of the various parameters that affect the accuracy of the fatigue damage estimates, the paper identifies the selection of the analytical conductor-soil model as critical to narrowing the gap between fatigue life predictions from the different approaches. The work presented here examines the influence of alternative approaches to modeling conductor-soil interaction, compared with the traditionally used API soil model. Overall, the paper presents the monitoring equipment and analytical methodology needed to advance the accuracy of wellhead fatigue damage measurements.
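The strain-gauge route from measured strains to fatigue capacity consumption generally runs through cycle counting and Palmgren-Miner damage accumulation. The sketch below illustrates that final step; the S-N curve constants, the elastic strain-to-stress conversion, and the cycle counts are placeholders, not values from the paper.

```python
# Illustrative sketch: Palmgren-Miner damage accumulation from strain-gauge
# data, the general approach behind fatigue-capacity tracking. All numbers
# below are hypothetical placeholders, not values used in the paper.
import numpy as np

def stress_from_strain(strain, youngs_modulus=2.07e11):
    """Convert measured axial strain to stress (Pa), assuming elasticity."""
    return strain * youngs_modulus

def miner_damage(stress_ranges_mpa, counts, log_a=12.164, m=3.0):
    """Accumulate damage D = sum(n_i / N_i) with N = 10**log_a * S**(-m),
    a generic one-slope S-N curve (constants are placeholders)."""
    N_allow = 10.0 ** log_a * np.asarray(stress_ranges_mpa) ** -m
    return float(np.sum(np.asarray(counts) / N_allow))

# Example: stress ranges and counts as produced by a rainflow count of the
# measured stress history (the rainflow step itself is omitted here).
ranges_mpa = [20.0, 35.0, 50.0]    # hypothetical stress ranges
counts     = [1.2e5, 4.0e4, 5.0e3] # hypothetical cycle counts
print(f"Accumulated damage: {miner_damage(ranges_mpa, counts):.4f}")
```

Failure is nominally predicted at D = 1, so tracking D over the drilling campaign gives the fraction of fatigue capacity consumed.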


2008 ◽  
Vol 71 (2) ◽  
pp. 279-285 ◽  
Author(s):  
M. J. STASIEWICZ ◽  
B. P. MARKS ◽  
A. ORTA-RAMIREZ ◽  
D. M. SMITH

Traditional models for predicting the thermal inactivation rate of bacteria are state dependent, considering only the current state of the product. In this study, the potential for previous sublethal thermal history to increase the thermotolerance of Salmonella in ground turkey was determined, a path-dependent model for thermal inactivation was developed, and the path-dependent predictions were tested against independent data. Weibull-Arrhenius parameters for Salmonella inactivation in ground turkey thigh were determined via isothermal tests at 55, 58, 61, and 63°C. Two sets of nonisothermal heating tests were also conducted. The first included five linear heating rates (0.4, 0.9, 1.7, 3.5, and 7.0 K/min) and three holding temperatures (55, 58, and 61°C); the second also included sublethal holding periods at 40, 45, and 50°C. When the standard Weibull-Arrhenius model was applied to the nonisothermal validation data sets, the root mean squared error of prediction was 2.5 log CFU/g, with fail-dangerous residuals as large as 4.7 log CFU/g when applied to the complete nonisothermal data set. However, using a modified path-dependent model for inactivation, the prediction errors for independent data were reduced by 56%. Under actual thermal processing conditions, use of the path-dependent model would reduce error in thermal lethality predictions for slowly cooked products.
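To make the path dependence concrete, the following sketch integrates a Weibull-type inactivation model with an Arrhenius temperature dependence over a nonisothermal come-up-and-hold profile. All parameter values, and the use of Peleg's momentary-rate formulation, are illustrative assumptions, not the fitted values from the study.

```python
# Illustrative sketch: nonisothermal (path-dependent) Weibull inactivation,
# integrated with a momentary-rate (Peleg-style) approach. Parameters are
# hypothetical placeholders, not the fitted values from the study.
import numpy as np

def b_arrhenius(T_c, ln_b_ref=-1.5, Ea=5.0e5, T_ref_c=58.0):
    """Arrhenius dependence of the Weibull rate parameter b on temperature."""
    R = 8.314
    T, T_ref = T_c + 273.15, T_ref_c + 273.15
    return np.exp(ln_b_ref + (Ea / R) * (1.0 / T_ref - 1.0 / T))

def log10_survival(times_min, temps_c, n=1.5):
    """Integrate d(log10 S)/dt = -b(T) * n * t***(n-1), where t* is the
    isothermal time that would yield the current survival at T(t)."""
    S = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        b = b_arrhenius(temps_c[i])
        t_star = (max(-S, 1e-12) / b) ** (1.0 / n)  # equivalent iso time
        S -= b * n * t_star ** (n - 1) * dt
    return S

# Linear come-up at 1.7 K/min from 25 degC, then holding at 58 degC:
t = np.arange(0.0, 30.0, 0.1)
T = np.minimum(25.0 + 1.7 * t, 58.0)
print(f"Predicted log10 reduction: {log10_survival(t, T):.2f}")
```

A path-dependent extension of the kind described in the abstract would additionally adjust the rate parameter according to prior sublethal exposure (e.g., holding at 40-50°C), rather than letting the current temperature alone set the momentary rate.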


Data in Brief ◽  
2020 ◽  
Vol 31 ◽  
pp. 105794
Author(s):  
Zakariya Dalala ◽  
Mohammad Al-Addous ◽  
Firas Alawneh ◽  
Christina B. Class

2020 ◽  
Author(s):  
Doron Goldfarb ◽  
Johannes Kobler ◽  
Johannes Peterseil

As outliers in any data set may have detrimental effects on further scientific analysis, the measurement of any environmental parameter and the detection of outliers within these data are closely linked. However, outlier analysis is complicated: the definition of an outlier is controversially discussed and thus, until now, vague. Nonetheless, multiple methods have been implemented to detect outliers in data sets, and applying them often requires some statistical know-how.

The present use case, developed as a proof-of-concept implementation within the EOSC-Hub project, is dedicated to providing a user-friendly outlier analysis web service via an open REST API, processing environmental data either provided via a Sensor Observation Service (SOS) or stored as data files in a cloud-based data repository. It is driven by an R-script performing the different operation steps: data retrieval, outlier analysis, and final data export. To cope with the vague definition of an outlier, the outlier analysis step applies numerous statistical methods implemented in various R-packages.

The web service encapsulates the R-script behind a REST API which is described by a dedicated OpenAPI specification defining two distinct access methods (i.e., SOS- and file-based) and the parameters required to run the R-script. This formal specification is used to automatically generate a server stub based on the Python Flask framework, which is customized to execute the R-script on the server whenever an appropriate web request arrives. The output is currently collected in a ZIP file which is returned after each successful web request. The service prototype is designed to run on generic resources provided by the European Open Science Cloud (EOSC) and the European Grid Initiative (EGI) in order to ensure sustainability and scalability.

Due to its user-friendliness and open availability, the presented web service will facilitate access to standardized and scientifically based outlier analysis methods, not only for individual scientists but also for networks and research infrastructures such as eLTER. It will thus contribute to the standardization of quality control procedures for data provision in distributed networks of data providers.

Keywords: quality assessment, outlier detection, web service, REST-API, eLTER, EOSC, EGI, EOSC-Hub
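For readers unfamiliar with the server-stub pattern described above, the following minimal Flask sketch illustrates the idea: a REST endpoint that receives data and returns outlier flags. The endpoint name, JSON schema, and the simple IQR rule (standing in for the R-based methods and ZIP packaging of the actual service) are all hypothetical.

```python
# Illustrative sketch: a minimal Flask stub in the spirit of the generated
# server described above. A simple IQR rule stands in for the R-script.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/outliers", methods=["POST"])
def outliers():
    # Expect a JSON body like {"values": [1.0, 2.0, ...]} (assumed schema).
    values = request.get_json(force=True).get("values", [])
    if len(values) < 4:
        return jsonify(error="need at least 4 values"), 400
    # Crude quartile estimate and the classic 1.5 * IQR fence.
    s = sorted(values)
    q1, q3 = s[len(s) // 4], s[(3 * len(s)) // 4]
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flags = [v < lo or v > hi for v in values]
    return jsonify(outlier=flags, lower=lo, upper=hi)

if __name__ == "__main__":
    app.run(port=5000)
```

A client would POST a JSON array of measurements to /outliers and receive per-value flags; the production service instead shells out to the R-script and streams back a ZIP of results.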


Author(s):  
Aldo Marchetto ◽  
Angela Boggero ◽  
Diego Fontaneto ◽  
Andrea Lami ◽  
André F. Lotter ◽  
...  

We publish a data set of environmental and biological data collected in 2000 during the ice-free period in high mountain lakes located above the local timberline in the Alps, in Italy, Switzerland, and Austria. Environmental data include coordinates, geographical attributes, and detailed information on vegetation, bedrock, and land use in the lake catchments. Chemical analyses of a surface-water sample collected from each lake in summer 2000 are also reported. Biological data include phytoplankton (floating algae and cyanobacteria), zooplankton (floating animals), macroinvertebrates (aquatic organisms visible to the naked eye living in contact with the sediments on the lake bottom), and benthic diatoms. Diatom, cladoceran, and chironomid remains, as well as algal and bacterial pigments, were also analysed in the lake sediments.


2018 ◽  
Author(s):  
Michael J. Bowes ◽  
Linda K. Armstrong ◽  
Sarah A. Harman ◽  
Heather D. Wickham ◽  
Peter M. Scarlett ◽  
...  

Abstract. The River Thames and 15 of its major tributaries have been monitored at weekly intervals since March 2009. Monitored determinands include major nutrient fractions, anions, cations, metals, pH, alkalinity, and chlorophyll a, and are linked to mean daily river flows at each site. This catchment-wide biogeochemical monitoring platform captures changes in the water quality of the Thames basin during a period of rapid change, related to increasing pressures (a rapidly growing human population, increasing water demand, and climate change) and to improvements in sewage treatment processes and agricultural practices. The platform provides the research community with a valuable data and modelling resource for furthering our understanding of pollution sources and dynamics, and of the interactions between water quality and aquatic ecology. Comparing Thames Initiative data with previous (non-continuous) monitoring data sets from many common study sites, dating back to 1997, has shown major reductions in phosphorus concentrations at most sites, occurring at low river flow, and these are principally due to reduced loadings from sewage treatment works. This ongoing monitoring programme will provide the underpinning environmental data required to best manage this vital drinking water resource, which is key to the sustainability of the city of London and the wider UK economy. The Thames Initiative data set is freely available from the Centre for Ecology & Hydrology's Environmental Information Data Centre at doi:10.5285/e4c300b1-8bc3-4df2-b23a-e72e67eef2fd.
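As a small illustration of how such a platform links chemistry to hydrology, the following pandas sketch joins weekly water-quality samples to mean daily flows; the file and column names are hypothetical, not those of the published data set.

```python
# Illustrative sketch: linking weekly water-quality samples to mean daily
# river flows, in the manner described for the Thames Initiative platform.
# File and column names are hypothetical.
import pandas as pd

quality = pd.read_csv("thames_weekly_quality.csv", parse_dates=["date"])
flows = pd.read_csv("daily_mean_flows.csv", parse_dates=["date"])

# Attach the mean daily flow for each sampling day at each site.
linked = quality.merge(flows, on=["site", "date"], how="left")

# Example: compare phosphorus concentrations under low-flow conditions,
# where reductions attributable to sewage treatment works show up most.
low_flow = linked[linked["flow_m3s"] < linked["flow_m3s"].quantile(0.25)]
print(low_flow.groupby("site")["TP_mgL"].median())
```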


2011 ◽  
Vol 18 (4) ◽  
pp. 515-528 ◽  
Author(s):  
R. K. Tiwari ◽  
S. Maiti

Abstract. A novel technique based on Bayesian neural network (BNN) theory is developed and employed to model the temperature variation record from the Western Himalayas. In order to estimate the a posteriori probability function, the BNN is trained with a Hybrid Monte Carlo (HMC)/Markov Chain Monte Carlo (MCMC) simulation algorithm. The efficacy of the new algorithm is tested on well-known chaotic, first-order autoregressive (AR), and random models, and it is then applied to model the temperature variation record decoded from tree-ring widths of the Western Himalayas for the period 1226–2000 AD. For modeling the actual tree-ring temperature data, optimum network parameters are chosen appropriately, and a cross-validation test is performed to ensure the generalization skill of the network on new data. Finally, predictions from the BNN model are compared with those of a conventional artificial neural network (ANN) and an AR linear model. The comparative results show that the BNN-based analysis makes better predictions than the ANN and AR models. The new BNN modeling approach provides a viable tool for climate studies and could also be exploited for modeling other kinds of environmental data.
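The conventional-ANN baseline mentioned above can be sketched as a lag-embedded, one-step-ahead regression with a held-out validation split. The sketch below uses a synthetic AR(1) series in place of the actual tree-ring temperature reconstruction, and the network size and embedding dimension are arbitrary assumptions.

```python
# Illustrative sketch: the conventional ANN baseline against which the BNN
# is compared, fit to a lag-embedded series with a held-out split. The
# synthetic AR(1) series stands in for the tree-ring reconstruction.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n, phi = 775, 0.7                      # 1226-2000 AD spans 775 years
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi * series[t - 1] + rng.normal(scale=0.5)

lags = 5                               # embedding dimension (assumed)
X = np.column_stack([series[i:n - lags + i] for i in range(lags)])
y = series[lags:]                      # one-step-ahead targets

split = int(0.8 * len(y))              # holdout for generalization check
ann = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
ann.fit(X[:split], y[:split])
rmse = np.sqrt(np.mean((ann.predict(X[split:]) - y[split:]) ** 2))
print(f"One-step-ahead RMSE on held-out data: {rmse:.3f}")
```

The BNN replaces the point-estimate weights of such a network with a posterior distribution sampled by HMC/MCMC, which is what yields predictive uncertainty alongside the point forecasts.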


2021 ◽  
Author(s):  
Hossein Hassani ◽  
Nadejda Komendantova ◽  
Daniel Kroos ◽  
Stephan Unger ◽  
Mohammad Reza Yeganegi

Abstract The importance of energy security for the successful functioning of private companies, national economies, and society as a whole should not be underestimated. Energy is a critical infrastructure for any modern society, and its reliable functioning is essential for all economic sectors and for everybody's well-being. Uncertainty about the availability of information and of reliable data for making predictions and planning investments, as well as about the actions of other stakeholders in the energy markets, is one of the factors with the greatest influence on energy security. For example, the recent outbreak of the COVID-19 pandemic revealed the negative impacts of uncertainty on decision-making processes and markets: only when market participants started to receive real-time information about the situation did the energy markets begin to ease. This is one scenario where Big Data can be used to amplify information to various stakeholders, preventing panic and ensuring market stability and security of supply. In a fast-paced digital world characterized by technological advances, Big Data technology provides a unique opportunity to close this gap in information disparity by leveraging unconventional data sources to integrate technologies, stakeholders, and markets, thereby promoting energy security and market stability. The potential of Big Data technology is yet to be fully utilized: it can handle large data sets characterized by volume, variety, velocity, value, and complexity. The challenge for energy markets is to leverage this technology to mine available socioeconomic, political, geographic, and environmental data responsibly, and to provide indicators that predict future global supply and demand. This information is crucial for energy security and for ensuring global economic prosperity.

