Reducing impacts of systematic errors in the observation data on inversing ecosystem model parameters using different normalization methods

2009 ◽  
Vol 6 (6) ◽  
pp. 10447-10477 ◽  
Author(s):  
L. Zhang ◽  
M. Xu ◽  
M. Huang ◽  
G. Yu

Abstract. Modeling the ecosystem carbon cycle at regional and global scales is crucial to predicting future global atmospheric CO2 concentration, and thus global temperature, but such predictions carry large uncertainties due mainly to limitations in our knowledge and in climate and ecosystem models. There is a growing body of research on estimating model parameters against available carbon measurements to reduce prediction uncertainty at regional and global scales. However, systematic errors in the observation data have rarely been investigated in the optimization procedures of previous studies. In this study, we examined the feasibility of reducing the impact of systematic errors on parameter estimation using normalization methods, and evaluated the effectiveness of three normalization methods (maximum normalization, min-max normalization, and z-score normalization) for inversing key parameters, for example the maximum carboxylation rate (Vcmax,25) at a reference temperature of 25°C, in a process-based ecosystem model for deciduous needle-leaf forests in northern China constrained by leaf area index (LAI) data. The LAI data used for parameter estimation were composed of the model output LAI (truth) plus various designated systematic errors and random errors. We found that the estimate of Vcmax,25 could be severely biased by the composite LAI if no normalization was applied. Compared with the maximum normalization and min-max normalization methods, the z-score normalization method was the most robust in reducing the impact of systematic errors on parameter estimation. The most probable values of Vcmax,25 inversed from the z-score normalized LAI data were consistent with the true parameter values used as model inputs, although the estimation uncertainty increased with the magnitude of random errors in the observations. We conclude that the z-score normalization method should be applied to observed or measured data to improve model parameter estimation, especially when the potential errors in the constraining (observation) datasets are unknown.
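As a rough illustration of the three normalization methods compared in the abstract, the sketch below (Python, with a made-up LAI series) shows how a constant systematic offset survives maximum normalization but cancels out of the z-score transform; the study's actual model inversion is not reproduced here.

```python
import numpy as np

def max_normalize(x):
    # Divide by the series maximum.
    return x / np.max(x)

def min_max_normalize(x):
    # Rescale linearly to [0, 1] using the series minimum and maximum.
    return (x - np.min(x)) / (np.max(x) - np.min(x))

def z_score_normalize(x):
    # Centre on the mean and scale by the standard deviation.
    return (x - np.mean(x)) / np.std(x)

# Made-up LAI series plus a constant systematic offset of +0.4.
lai_true = np.array([0.5, 1.2, 2.8, 3.6, 3.1, 1.4])
lai_biased = lai_true + 0.4

print(np.allclose(z_score_normalize(lai_biased), z_score_normalize(lai_true)))  # True: offset cancels
print(np.allclose(max_normalize(lai_biased), max_normalize(lai_true)))          # False: offset remains
```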

Author(s):  
Dwianti Westari ◽  

A diabetes classification system is very useful in the health sector. This paper discusses a classification system for diabetes using the K-Means algorithm. The Pima Indian Diabetes (PID) dataset is used to train and evaluate the algorithm. The unbalanced value ranges across the attributes affect the quality of the classification result, so the data need to be preprocessed, which is expected to improve the classification accuracy on the PID dataset. Two preprocessing methods are used, min-max normalization and z-score normalization, and the resulting classification accuracies are compared. Before classification, the data are split into training data and test data. The classification test using the K-Means algorithm shows that the best accuracy is obtained on the PID dataset normalized with min-max normalization, at 79%, compared with z-score normalization.
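A minimal sketch of the kind of pipeline the abstract describes, assuming a local CSV copy of the Pima Indian Diabetes dataset with an `Outcome` column. Here K-Means clusters are mapped to class labels by majority vote on the training split so the two normalizations can be compared on held-out accuracy; the paper's exact preprocessing and split are not specified, so treat this as illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Assumed local copy of the PID dataset (file name is hypothetical).
df = pd.read_csv("pima_indians_diabetes.csv")
X, y = df.drop(columns="Outcome").values, df["Outcome"].values

def kmeans_accuracy(scaler):
    X_scaled = scaler.fit_transform(X)
    X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, test_size=0.3, random_state=42)
    km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_tr)
    # Map each cluster to the majority training label so clusters act as class predictions.
    mapping = {c: np.bincount(y_tr[km.labels_ == c]).argmax() for c in range(2)}
    y_pred = np.array([mapping[c] for c in km.predict(X_te)])
    return (y_pred == y_te).mean()

print("min-max :", kmeans_accuracy(MinMaxScaler()))
print("z-score :", kmeans_accuracy(StandardScaler()))
```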


2017 ◽  
Vol 14 (5) ◽  
pp. 499-506 ◽  
Author(s):  
Marc Buyse ◽  
Pierre Squifflet ◽  
Elisabeth Coart ◽  
Emmanuel Quinaux ◽  
Cornelis JA Punt ◽  
...  

Background/aims Considerable human and financial resources are typically spent to ensure that data collected for clinical trials are free from errors. We investigated the impact of random and systematic errors on the outcome of randomized clinical trials. Methods We used individual patient data relating to response endpoints of interest in two published randomized clinical trials, one in ophthalmology and one in oncology. These randomized clinical trials enrolled 1186 patients with age-related macular degeneration and 736 patients with metastatic colorectal cancer. The ophthalmology trial tested the benefit of pegaptanib for the treatment of age-related macular degeneration and identified a statistically significant treatment benefit, whereas the oncology trial assessed the benefit of adding cetuximab to a regimen of capecitabine, oxaliplatin, and bevacizumab for the treatment of metastatic colorectal cancer and failed to identify a statistically significant treatment difference. We simulated trial results by adding errors that were independent of the treatment group (random errors) and errors that favored one of the treatment groups (systematic errors). We added such errors to the data for the response endpoint of interest for increasing proportions of randomly selected patients. Results Random errors added to up to 50% of the cases produced only slightly inflated variance in the estimated treatment effect of both trials, with no qualitative change in the p-value. In contrast, systematic errors produced bias even for very small proportions of patients with added errors. Conclusion A substantial amount of random errors is required before appreciable effects on the outcome of randomized clinical trials are noted. In contrast, even a small amount of systematic errors can severely bias the estimated treatment effects. Therefore, resources devoted to randomized clinical trials should be spent primarily on minimizing sources of systematic errors which can bias the analyses, rather than on random errors which result only in a small loss in power.
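The error-injection idea can be mimicked with a small simulation. The sketch below uses hypothetical response rates and sample sizes (not the trial data): it flips recorded binary responses either in both arms (random errors) or only in favour of the treated arm (systematic errors), then reports the resulting p-value for the treatment difference.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def trial_p_value(p_control=0.30, p_treated=0.40, n_per_arm=500,
                  error_fraction=0.10, systematic=False):
    # Simulate binary responses and corrupt a fraction of the records.
    control = rng.random(n_per_arm) < p_control
    treated = rng.random(n_per_arm) < p_treated
    n_err = int(error_fraction * n_per_arm)
    if systematic:
        # Systematic error: non-responders in the treated arm are recorded as responders.
        candidates = np.where(~treated)[0]
        idx = rng.choice(candidates, size=min(n_err, candidates.size), replace=False)
        treated[idx] = True
    else:
        # Random error: responses are flipped in both arms, independent of treatment.
        for arm in (control, treated):
            idx = rng.choice(n_per_arm, size=n_err, replace=False)
            arm[idx] = ~arm[idx]
    table = [[treated.sum(), n_per_arm - treated.sum()],
             [control.sum(), n_per_arm - control.sum()]]
    return stats.chi2_contingency(table)[1]  # p-value for the treatment difference

print("random errors     p =", trial_p_value(systematic=False))
print("systematic errors p =", trial_p_value(systematic=True))
```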


2019 ◽  
Vol 18 (03) ◽  
pp. 225-231 ◽  
Author(s):  
Arpita Agarwal ◽  
Nikhil Rastogi ◽  
KJ Maria Das ◽  
SA Yoganathan ◽  
D Udayakumar ◽  
...  

Abstract. Purpose: The purpose of this study was to evaluate the dosimetric impact of multileaf collimator (MLC) positional errors on dynamic intensity-modulated radiotherapy (IMRT) treatments through planning simulation. Secondly, the sensitivity of the IMRT MatriXX device for detecting MLC leaf positional errors was also evaluated. Materials and methods: Five dynamic IMRT plans each for brain and head-and-neck (HN) were retrospectively included. In-house software was used to introduce random errors (uniform distribution between −2·0 and +2·0 mm) and systematic errors [±0·5, ±0·75, ±1·0 and ±2·0 mm (+: open MLC error; −: close MLC error)]. The error-introduced MLC files were imported into the treatment planning system and new dose distributions were calculated. Furthermore, the dose–volume histogram files of all plans were exported to in-house software for equivalent uniform dose (EUD), tumour control probability and normal tissue complication probability calculations. The error-introduced plans were also delivered on the LINAC, and the planar fluences were measured by the IMRT MatriXX. The 3%/3 mm and 2%/2 mm γ-criteria were used for analysis. Results: In the planning simulation study, the impact of random errors was negligible and ΔEUD was <0·5±0·7% for both brain and HN. The impact of systematic errors was substantial; on average, the maximum change in EUD for systematic errors (close 2 mm) was −10·7±3·1% for brain and −15·5±2·6% for HN. Conclusions: It can be concluded that the acceptable systematic error was 0·4 mm for brain and 0·3 mm for HN. Furthermore, the IMRT MatriXX device was able to detect MLC errors ≥2 mm in HN and >3 mm in brain with the 2%/2 mm γ-criterion.
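A hedged sketch of how such leaf-position perturbations might be generated before re-import into a treatment planning system; this is not the authors' in-house tool, the leaf-bank sign convention is an assumption, and reading or writing actual MLC files is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

def add_random_error(bank_a, bank_b, max_shift_mm=2.0):
    # Independent uniform shifts in [-2, +2] mm applied to every leaf position.
    return (bank_a + rng.uniform(-max_shift_mm, max_shift_mm, bank_a.shape),
            bank_b + rng.uniform(-max_shift_mm, max_shift_mm, bank_b.shape))

def add_systematic_error(bank_a, bank_b, shift_mm=0.5):
    # Assumed convention: bank A sits at lower coordinates than bank B, so a
    # positive shift opens every leaf pair by 2*shift_mm and a negative shift closes it.
    return bank_a - shift_mm, bank_b + shift_mm

# Hypothetical leaf positions (mm) for one control point.
bank_a = np.array([-20.0, -15.0, -10.0])
bank_b = np.array([ 25.0,  18.0,  12.0])
print(add_systematic_error(bank_a, bank_b, shift_mm=-2.0))  # 2 mm "close" error
```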


2021 ◽  
Vol 5 (1) ◽  
pp. 114-122
Author(s):  
Gde Agung Brahmana Suryanegara ◽  
Adiwijaya ◽  
Mahendra Dwifebri Purbolaksono

Diabetes is a disease caused by blood sugar that is above normal limits. The number of diabetics in Indonesia has increased significantly: Basic Health Research reports that the prevalence of diabetes in Indonesia rose from 6.9% to 8.5% between 2013 and 2018, with an estimated more than 16 million sufferers. Therefore, a technology is needed that can detect diabetes with good performance and accurate analysis, so that diabetes can be treated early and the numbers of sufferers, disabilities, and deaths reduced. The differing value scales of the attributes in Gula Karya Medika's data can complicate the classification process, so the researchers use two data normalization methods, min-max normalization and z-score normalization, alongside a baseline without normalization, with Random Forest (RF) as the classification method. Random Forest has been tested as a classifier in several previous studies and is able to produce good performance with high accuracy. Based on the research results, the best accuracy is achieved by model 1 (min-max normalization with RF) at 95.45%, followed by model 2 (z-score normalization with RF) at 95%, and model 3 (no normalization with RF) at 92%. From these results, it can be concluded that model 1 (min-max normalization with RF) outperforms the other two models and raises the Random Forest classification accuracy to 95.45%.
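For illustration, a comparison loop in the spirit of the three models could look like the sketch below; synthetic data from `make_classification` stands in for the non-public Gula Karya Medika records, so the numbers it prints will not match the reported accuracies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for the clinic's records: 8 numeric attributes, binary label.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

def evaluate(scaler=None):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
    if scaler is not None:
        # Fit the scaler on the training split only, then apply it to the test split.
        X_tr = scaler.fit_transform(X_tr)
        X_te = scaler.transform(X_te)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

for name, scaler in [("min-max", MinMaxScaler()), ("z-score", StandardScaler()), ("none", None)]:
    print(name, evaluate(scaler))
```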


Author(s):  
Omar Salem Baans ◽  
Asral Bahari Jambek ◽  
Khairul Anuar Mat Said

Normalization is the process of removing systematic variation that affects measured gene expression levels in a microarray experiment. The purpose is to obtain more accurate DNA microarray results by removing systematic errors that may have occurred during the production of the DNA microarray image. In this paper, five normalization methods are discussed: global, lowess, housekeeping, quantile and print-tip. Print-tip normalization was chosen for its high accuracy (32.89 dB) and because its final MA plot was well normalized. Print-tip normalization, with a PSNR value of 33.15 dB, was adopted as the new normalization method. The results were validated using four images from a standard database of DNA microarray data. The proposed method gave more accurate results than the existing methods in terms of four parameters: MSE, PSNR, RMSE and MAE.
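The four reported image-quality metrics are straightforward to compute; the sketch below defines them for a reference image and a normalized image (the toy arrays stand in for real microarray images, and 255 is assumed as the peak signal value for PSNR).

```python
import numpy as np

def error_metrics(reference, normalized, max_value=255.0):
    """MSE, RMSE, MAE and PSNR between a reference image and a normalized one."""
    ref = reference.astype(float)
    norm = normalized.astype(float)
    mse = np.mean((ref - norm) ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(ref - norm))
    psnr = 10 * np.log10(max_value ** 2 / mse) if mse > 0 else np.inf
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "PSNR": psnr}

# Toy 8-bit images standing in for a reference and a print-tip-normalized array image.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64))
normalized = np.clip(reference + rng.normal(0, 5, size=(64, 64)), 0, 255)
print(error_metrics(reference, normalized))
```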


2017 ◽  
Vol 5 (4) ◽  
pp. 319 ◽  
Author(s):  
Adel S. Eesa ◽  
Wahab Kh. Arabo

Neural networks (NN) have been used by many researchers to solve problems in several domains, including classification and pattern recognition, and backpropagation (BP) is one of the most well-known artificial neural network models. Constructing effective NN applications relies on characteristics such as the network topology, the learning parameters, and the normalization approach for the input and output vectors. The input and output vectors for BP need to be normalized properly in order to achieve the best network performance. This paper applies several normalization methods to several UCI datasets and compares them to find the normalization method that works best with BP. Norm, decimal scaling, Mean-Mad, Median-Mad, Min-Max, and Z-score normalization are considered in this study. The comparative study shows that Mean-Mad and Median-Mad perform better than all the remaining methods, while the worst results are produced with the Norm method.
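For reference, hedged implementations of the less common of the compared transforms (decimal scaling, Mean-Mad, Median-Mad, and a vector-norm scaling) might look like this; the paper's exact definitions are not given in the abstract, so these follow the usual textbook forms.

```python
import numpy as np

def decimal_scaling(x):
    # Divide by 10**j, where j is the smallest integer making all |values| <= 1.
    j = int(np.ceil(np.log10(np.max(np.abs(x)))))
    return x / (10 ** j)

def mean_mad(x):
    # Centre on the mean and scale by the mean absolute deviation.
    return (x - np.mean(x)) / np.mean(np.abs(x - np.mean(x)))

def median_mad(x):
    # Centre on the median and scale by the median absolute deviation.
    return (x - np.median(x)) / np.median(np.abs(x - np.median(x)))

def vector_norm(x):
    # Divide by the Euclidean (L2) norm of the vector.
    return x / np.linalg.norm(x)

x = np.array([120.0, 455.0, 2300.0, 87.0, 990.0])
for f in (decimal_scaling, mean_mad, median_mad, vector_norm):
    print(f.__name__, np.round(f(x), 3))
```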


2011 ◽  
Vol 58 (4) ◽  
Author(s):  
Marcin T Schmidt ◽  
Luiza Handschuh ◽  
Joanna Zyprych ◽  
Alicja Szabelska ◽  
Agnieszka K Olejnik-Schmidt ◽  
...  

Two-color DNA microarrays are commonly used for the analysis of global gene expression. They provide information on the relative abundance of thousands of mRNAs. However, the generated data need to be normalized to minimize systematic variation so that biologically significant differences can be more easily identified. A large number of normalization procedures have been proposed and many software packages for microarray data analysis are available. Here, we applied two normalization methods (median and loess) from two microarray data analysis software packages. They were examined using a sample data set. We found that the number of genes identified as differentially expressed varied significantly depending on the method applied. The obtained results, i.e. lists of differentially expressed genes, were consistent only when we used median normalization. Loess normalization implemented in the two software packages provided less coherent and, for some probes, even contradictory results. In general, our results provide an additional piece of evidence that the normalization method can profoundly influence the final results of a DNA microarray-based analysis, and that its impact depends greatly on the algorithm employed. Consequently, the normalization procedure must be carefully considered and optimized for each individual data set.
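A minimal sketch of the two normalization strategies on two-color log-ratios, assuming M = log2(R/G) and A = average log intensity: median normalization subtracts a global constant, while loess (lowess) normalization subtracts an intensity-dependent fit. The packages used in the study are not reproduced here; `statsmodels` is used purely for illustration.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def median_normalize(M):
    # Subtract the array-wide median log-ratio so the median M becomes 0.
    return M - np.median(M)

def loess_normalize(M, A, frac=0.3):
    # Fit a lowess curve of M against A and subtract it (intensity-dependent centring).
    fitted = lowess(M, A, frac=frac, return_sorted=False)
    return M - fitted

# Toy data: A = average log intensity, M = log2 ratio with an intensity-dependent dye bias.
rng = np.random.default_rng(0)
A = rng.uniform(6, 14, 1000)
M = 0.1 * (A - 10) + rng.normal(0, 0.3, 1000)
print(np.median(median_normalize(M)), np.abs(loess_normalize(M, A)).mean())
```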


1978 ◽  
Vol 48 ◽  
pp. 7-29
Author(s):  
T. E. Lutz

This review paper deals with the use of statistical methods to evaluate the systematic and random errors associated with trigonometric parallaxes. First, the systematic errors that arise when trigonometric parallaxes are used to calibrate luminosity systems are discussed. Next, determination of the external errors of parallax measurement is reviewed, and observatory corrections are discussed. Schilt's point is emphasized: because the causes of these systematic differences between observatories are not known, the computed corrections cannot be applied appropriately. However, modern parallax work is sufficiently accurate that observatory corrections must be determined if full use is to be made of the potential precision of the data. To this end, it is suggested that a prior experimental design is required; past experience has shown that accidental overlap of observing programs will not suffice to determine observatory corrections that are meaningful.


2006 ◽  
Vol 41 (1) ◽  
pp. 72-83 ◽  
Author(s):  
Zhe Zhang ◽  
Eric R. Hall

Abstract Parameter estimation and wastewater characterization are crucial for modelling the membrane enhanced biological phosphorus removal (MEBPR) process. Prior to determining the values of a subset of kinetic and stoichiometric parameters used in ASM No. 2 (ASM2), the carbon, nitrogen and phosphorus fractions of influent wastewater at the University of British Columbia (UBC) pilot plant were characterized. It was found that the UBC wastewater contained fractions of volatile acids (SA), readily fermentable biodegradable COD (SF) and slowly biodegradable COD (XS) that fell within the ASM2 default value ranges. The contents of soluble inert COD (SI) and particulate inert COD (XI) were somewhat higher than the ASM2 default values. Mixed liquor samples from pilot-scale MEBPR and conventional enhanced biological phosphorus removal (CEBPR) processes operated under parallel conditions were then analyzed experimentally to assess the impact of operation in a membrane-assisted mode on the growth yield (YH), decay coefficient (bH) and maximum specific growth rate of heterotrophic biomass (µH). The resulting values of YH, bH and µH were slightly lower for the MEBPR train than for the CEBPR train, but the differences were not statistically significant. It is suggested that MEBPR simulation using ASM2 could be accomplished satisfactorily using parameter values determined for a conventional biological phosphorus removal process, if MEBPR-specific parameter values are not available.


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 387
Author(s):  
Yiting Liang ◽  
Yuanhua Zhang ◽  
Yonggang Li

A mechanistic kinetic model of cobalt–hydrogen electrochemical competition for the cobalt removal process in zinc hydrometallurgy was proposed. In addition, to overcome the parameter estimation difficulties arising from the model nonlinearities and the lack of information on the possible value ranges of the parameters to be estimated, a constrained guided parameter estimation scheme was derived based on the model equations and experimental data. The proposed model and parameter estimation scheme have three advantages: (i) the model reflects for the first time the mechanism of the electrochemical competition between cobalt and hydrogen ions in the cobalt removal process in zinc hydrometallurgy; (ii) the proposed constrained parameter estimation scheme does not depend on information about the possible value ranges of the parameters to be estimated; (iii) the constraint conditions provided in the scheme directly link experimental phenomenon metrics to the model parameters, thereby giving model users deeper insight into the parameters. Numerical experiments showed that the proposed constrained parameter estimation algorithm significantly improved estimation efficiency. Meanwhile, the proposed cobalt–hydrogen electrochemical competition model allowed for accurate simulation of the impact of hydrogen ions on the cobalt removal rate as well as of the trend in hydrogen ion concentration, which should be helpful for the actual cobalt removal process in zinc hydrometallurgy.
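The constrained estimation idea, decoupled from the electrochemical model itself, can be sketched with a generic first-order decay example in which constraints are written directly against observable quantities rather than against prior parameter ranges. Everything below (model, data, constraint values) is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical first-order removal model: c(t) = c0 * exp(-k * t).
t_obs = np.array([0.0, 10.0, 20.0, 40.0, 60.0])          # time (min)
c_obs = np.array([50.0, 31.0, 20.0, 8.5, 3.9])            # made-up concentrations (mg/L)

def residual_ss(theta):
    # Sum of squared residuals between model predictions and observations.
    c0, k = theta
    return np.sum((c0 * np.exp(-k * t_obs) - c_obs) ** 2)

# Constraints expressed from observed behaviour instead of prior parameter ranges:
# the predicted initial concentration must not exceed the first measurement,
# and the predicted value at t = 60 min must stay below 5 mg/L.
constraints = [
    {"type": "ineq", "fun": lambda th: c_obs[0] - th[0]},
    {"type": "ineq", "fun": lambda th: 5.0 - th[0] * np.exp(-th[1] * 60.0)},
]

result = minimize(residual_ss, x0=[40.0, 0.01], constraints=constraints, method="SLSQP")
print(result.x)  # estimated (c0, k)
```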

