An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects

2014 ◽  
Vol 51 (2) ◽  
pp. 75-88 ◽  
Author(s):  
Sergio Arciniegas-Alarcón ◽  
Marisol García-Peña ◽  
Wojtek Janusz Krzanowski ◽  
Carlos Tadeu dos Santos Dias

Abstract A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a data imputation method for estimating the missing values, whose computational algorithm combined regression with a lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and by allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data from which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices, and the normalised root mean squared error between the imputations and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of missing values.
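The general idea of imputing a genotype-by-environment table through a lower-rank SVD approximation can be sketched as an iterative fill-in. The Python sketch below is a generic EM-style scheme (fill the gaps, take a rank-`rank` SVD approximation, copy it into the missing cells, repeat); it is not the regression/SVD mixture or the cross-validated weighting described above, and the function name `svd_impute`, the column-mean starting values and the stopping rule are assumptions made for illustration.

```python
import numpy as np


def svd_impute(Y, rank=1, max_iter=1000, tol=1e-10):
    """Fill the NaN cells of a genotype x environment table by iterative
    rank-`rank` SVD approximation (generic EM-style scheme, assumed here)."""
    Y = np.asarray(Y, dtype=float)
    missing = np.isnan(Y)
    X = Y.copy()
    # Starting values: environment (column) means of the observed cells.
    # Assumes every column has at least one observed value.
    col_means = np.nanmean(Y, axis=0)
    X[missing] = col_means[np.where(missing)[1]]
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        # Best rank-`rank` approximation of the current completed table.
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        change = np.max(np.abs(approx[missing] - X[missing]), initial=0.0)
        # Only the missing cells are updated; observed values stay fixed.
        X[missing] = approx[missing]
        if change < tol:
            break
    return X


if __name__ == "__main__":
    true = np.outer([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 3.0, 0.5])
    observed = true.copy()
    observed[0, 1] = np.nan
    observed[3, 2] = np.nan
    filled = svd_impute(observed, rank=1)
    print(filled[0, 1], filled[3, 2])
```

Observed cells are never altered; only the missing cells change between iterations. The rank is left as a user-chosen parameter in this sketch.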

ISRN Agronomy ◽  
2013 ◽  
Vol 2013 ◽  
pp. 1-17 ◽  
Author(s):  
Sergio Arciniegas-Alarcón ◽  
Marisol García-Peña ◽  
Wojtek Janusz Krzanowski ◽  
Carlos Tadeu dos Santos Dias

This paper proposes five new imputation methods for unbalanced experiments with genotype-by-environment interaction (G×E). The methods use cross-validation by eigenvector, based on an iterative scheme with the singular value decomposition (SVD) of a matrix. To test the methods, we performed a simulation study using three complete matrices of real data, obtained from G×E interaction trials of peas, cotton, and beans, and introduced imbalance by randomly deleting, in turn, 10%, 20%, and 40% of the values in each matrix. The quality of the imputations was evaluated with the additive main effects and multiplicative interaction (AMMI) model, using the root mean squared predictive difference (RMSPD) between the genotypic and environmental parameters of the original data set and those of the set completed by imputation. The proposed methodology makes no distributional or structural assumptions and has no restrictions regarding the pattern or mechanism of missing values.


2021 ◽  
Vol 4 (3) ◽  
pp. 62
Author(s):  
Sergio Arciniegas-Alarcón ◽  
Marisol García-Peña ◽  
Camilo Rengifo ◽  
Wojtek J. Krzanowski

We describe outlier-resistant imputation strategies, obtained by modifying the simple imputation method proposed by Krzanowski, and assess their performance. The strategies use a robust singular value decomposition, do not depend on distributional or structural assumptions, and have no restrictions as to the pattern or mechanism of missing data. They are tested by simulating contamination and imbalance, both in artificially generated matrices and in a matrix of real data from an experiment with genotype-by-environment interaction. Their performance is assessed by means of prediction errors, the squared cosine between matrices, and a quality-of-fit coefficient between imputations and true values. For small matrices, the best results are obtained by applying the robust decomposition directly, while for larger matrices the highest quality is obtained by eliminating the singular values from the imputation equation.
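One way to read the squared cosine between two matrices is as the squared cosine of the angle between them when each matrix is treated as a long vector under the Frobenius inner product. The sketch below assumes that definition (the paper may use a variant), and the helper `prediction_error`, a root mean squared difference over the imputed cells, is an illustrative choice rather than the authors' exact error measure.

```python
import numpy as np


def squared_cosine(A, B):
    """Squared cosine of the angle between matrices A and B under the
    Frobenius inner product <A, B> = trace(A^T B) (definition assumed)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inner = np.sum(A * B)
    return inner**2 / (np.sum(A * A) * np.sum(B * B))


def prediction_error(true, imputed, missing):
    """Root mean squared difference over the cells flagged True in `missing`
    (an illustrative error measure, not necessarily the paper's)."""
    true = np.asarray(true, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    diff = true[missing] - imputed[missing]
    return float(np.sqrt(np.mean(diff**2)))
```

A value of 1 means the two matrices are proportional; because the measure is scale-free, it is natural to report it alongside an absolute error such as `prediction_error`.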


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0233200
Author(s):  
Michel Henriques de Souza ◽  
José Domingos Pereira Júnior ◽  
Skarlet De Marco Steckling ◽  
Jussara Mencalha ◽  
Fabíola dos Santos Dias ◽  
...  

The evaluation of cultivars using multi-environment trials (MET) is an important step in plant breeding programs. One of the objectives of these evaluations is to understand the genotype-by-environment interaction (GEI). One method of determining the effect of GEI on cultivar performance is based on studies of adaptability and stability. Initial studies were based on linear regression; however, these methodologies have limitations, mainly in trials with genetic or statistical imbalance, heterogeneity of residual variances, and genetic covariance. An alternative is the use of random regression models (RRM), in which the behavior of the genotypes is characterized as a reaction norm using longitudinal data or repeated measurements and information regarding a covariance function. The objective of this work was to apply RRM, based on Legendre polynomials and genotype-ideotype distances, to study the behavior of common bean cultivars in an MET. We used a set of 13 trials, which were classified as unfavorable or favorable environments. The results revealed that RRM enables the genotypic values of cultivars to be predicted with high accuracy in environments where they were not evaluated, thereby circumventing the imbalance of the experiments. From these values, it was possible to measure genotypic adaptability relative to ideotypes, based on their reaction norms. In addition, the stability of the cultivars can be interpreted as variation in the behavior of the ideotype. The use of ideotypes based on real data allowed a better comparison of cultivar performance across environments. RRM is a good alternative in plant breeding for understanding the behavior of cultivars in an MET, especially when the aim is to quantify the adaptability and stability of genotypes.
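Random regression models built on Legendre polynomials typically rescale the covariate to [-1, 1] and use normalised polynomials phi_k(t) = sqrt((2k+1)/2) * P_k(t) as regressors. The sketch below computes one such row of regressors with Bonnet's recursion; the covariate (for example an environmental index), its range and the polynomial order are assumptions, since the abstract does not state them.

```python
import math


def standardize(x, x_min, x_max):
    """Map a covariate value from [x_min, x_max] onto [-1, 1]."""
    return -1.0 + 2.0 * (x - x_min) / (x_max - x_min)


def normalized_legendre(t, order):
    """Return [phi_0(t), ..., phi_order(t)], phi_k = sqrt((2k + 1) / 2) * P_k(t)."""
    p = [1.0, t]
    for k in range(1, order):
        # Bonnet's recursion: (k + 1) P_{k+1} = (2k + 1) t P_k - k P_{k-1}
        p.append(((2 * k + 1) * t * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt((2 * k + 1) / 2.0) * p[k] for k in range(order + 1)]


if __name__ == "__main__":
    # Hypothetical covariate value 12.0 on a hypothetical range [5.0, 20.0].
    t = standardize(12.0, 5.0, 20.0)
    print(normalized_legendre(t, 2))
```

Each cultivar's reaction norm is then a linear combination of these regressors with cultivar-specific random coefficients.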


2018 ◽  
Author(s):  
Md. Bahadur Badsha ◽  
Rui Li ◽  
Boxiang Liu ◽  
Yang I. Li ◽  
Min Xian ◽  
...  

Background: Single-cell RNA-sequencing (scRNA-seq) is a rapidly evolving technology that enables measurement of gene expression levels at an unprecedented resolution. Despite the explosive growth in the number of cells that can be assayed by a single experiment, scRNA-seq still has several limitations, including high rates of dropouts, which result in a large number of genes having zero read count in the scRNA-seq data and complicate downstream analyses.
Methods: To overcome this problem, we treat zeros as missing values and develop nonparametric deep learning methods for imputation. Specifically, our LATE (Learning with AuToEncoder) method trains an autoencoder with random initial values of the parameters, whereas our TRANSLATE (TRANSfer learning with LATE) method further allows for the use of a reference gene expression data set to provide LATE with an initial set of parameter estimates.
Results: On both simulated and real data, LATE and TRANSLATE outperform existing scRNA-seq imputation methods, achieving lower mean squared error in most cases, recovering nonlinear gene-gene relationships, and better separating cell types. They are also highly scalable and can efficiently process over 1 million cells in just a few hours on a GPU.
Conclusions: We demonstrate that our nonparametric approach to imputation based on autoencoders is powerful and highly efficient.


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 573
Author(s):  
Elisabetta Peri ◽  
Lin Xu ◽  
Christian Ciccarelli ◽  
Nele L. Vandenbussche ◽  
Hongji Xu ◽  
...  

A new algorithm based on singular value decomposition (SVD) to remove cardiac contamination from trunk electromyography (EMG) is proposed. Its performance is compared with currently available algorithms at different signal-to-noise ratios (SNRs). The algorithm is applied to individual channels. An experimental calibration curve to adjust the number of SVD components to the SNR (0–20 dB) is proposed. A synthetic dataset is generated by combining electrocardiography (ECG) and EMG to establish a ground-truth reference for validation. The performance is compared with state-of-the-art algorithms: gating, high-pass filtering, template subtraction (TS), and independent component analysis (ICA). Its applicability to real data is investigated on an illustrative diaphragm EMG recording from a patient with sleep apnea. The SVD-based algorithm outperforms existing methods in reconstructing trunk EMG. It is superior to the others in both the time domain (relative mean squared error < 15%) and the frequency domain (shift in mean frequency < 1 Hz). Its feasibility is demonstrated on diaphragm EMG, which shows better agreement with the respiratory cycle (correlation coefficient = 0.81, p-value < 0.01) than TS and ICA. Its application to real data is promising for the non-obtrusive estimation of respiratory effort in sleep-related breathing disorders. The algorithm does not require an additional reference ECG, which increases its applicability in clinical practice.
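A common way to apply SVD to a single channel is to embed the signal in a Hankel (trajectory) matrix, as in singular spectrum analysis, and discard the leading components, which are assumed here to be dominated by the high-amplitude cardiac activity. The sketch below follows that generic approach; it is not the authors' algorithm, and `window` and `n_remove` are hypothetical parameters (the paper instead sets the number of components from an SNR calibration curve).

```python
import numpy as np


def remove_leading_components(signal, window, n_remove):
    """Suppress the `n_remove` largest SVD components of one channel.

    Generic singular-spectrum-style sketch (assumed, not the paper's method):
    embed, decompose, drop leading triplets, map back by anti-diagonal averaging.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    k = n - window + 1
    # Hankel trajectory matrix: H[i, j] = x[i + j], shape (window, k).
    H = np.array([x[i:i + k] for i in range(window)])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    s_kept = s.copy()
    s_kept[:n_remove] = 0.0  # discard the components assumed to be cardiac
    H_clean = (U * s_kept) @ Vt
    # Anti-diagonal averaging turns the matrix back into a length-n signal.
    out = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        out[i:i + k] += H_clean[i]
        counts[i:i + k] += 1.0
    return out / counts
```

With `n_remove=0` the input is reproduced exactly, which is a convenient sanity check; how many components to drop is the design decision the calibration curve in the abstract is meant to settle.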


2017 ◽  
Vol 10 (04) ◽  
pp. 773-779
Author(s):  
V.B. Kamble ◽  
S.N. Deshmukh

The presence of missing values in a dataset makes data analysis in data mining tasks difficult. In this work, a student dataset containing the marks of four different subjects in an engineering college is used. Mean, mode and median imputation were used to deal with the incomplete data. The mean squared error (MSE) and root mean squared error (RMSE) were computed on the dataset for the proposed method and for mean, mode and median imputation, and the accuracy of the proposed method combined with imputation was also determined. Experimental observations showed that MSE and RMSE gradually decrease as the size of the database increases when the proposed method is used, whereas they gradually increase with database size under simple imputation. Accuracy also increases as the size of the database increases.
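Mean, median and mode imputation, together with the MSE and RMSE over the deleted cells, can be sketched in a few lines of Python. The data layout (one list of marks per subject, with `None` marking a missing mark) and the helper names are assumptions; the paper's proposed method is not described in enough detail to reproduce here.

```python
import math
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median or mode of the observed ones."""
    observed = [v for v in values if v is not None]
    fillers = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    if strategy not in fillers:
        raise ValueError(f"unknown strategy: {strategy!r}")
    fill = fillers[strategy](observed)
    return [fill if v is None else v for v in values]


def mse_rmse(true_values, imputed_values, missing_idx):
    """MSE and RMSE over the positions that were deleted before imputation."""
    sq_errors = [(true_values[i] - imputed_values[i]) ** 2 for i in missing_idx]
    mse = sum(sq_errors) / len(sq_errors)
    return mse, math.sqrt(mse)


if __name__ == "__main__":
    true_marks = [60, 72, 72, 85, 90, 72]     # toy marks for one subject
    with_gaps = [60, None, 72, 85, None, 72]  # positions 1 and 4 deleted
    for strategy in ("mean", "median", "mode"):
        filled = impute(with_gaps, strategy)
        print(strategy, mse_rmse(true_marks, filled, [1, 4]))
```

For example, with true marks [60, 72, 72, 85, 90, 72] and positions 1 and 4 deleted, median imputation fills both gaps with 72.0, giving an MSE of 162.0 and an RMSE of about 12.73.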


2014 ◽  
Vol 51 (2) ◽  
pp. 89-102 ◽  
Author(s):  
Kuang Hongyu ◽  
Marisol García-Peña ◽  
Lúcio Borges de Araújo ◽  
Carlos Tadeu dos Santos Dias

Abstract The genotype-by-environment interaction (GEI) influences the selection and recommendation of cultivars. The aim of this work is to study the effect of GEI and to evaluate the adaptability and stability of the productivity (kg/ha) of nine maize genotypes using the additive main effects and multiplicative interaction (AMMI) model. The AMMI model is one of the most widely used statistical tools in the analysis of multiple-environment trials. It has two purposes, namely understanding complex GEI and increasing accuracy. In AMMI analysis, the data are represented by a two-way table of GEI means. For complete tables, least squares estimation for the AMMI model is equivalent to fitting an additive two-way ANOVA model for the main effects and applying a singular value decomposition to the interaction residuals. This implicitly assumes equal weights for all GEI means. The experiments were conducted in twenty environments, using a randomized complete block design with four replicates. The AMMI model identified the best combinations of genotypes and environments with respect to the response variable. This paper concerns a basic and common application of AMMI: yield-trial analysis without consideration of special structure or additional data for either genotypes or environments.
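The least-squares equivalence for complete tables (additive two-way fit for the main effects, then an SVD of the interaction residuals) translates directly into code. In this sketch the singular values are split symmetrically between genotype and environment scores, a common biplot convention assumed here; the function name `ammi` and its return layout are illustrative.

```python
import numpy as np


def ammi(Y, n_axes=1):
    """Least-squares AMMI fit for a complete genotype x environment table of means."""
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean()
    g = Y.mean(axis=1) - mu  # genotype main effects
    e = Y.mean(axis=0) - mu  # environment main effects
    # Interaction residuals from the additive two-way model (double-centred table).
    resid = Y - mu - g[:, None] - e[None, :]
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    root_s = np.sqrt(s[:n_axes])
    gen_scores = U[:, :n_axes] * root_s     # genotype IPCA scores
    env_scores = Vt[:n_axes, :].T * root_s  # environment IPCA scores
    fitted = mu + g[:, None] + e[None, :] + gen_scores @ env_scores.T
    total_ss = np.sum(s**2)
    # Share of the GEI sum of squares captured by each retained axis.
    ss_share = s[:n_axes] ** 2 / total_ss if total_ss > 0 else np.zeros(n_axes)
    return {"mu": mu, "g": g, "e": e, "gen_scores": gen_scores,
            "env_scores": env_scores, "fitted": fitted, "ss_share": ss_share}
```

Because every cell enters the means and the SVD with the same weight, this fit embodies the equal-weights assumption mentioned in the abstract.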


Author(s):  
Om Prakash Yadav ◽  
A. K. Razdan ◽  
Bupesh Kumar ◽  
Praveen Singh ◽  
Anjani K. Singh

Genotype-by-environment interaction (GEI) of 18 barley varieties was assessed during two successive rabi crop seasons in order to identify high-yielding and stable barley varieties. AMMI analysis showed that genotypes (G), environment (E) and GEI accounted for 1672.35, 78.25 and 20.51 of total variance, respectively. Partitioning of the sum of squares due to GEI revealed significance of the first interaction principal component axis (IPCA1) only. On the basis of AMMI biplot analysis, DWRB 137 (41.03 q ha⁻¹), RD 2715 (32.54 q ha⁻¹), BH 902 (37.53 q ha⁻¹) and RD 2907 (33.29 q ha⁻¹) exhibited grain yield superiority of 64.45, 30.42, 50.42 and 33.42 per cent, respectively, over the farmers’ recycled variety (24.43 q ha⁻¹).


Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, the estimation of the probability density function and the cumulative distribution function of this distribution is considered using five different estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared on the basis of the mean squared error (MSE) through numerical simulations. Simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS and PC. Finally, a real data set is analyzed.
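For reference, the generalized inverted exponential distribution with shape alpha and scale lam has F(x) = 1 - (1 - exp(-lam/x))^alpha and f(x) = (alpha*lam/x^2) exp(-lam/x) (1 - exp(-lam/x))^(alpha - 1) for x > 0. The sketch below computes ML estimates of (alpha, lam) using the closed-form alpha_hat = -n / sum(log(1 - exp(-lam/x_i))) for fixed lam and a golden-section search over the resulting profile log-likelihood. Plugging the estimates into `gie_pdf` and `gie_cdf` gives ML estimators of the density and distribution function by invariance. The search interval, the unimodality assumption and all function names are choices made for this illustration; the UMVU, LS, WLS and PC estimators are not shown.

```python
import math
import random


def _log_one_minus_exp(lam, x):
    """log(1 - exp(-lam / x)), computed stably via expm1."""
    return math.log(-math.expm1(-lam / x))


def gie_pdf(x, alpha, lam):
    """Density (alpha*lam/x^2) exp(-lam/x) (1 - exp(-lam/x))^(alpha-1), x > 0."""
    return (alpha * lam / x**2) * math.exp(-lam / x) * math.exp(
        (alpha - 1.0) * _log_one_minus_exp(lam, x))


def gie_cdf(x, alpha, lam):
    """Distribution function 1 - (1 - exp(-lam/x))^alpha, x > 0."""
    return 1.0 - math.exp(alpha * _log_one_minus_exp(lam, x))


def gie_sample(n, alpha, lam, rng):
    """Draw n values by inverting F (u kept strictly inside (0, 1))."""
    out = []
    for _ in range(n):
        u = rng.uniform(1e-12, 1.0 - 1e-12)
        out.append(-lam / math.log(1.0 - (1.0 - u) ** (1.0 / alpha)))
    return out


def alpha_hat(data, lam):
    """Closed-form ML estimate of alpha for a fixed lam."""
    return -len(data) / sum(_log_one_minus_exp(lam, x) for x in data)


def profile_loglik(data, lam):
    """Log-likelihood evaluated at (alpha_hat(lam), lam)."""
    n = len(data)
    s = sum(_log_one_minus_exp(lam, x) for x in data)
    a = -n / s
    return (n * math.log(a * lam)
            - sum(2.0 * math.log(x) + lam / x for x in data)
            + (a - 1.0) * s)


def gie_mle(data, lo=1e-3, hi=50.0, tol=1e-6):
    """ML estimates (alpha, lam) by golden-section search of the profile
    log-likelihood over [lo, hi]; assumes it is unimodal there."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = profile_loglik(data, c), profile_loglik(data, d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = profile_loglik(data, c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = profile_loglik(data, d)
    lam = 0.5 * (a + b)
    return alpha_hat(data, lam), lam


if __name__ == "__main__":
    data = gie_sample(2000, 2.0, 1.5, random.Random(1))
    a_hat, l_hat = gie_mle(data)
    # ML (plug-in) estimates of the density and distribution function at x = 1.0
    print(a_hat, l_hat, gie_pdf(1.0, a_hat, l_hat), gie_cdf(1.0, a_hat, l_hat))
```

Profiling out alpha reduces the two-parameter maximisation to a one-dimensional search, which is why no general-purpose optimiser is needed here.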

