imputation procedure
Recently Published Documents



Author(s):  
Gregory J. Matthews ◽  
Karthik Bharath ◽  
Sebastian Kurtek ◽  
Juliet K. Brophy ◽  
George K. Thiruvathukal ◽  
...  

We consider the problem of classifying curves when they are observed only partially on their parameter domains. We propose computational methods for (i) completion of partially observed curves; (ii) assessment of completion variability through a nonparametric multiple imputation procedure; (iii) development of nearest neighbor classifiers compatible with the completion techniques. Our contributions are founded on exploiting the geometric notion of shape of a curve, defined as those aspects of a curve that remain unchanged under translations, rotations and reparameterizations. Explicit incorporation of shape information into the computational methods plays the dual role of limiting the set of all possible completions of a curve to those with similar shape while simultaneously enabling more efficient use of training data in the classifier through shape-informed neighborhoods. Our methods are then used for taxonomic classification of partially observed curves arising from images of fossilized Bovidae teeth, obtained from a novel anthropological application concerning paleoenvironmental reconstruction.
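The shape-informed nearest-neighbor idea can be sketched in a simplified form. This sketch assumes each curve is stored as a 2D point array whose points correspond index by index. It removes only translation and rotation (ordinary Procrustes alignment) and does not handle reparameterization or curve completion, both of which the authors' shape framework addresses. The names `procrustes_distance` and `nn_classify` are hypothetical.

```python
import numpy as np


def procrustes_distance(x, y):
    """Distance between two 2D curves after removing translation and rotation.

    x, y: arrays of shape (n_points, 2); point i of x corresponds to point i of y.
    Reparameterization is NOT removed here (simplification).
    """
    x = x - x.mean(axis=0)  # remove translation
    y = y - y.mean(axis=0)
    # Orthogonal Procrustes: the rotation minimizing ||x @ r - y|| comes from the SVD of x.T @ y
    u, _, vt = np.linalg.svd(x.T @ y)
    d = np.sign(np.linalg.det(u @ vt))  # force a proper rotation (no reflection)
    r = u @ np.diag([1.0, d]) @ vt
    return float(np.linalg.norm(x @ r - y))


def nn_classify(query, train_curves, train_labels, k=1):
    """k-nearest-neighbor label for `query` under the Procrustes distance."""
    dists = np.array([procrustes_distance(query, c) for c in train_curves])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
    return labels[np.argmax(counts)].item()


# Usage: a rotated, translated ellipse is matched to the ellipse, not the circle
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
ellipse = np.column_stack([2 * np.cos(t), 0.5 * np.sin(t)])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
query = ellipse @ rot.T + np.array([3.0, -1.0])
print(nn_classify(query, [circle, ellipse], ["circle", "ellipse"]))  # ellipse
```

Because the distance ignores position and orientation, the training set does not need separate copies of each curve in every pose, which loosely mirrors the "more efficient use of training data" argument above.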


2021 ◽  
pp. 096228022110473
Author(s):  
Lauren J Beesley ◽  
Irina Bondarenko ◽  
Michael R Elliot ◽  
Allison W Kurian ◽  
Steven J Katz ◽  
...  

Multiple imputation is a well-established general technique for analyzing data with missing values. A convenient way to implement multiple imputation is sequential regression multiple imputation, also called chained equations multiple imputation. In this approach, we impute missing values using regression models for each variable, conditional on the other variables in the data. This approach, however, assumes that the missingness mechanism is missing at random, and it is not well-justified under not-at-random missingness without additional modification. In this paper, we describe how to generalize the sequential regression multiple imputation procedure to handle missingness not at random in the setting where missingness may depend on other variables that are also missing, but not on the missing variable itself, conditional on fully observed variables. We provide algebraic justification for several generalizations of standard sequential regression multiple imputation using Taylor series and other approximations of the target imputation distribution under missingness not at random. The resulting regression model approximations include indicators for missingness, interactions, or other functions of the missing-not-at-random missingness model and the observed data. In a simulation study, we demonstrate that the proposed sequential regression multiple imputation modifications reduce bias in the final analysis compared with standard sequential regression multiple imputation, with an approximation strategy that includes an offset in the imputation model performing best overall. The method is illustrated in a breast cancer study, where the goal is to estimate the prevalence of a specific pathogenic genetic variant.
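A minimal sketch of chained-equations imputation in which each regression also includes the other variable's missingness indicator, one of the generalizations listed above. It assumes one fully observed covariate `x` and two incomplete continuous variables, fits ordinary least squares, and adds Gaussian residual noise without drawing the regression parameters, so it is a simplified (improper) version rather than the authors' procedure; the offset strategy is not shown. All function names are hypothetical.

```python
import numpy as np


def impute_column(target, predictors, observed, rng):
    """One chained-equations step: fit OLS on rows where `target` is observed,
    then fill the other rows with the prediction plus Gaussian residual noise."""
    X = np.column_stack([np.ones(len(target)), predictors])
    beta, *_ = np.linalg.lstsq(X[observed], target[observed], rcond=None)
    resid = target[observed] - X[observed] @ beta
    sigma = np.sqrt(resid @ resid / (observed.sum() - X.shape[1]))
    out = target.copy()
    miss = ~observed
    out[miss] = X[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    return out


def srmi_with_indicators(x, y1, y2, n_iter=10, seed=0):
    """Impute y1 and y2 (NaN = missing) given fully observed x. Each regression also
    includes the OTHER variable's missingness indicator as a predictor."""
    rng = np.random.default_rng(seed)
    r1, r2 = ~np.isnan(y1), ~np.isnan(y2)  # True = observed
    y1f = np.where(r1, y1, np.nanmean(y1))  # crude starting values
    y2f = np.where(r2, y2, np.nanmean(y2))
    for _ in range(n_iter):
        y1f = impute_column(y1f, np.column_stack([x, y2f, r2]), r1, rng)
        y2f = impute_column(y2f, np.column_stack([x, y1f, r1]), r2, rng)
    return y1f, y2f


# Simulated data: y2 is more often missing when y1 (itself partly missing) is large
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y1 = x + rng.normal(size=n)
y2 = 0.5 * x + 0.5 * y1 + rng.normal(size=n)
y1_obs = np.where(rng.random(n) < 0.2, np.nan, y1)
y2_obs = np.where(rng.random(n) < 1 / (1 + np.exp(-y1)), np.nan, y2)

# Multiple imputation: repeat with different seeds, analyse each completed dataset, then pool
completed = [srmi_with_indicators(x, y1_obs, y2_obs, seed=s) for s in range(5)]
print([round(float(c[1].mean()), 3) for c in completed])
```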


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Cattram D. Nguyen ◽  
John B. Carlin ◽  
Katherine J. Lee

Abstract Multiple imputation is a recommended method for handling incomplete data problems. One of the barriers to its successful use is the breakdown of the multiple imputation procedure, often due to numerical problems with the algorithms used within the imputation process. These problems frequently occur when imputation models contain large numbers of variables, especially with the popular approach of multivariate imputation by chained equations. This paper describes common causes of failure of the imputation procedure including perfect prediction and collinearity, focusing on issues when using Stata software. We outline a number of strategies for addressing these issues, including imputation of composite variables instead of individual components, introducing prior information and changing the form of the imputation model. These strategies are illustrated using a case study based on data from the Longitudinal Study of Australian Children.
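The two failure causes named above can be screened for before running the imputation. This hypothetical Python sketch (the paper itself focuses on Stata) flags predictor categories in which a binary variable never varies, the situation that produces perfect prediction in logistic imputation models, and flags columns that are exact linear combinations of earlier ones, such as a composite score stored next to its components.

```python
import numpy as np


def perfect_prediction_cells(binary_var, categorical_predictor):
    """Categories of a predictor within which a binary variable never varies.
    Such cells make logistic-regression imputation models fail to converge."""
    flagged = []
    for level in np.unique(categorical_predictor):
        values = binary_var[categorical_predictor == level]
        if np.all(values == values[0]):
            flagged.append(level.item())
    return flagged


def collinear_columns(X):
    """Indices of columns that are (numerically) linear combinations of earlier columns."""
    kept, dropped = [], []
    for j in range(X.shape[1]):
        if np.linalg.matrix_rank(X[:, kept + [j]]) < len(kept) + 1:
            dropped.append(j)
        else:
            kept.append(j)
    return dropped


# Usage 1: perfect prediction of a binary variable by region
smoker = np.array([0, 0, 1, 1, 1, 0, 1, 0])
region = np.array(["A", "A", "B", "B", "B", "C", "C", "C"])
print(perfect_prediction_cells(smoker, region))  # ['A', 'B']

# Usage 2: a composite score stored next to its components is perfectly collinear
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 100))
X = np.column_stack([a, b, c, a + b + c])
print(collinear_columns(X))  # [3]
```

Dropping either the composite or its components from the imputation model removes the collinearity, which is the reasoning behind imputing composite variables instead of individual components.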


Author(s):  
Hideki Onishi ◽  
Izumi Sato ◽  
Nozomu Uchida ◽  
Takao Takahashi ◽  
Daisuke Furuya ◽  
...  

Abstract Background/Objectives: Recent studies have identified thiamine deficiency (TD) as a cause of delirium in cancer patients. However, how often Wernicke encephalopathy is present, and in which patients, is not well understood. Subjects/Methods: In this retrospective descriptive study, we investigated cancer patients who were referred to a psycho-oncologist and diagnosed with delirium, in order to determine the proportion with TD, the therapeutic effect of thiamine administration, and the factors involved in its onset. Results: Among 71 patients diagnosed with delirium by a psycho-oncologist, TD was found in 45%. Intravenous thiamine led to recovery in about 60% of these patients. We explored the factors associated with TD using a multivariable regression model with a Markov chain Monte Carlo imputation procedure. TD was associated with chemotherapy (adjusted odds ratio, 1.98 [95% confidence interval, 1.04–3.77]); there were no significant associations between TD and the other factors considered. Conclusions: TD is not particularly rare among patients with delirium who undergo psychiatric consultation, and delirium resolved in more than half of these patients after intravenous thiamine. Oncologists should consider TD as a cause of delirium in cancer patients. Further prospective studies are needed to clarify the relationship between TD and delirium in cancer patients.
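With multiple imputation, an adjusted odds ratio such as the one above is typically obtained by fitting the model in each imputed dataset and pooling with Rubin's rules. The sketch below assumes that workflow (the abstract does not state the pooling method). It uses hypothetical per-imputation log-odds estimates and forms the 95% interval with a normal approximation rather than Rubin's t reference distribution. It also checks that the reported 1.98 [1.04–3.77] corresponds to a log odds ratio of about 0.683 with a standard error of about 0.33.

```python
import math


def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total standard error across m imputations."""
    m = len(estimates)
    q_bar = sum(estimates) / m                              # pooled point estimate
    w = sum(variances) / m                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = w + (1 + 1 / m) * b
    return q_bar, math.sqrt(total)


def odds_ratio_ci(log_or, se, z=1.96):
    """Odds ratio and normal-approximation 95% CI from a log odds ratio and its SE."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical log-odds estimates for one covariate from m = 5 imputed datasets
log_ors = [0.70, 0.65, 0.71, 0.66, 0.70]
ses = [0.32] * 5
q, se = rubin_pool(log_ors, [s ** 2 for s in ses])
print(round(q, 3), round(se, 3))  # 0.684 0.321

# Back-check of the reported 1.98 [1.04-3.77]: recover the log OR and SE, rebuild the interval
log_or = math.log(1.98)
se_reported = (math.log(3.77) - math.log(1.04)) / (2 * 1.96)
print([round(v, 2) for v in odds_ratio_ci(log_or, se_reported)])  # [1.98, 1.04, 3.77]
```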


Animals ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 86
Author(s):  
Héctor Marina ◽  
Aroa Suarez-Vega ◽  
Rocío Pelayo ◽  
Beatriz Gutiérrez-Gil ◽  
Antonio Reverter ◽  
...  

Transitioning from traditional to new genotyping technologies requires the development of bridging methodologies to avoid extra genotyping costs. This study aims to identify the optimum number of single nucleotide polymorphisms (SNPs) necessary to accurately impute microsatellite markers, in order to develop a low-density SNP chip for parentage verification in the Assaf sheep breed. The accuracy of microsatellite marker imputation was assessed with three metrics, genotype concordance (C), genotype dosage (length r²), and allelic dosage (allelic r²), for all imputation scenarios tested (0.5–10 Mb microsatellite flanking SNP windows). For all haplotype lengths tested, imputation accuracy exceeded 0.90 (C), 0.80 (length r²), and 0.75 (allelic r²), indicating strong genotype concordance. The 2 Mb window provided the best accuracy for the imputation procedure and for the design of an affordable low-density SNP chip for parentage testing. We additionally evaluated imputation performance under two null models, naive (imputing the most common allele) and random (imputing a randomly selected allele), which showed much weaker genotype concordance (0.41 and 0.15, respectively). In this article, we therefore describe a precise methodology to impute multiallelic microsatellite genotypes from a low-density SNP chip in sheep, which addresses the problem of parentage verification when different genotyping platforms have been used across generations.
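Genotype concordance and the two null models can be illustrated for a single multiallelic locus. This sketch assumes genotypes are stored as pairs of allele sizes compared without regard to order, and that the random null model draws each allele uniformly from the alleles observed at the locus (the abstract does not specify the sampling distribution). The dosage r² metrics are not reproduced, and the allele sizes are made up for illustration.

```python
import random
from collections import Counter


def concordance(true_gts, imputed_gts):
    """Fraction of individuals whose imputed genotype equals the true one (unordered alleles)."""
    matches = sum(sorted(t) == sorted(i) for t, i in zip(true_gts, imputed_gts))
    return matches / len(true_gts)


def naive_null(true_gts):
    """Null model: every individual imputed as homozygous for the most common allele."""
    counts = Counter(allele for gt in true_gts for allele in gt)
    top = counts.most_common(1)[0][0]
    return [(top, top)] * len(true_gts)


def random_null(true_gts, seed=0):
    """Null model: each allele drawn uniformly from the alleles seen at the locus (assumption)."""
    rng = random.Random(seed)
    alleles = sorted({allele for gt in true_gts for allele in gt})
    return [(rng.choice(alleles), rng.choice(alleles)) for _ in true_gts]


# Hypothetical allele sizes (bp) at one microsatellite for five animals
truth = [(150, 154), (150, 150), (152, 154), (150, 156), (150, 154)]
imputed = [(154, 150), (150, 150), (152, 154), (150, 154), (150, 154)]
print(concordance(truth, imputed))            # 0.8
print(concordance(truth, naive_null(truth)))  # 0.2
print(concordance(truth, random_null(truth)))
```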


2020 ◽  
Author(s):  
Hector Marina ◽  
Aroa Suarez-Vega ◽  
Rocio Pelayo ◽  
Beatriz Gutierrez-Gil ◽  
Antonio Reverter ◽  
...  

Abstract Background: Traditional and new genotyping technologies must be combined by applying bridging methodologies that avoid double genotyping costs. This study aims to identify and evaluate a reliable approach to precisely impute microsatellite markers from SNP-chip panels for parentage verification in sheep. Moreover, we assess the optimum number of SNPs necessary to accurately impute microsatellite markers to develop a low-density SNP chip for parentage verification in the Assaf sheep breed. Results: A total of 4,423 animals of the Spanish Assaf sheep breed were genotyped for 19 microsatellites and a custom ovine 49,897-SNP array. The accuracy of microsatellite marker imputation, performed with BEAGLE v5.1 software, was assessed with three metrics, namely genotype concordance (C), genotype dosage (length r²), and allelic dosage (allelic r²), for all imputation scenarios tested (0.5–10 Mb microsatellite flanking SNP windows). For all haplotype lengths tested, imputation accuracy exceeded 0.90 (C), 0.80 (length r²), and 0.75 (allelic r²). Because the objective of the study was to identify the SNP window length that provides the best accuracy for microsatellite imputation, in order to design an affordable low-density SNP chip for parentage testing, we considered 2 Mb to be the best SNP haplotype length for further analyses (SNPs/window = 74.05, C = 0.970, length r² = 0.952, allelic r² = 0.899). We additionally evaluated imputation performance under two null models, naive and random, which showed much weaker average genotype concordance than the imputed microsatellites (0.41 and 0.15, respectively). Conclusions: We present, for the first time in dairy sheep, a precise methodology to impute multiallelic microsatellite genotypes from biallelic SNP markers.
Using a 2 Mb SNP flanking window for each microsatellite achieved high imputation accuracy while yielding a low-density SNP chip that could be cost-effective. The results of this study should help sheep breeders overcome the problem of parentage verification when different genotyping platforms have been used across generations.
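Selecting the SNPs that fall in a microsatellite's flanking window is an interval query on sorted positions. This sketch assumes the window length is the total span centred on the microsatellite (so 2 Mb means ±1 Mb) and that all positions are on the same chromosome; the abstract does not state either convention. The SNP spacing and marker positions are hypothetical, and `flanking_snps` and `mean_snps_per_window` are illustrative names.

```python
from bisect import bisect_left, bisect_right


def flanking_snps(snp_positions, marker_position, window_bp):
    """SNP positions inside a window of total length `window_bp` centred on the marker.
    `snp_positions` must be sorted and on the same chromosome as the marker."""
    half = window_bp // 2
    lo = bisect_left(snp_positions, marker_position - half)
    hi = bisect_right(snp_positions, marker_position + half)
    return snp_positions[lo:hi]


def mean_snps_per_window(snp_positions, marker_positions, window_bp):
    """Average number of flanking SNPs per marker for one window length."""
    counts = [len(flanking_snps(snp_positions, m, window_bp)) for m in marker_positions]
    return sum(counts) / len(counts)


# Hypothetical chromosome: one SNP every 25 kb from 0 to 10 Mb, two microsatellites
snps = list(range(0, 10_000_001, 25_000))
markers = [1_000_000, 9_500_000]
for window in (500_000, 2_000_000, 10_000_000):
    print(window, mean_snps_per_window(snps, markers, window))
```

Windows near a chromosome end are truncated, so the average SNP count per window can fall below what the SNP density alone would suggest.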


2020 ◽  
Vol 98 (12) ◽  
Author(s):  
Héctor Marina ◽  
Antonio Reverter ◽  
Beatriz Gutiérrez-Gil ◽  
Pamela Almeida Alexandre ◽  
Rocío Pelayo ◽  
...  

Abstract Sheep milk is mainly used to manufacture a wide variety of high-quality cheeses. The ovine cheese industry would benefit from improving, through genetic selection, traits related to milk coagulation properties (MCPs) and cheese yield, broadly denoted as “cheese-making traits.” Because routine measurement of these traits for genetic selection is expensive and time-consuming, this study aimed to evaluate the accuracy of a cheese-making phenotype imputation method based on information from official milk control records combined with milk pH. We analyzed records of milk production traits, milk composition traits, and cheese-making traits available for a total of 1,145 dairy ewes of the Spanish Assaf sheep breed. The cheese-making traits included five related to MCPs and two related to cheese yield. The milk and cheese-making phenotypes were adjusted for significant effects using a general linear model. The adjusted phenotypes were used to define a multiple-phenotype imputation procedure for the cheese-making traits based on multivariate normality and Markov chain Monte Carlo sampling. Five of the seven cheese-making traits considered achieved a prediction accuracy, computed as the correlation between the adjusted and imputed phenotypes, of at least 0.60. Notable among these were the logarithm of curd-firming time since rennet addition (logK20; 0.68), previously suggested as a candidate trait for improving cheese-making ability in this breed, and the logarithm of the ratio between rennet clotting time and curd firmness at 60 min (logRCT/A60; 0.65), which other studies have defined as an indicator of milk coagulation efficiency.
This study represents a first step toward using phenotype imputation to develop a practical methodology for the dairy sheep industry that imputes cheese-making traits based only on the analysis of a milk sample, without the need for pedigree information. This information could also be used in planning specific breeding programs, given the importance of cheese-making efficiency in dairy sheep. The study also highlights the potential of phenotype imputation to increase the sample size available for expensive, hard-to-measure phenotypes.
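The core step of a multivariate-normal phenotype imputation is drawing the unmeasured traits from their conditional distribution given the measured ones. The sketch below is a single-draw simplification of an MCMC-based procedure. It assumes the mean and covariance are estimated from a reference set with complete records, and it uses simulated data in which columns 0 to 2 stand in for milk-record traits and pH and column 3 for one cheese-making trait. Accuracy is computed, as in the abstract, as the correlation between the true and imputed values.

```python
import numpy as np


def conditional_mvn_draw(mu, sigma, obs_idx, mis_idx, x_obs, rng):
    """Draw the unobserved components of a multivariate normal given observed ones.

    x_obs: array (n, len(obs_idx)). Returns array (n, len(mis_idx)).
    """
    s_oo = sigma[np.ix_(obs_idx, obs_idx)]
    s_mo = sigma[np.ix_(mis_idx, obs_idx)]
    s_mm = sigma[np.ix_(mis_idx, mis_idx)]
    k = s_mo @ np.linalg.inv(s_oo)  # regression of missing on observed
    cond_mean = mu[mis_idx] + (x_obs - mu[obs_idx]) @ k.T
    cond_cov = s_mm - k @ s_mo.T    # Schur complement
    noise = rng.multivariate_normal(np.zeros(len(mis_idx)), cond_cov, size=len(x_obs))
    return cond_mean + noise


rng = np.random.default_rng(42)
true_cov = np.array([[1.0, 0.5, 0.3, 0.6],
                     [0.5, 1.0, 0.2, 0.5],
                     [0.3, 0.2, 1.0, 0.4],
                     [0.6, 0.5, 0.4, 1.0]])
# Reference ewes with all four (adjusted) phenotypes recorded
reference = rng.multivariate_normal(np.zeros(4), true_cov, size=500)
mu_hat = reference.mean(axis=0)
sigma_hat = np.cov(reference, rowvar=False)

# New ewes: only columns 0-2 measured; impute column 3
new = rng.multivariate_normal(np.zeros(4), true_cov, size=300)
obs_idx, mis_idx = [0, 1, 2], [3]
imputed = conditional_mvn_draw(mu_hat, sigma_hat, obs_idx, mis_idx, new[:, obs_idx], rng)[:, 0]
accuracy = np.corrcoef(new[:, 3], imputed)[0, 1]
print(round(accuracy, 2))
```

A single stochastic draw correlates less with the true value than the conditional mean does; averaging many draws, as MCMC output allows, moves the imputed value toward that mean.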


Author(s):  
Sixia Chen ◽  
David Haziza ◽  
Zeinab Mashreghi

Abstract Item nonresponse in surveys is usually dealt with through single imputation. It is well known that treating the imputed values as if they were observed values may lead to serious underestimation of the variance of point estimators. In this article, we propose three pseudo-population bootstrap schemes for estimating the variance of imputed estimators obtained after applying a multiply robust imputation procedure. The proposed procedures can handle large sampling fractions and enjoy the multiple robustness property. Results from a simulation study suggest that the proposed methods perform well in terms of relative bias and coverage probability, for both population totals and quantiles.
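A pseudo-population bootstrap can be sketched for the simplest design, simple random sampling without replacement. This is not the authors' multiply robust procedure. Deterministic linear regression imputation is swapped in for brevity, each sampled unit carries its response status into the pseudo-population, and N is assumed to be a multiple of n. Drawing without replacement from a pseudo-population of size N is what lets the scheme reflect large sampling fractions. Function names are hypothetical.

```python
import numpy as np


def impute_total(x, y, responded, N):
    """Regression-impute nonrespondents, then estimate the population total under
    simple random sampling without replacement (weight N/n). Stand-in for the
    multiply robust imputation in the abstract."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[responded], y[responded], rcond=None)
    y_imp = np.where(responded, y, X @ beta)
    return N / len(y) * y_imp.sum()


def pseudo_population_bootstrap_var(x, y, responded, N, B=500, seed=0):
    """Bootstrap variance of the imputed total via a pseudo-population of size N."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pseudo_pop = np.repeat(np.arange(n), N // n)  # each unit copied N/n times (assumes N % n == 0)
    estimates = np.empty(B)
    for b in range(B):
        draw = rng.choice(pseudo_pop, size=n, replace=False)  # SRSWOR from the pseudo-population
        # Re-apply the imputation inside every bootstrap sample
        estimates[b] = impute_total(x[draw], y[draw], responded[draw], N)
    return estimates.var(ddof=1)


rng = np.random.default_rng(3)
N, n = 5000, 250
x = rng.gamma(2.0, 2.0, size=n)
y = 3 + 2 * x + rng.normal(0, 2, size=n)
responded = rng.random(n) < 0.7
y_obs = np.where(responded, y, np.nan)

total_hat = impute_total(x, y_obs, responded, N)
se_hat = np.sqrt(pseudo_population_bootstrap_var(x, y_obs, responded, N, B=200))
print(round(total_hat), round(se_hat))
```

Re-running the imputation inside each bootstrap sample is what captures the extra variability that naive variance formulas miss when imputed values are treated as observed.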

