data contamination
Recently Published Documents

TOTAL DOCUMENTS: 55 (FIVE YEARS: 22)
H-INDEX: 8 (FIVE YEARS: 2)

2021 ◽ Vol 12
Author(s): Xi Lu, Kun Fan, Jie Ren, Cen Wu

In high-throughput genetics studies, an important aim is to identify gene–environment (G×E) interactions associated with clinical outcomes. Recently, multiple marginal penalization methods have been developed and shown to be effective in G×E studies. Within the Bayesian framework, however, marginal variable selection has not received much attention. In this study, we propose a novel marginal Bayesian variable selection method for G×E studies. In particular, our marginal Bayesian method is robust to data contamination and outliers in the outcome variables. With the incorporation of spike-and-slab priors, we have implemented a Gibbs sampler based on Markov chain Monte Carlo (MCMC). The proposed method outperforms a number of alternatives in extensive simulation studies. The utility of the marginal robust Bayesian variable selection method is further demonstrated in case studies using data from the Nurses' Health Study (NHS). Some of the identified main and interaction effects from the real data analysis have important biological implications.
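To make the spike-and-slab idea concrete, the sketch below implements a generic Gibbs update for a single standardized predictor with known error variance: the inclusion indicator compares the marginal likelihood with the coefficient integrated over the slab against the spike at zero, and the coefficient is then drawn from its conjugate posterior or set exactly to zero. This is only an illustration of the prior structure mentioned in the abstract, not the authors' robust marginal G×E sampler; the names and hyperparameters (`slab_sd`, `prior_inclusion`) are placeholders.

```python
import numpy as np

def spike_slab_gibbs(y, x, n_iter=2000, sigma2=1.0, slab_sd=1.0, prior_inclusion=0.5):
    """Gibbs updates for y = x*beta + e with a spike-and-slab prior on beta:
    beta ~ (1-gamma)*delta_0 + gamma*N(0, slab_sd^2), gamma ~ Bernoulli(prior_inclusion).
    Generic single-predictor sketch with known error variance sigma2."""
    rng = np.random.default_rng(0)
    xtx, xty = x @ x, x @ y
    gamma_draws, beta_draws = [], []
    for _ in range(n_iter):
        # Conjugate posterior of beta given inclusion (gamma = 1).
        post_var = 1.0 / (xtx / sigma2 + 1.0 / slab_sd**2)
        post_mean = post_var * xty / sigma2
        # Log Bayes factor of slab vs. spike with beta integrated out.
        log_bf = 0.5 * (np.log(post_var) - np.log(slab_sd**2)) + 0.5 * post_mean**2 / post_var
        odds = prior_inclusion / (1.0 - prior_inclusion) * np.exp(log_bf)
        gamma = rng.random() < odds / (1.0 + odds)
        # Draw beta from its posterior if included, otherwise set it exactly to zero.
        beta = rng.normal(post_mean, np.sqrt(post_var)) if gamma else 0.0
        gamma_draws.append(gamma)
        beta_draws.append(beta)
    return np.mean(gamma_draws), np.mean(beta_draws)

# Toy usage: a predictor with a real effect gets a posterior inclusion probability near 1.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(size=200)
print(spike_slab_gibbs(y, x))
```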


Mathematics ◽ 2021 ◽ Vol 9 (19) ◽ pp. 2394
Author(s): Kang-Ping Lu, Shao-Tung Chang

Regression models with change-points have been widely applied in various fields. Most methodologies for change-point regression assume Gaussian errors. For many real data sets with longer-than-normal tails or atypical observations, the use of normal errors may unduly affect the fit of change-point regression models. This paper proposes two robust algorithms, EMT and FCT, for change-point regression, obtained by incorporating the t-distribution into the expectation-maximization (EM) algorithm and the fuzzy classification procedure, respectively. For better resistance to high-leverage outliers, we introduce a modified version of the proposed method, which fits the t change-point regression model to the data after moderately pruning high-leverage points. The selection of the degrees of freedom is discussed, and the robustness properties of the proposed methods are analyzed and validated. Simulation studies show the effectiveness and resistance of the proposed methods against outliers and heavy-tailed distributions. Extensive experiments demonstrate that the t-based approach is preferable to normal-based methods in terms of robustness and computational efficiency. EMT and FCT generally work well, and FCT consistently yields less biased estimates, especially in cases of data contamination. Real examples show the need for, and the practicability of, the proposed methods.
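The robustness mechanism behind t-distributed errors can be illustrated with the standard EM (iteratively reweighted least squares) fit of an ordinary linear regression with t errors: the E-step weight (nu + 1) / (nu + r^2 / sigma^2) shrinks the influence of large residuals. The sketch below shows only this generic weighting step under an assumed fixed degrees of freedom `nu`; it is not the EMT or FCT change-point algorithm itself.

```python
import numpy as np

def t_regression_em(X, y, nu=4.0, n_iter=50):
    """EM / iteratively reweighted least squares for y = X beta + e with
    t_nu-distributed errors. Large residuals receive small weights, which is
    the core robustness mechanism of t-based fits."""
    n, _ = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS starting values
    sigma2 = np.mean((y - X @ beta) ** 2)
    for _ in range(n_iter):
        r = y - X @ beta
        # E-step: expected latent scale per observation; outliers are downweighted.
        w = (nu + 1.0) / (nu + r**2 / sigma2)
        # M-step: weighted least squares and weighted scale update.
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        sigma2 = np.sum(w * (y - X @ beta) ** 2) / n
    return beta, sigma2

# Toy usage: a few gross outliers barely move the t-based fit.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=100)
y[:5] += 25.0                                          # contaminate a few responses
print(t_regression_em(X, y)[0])                        # close to [1, 2]
```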


2021
Author(s): Ying Wang, Hao Yuan, Junman Huang, Chenhong Li

High-throughput sequencing involves library preparation and amplification steps, which may introduce contamination across samples or between samples and the environment. We tested an inline-index strategy, in which 6 bp DNA indices were added to both ends of the inserts at the ligation step of library preparation, for resolving the data contamination problem. Our results showed that contamination ranged from 0.29% to 1.25% in one experiment and from 0.83% to 27.01% in the other. We also found that, besides cross-contamination between samples, contamination could come from the environment or from reagents. The inline-index method is a useful experimental design for cleaning up the data and addressing the contamination problem that has plagued high-throughput sequencing in many applications.
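A hypothetical sketch of how paired inline indices can expose contamination: each read pair is attributed to a sample only when the 6 bp indices at both ends match an expected combination, and any unexpected pairing is counted toward a contamination estimate. The index sequences, sample names, and read layout below are invented placeholders, not the authors' actual design.

```python
# Hypothetical sketch: with inline indices ligated to both ends of each insert,
# a read pair is attributed to a sample only when its (left, right) index pair
# matches an expected combination; any other pair is counted as contamination.
from collections import Counter

EXPECTED = {("ACGTAC", "TTGCAA"): "sample_1",   # placeholder index pairs
            ("GGATCC", "CATGGA"): "sample_2"}

def classify_reads(read_pairs):
    """read_pairs: iterable of (read1_seq, read2_seq) strings."""
    counts = Counter()
    for r1, r2 in read_pairs:
        pair = (r1[:6], r2[:6])                 # inline indices occupy the first 6 bp
        counts[EXPECTED.get(pair, "contaminant_or_error")] += 1
    return counts

reads = [("ACGTACTTTTGGCA", "TTGCAAGGGGTTAC"),  # consistent indices: sample_1
         ("ACGTACTTTTGGCA", "CATGGAGGGGTTAC")]  # mixed indices: flagged
counts = classify_reads(reads)
total = sum(counts.values())
print(counts, f"contamination rate = {counts['contaminant_or_error'] / total:.2%}")
```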


2021 ◽ Vol 19 ◽ pp. 338-343
Author(s): A. Insuasty, C. Tutivén, Y. Vidal

This work proposes a fault prognosis methodology that predicts a main bearing fault several months in advance and lets turbine operators plan ahead. Reducing downtime is of paramount importance in the wind energy industry because of the energy losses it causes. The main advantages of the proposed methodology are as follows: i) it is an unsupervised approach, so it does not require faulty data for training; ii) it is based only on exogenous data and one representative temperature close to the subsystem to be diagnosed, thus avoiding data contamination; iii) it accomplishes the prognosis of the main bearing fault several months in advance; and iv) the validity and performance of the methodology are demonstrated on a real wind turbine in production.
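One common way to realize such an unsupervised, exogenous-data-only scheme is a normal-behaviour model: predict the main-bearing temperature from exogenous inputs using healthy-period data only, then monitor the prediction residual, whose sustained drift serves as the early-warning indicator. The sketch below illustrates that generic pattern with simulated data and an ordinary linear model; the variables, window length, and model choice are assumptions, not the methodology validated in the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_normal_behaviour(X_healthy, temp_healthy):
    """Fit a normal-behaviour model of the main-bearing temperature from
    exogenous SCADA inputs (e.g., ambient temperature, wind speed), using
    healthy data only, so no faulty data are required for training."""
    return LinearRegression().fit(X_healthy, temp_healthy)

def residual_indicator(model, X_new, temp_new, window=144):
    """Rolling mean of prediction residuals; a sustained positive drift is the
    early-warning indicator in this sketch (thresholds are application-specific)."""
    resid = temp_new - model.predict(X_new)
    kernel = np.ones(window) / window
    return np.convolve(resid, kernel, mode="valid")

# Toy usage with simulated 10-minute SCADA data.
rng = np.random.default_rng(0)
X_healthy = rng.normal(size=(5000, 2))                       # exogenous inputs
temp_healthy = 40 + 3 * X_healthy[:, 0] + rng.normal(size=5000)
model = fit_normal_behaviour(X_healthy, temp_healthy)

X_new = rng.normal(size=(2000, 2))
drift = np.linspace(0, 4, 2000)                              # slowly developing fault
temp_new = 40 + 3 * X_new[:, 0] + drift + rng.normal(size=2000)
print(residual_indicator(model, X_new, temp_new)[-1])        # clearly above healthy level
```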


2021 ◽ pp. 107699862199436
Author(s): Yue Liu, Hongyun Liu

The prevalence and serious consequences of noneffortful responses from unmotivated examinees are well known in educational measurement. In this study, we propose an iterative purification process based on a response time residual method with fixed item parameter estimates to detect noneffortful responses. The proposed method is compared with the traditional residual method and a noniterative method with fixed item parameters in two simulation studies, in terms of noneffort detection accuracy and parameter recovery. The results show that when the severity of noneffort is high, the proposed method leads to a much higher true positive rate with only a small increase in the false discovery rate. In addition, parameter estimation is significantly improved by the strategies of fixing item parameters and iterative cleansing. These results suggest that the proposed method is a potential solution for reducing the impact of data contamination due to severely low test-taking effort and for obtaining more accurate parameter estimates. An empirical study is also conducted to show the differences in detection rates and parameter estimates among the approaches.
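The sketch below illustrates one plausible form of a response-time residual method with iterative purification, assuming a lognormal response-time model with fixed item parameters: responses whose standardized log-time residuals fall far below expectation are flagged as noneffortful, person speed is re-estimated from the remaining responses, and the loop repeats until the flag set stabilizes. The parameterization, threshold, and estimation details are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def purify_noneffort(log_rt, beta, alpha, z_cut=-1.96, max_iter=20):
    """Iterative purification under a lognormal response-time model with fixed
    item parameters (beta = item time intensity, alpha = item time precision).
    log_rt: (n_persons, n_items) log response times. Returns a boolean flag
    matrix of suspected noneffortful (too-fast) responses."""
    flags = np.zeros_like(log_rt, dtype=bool)
    for _ in range(max_iter):
        masked = np.ma.masked_array(log_rt, mask=flags)
        # Re-estimate each person's speed from currently unflagged responses only.
        tau = np.ma.mean(beta - masked, axis=1).filled(0.0)
        # Standardized residual of log time under the lognormal model.
        resid = alpha * (log_rt - (beta - tau[:, None]))
        new_flags = resid < z_cut            # far faster than expected: noneffort
        if np.array_equal(new_flags, flags): # stop once the flag set stabilizes
            break
        flags = new_flags
    return flags

# Toy usage: one examinee rushing through the last items gets flagged.
rng = np.random.default_rng(0)
beta = rng.normal(4.0, 0.3, size=20)         # fixed item time intensities
alpha = np.full(20, 2.0)                     # fixed item time precisions
log_rt = beta + rng.normal(0, 1 / alpha, size=(50, 20))
log_rt[0, 10:] -= 2.0                        # simulated rapid guessing
print(purify_noneffort(log_rt, beta, alpha)[0])
```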


Author(s): Xuehong Gao, Can Cui

To determine the optimal warehouse location, it is usually assumed that the collected data are uncontaminated. However, this assumption is easily violated because of the uncertain environment and human error in disaster response, which results in biased estimation of the optimal warehouse location. In this study, we investigate this possibility by examining the effects of such contamination on warehouse location determination. Considering different distance measures, we propose corresponding estimation methods that remedy the difficulties associated with data contamination when determining the warehouse location. Although the motivating setting is data contamination in the event of a disaster, the findings of the study are much broader and apply to any situation in which outliers exist. Through simulations and illustrative examples, we show that solving the problem with the center of gravity leads to biased solutions even if only one outlier exists in the data. Compared with the center of gravity, the proposed methods are quite efficient and outperform existing methods when data contamination is involved.
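The sensitivity of the center of gravity to a single outlier, and the behaviour of a robust alternative, can be illustrated for the Euclidean-distance case: the center of gravity is the (weighted) mean of the demand points, while the geometric median, computed here with Weiszfeld's algorithm, minimizes the sum of distances and barely moves under contamination. This is a generic illustration of the phenomenon described in the abstract, not the authors' proposed estimators.

```python
import numpy as np

def center_of_gravity(points, weights=None):
    """Classical center-of-gravity location: the (weighted) mean of demand points."""
    return np.average(points, axis=0, weights=weights)

def geometric_median(points, n_iter=200, eps=1e-9):
    """Weiszfeld's algorithm for the geometric median, a robust alternative that
    minimizes the sum of Euclidean distances to the demand points."""
    x = np.median(points, axis=0)                      # robust starting value
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(points - x, axis=1), eps)  # avoid division by zero
        x_new = (points / d[:, None]).sum(axis=0) / (1.0 / d).sum()
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
    return x

# Toy usage: a single contaminated point shifts the center of gravity noticeably
# but leaves the geometric median essentially unchanged.
rng = np.random.default_rng(0)
demand = rng.normal(loc=[10.0, 10.0], scale=1.0, size=(30, 2))
contaminated = np.vstack([demand, [[500.0, 500.0]]])   # one outlier
print(center_of_gravity(contaminated))                 # dragged toward the outlier
print(geometric_median(contaminated))                  # stays near (10, 10)
```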


Author(s): Manju Mohan, RN, RM; Linda Varghese, RN, RM

Background: Reflexology may help induce labour and reduce pain during childbirth. Fear of pain associated with childbirth leads to an increase in the irregular use of the cesarean method. Purpose: This study was performed to evaluate the effect of reflexology on relieving labour pain and to assess recipients' opinions regarding foot reflexology. Setting: The study took place in the labour room of the Amrita Institute of Medical Sciences, Kerala, South India. Participants: 50 primigravida patients experiencing labour. Research Design: A quasi-experimental study design was used. Subjects were selected by a convenience sampling technique, with the first 25 patients allocated to the experimental group and the subsequent 25 primigravida mothers to a time-control group, to avoid data contamination. Intervention: The intervention consisted of foot reflexology applied by a trained therapist to five pressure points on both feet that correspond to the uterus. Total intervention time was 20 minutes. The control group rested quietly for 20 minutes to serve as a time control. Main Outcome Measure(s): Pain associated with labour was recorded on a visual analogue scale immediately prior to the intervention and at 20 and 40 minutes postintervention. Patient satisfaction with the reflexology treatment was recorded. Results: The mean pain score in the foot reflexology group was significantly reduced across the study timeframe relative to the control group (p < .001). Post hoc tests confirmed a reduction in labour pain at both the 20-minute (p < .001, 95% CI 0.764–1.796) and 40-minute (p < .001, 95% CI 0.643–1.677) time points. Eighty-one percent of patients would recommend reflexology during labour. Conclusion: The findings showed that foot reflexology was effective in relieving labour pain, with a high degree of patient satisfaction among primigravida mothers.


10.2196/20924 ◽ 2020 ◽ Vol 22 (9) ◽ pp. e20924
Author(s): James Francis Oehmke, Theresa B Oehmke, Lauren Nadya Singh, Lori Ann Post

Background: SARS-CoV-2, the novel coronavirus that causes COVID-19, is a global pandemic with higher mortality and morbidity than any other virus in the last 100 years. Without public health surveillance, policy makers cannot know where and how the disease is accelerating, decelerating, and shifting. Unfortunately, existing models of COVID-19 contagion rely on parameters such as the basic reproduction number and use static statistical methods that do not capture all the relevant dynamics needed for surveillance. Existing surveillance methods use data that are subject to significant measurement error and other contaminants. Objective: The aim of this study is to provide a proof of concept of the creation of surveillance metrics that correct for measurement error and data contamination to determine when it is safe to ease pandemic restrictions. We applied state-of-the-art statistical modeling to existing internet data to derive the best available estimates of the state-level dynamics of COVID-19 infection in the United States. Methods: Dynamic panel data (DPD) models were estimated with the Arellano-Bond estimator using the generalized method of moments. This statistical technique enables control of various deficiencies in a data set. The validity of the model and statistical technique was tested. Results: A Wald chi-square test of the explanatory power of the statistical approach indicated that it is valid (χ²(10) = 1489.84, P < .001), and a Sargan chi-square test indicated that the model identification is valid (χ²(946) = 935.52, P = .59). The 7-day persistence rate for the week of June 27 to July 3 was 0.5188 (P < .001), meaning that every 10,000 new cases in the prior week were associated with 5188 cases 7 days later. For the week of July 4 to 10, the 7-day persistence rate increased by 0.2691 (P = .003), indicating that every 10,000 new cases in the prior week were associated with 7879 new cases 7 days later. Applied to the reported number of cases, these results indicate an increase of almost 100 additional new cases per day per state for the week of July 4-10. This signifies an increase in the reproduction parameter in the contagion models and corroborates the hypothesis that economic reopening without applying best public health practices is associated with a resurgence of the pandemic. Conclusions: DPD models successfully correct for measurement error and data contamination and are useful to derive surveillance metrics. The opening of America involves two certainties: the country will be COVID-19–free only when there is an effective vaccine, and the "social" end of the pandemic will occur before the "medical" end. Therefore, improved surveillance metrics are needed to inform leaders of how to open sections of the United States more safely. DPD models can inform this reopening in combination with the extraction of COVID-19 data from existing websites.
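The persistence figures quoted in the Results follow from simple arithmetic on the reported coefficients: a rate of 0.5188 maps 10,000 prior-week cases to about 5188 cases seven days later, and adding the estimated increase of 0.2691 gives 0.7879, or roughly 7879 cases. The snippet below only checks that arithmetic; it does not re-estimate the Arellano-Bond dynamic panel model, which would require the underlying panel data.

```python
# Reproduce the persistence arithmetic reported in the abstract (not the
# Arellano-Bond estimation itself, which requires the underlying panel data).
base_rate = 0.5188          # 7-day persistence, week of June 27 to July 3
increase = 0.2691           # estimated change for the week of July 4 to 10
prior_week_cases = 10_000

print(round(base_rate * prior_week_cases))               # -> 5188 cases 7 days later
print(round((base_rate + increase) * prior_week_cases))  # -> 7879 cases 7 days later
```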

