incomplete observations
Recently Published Documents


TOTAL DOCUMENTS: 119 (five years: 22)

H-INDEX: 21 (five years: 1)

2021 ◽  
Author(s):  
Chris Bryan ◽  
Ehsaan Nasir

Abstract Evaluating Electrical Submersible Pump (ESP) run-lives and performance in unconventional well environments is challenging due to many different factors, including the reservoir, well design, and production fluids. Moreover, reviewing the run-lives of ESPs in a field can be rather complex because the run-life data are incomplete: ESPs are often pulled while still operational, or an ESP has not yet been allowed to run until failure. These are some of the complications that arise when gauging ESP performance. A large dataset of ESP installations in North American unconventional applications was assessed using Kaplan-Meier survival analysis to better understand the factors that may affect ESP run-lives. The factors studied include, but are not limited to:

- basin and producing formation;
- different ESP component types, such as pumps and motors, and new versus used ESP components;
- completion intensity of the frac job (lb/ft of proppant).

Kaplan-Meier survival analysis is one of the most commonly used methods for estimating the fraction of a group surviving beyond a given time because it accounts for incomplete (censored) observations. The analysis generates a survival curve showing the declining fraction of surviving ESPs over time. Survival curves can be compared by segmenting the run-life data into buckets based on different factors, allowing the statistical significance of each factor and its effect on ESP survivability to be assessed. Kaplan-Meier analysis was performed on this dataset to answer these questions and better understand the factors that affect ESP run-lives in North American unconventional plays. This work uses a unique dataset that encompasses several different ESP designs, with the ESPs installed across different North American plays.
The observations and conclusions drawn from it, by applying survival analysis, can help in benchmarking ESP run-times and identifying what works in terms of prolonging ESP run-life. The workflow is also applicable to any asset in order to better understand the drivers behind ESP run-life performance.
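The censoring-aware estimate the abstract describes can be sketched in a few lines of Python. This is a generic illustration of the Kaplan-Meier product-limit formula, not the authors' code, and the example run-times are invented: a `failed=False` entry models an ESP pulled while still operational.

```python
def kaplan_meier(samples):
    """Kaplan-Meier product-limit survival estimate.

    samples: list of (time, failed) pairs; failed=False marks a
    censored observation (e.g. an ESP pulled while still running).
    Returns the survival curve as a list of (time, probability) steps.
    """
    samples = sorted(samples)
    n_at_risk = len(samples)
    surv = 1.0
    curve = []
    i = 0
    while i < len(samples):
        t = samples[i][0]
        failures = leaving = 0
        # Group all observations (failures and censorings) at time t.
        while i < len(samples) and samples[i][0] == t:
            leaving += 1
            failures += samples[i][1]
            i += 1
        if failures:
            # Survival drops only at failure times, by (1 - d/n).
            surv *= 1.0 - failures / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving  # censored units leave the risk set too
    return curve


# Four hypothetical ESP run-times (months); two censored pulls.
curve = kaplan_meier([(3, True), (5, False), (7, True), (9, False)])
```

Note how the censored pull at month 5 does not drop the curve but does shrink the risk set, which is exactly why Kaplan-Meier handles "pulled while still operational" records that a naive average of failure times cannot.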


2021 ◽  
Vol 13 (18) ◽  
pp. 3671
Author(s):  
Andong Wang ◽  
Guoxu Zhou ◽  
Qibin Zhao

This paper conducts a rigorous analysis of the problem of robust tensor completion, which aims at recovering an unknown three-way tensor from incomplete observations corrupted simultaneously by gross sparse outliers and small dense noise arising from various causes such as sensor dead pixels, communication loss, electromagnetic interference, cloud shadows, etc. To estimate the underlying tensor, a new penalized least squares estimator is first formulated by exploiting the low rankness of the signal tensor within the framework of tensor ∗L-Singular Value Decomposition (∗L-SVD) and leveraging the sparse structure of the outlier tensor. Then, an algorithm based on the Alternating Direction Method of Multipliers (ADMM) is designed to compute the estimator in an efficient way. Statistically, a non-asymptotic upper bound on the estimation error is established and further proved to be optimal (up to a log factor) in a minimax sense. Simulation studies on synthetic data demonstrate that the proposed error bound can predict the scaling behavior of the estimation error with the problem parameters (i.e., tubal rank of the underlying tensor, sparsity of the outliers, and the number of uncorrupted observations). Both the effectiveness and efficiency of the proposed algorithm are evaluated through experiments for robust completion on seven different types of remote sensing data.
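As a minimal illustration of one ADMM ingredient in this kind of estimator: the sparse-outlier subproblem typically reduces to elementwise soft-thresholding, the proximal operator of the l1 norm, while the low-rank subproblem applies analogous shrinkage to singular values in the ∗L-SVD domain. The sketch below shows only the soft-thresholding step under invented values; it is not the paper's implementation.

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x|: shrink x toward zero by tau.

    In an ADMM iteration for robust completion, this is applied
    elementwise to update the sparse outlier component.
    """
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def soft_threshold_all(values, tau):
    """Apply the shrinkage elementwise to a flat list of residuals."""
    return [soft_threshold(v, tau) for v in values]


# Small residuals (dense noise) are zeroed out; large ones
# (gross outliers) survive, shrunk by tau.
cleaned = soft_threshold_all([0.3, -0.2, 5.0, -4.0], 1.0)
```

The same shrinkage applied to singular values (singular value thresholding) is the proximal operator of the nuclear norm, which is how the low-rank structure is enforced in the companion update.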


2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Melissa Middleton ◽  
Margarita Moreno-Betancur ◽  
John Carlin ◽  
Katherine J Lee

Abstract Background Multiple imputation (MI) is commonly used to address missing data in epidemiological studies, but valid use requires compatibility between the imputation and analysis models. Case-cohort studies use unequal sampling probabilities for cases and controls, which are often accounted for during analysis through inverse probability weighting (IPW). It is unclear how to apply MI for missing covariates while achieving compatibility in this setting. Methods A simulation study was conducted with missingness in two covariates, motivated by a case-cohort investigation within the Barwon Infant Study. The MI methods considered involved including interactions between the outcome (as a proxy for the weights) and the analysis variables, stratification by the weights, and ignoring the weights, within the context of an IPW analysis. Factors such as the target estimand, proportion of incomplete observations, missing data mechanism, and subcohort selection probability were varied to assess the performance of the MI methods. Results Performance in terms of bias and efficiency was similar across the MI methods, with the expected improvements compared to IPW applied to the complete cases. Precision tended to decrease as the subcohort selection probability decreased. Similar results were observed irrespective of the proportion of incomplete cases. Conclusions Our results suggest that it makes little difference how the weights are incorporated in the MI model in the analysis of case-cohort studies, potentially because there are only two weight classes in this setting. Key messages If and how the weights are incorporated in the imputation model may have little impact in the analysis of case-cohort studies with incomplete covariates.
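The "two weight classes" the conclusion mentions follow directly from the case-cohort design: cases are sampled with probability 1 and non-case subcohort members with the subcohort selection probability, so the IPW analysis uses only two distinct weights. A minimal sketch of that weighting, with invented helper names and example numbers (not the authors' code):

```python
def case_cohort_weights(is_case, subcohort_prob):
    """Inverse probability weights for a case-cohort sample.

    Cases enter the sample with probability 1 (weight 1); non-case
    subcohort members are sampled with probability subcohort_prob,
    so they are up-weighted by 1 / subcohort_prob.
    """
    return [1.0 if case else 1.0 / subcohort_prob for case in is_case]


def weighted_mean(values, weights):
    """Hajek-style weighted mean, as used in a simple IPW analysis."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# One case and two subcohort non-cases sampled with probability 0.25:
# the non-cases each stand in for four source-population members.
w = case_cohort_weights([True, False, False], 0.25)
est = weighted_mean([1.0, 2.0, 2.0], w)
```

Because every observation carries one of just two weight values, stratifying the imputation model by weight is nearly equivalent to including a case/non-case indicator, which is consistent with the paper's finding that the choice matters little here.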


2021 ◽  
Author(s):  
Justin Andrews ◽  
Sheldon Gorell

Abstract Missing values and incomplete observations can exist in just about every type of recorded data. With analytical modeling, and machine learning in particular, the quantity and quality of available data are paramount to acquiring reliable results. Within the oil industry alone, priorities about which data are important can vary from company to company, so the available knowledge of a single field varies from place to place. Because machine learning requires very complete sets of data, this issue can force whole portions of data to be discarded in order to create an appropriate dataset. Value imputation has emerged as a valuable solution for cleaning up datasets, and as technology has advanced, new generative machine learning methods have been used to generate images and data that are all but indistinguishable from reality. Using an adaptation of the standard Generative Adversarial Network (GAN) approach known as a Generative Adversarial Imputation Network (GAIN), this paper evaluates this method and other imputation methods for filling in missing values. Starting from a fully observed dataset, smaller datasets with randomly masked missing values were generated to validate the effectiveness of the various imputation methods, allowing comparisons to be made against the original dataset. The study found that across various missing-data percentages within the sets, the "filled in" data could be used with surprising accuracy for further analytics. This paper compares GAIN and several commonly used imputation methods against more standard practices, such as data cropping or filling in with average values, for handling missing data. GAIN, as well as the various imputation methods described, is quantified for its ability to fill in data. The study discusses how the GAIN model can quickly provide the data necessary for analytical studies and prediction of results for future projects.


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0231754
Author(s):  
Karen M. Ong ◽  
Michael S. Phillips ◽  
Charles S. Peskin

Widespread use of antibiotics has resulted in an increase in antimicrobial-resistant microorganisms. Although not all bacterial contact results in infection, patients can become asymptomatically colonized, increasing the risk of infection and pathogen transmission. Consequently, many institutions have begun active surveillance, but in non-research settings, the resulting data are often incomplete and may include non-random testing, making conventional epidemiological analysis problematic. We describe a mathematical model and inference method for in-hospital bacterial colonization and transmission of carbapenem-resistant Enterobacteriaceae that is tailored for analysis of active surveillance data with incomplete observations. The model and inference method make use of the full detailed state of the hospital unit, which takes into account the colonization status of each individual in the unit and not only the number of colonized patients at any given time. The inference method computes the exact likelihood of all possible histories consistent with partial observations (despite the exponential increase in possible states that can make likelihood calculation intractable for large hospital units), includes techniques to improve computational efficiency, is tested by computer simulation, and is applied to active surveillance data from a 13-bed rehabilitation unit in New York City. The inference method for exact likelihood calculation is applicable to other Markov models incorporating incomplete observations. The parameters that we identify are the patient–patient transmission rate, pre-existing colonization probability, and prior-to-new-patient transmission probability. Besides identifying the parameters, we predict the effects on the total prevalence (0.07 of the total colonized patient-days) of changing the parameters and estimate the increase in total prevalence attributable to patient–patient transmission (0.02) above the baseline pre-existing colonization (0.05). 
Simulations with a colonized long-stay patient had 44% higher total prevalence than with an uncolonized one, suggesting that the long-stay patient may have been a reservoir of transmission. High-priority interventions may include isolation of incoming colonized patients and repeated screening of long-stay patients.
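The exact-likelihood idea, summing over all hidden colonization histories consistent with partial observations, is, for a single patient, the standard forward recursion for a hidden Markov model; the paper's method extends this to the joint state of the whole unit. A toy two-state sketch with invented parameters, where `None` marks a day the patient was not tested:

```python
def forward_likelihood(obs, init, trans, emit):
    """Exact likelihood of a partially observed 2-state Markov chain.

    obs:   list of observations per step; None = no test that day.
    init:  initial state distribution [P(uncolonized), P(colonized)].
    trans: 2x2 transition matrix, trans[r][s] = P(s at t+1 | r at t).
    emit:  2x2 emission matrix, emit[s][o] = P(observe o | state s).
    Marginalizes over every hidden history consistent with obs.
    """
    def e(state, o):
        return emit[state][o] if o is not None else 1.0  # untested day

    alpha = [init[s] * e(s, obs[0]) for s in (0, 1)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in (0, 1)) * e(s, o)
                 for s in (0, 1)]
    return sum(alpha)


trans = [[0.9, 0.1], [0.0, 1.0]]   # colonization assumed persistent
emit = [[1.0, 0.0], [0.0, 1.0]]    # a perfect test, for illustration
lik = forward_likelihood([None, 1, None], [0.8, 0.2], trans, emit)
```

The exponential blow-up the abstract mentions comes from tracking such a state vector jointly over every patient in the unit, which is why the paper's efficiency techniques matter for a 13-bed ward.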

