A Bayesian network approach incorporating imputation of missing data enables exploratory analysis of complex causal biological relationships

PLoS Genetics ◽  
2021 ◽  
Vol 17 (9) ◽  
pp. e1009811
Author(s):  
Richard Howey ◽  
Alexander D. Clark ◽  
Najib Naamane ◽  
Louise N. Reynard ◽  
Arthur G. Pratt ◽  
...  

Bayesian networks can be used to identify possible causal relationships between variables based on their conditional dependencies and independencies, which can be particularly useful in complex biological scenarios with many measured variables. Here we propose two improvements to an existing method for Bayesian network analysis, designed to increase the power to detect potential causal relationships between variables (including potentially a mixture of both discrete and continuous variables). Our first improvement relates to the treatment of missing data. When data are missing, the standard approach is to remove every individual with any missing data before performing analysis. This can be wasteful and undesirable when there are many individuals with missing data, perhaps with only one or a few variables missing. This motivates the use of imputation. We present a new imputation method that uses a version of nearest neighbour imputation, whereby missing data from one individual is replaced with data from another individual, their nearest neighbour. For each individual with missing data, the subsets of variables to be used to select the nearest neighbour are chosen by sampling the complete data with replacement and estimating a best fit Bayesian network. We show that this approach leads to marked improvements in the recall and precision of directed edges in the final network identified, and we illustrate the approach through application to data from a recent study investigating the causal relationship between methylation and gene expression in early inflammatory arthritis patients. We also describe a second improvement in the form of a pseudo-Bayesian approach for upweighting certain network edges, which can be useful when there is prior evidence concerning their directions.
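The basic nearest neighbour step described in this abstract can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' method: the donor is chosen by Euclidean distance over all variables observed for the incomplete individual, whereas the paper selects the variable subsets from Bayesian networks fitted to resampled complete data. The function name `nearest_neighbour_impute` and the toy data are hypothetical.

```python
import math


def nearest_neighbour_impute(rows):
    """Fill None entries in each row from the closest fully observed row.

    Assumption: distance is Euclidean over the variables observed in the
    incomplete row. The paper instead chooses the variable subsets via
    Bayesian networks fitted to resampled complete data.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to donate values")

    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Donor = complete row closest to this row on its observed variables.
        donor = min(
            complete,
            key=lambda c: math.sqrt(sum((c[j] - row[j]) ** 2 for j in observed)),
        )
        # Copy the donor's values only into the missing positions.
        imputed.append([donor[j] if v is None else v for j, v in enumerate(row)])
    return imputed


# Toy data: the third individual is missing its second variable.
data = [
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
    [1.1, None, 2.9],
]
filled = nearest_neighbour_impute(data)
```

Here the third individual is closest to the first on its two observed variables, so its missing value is copied from the first individual. In practice the variables would normally be put on a common scale before distances are computed.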

2019 ◽  
Vol 6 (339) ◽  
pp. 73-98
Author(s):  
Małgorzata Aleksandra Misztal

The problem of incomplete data and its implications for drawing valid conclusions from statistical analyses is not specific to any particular scientific domain; it arises in economics, sociology, education, behavioural sciences and medicine. Almost all standard statistical methods presume that every object has information on every variable to be included in the analysis, and the typical approach to missing data is simply to delete the incomplete objects. However, this leads to ineffective and biased analysis results and is not recommended in the literature. The state-of-the-art technique for handling missing data is multiple imputation. In the paper, some selected multiple imputation methods were taken into account. Special attention was paid to using principal components analysis (PCA) as an imputation method. The goal of the study was to assess the quality of PCA‑based imputations as compared to two other multiple imputation techniques: multivariate imputation by chained equations (MICE) and missForest. The comparison was made by artificially simulating different proportions (10–50%) and mechanisms of missing data using 10 complete data sets from the UCI repository of machine learning databases. Then, missing values were imputed with the use of MICE, missForest and the PCA‑based method (MIPCA). The normalised root mean square error (NRMSE) was calculated as a measure of imputation accuracy. On the basis of the conducted analyses, missForest can be recommended as a multiple imputation method providing the lowest rates of imputation errors for all types of missingness. PCA‑based imputation does not perform well in terms of accuracy.
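The abstract does not state which normalisation of the RMSE was used. As a hedged illustration, the Python sketch below assumes the definition commonly used alongside missForest, in which the mean squared error over the imputed entries is divided by the variance of the true values before the square root is taken. The function name `nrmse` and the example values are hypothetical.

```python
import math
from statistics import pvariance


def nrmse(true_values, imputed_values):
    """Normalised RMSE: sqrt(mean squared error / variance of true values).

    Assumption: normalisation by the (population) variance of the true
    values; the study may have used a different normalisation.
    """
    if len(true_values) != len(imputed_values):
        raise ValueError("inputs must have the same length")
    n = len(true_values)
    mse = sum((t - i) ** 2 for t, i in zip(true_values, imputed_values)) / n
    return math.sqrt(mse / pvariance(true_values))


truth = [1.0, 2.0, 3.0, 4.0]
perfect = nrmse(truth, truth)          # exact imputation
mean_fill = nrmse(truth, [2.5] * 4)    # every value replaced by the mean
```

Under this definition an exact imputation scores 0, and replacing every value with the mean of the true values scores 1, which gives a convenient reference point when comparing methods.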


Author(s):  
Daiheng Ni ◽  
John D. Leonard

The rich data on intelligent transportation systems (ITS) are a precious resource for transportation researchers and practitioners. However, the usability of this resource is greatly limited by missing data. Many imputation methods have been proposed in the past decade. However, some issues are still not addressed or are not sufficiently addressed, for example, entire records being missing, temporal correlation in observations, natural characteristics in raw data, and unbiased estimates for missing values. This paper proposes an advanced imputation method based on recent developments in other disciplines, especially applied statistics. The method uses a Bayesian network to learn from the raw data and a Markov chain Monte Carlo technique to sample from the probability distributions learned by the Bayesian network. It imputes the missing data multiple times and makes statistical inferences about the result. In addition, the method incorporates a time series model so that it allows data missing in entire rows, an unfavorable missing pattern frequently seen in ITS data. Empirical study shows that the proposed method is robust and accurate. It is ideal for use as a high-quality imputation method for off-line application.


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addressed some advanced techniques to deal with missing values in a data set measuring air quality using a multiple imputation (MI) approach. MCAR, MAR, and NMAR missing data mechanisms are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait, aggregated to a daily basis. Logarithm transformation was carried out for all pollutant data, in order to normalize their distributions and to minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR mechanism had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the imputation methods compared and, thus, can be considered to be appropriate for analyzing air quality data.
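The evaluation design described here (remove a known fraction of values, impute them, then score the imputed entries by RMSE and MAE) can be sketched in Python as follows. This is an illustration under stated assumptions only: values are removed completely at random (MCAR), and a simple mean fill stands in for missForest, which is not implemented here. The helper names `mask_mcar` and `rmse_mae` and the toy series are hypothetical.

```python
import math
import random


def mask_mcar(values, rate, seed=0):
    """Set a fraction `rate` of entries to None, chosen uniformly at random."""
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in missing_idx else v for i, v in enumerate(values)]
    return masked, sorted(missing_idx)


def rmse_mae(true_values, imputed_values, missing_idx):
    """RMSE and MAE computed only over the entries that were masked."""
    errors = [imputed_values[i] - true_values[i] for i in missing_idx]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Toy series standing in for one pollutant; 20% of values are masked.
truth = [float(v) for v in range(1, 21)]
masked, idx = mask_mcar(truth, rate=0.20)

# Stand-in imputer (assumption): fill with the mean of the observed values.
observed = [v for v in masked if v is not None]
observed_mean = sum(observed) / len(observed)
imputed = [observed_mean if v is None else v for v in masked]

rmse, mae = rmse_mae(truth, imputed, idx)
```

Repeating this for each missingness level and each imputation method gives the kind of error comparison the abstract reports; MAR and NMAR masks would need a removal rule that depends on observed or unobserved values, respectively.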


2021 ◽  
Vol 80 (Suppl 1) ◽  
pp. 744.1-744
Author(s):  
M. Russell ◽  
F. Coath ◽  
M. Yates ◽  
K. Bechman ◽  
S. Norton ◽  
...  

Background: Diagnostic delay is a significant problem in axial spondyloarthritis (axSpA), and there is a growing body of evidence showing that delayed axSpA diagnosis is associated with worse clinical, humanistic and economic outcomes [1]. International guidelines have been published to inform referral pathways and improve standards of care for patients with axSpA [2,3].

Objectives: To describe the sociodemographic and clinical characteristics of newly-referred patients with axSpA in England and Wales in the National Early Inflammatory Arthritis Audit (NEIAA), with rheumatoid arthritis (RA) and mechanical back pain (MBP) as comparators.

Methods: The NEIAA captures data on all new patients over the age of 16 referred with suspected inflammatory arthritis to rheumatology departments in England and Wales [4]. We describe baseline sociodemographic and clinical characteristics of axSpA patients (n=784) recruited to the NEIAA between May 2018 and March 2020, compared with RA (n=9,270) and MBP (n=370) during the same period.

Results: Symptom duration prior to initial rheumatology assessment was significantly longer in axSpA than RA patients (p<0.001), and non-significantly longer in axSpA than MBP patients (p=0.062): 79.7% of axSpA patients had symptom durations of >6 months, compared to 33.7% of RA patients and 76.0% of MBP patients; 32.6% of axSpA patients had symptom durations of >5 years, compared to 3.5% of RA patients and 24.6% of MBP patients (Figure 1A). Following referral, median time to initial rheumatology assessment was longer for axSpA than RA patients (36 vs. 24 days; p<0.001), and similar to MBP patients (39 days; p=0.30). The proportion of axSpA patients assessed within 3 weeks of referral increased from 26.7% in May 2018 to 34.7% in March 2020, compared to an increase from 38.2% to 54.5% for RA patients (Figure 1B). A large majority of axSpA referrals originated from primary care (72.4%) or musculoskeletal triage services (14.1%), with relatively few referrals from gastroenterology (1.9%), ophthalmology (1.4%) or dermatology (0.4%). Of the subset of patients with peripheral arthritis requiring EIA pathway follow-up, fewer axSpA than RA patients had disease education provided (77.5% vs. 97.8%; p<0.001), and RA patients reported a better understanding of their condition (p<0.001). HAQ-DI scores were lower at baseline in axSpA EIA patients than RA EIA patients (0.8 vs. 1.1, respectively; p=0.004), whereas baseline Musculoskeletal Health Questionnaire (MSK-HQ) scores were similar (25 vs. 24, respectively; p=0.49). The burden of disease was substantial across the 14 domains comprising MSK-HQ in both axSpA and RA (Figure 1C).

Conclusion: We have shown that diagnostic delay remains a major challenge in axSpA, despite improved disease understanding and updated referral guidelines. Patient education is an unmet need in axSpA, highlighting the need for specialist clinics. MSK-HQ scores demonstrated that the functional impact of axSpA is no less than for RA, whereas HAQ-DI may underrepresent disability in axSpA.

References:
[1] Yi E, Ahuja A, Rajput T, George AT, Park Y. Clinical, economic, and humanistic burden associated with delayed diagnosis of axial spondyloarthritis: a systematic review. Rheumatol Ther. 2020;7:65-87.
[2] NICE. Spondyloarthritis in over 16s: diagnosis and management. 2017.
[3] van der Heijde D, Ramiro S, Landewe R, et al. 2016 update of the ASAS-EULAR management recommendations for axial spondyloarthritis. Ann Rheum Dis. 2017;76(6):978-91.
[4] British Society for Rheumatology. National Early Inflammatory Arthritis Audit (NEIAA) Second Annual Report. 2021.

Acknowledgements: The National Early Inflammatory Arthritis Audit is commissioned by the Healthcare Quality Improvement Partnership, funded by NHS England and Improvement, and the Welsh Government, and carried out by the British Society for Rheumatology, King’s College London and Net Solving.

Disclosure of Interests: Mark Russell Grant/research support from: UCB, Pfizer; Fiona Coath: None declared; Mark Yates Grant/research support from: UCB, Abbvie; Katie Bechman: None declared; Sam Norton: None declared; James Galloway Grant/research support from: Abbvie, Celgene, Chugai, Gilead, Janssen, Lilly, Pfizer, Roche, UCB; Jo Ledingham: None declared; Raj Sengupta Grant/research support from: AbbVie, Biogen, Celgene, Lilly, MSD, Novartis, Pfizer, Roche, UCB; Karl Gaffney Grant/research support from: AbbVie, Biogen, Celgene, Celltrion, Janssen, Lilly, Novartis, Pfizer, Roche, UCB.

