Variable Selection and Joint Estimation of Mean and Covariance Models with an Application to eQTL Data

2018 ◽  
Vol 2018 ◽  
pp. 1-13
Author(s):  
JungJun Lee ◽  
SungHwan Kim ◽  
Jae-Hwan Jhong ◽  
Ja-Yong Koo

In genomic data analysis, it is commonplace that the underlying regulatory relationships among multiple genes are hard to ascertain owing to unknown genetic complexity and epigenetic regulation. In this paper, we consider a joint mean and constant covariance model (JMCCM) that elucidates the conditional dependence structure of genes while controlling for potential genotype perturbations. To this end, the modified Cholesky decomposition is utilized to parametrize the entries of a precision matrix. The JMCCM maximizes the likelihood function to estimate the parameters involved in the model. We also develop a variable selection algorithm that selects explanatory variables and Cholesky factors using a combination of the GCV and BIC as benchmarks, together with Rao and Wald statistics. Importantly, sparse estimation of the precision matrix (or, equivalently, the gene network) is effectively achieved via the proposed variable selection scheme, and it contributes to identifying significant hub genes that are concordant with a priori biological evidence. In simulation studies, we confirm that our model selection efficiently identifies the true underlying networks. With an application to miRNA and SNP data from yeast (a.k.a. eQTL data), we demonstrate that the constructed gene networks reproduce validated biological and clinical knowledge with regard to various pathways, including the cell cycle pathway.
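
The modified Cholesky parametrization referred to above factors the precision matrix as Omega = T' D^{-1} T, with T unit lower-triangular and D diagonal. A minimal numpy sketch of that identity (illustrative values only, not the authors' JMCCM code):

```python
import numpy as np

def precision_from_cholesky_factors(T, d):
    """Rebuild a precision matrix from its modified Cholesky factors:
    Omega = T' D^{-1} T, with T unit lower-triangular and d the diagonal of D."""
    return T.T @ np.diag(1.0 / d) @ T

p = 3
T = np.eye(p)
T[1, 0], T[2, 1] = -0.8, -0.5   # hypothetical Cholesky factors (regression coefficients)
d = np.array([1.0, 0.5, 0.25])  # hypothetical innovation variances
Omega = precision_from_cholesky_factors(T, d)
print(Omega)  # selecting (zeroing) entries of T is what yields a sparse Omega, i.e., a sparse gene network
```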

Mathematics ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 222
Author(s):  
Juan C. Laria ◽  
M. Carmen Aguilera-Morillo ◽  
Enrique Álvarez ◽  
Rosa E. Lillo ◽  
Sara López-Taruella ◽  
...  

Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole-genome context. The process of defining the list of genes that characterizes an expression profile remains unclear: it currently relies on advanced statistics and can take an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology for the variable selection and model estimation problems in the high-dimensional setting, which can be particularly useful in the whole-genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.
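
As a point of reference for the regularized-regression setting described above, here is a minimal sketch of cross-validated lasso selection when markers far outnumber samples; the sizes and data are hypothetical, and this is not the paper's method:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 80, 1000                       # far more markers than samples
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:5] = 2.0    # only 5 truly active markers
y = X @ beta + rng.standard_normal(n)

# Cross-validation chooses the penalty strength, guarding against overfitting.
fit = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(fit.coef_)
print(selected)                       # indices of the retained markers
```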


2015 ◽  
Author(s):  
Aurélie Pirayre ◽  
Camille Couprie ◽  
Frédérique Bidard ◽  
Laurent Duval ◽  
Jean-Christophe Pesquet

Background: Inferring gene networks from high-throughput data constitutes an important step in the discovery of relevant regulatory relationships in organism cells. Despite the large number of available Gene Regulatory Network (GRN) inference methods, the problem remains challenging: the underdetermination in the space of possible solutions requires additional constraints that incorporate a priori information on gene interactions.
Methods: Weighting all possible pairwise gene relationships by a probability of edge presence, we formulate regulatory network inference as a discrete variational problem on graphs. We enforce biologically plausible coupling between groups and types of genes by minimizing an edge-labeling functional encoding a priori structures. The optimization is carried out with graph cuts, an approach popular in image processing and computer vision. We compare the inferred regulatory networks with results achieved by the mutual-information-based Context Likelihood of Relatedness (CLR) method and by the state-of-the-art GENIE3, winner of the DREAM4 multifactorial challenge.
Results: Our BRANE Cut approach infers the five DREAM4 in silico networks more accurately (with improvements from 6% to 11%). On a real Escherichia coli compendium, improvements of 11.8% over CLR and 3% over GENIE3 are obtained in terms of area under the precision-recall curve. Up to 48 additional verified interactions are obtained over GENIE3 at a given precision. On this dataset involving 4345 genes, our method achieves performance similar to that of GENIE3 while being more than seven times faster. The BRANE Cut code is available at: http://www-syscom.univ-mlv.fr/~pirayre/Codes-GRN-BRANE-cut.html
Conclusions: BRANE Cut is a weighted-graph thresholding method. Using biologically sound penalties and data-driven parameters, it improves on three state-of-the-art GRN inference methods. It is applicable as a generic network-inference post-processing step, owing to its computational efficiency.
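
Since the abstract characterizes BRANE Cut as a weighted-graph thresholding method optimized with graph cuts, the sketch below shows the flavor of an s-t min-cut formulation for binary edge labeling. It omits the paper's biological coupling terms (so the cut reduces to plain thresholding), and all names and weights are hypothetical:

```python
import networkx as nx

# Hypothetical edge-presence weights from an upstream GRN scorer (e.g., CLR).
weights = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.7}
lam = 0.5  # hypothetical unary cost of keeping an edge

# Each candidate edge becomes a node in an s-t graph: cutting s -> e discards
# the edge (cost w_e), cutting e -> t keeps it (cost lam). Without pairwise
# coupling terms the min cut reduces to thresholding w_e against lam.
G = nx.DiGraph()
for e, w in weights.items():
    G.add_edge("s", e, capacity=w)
    G.add_edge(e, "t", capacity=lam)

cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
kept = [e for e in source_side if e != "s"]
print(kept)  # edges labeled "present" in the inferred network
```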


Author(s):  
S.G. Vorona ◽  
S.N. Bulychev

The article deals with the stealth of radio-electronic means, both energetic and structural, and with radio-electronic masking and ways of implementing it. It examines the structure and parameters of a signal unknown to a reconnaissance receiver, as well as the a posteriori probability of each signal, which is related to its a priori probability through the likelihood function, and the cases in which it can be computed. The advantages and disadvantages of the broadband signals used in modern radars, and their characteristics, are considered, leading to the following conclusions: the LFM radio pulse and the single FCM pulse, used in target tracking modes, have high resolution in range and radial velocity; the ACF of the FCM pulse has side lobes that raise the target detection threshold, as a result of which radar targets with a weak echo signal can be missed; and the considered signals do not provide energetic or structural stealth of radar operation.
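
For reference, the relation invoked here is Bayes' rule: the a posteriori probability of the i-th candidate signal given an intercepted observation x is

```latex
P(s_i \mid x) = \frac{P(s_i)\, p(x \mid s_i)}{\sum_j P(s_j)\, p(x \mid s_j)}
```

where P(s_i) is the a priori probability of the signal and p(x | s_i) its likelihood function.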


Water ◽  
2020 ◽  
Vol 12 (8) ◽  
pp. 2161
Author(s):  
Ruicheng Zhang ◽  
Nianqing Zhou ◽  
Xuemin Xia ◽  
Guoxian Zhao ◽  
Simin Jiang

Multicomponent reactive transport modeling is a powerful tool for the comprehensive analysis of coupled hydraulic and biochemical processes. The performance of the simulation model depends on the accuracy of the related model parameters, whose values are usually difficult to determine from direct measurements. In this situation, estimates of these uncertain parameters can be obtained by solving inverse problems. In this study, an efficient data assimilation method, the iterative local updating ensemble smoother (ILUES), is employed for the joint estimation of hydraulic parameters, biochemical parameters and contaminant source characteristics in the sequential biodegradation process of tetrachloroethene (PCE). In the framework of the ILUES algorithm, parameter estimation is realized by updating a local ensemble with the iterative ensemble smoother (IES). To better explore the parameter space, the original ILUES algorithm is modified by determining the local ensemble partly with a linear ranking selection scheme. Numerical case studies based on the sequential biodegradation of PCE are then used to evaluate the performance of the algorithm. The results show that the ILUES algorithm achieves accurate joint estimation of the related model parameters in the reactive transport model.
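
For orientation, the core smoother update inside algorithms of this family shifts each parameter-ensemble member toward the observations using covariances estimated from the ensemble itself. A minimal numpy sketch of one such update (not the authors' ILUES code; the local-ensemble selection and iteration are omitted):

```python
import numpy as np

def smoother_update(M, D, d_obs, obs_std, rng=np.random.default_rng(0)):
    """One ensemble-smoother update.
    M: (n_param, n_ens) parameter ensemble; D: (n_obs, n_ens) simulated data;
    d_obs: (n_obs,) observations; obs_std: observation-error standard deviation."""
    n_ens = M.shape[1]
    Mc = M - M.mean(axis=1, keepdims=True)
    Dc = D - D.mean(axis=1, keepdims=True)
    C_md = Mc @ Dc.T / (n_ens - 1)        # parameter-data cross-covariance
    C_dd = Dc @ Dc.T / (n_ens - 1)        # data auto-covariance
    R = obs_std**2 * np.eye(D.shape[0])   # observation-error covariance
    # Perturb observations per member (stochastic, EnKF-style update).
    d_pert = d_obs[:, None] + obs_std * rng.standard_normal(D.shape)
    return M + C_md @ np.linalg.solve(C_dd + R, d_pert - D)
```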


Author(s):  
Xuan Cao ◽  
Lili Ding ◽  
Tesfaye B. Mersha

Abstract: In this study, we conduct a comparison of three recent statistical methods for joint variable selection and covariance estimation, with application to detecting expression quantitative trait loci (eQTL) and estimating gene networks, and introduce a new hierarchical Bayesian method to be included in the comparison. Unlike the traditional univariate regression approach in eQTL analysis, all four methods correlate phenotypes and genotypes through multivariate regression models that incorporate the dependence information among phenotypes, and use Bayesian multiplicity adjustment to avoid the multiple testing burden raised by traditional multiple testing correction methods. We present the performance of three methods (MSSL, Multivariate Spike and Slab Lasso; SSUR, Sparse Seemingly Unrelated Bayesian Regression; and OBFBF, Objective Bayes Fractional Bayes Factor), along with the proposed JDAG (Joint estimation via a Gaussian Directed Acyclic Graph model) method, through simulation experiments and publicly available HapMap real data, taking asthma as an example. Compared with existing methods, JDAG identified networks with higher sensitivity and specificity under row-wise sparse settings. JDAG requires less execution time in small-to-moderate dimensions, but is not currently applicable to high-dimensional data. The eQTL analysis of the asthma data recovered a number of known gene regulations, such as STARD3, IKZF3 and PGAP3, all reported in asthma studies. The code of the proposed method is freely available at GitHub (https://github.com/xuan-cao/Joint-estimation-for-eQTL).
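
To illustrate the multivariate-regression framing (as opposed to q separate univariate eQTL regressions), here is a sketch using scikit-learn's MultiTaskLasso as a simple frequentist analogue of row-wise-sparse multi-response regression. It is not one of the four Bayesian methods compared in the paper, and all sizes are hypothetical:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
n, p, q = 100, 50, 5            # samples, genotypes (SNPs), phenotypes
X = rng.standard_normal((n, p))
B = np.zeros((p, q))
B[:3] = rng.standard_normal((3, q))          # row-wise sparsity: 3 active SNPs
Y = X @ B + 0.1 * rng.standard_normal((n, q))

# The joint fit selects each SNP across all phenotypes at once, exploiting
# the dependence among responses that univariate regressions ignore.
fit = MultiTaskLasso(alpha=0.1).fit(X, Y)
selected = np.flatnonzero(np.abs(fit.coef_).sum(axis=0))   # coef_ is (q, p)
print(selected)                 # indices of SNPs retained jointly
```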


2019 ◽  
Vol 12 (05) ◽  
pp. 1950050
Author(s):  
Chun-Jing Li ◽  
Hong-Mei Zhao ◽  
Xiao-Gang Dong

This paper develops the Bayesian empirical likelihood (BEL) method and BEL variable selection for linear regression models with censored data. Empirical likelihood is a multivariate analysis tool that has been widely applied in many fields, such as the biomedical and social sciences. By introducing two special priors into the empirical likelihood function, we find two clear advantages of the BEL methods: (i) more precise coverage probabilities of the BEL credible region, and (ii) higher accuracy and correct identification rates in BEL model selection using a hierarchical Bayesian model, compared with current methods such as the LASSO, ALASSO and SCAD. Numerical simulations and empirical analysis of two data examples show the strong competitiveness of the proposed method.
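
As background, ordinary empirical likelihood profiles a multinomial likelihood over the observed data points. A minimal scipy sketch for a scalar mean (not the authors' Bayesian EL for censored regression; it assumes mu lies strictly inside the data range):

```python
import numpy as np
from scipy.optimize import brentq

def el_log_ratio(x, mu):
    """Profile empirical log-likelihood ratio for a candidate mean mu.
    Solves the dual equation sum_i z_i / (1 + lam * z_i) = 0, which gives
    weights w_i = 1 / (n * (1 + lam * z_i)); assumes min(x) < mu < max(x)."""
    z = x - mu
    lo = -1.0 / z.max() * (1 - 1e-6)   # keep 1 + lam * z_i > 0 for all i
    hi = -1.0 / z.min() * (1 - 1e-6)
    lam = brentq(lambda l: np.sum(z / (1 + l * z)), lo, hi)
    return -np.sum(np.log1p(lam * z))  # equals 0 when mu is the sample mean

x = np.random.default_rng(0).normal(5.0, 1.0, size=50)
print(el_log_ratio(x, 5.0))
```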


Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 1004
Author(s):  
Marco Antonio Florenzano Mollinetti ◽  
Bernardo Bentes Gatto ◽  
Mário Tasso Ribeiro Serra Neto ◽  
Takahito Kuno

Artificial Bee Colony (ABC) is a Swarm Intelligence optimization algorithm well known for its versatility. The selection of decision variables to update is purely stochastic, which raises several issues for the local search capability of the ABC. To address these issues, a self-adaptive decision variable selection mechanism is proposed with the goal of balancing the degree of exploration and exploitation throughout the execution of the algorithm. This mechanism, named Adaptive Decision Variable Matrix (A-DVM), represents both stochastic and deterministic parameter selection in a binary matrix and regulates the extent to which each selection is employed based on an estimate of the sparsity of the solutions in the search space. The influence of the proposed approach on the performance and robustness of the original algorithm is validated through experiments on 15 highly multimodal benchmark optimization problems. Numerical comparisons on those problems are made against the ABC and its variants, as well as prominent population-based algorithms (e.g., Particle Swarm Optimization and Differential Evolution). Results show an improvement in the performance of the algorithms with the A-DVM on the most challenging instances.
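
For context, the candidate-solution update in standard ABC perturbs a single, uniformly chosen decision variable; that is the purely stochastic selection the abstract refers to. A minimal sketch (a deterministic scheme such as A-DVM would instead supply j part of the time):

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_candidate(x, population, j=None):
    """One ABC candidate move: v_j = x_j + phi * (x_j - partner_j).
    Standard ABC draws the decision variable j uniformly at random;
    passing j explicitly models a deterministic selection."""
    if j is None:
        j = rng.integers(len(x))                    # stochastic selection
    partner = population[rng.integers(len(population))]
    phi = rng.uniform(-1.0, 1.0)
    v = x.copy()
    v[j] = x[j] + phi * (x[j] - partner[j])
    return v
```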


2008 ◽  
Vol 54 (3) ◽  
pp. 559-566 ◽  
Author(s):  
Ferruccio Ceriotti ◽  
James C Boyd ◽  
Gerhard Klein ◽  
Joseph Henny ◽  
Josep Queraltó ◽  
...  

Abstract
Background: Reference intervals for serum creatinine remain relevant despite the current emphasis on the use of the estimated glomerular filtration rate for assessing renal function. Many studies on creatinine reference values have been published in the last 20 years. Using criteria derived from published IFCC documents, we sought to identify universally applicable reference intervals for creatinine via a systematic review of the literature.
Methods: Studies were selected for inclusion in the systematic review only if the following criteria were met: (a) reference individuals were selected using an "a priori" selection scheme; (b) preanalytical conditions were adequately described; (c) traceability of the produced results to the isotope dilution-mass spectrometry (IDMS) reference method was demonstrated experimentally; and (d) the collected data received adequate statistical treatment.
Results: Of 37 reports dealing specifically with serum creatinine reference values, only 1 report with pediatric data and 5 reports with adult data met these criteria. The primary reason for exclusion of most papers was an inadequate demonstration of measurement traceability. Based on the data of the selected studies, we have collated recommended reference intervals for white adults and children.
Conclusion: Laboratories using methods producing results traceable to IDMS can apply the selected reference intervals for serum creatinine in evaluating white individuals.


2016 ◽  
Vol 1 ◽  
pp. 54-60
Author(s):  
Leontii Muradian

Based on theoretical analysis, it is shown that Bayesian statistics begins with known data and then tracks how knowledge changes in the process of obtaining new information, whereas classical mathematical statistics, through methods of sample observation, deals only with knowledge of some group of objects. Using Bayes' formula, one can determine the probability of an event given that another event, statistically correlated with it and whose likelihood is counted with greater accuracy, has occurred; this combines previously known information with data obtained from new observations. In the study of freight car failures, the Bayesian approach makes it possible to evaluate the occurrence of each failure of parts or assemblies separately, as well as jointly through the total probability formula. In this paper, two models were combined on the basis of the Bayesian method: a model of freight car failures and a model of the changing physical and mechanical properties of composite materials. The posterior probability was determined from the a priori probability of failures, given by the model of changing physical and mechanical properties, and the likelihood function, which takes into account the additional failure data. Using the expression for the posterior probability, the development (run) of a freight wagon to failure was refined.
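
As a worked illustration of the update described above (all probabilities hypothetical): a prior over failing assemblies is combined with the likelihood of newly observed data to give the posterior.

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])        # assumed P(failure in assembly i)
likelihood = np.array([0.9, 0.4, 0.1])   # assumed P(observed data | assembly i)
posterior = prior * likelihood
posterior /= posterior.sum()             # Bayes' rule with normalization
print(posterior)                         # refined failure probabilities
```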


Biometrika ◽  
2020 ◽  
Author(s):  
Sunyoung Shin ◽  
Yufeng Liu ◽  
Stephen R Cole ◽  
Jason P Fine

Summary: We consider scenarios in which the likelihood function for a semiparametric regression model factors into separate components, with an efficient estimator of the regression parameter available for each component. An optimal weighted combination of the component estimators, named an ensemble estimator, may be employed as an overall estimate of the regression parameter, and may be fully efficient under uncorrelatedness conditions. This approach is useful when the full likelihood function may be difficult to maximize, but the components are easy to maximize. It covers settings where the nuisance parameter may be estimated at different rates in the component likelihoods. As a motivating example we consider proportional hazards regression with prospective doubly censored data, in which the likelihood factors into a current status data likelihood and a left-truncated right-censored data likelihood. Variable selection is important in such regression modelling, but the applicability of existing techniques is unclear in the ensemble approach. We propose ensemble variable selection using the least squares approximation technique on the unpenalized ensemble estimator, followed by ensemble re-estimation under the selected model. The resulting estimator has the oracle property such that the set of nonzero parameters is successfully recovered and the semiparametric efficiency bound is achieved for this parameter set. Simulations show that the proposed method performs well relative to alternative approaches. Analysis of an AIDS cohort study illustrates the practical utility of the method.
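
Under the uncorrelatedness conditions mentioned in the summary, the optimal weights reduce to the familiar inverse-(co)variance weights. A minimal numpy sketch of such a combination (illustrative only, assuming uncorrelated component estimators with known covariances; not the paper's estimator):

```python
import numpy as np

def ensemble_estimate(estimates, covariances):
    """Inverse-variance weighted combination of component estimators of a
    common parameter; this minimizes the variance of the combination when
    the components are uncorrelated."""
    precisions = [np.linalg.inv(C) for C in covariances]
    V = np.linalg.inv(sum(precisions))   # covariance of the combined estimator
    return V @ sum(P @ b for P, b in zip(precisions, estimates))

b1, b2 = np.array([1.1, -0.4]), np.array([0.9, -0.6])  # hypothetical component estimates
C1, C2 = 0.04 * np.eye(2), 0.01 * np.eye(2)            # their covariances
print(ensemble_estimate([b1, b2], [C1, C2]))           # weighted toward the more precise b2
```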

