Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders

2020 ◽  
Author(s):  
Muhammad Ammar Malik ◽  
Tom Michoel

Abstract Linear mixed modelling is a popular approach for detecting and correcting spurious sample correlations due to hidden confounders in genome-wide gene expression data. In applications where some confounding factors are known, estimating simultaneously the contribution of known and latent variance components in linear mixed models is a challenge that has so far relied on numerical gradient-based optimizers to maximize the likelihood function. This is unsatisfactory because the resulting solution is poorly characterized and the efficiency of the method may be suboptimal. Here we prove analytically that maximum-likelihood latent variables can always be chosen orthogonal to the known confounding factors, in other words, that maximum-likelihood latent variables explain sample covariances not already explained by known factors. Based on this result we propose a restricted maximum-likelihood method which estimates the latent variables by maximizing the likelihood on the restricted subspace orthogonal to the known confounding factors, and show that this reduces to probabilistic PCA on that subspace. The method then estimates the variance-covariance parameters by maximizing the remaining terms in the likelihood function given the latent variables, using a newly derived analytic solution for this problem. Compared to gradient-based optimizers, our method attains equal or higher likelihood values, can be computed using standard matrix operations, results in latent factors that do not overlap with any known factors, and has a runtime reduced by several orders of magnitude. We anticipate that the restricted maximum-likelihood method will facilitate the application of linear mixed modelling strategies for learning latent variance components to much larger gene expression datasets than currently possible.
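The two-step idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes `Y` is a samples × genes matrix already centred per gene and that `Z` holds the known confounders as full-rank columns. The function name `restricted_ppca` and the parameter `q` are hypothetical. The sketch restricts the data to the orthogonal complement of the columns of `Z` and fits probabilistic PCA there with the standard closed-form solution of Tipping and Bishop. It is not the authors' implementation, and it omits their analytic variance-covariance step for the known factors.

```python
import numpy as np

def restricted_ppca(Y, Z, q):
    """Sketch: probabilistic PCA on the subspace orthogonal to known confounders.

    Y : (n_samples, n_genes) expression matrix, assumed centred per gene.
    Z : (n_samples, k) known confounders, assumed full column rank.
    q : number of latent factors (hypothetical parameter name).
    Returns latent factors X (n_samples, q) and the residual variance sigma2.
    """
    n, m = Y.shape
    k = Z.shape[1]
    if not 0 < q < n - k:
        raise ValueError("need 0 < q < n_samples - n_confounders")
    # Orthonormal basis of the complement of span(Z) from a complete QR.
    Q_full, _ = np.linalg.qr(Z, mode="complete")
    Q = Q_full[:, k:]                        # (n, n - k)
    # Restrict the data to that subspace and form the sample covariance.
    Yr = Q.T @ Y                             # (n - k, m)
    S = Yr @ Yr.T / m                        # (n - k, n - k)
    # Closed-form PPCA: eigen-decompose and sort eigenvalues in descending order.
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()                # average of the discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    # Map back to sample space; every column is orthogonal to span(Z).
    X = Q @ W
    return X, sigma2

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))
Y = rng.normal(size=(50, 200))
X, sigma2 = restricted_ppca(Y, Z, 3)
print(np.abs(Z.T @ X).max())  # zero up to rounding error
```

Every column of `X` lies in the complement of span(`Z`), so the latent factors cannot overlap with the known ones. This is the orthogonality property the abstract proves for the maximum-likelihood solution.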

Author(s):  
Muhammad Ammar Malik ◽  
Tom Michoel

Abstract Random effects models are popular statistical models for detecting and correcting spurious sample correlations due to hidden confounders in genome-wide gene expression data. In applications where some confounding factors are known, estimating simultaneously the contribution of known and latent variance components in random effects models is a challenge that has so far relied on numerical gradient-based optimizers to maximize the likelihood function. This is unsatisfactory because the resulting solution is poorly characterized and the efficiency of the method may be suboptimal. Here we prove analytically that maximum-likelihood latent variables can always be chosen orthogonal to the known confounding factors, in other words, that maximum-likelihood latent variables explain sample covariances not already explained by known factors. Based on this result we propose a restricted maximum-likelihood method which estimates the latent variables by maximizing the likelihood on the restricted subspace orthogonal to the known confounding factors, and show that this reduces to probabilistic PCA on that subspace. The method then estimates the variance-covariance parameters by maximizing the remaining terms in the likelihood function given the latent variables, using a newly derived analytic solution for this problem. Compared to gradient-based optimizers, our method attains greater or equal likelihood values, can be computed using standard matrix operations, results in latent factors that don’t overlap with any known factors, and has a runtime reduced by several orders of magnitude. Hence the restricted maximum-likelihood method facilitates the application of random effects modelling strategies for learning latent variance components to much larger gene expression datasets than possible with current methods.


2020 ◽  
Vol 4 (1) ◽  
pp. 55-67
Author(s):  
Reny Rian Marliana ◽  
Leni Nurhayati

This paper studies a model of the relationships among latent variables using covariance-based structural equation modeling (CB-SEM). The latent variables are digital literacy, use of e-resources, and students' reading culture. The goals of the study are to build a simultaneous model of these three variables, to determine the influence of digital literacy on students' use of e-resources and reading culture, and to determine the influence of the use of e-resources on students' reading culture. The parameters of the model are estimated by the maximum likelihood method. Data were taken from 256 questionnaires completed by students at STMIK Sumedang. The results show that digital literacy significantly influences both the use of e-resources and students' reading culture. In contrast, the use of e-resources has no significant influence on students' reading culture.
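In CB-SEM, maximum likelihood estimation is usually framed as minimizing the discrepancy between the sample covariance matrix `S` and the model-implied covariance matrix Σ(θ). The sketch below computes only that standard ML fit function, F_ML = log|Σ| + tr(SΣ⁻¹) − log|S| − p. It assumes both matrices are symmetric positive definite. Building Σ(θ) from the three-variable path model and minimizing over θ are not shown, and the function name `f_ml` is hypothetical.

```python
import numpy as np

def f_ml(S, Sigma):
    """Maximum-likelihood discrepancy between sample covariance S and model-implied Sigma.

    F_ML = log|Sigma| + tr(S Sigma^{-1}) - log|S| - p
    Both matrices are assumed symmetric positive definite.
    """
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    # tr(S Sigma^{-1}) equals tr(Sigma^{-1} S); solve avoids an explicit inverse.
    trace_term = np.trace(np.linalg.solve(Sigma, S))
    return logdet_sigma + trace_term - logdet_s - p

S = np.array([[1.0, 0.3], [0.3, 1.0]])
print(f_ml(S, S))          # zero up to rounding: a perfect fit
print(f_ml(S, np.eye(2)))  # positive: the model ignores the 0.3 covariance
```

The function is zero exactly when Σ(θ) reproduces `S`, and it grows as the implied covariances depart from the observed ones. An estimator would therefore search over the path coefficients and variances that make it as small as possible.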


2013 ◽  
Vol 5 (8) ◽  
pp. 394-400 ◽  
Author(s):  
Hasna Fadhila ◽  
Nora Amelda Rizal

Value at Risk (VaR) is a tool for estimating the largest loss that will not be exceeded at a given confidence level over a given period of time. Historical-simulation VaR produces reliable VaR values because it is based on historical data and captures the skewness of the observed data. Hence VaR is widely used by investors to determine the risk they face on their investments. Maximum likelihood is a better way to calculate VaR, since it has been used for estimation from historical data and can also be applied to nonlinear models. The likelihood function can also approximate returns. From the maximum-likelihood fit of a normal distribution, a normal curve can be drawn for a one-tailed test. This research calculates VaR using maximum likelihood and compares the resulting normal curve with the returns of each bank (Bank Mandiri, Bank BRI and Bank BNI). Empirical results show that Bank BNI in 2009, Bank BRI in 2010 and Bank BNI in 2011 had lower historical-simulation VaR values in their respective years. It is concluded that estimating VaR with the maximum likelihood method is reasonably appropriate when compared with the normal curve.
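The two approaches compared above can be sketched in a few lines. The sketch assumes one-period returns in a NumPy array, a 95% confidence level, and VaR reported as a positive loss. For a normal distribution, the maximum-likelihood estimates are the sample mean and the standard deviation with divisor n. Historical simulation instead reads the lower-tail quantile directly from the data. The function names are hypothetical, and this is not the authors' exact procedure.

```python
import numpy as np
from statistics import NormalDist

def var_normal_mle(returns, alpha=0.95):
    """Parametric VaR from a normal distribution fitted by maximum likelihood.

    The normal MLEs are the sample mean and the standard deviation with
    divisor n (ddof=0). VaR is returned as a positive loss.
    """
    r = np.asarray(returns, dtype=float)
    mu = r.mean()
    sigma = r.std()                          # ddof=0 gives the MLE
    z = NormalDist().inv_cdf(1 - alpha)      # one-tailed lower quantile, about -1.645 at 95%
    return -(mu + z * sigma)

def var_historical(returns, alpha=0.95):
    """Historical-simulation VaR: the empirical lower-tail quantile of returns."""
    r = np.asarray(returns, dtype=float)
    return -np.quantile(r, 1 - alpha)

rng = np.random.default_rng(7)
daily = rng.normal(0.0005, 0.02, size=1000)  # hypothetical daily returns
print(var_normal_mle(daily), var_historical(daily))
```

For normally distributed returns the two numbers converge as the sample grows. With skewed or heavy-tailed returns they can differ, which is the comparison the abstract draws between historical simulation and the fitted normal curve.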


Mathematics ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. 62 ◽  
Author(s):  
Autcha Araveeporn

This paper compares two frequentist methods, least squares and maximum likelihood, for estimating an unknown parameter of the Random Coefficient Autoregressive (RCA) model. Frequentist methods draw conclusions from observed data by emphasizing the frequency or proportion of the data. The least-squares method estimates the parameter by minimizing the sum of squared residuals, and the minimum is found by setting the gradient to zero. The maximum likelihood method uses the observed data to estimate the parameters of a probability distribution by maximizing a likelihood function under the statistical model. This estimator is obtained by differentiating the likelihood function with respect to the parameter. The efficiency of the two methods is assessed by the average mean square error for simulated data and by the mean square error for actual data. The simulated data are generated from first-order RCA models only. The results show that the least-squares method performs better than maximum likelihood, since its average mean square error is the smallest in all cases. Finally, both methods are applied to actual data. The series of monthly averages of the Stock Exchange of Thailand (SET) index and the daily volume of the Baht/Dollar exchange rate are used for estimation and forecasting with the RCA model. The results show that the least-squares method outperforms the maximum likelihood method.
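As an illustration of the least-squares side of the comparison, the sketch below assumes the usual first-order RCA form X_t = (φ + b_t) X_{t−1} + ε_t. Here b_t and ε_t are independent zero-mean normal noise terms, a distributional choice made only for the simulation. Setting the derivative of Σ(X_t − φX_{t−1})² with respect to φ to zero gives φ̂ = ΣX_tX_{t−1} / ΣX²_{t−1}. The function names and parameter values are hypothetical, and the maximum-likelihood estimator, which needs a full distributional model and numerical maximization, is not shown.

```python
import numpy as np

def simulate_rca1(phi, omega, sigma, n, rng):
    """Simulate X_t = (phi + b_t) X_{t-1} + e_t with b_t ~ N(0, omega^2), e_t ~ N(0, sigma^2)."""
    x = np.zeros(n)
    b = rng.normal(0.0, omega, size=n)
    e = rng.normal(0.0, sigma, size=n)
    for t in range(1, n):
        x[t] = (phi + b[t]) * x[t - 1] + e[t]
    return x

def ls_phi(x):
    """Least-squares estimate of phi: the root of d/dphi sum (x_t - phi x_{t-1})^2 = 0."""
    x_prev, x_curr = x[:-1], x[1:]
    return np.dot(x_prev, x_curr) / np.dot(x_prev, x_prev)

rng = np.random.default_rng(1)
x = simulate_rca1(phi=0.5, omega=0.3, sigma=1.0, n=20000, rng=rng)
print(ls_phi(x))  # close to the true value 0.5
```

Because E[X_t | X_{t−1}] = φX_{t−1} in this model, the random coefficient only inflates the residual variance. It does not bias the least-squares estimate of φ, as long as the process is stationary (here φ² + ω² < 1).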


Author(s):  
I. Boujenane ◽  
A. Chikhi

The study analysed 1,264 and 811 reproductive performance records of Boujaâd and Sardi ewes, respectively. The data were collected from 1993-94 to 1999-2000 at the Déroua experimental farm of the Institut national de la recherche agronomique in Béni Mellal. Genetic parameters of the reproductive traits were estimated with the REML (restricted maximum likelihood) method for estimating variance and covariance components. In the Boujaâd breed, estimated repeatabilities were 0.18 and 0.17 for litter size at birth and at weaning, respectively, 0.23 and 0.18 for litter weight at birth and at weaning, respectively, and 0.18 for gestation length. In the Sardi breed, the corresponding estimates were 0.21 and 0.18, 0.24 and 0.15, and 0.16, respectively. In the Boujaâd breed, estimated heritabilities were 0.18 and 0.11 for litter size at birth and at weaning, respectively, 0.18 and 0.11 for litter weight at birth and at weaning, respectively, and 0.04 for gestation length. The corresponding estimates in the Sardi breed were 0.21 and 0.18, 0.24 and 0.15, and 0.16, respectively. Genetic and phenotypic correlations between these traits ranged from 0.83 to 1.00 and from 0.27 to 0.93, respectively, in the Boujaâd breed, and from 0.06 to 0.96 and from 0.07 to 0.82 in the Sardi breed. It was concluded that these parameters could be used in selection programmes to improve the productivity of Boujaâd and Sardi ewes.
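REML estimates variance components by maximizing the likelihood of error contrasts, so the degrees of freedom used by the fixed effects are accounted for. The minimal NumPy sketch below deliberately simplifies the setting to a model y = Xb + Zu + e with a single random effect that has identity covariance. A real animal model would instead use the pedigree relationship matrix, and breeding software would use proper optimizers rather than a grid. The residual variance is profiled out, and the variance ratio γ = σ²_u/σ²_e is found by a coarse grid search over log γ. All names are hypothetical.

```python
import numpy as np

def reml_profile(gamma, y, X, Z):
    """Profiled REML log-likelihood (constant dropped) for
    y = X b + Z u + e, u ~ N(0, s2u I), e ~ N(0, s2e I), gamma = s2u / s2e.
    Returns the log-likelihood and the REML estimate of s2e at this gamma."""
    n, p = X.shape
    H = np.eye(n) + gamma * (Z @ Z.T)              # V = s2e * H
    XtHinvX = X.T @ np.linalg.solve(H, X)
    b = np.linalg.solve(XtHinvX, X.T @ np.linalg.solve(H, y))  # GLS fixed effects
    r = y - X @ b
    s2e = (r @ np.linalg.solve(H, r)) / (n - p)    # maximizes REML over s2e given gamma
    _, logdet_H = np.linalg.slogdet(H)
    _, logdet_XtHinvX = np.linalg.slogdet(XtHinvX)
    loglik = -0.5 * ((n - p) * np.log(s2e) + logdet_H + logdet_XtHinvX)
    return loglik, s2e

def reml_single_random_effect(y, X, Z, lo=-6.0, hi=4.0, num=2001):
    """Grid search over log(gamma); returns the (s2u, s2e) estimates."""
    grid = np.linspace(lo, hi, num)
    logliks = [reml_profile(np.exp(lg), y, X, Z)[0] for lg in grid]
    gamma = np.exp(grid[int(np.argmax(logliks))])
    _, s2e = reml_profile(gamma, y, X, Z)
    return gamma * s2e, s2e

# Hypothetical balanced data: 15 groups (e.g. sires) with 4 records each.
rng = np.random.default_rng(42)
g, m = 15, 4
groups = np.repeat(np.arange(g), m)
y = np.linspace(-2.0, 2.0, g)[groups] + rng.normal(0.0, 1.0, size=g * m)
X = np.ones((g * m, 1))                            # overall mean as the only fixed effect
Z = np.zeros((g * m, g))
Z[np.arange(g * m), groups] = 1.0                  # incidence matrix of the random effect
s2u, s2e = reml_single_random_effect(y, X, Z)
print(s2u, s2e)
```

For balanced one-way data like this example, REML estimates coincide with the ANOVA estimates whenever the latter are non-negative. This gives a convenient check on the sketch.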


2012 ◽  
Vol 33 (3-4) ◽  
pp. 393-400 ◽  
Author(s):  
Bálint Üveges ◽  
Bálint Halpern ◽  
Tamás Péchy ◽  
János Posta ◽  
István Komlósi

The objective of our research was to determine the heritability of head scale numbers of Vipera ursinii rakosiensis. A total of 430 specimens (177 males and 253 females) were included in the analysis, most of which were born and raised in the Hungarian Meadow Viper Conservation Centre between 2004 and 2008. Because of the controlled breeding conditions, the dams of the offspring were known, and the sires were known in 51% of the cases. Only the ancestors of the wild-caught specimens were unknown, but these animals were included as parents in the analysis. Photographic identification was used to identify and characterise the specimens, the majority over consecutive years. We counted the following scales: loreal, circumocular, apical and crown (intercanthal and intersupraocular) shields, and we recorded presence-absence data for other characteristics detailed further in the article. The variance and covariance components were estimated with the restricted maximum likelihood method. The repeatability animal model included year of birth and sex as fixed effects, and the dam (as a permanent environmental effect) and the animal as random effects. Heritability values ranged from 0.32 to 0.70. We also report scale numbers and statistics on the differences in scale numbers between the sexes.
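Once REML has produced the variance components, heritability and repeatability are simple ratios. The sketch assumes the common decomposition of phenotypic variance into additive genetic, permanent environmental and residual parts. In the abstract, the dam is listed as the permanent environmental effect. The function name and the variance values are hypothetical and are not taken from the study.

```python
def variance_ratios(var_a, var_pe, var_e):
    """Heritability and repeatability from variance components (sketch).

    var_a  : additive genetic (animal) variance
    var_pe : permanent environmental variance
    var_e  : residual variance
    """
    for v in (var_a, var_pe, var_e):
        if v < 0:
            raise ValueError("variance components must be non-negative")
    var_p = var_a + var_pe + var_e              # phenotypic variance
    if var_p == 0:
        raise ValueError("phenotypic variance is zero")
    heritability = var_a / var_p
    repeatability = (var_a + var_pe) / var_p
    return heritability, repeatability

h2, rep = variance_ratios(0.5, 0.1, 0.4)       # hypothetical components
print(f"h2 = {h2:.2f}, repeatability = {rep:.2f}")  # h2 = 0.50, repeatability = 0.60
```

Because repeatability adds the permanent environmental share to the numerator, it can never be smaller than heritability under this decomposition.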

