Matrix Normal Cluster-Weighted Models

Author(s):  
Salvatore D. Tomarchio ◽  
Paul D. McNicholas ◽  
Antonio Punzo

Abstract Finite mixtures of regressions with fixed covariates are a commonly used model-based clustering methodology for regression data. However, they assume assignment independence, i.e., that data points are allocated to clusters independently of the distribution of the covariates. To account for this aspect, finite mixtures of regressions with random covariates, also known as cluster-weighted models (CWMs), have been proposed in the univariate and multivariate literature. In this paper, the CWM is extended to matrix data, i.e., data in which a set of variables is observed simultaneously at several time points or locations. Specifically, both the cluster-specific marginal distribution of the covariates and the cluster-specific conditional distribution of the responses given the covariates are assumed to be matrix normal. Maximum likelihood parameter estimates are derived using an expectation-conditional maximization (ECM) algorithm. Parameter recovery, classification assessment, and the ability of the Bayesian information criterion to detect the underlying groups are investigated using simulated data. Finally, two real-data applications concerning educational indicators and the Italian non-life insurance market are presented.
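
As a rough illustration of the model structure (not the authors' code), the sketch below evaluates the E-step responsibilities of a two-component matrix normal CWM for a single observation, using SciPy's matrix normal density. All dimensions, parameter values, and the identity covariances are hypothetical placeholders.

```python
# A minimal sketch of the E-step responsibilities in a two-component
# matrix normal cluster-weighted model. All parameters are hypothetical.
import numpy as np
from scipy.stats import matrix_normal

rng = np.random.default_rng(0)

p, q, r = 2, 3, 4                          # response rows, covariate rows, columns
X = rng.standard_normal((q, r))            # one matrix-valued covariate observation
Y = rng.standard_normal((p, r))            # one matrix-valued response observation

# Hypothetical cluster-specific parameters (g = 0, 1).
pi = np.array([0.6, 0.4])                  # mixing weights
M = [np.zeros((q, r)), np.ones((q, r))]    # marginal means of X
B = [rng.standard_normal((p, q)) for _ in range(2)]  # regression coefficients

def component_density(g):
    # f(X | g): cluster-specific marginal matrix normal for the covariates.
    fx = matrix_normal.pdf(X, mean=M[g], rowcov=np.eye(q), colcov=np.eye(r))
    # f(Y | X, g): conditional matrix normal with mean linear in X.
    fy = matrix_normal.pdf(Y, mean=B[g] @ X, rowcov=np.eye(p), colcov=np.eye(r))
    return pi[g] * fx * fy

joint = np.array([component_density(g) for g in range(2)])
resp = joint / joint.sum()                 # posterior cluster probabilities
print(resp)
```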

1985 ◽  
Vol 231 (1) ◽  
pp. 171-177 ◽  
Author(s):  
L Matyska ◽  
J Kovář

The known jackknife methods (i.e., the standard, weighted, linear, and weighted linear jackknife) for determining parameters and their confidence regions were tested and compared with Marquardt's technique (with confidence intervals calculated from the variance-covariance matrix). Simulated data corresponding to the Michaelis-Menten equation, with a defined structure and magnitude of error in the dependent variable, were used for fitting. There were no essential differences among the tested methods in either the point or the interval parameter estimates. Marquardt's procedure yielded slightly better results than the jackknives for five scattered data points (its use is advisable for routine analyses). The classical jackknife was slightly superior to the other methods for 20 data points (it can be recommended for very precise calculations when large numbers of data are available). Weighting does not seem necessary for this type of equation: parameter estimates obtained with constant weights were comparable to those calculated with weights corresponding exactly to the real error structure, whereas relative weighting led to worse results.
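
For concreteness, here is a minimal sketch of the standard (delete-one) jackknife applied to a nonlinear least-squares fit of the Michaelis-Menten equation on simulated data. The error model and starting values are illustrative assumptions, not those of the paper.

```python
# Delete-one jackknife for Michaelis-Menten parameters, next to the direct
# nonlinear least-squares fit. Data and noise level are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, vmax, km):
    return vmax * s / (km + s)

rng = np.random.default_rng(1)
s = np.array([0.5, 1.0, 2.0, 4.0, 8.0])           # substrate concentrations
v = michaelis_menten(s, 10.0, 2.0) * (1 + 0.05 * rng.standard_normal(s.size))

full, _ = curve_fit(michaelis_menten, s, v, p0=[8.0, 1.0])

# Refit with each data point left out in turn.
n = s.size
loo = np.array([curve_fit(michaelis_menten, np.delete(s, i), np.delete(v, i),
                          p0=full)[0] for i in range(n)])
pseudo = n * full - (n - 1) * loo                  # jackknife pseudovalues
jack_est = pseudo.mean(axis=0)
jack_se = pseudo.std(axis=0, ddof=1) / np.sqrt(n)
print(full, jack_est, jack_se)
```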


2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Amal Almohisen ◽  
Robin Henderson ◽  
Arwa M. Alshingiti

In any longitudinal study, dropout before the final time point can rarely be avoided. The chosen dropout model is commonly one of the following types: Missing Completely at Random (MCAR), Missing at Random (MAR), Missing Not at Random (MNAR), or Shared Parameter (SP). In this paper we estimate the parameters of the longitudinal model for simulated and real data using the Linear Mixed Effects (LME) method. We investigate the consequences of misspecifying the missingness mechanism by deriving the so-called least false values: the values to which the parameter estimates converge when the assumed missingness mechanism is wrong. Knowledge of the least false values allows us to conduct a sensitivity analysis, which is illustrated. This method provides an alternative to a local misspecification sensitivity procedure that has been developed for likelihood-based analysis. We compare the results obtained by the proposed method with those found using the local misspecification method, and we apply both the local misspecification and least false methods to estimate the bias and sensitivity of parameter estimates in a clinical trial example.
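
The following sketch conveys the idea empirically, under assumptions of our own choosing: longitudinal data are simulated with MNAR dropout and an LME is fitted as if the missingness were ignorable, so the slope estimate drifts away from the truth toward its least false value.

```python
# Simulate a linear mixed model with MNAR (outcome-dependent) dropout and
# fit an LME to the observed records only. Data-generating values are
# hypothetical, not from the paper.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n, times = 500, np.arange(5)
b0, b1 = 1.0, 0.5                                  # true intercept and slope
u = rng.normal(0, 1, n)                            # random intercepts

rows = []
for i in range(n):
    dropped = False
    for t in times:
        y = b0 + b1 * t + u[i] + rng.normal(0, 1)
        # MNAR dropout: the probability of dropping out depends on the
        # current (possibly unobserved) response value.
        if dropped or (t > 0 and rng.random() < 1 / (1 + np.exp(2 - y))):
            dropped = True
            continue
        rows.append({"id": i, "t": t, "y": y})

df = pd.DataFrame(rows)
fit = smf.mixedlm("y ~ t", df, groups=df["id"]).fit()
print(fit.params)   # slope drifts from 0.5: an empirical least false value
```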


2020 ◽  
Vol 80 (5) ◽  
pp. 870-909 ◽  
Author(s):  
Maxwell Mansolf ◽  
Annabel Vreeker ◽  
Steven P. Reise ◽  
Nelson B. Freimer ◽  
David C. Glahn ◽  
...  

Large-scale studies spanning diverse project sites, populations, languages, and measurements are increasingly important for relating psychological to biological variables. National and international consortia are already collecting and executing mega-analyses on aggregated individual-level data, with different measures available for each person. In this research, we show that Asparouhov and Muthén's alignment method can be adapted to align data from disparate item sets and response formats. We argue that, with these adaptations, the alignment method is well suited for combining data across multiple sites even when they use different measurement instruments. The approach is illustrated using data from the Whole Genome Sequencing in Psychiatric Disorders consortium, and a real-data-based simulation is used to verify accurate parameter recovery. Factor alignment appears to increase the precision of measurement and the validity of scores with respect to external criteria. The resulting parameter estimates may further inform the development of more effective and efficient methods to assess the same constructs in prospectively designed studies.
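
A toy sketch of the alignment optimization is given below: starting from hypothetical group-specific (configural) loadings and intercepts, it searches for factor means and variances that make the transformed item parameters as invariant as possible across groups. It follows the general recipe of Asparouhov and Muthén only loosely, with a simplified loss and two groups; none of the numbers come from the study.

```python
# Toy two-group alignment over given configural estimates. Under the
# configural model (factor mean 0, variance 1 in each group), candidate
# factor means/variances imply transformed loadings and intercepts; the
# loss rewards invariance of those transformed parameters.
import numpy as np
from scipy.optimize import minimize

# Hypothetical configural estimates for 4 items in 2 groups: group 2 was
# generated with factor mean 0.5 and factor variance 1.44.
lam = np.array([[0.8, 0.7, 0.9, 0.6],
                [0.96, 0.84, 1.08, 0.72]])
nu = np.array([[0.0, 0.1, -0.1, 0.2],
               [0.4, 0.45, 0.35, 0.5]])

def loss(theta, eps=1e-4):
    alpha = np.array([0.0, theta[0]])        # group 1 fixed for identification
    psi = np.array([1.0, np.exp(theta[1])])
    lam_a = lam / np.sqrt(psi)[:, None]      # transformed loadings
    nu_a = nu - alpha[:, None] * lam_a       # transformed intercepts
    f = lambda x: np.sqrt(np.sqrt(x ** 2 + eps))   # simplicity-style loss
    return np.sum(f(lam_a[0] - lam_a[1])) + np.sum(f(nu_a[0] - nu_a[1]))

fit = minimize(loss, x0=[0.0, 0.0], method="Nelder-Mead")
print(fit.x[0], np.exp(fit.x[1]))            # recovered factor mean, variance
```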


2019 ◽  
Author(s):  
Hanchen Yu ◽  
Alexander Stewart Fotheringham ◽  
Ziqi Li ◽  
Taylor M. Oshan ◽  
Levi John Wolf

Recognizing that Geographically Weighted Regression (GWR) is a data-borrowing technique, this paper derives expressions for the amount of bias introduced to local parameter estimates by borrowing data from locations where the processes might differ from those at the regression location. This is done for both GWR and Multiscale GWR (MGWR). We demonstrate the accuracy of our bias expressions through a comparison with empirically derived estimates based on a simulated data set with known local parameter values. Being able to compute the bias in both models allows us to demonstrate the superiority of MGWR. We then demonstrate the utility of a corrected Akaike Information Criterion statistic for finding optimal bandwidths in both GWR and MGWR as a trade-off between minimizing bias and minimizing uncertainty. We further show how bias in one set of local parameter estimates can affect the bias in another set of local estimates. The bias derived from borrowing data from other locations appears to be very small.
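
As a self-contained illustration of the setting (not the authors' derivations), the sketch below runs a basic GWR-style local weighted regression on simulated data with a known, spatially varying coefficient, so the bias from borrowing data across locations can be computed directly. The kernel and bandwidth are arbitrary choices.

```python
# Local weighted regression at each location with a Gaussian kernel;
# comparing local estimates to the known coefficient surface gives the
# bias induced by data borrowing. All settings are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 400
coords = rng.uniform(0, 1, (n, 2))
beta = 1 + coords[:, 0]                      # true coefficient varies in space
x = rng.standard_normal(n)
y = beta * x + rng.normal(0, 0.3, n)

def gwr_at(i, bandwidth):
    # Gaussian kernel weights from distances to the regression location i.
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    # Weighted least squares with an intercept; return the local slope.
    X = np.column_stack([np.ones(n), x])
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)[1]

est = np.array([gwr_at(i, bandwidth=0.15) for i in range(n)])
bias = est - beta
print(bias.mean(), np.abs(bias).mean())      # data borrowing induces local bias
```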


2019 ◽  
Vol 8 (2) ◽  
pp. 159
Author(s):  
Morteza Marzjarani

Heteroscedasticity plays an important role in data analysis. In this article, this issue, along with several approaches for handling heteroscedasticity, is presented. First, iteratively reweighted least squares (IRLS) and iterative feasible generalized least squares (IFGLS) are deployed, and proper weights for reducing heteroscedasticity are determined. Next, a new approach for handling heteroscedasticity is introduced. In this approach, after fitting a multiple linear regression (MLR) model or a general linear model (GLM) to a sufficiently large data set, the data are divided into two parts through inspection of the residuals, based on the results of testing for heteroscedasticity or via simulations. The first part contains the records where the absolute values of the residuals can be assumed small enough that heteroscedasticity is ignorable. Under this assumption, the error variances are small and close to those of their neighboring points; such error variances can be assumed known (but not necessarily equal). The second, remaining portion of the data is categorized as heteroscedastic. Using real data sets, it is concluded that this approach reduces the number of unusual (e.g., influential) data points flagged for further inspection and, more importantly, lowers the root mean square error (RMSE), resulting in a more robust set of parameter estimates.
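
A minimal sketch of the IRLS step is shown below, assuming a log-linear variance function estimated from the squared residuals; the data-generating model and the variance specification are illustrative choices, not the article's.

```python
# Iteratively reweighted least squares for heteroscedastic data: weights are
# re-estimated from a model of the squared residuals at each pass.
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
y = 2 + 0.5 * x + rng.normal(0, 0.2 * x)    # error SD grows with x

w = np.ones(n)
for _ in range(10):
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # weighted LS fit
    resid = y - X @ beta
    # Regress log squared residuals on x to estimate the variance function;
    # the constant bias of this estimator cancels in the WLS weights.
    gamma = np.linalg.lstsq(X, np.log(resid ** 2 + 1e-12), rcond=None)[0]
    w = 1 / np.exp(X @ gamma)                # weights = inverse fitted variance
print(beta)
```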


Behaviour ◽  
2007 ◽  
Vol 144 (11) ◽  
pp. 1315-1332 ◽  
Author(s):  
Sebastián Luque ◽  
Christophe Guinet

Abstract Foraging behaviour frequently occurs in bouts, and considerable effort has gone into properly defining those bouts because they partly reflect different scales of environmental variation. The methods traditionally used to identify such bouts are diverse, involve some level of subjectivity, and their accuracy and precision are rarely compared. Therefore, the applicability of a maximum likelihood estimation method (MLM) for identifying dive bouts was investigated and compared with a recently proposed sequential differences analysis (SDA). Using real data on interdive durations from Antarctic fur seals (Arctocephalus gazella Peters, 1875), the MLM-based model produced a briefer bout-ending criterion (BEC) and more precise parameter estimates than the SDA approach. The MLM-based model was also in better agreement with the real data, as it predicted the cumulative frequency of differences in interdive duration more accurately. Applying both methods to simulated data showed that the MLM-based approach produced less biased estimates of the given model parameters than the SDA approach. Different choices of histogram bin width in SDA had a systematic effect on the estimated BEC, such that larger bin widths resulted in longer BECs. These results suggest that using the MLM-based procedure on the sequential differences in interdive durations, and possibly other dive characteristics, may be an accurate, precise, and objective tool for identifying dive bouts.
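
To make the approach concrete, the sketch below fits a two-process exponential mixture to simulated interdive durations by maximum likelihood and computes the BEC as the point where the two fitted component densities intersect; the rates and mixing proportion are illustrative values only.

```python
# Maximum likelihood bout analysis: a fast within-bout and a slow
# between-bout exponential process, with the BEC at the crossing point
# of the two fitted component densities.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
t = np.concatenate([rng.exponential(1 / 0.5, 800),    # rate 0.5 (within bouts)
                    rng.exponential(1 / 0.02, 200)])  # rate 0.02 (between bouts)

def nll(theta):
    p = 1 / (1 + np.exp(-theta[0]))              # mixing proportion in (0, 1)
    lf, ls = np.exp(theta[1]), np.exp(theta[2])  # positive rates
    dens = p * lf * np.exp(-lf * t) + (1 - p) * ls * np.exp(-ls * t)
    return -np.sum(np.log(dens))

fit = minimize(nll, x0=[1.0, 0.0, -3.0], method="Nelder-Mead")
p = 1 / (1 + np.exp(-fit.x[0]))
lf, ls = np.exp(fit.x[1]), np.exp(fit.x[2])
# Solve p*lf*exp(-lf*t) = (1-p)*ls*exp(-ls*t) for t.
bec = np.log((p * lf) / ((1 - p) * ls)) / (lf - ls)
print(p, lf, ls, bec)
```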


2021 ◽  
Author(s):  
Maarten van der Velde ◽  
Florian Sense ◽  
Jelmer P Borst ◽  
Hedderik van Rijn

The parameters governing our behaviour are in constant flux. Accurately capturing these dynamics in cognitive models poses a challenge to modellers. Here, we demonstrate a mapping of ACT-R's declarative memory onto the linear ballistic accumulator (LBA), a mathematical model describing a competition between evidence accumulation processes. We show that this mapping provides a method for inferring individual ACT-R parameters without requiring the modeller to build and fit an entire ACT-R model. We conduct a parameter recovery study to confirm that the LBA can recover ACT-R parameters from simulated data. Then, as a proof of concept, we use the LBA to estimate ACT-R parameters from an empirical data set. The resulting parameter estimates provide a cognitively meaningful explanation for observed differences in behaviour over time and between individuals.
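
The following is a minimal forward simulator for the LBA (a sketch of the model itself, not of the ACT-R mapping): each response option is a deterministic accumulator with a uniformly drawn start point and a normally drawn drift rate, and the first to reach threshold determines the choice and response time. All parameter values are hypothetical.

```python
# Forward simulation of the linear ballistic accumulator.
import numpy as np

rng = np.random.default_rng(6)

def simulate_lba(n_trials, drifts, b=1.0, A=0.5, s=0.25, t0=0.2):
    drifts = np.asarray(drifts)
    starts = rng.uniform(0, A, (n_trials, drifts.size))   # start points
    rates = rng.normal(drifts, s, (n_trials, drifts.size))
    rates = np.where(rates > 0, rates, np.nan)   # ignore non-finishing racers
    times = (b - starts) / rates                 # time to reach threshold b
    choice = np.nanargmin(times, axis=1)         # first accumulator to finish
    rt = t0 + np.nanmin(times, axis=1)           # add non-decision time
    return choice, rt

choice, rt = simulate_lba(10_000, drifts=[1.2, 0.8])
print(np.mean(choice == 0), rt.mean())           # accuracy and mean RT
```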


Author(s):  
Tin Lok James Ng ◽  
Thomas Brendan Murphy

Abstract A probabilistic model for random hypergraphs is introduced to represent unary, binary, and higher-order interactions among objects in real-world problems. The model extends the latent class analysis model by introducing two clustering structures for hyperedges and capturing variation in hyperedge size. An expectation-maximization algorithm with minorization-maximization steps is developed for parameter estimation, and model selection using the Bayesian Information Criterion is proposed. The model is applied to simulated data and to two real-world data sets, where interesting results are obtained.
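
As background, here is a compact EM implementation for the ordinary latent class model (a Bernoulli mixture), the building block that the hypergraph model extends; the data, class count, and initialization are illustrative.

```python
# EM for a latent class (Bernoulli mixture) model on binary data.
import numpy as np

rng = np.random.default_rng(7)
n, d, G = 500, 6, 2
# Simulate two latent classes with different item probabilities.
z = rng.integers(0, G, n)
theta_true = np.array([[0.8] * d, [0.2] * d])
X = (rng.random((n, d)) < theta_true[z]).astype(float)

pi = np.full(G, 1 / G)
theta = rng.uniform(0.3, 0.7, (G, d))
for _ in range(200):
    # E-step: posterior class probabilities from the Bernoulli likelihood.
    logp = (X @ np.log(theta.T) + (1 - X) @ np.log(1 - theta.T)
            + np.log(pi))
    logp -= logp.max(axis=1, keepdims=True)      # stabilise before exp
    resp = np.exp(logp)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update mixing weights and item probabilities.
    pi = resp.mean(axis=0)
    theta = np.clip((resp.T @ X) / resp.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
print(pi, theta.round(2))
```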


Genetics ◽  
1996 ◽  
Vol 143 (4) ◽  
pp. 1819-1829 ◽  
Author(s):  
G Thaller ◽  
L Dempfle ◽  
I Hoeschele

Abstract Maximum likelihood methodology was applied to determine the mode of inheritance of rare binary traits with data structures typical of swine populations. The genetic models considered included a monogenic, a digenic, a polygenic, and three mixed polygenic and major gene models. The main emphasis was on the detection of major genes acting on a polygenic background. Deterministic algorithms were employed to integrate and maximize likelihoods. A simulation study was conducted to evaluate model selection and parameter estimation. Three designs were simulated that differed in the number of sires/number of dams within sires (10/10, 30/30, 100/30). Major gene effects of at least one standard deviation of the liability were detected with satisfactory power under the mixed model of inheritance, except for the smallest design. Parameter estimates were empirically unbiased with acceptable standard errors, except for the smallest design, and allowed the genetic models to be clearly distinguished. Distributions of the likelihood ratio statistic were evaluated empirically, because asymptotic theory did not hold. For each simulation model, the Average Information Criterion was computed for all models of analysis; the model with the smallest value was chosen as the best model and was equal to the true model in almost every case studied.
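
A stripped-down sketch of the liability-threshold idea is given below: a binary trait is observed whenever an unobserved normal liability, shifted by a major gene effect, exceeds a threshold. Unlike the paper's mixed inheritance models, the genotype is treated as observed here, which reduces the fit to a probit-type likelihood; all values are illustrative.

```python
# Maximum likelihood for a liability-threshold model with an observed
# biallelic major gene (a simplification of the paper's setting).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(8)
n, a_true, thr_true = 2000, 1.5, 2.0
geno = rng.binomial(2, 0.2, n) - 1                # genotype coded -1, 0, 1
liability = a_true * geno + rng.standard_normal(n)
y = (liability > thr_true).astype(float)          # observed binary trait

def nll(theta):
    a, thr = theta
    # P(affected | genotype): liability exceeds the threshold.
    p_aff = np.clip(1 - norm.cdf(thr - a * geno), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p_aff) + (1 - y) * np.log(1 - p_aff))

fit = minimize(nll, x0=[1.0, 1.5], method="Nelder-Mead")
print(fit.x)                                      # estimates of (a, threshold)
```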


Metabolites ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 214
Author(s):  
Aneta Sawikowska ◽  
Anna Piasecka ◽  
Piotr Kachlicki ◽  
Paweł Krajewski

Peak overlapping is a common problem in chromatography, particularly for complex biological mixtures such as metabolite extracts. Because different compounds with similar chromatographic properties can co-elute, peak separation becomes challenging. In this paper, two computational methods for separating peaks, applied for the first time to large chromatographic datasets, are described, compared, and experimentally validated. The methods lead from raw observations to data that can form inputs for statistical analysis. In both methods, data are first normalized by sample mass, the baseline is removed, retention times are aligned, and peaks are detected. Then, in the first method, clustering is used to separate overlapping peaks, whereas in the second method, functional principal component analysis (FPCA) is applied for the same purpose. Simulated data and experimental results are used to present and compare both methods. The real data were obtained in a study of metabolomic changes in barley (Hordeum vulgare) leaves under drought stress. The results suggest that both methods are suitable for separating overlapping peaks, but an additional advantage of FPCA is the possibility of assessing the variability of individual compounds present within the same peaks across different chromatograms.
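
As an illustration of the FPCA step (on simulated curves, not the barley data), the sketch below performs functional PCA by singular value decomposition of chromatogram segments sampled on a common retention-time grid; the peak shapes and noise levels are placeholders.

```python
# Functional PCA on discretised curves: each row is one chromatogram
# segment on a shared retention-time grid; component scores summarise
# the variation used to separate co-eluting contributions.
import numpy as np

rng = np.random.default_rng(9)
grid = np.linspace(0, 10, 200)                    # common retention-time grid

def gauss_peak(center, width, height):
    return height * np.exp(-0.5 * ((grid - center) / width) ** 2)

# Each chromatogram: two overlapping peaks with sample-to-sample variation.
curves = np.array([gauss_peak(4.0, 0.5, 1 + 0.2 * rng.standard_normal())
                   + gauss_peak(5.0, 0.5, 1 + 0.2 * rng.standard_normal())
                   + 0.01 * rng.standard_normal(grid.size)
                   for _ in range(50)])

mean_curve = curves.mean(axis=0)
U, sing, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
scores = U[:, :2] * sing[:2]                      # first two FPC scores
print(sing[:4] ** 2 / (sing ** 2).sum())          # variance explained
```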

