ExploreModelMatrix: Interactive exploration for improved understanding of design matrices and linear models in R

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 512
Author(s):  
Charlotte Soneson ◽  
Federico Marini ◽  
Florian Geier ◽  
Michael I. Love ◽  
Michael B. Stadler

Linear and generalized linear models are used extensively in many scientific fields, to model observed data and as the basis for hypothesis tests. The use of such models requires specification of a design matrix, and subsequent formulation of contrasts representing scientific hypotheses of interest. Proper execution of these steps requires a thorough understanding of the meaning of the individual coefficients, and is a frequent source of uncertainty for end users. Here, we present an R/Bioconductor package, ExploreModelMatrix, which enables interactive exploration of design matrices and linear model diagnostics. Given a sample data table and a desired design formula, the package displays how the model coefficients are combined to give the fitted values for each combination of predictor variables, which allows users to both extract the interpretation of each individual coefficient, and formulate desired linear contrasts. In addition, the interactive interface displays informative characteristics for the regular linear model corresponding to the provided design, such as variance inflation factors and the pseudoinverse of the design matrix. We envision the package and the built-in collection of common types of linear model designs to be useful for teaching and self-learning purposes, as well as for assisting more experienced users in the interpretation of complex model designs.
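The idea of reading fitted values off a design matrix can be illustrated with a minimal sketch. It does not use ExploreModelMatrix itself (which is an R package); it assumes a hypothetical two-group sample table, a design of the form `~ group` with treatment coding, and made-up response values, and uses plain NumPy.

```python
import numpy as np

# Hypothetical sample table: two groups (control, treated), three samples each.
groups = ["control", "control", "control", "treated", "treated", "treated"]

# Design matrix for the formula ~ group with treatment coding:
# column 0 is the intercept, column 1 indicates the "treated" group.
X = np.array([[1.0, 1.0 if g == "treated" else 0.0] for g in groups])

# Made-up responses, for illustration only.
y = np.array([1.0, 2.0, 3.0, 5.0, 6.0, 7.0])

# The pseudoinverse of the design matrix maps observations
# to least-squares coefficient estimates.
X_pinv = np.linalg.pinv(X)
beta = X_pinv @ y

# Fitted value for each group = the combination of coefficients
# given by that group's row of the design matrix.
fitted_control = np.array([1.0, 0.0]) @ beta  # intercept only
fitted_treated = np.array([1.0, 1.0]) @ beta  # intercept + group coefficient

# The contrast "treated minus control" therefore selects the second coefficient.
contrast = np.array([0.0, 1.0])
difference = contrast @ beta
```

In this coding the intercept is the control-group mean and the second coefficient is the difference between group means, which is the kind of interpretation the package is described as making visible.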


Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 533-546 ◽  
Author(s):  
Guo Yu ◽  
Jacob Bien

Summary The lasso has been studied extensively as a tool for estimating the coefficient vector in the high-dimensional linear model; however, considerably less is known about estimating the error variance in this context. In this paper, we propose the natural lasso estimator for the error variance, which maximizes a penalized likelihood objective. A key aspect of the natural lasso is that the likelihood is expressed in terms of the natural parameterization of the multi-parameter exponential family of a Gaussian with unknown mean and variance. The result is a remarkably simple estimator of the error variance with provably good performance in terms of mean squared error. These theoretical results do not require placing any assumptions on the design matrix or the true regression coefficients. We also propose a companion estimator, called the organic lasso, which theoretically does not require tuning of the regularization parameter. Both estimators do well empirically compared to pre-existing methods, especially in settings where successful recovery of the true support of the coefficient vector is hard. Finally, we show that existing methods can do well under fewer assumptions than previously known, thus providing a fuller story about the problem of estimating the error variance in high-dimensional linear models.
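The abstract does not state the estimator's formula. The sketch below assumes, based on the published paper rather than on this summary, that the natural lasso estimate of the error variance is the minimal value of the lasso objective (1/n)‖y − Xβ‖² + 2λ‖β‖₁. The coordinate-descent solver, the simulated data, and the choice λ = 0.1 are illustrative assumptions, not the authors' implementation or tuning.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t (proximal map of the absolute value)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_objective(X, y, beta, lam):
    """(1/n) * ||y - X beta||^2 + 2 * lam * ||beta||_1."""
    n = len(y)
    resid = y - X @ beta
    return resid @ resid / n + 2.0 * lam * np.abs(beta).sum()

def natural_lasso_variance(X, y, lam, n_sweeps=500):
    """Minimise the lasso objective by cyclic coordinate descent and
    return (minimal objective value, coefficient vector)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n  # x_j' x_j / n for each column
    for _ in range(n_sweeps):
        for j in range(p):
            # Residual with coordinate j's own contribution added back.
            partial = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial / n
            beta[j] = soft_threshold(rho, lam) / col_scale[j]
    return lasso_objective(X, y, beta, lam), beta

# Simulated data (hypothetical): sparse signal, unit error variance.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [2.0, -1.0]
y = X @ beta_true + rng.standard_normal(n)

sigma2_hat, beta_hat = natural_lasso_variance(X, y, lam=0.1)
```

Two limiting cases give a quick sanity check on the sketch: with λ = 0 the value reduces to the mean squared residual of ordinary least squares, and for sufficiently large λ all coefficients are zero and the value is the mean of y².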


Author(s):  
Christoph Brandstetter ◽  
Sina Stapelfeldt

Non-synchronous vibrations arising near the stall boundary of compressors are a recurring and potentially safety-critical problem in modern aero-engines. Recent numerical and experimental investigations have shown that these vibrations are caused by the lock-in of circumferentially convected aerodynamic disturbances and structural vibration modes, and that it is possible to predict unstable vibration modes using coupled linear models. This paper aims to further investigate non-synchronous vibrations (NSV) by casting a reduced model for NSV in the frequency domain and analysing stability for a range of parameters. It is shown how, and why, under certain conditions linear models are able to capture a phenomenon that has traditionally been associated with aerodynamic non-linearities. The formulation clearly highlights the differences between convective non-synchronous vibrations and flutter and identifies the modifications necessary to make quantitative predictions.


Author(s):  
Yan Wang ◽  
Feng Hao ◽  
Yunxia Liu

Population change and environmental degradation have become two of the most pressing issues for sustainable development in the contemporary world, while the effect of population aging on pro-environmental behavior remains controversial. In this paper, we examine the effects of individual and population aging on pro-environmental behavior through multilevel analyses of cross-national data from 31 countries. Hierarchical linear models with random intercepts are employed to analyze the data. The findings reveal a positive relationship between aging and pro-environmental behavior. At the individual level, older people are more likely to participate in environmental behavior (b = 0.052, p < 0.001), and at the national level, living in a country with a greater share of older persons encourages individuals to behave sustainably (b = 0.023, p < 0.01). We also found that the elderly are more environmentally active in an aging society. The findings imply that the longevity of human beings may offer opportunities for the improvement of the natural environment.


Author(s):  
Necva Bölücü ◽  
Burcu Can

Part of speech (PoS) tagging is one of the fundamental syntactic tasks in Natural Language Processing, as it assigns a syntactic category (such as noun, verb, or adjective) to each word within a given sentence or context. Those syntactic categories can be used to further analyze the sentence-level syntax (e.g., dependency parsing) and thereby extract the meaning of the sentence (e.g., semantic parsing). Various methods have been proposed for learning PoS tags in an unsupervised setting without using any annotated corpora. Log-linear models are among the most widely used methods for the tagging problem. The initialization of the parameters in a log-linear model is crucial for inference, and different initialization techniques have been used so far. In this work, we present a log-linear model for PoS tagging that uses another fully unsupervised Bayesian model to initialize the parameters of the model in a cascaded framework. We thereby transfer knowledge between two different unsupervised models to improve the PoS tagging results, with the log-linear model benefiting from the Bayesian model's expertise. We present results for Turkish as a morphologically rich language and for English as a comparatively morphologically poor language in a fully unsupervised framework. The results show that our framework outperforms other unsupervised models proposed for PoS tagging.


2021 ◽  
Author(s):  
Mohammadreza Vatani

AC-DC power systems have been operating for more than sixty years. Nonlinear bus-wise power balance equations provide an accurate model of AC-DC power systems. However, optimization tools for planning and operation require a linear version, even if approximate, to create tractable algorithms that consider modern elements such as DERs (distributed energy resources). Hitherto, linear models have been available only for AC power systems; these are, coincidentally, called DC power flow. To address this drawback, linear bus-wise power balance equations are developed for AC-DC power systems and presented. As a first contribution, while AC and DC lines are represented by susceptance and conductance elements, AC-DC power converters are represented by a proposed linear relationship. As a second contribution, a three-step linear AC-DC power flow method is proposed. The first step solves the whole network considering it as a linear AC network, yielding bus phase angles at all buses. The second step computes attributes of the proposed linear model of all AC-DC power converters. The third step solves the linear model of the AC-DC system including converters, yielding bus phase angles at AC buses and voltage magnitudes at DC buses. The proposed linear power flow model, while an approximation of the nonlinear model, enables representation of the bus-wise power balance of AC-DC systems in complex planning and operational optimization formulations and hence holds the promise of phenomenal progress. The proposed linear AC-DC power flow model is tested on numerous IEEE test systems and demonstrated to be fast, reliable, and consistent.
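For orientation, the existing AC-only linear model that the abstract refers to as DC power flow can be sketched as follows. This is the standard textbook formulation, not the AC-DC method proposed in the thesis; the three-bus network, the susceptance values, and the injections are hypothetical.

```python
import numpy as np

# Hypothetical 3-bus network: (from_bus, to_bus, susceptance in per unit).
lines = [(0, 1, 10.0), (0, 2, 10.0), (1, 2, 10.0)]
n_bus = 3
slack = 0  # reference bus, phase angle fixed at zero

# Net active-power injections at the non-slack buses (per unit, made up).
injections = {1: -1.0, 2: 0.5}

# Build the bus susceptance matrix B so that P = B @ theta.
B = np.zeros((n_bus, n_bus))
for i, j, b in lines:
    B[i, i] += b
    B[j, j] += b
    B[i, j] -= b
    B[j, i] -= b

# Remove the slack row and column, then solve for the remaining angles.
keep = [k for k in range(n_bus) if k != slack]
P = np.array([injections[k] for k in keep])
theta = np.zeros(n_bus)
theta[keep] = np.linalg.solve(B[np.ix_(keep, keep)], P)

# Line flows follow from angle differences; the slack bus balances the system.
flows = {(i, j): b * (theta[i] - theta[j]) for i, j, b in lines}
slack_injection = (B @ theta)[slack]
```

Because the model is a single linear solve, it can be embedded directly as equality constraints in a planning or operational optimization problem, which is the motivation the abstract gives for seeking a comparable linear model of AC-DC systems.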


2020 ◽  
Vol 24 (6 Part A) ◽  
pp. 3795-3806
Author(s):  
Predrag Zivkovic ◽  
Mladen Tomic ◽  
Vukman Bakic

Wind power assessment in complex terrain is a very demanding task. Standard linear models do not sufficiently reproduce wind conditions in complex terrain, especially on the leeward sides of terrain slopes, primarily due to vorticity. A more complex non-linear model, based on the Reynolds-averaged Navier-Stokes equations, has been used. Turbulence was modeled by a modified two-equation k-ε model for neutral atmospheric boundary-layer conditions, written in a general curvilinear non-orthogonal coordinate system. The full set of mass and momentum conservation equations, as well as the turbulence model equations, is solved numerically using the CFD technique. A comparison of the application of the linear and non-linear models is presented. Considerable discrepancies in estimated wind speed have been obtained between the linear and non-linear models. Statistics of annual electricity production vary up to 30% of the model site. Even anemometer measurements directly at a wind turbine's site do not necessarily deliver the results needed for prediction calculations, as extrapolation of wind speed to hub height is tricky. The results of the simulation are compared by means of the turbine type, quality and quantity of the wind data and capacity factor. Finally, the comparison of the estimated results with the measured data at 10, 30, and 50 m is shown.


Author(s):  
Bernardo Lopes ◽  
Allan Luz ◽  
Bruno Fontes ◽  
Isaac C Ramos ◽  
Fernando Correia ◽  
...  

ABSTRACT Purpose: To compare and assess the ability of pressure-derived parameters and corneal deformation waveform signal-derived parameters of the ocular response analyzer (ORA) measurement to distinguish between keratoconus and normal eyes, and to develop a combined parameter to optimize the diagnosis of keratoconus. Materials and methods: One hundred and seventy-seven eyes (177 patients) with keratoconus (group KC) and 205 normal eyes (205 patients; group N) were included. One eye from each subject was randomly selected for analysis. Patients underwent a complete clinical eye examination, corneal topography (Humphrey ATLAS), tomography (Pentacam Oculus) and biomechanical evaluations (ORA Reichert). Differences in the distributions between the groups were assessed using the Mann-Whitney test. The receiver operating characteristic (ROC) curve was used to identify cutoff points that maximized sensitivity and specificity in discriminating keratoconus from normal corneas. Logistic regression was used to identify a combined linear model (Fisher 1.0). Results: Significant differences in all studied parameters were detected (p < 0.05), except for W2. For the corneal resistance factor (CRF): area under the ROC curve (AUROC) 89.1%, sensitivity 81.36%, specificity 84.88%. For the p1area: AUROC 91.5%, sensitivity 87.1%, specificity 81.95%. Of the individual parameters, the highest predictive accuracy was for the Fisher 1.0, which represents the combination of all parameters (AUROC 95.5%, sensitivity 88.14%, specificity 93.17%). Conclusion: Waveform-derived ORA parameters displayed greater accuracy than pressure-derived parameters for identifying keratoconus. Corneal hysteresis (CH) and CRF, a diagnostic linear model that combines different parameters, provided the greatest accuracy for differentiating keratoconus from normal corneas.
How to cite this article: Luz A, Fontes B, Ramos IC, Lopes B, Correia F, Schor P, Ambrósio R. Evaluation of Ocular Biomechanical Indices to Distinguish Normal from Keratoconus Eyes. Int J Kerat Ect Cor Dis 2012;1(3):145-150.


2016 ◽  
Vol 8 (1) ◽  
pp. 140-143
Author(s):  
J. V. Thaker ◽  
R. P. Kuvad ◽  
V. S. Thaker

Leaf area is an important parameter in physiology and agronomy studies. Linear models for leaf area measurement are developed for plant species as a nondestructive method. The medicinal plant Adhatoda vasica L. was selected, and its leaves were used to develop a linear model for leaf area using Leaf Area Meter (LAM) software. Planimetric parameters (length, length², width and width²) and gravimetric parameters (dry weight and water content) were considered in developing the linear model for this species. Single-factor ANOVA and linear correlations were worked out between these parameters and leaf area. The plant showed a significant relationship with the parameters studied. The best correlation, as represented by the regression coefficient (R²), was used and an improved R² was worked out. Water content increased with leaf area and showed the best correlation with it. It is concluded that water content can be taken as a parameter for developing a linear model for leaf area.


2020 ◽  
pp. 1-7
Author(s):  
Fatin N.S.A. ◽  
Norlida M.N. ◽  
Siti Z.M.J.

The log-linear model is a technique used to analyze cross-classified categorical data, that is, contingency tables. It is used to obtain parsimonious models that describe the interactions between the categorical variables in a contingency table. Log-linear models are commonly used to evaluate higher-dimensional contingency tables that involve more than two categorical variables. This study focuses on analyzing data on poisoned patients from 2012 to 2014 using log-linear models. Two models are analyzed: a model for the demographic data of patients and a model for poisoning information. The variables in the first model are gender, age, race and state. The variables in the second model are circumstance of exposure, type of exposure, location of exposure, route of exposure and type of poison. Both log-linear models are developed to investigate the associations between the variables in the model. The best models for both the demographic data and the poisoning information are those with three-way interactions. For the demographic data, the best model shows associations among gender, age and race; among race, gender and state; and among age, race and state. The best model for poisoning information reveals relationships among (i) circumstance of exposure, route of exposure and type of poison; (ii) location of exposure, route of exposure and type of poison; (iii) circumstance of exposure, type of exposure and route of exposure; (iv) circumstance of exposure, location of exposure and route of exposure; (v) circumstance of exposure, type of exposure and type of poison; and (vi) type of exposure, location of exposure and type of poison.
Keywords: log-linear; demographic; gender; age; race; state; circumstance of exposure; type of exposure; location of exposure; route of exposure; types of poison
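As a much simpler illustration than the three-way models in the study, the sketch below fits the independence log-linear model to a two-way table and computes its likelihood-ratio statistic. The table of counts is made up and is not the study's data; the closed-form fitted counts used here are the standard result for the independence model.

```python
import numpy as np

# Hypothetical 2 x 3 table of counts (e.g. gender by age group); made-up numbers.
observed = np.array([[30.0, 45.0, 25.0],
                     [20.0, 55.0, 25.0]])

total = observed.sum()
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)

# Independence model: log(mu_ij) = lambda + lambda_i^A + lambda_j^B,
# whose maximum-likelihood fitted counts are (row total * column total) / N.
expected = row_totals @ col_totals / total

# Likelihood-ratio (deviance) statistic G^2 against the saturated model.
g_squared = 2.0 * np.sum(observed * np.log(observed / expected))

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
```

A large G² relative to a chi-squared distribution with `dof` degrees of freedom would indicate that the independence model fits poorly and that an interaction term is needed; models with three-way interactions extend the same idea to higher-dimensional tables.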

