ExploreModelMatrix: Interactive exploration for improved understanding of design matrices and linear models in R

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 512
Author(s):  
Charlotte Soneson ◽  
Federico Marini ◽  
Florian Geier ◽  
Michael I. Love ◽  
Michael B. Stadler

Linear and generalized linear models are used extensively in many scientific fields, to model observed data and as the basis for hypothesis tests. The use of such models requires specification of a design matrix, and subsequent formulation of contrasts representing scientific hypotheses of interest. Proper execution of these steps requires a thorough understanding of the meaning of the individual coefficients, and is a frequent source of uncertainty for end users. Here, we present an R/Bioconductor package, ExploreModelMatrix, which enables interactive exploration of design matrices and linear model diagnostics. Given a sample data table and a desired design formula, the package displays how the model coefficients are combined to give the fitted values for each combination of predictor variables, which allows users to both extract the interpretation of each individual coefficient, and formulate desired linear contrasts. In addition, the interactive interface displays informative characteristics for the regular linear model corresponding to the provided design, such as variance inflation factors and the pseudoinverse of the design matrix. We envision the package and the built-in collection of common types of linear model designs to be useful for teaching and self-learning purposes, as well as for assisting more experienced users in the interpretation of complex model designs.
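As a rough illustration of the idea described in this abstract, the sketch below builds the design-matrix rows for a toy two-factor design and prints how the coefficients add up to the fitted value for each combination of predictors. It is written in plain Python rather than R and does not use the ExploreModelMatrix package. The factor names (genotype, treatment), their levels, the treatment (dummy) coding and the helper functions are all assumptions made for the example.

```python
from itertools import product

# Coefficient names for the hypothetical additive design ~ genotype + treatment,
# using treatment (dummy) coding with reference levels "WT" and "control".
COEFFICIENTS = ["(Intercept)", "genotypeKO", "treatmentdrug"]


def design_row(genotype, treatment):
    """Return the design-matrix row for one combination of predictor levels."""
    return [
        1,                                # intercept column is always 1
        1 if genotype == "KO" else 0,     # indicator for the non-reference genotype
        1 if treatment == "drug" else 0,  # indicator for the non-reference treatment
    ]


def fitted_value_expression(row):
    """Spell out which coefficients are summed to give the fitted value."""
    return " + ".join(name for name, x in zip(COEFFICIENTS, row) if x != 0)


def contrast(row_a, row_b):
    """Contrast vector for the difference between two fitted values."""
    return [a - b for a, b in zip(row_a, row_b)]


combos = list(product(["WT", "KO"], ["control", "drug"]))
table = {c: fitted_value_expression(design_row(*c)) for c in combos}
for combo, expression in table.items():
    print(combo, "->", expression)

# Difference between KO and WT within the drug group: only genotypeKO remains.
ko_vs_wt = contrast(design_row("KO", "drug"), design_row("WT", "drug"))
print("KO vs WT (drug):", dict(zip(COEFFICIENTS, ko_vs_wt)))
```

Subtracting two rows of the design matrix gives the contrast vector directly, which is the kind of reasoning the abstract says the package is meant to support.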


Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 533-546 ◽  
Author(s):  
Guo Yu ◽  
Jacob Bien

Summary The lasso has been studied extensively as a tool for estimating the coefficient vector in the high-dimensional linear model; however, considerably less is known about estimating the error variance in this context. In this paper, we propose the natural lasso estimator for the error variance, which maximizes a penalized likelihood objective. A key aspect of the natural lasso is that the likelihood is expressed in terms of the natural parameterization of the multi-parameter exponential family of a Gaussian with unknown mean and variance. The result is a remarkably simple estimator of the error variance with provably good performance in terms of mean squared error. These theoretical results do not require placing any assumptions on the design matrix or the true regression coefficients. We also propose a companion estimator, called the organic lasso, which theoretically does not require tuning of the regularization parameter. Both estimators do well empirically compared to pre-existing methods, especially in settings where successful recovery of the true support of the coefficient vector is hard. Finally, we show that existing methods can do well under fewer assumptions than previously known, thus providing a fuller story about the problem of estimating the error variance in high-dimensional linear models.


Author(s):  
Christoph Brandstetter ◽  
Sina Stapelfeldt

Non-synchronous vibrations arising near the stall boundary of compressors are a recurring and potentially safety-critical problem in modern aero-engines. Recent numerical and experimental investigations have shown that these vibrations are caused by the lock-in of circumferentially convected aerodynamic disturbances and structural vibration modes, and that it is possible to predict unstable vibration modes using coupled linear models. This paper aims to further investigate non-synchronous vibrations by casting a reduced model for NSV in the frequency domain and analysing stability for a range of parameters. It is shown how, and why, under certain conditions linear models are able to capture a phenomenon, which has traditionally been associated with aerodynamic non-linearities. The formulation clearly highlights the differences between convective non-synchronous vibrations and flutter and identifies the modifications necessary to make quantitative predictions.


Author(s):  
Yan Wang ◽  
Feng Hao ◽  
Yunxia Liu

Population change and environmental degradation have become two of the most pressing issues for sustainable development in the contemporary world, while the effect of population aging on pro-environmental behavior remains controversial. In this paper, we examine the effects of individual and population aging on pro-environmental behavior through multilevel analyses of cross-national data from 31 countries. Hierarchical linear models with random intercepts are employed to analyze the data. The findings reveal a positive relationship between aging and pro-environmental behavior. At the individual level, older people are more likely to participate in environmental behavior (b = 0.052, p < 0.001), and at the national level, living in a country with a greater share of older persons encourages individuals to behave sustainably (b = 0.023, p < 0.01). We also found that the elderly are more environmentally active in an aging society. The findings imply that the longevity of human beings may offer opportunities for the improvement of the natural environment.


Foods ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 1187
Author(s):  
Ivana Generalić Mekinić ◽  
Vida Šimat ◽  
Viktorija Botić ◽  
Anita Crnjac ◽  
Marina Smoljo ◽  
...  

In this study, the influences of temperature (20, 40 and 60 °C) and extraction solvents (water, ethanol) on the ultrasound-assisted extraction of phenolics from the Adriatic macroalgae Dictyota dichotoma and Padina pavonica were studied. The extracts were analysed for major phenolic sub-groups (total phenolics, flavonoids and tannins) using spectrometric methods, while the individual phenolics were detected by HPLC. The antioxidant activities were evaluated using three methods: Ferric Reducing/Antioxidant Power (FRAP), scavenging of the stable 2,2-diphenyl-1-picrylhydrazyl (DPPH) radical and Oxygen Radical Antioxidant Capacity (ORAC). The aim of the study was also to find the connection between the chemical composition of the extracts and their biological activity. Therefore, principal component analysis (PCA), which permits simple representation of different sample data and better visualisation of their correlations, was used. Higher extraction yields of the total phenolics, flavonoids and tannins were obtained using an alcoholic solvent, while a general conclusion about the applied temperature was not established. These extracts also showed good antioxidant activity, especially D. dichotoma extracts, with high reducing capacity (690–792 mM TE) and ORAC values (38.7–40.8 mM TE in 400-fold diluted extracts). The PCA pointed out the significant influence of flavonoids and tannins on the investigated properties. The results of this investigation could be interesting for future studies dealing with the application of these two algae in foods, cosmetics and pharmaceuticals.


2014 ◽  
Vol 143 (8) ◽  
pp. 1681-1691 ◽  
Author(s):  
M. E. ARNOLD ◽  
R. J. GOSLING ◽  
F. MARTELLI ◽  
D. MUELLER-DOBLIES ◽  
R. H. DAVIES

SUMMARY There has been a rapid rise in the prevalence of cases of monophasic Salmonella Typhimurium (mST) in both humans and farm animals, and it has been found in pigs, cattle and poultry. It is therefore vital to have a good understanding of how to efficiently detect infected farms. The objective of this project was to determine sample type sensitivity in the detection of Salmonella to detect infected groups of animals on both pig (breeder, grower and finisher sites) and cattle (beef and dairy) farms, using data collected from a study investigating farms that were positive for mST, and to explore any variation between different age groups and management practices. A Bayesian approach in the absence of a gold standard was adopted to analyse the individual and pooled faecal sample data collected from each epidemiological group on each of the farms. The sensitivity of pooled sampling depended on the prevalence of infection in the group being sampled, with a higher prevalence leading to higher sensitivity. Pooled sampling was found to be more efficient at detecting positive groups of animals than individual sampling, with the probability of a random sample from a group of animals with 5% prevalence testing positive being equal to 15·5% for immature pigs (3·6% for an individual faecal sample, taking into account the sensitivity and infection prevalence), 7·1% for adult pigs (1·2% for individual sampling), 30% for outdoor cattle (2% for individual sampling) and 34% for indoor cattle (1% for individual sampling). The mean prevalence of each epidemiological group was higher in outdoor farms than indoor for both pigs and cattle (mean within-farm prevalence of 29·4% and 38·7% for outdoor pigs and cattle, respectively, compared to 19·8% and 22·1% for indoor pigs and cattle).


Author(s):  
Necva Bölücü ◽  
Burcu Can

Part of speech (PoS) tagging is one of the fundamental syntactic tasks in Natural Language Processing, as it assigns a syntactic category to each word within a given sentence or context (such as noun, verb, adjective, etc.). Those syntactic categories could be used to further analyze the sentence-level syntax (e.g., dependency parsing) and thereby extract the meaning of the sentence (e.g., semantic parsing). Various methods have been proposed for learning PoS tags in an unsupervised setting without using any annotated corpora. One of the widely used methods for the tagging problem is log-linear models. Initialization of the parameters in a log-linear model is very crucial for the inference. Different initialization techniques have been used so far. In this work, we present a log-linear model for PoS tagging that uses another fully unsupervised Bayesian model to initialize the parameters of the model in a cascaded framework. Therefore, we transfer some knowledge between two different unsupervised models to leverage the PoS tagging results, where a log-linear model benefits from a Bayesian model’s expertise. We present results for Turkish as a morphologically rich language and for English as a comparably morphologically poor language in a fully unsupervised framework. The results show that our framework outperforms other unsupervised models proposed for PoS tagging.


2021 ◽  
Author(s):  
Mohammadreza Vatani

AC-DC power systems have been operating for more than sixty years. Nonlinear bus-wise power balance equations provide an accurate model of AC-DC power systems. However, optimization tools for planning and operation require a linear version, even if approximate, to create tractable algorithms that account for modern elements such as DERs (distributed energy resources). Hitherto, linear models have been available only for AC power systems; these are, coincidentally, called DC power flow. To address this drawback, linear bus-wise power balance equations are developed for AC-DC power systems and presented. As a first contribution, while AC and DC lines are represented by susceptance and conductance elements, AC-DC power converters are represented by a proposed linear relationship. As a second contribution, a three-step linear AC-DC power flow method is proposed. The first step solves the whole network considering it as a linear AC network, yielding bus phase angles at all busses. The second step computes attributes of the proposed linear model of all AC-DC power converters. The third step solves the linear model of the AC-DC system including converters, yielding bus phase angles at AC busses and voltage magnitudes at DC busses. The proposed linear power flow model, while an approximation of the nonlinear model, enables representation of the bus-wise power balance of AC-DC systems in complex planning and operational optimization formulations and hence holds the promise of phenomenal progress. The proposed linear AC-DC power flow model is tested on numerous IEEE test systems and demonstrated to be fast, reliable, and consistent.
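For readers unfamiliar with the existing AC-only linear model that the abstract calls DC power flow, the following is a minimal sketch of it in plain Python. It is not the converter model or the three-step method proposed in the paper. It only solves the standard linear system B·θ = P for the bus phase angles, with the slack bus angle fixed at zero. The three-bus network, the line susceptances and the injections are made-up per-unit values, and the helper names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def dc_power_flow(n_bus, lines, injections, slack=0):
    """Classic DC power flow.

    lines: list of (from_bus, to_bus, susceptance) in per unit.
    injections: net active power injected at each bus in per unit.
    Returns the bus phase angles (radians) and the line flows.
    """
    # Assemble the bus susceptance matrix B.
    B = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, b in lines:
        B[i][i] += b
        B[j][j] += b
        B[i][j] -= b
        B[j][i] -= b
    # Drop the slack bus row and column, then solve for the remaining angles.
    keep = [k for k in range(n_bus) if k != slack]
    B_red = [[B[i][j] for j in keep] for i in keep]
    theta_red = solve_linear(B_red, [injections[k] for k in keep])
    theta = [0.0] * n_bus
    for k, t in zip(keep, theta_red):
        theta[k] = t
    # Flow on each line is susceptance times the angle difference.
    flows = {(i, j): b * (theta[i] - theta[j]) for i, j, b in lines}
    return theta, flows


# Hypothetical 3-bus example: bus 0 is the slack, bus 1 generates, bus 2 is a load.
lines = [(0, 1, 10.0), (0, 2, 10.0), (1, 2, 10.0)]
injections = [0.5, 0.5, -1.0]
theta, flows = dc_power_flow(3, lines, injections)
print("angles:", theta)
print("flows:", flows)
```

In this toy case the load at bus 2 is fed equally over the two lines that reach it, which is what the symmetric susceptances suggest.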


2020 ◽  
Vol 24 (6 Part A) ◽  
pp. 3795-3806
Author(s):  
Predrag Zivkovic ◽  
Mladen Tomic ◽  
Vukman Bakic

Wind power assessment in complex terrain is a very demanding task. Modeling wind conditions with standard linear models does not sufficiently reproduce wind conditions in complex terrains, especially on leeward sides of terrain slopes, primarily due to the vorticity. A more complex non-linear model, based on Reynolds averaged Navier-Stokes equations, has been used. Turbulence was modeled by a modified two-equation k-ε model for neutral atmospheric boundary-layer conditions, written in a general curvilinear non-orthogonal co-ordinate system. The full set of mass and momentum conservation equations as well as turbulence model equations are numerically solved using the CFD technique. A comparison of the application of the linear model and the non-linear model is presented. Considerable discrepancies in estimated wind speed have been obtained using linear and non-linear models. Statistics of annual electricity production vary up to 30% of the model site. Even anemometer measurements directly at a wind turbine's site do not necessarily deliver the results needed for prediction calculations, as extrapolation of wind speed to hub height is tricky. The results of the simulation are compared by means of the turbine type, quality and quantity of the wind data and capacity factor. Finally, the comparison of the estimated results with the measured data at 10, 30, and 50 m is shown.


Author(s):  
Bernardo Lopes ◽  
Allan Luz ◽  
Bruno Fontes ◽  
Isaac C Ramos ◽  
Fernando Correia ◽  
...  

ABSTRACT Purpose: To compare and assess the ability of pressure-derived parameters and corneal deformation waveform signal-derived parameters of the ocular response analyzer (ORA) measurement to distinguish between keratoconus and normal eyes, and to develop a combined parameter to optimize the diagnosis of keratoconus. Materials and methods: One hundred and seventy-seven eyes (177 patients) with keratoconus (group KC) and 205 normal eyes (205 patients; group N) were included. One eye from each subject was randomly selected for analysis. Patients underwent a complete clinical eye examination, corneal topography (Humphrey ATLAS), tomography (Pentacam Oculus) and biomechanical evaluations (ORA Reichert). Differences in the distributions between the groups were assessed using the Mann-Whitney test. The receiver operating characteristic (ROC) curve was used to identify cutoff points that maximized sensitivity and specificity in discriminating keratoconus from normal corneas. Logistic regression was used to identify a combined linear model (Fisher 1.0). Results: Significant differences in all studied parameters were detected (p < 0.05), except for W2. For the corneal resistance factor (CRF): Area under the ROC curve (AUROC) 89.1%, sensitivity 81.36%, specificity 84.88%. For the p1area: AUROC 91.5%, sensitivity 87.1%, specificity 81.95%. Of the individual parameters, the highest predictive accuracy was for the Fisher 1.0, which represents the combination of all parameters (AUROC 95.5%, sensitivity 88.14%, specificity 93.17%). Conclusion: Waveform-derived ORA parameters displayed greater accuracy than pressure-derived parameters for identifying keratoconus. Corneal hysteresis (CH) and CRF, a diagnostic linear model that combines different parameters, provided the greatest accuracy for differentiating keratoconus from normal corneas. How to cite this article: Luz A, Fontes B, Ramos IC, Lopes B, Correia F, Schor P, Ambrósio R. Evaluation of Ocular Biomechanical Indices to Distinguish Normal from Keratoconus Eyes. Int J Kerat Ect Cor Dis 2012;1(3):145-150.
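The ROC analysis mentioned in this abstract can be sketched in a few lines of plain Python. The scores below are invented toy values, not data from the study. The cutoff is chosen by maximizing Youden's J (sensitivity + specificity − 1), which is one common reading of "maximized sensitivity and specificity" and is an assumption here, since the abstract does not name the criterion. Higher scores are assumed to indicate disease.

```python
def auroc(pos, neg):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores higher
    than a randomly chosen negative case, with ties counted as one half.
    """
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(pos, neg):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A case is called positive when its score is >= cutoff.
    On ties in J, the lowest cutoff is kept.
    """
    best = None
    for c in sorted(set(pos) | set(neg)):
        sens = sum(p >= c for p in pos) / len(pos)
        spec = sum(n < c for n in neg) / len(neg)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical scores for illustration only.
diseased = [0.9, 0.8, 0.7, 0.4]
healthy = [0.6, 0.3, 0.2, 0.1]

area = auroc(diseased, healthy)
cutoff, sensitivity, specificity = best_cutoff(diseased, healthy)
print(area, cutoff, sensitivity, specificity)
```

The pairwise form of the AUROC is the same quantity as the normalized Mann-Whitney U statistic, which is why the two analyses often appear together.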

