Finding the best-fit background function for whole-powder-pattern fitting using LASSO combined with tree search

2021 ◽  
Vol 54 (2) ◽  
Author(s):  
Hideo Toraya

A new linear function for modelling the background in whole-powder-pattern fitting has been derived by applying LASSO (least absolute shrinkage and selection operator) and the technique of tree search. The background function (BGF) consists of terms $b_n^{L}(2\theta/180)^{-n/2}$ and $b_n^{H}(1 - 2\theta/180)^{-n/2}$ for the low- and high-angle sides, respectively. Some variable parameters of the BGF should be fixed at zero while others should be varied in order to find the best fit for a given data set without inducing overfitting. The LASSO algorithm can automatically select the variables in linear regression analysis. However, it finds the best-fit BGF with a set of adjustable parameters for a given data set while it derives a different set of parameters for a different data set. Thus, LASSO derives multiple solutions depending on the data set used. By regarding the individual solutions from LASSO as nodes of trees, tree structures were constructed from these solutions. The root node has the maximum number of adjustable parameters, P. P decreases with descending levels of the tree one by one, and leaf nodes have just one parameter. By evaluating individual solutions (nodes) by their χ² index, the best-fit single path from a root node to a leaf node was found. The present BGF can be used simply by varying P in the range 1–10. The BGF thus derived as a final single solution was incorporated into computer programs for Pawley-based whole-powder-pattern decomposition and Rietveld refinement, and the performance of the BGF was tested in comparison with the polynomials currently widely used as the BGF. The present BGF has been demonstrated to be stable and to give an excellent fit, comparable to polynomials but with a smaller number of adjustable parameters and without introducing undulation into the calculated background curve. Basic algorithms used in statistics and machine learning have been demonstrated to be useful in developing an analytical model in X-ray crystallography.
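
As a rough illustration of the functional form described in this abstract, the sketch below evaluates a background built from low-angle terms in (2θ/180) and high-angle terms in (1 − 2θ/180), each raised to the power −n/2. The function names, the choice of n = 1, 2, ... as the index range, and the absence of a constant term are assumptions made for illustration; the abstract does not state them. Coefficients set to zero correspond to parameters that are fixed rather than refined.

```python
def bgf(two_theta_deg, b_low, b_high):
    """Evaluate a background of the form described in the abstract.

    b_low[n-1] multiplies (2θ/180)^(-n/2) and b_high[n-1] multiplies
    (1 - 2θ/180)^(-n/2), for n = 1, 2, ... (assumed indexing).
    """
    x = two_theta_deg / 180.0
    if not 0.0 < x < 1.0:
        raise ValueError("2-theta must lie strictly between 0 and 180 degrees")
    low = sum(b * x ** (-n / 2.0) for n, b in enumerate(b_low, start=1))
    high = sum(b * (1.0 - x) ** (-n / 2.0) for n, b in enumerate(b_high, start=1))
    return low + high


def count_adjustable(b_low, b_high):
    """Number of non-zero coefficients, i.e. the P of the abstract."""
    return sum(1 for b in list(b_low) + list(b_high) if b != 0.0)
```

With only the n = 2 terms switched on, `bgf(90.0, [0.0, 1.0], [0.0, 1.0])` sums two terms of 0.5 raised to the power −1, and `count_adjustable` reports P = 2 for that choice.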

1997 ◽  
Vol 25 (3) ◽  
pp. 359-370 ◽  
Author(s):  
Lisa A. Kirchner ◽  
Richard P. Moody ◽  
Edward Doyle ◽  
Ranjan Bose ◽  
Jamie Jeffery ◽  
...  

A database on physicochemical properties and skin permeation compiled by Health Canada was analysed by using linear regression analysis. The correlation between permeability coefficient (Kp) and the octanol–water partition coefficient (Kow) has been improved by grouping the compounds according to their respective molar volumes. Linear regression analysis of the individual groups has demonstrated a positive correlation for the majority of the groups, with the compounds in the lowest molar volume range (≤ 75 Å³) having the best correlation (r² = 0.86), and the compounds in the highest molar volume range (≥ 301 Å³) being the least well-correlated (r² = 0.55). Due to the diversity of the chemicals used in this analysis, and the statistically significant correlations obtained, this model could permit the prediction of skin permeation of a wide variety of chemical compounds. Although of a simplistic nature, and not yet experimentally validated, this quantitative structure-activity relationship may be useful for predicting human skin permeability coefficients for compounds that fall within the constraints of this data set.


Author(s):  
Jules S. Jaffe ◽  
Robert M. Glaeser

Although difference Fourier techniques are standard in X-ray crystallography it has only been very recently that electron crystallographers have been able to take advantage of this method. We have combined a high resolution data set for frozen glucose embedded Purple Membrane (PM) with a data set collected from PM prepared in the frozen hydrated state in order to visualize any differences in structure due to the different methods of preparation. The increased contrast between protein-ice versus protein-glucose may prove to be an advantage of the frozen hydrated technique for visualizing those parts of bacteriorhodopsin that are embedded in glucose. In addition, surface groups of the protein may be disordered in glucose and ordered in the frozen state. The sensitivity of the difference Fourier technique to small changes in structure provides an ideal method for testing this hypothesis.


Author(s):  
D. E. Becker

An efficient, robust, and widely applicable technique is presented for computational synthesis of high-resolution, wide-area images of a specimen from a series of overlapping partial views. This technique can also be used to combine the results of various forms of image analysis, such as segmentation, automated cell counting, deblurring, and neuron tracing, to generate representations that are equivalent to processing the large wide-area image, rather than the individual partial views. This can be a first step towards quantitation of the higher-level tissue architecture. The computational approach overcomes mechanical limitations, such as hysteresis and backlash, of microscope stages. It also automates a procedure that is currently done manually. One application is the high-resolution visualization and/or quantitation of large batches of specimens that are much wider than the field of view of the microscope. The automated montage synthesis begins by computing a concise set of landmark points for each partial view. The type of landmarks used can vary greatly depending on the images of interest. In many cases, image analysis performed on each data set can provide useful landmarks. Even when no such “natural” landmarks are available, image processing can often provide useful landmarks.


2020 ◽  

BACKGROUND: This paper deals with the territorial distribution of mortality from alcohol and drug addiction at the level of the districts of the Slovak Republic. AIM: The aim of the paper is to explore the relations between the individual districts within the administrative territorial division of the Slovak Republic and thereby to reveal possibly hidden relations in alcohol and drug mortality. METHODS: The analysis is carried out in two parts, one for females and one for males. The standardised mortality rate is computed according to a sequence of mathematical relations. The Euclidean distance is used to compute the similarity of each pair of districts in the whole data set. A cluster analysis is then performed, with clusters formed on the basis of the mutual distances between the districts. The data are collected from the database of the Statistical Office of the Slovak Republic for all districts of the Slovak Republic. The time span covered begins in 1996 and ends in 2015. RESULTS: The most important finding is that the Slovak Republic shows considerable regional disparities in mortality, as expressed by the standardised mortality rate computed for the diagnoses assigned to alcohol and drug addiction. However, the outcomes differ between females and males. The Bratislava III District holds by far the most extreme position and also forms its own cluster for both sexes. The Topoľčany District holds a similarly extreme position for males. The Bratislava districts remain notably dissimilar from one another. By contrast, the development of the regional disparities among the districts appears notably heterogeneous. CONCLUSIONS: There are considerable regional discrepancies across the districts of the Slovak Republic. Hence, it is necessary to create a common platform for addressing this issue.
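
The pairwise-distance step mentioned in the methods can be sketched as follows. This is a minimal illustration, assuming each district is represented by a vector of standardised mortality rates (one value per year); the district labels and numbers used with it are placeholders, not data from the study, and the clustering algorithm applied to the distances is not specified in the abstract.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length rate profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same years")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(profiles):
    """Map each unordered pair of district names to its distance.

    `profiles` is a dict {district_name: [rate_year1, rate_year2, ...]}.
    """
    names = sorted(profiles)
    return {
        (p, q): euclidean_distance(profiles[p], profiles[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }
```

A hierarchical clustering routine would typically start from such a distance table and merge the closest pair first; a district whose distances to all others are large ends up in a cluster of its own.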


Nutrients ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 1671
Author(s):  
Luigi Barrea ◽  
Giovanna Muscogiuri ◽  
Gabriella Pugliese ◽  
Chiara Graziadio ◽  
Maria Maisto ◽  
...  

Individual differences in the chronotype, an attitude that best expresses the individual circadian preference in behavioral and biological rhythms, have been associated with cardiometabolic risk and gut dysbiosis. To date, no studies have evaluated the association between chronotypes and circulating concentrations of TMAO, a predictor of cardiometabolic risk and a useful marker of gut dysbiosis. In this study population (147 females and 100 males), subjects with the morning chronotype had the lowest BMI and waist circumference (p < 0.001), and a better metabolic profile compared to the other chronotypes. In addition, the morning chronotype had the highest adherence to the Mediterranean diet (p < 0.001) and the lowest circulating TMAO concentrations (p < 0.001). After adjusting for BMI and adherence to the Mediterranean diet, the correlation between circulating TMAO concentrations and chronotype score persisted (r = −0.627, p < 0.001). Using a linear regression analysis, higher chronotype scores were mostly associated with lower circulating TMAO concentrations (β = −0.479, t = −12.08, and p < 0.001). Using a restricted cubic spline analysis, we found that a chronotype score ≥59 (p < 0.001, R² = −0.824) demonstrated a more significant inverse linear relationship with circulating TMAO concentrations compared with knots <59 (neither chronotype) and <41 (evening chronotype). The current study reported the first evidence that higher circulating TMAO concentrations were associated with the evening chronotype that, in turn, is usually linked to an unhealthy lifestyle mostly characterized by low adherence to the Mediterranean diet.


2010 ◽  
Vol 43 (5) ◽  
pp. 1113-1120 ◽  
Author(s):  
Esko Oksanen ◽  
François Dauvergne ◽  
Adrian Goldman ◽  
Monika Budayova-Spano

H atoms play a central role in enzymatic mechanisms, but H-atom positions cannot generally be determined by X-ray crystallography. Neutron crystallography, on the other hand, can be used to determine H-atom positions but it is experimentally very challenging. Yeast inorganic pyrophosphatase (PPase) is an essential enzyme that has been studied extensively by X-ray crystallography, yet the details of the catalytic mechanism remain incompletely understood. The temperature instability of PPase crystals has in the past prevented the collection of a neutron diffraction data set. This paper reports how the crystal growth has been optimized in temperature-controlled conditions. To stabilize the crystals during neutron data collection a Peltier cooling device that minimizes the temperature gradient along the capillary has been developed. This device allowed the collection of a full neutron diffraction data set.


2016 ◽  
Vol 311 (3) ◽  
pp. F539-F547 ◽  
Author(s):  
Minhtri K. Nguyen ◽  
Dai-Scott Nguyen ◽  
Minh-Kevin Nguyen

Because changes in the plasma water sodium concentration ([Na+]pw) are clinically due to changes in the mass balance of Na+, K+, and H2O, the analysis and treatment of the dysnatremias are dependent on the validity of the Edelman equation in defining the quantitative interrelationship between the [Na+]pw and the total exchangeable sodium (Nae), total exchangeable potassium (Ke), and total body water (TBW) (Edelman IS, Leibman J, O'Meara MP, Birkenfeld LW. J Clin Invest 37: 1236–1256, 1958): [Na+]pw = 1.11(Nae + Ke)/TBW − 25.6. The interrelationship between [Na+]pw and Nae, Ke, and TBW in the Edelman equation is empirically determined by accounting for measurement errors in all of these variables. In contrast, linear regression analysis of the same data set using [Na+]pw as the dependent variable yields the following equation: [Na+]pw = 0.93(Nae + Ke)/TBW + 1.37. Moreover, based on the study by Boling et al. (Boling EA, Lipkind JB. 18: 943–949, 1963), the [Na+]pw is related to the Nae, Ke, and TBW by the following linear regression equation: [Na+]pw = 0.487(Nae + Ke)/TBW + 71.54. The reasons for the disparities among the slopes and y-intercepts of these three equations are unknown. In this mathematical analysis, we demonstrate that the disparities between the slope and y-intercept in these three equations can be explained by how the osmotically inactive Na+ and K+ storage pool is quantitatively accounted for. Our analysis also indicates that the osmotically inactive Na+ and K+ storage pool is dynamically regulated and that changes in the [Na+]pw can be predicted based on changes in the Nae, Ke, and TBW despite dynamic changes in the osmotically inactive Na+ and K+ storage pool.
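
To make the difference between the three quoted equations concrete, the sketch below evaluates each of them for the same inputs. The function names are hypothetical, and the units (Nae and Ke in mmol, TBW in litres, result in mmol/L) are an assumption, since the abstract does not state them. The slopes and intercepts are copied from the text above.

```python
def na_pw_edelman(nae, ke, tbw):
    """Edelman equation as quoted: 1.11 (Nae + Ke) / TBW - 25.6."""
    return 1.11 * (nae + ke) / tbw - 25.6


def na_pw_regression_same_data(nae, ke, tbw):
    """Regression on the same data set: 0.93 (Nae + Ke) / TBW + 1.37."""
    return 0.93 * (nae + ke) / tbw + 1.37


def na_pw_boling(nae, ke, tbw):
    """Equation based on Boling et al.: 0.487 (Nae + Ke) / TBW + 71.54."""
    return 0.487 * (nae + ke) / tbw + 71.54
```

For illustrative round inputs chosen so that (Nae + Ke)/TBW = 150 (not patient data), the three functions give roughly 140.9, 140.87 and 144.59. The equations nearly coincide in that region but diverge as the ratio moves away from it, because their slopes differ by more than a factor of two.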


2020 ◽  
Vol 8 ◽  
Author(s):  
Devasis Bassu ◽  
Peter W. Jones ◽  
Linda Ness ◽  
David Shallcross

In this paper, we present a theoretical foundation for a representation of a data set as a measure in a very large hierarchically parametrized family of positive measures, whose parameters can be computed explicitly (rather than estimated by optimization), and illustrate its applicability to a wide range of data types. The preprocessing step then consists of representing data sets as simple measures. The theoretical foundation consists of a dyadic product formula representation lemma, and a visualization theorem. We also define an additive multiscale noise model that can be used to sample from dyadic measures and a more general multiplicative multiscale noise model that can be used to perturb continuous functions, Borel measures, and dyadic measures. The first two results are based on theorems in [15, 3, 1]. The representation uses the very simple concept of a dyadic tree and hence is widely applicable, easily understood, and easily computed. Since the data sample is represented as a measure, subsequent analysis can exploit statistical and measure theoretic concepts and theories. Because the representation uses the very simple concept of a dyadic tree defined on the universe of a data set, and the parameters are simply and explicitly computable and easily interpretable and visualizable, we hope that this approach will be broadly useful to mathematicians, statisticians, and computer scientists who are intrigued by or involved in data science, including its mathematical foundations.
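
One way to picture parameters that are "computed explicitly" on a dyadic tree is sketched below for points on the unit interval. For every dyadic interval that contains data, the sketch records the normalised difference between the counts in its left and right halves, a value in [−1, 1]. This convention is an assumption chosen for illustration; the abstract does not give the paper's exact definition or normalisation, and the function name is hypothetical.

```python
def dyadic_coefficients(points, depth):
    """Return {(level, index): a} for dyadic intervals of [0, 1).

    For the interval number `index` at `level`, a = (left - right) / total,
    where left and right count the points in its two halves. Intervals
    that contain no points are omitted.
    """
    if any(not 0.0 <= p < 1.0 for p in points):
        raise ValueError("points must lie in [0, 1)")
    coeffs = {}
    for level in range(depth):
        n_children = 2 ** (level + 1)
        counts = [0] * n_children
        for p in points:
            counts[int(p * n_children)] += 1
        for k in range(n_children // 2):
            left, right = counts[2 * k], counts[2 * k + 1]
            total = left + right
            if total > 0:
                coeffs[(level, k)] = (left - right) / total
    return coeffs
```

No optimisation is involved: each parameter is a direct ratio of counts, and a value near +1 or −1 signals that the mass of an interval sits almost entirely in one half.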


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Avinash Jawade

Purpose: This study aims to analyze the influence of firm characteristics in dividend payout in a concentrated ownership setting. Design/methodology/approach: This study is probably the first to use the lasso technique for model selection and error prediction in the study of dividend payout in India. The lasso method comprises subsampling the available data set and performing reiterative regressions on those samples to generate the model with the best fit. This study incorporates four different ways of performing lasso treatment to get the best fit among them. Findings: This study analyzes the influence of firm characteristics on dividend payout in the Indian context and asserts that firms with growth potential and earnings volatility do not hesitate to cut dividends. This study does not find evidence for signaling, agency cost and life cycle theories in a concentrated ownership setting. Earnings is the single most important factor to have a positive influence on dividend, while excessively leveraged firms are restrictive of dividend payout. Taxation has a prominent role in altering the way firms pay dividend. Research limitations/implications: The recent changes in buyback taxation offer another opportunity to test the reactive behavior of firms. Also, given the disregard for traditional motivations, further research needs to be done to determine if dividend adjustments (on the lower side) help enhance firm value or not. Practical implications: This study may help investors view dividends in a proper perspective. Firms give importance to investments over dividends and thus investors need not dwell on dividend changes if firms fulfill their growth potential. Social implications: It lends perspective to investors about dividend changes and its importance. Originality/value: The methodology used for analysis is absolutely original in the literature pertaining to dividend policy in the Indian context. The literature is abundant with theories advocating or opposing the eminence of dividend payout; however, this study takes a holistic view of all influential dividend determinants in literature to understand dividend payout.
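
For readers unfamiliar with how lasso performs model selection, the sketch below shows the textbook formulation: least squares with an L1 penalty, solved by cyclic coordinate descent with soft thresholding. It is not the procedure used in the study, whose four lasso variants are not described in the abstract; the function names and the objective scaling, (1/2n)·‖y − Xβ‖² + λ·‖β‖₁, are assumptions made for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic updates.

    X is a list of rows, y a list of targets; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0      # correlation of feature j with the partial residual
            norm_sq = 0.0  # squared norm of feature j
            for row, target in zip(X, y):
                partial = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (target - partial)
                norm_sq += row[j] ** 2
            if norm_sq == 0.0:
                beta[j] = 0.0
            else:
                beta[j] = soft_threshold(rho / n, lam) / (norm_sq / n)
    return beta
```

A coefficient whose scaled correlation with the residual stays below `lam` is set exactly to zero, which is what lets the penalty drop weak determinants from the model rather than merely shrinking them.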


2018 ◽  
Vol 15 (6) ◽  
pp. 172988141881470
Author(s):  
Nezih Ergin Özkucur ◽  
H Levent Akın

Self-localization in autonomous robots is one of the fundamental issues in the development of intelligent robots, and processing of raw sensory information into useful features is an integral part of this problem. In a typical scenario, there are several choices for the feature extraction algorithm, and each has its weaknesses and strengths depending on the characteristics of the environment. In this work, we introduce a localization algorithm that is capable of capturing the quality of a feature type based on the local environment and makes soft selection of feature types throughout different regions. A batch expectation–maximization algorithm is developed for both discrete and Monte Carlo localization models, exploiting the probabilistic pose estimations of the robot without requiring ground truth poses and also considering different observation types as blackbox algorithms. We tested our method in simulations, on data collected from an indoor environment with a custom robot platform, and on a public data set. The results are compared with the individual feature types as well as with a naive fusion strategy.

