The Nested_fit Data Analysis Program

Proceedings ◽  
2019 ◽  
Vol 33 (1) ◽  
pp. 14 ◽  
Author(s):  
Martino Trassinelli

We present here Nested_fit, a Bayesian data analysis code developed for investigations of atomic spectra and other physical data. It is based on the nested sampling algorithm, with the implementation of an upgraded lawn mower robot method for finding new live points. For a given data set and a chosen model, the program provides the Bayesian evidence, for the comparison of different hypotheses/models, and the probability distributions of the different parameters. A large database of spectral profiles is already available (Gaussian, Lorentz, Voigt, log-normal, etc.) and additional ones can easily be added. It is written in Fortran for optimized parallel computation and is accompanied by a Python library for visualization of the results.
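
As a rough illustration of the evidence computation that nested sampling performs, here is a minimal sketch with a toy Gaussian likelihood standing in for a spectral model. It uses simple rejection sampling to draw likelihood-constrained points, not Nested_fit's lawn mower robot method, and is not the Nested_fit implementation itself.

```python
import math
import random

random.seed(0)

def logaddexp(a, b):
    # Numerically stable log(exp(a) + exp(b))
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def loglike(x):
    # Toy likelihood: a standard normal, standing in for a spectral model
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def nested_sampling(n_live=100, n_iter=600, lo=-5.0, hi=5.0):
    # Live points drawn from a uniform prior on [lo, hi]
    live = [random.uniform(lo, hi) for _ in range(n_live)]
    logl = [loglike(x) for x in live]
    logz = -math.inf  # running log-evidence
    for i in range(n_iter):
        worst = min(range(n_live), key=logl.__getitem__)
        # Shell weight w_i = X_{i-1} - X_i with X_i = exp(-i / n_live)
        logw = -i / n_live + math.log(1.0 - math.exp(-1.0 / n_live))
        logz = logaddexp(logz, logl[worst] + logw)
        # Replace the worst point by rejection sampling from the
        # likelihood-constrained prior (Nested_fit's lawn mower robot
        # search is a more efficient replacement for this step)
        while True:
            x = random.uniform(lo, hi)
            if loglike(x) > logl[worst]:
                live[worst], logl[worst] = x, loglike(x)
                break
    # Add the contribution of the remaining live points
    logx = -n_iter / n_live - math.log(n_live)
    for ll in logl:
        logz = logaddexp(logz, ll + logx)
    return logz
```

For this toy problem the evidence has the analytic value Z = 1/10 (the Gaussian likelihood integrates to one, averaged over a prior of width 10), so the estimate can be checked directly.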

2018 ◽  
Author(s):  
Daniel Mortlock

Mathematics is the language of quantitative science, and probability and statistics are the extension of classical logic to real-world data analysis and experimental design. The basics of mathematical functions and probability theory are summarized here, providing the tools for statistical modeling and assessment of experimental results. There is a focus on the Bayesian approach to such problems (i.e., Bayesian data analysis); therefore, the basic laws of probability are stated, along with several standard probability distributions (e.g., binomial, Poisson, Gaussian). A number of standard classical tests (e.g., p values, the t-test) are also defined and, to the degree possible, linked to the underlying principles of probability theory. This review contains 5 figures, 1 table, and 15 references. Keywords: Bayesian data analysis, mathematical models, power analysis, probability, p values, statistical tests, statistics, survey design
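
As a worked illustration of the link between a classical test and its Bayesian counterpart for the binomial distribution mentioned above (the coin-flip numbers are illustrative, not taken from the review):

```python
from math import comb

def binom_pmf(k, n, p):
    # Binomial probability mass function P(X = k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p(k, n, p0=0.5):
    # Exact two-sided p-value: total probability of all outcomes
    # no more probable than the observed count k under H0: p = p0
    pk = binom_pmf(k, n, p0)
    return sum(binom_pmf(i, n, p0) for i in range(n + 1)
               if binom_pmf(i, n, p0) <= pk + 1e-12)

def beta_posterior_mean(k, n, a=1, b=1):
    # Bayesian counterpart: Beta(a, b) prior gives a
    # Beta(a + k, b + n - k) posterior for the success probability
    return (a + k) / (a + b + n)
```

For 60 heads in 100 tosses, the exact two-sided p-value is about 0.057 (not significant at the 5% level), while a uniform prior gives a posterior mean of 61/102 ≈ 0.598 for the heads probability.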


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Maciej Lisicki ◽  
Marcos F Velho Rodrigues ◽  
Raymond E Goldstein ◽  
Eric Lauga

One approach to quantifying biological diversity consists of characterizing the statistical distribution of specific properties of a taxonomic group or habitat. Microorganisms living in fluid environments, and for whom motility is key, exploit propulsion resulting from a rich variety of shapes, forms, and swimming strategies. Here, we explore the variability of swimming speed for unicellular eukaryotes based on published data. The data naturally partition into that from flagellates (with a small number of flagella) and from ciliates (with tens or more). Despite the morphological and size differences between these groups, each of the two probability distributions of swimming speed is accurately represented by a log-normal distribution, with good agreement holding even to fourth moments. Scaling of the distributions by a characteristic speed for each data set leads to a collapse onto an apparently universal distribution. These results suggest a universal way for ecological niches to be populated by abundant microorganisms.
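
A minimal sketch of the log-normal fit and characteristic-speed rescaling described above, using synthetic speeds in place of the published data (the parameter values 4.0 and 0.5 are purely illustrative):

```python
import math
import random

random.seed(1)

# Synthetic "swimming speeds" drawn log-normally, standing in for one of
# the published data sets; mu = 4.0 and sigma = 0.5 are illustrative.
speeds = [random.lognormvariate(4.0, 0.5) for _ in range(5000)]

# Maximum-likelihood log-normal fit: mean and std of the log-speeds
logs = [math.log(v) for v in speeds]
mu = sum(logs) / len(logs)
sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))

# Rescale by the characteristic speed v0 = exp(mu): the reduced variable
# log(v / v0) has zero mean by construction, so rescaled data sets with
# similar sigma collapse onto a common distribution
v0 = math.exp(mu)
reduced = [math.log(v / v0) for v in speeds]
mean_reduced = sum(reduced) / len(reduced)
```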


Author(s):  
Eun-Young Mun ◽  
Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of IDA of individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples, as well as systematic study-level missing data, are significant barriers to IDA and, more broadly, to large-scale research synthesis. Based on the authors’ experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, the chapter also recognizes that IDA investigations require a wide range of expertise and considerable resources and that some minimum standards for reporting IDA studies may be needed to improve the transparency and quality of evidence.


2020 ◽  
Vol 499 (4) ◽  
pp. 5641-5652
Author(s):  
Georgios Vernardos ◽  
Grigorios Tsagkatakis ◽  
Yannis Pantazis

ABSTRACT Gravitational lensing is a powerful tool for constraining substructure in the mass distribution of galaxies, be it from the presence of dark matter sub-haloes or due to physical mechanisms affecting the baryons throughout galaxy evolution. Such substructure is hard to model and is either ignored by traditional smooth-modelling approaches or treated as well-localized massive perturbers. In this work, we propose a deep learning approach to quantify the statistical properties of such perturbations directly from images, where only the extended lensed source features within a mask are considered, without the need for any lens modelling. Our training data consist of mock lensed images assuming perturbing Gaussian Random Fields permeating the smooth overall lens potential and, for the first time, using images of real galaxies as the lensed source. We employ a novel deep neural network that can handle arbitrary uncertainty intervals associated with the training data set labels as input, provides probability distributions as output, and adopts a composite loss function. The method succeeds not only in accurately estimating the actual parameter values, but also reduces the predicted confidence intervals by 10 per cent in an unsupervised manner, i.e. without having access to the actual ground truth values. Our results are invariant to the inherent degeneracy between mass perturbations in the lens and complex brightness profiles for the source. Hence, we can quantitatively and robustly quantify the smoothness of the mass density of thousands of lenses, including confidence intervals, and provide a consistent ranking for follow-up science.
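
The composite loss function itself is not specified in the abstract. One common way to let a network consume label uncertainties and emit probability distributions is a Gaussian negative log-likelihood with the label error added in quadrature to the predicted variance, sketched here as an assumption rather than the authors' actual loss:

```python
import math

def composite_loss(pred_mean, pred_logvar, label, label_err):
    # Gaussian negative log-likelihood in which the label's own uncertainty
    # (label_err) is added in quadrature to the predicted variance -- an
    # assumed sketch of a loss that "handles uncertainty intervals on the
    # training labels"; the paper's composite loss may differ.
    var = math.exp(pred_logvar) + label_err ** 2
    return 0.5 * (math.log(2 * math.pi * var) + (label - pred_mean) ** 2 / var)
```

With this form, a prediction error is penalized less when the label itself is uncertain, which is the qualitative behaviour an uncertainty-aware loss needs.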


2020 ◽  
Vol 72 (1) ◽  
Author(s):  
Ryuho Kataoka

Abstract Statistical distributions are investigated for magnetic storms, sudden commencements (SCs), and substorms to identify the possible amplitude of one-in-100-year and one-in-1000-year events from a limited data set of less than 100 years. The lists of magnetic storms and SCs are provided by Kakioka Magnetic Observatory, while the lists of substorms are obtained from SuperMAG. It is found that the majority of events essentially follow the log-normal distribution, as expected for the random output of a complex system. However, it is uncertain whether large-amplitude events follow the same log-normal distributions; they appear instead to follow power-law distributions. Based on the statistical distributions, the probable amplitudes of the 100-year (1000-year) events can be estimated for magnetic storms, SCs, and substorms as approximately 750 nT (1100 nT), 230 nT (450 nT), and 5000 nT (6200 nT), respectively. The possible origin of these statistical distributions is also discussed with reference to other space weather phenomena such as solar flares, coronal mass ejections, and solar energetic particles.
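
A hedged sketch of how a 100-year amplitude can be read off a fitted log-normal distribution, using illustrative synthetic amplitudes rather than the Kakioka or SuperMAG lists (note the caveat above: if the largest events follow power laws instead, the log-normal extrapolation underestimates them):

```python
from statistics import NormalDist
import math
import random

random.seed(2)

def return_level(amplitudes, years_observed, return_period):
    # Fit a log-normal to the observed event amplitudes, then return the
    # quantile exceeded on average once per return_period years.
    logs = [math.log(a) for a in amplitudes]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    events_per_year = len(amplitudes) / years_observed
    # Probability that any single event exceeds the sought level
    p_exceed = 1.0 / (return_period * events_per_year)
    return math.exp(NormalDist(mu, sigma).inv_cdf(1.0 - p_exceed))

# Illustrative synthetic event amplitudes (nT), not the observatory data
amps = [random.lognormvariate(5.0, 0.8) for _ in range(200)]
r100 = return_level(amps, years_observed=50, return_period=100)
```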


2008 ◽  
Vol 06 (02) ◽  
pp. 261-282 ◽  
Author(s):  
AO YUAN ◽  
WENQING HE

Clustering is a major tool for microarray gene expression data analysis. The existing clustering methods fall mainly into two categories: parametric and nonparametric. The parametric methods generally assume a mixture of parametric subdistributions. When the mixture distribution approximately fits the true data-generating mechanism, the parametric methods perform well, but not when there is nonnegligible deviation between them. On the other hand, the nonparametric methods, which usually make no distributional assumptions, are robust but pay the price of efficiency loss. In an attempt to utilize the known mixture form to increase efficiency, and to avoid assumptions about the unknown subdistributions to enhance robustness, we propose a semiparametric method for clustering. The proposed approach possesses the form of a parametric mixture, with no assumptions about the subdistributions, which are estimated nonparametrically, with constraints imposed only on the modes. An expectation-maximization (EM) algorithm along with a classification step is invoked to cluster the data, and a modified Bayesian information criterion (BIC) is employed to guide the determination of the optimal number of clusters. Simulation studies are conducted to assess the performance and robustness of the proposed method. The results show that the proposed method yields a reasonable partition of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.
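
To make the E-step/M-step/classification cycle concrete, here is a plain parametric EM for a two-component one-dimensional Gaussian mixture on synthetic data. This is only an illustration of the algorithmic skeleton: the paper's method differs in estimating the subdistributions nonparametrically, with constraints only on the modes.

```python
import math
import random

random.seed(3)

def em_two_gaussians(data, n_iter=50):
    # Initialize components at the data extremes, unit variance, equal weights
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point
        resp = []
        for x in data:
            dens = [pi[k] / math.sqrt(2 * math.pi * var[k])
                    * math.exp(-0.5 * (x - mu[k]) ** 2 / var[k])
                    for k in (0, 1)]
            resp.append(dens[0] / (dens[0] + dens[1]))
        # M-step: re-estimate weights, means, and variances
        for k, rk in ((0, resp), (1, [1.0 - ri for ri in resp])):
            n_k = sum(rk)
            pi[k] = n_k / len(data)
            mu[k] = sum(ri * x for ri, x in zip(rk, data)) / n_k
            var[k] = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(rk, data)) / n_k
    # Classification step: hard-assign each point to its likelier component
    labels = [0 if ri > 0.5 else 1 for ri in resp]
    return mu, labels

# Two well-separated synthetic clusters (illustrative, not microarray data)
data = ([random.gauss(0.0, 1.0) for _ in range(100)]
        + [random.gauss(6.0, 1.0) for _ in range(100)])
mu, labels = em_two_gaussians(data)
```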


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ruchi Mittal ◽  
Wasim Ahmed ◽  
Amit Mittal ◽  
Ishan Aggarwal

Purpose Using data from Twitter, the purpose of this paper is to assess the coping behaviour and reactions of social media users in response to the initial days of the COVID-19-related lockdown in different parts of the world. Design/methodology/approach This study follows the quasi-inductive approach, which allows pre-categories to be developed from existing theories before the sampling and coding processes begin, for use in those processes. Data were extracted from Twitter using relevant keywords, and a sample was drawn from the Twitter data set to ensure the data were manageable from a qualitative research standpoint and that meaningful interpretations could be drawn from the data analysis results. The data analysis is discussed in two parts: extraction and classification of data from Twitter using automated sentiment analysis; and qualitative data analysis of a smaller Twitter data sample. Findings This study found that during the lockdown the majority of users on Twitter shared positive opinions towards it. The results also found that people were keeping themselves engaged and entertained. Governments around the world also gained support from Twitter users, despite the hardships being faced by citizens. The authors also found a number of users expressing negative sentiments. Several users on Twitter were fence-sitters whose opinions and emotions could swing either way depending on how the pandemic progressed and what action was taken by governments around the world. Research limitations/implications The authors add to the body of literature that has examined Twitter discussions around H1N1 using in-depth qualitative methods, as well as work on conspiracy theories around COVID-19. In the long run, governments can help citizens develop routines that help the community adapt to a new, dangerous environment, as has been shown very effectively for wildfires in the context of disaster management. 
In the context of this research, the dominance of positive themes within tweets is promising for policymakers and governments around the world. However, sentiment may need to be monitored going forward, as large spikes in negative sentiment may highlight lockdown fatigue. Social implications The psychology of humans during a pandemic can have a profound impact on how COVID-19 shapes up, including how people behave with other people and with the larger environment. Lockdowns are the opposite of what societies strive to achieve, i.e. socializing. Originality/value This study is based on original Twitter data collected during the initial days of the COVID-19-induced lockdown. The topics of “lockdowns” and the “COVID-19” pandemic have not been studied together thus far. This study is highly topical.
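
The automated sentiment analysis tool used in the study is not described here. As a minimal sketch of how tweets can be scored as positive, negative, or neutral (the fence-sitters above), a toy lexicon-based classifier with purely illustrative word lists might look like:

```python
# Illustrative sentiment lexicons -- these word lists are assumptions,
# not the lexicon used by the study's actual sentiment analysis tool.
POSITIVE = {"support", "safe", "hope", "together", "thank", "good", "stay"}
NEGATIVE = {"fear", "angry", "bored", "lonely", "bad", "worried", "fatigue"}

def sentiment(tweet):
    # Strip basic punctuation/hashtag characters and lowercase each token
    words = [w.strip(".,!?#@").lower() for w in tweet.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # fence-sitting tweets land here
```

Real tools use far richer lexicons or trained models, but the pipeline shape (tokenize, score, classify, then sample for qualitative reading) is the same.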


2021 ◽  
Author(s):  
Adel Mehrabadi ◽  
Gabriele Urbani ◽  
Simona Renna ◽  
Lucia Rossi ◽  
Italo Luciani ◽  
...  

Abstract In the case of giant brown fields, proper water injection management can be a very complex process, owing to the quality and quantity of data to be analysed. The main issue is understanding the preferential paths of the injected water, especially in carbonate environments characterized by strong vertical and areal heterogeneities (karst). A structured workflow is presented to analyze and integrate a massive data set in order to understand and optimize the water injection scheme. An extensive Production Data Analysis (PDA) has been performed, based on the integration of the available geological data (including NMR and cased-hole logs), production data (allocated rates, well tests, PLT), pressure data (SBHP, RFT, MDT, ESP) and salinity data. The applied workflow led to the construction of a Fluid Path Conceptual Model (FPCM), a simple but powerful tool to visualize the complex dynamic connections between injectors, producers and aquifer influence areas. Several diagnostic plots were produced to support and validate the main outcomes. On this basis, appropriate actions were implemented to optimize the current water injection scheme. The workflow was applied to a carbonate giant brown field comprising three different reservoir members that are in hydraulic communication at original conditions and are characterized by high vertical heterogeneity and permeability contrast. Moreover, dissolution phenomena localized in the uppermost reservoir section led to an important permeability enhancement through a wide network of connected vugs, acting as preferential pathways for the water. The geological analysis played a key role in investigating the reservoir waterflooding mechanism under dynamic conditions. The water-rising mechanism was identified as being driven by the high permeability contrast, and hence characterized by independent lateral movements in the different reservoir members. The integrated analysis identified room for optimization of the current water injection strategy. 
In particular, a key factor was the analysis and optimization at block scale, with blocks intended as areal and vertical sub-units, as identified by the PDA and visualized through the FPCM. Actions were suggested, including the optimization of injection rates and the definition of new injection points. A detailed surveillance plan was finally implemented to monitor the effects of the proposed actions on field performance, proving the robustness of the methodology. The Eni workflow for water injection analysis and optimization had previously been successfully tested only in sandstone reservoirs. This paper demonstrates the robustness of the methodology in a carbonate environment as well, where water encroachment is strongly driven by the karst network. The result is a clear understanding of the main dynamics in the reservoir, which allows better tuning of any action aimed at optimizing water injection and increasing the value of mature assets.
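
The abstract does not detail which diagnostic plots were used. One standard water-injection surveillance diagnostic that such a PDA workflow could include is the Hall plot (cumulative pressure-time integral versus cumulative injection), sketched here purely as an assumption:

```python
def hall_integral(bhp, p_res, rate, dt):
    # Hall-plot diagnostic for an injector: a roughly straight curve
    # indicates stable injectivity, upward bending suggests plugging, and
    # downward bending suggests fracturing or out-of-zone losses.
    # bhp: injection bottom-hole pressures per period; p_res: average
    # reservoir pressure; rate: injection rates; dt: period lengths.
    hall, cum_inj = [0.0], [0.0]
    for p, q, t in zip(bhp, rate, dt):
        hall.append(hall[-1] + (p - p_res) * t)
        cum_inj.append(cum_inj[-1] + q * t)
    return cum_inj[1:], hall[1:]
```

With constant injectivity the two cumulative series grow in fixed proportion, so the Hall plot is a straight line; departures from that line are the diagnostic signal.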

