Monte Carlo Physarum Machine: Characteristics of Pattern Formation in Continuous Stochastic Transport Networks

Abstract We present Monte Carlo Physarum Machine (MCPM): a computational model suitable for reconstructing continuous transport networks from sparse 2D and 3D data. MCPM is a probabilistic generalization of Jones's (2010) agent-based model for simulating the growth of Physarum polycephalum (slime mold). We compare MCPM to Jones's work on theoretical grounds, and describe a task-specific variant designed for reconstructing the large-scale distribution of gas and dark matter in the Universe known as the cosmic web. To analyze the new model, we first explore MCPM's self-patterning behavior, showing a wide range of continuous network-like morphologies—called polyphorms—that the model produces from geometrically intuitive parameters. Applying MCPM to both simulated and observational cosmological data sets, we then evaluate its ability to produce consistent 3D density maps of the cosmic web. Finally, we examine other possible tasks where MCPM could be useful, along with several examples of fitting to domain-specific data as proofs of concept.

Download Full-text

Comparative Cladistics: Fossils, Morphological Data Partitions and Lost Branches in the Fossil Tree of Life

10.31237/osf.io/7sa6d ◽

2017 ◽

Author(s):

Ross Mounce

Keyword(s):

Large Scale ◽

Phylogenetic Signal ◽

Scientific Progress ◽

Molecular Data ◽

Morphological Data ◽

Data Sets ◽

Phylogenetic Groups ◽

Use Of Data ◽

Wide Range ◽

Cladistic Analyses

In this thesis I attempt to gather together a wide range of cladistic analyses of fossil and extant taxa representing a diverse array of phylogenetic groups. I use this data to quantitatively compare the effect of fossil taxa relative to extant taxa in terms of support for relationships, number of most parsimonious trees (MPTs) and leaf stability. In line with previous studies I find that the effects of fossil taxa are seldom different to extant taxa – although I highlight some interesting exceptions. I also use this data to compare the phylogenetic signal within vertebrate morphological data sets, by choosing to compare cranial data to postcranial data. Comparisons between molecular data and morphological data have been previously well explored, as have signals between different molecular loci. But comparative signal within morphological data sets is much less commonly characterized and certainly not across a wide array of clades. With this analysis I show that there are many studies in which the evidence provided by cranial data appears to be be significantly incongruent with the postcranial data – more than one would expect to see just by the effect of chance and noise alone. I devise and implement a modification to a rarely used measure of homoplasy that will hopefully encourage its wider usage. Previously it had some undesirable bias associated with the distribution of missing data in a dataset, but my modification controls for this. I also take an in-depth and extensive review of the ILD test, noting it is often misused or reported poorly, even in recent studies. Finally, in attempting to collect data and metadata on a large scale, I uncovered inefficiencies in the research publication system that obstruct re-use of data and scientific progress. I highlight the importance of replication and reproducibility – even simple reanalysis of high profile papers can turn up some very different results. Data is highly valuable and thus it must be retained and made available for further re-use to maximize the overall return on research investment.

Download Full-text

Denoising large-scale biological data using network filters

10.21203/rs.3.rs-66071/v2 ◽

2021 ◽

Author(s):

Andrew J Kavran ◽

Aaron Clauset

Keyword(s):

Large Scale ◽

Synthetic Data ◽

Interaction Network ◽

Learning Task ◽

Biological Data ◽

Data Sets ◽

Proteomics Data ◽

Life History Variation ◽

Wide Range ◽

Underlying Processes

Abstract Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation.Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “ﬁltered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network ﬁlter may be applied to an entire system, or the system may be ﬁrst decomposed into distinct modules and a diﬀerent ﬁlter applied to each. Applied to synthetic data with known network structure and signal, network ﬁlters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network ﬁltering prior to training increases accuracy up to 43% compared to using unﬁltered data.Conclusions: Network ﬁlters are a general way to denoise biological data and can account for both correlation and anti-correlation between diﬀerent measurements. Furthermore, we ﬁnd that partitioning a network prior to ﬁltering can signiﬁcantly reduce errors in networks with heterogenous data and correlation patterns, and this approach outperforms existing diﬀusion based methods. Our results on proteomics data indicate the broad potential utility of network ﬁlters to applications in systems biology.

Download Full-text

Usage and Scaling of an Open-Source Spiking Multi-Area Model of Monkey Cortex

Lecture Notes in Computer Science - Brain-Inspired Computing ◽

10.1007/978-3-030-82427-3_4 ◽

2021 ◽

pp. 47-59

Author(s):

Sacha J. van Albada ◽

Jari Pronold ◽

Alexander van Meegen ◽

Markus Diesmann

Keyword(s):

Open Source ◽

Large Scale ◽

Network Models ◽

Macaque Monkey ◽

Source Model ◽

Model Specification ◽

Data Sets ◽

Neural Network Models ◽

Wide Range ◽

Ict Infrastructure

AbstractWe are entering an age of ‘big’ computational neuroscience, in which neural network models are increasing in size and in numbers of underlying data sets. Consolidating the zoo of models into large-scale models simultaneously consistent with a wide range of data is only possible through the effort of large teams, which can be spread across multiple research institutions. To ensure that computational neuroscientists can build on each other’s work, it is important to make models publicly available as well-documented code. This chapter describes such an open-source model, which relates the connectivity structure of all vision-related cortical areas of the macaque monkey with their resting-state dynamics. We give a brief overview of how to use the executable model specification, which employs NEST as simulation engine, and show its runtime scaling. The solutions found serve as an example for organizing the workflow of future models from the raw experimental data to the visualization of the results, expose the challenges, and give guidance for the construction of an ICT infrastructure for neuroscience.

Download Full-text

Characterising RDF data sets

Journal of Information Science ◽

10.1177/0165551516677945 ◽

2017 ◽

Vol 44 (2) ◽

pp. 203-229 ◽

Cited By ~ 6

Author(s):

Javier D Fernández ◽

Miguel A Martínez-Prieto ◽

Pablo de la Fuente Redondo ◽

Claudio Gutiérrez

Keyword(s):

Data Structures ◽

Large Scale ◽

Open Data ◽

Structural Features ◽

Data Sets ◽

Data Set ◽

Wide Range ◽

Rdf Data ◽

Description Framework ◽

Resource Description

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.

Download Full-text

Phase Diagram of a Lattice Model for Ternary Mixtures of Water, Oil, and Surfactants

MRS Proceedings ◽

10.1557/proc-248-23 ◽

1991 ◽

Vol 248 ◽

Cited By ~ 1

Author(s):

Mohamed Laradji ◽

Hong Guo ◽

Martin Grant ◽

Martin J. Zuchkermann

Keyword(s):

Phase Diagram ◽

Monte Carlo ◽

Phase Behavior ◽

Lattice Model ◽

Large Scale ◽

Ternary Mixtures ◽

Rich Phase ◽

Component System ◽

Chemical Potentials ◽

Wide Range

AbstractLarge scale Monte-Carlo simulations have been performed on a lattice model for a three component system of water, oil, and surfactants to obtain the phase equilibria and scattering behavior for a wide range of temperatures and chemical potentials. We observed that this model has a rich phase behavior, namely a water-oil phase coexistence, a microemulsion phase, a lamellar phase, and a square phase. This phase diagram is consistent with experiments, and is in qualitative agreement with a model of Gompper and Schick [ Phys. Rev. Lett. 62, 1647 (1989)].

Download Full-text

The response of the thermosphere and ionosphere to magnetospheric forcing

Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences ◽

10.1098/rsta.1989.0029 ◽

1989 ◽

Vol 328 (1598) ◽

pp. 139-171 ◽

Cited By ~ 28

Keyword(s):

Electric Fields ◽

Plasma Density ◽

Large Scale ◽

Data Sets ◽

Polar Regions ◽

Ionospheric Model ◽

Plasma Convection ◽

Geomagnetic Disturbances ◽

F Region ◽

Wide Range

During the past six years, rapid advances in three observational techniques (groundbased radars, optical interferometers and satellite-borne instruments) have provided a means of observing a wide range of spectacular interactions between the coupled magnetosphere, ionosphere and thermosphere system. Perhaps the most fundamental gain has come from the combined data-sets from the NASA Dynamics Explorer ( DE ) Satellites. These have unambiguously described the global nature of thermospheric flows, and their response to magnetospheric forcing. The DE spacecraft have also described, at the same time, the magnetospheric particle precipitation and convective electric fields which force the polar thermosphere and ionosphere. The response of the thermosphere to magnetospheric forcing is far more complex than merely the rare excitation of 1 km s -1 wind speeds and strong heating; the heating causes large-scale convection and advection within the thermosphere. These large winds grossly change the compositional structure of the upper thermosphere at high and middle latitudes during major geomagnetic disturbances. Some of the major seasonal and geomagnetic storm-related anomalies of the ionosphere are directly attributable to the gross windinduced changes of thermospheric composition; the mid-latitude ionospheric storm ‘negative phase’, however, is yet to be fully understood. The combination of very strong polar wind velocities and rapid plasma convection forced by magnetospheric electric fields strongly and rapidly modify F-region plasma distributions generated by the combination of local solar and auroral ionization sources. Until recently, however, it has been difficult to interpret the observed complex spatial and timedependent structures and motions of the thermosphere and ionosphere because of their strong and nonlinear coupling. It has recently been possible to complete a numerical and computational merging of the University College London (UCL) global thermospheric model and the Sheffield University ionospheric model. This has produced a self-consistent coupled thermospheric-ionospheric model, which has become a valuable diagnostic tool for examining thermospheric-ionospheric interactions in the polar regions. In particular, it is possible to examine the effects of induced winds, ion transport, and the seasonal and diurnal U.T. variations of solar heating and photoionization within the polar regions. Polar and high-latitude plasma density structure at F-region altitudes can be seen to be strongly controlled by U.T., and by season, even for constant solar and geomagnetic activity. In the winter, the F-region polar plasma density is generally dominated by the effects of transport of plasma from the dayside (sunlit cusp). In the summer polar region, however, an increase in the proportion of molecular to atomic species, created by the global seasonal circulation and augmented by the geomagnetic forcing, controls the plasma composition and generally depresses plasma densities at all U.Ts. A number of these complex effects can be seen in data obtained from ground-based radars, Fabry-Perot interferometers and in the combined DE data-sets. Several of these observations will be used, in combination with simulations using the UCL-Sheffield coupled model, to illustrate the major features of large-scale thermosphere-ionosphere interactions in response to geomagnetic forcing.

Download Full-text

Tensor Approximation for Multidimensional and Multivariate Data

Mathematics and Visualization - Anisotropy Across Fields and Scales ◽

10.1007/978-3-030-56215-1_4 ◽

2021 ◽

pp. 73-98

Author(s):

Renato Pajarola ◽

Susanne K. Suter ◽

Rafael Ballester-Ripoll ◽

Haiyan Yang

Keyword(s):

Large Scale ◽

Multivariate Data ◽

Tensor Decomposition ◽

Decomposition Methods ◽

Compact Representation ◽

Data Sets ◽

Tensor Fields ◽

Large Scale Data ◽

Wide Range ◽

Tensor Approximation

AbstractTensor decomposition methods and multilinear algebra are powerful tools to cope with challenges around multidimensional and multivariate data in computer graphics, image processing and data visualization, in particular with respect to compact representation and processing of increasingly large-scale data sets. Initially proposed as an extension of the concept of matrix rank for 3 and more dimensions, tensor decomposition methods have found applications in a remarkably wide range of disciplines. We briefly review the main concepts of tensor decompositions and their application to multidimensional visual data. Furthermore, we will include a first outlook on porting these techniques to multivariate data such as vector and tensor fields.

Download Full-text

Monte Carlo simulation for uncertainty estimation on structural data in implicit 3-D geological modeling, a guide for disturbance distribution selection and parameterization

Solid Earth ◽

10.5194/se-9-385-2018 ◽

2018 ◽

Vol 9 (2) ◽

pp. 385-402 ◽

Cited By ~ 20

Author(s):

Evren Pakyuz-Charrier ◽

Mark Lindsay ◽

Vitaliy Ogarko ◽

Jeremie Giraud ◽

Mark Jessell

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Input Data ◽

Structural Data ◽

Uncertainty Estimation ◽

Data Sets ◽

Geological Modeling ◽

Input Uncertainty ◽

Wide Range ◽

Geological Models

Abstract. Three-dimensional (3-D) geological structural modeling aims to determine geological information in a 3-D space using structural data (foliations and interfaces) and topological rules as inputs. This is necessary in any project in which the properties of the subsurface matters; they express our understanding of geometries in depth. For that reason, 3-D geological models have a wide range of practical applications including but not restricted to civil engineering, the oil and gas industry, the mining industry, and water management. These models, however, are fraught with uncertainties originating from the inherent flaws of the modeling engines (working hypotheses, interpolator's parameterization) and the inherent lack of knowledge in areas where there are no observations combined with input uncertainty (observational, conceptual and technical errors). Because 3-D geological models are often used for impactful decision-making it is critical that all 3-D geological models provide accurate estimates of uncertainty. This paper's focus is set on the effect of structural input data measurement uncertainty propagation in implicit 3-D geological modeling. This aim is achieved using Monte Carlo simulation for uncertainty estimation (MCUE), a stochastic method which samples from predefined disturbance probability distributions that represent the uncertainty of the original input data set. MCUE is used to produce hundreds to thousands of altered unique data sets. The altered data sets are used as inputs to produce a range of plausible 3-D models. The plausible models are then combined into a single probabilistic model as a means to propagate uncertainty from the input data to the final model. In this paper, several improved methods for MCUE are proposed. The methods pertain to distribution selection for input uncertainty, sample analysis and statistical consistency of the sampled distribution. Pole vector sampling is proposed as a more rigorous alternative than dip vector sampling for planar features and the use of a Bayesian approach to disturbance distribution parameterization is suggested. The influence of incorrect disturbance distributions is discussed and propositions are made and evaluated on synthetic and realistic cases to address the sighted issues. The distribution of the errors of the observed data (i.e., scedasticity) is shown to affect the quality of prior distributions for MCUE. Results demonstrate that the proposed workflows improve the reliability of uncertainty estimation and diminish the occurrence of artifacts.

Download Full-text

HMCtomo: A framework for Hamiltonian Monte Carlo sampling of Bayesian geophysical inverse problems

10.5194/egusphere-egu2020-9394 ◽

2020 ◽

Author(s):

Lars Gebraad ◽

Andrea Zunino ◽

Andreas Fichtner ◽

Klaus Mosegaard

Keyword(s):

Monte Carlo ◽

Bayesian Inference ◽

Inverse Problems ◽

Large Scale ◽

Monte Carlo Sampling ◽

Hamiltonian Monte Carlo ◽

Forward Models ◽

Inference Problems ◽

Wide Range ◽

Gradient Based

<div>We present a framework to solve geophysical inverse problems using the Hamiltonian Monte Carlo (HMC) method, with a focus on Bayesian tomography. Recent work in the geophysical community has shown the potential for gradient-based Monte Carlo sampling for a wide range of inverse problems across several fields.</div><div>&#160;</div><div>Many high-dimensional (non-linear) problems in geophysics have readily accessible gradient information which is unused in classical probabilistic inversions. Using HMC is a way to help improve traditional Monte Carlo sampling while increasing the scalability of inference problems, allowing access to uncertainty quantification for problems with many free parameters (>10'000). The result of HMC sampling is a collection of models representing the posterior probability density function, from which not only "best" models can be inferred, but also uncertainties and potentially different plausible scenarios, all compatible with the observed data. However, the amount of tuning parameters required by HMC, as well as the complexity of existing statistical modeling software, has limited the geophysical community in widely adopting a specific tool for performing efficient large-scale Bayesian inference.</div><div>&#160;</div><div>This work attempts to make a step towards filling that gap by providing an HMC sampler tailored for geophysical inverse problems (by e.g. supplying relevant priors and visualizations) combined with a set of different forward models, ranging from elastic and acoustic wave propagation to magnetic anomaly modeling, traveltimes, etc.. The framework is coded in the didactic but performant languages Julia and Python, with the possibility for the user to combine their own forward models, which are linked to the sampler routines by proper interfaces. In this way, we hope to illustrate the usefulness and potential of HMC in Bayesian inference. Tutorials featuring an array of physical experiments are written with the aim of both showcasing Bayesian inference and successful HMC usage. It additionally includes examples on how to speed up HMC e.g. with automated tuning techniques and GPU computations.</div>

Download Full-text

Muti-technique observations and modelling of the gas and dust phases of protoplanetary disks

Proceedings of the International Astronomical Union ◽

10.1017/s1743921310011506 ◽

2009 ◽

Vol 5 (H15) ◽

pp. 767-767

Author(s):

C. Pinte ◽

F. Ménard ◽

G. Duchěne ◽

J. C. Augereau

Keyword(s):

Gas Phase ◽

Observational Data ◽

Large Scale ◽

Scattered Light ◽

Protoplanetary Disks ◽

Quality Data ◽

Data Sets ◽

Wide Range ◽

Dust Component ◽

Chemical Conditions

A wide range of high-quality data is becoming available for protoplanetary disks. From these data sets many issues have already been addressed, such as constraining the large scale geometry of disks, finding evidence of dust grain evolution, as well as constraining the kinematics and physico-chemical conditions of the gas phase. Most of these results are based on models that emphasise fitting observations of either the dust component (SEDs or scattered light images or, more recently, interferometric visibilities), or the gas phase (resolved maps in molecular lines). In this contribution, we present a more global approach which aims at interpreting consistently the increasing amount of observational data in the framework of a single model, in order to to better characterize both the dust population and the gas disk properties, as well as their interactions. We present results of such modeling applied to a few disks (e.g. IM Lup, see Figure) with large observational data-sets available (scattered light images, polarisation maps, IR spectroscopy, X-ray spectrum, CO maps). These kinds of multi-wavelengths studies will become very powerful in the context of forthcoming instruments such as Herschel and ALMA.

Download Full-text