Exact and Efficient Inference for Collective Flow Diffusion Model via Minimum Convex Cost Flow Algorithm

The Collective Flow Diffusion Model (CFDM) is a general framework for finding the hidden movements underlying aggregated population data. The key procedure in CFDM analysis is MAP inference of hidden variables. Unfortunately, existing approaches offer only approximate MAP inference, not exact inference, and require long computation times on large-scale problems. In this paper, we propose an exact and efficient method for MAP inference in CFDM. Our key idea is to formulate the MAP inference problem as a combinatorial optimization problem called the Minimum Convex Cost Flow Problem (C-MCFP), with no approximation or continuous relaxation. On the basis of this formulation, we propose an efficient inference method that employs the C-MCFP algorithm as a subroutine. Our experiments on synthetic and real datasets show that the proposed method is effective both in single MAP inference and in people flow estimation with the EM algorithm.
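To make the minimum convex cost flow formulation concrete, the following is a minimal sketch of a textbook approach: send integer flow one unit at a time along the cheapest path in the residual graph, where the cost of pushing one more unit over an edge is the marginal cost. This is not the authors' algorithm or the CFDM objective; the function name `min_convex_cost_flow`, the toy graph, and the quadratic edge costs are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Edge = Tuple[int, int]
INF = float("inf")


def min_convex_cost_flow(
    n_nodes: int,
    edges: Dict[Edge, Tuple[int, Callable[[int], int]]],
    source: int,
    sink: int,
    amount: int,
) -> Tuple[Dict[Edge, int], int]:
    """Send `amount` integer units from source to sink at minimum total cost.

    edges maps (u, v) -> (capacity, cost_fn), where cost_fn(x) is a convex
    function giving the cost of carrying x units on that edge.
    """
    flow = {e: 0 for e in edges}
    for _ in range(amount):
        # Bellman-Ford on the residual graph with marginal costs.
        dist: List[float] = [INF] * n_nodes
        prev: List[Optional[Tuple[Edge, int]]] = [None] * n_nodes
        dist[source] = 0
        for _ in range(n_nodes - 1):
            updated = False
            for (u, v), (cap, cost) in edges.items():
                x = flow[(u, v)]
                # Forward residual arc: push one more unit on (u, v).
                if x < cap and dist[u] + cost(x + 1) - cost(x) < dist[v]:
                    dist[v] = dist[u] + cost(x + 1) - cost(x)
                    prev[v] = ((u, v), +1)
                    updated = True
                # Backward residual arc: cancel one unit on (u, v).
                if x > 0 and dist[v] + cost(x - 1) - cost(x) < dist[u]:
                    dist[u] = dist[v] + cost(x - 1) - cost(x)
                    prev[u] = ((u, v), -1)
                    updated = True
            if not updated:
                break
        if dist[sink] == INF:
            raise ValueError("requested amount exceeds network capacity")
        # Walk back from the sink and apply the unit augmentation.
        node = sink
        while node != source:
            edge, step = prev[node]
            flow[edge] += step
            node = edge[0] if step == 1 else edge[1]
    total = sum(cost(flow[e]) for e, (_, cost) in edges.items())
    return flow, total


# Toy instance (hypothetical): two routes from node 0 to node 3.
edges = {
    (0, 1): (4, lambda x: x * x),      # convex cost x^2
    (0, 2): (4, lambda x: 2 * x * x),  # convex cost 2x^2
    (1, 3): (4, lambda x: 0),
    (2, 3): (4, lambda x: 0),
}
flow, total = min_convex_cost_flow(4, edges, source=0, sink=3, amount=3)
```

Because the edge costs are convex, marginal costs never decrease as flow grows, which is what lets the unit-by-unit augmentation reach an exact integer optimum without any continuous relaxation.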

Double Precision Is Not Needed for Many-Body Calculations: New Conventional Wisdom

10.26434/chemrxiv.6104804.v1 ◽

2018 ◽

Author(s):

Pavel Pokhilko ◽

Evgeny Epifanovsky ◽

Anna I. Krylov

Keyword(s):

Large Scale ◽

Computation Time ◽

Coupled Cluster ◽

Double Precision ◽

Many Body ◽

Single Precision ◽

Parallel Performance ◽

Point Representation ◽

Electron Repulsion Integrals ◽

Cluster Methods

Using single-precision floating-point representation reduces the size of data and computation time by a factor of two relative to the double precision conventionally used in electronic structure programs. For large-scale calculations, such as those encountered in many-body theories, a reduced memory footprint alleviates memory and input/output bottlenecks. Reduced data size can lead to additional gains due to improved parallel performance on CPUs and various accelerators. However, using single precision can potentially reduce the accuracy of computed observables. Here we report an implementation of coupled-cluster and equation-of-motion coupled-cluster methods with single and double excitations in single precision. We consider both the standard implementation and one using Cholesky decomposition or resolution-of-the-identity of electron-repulsion integrals. Numerical tests illustrate that when single precision is used in correlated calculations, the loss of accuracy is insignificant and a pure single-precision implementation can be used for computing energies, analytic gradients, excited states, and molecular properties. In addition to pure single-precision calculations, our implementation allows one to follow a single-precision calculation by clean-up iterations, fully recovering double-precision results while retaining significant savings.
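The idea of following a single-precision calculation with double-precision clean-up iterations can be illustrated on a much simpler iterative solver. The sketch below is an analogy only, not the coupled-cluster implementation described above: it runs Jacobi iterations for a diagonally dominant linear system in `float32`, then continues for a few iterations in `float64`. The matrix, iteration counts, and the helper names `jacobi` and `rel_err` are assumptions made for illustration.

```python
import numpy as np


def jacobi(A, b, x0, iters, dtype):
    """Run a fixed number of Jacobi iterations entirely in the given dtype."""
    A = A.astype(dtype)
    b = b.astype(dtype)
    x = x0.astype(dtype)
    d = np.diag(A)       # diagonal entries
    R = A - np.diag(d)   # off-diagonal part
    for _ in range(iters):
        x = (b - R @ x) / d
    return x


rng = np.random.default_rng(0)
n = 50
# Strongly diagonally dominant test matrix so that Jacobi converges quickly.
A = rng.random((n, n)) + 4 * n * np.eye(n)
b = rng.random(n)
x_ref = np.linalg.solve(A, b)  # double-precision reference solution

# Bulk of the work in single precision.
x_single = jacobi(A, b, np.zeros(n), iters=30, dtype=np.float32)
# A few clean-up iterations in double precision, started from the float32 result.
x_clean = jacobi(A, b, x_single, iters=15, dtype=np.float64)


def rel_err(x):
    return np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref)


print(f"single precision only: {rel_err(x_single):.2e}")
print(f"after clean-up:        {rel_err(x_clean):.2e}")
```

The single-precision phase stalls at roughly single-precision accuracy, and the short double-precision phase removes the remaining error, which mirrors the trade-off the abstract describes.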

Distributed learning with indefinite kernels

Analysis and Applications ◽

10.1142/s021953051850032x ◽

2019 ◽

Vol 17 (06) ◽

pp. 947-975 ◽

Cited By ~ 2

Author(s):

Lei Shi

Keyword(s):

Large Scale ◽

Substantial Reduction ◽

Computation Time ◽

Distributed Learning ◽

Rates Of Convergence ◽

Regression Problem ◽

Data Set ◽

Regularization Scheme ◽

Original Algorithm ◽

Indefinite Kernel

We investigate distributed learning with a coefficient-based regularization scheme under the framework of kernel regression methods. Compared with classical kernel ridge regression (KRR), the algorithm under consideration does not require the kernel function to be positive semi-definite and hence provides a simple paradigm for designing indefinite kernel methods. The distributed learning approach partitions a massive data set into several disjoint data subsets, and then produces a global estimator by taking an average of the local estimators on each data subset. Easily implemented partitions and running the algorithm on each subset in parallel lead to a substantial reduction in computation time versus the standard approach of running the original algorithm on the entire sample. We establish the first mini-max optimal rates of convergence for the distributed coefficient-based regularization scheme with indefinite kernels. We thus demonstrate that, compared with distributed KRR, the algorithm concerned is more flexible and effective in regression problems for large-scale data sets.
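The divide-and-average procedure described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's exact estimator: it assumes an l2 penalty on the kernel expansion coefficients, so each local fit solves the normal equations (K^T K + lam * m * I) alpha = K^T y, which stay solvable even when K is not positive semi-definite. The Gaussian kernel, the toy data, and the names `fit_local` and `predict_distributed` are chosen here for illustration.

```python
import numpy as np


def gaussian_kernel(X, Z, width=0.2):
    """Kernel matrix between 1-D sample arrays X and Z."""
    return np.exp(-((X[:, None] - Z[None, :]) ** 2) / (2 * width ** 2))


def fit_local(x, y, lam=1e-3):
    """Local estimator on one subset: penalize the coefficients, not an RKHS norm."""
    K = gaussian_kernel(x, x)
    m = len(x)
    # Normal equations of (1/m)||K a - y||^2 + lam ||a||^2; no PSD assumption on K.
    alpha = np.linalg.solve(K.T @ K + lam * m * np.eye(m), K.T @ y)
    return x, alpha


def predict_distributed(models, x_new):
    """Global estimator: plain average of the local predictions."""
    preds = [gaussian_kernel(x_new, centers) @ alpha for centers, alpha in models]
    return np.mean(preds, axis=0)


# Toy regression data (hypothetical): noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 600)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(600)

# Partition into 4 disjoint subsets and fit each one independently.
parts = np.array_split(rng.permutation(600), 4)
models = [fit_local(x[idx], y[idx]) for idx in parts]

x_test = np.linspace(0.05, 0.95, 50)
mae = np.mean(np.abs(predict_distributed(models, x_test) - np.sin(2 * np.pi * x_test)))
print(f"mean absolute error of the averaged estimator: {mae:.3f}")
```

Each local fit only ever forms a 150 x 150 system instead of a 600 x 600 one, and the fits are independent, so they could run in parallel.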

Towards A Multi-FPGA Infrared Simulator

The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology ◽

10.1177/154851290700400404 ◽

2007 ◽

Vol 4 (4) ◽

pp. 343-355 ◽

Cited By ~ 1

Author(s):

Vinay Sriram ◽

David Kearney

Keyword(s):

Homeland Security ◽

Reconfigurable Computing ◽

High Speed ◽

High Performance ◽

Large Scale ◽

Computation Time ◽

Ccd Camera ◽

Hardware Acceleration ◽

Limiting Factor ◽

Scene Simulation

High-speed infrared (IR) scene simulation is used extensively in defense and homeland security to test the sensitivity of IR cameras and the accuracy of the IR threat detection and tracking algorithms commonly used in IR missile approach warning systems (MAWS). A typical MAWS requires an input scene rate of over 100 scenes/second. Infrared scene simulations typically take 32 minutes to simulate a single IR scene that accounts for the effects of atmospheric turbulence, refraction, optical blurring and charge-coupled device (CCD) camera electronic noise on a Pentium 4 (2.8 GHz) dual-core processor [7]. Thus, in IR scene simulation, the processing power of modern computers is a limiting factor. In this paper we report our research to accelerate IR scene simulation using high-performance reconfigurable computing. We constructed a multi-FPGA (field-programmable gate array) hardware acceleration platform and accelerated a key computationally intensive IR algorithm on this platform. We succeeded in reducing the computation time of IR scene simulation by over 36%. This research acts as a unique case study for accelerating large-scale defense simulations using a high-performance multi-FPGA reconfigurable computer.

Selection and Substantiation of Methods for Evaluating Cosmetic Products for Deodorant and Antiperspirant Action

Biotekhnologiya ◽

10.21519/0234-2758-2021-37-2-54-64 ◽

2021 ◽

Vol 37 (2) ◽

pp. 54-64

Author(s):

D.V. Barabash ◽

I.A. Butorova

Keyword(s):

Large Scale ◽

Gravimetric Method ◽

Population Data ◽

Diffusion Method ◽

Disc Diffusion ◽

Disc Diffusion Method ◽

Large Scale Testing ◽

Sweat Secretion ◽

Microbiological Methods ◽

Number Of Microorganisms

The possibility of using simple and available methods for analyzing deodorants/antiperspirants has been studied. The gravimetric method was shown to have acceptable metrological characteristics under repeatability conditions when evaluating antiperspirant activity. A decrease in the number of microorganisms (CFU) on the axilla skin was observed in a rinse test experiment 4 h and 8 h after the application of deodorants/antiperspirants. The microbial population data were inversely proportional to the antiperspirant activity values of the tested compositions. Reduced sweat secretion decreases the amount of nutrients required for microbial development, which makes it possible to use the rinse test to indirectly evaluate deodorant activity in research and development of personal care products. However, due to its laboriousness and the need for volunteers, the method cannot be recommended for large-scale testing. It was shown that the disc diffusion method (DDM) used to detect Staphylococcus aureus, Pseudomonas aeruginosa and Bacillus subtilis cannot be applied to the assessment of the intrinsic antimicrobial activity of the tested cosmetic compositions. This indicates the necessity of additional studies to select test microorganisms typical for the armpit area. In addition, DDM is useful if the deodorant effect of the composition is created by the addition of low-volatile antibacterial compounds. Therefore, microbiological methods have limited applications and are not suitable for widespread use.

deodorant action; antiperspirant action; gravimetry; disc diffusion method; rinse test; deodorant; antiperspirant; cosmetic; efficiency; consumer properties; functional properties

This work was supported by MUCTR (project no. K-2020-007).

Towards Intelligent Road Traffic Management over Weighted Large Graphs: Hybrid Meta-heuristic-Based Approach

Journal of Cases on Information Technology ◽

10.4018/jcit.20220801oa06 ◽

2022 ◽

Vol 24 (3) ◽

pp. 0-0

Keyword(s):

Traffic Management ◽

Large Scale ◽

Road Traffic ◽

Optimization Technique ◽

Shortest Paths ◽

Hybrid Genetic Algorithm ◽

Optimal Solution ◽

Computation Time ◽

Exact Algorithm ◽

On The Road

This paper introduces a new hybrid meta-heuristic optimization technique for decreasing the computation time of shortest-path algorithms. The problem of finding the shortest paths is a combinatorial optimization problem that has been well studied in various fields. The number of vehicles on the road has increased enormously, and traffic management has therefore become a major problem. We study the traffic network in large-scale routing problems as a field of application. The meta-heuristic we propose is a new hybrid genetic algorithm named IOGA. The problem consists of finding the k optimal paths that minimize a metric such as distance or time. Testing was performed using an exact algorithm and a meta-heuristic algorithm on randomly generated network instances. Experimental analyses demonstrate the efficiency of our proposed approach in terms of runtime and quality of the result. The empirical results show that the proposed algorithm outperforms some of the existing techniques in terms of the optimal solution in every generation.

HiBuffer: Buffer Analysis of 10-Million-Scale Spatial Data in Real Time

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7120467 ◽

2018 ◽

Vol 7 (12) ◽

pp. 467 ◽

Cited By ~ 3

Author(s):

Mengyu Ma ◽

Ye Wu ◽

Wenze Luo ◽

Luo Chen ◽

Jun Li ◽

...

Keyword(s):

Real Time ◽

Spatial Data ◽

High Performance ◽

Large Scale ◽

Computation Time ◽

Buffer Analysis ◽

Data Volume ◽

Time Buffer ◽

Real World Datasets ◽

Spatial Indexes

Buffer analysis, a fundamental function in a geographic information system (GIS), identifies areas by the surrounding geographic features within a given distance. Real-time buffer analysis for large-scale spatial data remains a challenging problem since the computational scales of conventional data-oriented methods expand rapidly with increasing data volume. In this paper, we introduce HiBuffer, a visualization-oriented model for real-time buffer analysis. An efficient buffer generation method is proposed which introduces spatial indexes and a corresponding query strategy. Buffer results are organized into a tile-pyramid structure to enable stepless zooming. Moreover, a fully optimized hybrid parallel processing architecture is proposed for the real-time buffer analysis of large-scale spatial data. Experiments using real-world datasets show that our approach can reduce computation time by up to several orders of magnitude while preserving superior visualization effects. Additional experiments were conducted to analyze the influence of spatial data density, buffer radius, and request rate on HiBuffer performance, and the results demonstrate the adaptability and stability of HiBuffer. The parallel scalability of HiBuffer was also tested, showing that HiBuffer achieves high performance of parallel acceleration. Experimental results verify that HiBuffer is capable of handling 10-million-scale data.

Multilevel Regression and Poststratification Versus Survey Sample Weighting for Estimating Population Quantities in Large Population Health Studies

American Journal of Epidemiology ◽

10.1093/aje/kwaa053 ◽

2020 ◽

Vol 189 (7) ◽

pp. 717-725 ◽

Cited By ~ 1

Author(s):

Marnie Downes ◽

John B Carlin

Keyword(s):

Population Health ◽

Large Scale ◽

Census Data ◽

Large Population ◽

Population Data ◽

Superior Performance ◽

Population Parameter ◽

Multilevel Regression ◽

Sample Weighting ◽

Survey Weighting

Multilevel regression and poststratification (MRP) is a model-based approach for estimating a population parameter of interest, generally from large-scale surveys. It has been shown to be effective in highly selected samples, which is particularly relevant to investigators of large-scale population health and epidemiologic surveys facing increasing difficulties in recruiting representative samples of participants. We aimed to further examine the accuracy and precision of MRP in a context where census data provided reasonable proxies for true population quantities of interest. We considered 2 outcomes from the baseline wave of the Ten to Men study (Australia, 2013–2014) and obtained relevant population data from the 2011 Australian Census. MRP was found to achieve generally superior performance relative to conventional survey weighting methods for the population as a whole and for population subsets of varying sizes. MRP resulted in less variability among estimates across population subsets relative to sample weighting, and there was some evidence of small gains in precision when using MRP, particularly for smaller population subsets. These findings offer further support for MRP as a promising analytical approach for addressing participation bias in the estimation of population descriptive quantities from large-scale health surveys and cohort studies.
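The poststratification step of MRP can be sketched as a census-weighted average of per-cell estimates. The snippet below shows only that step and assumes the cell-level estimates have already been produced by a multilevel regression; the age cells, estimates, and counts are made-up numbers for illustration, not data from the Ten to Men study or the Australian Census, and `poststratify` is a hypothetical helper name.

```python
from typing import Dict


def poststratify(cell_estimates: Dict[str, float], census_counts: Dict[str, int]) -> float:
    """Weight each cell's model-based estimate by its census population count."""
    total = sum(census_counts.values())
    weighted = sum(cell_estimates[cell] * census_counts[cell] for cell in census_counts)
    return weighted / total


# Hypothetical cell-level prevalence estimates (would come from the multilevel model).
cell_estimates = {"18-34": 0.30, "35-54": 0.20, "55+": 0.10}
# Hypothetical census counts for the same cells.
census_counts = {"18-34": 3000, "35-54": 4000, "55+": 3000}

estimate = poststratify(cell_estimates, census_counts)
print(f"poststratified population estimate: {estimate:.3f}")
```

The weights come from the census rather than from the sample, so cells that are under-represented among participants still contribute in proportion to their true population share.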

Space-time clustering-based method to optimize shareability in real-time ride-sharing

PLoS ONE ◽

10.1371/journal.pone.0262499 ◽

2022 ◽

Vol 17 (1) ◽

pp. e0262499

Author(s):

Negin Alisoltani ◽

Mostafa Ameli ◽

Mahdi Zargayouna ◽

Ludovic Leclercq

Keyword(s):

Real Time ◽

Large Scale ◽

Computation Time ◽

Clustering Method ◽

Matching Problem ◽

Solution Quality ◽

Mobility Service ◽

Ride Sharing ◽

Large Scale Problems ◽

Spatio Temporal

Real-time ride-sharing has become popular in recent years. However, the underlying optimization problem for this service is highly complex. Two of the most critical challenges when solving the problem are solution quality and computation time, especially in large-scale problems where the number of received requests is huge. In this paper, we rely on an exact solving method to ensure the quality of the solution, while using AI-based techniques to limit the number of requests that we feed to the solver. More precisely, we propose a clustering method based on a new shareability function to put the most shareable trips inside separate clusters. Previous studies only consider spatio-temporal dependencies when clustering mobility service requests, which is not efficient in finding the shareable trips. Here, we define the shareability function to consider all the different sharing states for each pair of trips. Each cluster is then managed with a proposed heuristic framework in order to solve the matching problem inside each cluster. As the method favors sharing, we present the number of sharing constraints to allow the service to choose the number of shared trips. To validate our proposal, we apply the proposed method to the network of the city of Lyon, France, with half a million requests in the morning peak from 6 to 10 AM. The results demonstrate that the algorithm can provide high-quality solutions in a short time for large-scale problems. The proposed clustering method can also be used for different mobility service problems such as car-sharing and bike-sharing.

Same model, different conclusions: An identifiability issue in the linear ballistic accumulator model of decision-making

10.31234/osf.io/2xu7f ◽

2020 ◽

Author(s):

Nathan J. Evans

Keyword(s):

Decision Making ◽

Diffusion Model ◽

Large Scale ◽

Drift Rate ◽

Decision Time ◽

Measurement Properties ◽

Decision Threshold ◽

Experimental Conditions ◽

Model Application ◽

Linear Ballistic Accumulator

Evidence accumulation models (EAMs) – the dominant modelling framework for speeded decision-making – have become an important tool for model application. Model application involves using a specific model to estimate parameter values that relate to different components of the cognitive process, and how these values differ over experimental conditions and/or between groups of participants. In this context, researchers are often agnostic to the specific theoretical assumptions made by different EAM variants, and simply desire a model that will provide them with an accurate measurement of the parameters that they are interested in. However, recent research has suggested that the two most commonly applied EAMs – the diffusion model and the linear ballistic accumulator (LBA) – come to fundamentally different conclusions when applied to the same empirical data. The current study provides an in-depth assessment of the measurement properties of the two models, as well as the mapping between them, using two large-scale simulation studies and a reanalysis of Evans (2020a). Importantly, the findings indicate that there is a major identifiability issue within the standard LBA, where differences in decision threshold between conditions are practically unidentifiable, which appears to be caused by a tradeoff between the threshold parameter and the overall drift rate across the different accumulators. While this issue can be remedied by placing some constraint on the overall drift rate across the different accumulators – such as constraining the average drift rate or the drift rate of one accumulator to have the same value in each condition – these constraints can qualitatively change the conclusions of the LBA regarding other constructs, such as non-decision time. Furthermore, all LBA variants considered in the current study still provide qualitatively different conclusions to the diffusion model. Importantly, the current findings suggest that researchers should not use the unconstrained version of the LBA for model application, and call into question the conclusions of previous studies using the unconstrained LBA.

The spatial allocation of population: a review of large-scale gridded population data products and their fitness for use

Earth System Science Data ◽

10.5194/essd-11-1385-2019 ◽

2019 ◽

Vol 11 (3) ◽

pp. 1385-1409 ◽

Cited By ~ 36

Author(s):

Stefan Leyk ◽

Andrea E. Gaughan ◽

Susana B. Adamo ◽

Alex de Sherbinin ◽

Deborah Balk ◽

...

Keyword(s):

Large Scale ◽

Population Data ◽

Disaster Risk ◽

Spatial Allocation ◽

Quality Aspects ◽

Data Products ◽

Methodological Approaches ◽

Different Characteristics ◽

Population Counts ◽

Relative Quality

Population data represent an essential component in studies focusing on human–nature interrelationships, disaster risk assessment and environmental health. Several recent efforts have produced global- and continental-extent gridded population data which are becoming increasingly popular among various research communities. However, these data products, which are of very different characteristics and based on different modeling assumptions, have never been systematically reviewed and compared, which may impede their appropriate use. This article fills this gap and presents, compares and discusses a set of large-scale (global and continental) gridded datasets representing population counts or densities. It focuses on data properties, methodological approaches and relative quality aspects that are important to fully understand the characteristics of the data with regard to the intended uses. Written by the data producers and members of the user community, through the lens of the “fitness for use” concept, the aim of this paper is to provide potential data users with the knowledge base needed to make informed decisions about the appropriateness of the data products available in relation to the target application and for critical analysis.
