Adaptive divergence for rapid adversarial optimization

2020 ◽  
Vol 6 ◽  
pp. e274
Author(s):  
Maxim Borisyak ◽  
Tatiana Gaintseva ◽  
Andrey Ustyuzhanin

Adversarial Optimization provides a reliable, practical way to match two implicitly defined distributions, one of which is typically represented by a sample of real data, while the other is represented by a parameterized generator. Matching of the distributions is achieved by minimizing a divergence between them, and estimating the divergence involves a secondary optimization task, which typically requires training a model to discriminate between the distributions. The choice of the model has its trade-off: high-capacity models provide good estimates of the divergence but generally require large sample sizes to be properly trained. In contrast, low-capacity models tend to require fewer samples for training; however, they might provide biased estimates. The computational cost of Adversarial Optimization becomes significant when sampling from the generator is expensive; one practical example of such a setting is fine-tuning the parameters of complex computer simulations. In this work, we introduce a novel family of divergences that enables faster optimization convergence, measured by the number of samples drawn from the generator. Varying the capacity of the underlying discriminator model during optimization leads to a significant speed-up. The proposed divergence family uses low-capacity models to compare distant distributions (typically, at early optimization steps), and the capacity grows gradually as the distributions become closer to each other. This allows for a significant acceleration of the initial stages of optimization. The acceleration is demonstrated on two fine-tuning problems involving the Pythia event generator and two of the most popular black-box optimization algorithms: Bayesian Optimization and Variational Optimization. Experiments show that, given the same budget, adaptive divergences yield results up to an order of magnitude closer to the optimum than the Jensen-Shannon divergence. While we consider physics-related simulations, adaptive divergences can be applied to any stochastic simulation.
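
A minimal Python sketch of the core idea (not the authors' implementation): estimate the divergence with a cheap, low-capacity discriminator while the distributions are far apart, and switch to a higher-capacity one only once they become close. The classifier choices, the accuracy-based divergence proxy, and the switching threshold are illustrative assumptions.

```python
# Adaptive-divergence sketch: low-capacity discriminator first,
# high-capacity discriminator only when the distributions are close.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score


def divergence_estimate(model, real, generated):
    """Classifier-based divergence proxy: 2*accuracy - 1
    (0 when indistinguishable, 1 when fully separable)."""
    X = np.vstack([real, generated])
    y = np.concatenate([np.ones(len(real)), np.zeros(len(generated))])
    acc = cross_val_score(model, X, y, cv=3).mean()
    return max(0.0, 2.0 * acc - 1.0)


def adaptive_divergence(real, generated, threshold=0.2):
    # Low-capacity discriminator: cheap, needs few samples, but biased.
    low = LogisticRegression(max_iter=1000)
    d = divergence_estimate(low, real, generated)
    if d > threshold:          # distributions still far apart
        return d
    # Distributions are close: pay for a higher-capacity discriminator.
    high = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
    return divergence_estimate(high, real, generated)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(500, 2))
    generated = rng.normal(1.5, 1.0, size=(500, 2))
    print(adaptive_divergence(real, generated))
```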

2017 ◽  
Author(s):  
Venelin Mitov ◽  
Tanja Stadler

Abstract Phylogenetic comparative models (PCMs) have been used to study macroevolutionary patterns, to characterize adaptive phenotypic landscapes, to quantify rates of evolution, to measure the heritability of traits, and to test various evolutionary hypotheses. A major obstacle to applying these models has been the complexity of evaluating their likelihood function. Recent work has shown that for many PCMs the likelihood can be obtained in time proportional to the size of the tree, based on post-order tree traversal, also known as pruning. Despite this progress, inferring complex multi-trait PCMs on large trees remains a time-intensive task. Here, we study parallelizing the pruning algorithm as a generic technique for speeding up PCM inference.

We implement several parallel traversal algorithms in the form of a generic C++ library for Serial and Parallel LIneage Traversal of Trees (SPLITT). Based on SPLITT, we provide examples of parallel likelihood evaluation for several popular PCMs, ranging from a single-trait Brownian motion model to complex multi-trait Ornstein-Uhlenbeck and mixed Gaussian phylogenetic models.

Using the phylogenetic Ornstein-Uhlenbeck mixed model (POUMM) as a showcase, we run benchmarks on up to 24 CPU cores, reporting up to an order of magnitude parallel speed-up on simulated balanced and unbalanced trees of up to 100,000 tips with up to 16 traits. Since the parallel speed-up depends on multiple factors, the SPLITT library automatically selects the fastest traversal strategy for a given hardware, tree topology, and data set. Combining SPLITT likelihood calculation with adaptive Metropolis sampling on real data, we show that the time for Bayesian POUMM inference on a tree of 10,000 tips can be reduced from several days to minutes.

We conclude that parallel pruning effectively accelerates the likelihood calculation and, thus, the statistical inference of Gaussian phylogenetic models. For time-intensive Bayesian inferences, we recommend combining this technique with adaptive Metropolis sampling. Beyond Gaussian models, parallel tree traversal can be applied to numerous other models, including discrete-trait and birth-death population dynamics models. Currently, SPLITT supports multi-core shared-memory architectures, but it can be extended to distributed-memory architectures as well as graphical processing units.
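
A minimal Python sketch of the parallel-pruning pattern the abstract describes (SPLITT itself is a C++ library): nodes whose children have all been processed form a wave that can be handled concurrently. The tree encoding and the per-node work function below are illustrative assumptions.

```python
# Parallel post-order (pruning) traversal: process independent "waves" of
# nodes concurrently, starting from the tips and moving toward the root.
from concurrent.futures import ThreadPoolExecutor


def parallel_prune(children, node_op, n_workers=4):
    """children: dict node -> list of child nodes (empty list for tips).
    node_op(node, child_results): work done at one node during pruning."""
    results = {}
    remaining = {node: len(kids) for node, kids in children.items()}
    parent = {c: p for p, kids in children.items() for c in kids}
    wave = [node for node, kids in children.items() if not kids]  # the tips
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while wave:
            # Nodes in one wave are independent, so they run in parallel.
            outs = pool.map(
                lambda n: node_op(n, [results[c] for c in children[n]]), wave)
            for node, out in zip(wave, outs):
                results[node] = out
            next_wave = []
            for node in wave:
                p = parent.get(node)
                if p is not None:
                    remaining[p] -= 1
                    if remaining[p] == 0:   # all children done -> ready
                        next_wave.append(p)
            wave = next_wave
    return results


# Toy example: count tips below each node (stand-in for per-node likelihood work).
tree = {"root": ["a", "b"], "a": ["t1", "t2"], "b": ["t3"],
        "t1": [], "t2": [], "t3": []}
print(parallel_prune(tree, lambda n, kids: 1 if not kids else sum(kids)))
```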


2019 ◽  
Vol 2019 (1) ◽  
pp. 26-46 ◽  
Author(s):  
Thee Chanyaswad ◽  
Changchang Liu ◽  
Prateek Mittal

Abstract A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
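
A minimal Python sketch of the RON-Gauss pipeline as described above: a random orthonormal projection followed by a Gaussian generative model whose sufficient statistics are perturbed before sampling. The Laplace noise scales are placeholders, not the paper's calibrated sensitivity bounds.

```python
# RON-Gauss sketch: random orthonormal projection + noisy Gaussian model.
import numpy as np


def ron_gauss_synthesize(X, p=10, epsilon=1.0, n_synth=None, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random orthonormal (RON) projection: QR-decompose a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.normal(size=(d, p)))
    Z = X @ Q                     # projected data, nearly Gaussian (DFM effect)
    # Perturb the Gaussian sufficient statistics (illustrative noise scales).
    mu = Z.mean(axis=0) + rng.laplace(scale=1.0 / (n * epsilon), size=p)
    noise = rng.laplace(scale=1.0 / (n * epsilon), size=(p, p))
    cov = np.cov(Z, rowvar=False) + (noise + noise.T) / 2.0
    # Project back onto the PSD cone so sampling is well-defined.
    w, V = np.linalg.eigh(cov)
    cov = (V * np.clip(w, 1e-6, None)) @ V.T
    # Sample synthetic data from the perturbed Gaussian model.
    n_synth = n_synth or n
    return rng.multivariate_normal(mu, cov, size=n_synth), Q


X = np.random.default_rng(1).normal(size=(1000, 50))
Z_synth, Q = ron_gauss_synthesize(X, p=10, epsilon=1.0)
print(Z_synth.shape)   # (1000, 10): synthetic samples in the projected space
```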


Author(s):  
Minghui Wu ◽  
Canghong Jin ◽  
Wenkang Hu ◽  
Yabo Chen

Understanding mathematical topics is important for both educators and students in order to capture the latent concepts of questions, evaluate study performance, and recommend content in online learning systems. Compared to traditional text classification, mathematical topic classification poses several main challenges: (1) mathematical questions are relatively short; (2) the same mathematical concept has various representations (i.e., calculations and applications); (3) the content of a question is complex, spanning algebra, geometry, and calculus. To overcome these problems, we propose a framework that combines content tokens and mathematical knowledge concepts throughout the whole procedure. We embed entities from mathematical knowledge graphs, integrate entities into tokens in a masked language model, set up semantic-similarity-based tasks for next-sentence prediction, and fuse knowledge vectors and token vectors during the fine-tuning procedure. We also build a Chinese mathematical topic prediction dataset consisting of more than 70,000 mathematical questions with topic labels. Our experiments on real data demonstrate that our knowledge graph-based mathematical topic prediction model outperforms other state-of-the-art methods.
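
A minimal PyTorch sketch of the fusion step described above: a pooled token representation from a masked language model is concatenated with an averaged knowledge-graph concept embedding before topic classification. The dimensions, the pretrained encoder, and fusion by concatenation are illustrative assumptions, not the authors' exact architecture.

```python
# Fusing token vectors with knowledge-graph concept vectors for topic prediction.
import torch
import torch.nn as nn


class TopicClassifier(nn.Module):
    def __init__(self, n_concepts, n_topics, token_dim=768, concept_dim=128):
        super().__init__()
        self.concept_emb = nn.Embedding(n_concepts, concept_dim)
        self.head = nn.Sequential(
            nn.Linear(token_dim + concept_dim, 256), nn.ReLU(),
            nn.Linear(256, n_topics))

    def forward(self, pooled_tokens, concept_ids):
        # pooled_tokens: (batch, token_dim) from a pretrained masked LM encoder
        # concept_ids:   (batch, n_linked) entity ids linked from the KG
        concepts = self.concept_emb(concept_ids).mean(dim=1)
        return self.head(torch.cat([pooled_tokens, concepts], dim=-1))


model = TopicClassifier(n_concepts=5000, n_topics=40)
logits = model(torch.randn(8, 768), torch.randint(0, 5000, (8, 3)))
print(logits.shape)  # torch.Size([8, 40])
```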


2020 ◽  
Vol 34 (05) ◽  
pp. 7839-7846
Author(s):  
Junliang Guo ◽  
Xu Tan ◽  
Linli Xu ◽  
Tao Qin ◽  
Enhong Chen ◽  
...  

Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speed-up but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and that the two share the same model configuration, a natural idea for improving the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method achieves a good improvement (more than 1 BLEU point) over previous NAT baselines in terms of translation accuracy, and greatly speeds up inference (by more than 10 times) over AT baselines.
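
A minimal Python sketch of such a curriculum: with a probability that grows over fine-tuning, the decoder inputs for a sentence are switched from autoregressive teacher forcing to a fully non-autoregressive (all-mask) form. The linear pacing function and token ids are illustrative assumptions, not the paper's exact schedule.

```python
# Curriculum from autoregressive to non-autoregressive decoder inputs.
import random

MASK_ID, BOS_ID = 4, 1   # placeholder vocabulary ids


def curriculum_decoder_inputs(target_ids, step, total_steps):
    """Return decoder inputs for one sentence at a given fine-tuning step."""
    p_nat = min(1.0, step / total_steps)     # fraction of NAT-style training
    if random.random() < p_nat:
        # Non-autoregressive: the decoder sees no previous target tokens.
        return [MASK_ID] * len(target_ids)
    # Autoregressive: teacher forcing with shifted gold tokens.
    return [BOS_ID] + target_ids[:-1]


target = [17, 92, 33, 5, 2]
print(curriculum_decoder_inputs(target, step=100, total_steps=10000))   # likely AT-style
print(curriculum_decoder_inputs(target, step=9900, total_steps=10000))  # likely NAT-style
```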


Geophysics ◽  
2021 ◽  
pp. 1-64
Author(s):  
Claudia Haindl ◽  
Kuangdai Leng ◽  
Tarje Nissen-Meyer

We present an adaptive approach to seismic modeling by which the computational cost of a 3D simulation can be reduced while retaining resolution and accuracy. This Azimuthal Complexity Adaptation (ACA) approach relies upon the inherent smoothness of wavefields around the azimuth of a source-centered cylindrical coordinate system. Azimuthal oversampling is thereby detected and eliminated. The ACA method has recently been introduced as part of AxiSEM3D, an open-source solver for global seismology. We employ a generalization of this solver that can handle local-scale Cartesian models and that features a combination of an absorbing boundary condition and a sponge boundary with automated parameter tuning. The ACA method is benchmarked against an established 3D method using a model featuring bathymetry and a salt body. We obtain a close fit where the models are implemented identically in both solvers and an expectedly poor fit otherwise, with the ACA method running an order of magnitude faster than the classic 3D method. Further, we present maps of maximum azimuthal wavenumbers that are created to facilitate azimuthal complexity adaptation. We show how these maps can be interpreted in terms of the 3D complexity of the wavefield and in terms of seismic resolution. The expected performance limits of the ACA method for complex 3D structures are tested on the SEG/EAGE salt model. In this case, ACA still reduces the overall degrees of freedom by 92% compared to a complexity-blind AxiSEM3D simulation. In comparison with the reference 3D method, we again find a close fit and a speed-up by a factor of 7. We explore how the performance of ACA is affected by model smoothness by subjecting the SEG/EAGE salt model to Gaussian smoothing, which results in a doubling of the speed-up. ACA thus represents a convergent, versatile, and efficient method for a variety of complex settings and scales.
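
A minimal Python sketch of the principle behind azimuthal complexity adaptation: the wavefield sampled around the azimuth is expanded in a Fourier series and truncated at the smallest wavenumber that retains a chosen fraction of the spectral energy, so azimuthally smooth regions need only a few modes. The energy threshold and sampling are illustrative; AxiSEM3D's actual criterion differs in detail.

```python
# Detecting azimuthal oversampling from the Fourier spectrum around the azimuth.
import numpy as np


def max_azimuthal_wavenumber(field_on_ring, energy_fraction=0.999):
    """field_on_ring: wavefield samples on a ring of constant (s, z)."""
    spec = np.abs(np.fft.rfft(field_on_ring))**2
    cumulative = np.cumsum(spec) / spec.sum()
    # Smallest wavenumber capturing the requested fraction of spectral energy.
    return int(np.searchsorted(cumulative, energy_fraction))


phi = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
smooth_field = np.cos(2 * phi) + 0.1 * np.cos(5 * phi)
# Far fewer modes than the 128 available: azimuthal oversampling is eliminated.
print(max_azimuthal_wavenumber(smooth_field))
```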


2020 ◽  
Vol 221 (3) ◽  
pp. 1580-1590 ◽  
Author(s):  
M van Driel ◽  
C Boehm ◽  
L Krischer ◽  
M Afanasiev

SUMMARY An order of magnitude speed-up in finite-element modelling of wave propagation can be achieved by adapting the mesh to the anticipated space-dependent complexity and smoothness of the waves. This can be achieved by designing the mesh to respect not only the local wavelengths but also the propagation direction of the waves depending on the source location, hence by anisotropic adaptive mesh refinement. Discrete gradients with respect to material properties, as needed in full-waveform inversion, can still be computed exactly, but at greatly reduced computational cost. To do this, we explicitly distinguish the discretization of the model space from the discretization of the wavefield and derive the necessary expressions to map the discrete gradient into the model space. While the idea is applicable to any wave propagation problem that retains predictable smoothness in the solution, we illustrate the approach with instructive 2-D examples of forward as well as inverse elastic wave propagation. Furthermore, we apply the method to 3-D global seismic wave simulations and demonstrate how meshes can be constructed that take advantage of high-order mappings from the reference coordinates of the finite elements to physical coordinates. Error levels and speed-ups are estimated based on convergence tests with 1-D and 3-D models.
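
A minimal 1-D Python sketch of separating the model discretization from the wavefield discretization: material parameters live on a coarse model grid and are interpolated onto the wavefield mesh by an operator B, so a gradient computed on the mesh maps back to model space exactly via the chain rule, g_model = B^T g_mesh. The linear interpolation operator is a toy assumption.

```python
# Mapping a mesh-space gradient back to a coarser model space.
import numpy as np


def linear_interp_matrix(x_model, x_mesh):
    """Rows interpolate coarse model nodes onto (finer) mesh points in 1-D."""
    B = np.zeros((len(x_mesh), len(x_model)))
    for i, x in enumerate(x_mesh):
        j = np.clip(np.searchsorted(x_model, x) - 1, 0, len(x_model) - 2)
        t = (x - x_model[j]) / (x_model[j + 1] - x_model[j])
        B[i, j], B[i, j + 1] = 1.0 - t, t
    return B


x_model = np.linspace(0.0, 1.0, 5)       # coarse model grid
x_mesh = np.linspace(0.0, 1.0, 50)       # fine wavefield mesh
B = linear_interp_matrix(x_model, x_mesh)

g_mesh = np.sin(2 * np.pi * x_mesh)      # stand-in for the mesh-space gradient
g_model = B.T @ g_mesh                   # exact chain-rule map to model space
print(g_model.shape)                     # (5,)
```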


2020 ◽  
Vol 86 (2) ◽  
Author(s):  
Christopher G. Albert ◽  
Sergei V. Kasilov ◽  
Winfried Kernbichler

Accelerated statistical computation of collisionless fusion alpha particle losses in stellarator configurations is presented based on direct guiding-centre orbit tracing. The approach relies on the combination of recently developed symplectic integrators in canonicalized magnetic flux coordinates and early classification into regular and chaotic orbit types. Only chaotic orbits have to be traced up to the end, as their behaviour is unpredictable. An implementation of this technique is provided in the code SIMPLE (symplectic integration methods for particle loss estimation, Albert et al., 2020b, doi:10.5281/zenodo.3666820). Reliable results were obtained for an ensemble of 1000 orbits in a quasi-isodynamic, a quasi-helical and a quasi-axisymmetric configuration. Overall, a computational speed-up of approximately one order of magnitude is achieved compared to direct integration via adaptive Runge–Kutta methods. This reduces run times to the range of typical magnetic equilibrium computations and makes direct alpha particle loss computation adequate for use within a stellarator optimization loop.
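
A minimal Python sketch of the control flow described above: every orbit is traced over a short initial interval, orbits classified as regular are stopped early, and only chaotic orbits are traced to the full time. The toy tracer and regularity test are placeholders for SIMPLE's symplectic integrator and its classifier.

```python
# Early classification of orbits to avoid tracing regular orbits to the end.
import random


def trace_orbit(seed, t_end):
    """Placeholder integrator: returns (lost_before_t_end, is_regular)."""
    rng = random.Random(seed)
    return rng.random() < 0.05 * t_end, rng.random() < 0.7


def loss_fraction(n_orbits, t_short=0.1, t_full=1.0):
    lost = 0
    for seed in range(n_orbits):
        lost_early, regular = trace_orbit(seed, t_short)
        if lost_early:
            lost += 1
        elif regular:
            continue               # regular orbit: confined, stop tracing early
        else:
            lost_late, _ = trace_orbit(seed + n_orbits, t_full)
            lost += lost_late      # chaotic orbit: must be traced to the end
    return lost / n_orbits


print(loss_fraction(1000))
```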

