Trend Extraction from High Dimensional Stock News based on Markov Chain

Author(s): Ei Thwe Khaing, Myint Myint Thein, Myint Myint Lwin

2019
Author(s): Richard Scalzo, David Kohn, Hugo Olierook, Gregory Houseman, Rohitash Chandra, ...

Abstract. The rigorous quantification of uncertainty in geophysical inversions is a challenging problem. Inversions are often ill-posed and the likelihood surface may be multimodal; properties of any single mode become inadequate uncertainty measures, and sampling methods become inefficient for irregular posteriors or high-dimensional parameter spaces. We explore the influences of different choices made by the practitioner on the efficiency and accuracy of Bayesian geophysical inversion methods that rely on Markov chain Monte Carlo sampling to assess uncertainty, using a multi-sensor inversion of the three-dimensional structure and composition of a region in the Cooper Basin of South Australia as a case study. The inversion is performed using an updated version of the Obsidian distributed inversion software. We find that the posterior for this inversion has complex local covariance structure, hindering the efficiency of adaptive sampling methods that adjust the proposal based on the chain history. Within the context of a parallel-tempered Markov chain Monte Carlo scheme for exploring high-dimensional multi-modal posteriors, a preconditioned Crank-Nicolson proposal outperforms more conventional forms of random walk. Aspects of the problem setup, such as priors on petrophysics or on 3-D geological structure, affect the shape and separation of posterior modes, influencing sampling performance as well as the inversion results. Use of uninformative priors on sensor noise can improve inversion results by enabling optimal weighting among multiple sensors even if noise levels are uncertain. Efficiency could be further increased by using posterior gradient information within proposals, which Obsidian does not currently support, but which could be emulated using posterior surrogates.
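The preconditioned Crank-Nicolson (pCN) proposal mentioned in this abstract can be illustrated with a minimal sketch, assuming a standard normal prior N(0, I) and a toy Gaussian likelihood. The names `log_likelihood`, `pcn_sampler` and the step parameter `beta` are hypothetical; this is not Obsidian's implementation, and it omits the parallel tempering used in the study. The key property shown is that the pCN proposal leaves the Gaussian prior invariant, so the Metropolis-Hastings acceptance ratio depends only on the likelihood.

```python
import math
import random


def log_likelihood(x, y, noise_sd):
    """Hypothetical toy likelihood: y ~ N(x, noise_sd^2 I), up to an additive constant."""
    return -sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / (2.0 * noise_sd ** 2)


def pcn_sampler(y, noise_sd, beta, n_steps, seed=0):
    """Preconditioned Crank-Nicolson MCMC under an assumed N(0, I) prior.

    Proposal: x' = sqrt(1 - beta^2) * x + beta * xi, with xi ~ N(0, I).
    The proposal is reversible with respect to the prior, so the prior
    terms cancel and only the likelihood ratio enters the acceptance test.
    """
    rng = random.Random(seed)
    x = [0.0] * len(y)
    ll = log_likelihood(x, y, noise_sd)
    shrink = math.sqrt(1.0 - beta ** 2)
    samples, accepted = [], 0
    for _ in range(n_steps):
        # Shrink the current state toward the prior mean and add prior-scaled noise.
        prop = [shrink * xi + beta * rng.gauss(0.0, 1.0) for xi in x]
        ll_prop = log_likelihood(prop, y, noise_sd)
        # Accept with probability min(1, L(x') / L(x)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < ll_prop - ll:
            x, ll = prop, ll_prop
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_steps
```

For toy data `y = [1.0, -1.0]` with `noise_sd=1.0`, each posterior component is N(y_i/2, 1/2), so the chain mean after burn-in should sit near `[0.5, -0.5]`. Because the proposal preserves the Gaussian prior, its acceptance rate need not collapse as the dimension grows, whereas a plain random walk must shrink its step size to stay efficient in high dimensions.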


2020, Vol 35 (24), pp. 1950142
Author(s): Allen Caldwell, Philipp Eller, Vasyl Hafych, Rafael Schick, Oliver Schulz, ...

Numerically estimating the integral of functions in high-dimensional spaces is a nontrivial task. An oft-encountered example is the calculation of the marginal likelihood in Bayesian inference, in a context where a sampling algorithm such as Markov chain Monte Carlo provides samples of the function. We present an Adaptive Harmonic Mean Integration (AHMI) algorithm. Given samples drawn according to a probability distribution proportional to the function, the algorithm estimates the integral of the function and the uncertainty of the estimate by applying a harmonic mean estimator to adaptively chosen regions of the parameter space. We describe the algorithm and its mathematical properties, and report the results of applying it to multiple test cases.
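The core identity behind a harmonic mean estimator restricted to a region can be sketched as follows. If samples x_i are drawn from the density f(x)/I and D is a region of volume V, then E[1_D(x)/f(x)] = V/I, so I can be estimated as V / mean(1_D(x_i)/f(x_i)); with f equal to likelihood times prior, I is the marginal likelihood. This minimal sketch assumes a single fixed hyperrectangle chosen by the user, and the helper names are hypothetical; it does not reproduce AHMI's adaptive region selection or its uncertainty estimate.

```python
import math
import random


def harmonic_mean_integral(samples, f, lower, upper):
    """Estimate the integral I of f from samples drawn with density f / I.

    Uses E[1_D(x) / f(x)] = V / I for the box D = [lower, upper] of volume V.
    The box is fixed here; AHMI instead chooses regions adaptively.
    """
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= hi - lo
    total = 0.0
    for x in samples:
        # Only samples inside the region contribute 1 / f(x).
        if all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper)):
            total += 1.0 / f(x)
    mean_inv = total / len(samples)
    if mean_inv == 0.0:
        raise ValueError("no samples fell inside the region")
    return volume / mean_inv


def unnormalized_gaussian(x):
    """Toy integrand exp(-|x|^2 / 2); its integral over R^d is (2*pi)^(d/2)."""
    return math.exp(-0.5 * sum(xi * xi for xi in x))


def standard_normal_samples(n, dim, seed=0):
    """Draw n points whose density is proportional to unnormalized_gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
```

For example, with 50,000 standard bivariate normal samples and the box [-1, 1]^2, `harmonic_mean_integral` should return a value close to 2π ≈ 6.28. Restricting the estimator to a bounded, high-density region keeps 1/f bounded there, which avoids the very large (possibly infinite) variance the unrestricted harmonic mean estimator suffers when samples reach the tails.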


Biometrika, 2020, Vol 107 (4), pp. 1005-1012
Author(s): Deborshee Sen, Matthias Sachs, Jianfeng Lu, David B Dunson

Summary. Classification with high-dimensional data is of widespread interest and often involves dealing with imbalanced data. Bayesian classification approaches are hampered by the fact that current Markov chain Monte Carlo algorithms for posterior computation become inefficient as the number $p$ of predictors or the number $n$ of subjects to classify gets large, because of the increasing computational time per step and worsening mixing rates. One strategy is to employ a gradient-based sampler to improve mixing while using data subsamples to reduce the per-step computational complexity. However, the usual subsampling breaks down when applied to imbalanced data. Instead, we generalize piecewise-deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch subsampling. These maintain the correct stationary distribution with arbitrarily small subsamples and substantially outperform current competitors. We provide theoretical support for the proposed approach and demonstrate its performance gains in simulated data examples and an application to cancer data.
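The subsampling idea can be illustrated through its basic ingredient: an unbiased estimate of a sum of per-observation terms (for example, per-observation gradient contributions in logistic regression) from a single index J drawn with probabilities q, namely terms[J]/q[J], or the average of such estimates over a mini-batch. This sketch shows only that estimator, not a piecewise-deterministic sampler; all function names and toy numbers are hypothetical. Unbiasedness is the ingredient that lets a subsampled sampler keep the correct stationary distribution, while the choice of q controls the variance.

```python
import random


def importance_estimate(terms, probs, index):
    """One-sample estimate of sum(terms): terms[J] / probs[J] with J = index ~ probs."""
    return terms[index] / probs[index]


def minibatch_estimate(terms, probs, batch):
    """Average of one-sample estimates over independently drawn mini-batch indices."""
    return sum(terms[j] / probs[j] for j in batch) / len(batch)


def draw_batch(probs, size, rng):
    """Draw `size` indices independently (with replacement) according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=size)


def proportional_probs(terms):
    """Importance probabilities proportional to |term|; uniform if all terms are zero."""
    total = sum(abs(t) for t in terms)
    if total == 0.0:
        return [1.0 / len(terms)] * len(terms)
    return [abs(t) / total for t in terms]


def exact_mean_and_variance(terms, probs):
    """Exact mean and variance of terms[J] / probs[J] when J is drawn from probs."""
    mean, second = 0.0, 0.0
    for t, p in zip(terms, probs):
        if p == 0.0:
            # Unbiasedness requires p > 0 wherever the term is nonzero.
            if t != 0.0:
                raise ValueError("zero probability on a nonzero term")
            continue
        mean += p * (t / p)
        second += p * (t / p) ** 2
    return mean, second - mean ** 2
```

With 99 small terms of 0.01 and one dominant term of 5.0 (mimicking a rare class in imbalanced data), both uniform and proportional probabilities give the correct mean 5.99, but the uniform scheme has variance of roughly 2465 while probabilities proportional to |terms| make the estimate essentially constant.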

