A Knowledge Extraction Pipeline between Supervised and Unsupervised Machine Learning Using Gaussian Mixture Models for Anomaly Detection

2021, Vol 15 (1), pp. 1-17
Author(s): Reda Chefira, Said Rakrak

2020, Vol 498 (4), pp. 5498-5510
Author(s): P W Hatfield, I A Almosallam, M J Jarvis, N Adams, R A A Bowler, ...

ABSTRACT Wide-area imaging surveys are one of the key ways of advancing our understanding of cosmology, galaxy formation physics, and the large-scale structure of the Universe in the coming years. These surveys typically require calculating redshifts for huge numbers (hundreds of millions to billions) of galaxies – almost all of which must be derived from photometry rather than spectroscopy. In this paper, we investigate how using statistical models to understand the populations that make up the colour–magnitude distribution of galaxies can be combined with machine learning photometric redshift codes to improve redshift estimates. In particular, we combine the use of Gaussian mixture models with the high-performing machine-learning photo-z algorithm GPz and show that modelling and accounting for the different colour–magnitude distributions of training and test data separately can give improved redshift estimates, reduce the bias on estimates by up to a half, and speed up the run-time of the algorithm. These methods are illustrated using deep optical and near-infrared data in two separate deep fields, where training and test data of different colour–magnitude distributions are constructed from the galaxies with known spectroscopic redshifts, derived from several heterogeneous surveys.
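
To make the idea of modelling training and test distributions separately more concrete, here is a minimal sketch that fits one Gaussian mixture to each sample and reweights the training objects by the ratio of the two fitted densities. It is an illustration under stated assumptions, not the authors' procedure and not GPz: the data are synthetic, a single magnitude-like feature stands in for the full colour–magnitude space, and the helper names (`fit_gmm_1d`, `gmm_pdf`, `importance_weights`) are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k=2, n_iter=50, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM (hypothetical helper)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    # Deterministic initialisation: means at evenly spaced quantiles of the data.
    means = [xs_sorted[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: update mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances


def gmm_pdf(x, model):
    """Evaluate the fitted mixture density at x."""
    weights, means, variances = model
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def importance_weights(train_xs, train_model, test_model, floor=1e-12):
    """Weight each training point by p_test(x) / p_train(x), normalised to mean 1."""
    raw = [gmm_pdf(x, test_model) / max(gmm_pdf(x, train_model), floor) for x in train_xs]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


random.seed(0)  # synthetic stand-in data, not survey photometry
# Assumed setup: the training sample is dominated by bright objects (mag ~ 20),
# the test sample by faint ones (mag ~ 24).
train_mag = [random.gauss(20.0, 1.0) for _ in range(400)] + [random.gauss(24.0, 1.0) for _ in range(100)]
test_mag = [random.gauss(20.0, 1.0) for _ in range(100)] + [random.gauss(24.0, 1.0) for _ in range(400)]

train_model = fit_gmm_1d(train_mag)
test_model = fit_gmm_1d(test_mag)
weights = importance_weights(train_mag, train_model, test_model)

bright = [w for w, x in zip(weights, train_mag) if x < 22.0]
faint = [w for w, x in zip(weights, train_mag) if x >= 22.0]
print("mean weight, bright training objects:", sum(bright) / len(bright))
print("mean weight, faint training objects:", sum(faint) / len(faint))
```

In this toy setup the faint training objects, which are under-represented relative to the test sample, receive larger weights than the bright ones. Such weights could then be passed to a downstream regressor as per-sample weights; whether and how a given photo-z code accepts them is outside the scope of this sketch.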


Author(s): Thomas Dierckx, Jesse Davis, Wim Schoutens

ABSTRACT The theory of Narrative Economics suggests that narratives present in media influence market participants and drive economic events. In this chapter, we investigate how financial news narratives relate to movements in the CBOE Volatility Index. To this end, we first introduce an uncharted dataset where news articles are described by a set of financial keywords. We then perform topic modeling to extract news themes, comparing the canonical latent Dirichlet allocation to a technique combining doc2vec and Gaussian mixture models. Finally, using the state-of-the-art XGBoost (Extreme Gradient Boosted Trees) machine learning algorithm, we show that the obtained news features outperform a simple baseline when predicting CBOE Volatility Index movements on different time horizons.


2017, Vol 34 (10), pp. 1399-1414
Author(s): Wanxia Deng, Huanxin Zou, Fang Guo, Lin Lei, Shilin Zhou, ...
