Outlier Detection in Large-Scale Traffic Data by Naïve Bayes Method and Gaussian Mixture Model Method

2017 ◽  
Vol 2017 (9) ◽  
pp. 73-78 ◽  
Author(s):  
Philip Lam ◽  
Lili Wang ◽  
Henry Y.T. Ngan ◽  
Nelson H.C. Yung ◽  
Anthony G.O. Yeh
Symmetry ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 19
Author(s):  
Hsiuying Wang

The high-dimensional data recognition problem based on the Gaussian mixture model (GMM) has useful applications in many areas, such as audio signal recognition, image analysis, and biological evolution. The expectation-maximization (EM) algorithm is a popular approach to deriving the maximum likelihood estimators of the GMM. An alternative is to adopt a generalized Bayes estimator for parameter estimation. In this study, an estimator based on the generalized Bayes approach is established. A simulation study shows that the proposed approach performs competitively with the conventional method in high-dimensional GMM recognition. We use a musical data example to illustrate this recognition problem: suppose that we have audio data of a piece of music and know that it is from one of four compositions, but do not know which one. The generalized Bayes method shows a higher average recognition rate than the conventional method, which indicates that it is a competitive alternative to the conventional method in this real application.
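As a rough illustration of the conventional baseline mentioned in this abstract (EM-fitted GMMs used for recognition), the sketch below fits one mixture per class and assigns a new sample to the class whose mixture gives it the highest log-likelihood. It is not the generalized Bayes estimator from the paper, and it uses one-dimensional synthetic data rather than high-dimensional audio features. The function names (`fit_gmm_em`, `classify`), the two classes "A" and "B", and all numeric settings are assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, k=2, n_iter=50, seed=0):
    """Fit a 1-D, k-component GMM with plain EM (hypothetical helper).

    Returns (weights, means, variances).
    """
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    weights = [1.0 / k] * k
    means = rng.sample(data, k)      # initial means drawn from the data
    variances = [grand_var] * k      # start every component at the overall variance
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


def log_likelihood(sample, params):
    """Log-likelihood of a list of observations under a fitted mixture."""
    weights, means, variances = params
    total = 0.0
    for x in sample:
        density = sum(w * normal_pdf(x, m, v)
                      for w, m, v in zip(weights, means, variances))
        total += math.log(density + 1e-300)
    return total


def classify(sample, models):
    """Return the class name whose mixture best explains the sample."""
    return max(models, key=lambda name: log_likelihood(sample, models[name]))


# Synthetic training data for two made-up classes.
rng = random.Random(1)
train_a = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(5.0, 1.0) for _ in range(100)]
train_b = [rng.gauss(10.0, 1.0) for _ in range(100)] + [rng.gauss(15.0, 1.0) for _ in range(100)]
models = {"A": fit_gmm_em(train_a), "B": fit_gmm_em(train_b)}

print(classify([0.1, 4.8, 0.3, 5.2], models))
```

In the setting described by the abstract there would be four class models rather than two, and the observations would be feature vectors, which requires multivariate components with covariance matrices.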


2019 ◽  
Author(s):  
Guohua Gao ◽  
Hao Jiang ◽  
Chaohui Chen ◽  
Jeroen C. Vink ◽  
Yaakoub El Khamra ◽  
...  

2021 ◽  
Author(s):  
Milana Gataric ◽  
Jun Sung Park ◽  
Tong Li ◽  
Vasy Vaskivskyi ◽  
Jessica Svedlund ◽  
...  

Realising the full potential of novel image-based spatial transcriptomic (IST) technologies requires robust and accurate algorithms for decoding the hundreds of thousands of fluorescent signals, each derived from a single molecule of mRNA. In this paper, we introduce PoSTcode, a probabilistic method for transcript decoding from cyclic multi-channel images, whose effectiveness is demonstrated on multiple large-scale datasets generated using different versions of the in situ sequencing protocols. PoSTcode is based on a re-parametrised matrix-variate Gaussian mixture model designed to account for correlated noise across fluorescence channels and imaging cycles. PoSTcode is shown to recover up to 50% more confidently decoded molecules while simultaneously decreasing transcript mislabeling when compared to existing decoding techniques. In addition, we demonstrate its increased stability with respect to various types of noise and to the choice of tuning parameters, which makes this new approach reliable and easy to use in practice. Lastly, we show that PoSTcode produces fewer doublet signals than a pixel-based decoding algorithm.


2019 ◽  
Vol 1 (2) ◽  
pp. 145-153
Author(s):  
Jin-jun Tang ◽  
Jin Hu ◽  
Yi-wei Wang ◽  
He-lai Huang ◽  
Yin-hai Wang

Data collected from taxi vehicles through global positioning system (GPS) traces provide abundant temporal-spatial information, as well as information on the activity of drivers. Using taxi vehicles as mobile sensors in road networks to collect traffic information is an important emerging approach in efforts to relieve congestion. In this paper, we present a hybrid model for estimating driving paths using a density-based spatial clustering of applications with noise (DBSCAN) algorithm and a Gaussian mixture model (GMM). The first step in our approach is to extract the locations from pick-up and drop-off records (PDR) in taxi GPS equipment. Second, the locations are classified into different clusters using DBSCAN. Two parameters (density threshold and radius) are optimized using real trace data recorded from 1100 drivers. A GMM is also utilized to estimate a significant number of locations; the parameters of the GMM are optimized using an expectation-maximization (EM) algorithm. Finally, applications are used to test the effectiveness of the proposed model. In these applications, locations distributed in two regions (a residential district and a railway station) are clustered and estimated automatically.
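The clustering step described in this abstract can be sketched roughly as follows. The abstract does not say how the DBSCAN and GMM stages are combined, so this example makes an assumption: it runs a textbook DBSCAN with a radius `eps` and a density threshold `min_pts`, then summarises each cluster with a weight, mean, and per-axis variance, which could serve as starting values for a GMM fitted by EM. The coordinates are made up, and the helper names are hypothetical.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


def gaussian_components(points, labels):
    """Summarise each cluster as a weight, a mean, and per-axis variances."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(p)
    n = sum(len(pts) for pts in clusters.values())
    components = []
    for lab, pts in sorted(clusters.items()):
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        vx = sum((p[0] - mx) ** 2 for p in pts) / len(pts)
        vy = sum((p[1] - my) ** 2 for p in pts) / len(pts)
        components.append({"weight": len(pts) / n, "mean": (mx, my), "var": (vx, vy)})
    return components


# Made-up locations: two dense 3x3 groups and one isolated point.
group_1 = [(gx * 0.1, gy * 0.1) for gx in range(3) for gy in range(3)]
group_2 = [(5.0 + gx * 0.1, 5.0 + gy * 0.1) for gx in range(3) for gy in range(3)]
points = group_1 + group_2 + [(20.0, 20.0)]

labels = dbscan(points, eps=0.15, min_pts=3)
components = gaussian_components(points, labels)
print(labels)
print(components)
```

Real pick-up and drop-off coordinates would be latitude and longitude pairs, so a geodesic distance would be a better choice than the Euclidean `math.hypot` used here.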

