A Unifying Framework for Analysis of Spatial-Temporal Event Sequence Similarity and Its Applications

2021 ◽  
Vol 10 (9) ◽  
pp. 594
Author(s):  
Fuyu Xu ◽  
Kate Beard

Measures of similarity or difference between data objects are applied frequently in geography, biology, computer science, linguistics, logic, business analytics, and statistics, among other fields. This work focuses on similarity among event sequences extracted from time series observed at spatially deployed monitoring locations, with the aim of enhancing the understanding of process similarity over time and across geospatial locations. We present a framework for a novel matrix-based spatiotemporal event sequence representation that unifies punctual and interval-based representations of events. This unified representation of spatiotemporal event sequences (STES) accommodates different event data types and supports data mining, sequence classification, and clustering. The similarity measure is based on the Jaccard index with temporal order constraints and likewise accommodates different event data types. The approach is demonstrated through simulated data examples, and the performance of the similarity measures is evaluated with a k-nearest neighbor (k-NN) classification test on synthetic datasets. As a case study, we demonstrate the use of these similarity measures in a spatiotemporal analysis of event sequences extracted from the space-time series of a water quality monitoring system.
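A minimal sketch of the general idea in Python: a Jaccard-style similarity over two event sequences that only credits shared event types occurring in a consistent temporal order. The function and toy sequences below are illustrative assumptions, not the authors' matrix-based STES measure.

```python
def ordered_jaccard(seq_a, seq_b):
    """Jaccard similarity over event types, crediting only shared types
    whose first occurrences appear in the same relative order."""
    set_a, set_b = set(seq_a), set(seq_b)
    shared = set_a & set_b
    order_a = sorted(shared, key=seq_a.index)   # shared types in seq_a's order
    order_b = sorted(shared, key=seq_b.index)   # shared types in seq_b's order
    consistent = {t for t, u in zip(order_a, order_b) if t == u}
    union = set_a | set_b
    return len(consistent) / len(union) if union else 1.0

print(ordered_jaccard(list("ABCD"), list("ABDE")))  # 0.6 -- A, B, D agree in order
print(ordered_jaccard(list("AB"), list("BA")))      # 0.0 -- shared types, reversed order
```

Unlike the plain Jaccard index, the order constraint distinguishes sequences that contain the same event types in different temporal arrangements, as the second call shows.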

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ali A. Amer ◽  
Hassan I. Abdalla

Abstract Similarity measures have long been utilized in information retrieval and machine learning for purposes including text retrieval, text clustering, text summarization, plagiarism detection, and several other text-processing applications. The problem with these measures, however, is that until recently no single measure had been shown to be both highly effective and highly efficient. The quest for such a similarity measure thus remains an open challenge. This study consequently introduces a new highly effective and time-efficient similarity measure for text clustering and classification. The study also provides a comprehensive examination of seven of the most widely used similarity measures, focusing on their effectiveness and efficiency. Using the k-nearest neighbor algorithm (KNN) for classification, the k-means algorithm for clustering, and the bag-of-words (BoW) model for feature selection, all similarity measures are carefully examined in detail. The experimental evaluation was carried out on two of the most popular datasets, namely Reuters-21 and Web-KB. The obtained results confirm that the proposed set theory-based similarity measure (STB-SM), as a pre-eminent measure, significantly outperforms all state-of-the-art measures with regard to both effectiveness and efficiency.
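The STB-SM formula itself is not given in the abstract, so the sketch below only illustrates the evaluation pipeline it describes: BoW features feeding a KNN classifier with a pluggable similarity. The set-overlap distance here is a hypothetical stand-in, not the paper's STB-SM.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

def set_overlap_distance(x, y):
    # Treat nonzero BoW counts as term sets; distance = 1 - Jaccard overlap.
    a, b = x > 0, y > 0
    union = np.logical_or(a, b).sum()
    return 1.0 - (np.logical_and(a, b).sum() / union if union else 1.0)

docs = ["the cat sat", "the dog sat", "stocks fell today", "markets fell"]
labels = ["pets", "pets", "finance", "finance"]

vec = CountVectorizer()
X = vec.fit_transform(docs).toarray()           # bag-of-words feature matrix

knn = KNeighborsClassifier(n_neighbors=1, metric=set_overlap_distance)
knn.fit(X, labels)
print(knn.predict(vec.transform(["the dog ran"]).toarray()))  # ['pets']
```

Passing a callable as `metric` forces brute-force neighbor search in scikit-learn, which is how a candidate similarity measure can be swapped in and benchmarked against the standard ones.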


2019 ◽  
Vol 116 ◽  
pp. 35-47 ◽  
Author(s):  
Joel Fredrickson ◽  
Michael Mannino ◽  
Omar Alqahtani ◽  
Farnoush Banaei-Kashani

2019 ◽  
Vol 88 ◽  
pp. 506-517 ◽  
Author(s):  
Izaskun Oregi ◽  
Aritz Pérez ◽  
Javier Del Ser ◽  
Jose A. Lozano

Geophysics ◽  
2019 ◽  
Vol 84 (2) ◽  
pp. O39-O47 ◽  
Author(s):  
Ryan Smith ◽  
Tapan Mukerji ◽  
Tony Lupo

Predicting well production in unconventional oil and gas settings is challenging due to the combined influence of engineering, geologic, and geophysical inputs on well productivity. We have developed a machine-learning workflow that incorporates geophysical and geologic data, as well as engineering completion parameters, into a model for predicting well production. The study area is in southwest Texas in the lower Eagle Ford Group. We use a time-series method known as functional principal component analysis to summarize the well-production time series. Next, we use random forests, a machine-learning regression technique, in combination with the summarized well data to predict the full time series of well production. The inputs to this model are geologic, geophysical, and engineering data. We are then able to predict the well-production time series with 65%–76% accuracy. This method incorporates disparate data types into a robust model that predicts well production in unconventional resources.
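A hedged sketch of this workflow on synthetic data: ordinary PCA on the discretized production curves stands in for functional PCA, a random forest regresses the component scores on the well inputs, and the predicted scores are inverse-transformed back into full production curves. All names, dimensions, and the decline shape are illustrative assumptions, not from the study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_wells, n_months, n_features = 200, 36, 8
X = rng.normal(size=(n_wells, n_features))        # completion/geology/geophysics inputs
t = np.arange(n_months)
decline = np.exp(-0.1 * t)                        # shared decline-curve shape
Y = np.outer(1 + X[:, 0], decline) + 0.05 * rng.normal(size=(n_wells, n_months))

pca = PCA(n_components=3).fit(Y)                  # summarize each curve by a few scores
scores = pca.transform(Y)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, scores)
Y_hat = pca.inverse_transform(rf.predict(X))      # reconstruct full predicted time series
print(Y_hat.shape)                                # (200, 36)
```

Regressing on a handful of component scores rather than all 36 monthly values keeps the random forest's output low-dimensional while still recovering the entire production curve at prediction time.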


Author(s):  
Kun Xie ◽  
Kang Liu ◽  
Haque A K Alvi ◽  
Yuehui Chen ◽  
Shuzhen Wang ◽  
...  

Copy number variation (CNV) is a well-known type of genomic mutation associated with the development of human cancers. Detection of CNVs from the human genome is a crucial step in the pipeline from mutation analysis to cancer diagnosis and treatment. Next-generation sequencing (NGS) data provides an unprecedented opportunity for CNV detection at base-level resolution, and many methods have been developed for CNV detection from NGS data. However, due to the intrinsic complexity of CNV structures and of NGS data itself, accurate detection of CNVs still faces many challenges. In this paper, we present an alternative method, called KNNCNV (K-Nearest Neighbor based CNV detection), for detecting CNVs from NGS data. Compared to current methods, KNNCNV has several distinctive features: (1) it assigns an outlier score to each genome segment based solely on the segment's distances to its k nearest neighbors, which is not only easy to extend to other data types but also improves the power of discovering CNVs, especially local CNVs that are likely to be masked by their surrounding regions; (2) it employs a variational Bayesian Gaussian mixture model (VBGMM) to transform these scores into binary labels without a user-defined threshold. To evaluate the performance of KNNCNV, we conduct experiments on both simulated and real sequencing data and compare against peer methods. The experimental results show that KNNCNV achieves better performance than the alternatives in terms of F1-score.
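A minimal sketch of the two ingredients named above, using scikit-learn: mean k-nearest-neighbor distances as per-segment outlier scores, then a variational Bayesian Gaussian mixture (BayesianGaussianMixture) to split the scores into normal/CNV labels without a hand-tuned threshold. The read-depth values are synthetic, not real NGS data, and the details differ from KNNCNV proper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
depth = rng.normal(1.0, 0.05, size=(300, 1))        # diploid background segments
depth[::50] = rng.normal(1.6, 0.05, size=(6, 1))    # a few duplicated segments

k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(depth) # +1: each point is its own neighbor
dist, _ = nn.kneighbors(depth)
score = dist[:, 1:].mean(axis=1)                    # mean distance to k nearest neighbors

vbgmm = BayesianGaussianMixture(n_components=2, random_state=0)
labels = vbgmm.fit_predict(score.reshape(-1, 1))
cnv_label = labels[np.argmax(score)]                # cluster holding the top score = CNV
print(np.where(labels == cnv_label)[0])             # indices flagged as CNV candidates
```

Taking the mixture component that contains the largest score as the CNV cluster mirrors the threshold-free labeling idea: the split point falls out of the fitted mixture rather than a user-defined cutoff.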


2020 ◽  
Vol 34 (01) ◽  
pp. 173-180
Author(s):  
Zhen Pan ◽  
Zhenya Huang ◽  
Defu Lian ◽  
Enhong Chen

Many events occur in the real world and in social networks. Events are related to past events, and there are patterns in how event sequences evolve. Understanding these patterns can help us better predict the type and arrival time of the next event. In the literature, both feature-based approaches and generative approaches are used to model event sequences. Feature-based approaches extract a variety of features and train a regression or classification model to make a prediction; their performance, however, depends on experience-driven feature extraction. Generative approaches usually assume that the evolution of events follows a stochastic point process (e.g., a Poisson process or one of its more complex variants). However, the true distribution of events is never known, and performance in practice depends on the design of the stochastic process. To address these challenges, in this paper we present a novel probabilistic generative model for event sequences, termed the Variational Event Point Process (VEPP). Our model introduces the variational auto-encoder to event sequence modeling, better exploiting latent information and capturing the distribution over the inter-arrival times and types of event sequences. Experiments on real-world datasets demonstrate the effectiveness of the proposed model.
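A hedged, minimal PyTorch sketch of the core idea, not the authors' VEPP implementation: a variational auto-encoder whose encoder reads a sequence of (inter-arrival time, event type) pairs and whose decoder parameterizes the next event's type distribution and inter-arrival time. The architecture and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EventVAE(nn.Module):
    def __init__(self, n_types=5, hidden=32, latent=8):
        super().__init__()
        self.embed = nn.Embedding(n_types, hidden)
        self.rnn = nn.GRU(hidden + 1, hidden, batch_first=True)  # +1 input dim for inter-arrival time
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.dec_type = nn.Linear(latent, n_types)  # logits over the next event's type
        self.dec_time = nn.Linear(latent, 1)        # log inter-arrival time of the next event

    def forward(self, types, dts):  # types: (B, T) long; dts: (B, T) float
        x = torch.cat([self.embed(types), dts.unsqueeze(-1)], dim=-1)
        _, h = self.rnn(x)                                        # h: (1, B, hidden)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec_type(z), self.dec_time(z), mu, logvar

model = EventVAE()
types = torch.randint(0, 5, (4, 10))   # batch of 4 sequences, 10 events each
dts = torch.rand(4, 10)                # inter-arrival times
logits, log_dt, mu, logvar = model(types, dts)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL term of the ELBO
print(logits.shape, log_dt.shape, float(kl))
```

Training would combine this KL term with a reconstruction loss (cross-entropy on the type logits plus a likelihood on the inter-arrival time), the standard ELBO structure the abstract alludes to.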

