Season- and Trend-aware Symbolic Approximation for Accurate and Efficient Time Series Matching

Datenbank-Spektrum ◽

10.1007/s13222-021-00389-5 ◽

2021 ◽

Author(s):

Lars Kegel ◽

Claudio Hartmann ◽

Maik Thiele ◽

Wolfgang Lehner

Keyword(s):

Time Series ◽

State Of The Art ◽

Dimensional Space ◽

Symbolic Aggregate Approximation ◽

Current State ◽

Optimal Representation ◽

Symbolic Approximation ◽

Low Dimensional ◽

Deterministic Behavior ◽

Support Time

Processing and analyzing time series datasets have become a central issue in many domains, requiring data management systems to support time series as a native data type. A core access primitive of time series is matching, which requires efficient algorithms on top of appropriate representations such as the symbolic aggregate approximation (SAX), which represents the current state of the art. This technique reduces a time series to a low-dimensional space by segmenting it and discretizing each segment into a small symbolic alphabet. Unfortunately, SAX ignores the deterministic behavior of time series, such as cyclical repeating patterns or a trend component affecting all segments, which may lead to sub-optimal representation accuracy. We therefore introduce a novel season-aware and a novel trend-aware symbolic approximation and demonstrate improved representation accuracy without increasing the memory footprint. Most importantly, our techniques also enable more efficient time series matching by providing a match up to three orders of magnitude faster than SAX.
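The segment-then-discretize idea described in this abstract can be illustrated with a short sketch. This is plain SAX only, not the season- or trend-aware variants the paper introduces. The function name `sax`, the four-letter alphabet, and the toy series are assumptions made for illustration, and the series length is assumed to be divisible by the number of segments.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Return a plain SAX word for `series` (illustrative sketch only)."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize so that equiprobable Gaussian breakpoints are meaningful
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise aggregate approximation: mean of each equal-length segment
    # (assumes len(series) is divisible by n_segments)
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Breakpoints split the standard normal into len(alphabet) equiprobable bins
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # A segment's symbol index is the number of breakpoints below its mean
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


word = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
print(word)  # a steadily rising series maps to "abcd"
```

The example also hints at the limitation the abstract mentions: a pure trend consumes the whole alphabet, leaving no symbols to describe anything else about the series.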


Unsupervised Deep Video Hashing with Balanced Rotation

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/429 ◽

2017 ◽

Cited By ~ 10

Author(s):

Gengshen Wu ◽

Li Liu ◽

Yuchen Guo ◽

Guiguang Ding ◽

Jungong Han ◽

...

Keyword(s):

State Of The Art ◽

Dimensional Space ◽

Image Hashing ◽

Neighborhood Structure ◽

Function Learning ◽

Video Hashing ◽

Real World Datasets ◽

Low Dimensional ◽

Balanced Code ◽

Hash Codes

Recently, hashing video contents for fast retrieval has received increasing attention due to the enormous growth of online videos. As the extension of image hashing techniques, traditional video hashing methods mainly focus on seeking the appropriate video features but pay little attention to how the video-specific features can be leveraged to achieve optimal binarization. In this paper, an end-to-end hashing framework, namely Unsupervised Deep Video Hashing (UDVH), is proposed, where feature extraction, balanced code learning and hash function learning are integrated and optimized in a self-taught manner. Particularly, distinguished from previous work, our framework enjoys two novelties: 1) an unsupervised hashing method that integrates the feature clustering and feature binarization, enabling the neighborhood structure to be preserved in the binary space; 2) a smart rotation applied to the video-specific features that are widely spread in the low-dimensional space such that the variance of dimensions can be balanced, thus generating more effective hash codes. Extensive experiments have been performed on two real-world datasets and the results demonstrate its superiority, compared to the state-of-the-art video hashing methods. To bootstrap further developments, the source code will be made publicly available.


SCDRHA: A scRNA-Seq Data Dimensionality Reduction Algorithm Based on Hierarchical Autoencoder

Frontiers in Genetics ◽

10.3389/fgene.2021.733906 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jianping Zhao ◽

Na Wang ◽

Haiyun Wang ◽

Chunhou Zheng ◽

Yansen Su

Keyword(s):

Dimensionality Reduction ◽

Data Visualization ◽

State Of The Art ◽

Dimensional Space ◽

High Dimensional ◽

Reduction Algorithm ◽

Cell Clustering ◽

Data Dimensionality Reduction ◽

Single Cell Rna Sequencing ◽

Low Dimensional

Dimensionality reduction of high-dimensional data is crucial for single-cell RNA sequencing (scRNA-seq) visualization and clustering. One prominent challenge in scRNA-seq studies comes from the dropout events, which lead to zero-inflated data. To address this issue, in this paper, we propose a scRNA-seq data dimensionality reduction algorithm based on a hierarchical autoencoder, termed SCDRHA. The proposed SCDRHA consists of two core modules, where the first module is a deep count autoencoder (DCA) that is used to denoise data, and the second module is a graph autoencoder that projects the data into a low-dimensional space. Experimental results demonstrate that SCDRHA has better performance than existing state-of-the-art algorithms on dimension reduction and noise reduction in five real scRNA-seq datasets. Besides, SCDRHA can also dramatically improve the performance of data visualization and cell clustering.


Crop Rotation Modeling for Deep Learning-Based Parcel Classification from Satellite Time Series

Remote Sensing ◽

10.3390/rs13224599 ◽

2021 ◽

Vol 13 (22) ◽

pp. 4599

Author(s):

Félix Quinton ◽

Loic Landrieu

Keyword(s):

Time Series ◽

Deep Learning ◽

Crop Rotation ◽

Large Scale ◽

State Of The Art ◽

Crop Rotations ◽

Learning Approach ◽

Type Mapping ◽

Current State ◽

Crop Type

While annual crop rotations play a crucial role for agricultural optimization, they have been largely ignored for automated crop type mapping. In this paper, we take advantage of the increasing quantity of annotated satellite data to propose to model simultaneously the inter- and intra-annual agricultural dynamics of yearly parcel classification with a deep learning approach. Along with simple training adjustments, our model provides an improvement of over 6.3% mIoU over the current state-of-the-art of crop classification, and a reduction of over 21% of the error rate. Furthermore, we release the first large-scale multi-year agricultural dataset with over 300,000 annotated parcels.
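For readers unfamiliar with the mIoU metric quoted in this abstract, the following sketch shows how a mean intersection-over-union is commonly computed for a labeling task. The function name `mean_iou` and the toy labels are assumptions for illustration; the paper's exact evaluation protocol is not given in this listing.

```python
def mean_iou(y_true, y_pred, n_classes):
    """Average the per-class intersection-over-union across classes."""
    pairs = list(zip(y_true, y_pred))
    ious = []
    for c in range(n_classes):
        # Items labeled c in both the reference and the prediction
        inter = sum(t == c and p == c for t, p in pairs)
        # Items labeled c in the reference or the prediction
        union = sum(t == c or p == c for t, p in pairs)
        if union > 0:  # skip classes absent from both label sets
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy example with three hypothetical crop classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(y_true, y_pred, n_classes=3))  # per-class IoUs are 1/3, 2/3, 1/2
```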


Prediction of Disease-related microRNAs through Integrating Attributes of microRNA Nodes and Multiple Kinds of Connecting Edges

Molecules ◽

10.3390/molecules24173099 ◽

2019 ◽

Vol 24 (17) ◽

pp. 3099 ◽

Cited By ~ 3

Author(s):

Xuan ◽

Li ◽

Zhang ◽

Song

Keyword(s):

State Of The Art ◽

Dimensional Space ◽

Nonnegative Matrix ◽

Superior Performance ◽

Pancreatic Cancers ◽

Node Attribute ◽

Disease Associations ◽

Node Attributes ◽

Novel Method ◽

Low Dimensional

Identifying disease-associated microRNAs (disease miRNAs) contributes to the understanding of disease pathogenesis. Most previous computational biology studies focused on multiple kinds of connecting edges of miRNAs and diseases, including miRNA–miRNA similarities, disease–disease similarities, and miRNA–disease associations. Few methods exploited the node attribute information related to miRNA family and cluster. The previous methods do not completely consider the sparsity of node attributes. Additionally, it is challenging to deeply integrate the node attributes of miRNAs and the similarities and associations related to miRNAs and diseases. In the present study, we propose a novel method, known as MDAPred, based on nonnegative matrix factorization to predict candidate disease miRNAs. MDAPred integrates the node attributes of miRNAs and the related similarities and associations of miRNAs and diseases. Since a miRNA is typically subordinate to a family or a cluster, the node attributes of miRNAs are sparse. Similarly, the data for miRNA and disease similarities are sparse. Projecting the miRNA and disease similarities and miRNA node attributes into a common low-dimensional space contributes to estimating miRNA-disease associations. Simultaneously, the possibility that a miRNA is associated with a disease depends on the miRNA’s neighbour information. Therefore, MDAPred deeply integrates projections of multiple kinds of connecting edges, projections of miRNAs node attributes, and neighbour information of miRNAs. The cross-validation results showed that MDAPred achieved superior performance compared to other state-of-the-art methods for predicting disease-miRNA associations. MDAPred can also retrieve more actual miRNA-disease associations at the top of prediction results, which is very important for biologists. Additionally, case studies of breast, lung, and pancreatic cancers further confirmed the ability of MDAPred to discover potential miRNA–disease associations.


HIVE-COTE 2.0: a new meta ensemble for time series classification

Machine Learning ◽

10.1007/s10994-021-06057-9 ◽

2021 ◽

Author(s):

Matthew Middlehurst ◽

James Large ◽

Michael Flynn ◽

Jason Lines ◽

Aaron Bostrom ◽

...

Keyword(s):

Time Series ◽

State Of The Art ◽

Bag Of Words ◽

Time Series Classification ◽

Current State ◽

Multiple Domains ◽

Over Time

The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is a heterogeneous meta ensemble for time series classification. HIVE-COTE forms its ensemble from classifiers of multiple domains, including phase-independent shapelets, bag-of-words based dictionaries and phase-dependent intervals. Since it was first proposed in 2016, the algorithm has remained state of the art for accuracy on the UCR time series classification archive. Over time it has been incrementally updated, culminating in its current state, HIVE-COTE 1.0. During this time a number of algorithms have been proposed which match the accuracy of HIVE-COTE. We propose comprehensive changes to the HIVE-COTE algorithm which significantly improve its accuracy and usability, presenting this upgrade as HIVE-COTE 2.0. We introduce two novel classifiers, the Temporal Dictionary Ensemble and Diverse Representation Canonical Interval Forest, which replace existing ensemble members. Additionally, we introduce the Arsenal, an ensemble of ROCKET classifiers as a new HIVE-COTE 2.0 constituent. We demonstrate that HIVE-COTE 2.0 is significantly more accurate on average than the current state of the art on 112 univariate UCR archive datasets and 26 multivariate UEA archive datasets.


Metric Multidimensional Scaling for Large Single-Cell Data Sets using Neural Networks

10.1101/2021.06.24.449725 ◽

2021 ◽

Author(s):

Stefan Canzar ◽

Van Hoan Do ◽

Slobodan Jelic ◽

Soeren Laue ◽

Domagoj Matijevic ◽

...

Keyword(s):

Multidimensional Scaling ◽

Single Cell ◽

State Of The Art ◽

Dimensional Space ◽

Linear Mapping ◽

Alternative Methods ◽

Dimensional Euclidean Space ◽

Data Sets ◽

Metric Multidimensional Scaling ◽

Low Dimensional

Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a neural network based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
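The objective that metric multidimensional scaling tries to keep small, namely the mismatch between pairwise distances before and after embedding, can be written down in a few lines. The sketch below only evaluates a raw stress value for a given embedding; it does not implement the neural network approach proposed in the paper. The names `pairwise` and `raw_stress` and the toy points are assumptions for illustration.

```python
import math
from itertools import combinations


def pairwise(points):
    """Euclidean distance for every unordered pair of point indices."""
    idx = range(len(points))
    return {(i, j): math.dist(points[i], points[j]) for i, j in combinations(idx, 2)}


def raw_stress(high, low):
    """Sum of squared differences between original and embedded distances."""
    d_high, d_low = pairwise(high), pairwise(low)
    return sum((d_high[k] - d_low[k]) ** 2 for k in d_high)


# Three collinear 3-D points can be embedded on a line without distortion
high = [(0, 0, 0), (3, 4, 0), (6, 8, 0)]
good = [(0,), (5,), (10,)]
bad = [(0,), (1,), (2,)]
print(raw_stress(high, good))  # distances preserved, so the stress is (near) zero
print(raw_stress(high, bad))   # compressed distances give a large stress
```

Evaluating this objective touches every pair of points, so its cost grows quadratically with the number of points, which is consistent with the scaling problem the abstract describes.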


Convolutional Gaussian Embeddings for Personalized Recommendation with Uncertainty

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/367 ◽

2019 ◽

Cited By ~ 2

Author(s):

Junyang Jiang ◽

Deqing Yang ◽

Yanghua Xiao ◽

Chenlu Shen

Keyword(s):

State Of The Art ◽

Dimensional Space ◽

Monte Carlo Sampling ◽

Superior Performance ◽

Personalized Recommendation ◽

Latent Features ◽

Benchmark Datasets ◽

Low Dimensional ◽

Uncertain Preferences ◽

Candidate Item

Most existing embedding-based recommendation models use embeddings (vectors) to represent users and items, which contain latent features of users and items. Each such embedding corresponds to a single fixed point in low-dimensional space and thus fails to precisely represent the users/items with uncertainty that are often observed in recommender systems. Addressing this problem, we propose a unified deep recommendation framework employing Gaussian embeddings, which are proven adaptive to uncertain preferences exhibited by some users, resulting in better user representations and recommendation performance. Furthermore, our framework adopts Monte-Carlo sampling and convolutional neural networks to compute the correlation between the objective user and the candidate item, based on which precise recommendations are achieved. Our extensive experiments on two benchmark datasets not only justify that our proposed Gaussian embeddings capture the uncertainty of users very well, but also demonstrate their superior performance over the state-of-the-art recommendation models.


Remote Heart Rate Measurement through Broadband Video via Stochastic Bayesian Estimation

Vision Letters ◽

10.15353/vsnl.v1i1.43 ◽

2015 ◽

Vol 1 (1) ◽

Cited By ~ 3

Author(s):

Brendan Chwyl ◽

Audrey G. Chung ◽

Jason Deglint ◽

Alexander Wong ◽

David Clausi

Keyword(s):

Heart Rate ◽

Time Series ◽

Posterior Probability ◽

State Of The Art ◽

Frequency Domain Analysis ◽

Rate Measurement ◽

Current State ◽

Skin Erythema ◽

Novel Method ◽

Improved Accuracy

A novel method for remote heart rate sensing via standard broadband video is proposed. Points are stochastically sampled from the cheek region and tracked throughout the video, producing a set of skin erythema time series. From these observations, a photoplethysmogram (PPG) is estimated via Bayesian minimization, with the required posterior probability estimated through an importance-weighted Monte Carlo approach. From the estimated PPG, an estimated heart rate is produced through frequency domain analysis. Results indicate improved accuracy over current state-of-the-art methods.
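The final step mentioned in this abstract, reading a heart rate off a PPG signal by frequency domain analysis, can be sketched as picking the strongest spectral peak within a plausible heart-rate band. The sketch below uses a naive discrete Fourier transform on a synthetic sine wave. It does not cover the Bayesian PPG estimation, and the function name `dominant_bpm`, the 0.7 to 4.0 Hz band, and the 30 Hz sampling rate are all assumptions for illustration.

```python
import cmath
import math


def dominant_bpm(signal, fs, lo_hz=0.7, hi_hz=4.0):
    """Return the strongest frequency within [lo_hz, hi_hz] in beats per minute."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    best_k, best_power = None, -1.0
    for k in range(1, n // 2):
        freq = k * fs / n  # frequency of DFT bin k in Hz
        if not lo_hz <= freq <= hi_hz:
            continue
        # Naive DFT coefficient for bin k (quadratic cost, fine for a short sketch)
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        raise ValueError("signal too short to resolve the requested band")
    return 60.0 * best_k * fs / n


# Synthetic stand-in for a PPG: a 1.2 Hz sine sampled at 30 Hz for 10 seconds
fs = 30
ppg = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(300)]
print(dominant_bpm(ppg, fs))  # 1.2 Hz corresponds to 72 beats per minute
```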


TIME SERIES FOR FAULT DETECTION IN AN INDUSTRIAL PILOT PLANT

International Journal of Modern Physics B ◽

10.1142/s0217979212460034 ◽

2012 ◽

Vol 26 (25) ◽

pp. 1246003

Author(s):

ANTONIO MORÁN ◽

JUAN J. FUERTES ◽

SERAFÍN ALONSO ◽

CARLOS DEL CANTO ◽

MANUEL DOMÍNGUEZ

Keyword(s):

Time Series ◽

Dimensional Space ◽

Industrial Processes ◽

Self Organizing Map ◽

Data Mining Technique ◽

Mining Technique ◽

Reduction Techniques ◽

Dimensionality Reduction Techniques ◽

Low Dimensional ◽

Analysis Of Time Series

Forecasting the evolution of industrial processes can be useful to discover faults. Several techniques based on analysis of time series are used to forecast the evolution of certain critical variables; however, the number of variables makes the analysis difficult. The use of dimensionality reduction techniques such as the SOM (Self-Organizing Map) allows us to work with less data to determine the evolution of the process. SOM is a data mining technique widely used for supervision and monitoring. Since the SOM projects data from a high-dimensional space into a 2-D space, it reduces the number of variables. Thus, time series with the variables of the low-dimensional projection can be created to make the prediction of future values easier in order to detect faults.
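The projection step described in this abstract can be illustrated by mapping each multivariate sample to the 2-D grid coordinates of its best-matching unit, which turns a high-dimensional time series into a series of grid positions. The sketch below skips SOM training entirely and uses a hypothetical, hand-set 2x2 codebook. The names `project` and `project_series` are assumptions for illustration.

```python
import math

# Hypothetical codebook standing in for a trained SOM:
# each 2-D grid cell holds a weight vector in the 3-D input space
codebook = {
    (0, 0): (0.0, 0.0, 0.0),
    (0, 1): (0.0, 1.0, 1.0),
    (1, 0): (1.0, 0.0, 1.0),
    (1, 1): (1.0, 1.0, 0.0),
}


def project(sample, codebook):
    """Return the grid cell whose weight vector is closest to `sample`."""
    return min(codebook, key=lambda cell: math.dist(sample, codebook[cell]))


def project_series(samples, codebook):
    """Map a multivariate time series to a series of 2-D grid coordinates."""
    return [project(s, codebook) for s in samples]


# Three consecutive 3-variable measurements (toy values)
series = [(0.1, 0.0, 0.1), (0.9, 0.1, 0.8), (0.9, 0.9, 0.1)]
print(project_series(series, codebook))  # trajectory across the 2-D map
```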


Fast and precise single-cell data analysis using a hierarchical autoencoder

Nature Communications ◽

10.1038/s41467-021-21312-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Duc Tran ◽

Hung Nguyen ◽

Bang Tran ◽

Carlo La Vecchia ◽

Hung N. Luu ◽

...

Keyword(s):

Single Cell ◽

State Of The Art ◽

Dimensional Space ◽

Cell Decomposition ◽

Excess Noise ◽

Analysis Framework ◽

Extensive Analysis ◽

Low Dimensional ◽

Cell Segregation ◽

Cell Data

A primary challenge in single-cell RNA sequencing (scRNA-seq) studies comes from the massive amount of data and the excess noise level. To address this challenge, we introduce an analysis framework, named single-cell Decomposition using Hierarchical Autoencoder (scDHA), that reliably extracts representative information of each cell. The scDHA pipeline consists of two core modules. The first module is a non-negative kernel autoencoder able to remove genes or components that have insignificant contributions to the part-based representation of the data. The second module is a stacked Bayesian autoencoder that projects the data onto a low-dimensional space (compressed). To diminish the tendency to overfit of neural networks, we repeatedly perturb the compressed space to learn a more generalized representation of the data. In an extensive analysis, we demonstrate that scDHA outperforms state-of-the-art techniques in many research sub-fields of scRNA-seq analysis, including cell segregation through unsupervised learning, visualization of transcriptome landscape, cell classification, and pseudo-time inference.
