A scalable algorithm for clustering sequential data

Abstract: In this study we attempted to classify the acceleration signal recorded during walking on level ground, upstairs, and downstairs, using wavelet analysis. The acceleration signal close to the body's center of gravity was measured while the subjects walked in a corridor and up and down a stairway. The data for four steps were analyzed, and the Daubechies 3 wavelet transform was applied to the sequential data. The variables to be discriminated were the waveforms related to levels -4 and -5. The sum of the squared values at each step was compared at levels -4 and -5. Downstairs walking could be discriminated from the other types of walking, showing the largest value at level -5. Level walking was distinguished from upstairs walking at level -4. It was thus possible to discriminate the continuous dynamic responses to walking using the wavelet transform.
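The feature described in this abstract (the sum of squared wavelet coefficients at given decomposition levels) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the periodic boundary handling, the synthetic test signal, and the reading of "levels -4 and -5" as the 4th and 5th detail levels are all assumptions. The Daubechies 3 filter is built from its standard closed form so that only the Python standard library is needed.

```python
import math

def db3_filters():
    """Return (low-pass, high-pass) Daubechies 3 analysis filters (closed form)."""
    s10 = math.sqrt(10.0)
    r = math.sqrt(5.0 + 2.0 * s10)
    norm = 16.0 * math.sqrt(2.0)
    lo = [
        (1 + s10 + r) / norm,
        (5 + s10 + 3 * r) / norm,
        (10 - 2 * s10 + 2 * r) / norm,
        (10 - 2 * s10 - 2 * r) / norm,
        (5 + s10 - 3 * r) / norm,
        (1 + s10 - r) / norm,
    ]
    # Quadrature mirror relation: alternate signs on the reversed low-pass filter.
    hi = [(-1) ** k * lo[len(lo) - 1 - k] for k in range(len(lo))]
    return lo, hi

def dwt_step(x, lo, hi):
    """One level of a periodic DWT: filter, then keep every second sample."""
    n = len(x)
    approx, detail = [], []
    for i in range(0, n, 2):
        approx.append(sum(lo[k] * x[(i + k) % n] for k in range(len(lo))))
        detail.append(sum(hi[k] * x[(i + k) % n] for k in range(len(hi))))
    return approx, detail

def detail_energies(signal, levels):
    """Sum of squared detail coefficients per level, plus the final approximation."""
    lo, hi = db3_filters()
    energies = {}
    approx = list(signal)
    for level in range(1, levels + 1):
        approx, detail = dwt_step(approx, lo, hi)
        energies[level] = sum(c * c for c in detail)
    return energies, approx

# Hypothetical stand-in for an acceleration trace: a slow and a fast oscillation.
signal = [
    math.sin(2 * math.pi * 6 * t / 256) + 0.5 * math.sin(2 * math.pi * 40 * t / 256)
    for t in range(256)
]
energies, approx = detail_energies(signal, levels=5)
print({level: round(e, 3) for level, e in energies.items()})
```

Comparing `energies[4]` and `energies[5]` across recordings would correspond to the comparison the abstract describes. A real analysis would more likely use a wavelet library and the study's own segmentation into steps.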

Comparative study of sequential data assimilation methods for the Kuramoto-Sivashinsky equation

AIAA Scitech 2021 Forum, DOI 10.2514/6.2021-1749, 2021

Author(s): Suraj A. Pawar, Omer San

Keyword(s): Data Assimilation, Comparative Study, Sequential Data, Sivashinsky Equation, Sequential Data Assimilation

Machine Learning Based Predictive Action on Categorical Non-Sequential Data

Recent Advances in Computer Science and Communications, DOI 10.2174/2213275912666190417150421, 2020, Vol 13 (5), pp. 1020-1030

Author(s): Pradeep S., Jagadish S. Kallimani

Keyword(s): Machine Learning, Data Analysis, Categorical Data, Numerical Data, Processing Technique, Machine Learning Algorithms, Sequential Data, Industry Standard, Robust Model, Future Work

Background: With the advent of data analysis and machine learning, there is a growing impetus for analyzing and building models on historical data. The data comes in numerous forms and shapes, with an abundance of challenges. The most sought-after form of data for analysis is numerical data; with the plethora of algorithms and tools available, such data is quite manageable. Another form of data is categorical, which is subdivided into ordinal (order wise) and nominal (name wise). This data can be broadly classified as sequential and non-sequential. Sequential data is easier to preprocess using algorithms. Objective: This paper deals with the challenge of applying machine learning algorithms to categorical data of a non-sequential nature. Methods: Applying several data analysis algorithms to such data yields biased results, which makes it impossible to generate a reliable predictive model. In this paper, we address this problem by walking through a handful of techniques that, during our research, helped us deal with large categorical data of a non-sequential nature. In subsequent sections, we discuss the possible implementable solutions and the shortfalls of these techniques. Results: The methods are applied to sample datasets available in the public domain, and the results with respect to classification accuracy are satisfactory. Conclusion: The best pre-processing technique we observed in our research is one-hot encoding, which breaks down the categorical features into binary features and feeds them into an algorithm to predict the outcome. The example that we took is not abstract; it is a real-time production services dataset with many complex variations of categorical features. Our future work includes creating a robust model on such data and deploying it in industry-standard applications.
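The one-hot encoding named in the conclusion can be sketched as below. This is a minimal illustration under assumptions: the function name `one_hot_encode`, the column names, and the toy rows are hypothetical and are not taken from the paper's dataset.

```python
def one_hot_encode(rows, columns):
    """Expand each categorical column into one binary indicator per category."""
    # Sort the observed categories so the output column order is stable.
    categories = {col: sorted({row[col] for row in rows}) for col in columns}
    header = [f"{col}={value}" for col in columns for value in categories[col]]
    encoded = [
        [1 if row[col] == value else 0 for col in columns for value in categories[col]]
        for row in rows
    ]
    return header, encoded

# Hypothetical records standing in for categorical service data.
rows = [
    {"service": "web", "severity": "high"},
    {"service": "db", "severity": "low"},
    {"service": "web", "severity": "low"},
]
header, encoded = one_hot_encode(rows, ["service", "severity"])
print(header)
for vector in encoded:
    print(vector)
```

Each row becomes a binary vector with exactly one 1 per original column, which is the form most classifiers expect. With many distinct categories the vectors grow wide, which is one practical shortfall of the technique.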

Historical Changes in Aging Trajectories of Two Aspects of Self-Perceptions of Aging

Innovation in Aging, DOI 10.1093/geroni/igaa057.2028, 2020, Vol 4 (Supplement_1), pp. 602-602

Author(s): Oliver Huxhold, Svenja Spuling, Susanne Wurm

Keyword(s): Birth Cohort, Large Scale, Sequential Data, Historical Changes, Self Perceptions, Cohort Differences, Ongoing Development, Perceptions Of Aging

Abstract: In recent years many studies have shown that adults with more positive self-perceptions of aging (SPA) increase their likelihood of aging healthily. Other studies have documented historical changes in individual resources and contextual conditions associated with aging. We explored how these historical changes are reflected in birth-cohort differences in aging trajectories of two aspects of SPA: viewing aging as ongoing development or as increasing physical losses. Using large-scale cohort-sequential data assessed across 21 years (N ≈ 19,000), the analyses modeled birth-cohort differences in aging trajectories of SPA from 40 to 85 years of age. The results illustrated differential birth-cohort differences: later-born cohorts may experience more potential for ongoing development with advancing age than earlier-born cohorts. However, later-born cohorts seem to view their own aging as more negative than earlier-born cohorts during their early forties but may associate their aging less with physical losses after the age of fifty.

Conditional Random Fields for Multiview Sequential Data Modeling

IEEE Transactions on Neural Networks and Learning Systems, DOI 10.1109/tnnls.2020.3041591, 2020, pp. 1-12

Author(s): Shiliang Sun, Ziang Dong, Jing Zhao

Keyword(s): Random Fields, Conditional Random Fields, Data Modeling, Sequential Data

Recurrent neural networks with long term temporal dependencies in machine tool wear diagnosis and prognosis

SN Applied Sciences, DOI 10.1007/s42452-021-04427-5, 2021, Vol 3 (4)

Author(s): Jianlei Zhang, Yukun Zeng, Binil Starly

Keyword(s): Neural Network, Tool Wear, Machine Tool, Recurrent Neural Network, Machine Tools, Prediction Performance, Sequential Data, Diagnosis And Prognosis, Proposed Model

Abstract: Data-driven approaches for machine tool wear diagnosis and prognosis have been gaining attention in the past few years. The goal of our study is to advance the adaptability, flexibility, prediction performance, and prediction horizon for online monitoring and prediction. This paper proposes the use of a recent deep learning method based on the gated Recurrent Neural Network architecture, including Long Short-Term Memory (LSTM), which tries to capture longer-term dependencies than the regular Recurrent Neural Network method for modeling sequential data, together with a mechanism to realize online diagnosis and prognosis and remaining useful life (RUL) prediction from indirect measurements collected during the manufacturing process. Existing models are usually tool-specific and can hardly be generalized to other scenarios, such as different tools or operating environments. Unlike current methods, the proposed model requires no prior knowledge about the system and thus can be generalized to different scenarios and machine tools. With inherent memory units, the proposed model can also capture long-term dependencies while learning from sequential data such as those collected by condition monitoring sensors, which means it can accommodate machine tools with varying life and increase prediction performance. To prove the validity of the proposed approach, we conducted multiple experiments on a milling machine cutting tool and applied the model to online diagnosis and RUL prediction. Without loss of generality, we incorporated a system transition function and a system observation function into the neural network and trained it with signal data from a minimally intrusive vibration sensor. The experimental results showed that our LSTM-based model achieved the best overall accuracy among the compared methods, with minimal Mean Square Error (MSE) for both tool wear prediction and RUL prediction.

FastSK: fast sequence analysis with gapped string kernels

Bioinformatics, DOI 10.1093/bioinformatics/btaa817, 2020, Vol 36 (Supplement_2), pp. i857-i865

Author(s): Derrick Blakely, Eamon Collins, Ritambhara Singh, Andrew Norton, Jack Lanchantin, ...

Keyword(s): Sequence Analysis, DNA Sequences, English Language, Computation Time, Entity Recognition, Supplementary Information, Support Vector, Homology Detection, Scalable Algorithm, String Kernels

Abstract. Motivation: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences on modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation time, as they depend exponentially on the sub-sequence feature length, number of mismatch positions, and the task's alphabet size. Results: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels. Our method, named FastSK, uses a simplified kernel formulation that decomposes the kernel calculation into a set of independent counting operations over the possible mismatch positions. This simplified decomposition allows us to devise a fast Monte Carlo approximation that rapidly converges. FastSK can scale to much greater feature lengths, allows us to consider more mismatches, and is performant on a variety of sequence analysis tasks. On multiple DNA transcription factor binding site prediction datasets, FastSK consistently matches or outperforms the state-of-the-art gkmSVM-2.0 algorithms in area under the ROC curve, while achieving average speedups in kernel computation of ∼100× and speedups of ∼800× for large feature lengths. We further show that FastSK outperforms character-level recurrent and convolutional neural networks while achieving low variance. We then extend FastSK to 7 English-language medical named entity recognition datasets and 10 protein remote homology detection datasets. FastSK consistently matches or outperforms these baselines. Availability and implementation: Our algorithm is available as a Python package and as C++ source code at https://github.com/QData/FastSK. Supplementary information: Supplementary data are available at Bioinformatics online.
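The idea of splitting a gapped k-mer kernel into independent counting problems, one per choice of positions, can be illustrated with a naive sketch. This is not FastSK's implementation or its exact kernel formulation: the function names are hypothetical, every position set is enumerated exhaustively (no Monte Carlo sampling), and the kernel here is simply the sum, over every way of keeping k of the g positions, of the number of matching pairs of sub-sequences.

```python
from collections import Counter
from itertools import combinations

def gapped_kmer_counts(seq, g, kept):
    """Count the k-mers obtained by keeping only positions `kept` of each g-mer."""
    return Counter(
        tuple(seq[i + p] for p in kept) for i in range(len(seq) - g + 1)
    )

def gapped_kmer_kernel(x, y, g, k):
    """Naive gapped k-mer kernel between sequences x and y.

    g is the full feature length and k the number of informative positions,
    so g - k positions are treated as gaps (mismatches allowed).
    """
    total = 0
    for kept in combinations(range(g), k):
        # Each choice of kept positions is an independent counting problem.
        counts_x = gapped_kmer_counts(x, g, kept)
        counts_y = gapped_kmer_counts(y, g, kept)
        total += sum(n * counts_y[kmer] for kmer, n in counts_x.items())
    return total

print(gapped_kmer_kernel("ACGT", "ACGT", g=3, k=2))  # → 6
print(gapped_kmer_kernel("ACGT", "ACCT", g=3, k=2))  # → 2
```

Because the position sets are handled independently, a subset of them could be sampled at random and the partial sum rescaled, which is the general shape of a Monte Carlo approximation. The exhaustive loop above grows with the number of combinations and is only practical for small g.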