A scalable algorithm for clustering sequential data

Author(s):  
V. Guralnik ◽  
G. Karypis
1997 ◽  
Vol 36 (04/05) ◽  
pp. 356-359 ◽  
Author(s):  
M. Sekine ◽  
M. Ogawa ◽  
T. Togawa ◽  
Y. Fukui ◽  
T. Tamura

Abstract: In this study we attempted to classify the acceleration signal recorded during level walking and during walking upstairs and downstairs, using wavelet analysis. The acceleration signal close to the body’s center of gravity was measured while the subjects walked in a corridor and up and down a stairway. The data for four steps were analyzed and the Daubechies 3 wavelet transform was applied to the sequential data. The variables to be discriminated were the waveforms related to levels -4 and -5. The sum of the square values at each step was compared at levels -4 and -5. Downstairs walking could be discriminated from other types of walking, showing the largest value for level -5. Walking at horizontal level was compared with upstairs walking for level -4. It was possible to discriminate the continuous dynamic responses to walking by the wavelet transform.


2020 ◽  
Vol 13 (5) ◽  
pp. 1020-1030
Author(s):  
Pradeep S. ◽  
Jagadish S. Kallimani

Background: With the advent of data analysis and machine learning, there is a growing impetus for analyzing and building models on historical data. The data comes in numerous forms and shapes and brings an abundance of challenges. The most sought-after form of data for analysis is numerical data, which is quite manageable given the plethora of available algorithms and tools. Another form of data is categorical, which is subdivided into ordinal (ordered) and nominal (named, with no inherent order). This data can be broadly classified as sequential and non-sequential. Sequential data is easier to preprocess using algorithms. Objective: This paper addresses the challenge of applying machine learning algorithms to categorical data of a non-sequential nature. Methods: Applying several data analysis algorithms to such data yields biased results, which makes it impossible to generate a reliable predictive model. In this paper, we address this problem by walking through a handful of techniques that, during our research, helped us deal with a large categorical dataset of a non-sequential nature. In subsequent sections, we discuss the possible implementable solutions and the shortfalls of these techniques. Results: The methods are applied to sample datasets available in the public domain, and the results with respect to classification accuracy are satisfactory. Conclusion: The best pre-processing technique we observed in our research is one-hot encoding, which breaks the categorical features down into binary features and feeds them into an algorithm to predict the outcome. The example that we took is not abstract; it is a real-time production services dataset with many complex variations of categorical features. Our future work includes creating a robust model on such data and deploying it in industry-standard applications.
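
The one-hot encoding step named in the conclusion above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `one_hot_encode` and the sample colour values are hypothetical, and the column order (sorted category labels) is an assumption made only so that the output is deterministic.

```python
def one_hot_encode(values):
    """Turn a list of categorical values into binary indicator rows.

    Returns (categories, rows): `categories` is the sorted list of distinct
    values, and rows[i][j] is 1 when values[i] == categories[j], else 0.
    """
    categories = sorted(set(values))
    # Map each category to its column position.
    column = {category: j for j, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column[value]] = 1  # exactly one "hot" entry per row
        rows.append(row)
    return categories, rows


# Hypothetical nominal feature with no inherent order.
categories, rows = one_hot_encode(["red", "green", "red"])
print(categories)  # ['green', 'red']
print(rows)        # [[0, 1], [1, 0], [0, 1]]
```

Each nominal value becomes its own binary column, so a downstream model does not read a spurious ordering into the category labels.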


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 602-602
Author(s):  
Oliver Huxhold ◽  
Svenja Spuling ◽  
Susanne Wurm

Abstract: In recent years many studies have shown that adults with more positive self-perceptions of aging (SPA) increase their likelihood of aging healthily. Other studies have documented historical changes in individual resources and contextual conditions associated with aging. We explored how these historical changes are reflected in birth-cohort differences in aging trajectories of two aspects of SPA – viewing aging as ongoing development or as increasing physical losses. Using large-scale cohort-sequential data assessed across 21 years (N ≈ 19,000), the analyses modeled birth-cohort differences in aging trajectories of SPA from 40 to 85 years of age. The results illustrated differential birth-cohort differences: Later-born cohorts may experience more potential for ongoing development with advancing age than earlier-born cohorts. However, later-born cohorts seem to view their own aging as more negative than earlier-born cohorts during their early forties but may associate their aging less with physical losses after the age of fifty.


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Jianlei Zhang ◽  
Yukun Zeng ◽  
Binil Starly

Abstract: Data-driven approaches for machine tool wear diagnosis and prognosis have been gaining attention in the past few years. The goal of our study is to advance the adaptability, flexibility, prediction performance, and prediction horizon for online monitoring and prediction. This paper proposes the use of a recent deep learning method based on the gated Recurrent Neural Network architecture, including Long Short Term Memory (LSTM), which captures long-term dependencies better than a regular Recurrent Neural Network when modeling sequential data. It also proposes a mechanism to realize online diagnosis, prognosis, and remaining useful life (RUL) prediction with indirect measurements collected during the manufacturing process. Existing models are usually tool-specific and can hardly be generalized to other scenarios, such as different tools or operating environments. Unlike current methods, the proposed model requires no prior knowledge about the system and thus can be generalized to different scenarios and machine tools. With its inherent memory units, the proposed model can also capture long-term dependencies while learning from sequential data such as those collected by condition monitoring sensors, which means it can accommodate machine tools with varying life and increase the prediction performance. To validate the proposed approach, we conducted multiple experiments on a milling machine cutting tool and applied the model to online diagnosis and RUL prediction. Without loss of generality, we incorporated a system transition function and a system observation function into the neural net and trained it with signal data from a minimally intrusive vibration sensor. The experimental results showed that our LSTM-based model achieved the best overall accuracy among the methods compared, with minimal Mean Square Error (MSE) for both tool wear prediction and RUL prediction.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i857-i865
Author(s):  
Derrick Blakely ◽  
Eamon Collins ◽  
Ritambhara Singh ◽  
Andrew Norton ◽  
Jack Lanchantin ◽  
...  

Abstract: Motivation: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences on modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation time, as they depend exponentially on the sub-sequence feature length, number of mismatch positions, and the task’s alphabet size. Results: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels. Our method, named FastSK, uses a simplified kernel formulation that decomposes the kernel calculation into a set of independent counting operations over the possible mismatch positions. This simplified decomposition allows us to devise a fast Monte Carlo approximation that rapidly converges. FastSK can scale to much greater feature lengths, allows us to consider more mismatches, and is performant on a variety of sequence analysis tasks. On multiple DNA transcription factor binding site prediction datasets, FastSK consistently matches or outperforms the state-of-the-art gkmSVM-2.0 algorithms in area under the ROC curve, while achieving average speedups in kernel computation of ∼100× and speedups of ∼800× for large feature lengths. We further show that FastSK outperforms character-level recurrent and convolutional neural networks while achieving low variance. We then extend FastSK to 7 English-language medical named entity recognition datasets and 10 protein remote homology detection datasets. FastSK consistently matches or outperforms these baselines. Availability and implementation: Our algorithm is available as a Python package and as C++ source code at https://github.com/QData/FastSK Supplementary information: Supplementary data are available at Bioinformatics online.
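
The decomposition into independent counting operations over mismatch positions, as described in the abstract above, can be sketched in a few lines. This is a toy, exact (non-sampled) illustration of the idea and not the FastSK implementation or its API: the function name `gapped_kmer_kernel` and the parameter names `g` (window length) and `m` (number of ignored positions) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_kernel(x, y, g, m):
    """Toy gapped k-mer kernel value for two sequences x and y.

    For every way of ignoring m of the g positions in a window, count the
    pairs of length-g windows (one from x, one from y) that agree on the
    remaining g - m positions, then sum those counts over all choices.
    """
    def window_counts(seq, keep):
        # Count each window after dropping the ignored positions.
        return Counter(
            tuple(seq[start + i] for i in keep)
            for start in range(len(seq) - g + 1)
        )

    total = 0
    for ignored in combinations(range(g), m):
        keep = [i for i in range(g) if i not in ignored]
        counts_x = window_counts(x, keep)
        counts_y = window_counts(y, keep)
        # Each position set is an independent counting problem.
        total += sum(n * counts_y[key] for key, n in counts_x.items())
    return total


print(gapped_kmer_kernel("ACGT", "ACGT", g=3, m=1))  # 6
print(gapped_kmer_kernel("ACGT", "TTTT", g=3, m=1))  # 0
```

Because each choice of ignored positions contributes an independent term, the outer loop could be replaced by a random sample of position sets, which is the kind of Monte Carlo approximation the abstract mentions; the exact loop is kept here for clarity.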


PAMM ◽  
2007 ◽  
Vol 7 (1) ◽  
pp. 1025201-1025202
Author(s):  
Radek Kučera ◽  
Jaroslav Haslinger ◽  
Zdeněk Dostál
