Information Processing on Compressed Data

2021, pp. 89-104
Author(s): Yoshimasa Takabatake, Tomohiro I, Hiroshi Sakamoto

Abstract. We survey our recent work related to information processing on compressed strings. Note that a “string” here means any fixed-length sequence of symbols and therefore covers not only ordinary text but also a wide range of data, such as pixel sequences and time-series data. Over the past two decades, a variety of algorithms and applications have been proposed for compressed information processing. In this survey, we mainly focus on two problems: recompression and privacy-preserving computation over compressed strings. Recompression is a framework in which algorithms transform given compressed data into another compressed format without decompressing it. Recent studies have shown that a higher compression ratio can be achieved at lower cost by applying an appropriate recompression algorithm as a preprocessing step. Furthermore, various privacy-preserving computation models have been proposed for information retrieval, similarity computation, and pattern mining.
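
The surveyed recompression work operates on grammar-compressed strings, and its algorithms are not reproduced here. As a much simpler illustration of the general idea of converting one compressed format into another without decompression, the sketch below assumes a run-length encoding as input and a straight-line program (SLP, a grammar in which every rule derives exactly one string) as output. The functions `rle_to_slp`, `slp_length`, and `slp_expand` are hypothetical names chosen for this example.

```python
from typing import Dict, List, Optional, Tuple, Union

# A rule is either a terminal character or a pair of earlier rule ids.
Rule = Union[str, Tuple[int, int]]


def rle_to_slp(runs: List[Tuple[str, int]]) -> Tuple[Dict[int, Rule], int]:
    """Turn a run-length encoding [(char, count), ...] into an SLP.

    No run is ever expanded: a run c^k is assembled by repeated doubling,
    so it needs O(log k) rules rather than k symbols.
    """
    rules: Dict[int, Rule] = {}
    terminal_ids: Dict[str, int] = {}

    def new_rule(rule: Rule) -> int:
        rules[len(rules)] = rule
        return len(rules) - 1

    def concat(left: Optional[int], right: int) -> int:
        return right if left is None else new_rule((left, right))

    def power(base: int, k: int) -> int:
        # Binary exponentiation over nonterminals: builds base^k.
        result: Optional[int] = None
        square = base
        while k:
            if k & 1:
                result = concat(result, square)
            k >>= 1
            if k:
                square = new_rule((square, square))
        assert result is not None
        return result

    root: Optional[int] = None
    for ch, count in runs:
        if count <= 0:
            raise ValueError("run lengths must be positive")
        if ch not in terminal_ids:
            terminal_ids[ch] = new_rule(ch)
        root = concat(root, power(terminal_ids[ch], count))
    if root is None:
        raise ValueError("empty input")
    return rules, root


def slp_length(rules: Dict[int, Rule], root: int) -> int:
    """Length of the encoded string, computed on the grammar only."""
    lengths: Dict[int, int] = {}
    for rid in sorted(rules):  # every rule refers only to smaller ids
        rule = rules[rid]
        lengths[rid] = 1 if isinstance(rule, str) else lengths[rule[0]] + lengths[rule[1]]
    return lengths[root]


def slp_expand(rules: Dict[int, Rule], rid: int) -> str:
    """Full decompression; used here only to check small examples."""
    rule = rules[rid]
    if isinstance(rule, str):
        return rule
    return slp_expand(rules, rule[0]) + slp_expand(rules, rule[1])


rules, root = rle_to_slp([("a", 5), ("b", 1), ("a", 3)])
print(slp_expand(rules, root))  # aaaaabaaa
```

For a single run of one million `a`s, this conversion produces fewer than 30 rules, and `slp_length` still reports 1,000,000 without ever building the string.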

Author(s): Chang Xu, Run Yin, Liehuang Zhu, Chuan Zhang, Can Zhang, ...

Author(s): Anne Denton

Time series data are of interest to most science and engineering disciplines, and techniques for analyzing them have been developed over hundreds of years. In recent years, however, new data mining techniques, such as frequent pattern mining, have emerged that take a different perspective on data. Traditional techniques were not designed for such pattern-oriented approaches. As a result, there is a significant need for research that extends traditional time-series analysis, in particular clustering, to meet the requirements of the new data mining algorithms.


2020, Vol 109 (11), pp. 2029-2061
Author(s): Zahraa S. Abdallah, Mohamed Medhat Gaber

Abstract. Time series classification (TSC) is a challenging task that has attracted many researchers in the last few years. One main challenge in TSC is the diversity of domains from which time series data come. Thus, there is no “one model that fits all” in TSC. Some algorithms are very accurate in classifying a specific type of time series when the whole series is considered, while some only target the presence or absence of specific patterns/shapelets. Yet other techniques focus on the frequency of occurrence of discriminating patterns/features. This paper presents a new classification technique that addresses the inherent diversity problem in TSC using a nature-inspired method. The technique is inspired by how flies see the world through “compound eyes” made up of thousands of lenses, called ommatidia. Each ommatidium is an eye with its own lens, and together thousands of them create a broad field of vision. The developed technique similarly uses different lenses and representations to look at the time series, and then combines them for broader visibility. These lenses are created through hyper-parameterisation of symbolic representations (Piecewise Aggregate and Fourier approximations). The algorithm builds a random forest for each lens, then performs soft dynamic voting to classify new instances using the most confident eyes, i.e., forests. We evaluate the new technique, coined Co-eye, using the recently released extended version of the UCR archive, containing more than 100 datasets across a wide range of domains. The results show the benefits of bringing together different perspectives, as reflected in the accuracy and robustness of Co-eye in comparison to other state-of-the-art techniques.
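
The "lens" idea can be sketched under several assumptions. Each lens below is a hypothetical `(segments, alphabet)` pair for a SAX word, which is built on the Piecewise Aggregate Approximation; Fourier-based lenses are omitted. The per-lens random forests are replaced by ready-made class-probability vectors. "Most confident" is read here as "highest maximum class probability", and `top_k` is an assumed parameter. This is not the authors' Co-eye implementation.

```python
from statistics import NormalDist, mean, pstdev
from typing import Dict, Hashable, List, Sequence


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalise a series (a flat series maps to all zeros)."""
    mu, sd = mean(x), pstdev(x)
    return [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]


def paa(x: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: mean of each of `segments` frames."""
    n = len(x)
    if not 1 <= segments <= n:
        raise ValueError("need 1 <= segments <= len(x)")
    return [mean(x[i * n // segments:(i + 1) * n // segments]) for i in range(segments)]


def sax_word(x: Sequence[float], segments: int, alphabet: int) -> str:
    """SAX word: PAA of the z-normalised series, cut at Gaussian quantiles."""
    cuts = [NormalDist().inv_cdf(k / alphabet) for k in range(1, alphabet)]
    return "".join(chr(ord("a") + sum(v > c for c in cuts))
                   for v in paa(znorm(x), segments))


def soft_dynamic_vote(lens_probs: Dict[Hashable, Dict[str, float]], top_k: int) -> str:
    """Average the class probabilities of the `top_k` most confident lenses."""
    ranked = sorted(lens_probs.values(), key=lambda p: max(p.values()), reverse=True)[:top_k]
    classes = sorted({c for p in ranked for c in p})
    scores = {c: sum(p.get(c, 0.0) for p in ranked) / len(ranked) for c in classes}
    return max(scores, key=scores.get)


series = [1, 2, 3, 4, 5, 6, 7, 8]
print({lens: sax_word(series, *lens) for lens in [(4, 4), (2, 3)]})
# {(4, 4): 'abcd', (2, 3): 'ac'}

votes = {  # hypothetical per-lens probabilities for one test instance
    (4, 4): {"up": 0.9, "down": 0.1},
    (8, 3): {"up": 0.15, "down": 0.85},
    (2, 5): {"up": 0.3, "down": 0.7},
    (3, 3): {"up": 0.2, "down": 0.8},
}
print(soft_dynamic_vote(votes, top_k=2))  # up
print(soft_dynamic_vote(votes, top_k=4))  # down
```

The two votes show why restricting the vote to confident lenses can change the outcome: with all four lenses the many lukewarm "down" opinions win, while the two most confident lenses favour "up".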


2007, Vol 23 (4), pp. 227-237
Author(s): Thomas Kubiak, Cornelia Jonas

Abstract. Patterns of psychological variables over time have interested researchers from the outset. This is particularly true for ambulatory monitoring research, where large (cross-sectional) time-series datasets are often the subject of investigation. Common methods for identifying cyclic variations include spectral analyses of time-series data or time-domain based strategies, which also allow cyclic components to be modeled. Although the prerequisites of these sophisticated procedures, such as interval-scaled time-series variables, are seldom met, the procedures are widely used. In contrast to the time-series approach, methods from a different field of statistics, directional or circular statistics, offer another way to detect patterns in time, with fewer prerequisites to be met. These approaches are commonly used in biology or geostatistics. They offer a wide range of analytical strategies for examining “circular data,” i.e., data measured on a rotationally invariant (periodic) scale (e.g., directions on the compass or daily hours ranging from 0 to 24, 24 being the same as 0). In psychology, however, circular statistics are hardly known at all. In the present paper, we give a succinct introduction to the rationale of circular statistics and describe how this approach can be used to detect patterns in time, contrasting it with time-series analysis. To illustrate the use of circular statistics, we report data from a monitoring study in which mood and social interactions were assessed over 4 weeks. Both periodogram-based and circular-statistics-based results are reported. We highlight advantages and possible pitfalls of the circular statistics approach and conclude that ambulatory assessment research can benefit from strategies borrowed from circular statistics.
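
Why hours of the day need circular treatment can be seen in a small example: the arithmetic mean of 23:00 and 01:00 is noon, whereas the circular mean is midnight. The sketch below maps clock times onto angles, computes the circular mean and the mean resultant length R (0 for no preferred time, 1 for all events at the same time), and applies the Rayleigh test of uniformity using a commonly used closed-form approximation of its p-value. The function name `circular_summary` is hypothetical, and this is not the analysis pipeline of the study described above.

```python
import math
from typing import Sequence, Tuple


def circular_summary(hours: Sequence[float], period: float = 24.0) -> Tuple[float, float, float]:
    """Return (circular mean in hours, mean resultant length R, Rayleigh p-value)."""
    n = len(hours)
    if n == 0:
        raise ValueError("need at least one observation")
    angles = [2 * math.pi * h / period for h in hours]  # map the clock onto the circle
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    r_bar = math.hypot(c, s)  # length of the average unit vector
    # Direction of the average vector, converted back to hours (arbitrary if r_bar == 0).
    mean_hour = ((math.atan2(s, c) % (2 * math.pi)) * period / (2 * math.pi)) % period
    # Rayleigh test: small p suggests a preferred time of day rather than uniformity.
    r = n * r_bar
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - r * r)) - (1 + 2 * n))
    return mean_hour, r_bar, min(p, 1.0)


print(circular_summary([23, 1]))   # mean near 0 h (midnight), not 12 h
print(circular_summary([6, 18]))   # R near 0: opposite times cancel out
```

Note that two observations at 06:00 and 18:00 have no meaningful mean direction at all, which is exactly the situation R is designed to flag.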


Author(s): Trung Duy Pham, Dat Tran, Wanli Ma

In the biomedical and healthcare fields, protecting ownership of outsourced data is becoming a challenging issue when data are shared between data owners and data mining experts who extract hidden knowledge and patterns. Watermarking has proven to be a rights-protection mechanism that provides detectable evidence of legal ownership of a shared dataset without compromising its usability for a wide range of data mining tasks, and it has been applied to digital data in many formats, such as audio, video, images, relational databases, text, and software. Biomedical time-series data such as electroencephalography (EEG) or electrocardiography (ECG) recordings are valuable and costly in healthcare, and they need ownership protection when they are shared or transmitted for data mining applications. However, because of the particular characteristics and requirements of this kind of data, the issue has received little attention in previous research. This paper proposes an optimized watermarking scheme that protects ownership in biomedical and healthcare data mining systems. To achieve the highest possible robustness without losing watermark transparency, the Particle Swarm Optimization (PSO) technique is used to find suitable quantization steps. Experimental results on EEG data show that the proposed scheme provides good imperceptibility and is more robust against various signal processing techniques and common attacks, such as noise addition, low-pass filtering, and re-sampling.
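
The abstract does not specify the embedding rule, so the sketch below assumes a generic quantization index modulation (QIM) scheme to show what a "quantization step" controls. Each sample is snapped to one of two interleaved lattices depending on the watermark bit. A larger `step` tolerates more noise (up to `step / 4` per sample) but distorts the signal more (up to `step / 2`), which is the trade-off an optimizer such as PSO would search over. Here `step` is fixed by hand, and all names are hypothetical.

```python
from typing import List, Sequence


def _quantize(x: float, bit: int, step: float) -> float:
    """Nearest point of the lattice step*Z + bit*step/2."""
    offset = bit * step / 2
    return step * round((x - offset) / step) + offset


def qim_embed(signal: Sequence[float], bits: Sequence[int], step: float) -> List[float]:
    """Embed one bit per sample; each marked sample moves by at most step / 2."""
    if len(bits) > len(signal):
        raise ValueError("more bits than samples")
    out = list(signal)
    for i, b in enumerate(bits):
        out[i] = _quantize(out[i], b, step)
    return out


def qim_extract(marked: Sequence[float], n_bits: int, step: float) -> List[int]:
    """Decode each bit as the lattice closest to the received sample."""
    return [min((0, 1), key=lambda b: abs(y - _quantize(y, b, step)))
            for y in marked[:n_bits]]


eeg = [0.31, -1.20, 2.74, 5.08, -0.66, 1.95]  # toy samples, not real EEG
bits = [1, 0, 1, 1, 0, 0]
marked = qim_embed(eeg, bits, step=0.5)
noisy = [v + 0.1 * (-1) ** i for i, v in enumerate(marked)]  # |noise| < step / 4
print(qim_extract(noisy, len(bits), step=0.5))  # [1, 0, 1, 1, 0, 0]
```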


2016, Vol 26 (09n10), pp. 1361-1377
Author(s): Daoyuan Li, Tegawende F. Bissyande, Jacques Klein, Yves Le Traon

Time series mining has become essential for extracting knowledge from the abundant data that flow from many application domains. To overcome storage and processing challenges in time series mining, compression techniques are being used. In this paper, we investigate the loss or gain in performance of time series classification approaches when they are fed with lossy-compressed data. This extended empirical study is essential both for reassuring practitioners and for providing more insight into how compression techniques can even be effective in smoothing and reducing noise in time series data. From a knowledge engineering perspective, we show that time series may be compressed by 90% using discrete wavelet transforms and still achieve remarkable classification accuracy, and that residual details left by popular wavelet compression techniques can sometimes even yield higher classification accuracy than the raw time series data, as they better capture essential local features.
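
Lossy wavelet compression of a time series can be sketched in a few lines. The wavelet family and thresholding rule used in the study are not given here. The sketch assumes an orthonormal multi-level Haar transform, a series length that is a power of two, and a "keep the largest coefficients" rule, so `keep_ratio=0.1` corresponds to discarding 90% of the coefficients. All function names are hypothetical.

```python
import math
from typing import List, Sequence

SQRT2 = math.sqrt(2.0)


def haar_forward(x: Sequence[float]) -> List[float]:
    """Multi-level orthonormal Haar DWT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in x]
    length = n
    while length > 1:
        half = length // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:length] = approx + detail  # layout: [approx | coarse ... fine details]
        length = half
    return coeffs


def haar_inverse(coeffs: Sequence[float]) -> List[float]:
    """Invert haar_forward level by level, coarsest first."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        rebuilt: List[float] = []
        for a, d in zip(out[:length], out[length:2 * length]):
            rebuilt.extend(((a + d) / SQRT2, (a - d) / SQRT2))
        out[:2 * length] = rebuilt
        length *= 2
    return out


def dwt_compress(x: Sequence[float], keep_ratio: float) -> List[float]:
    """Keep only the largest-magnitude coefficients, then reconstruct."""
    coeffs = haar_forward(x)
    k = max(1, round(len(coeffs) * keep_ratio))
    kept = set(sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k])
    return haar_inverse([c if i in kept else 0.0 for i, c in enumerate(coeffs)])


step_signal = [4, 4, 4, 4, 8, 8, 8, 8]
print(dwt_compress(step_signal, keep_ratio=0.25))  # step recovered up to rounding
```

A piecewise-constant series like this one is captured exactly by two of its eight coefficients; noisier series lose mostly small, high-frequency detail coefficients, which is one way to read the smoothing effect mentioned above.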

