Fractal methods in intelligent technologies for processing large data streams

2019 ◽  
Vol 1189 ◽  
pp. 012045
Author(s):  
Mikhail I Turitsyn ◽  
Alexei V Myshev

2021 ◽  
pp. 104063872110030
Author(s):  
Craig N. Carter ◽  
Jacqueline L. Smith

Test data generated by ~60 accredited member laboratories of the American Association of Veterinary Laboratory Diagnosticians (AAVLD) are of exceptional quality. These data are captured by 1 of 13 laboratory information management systems (LIMSs) developed specifically for veterinary diagnostic laboratories (VDLs). Beginning in ~2000, the National Animal Health Laboratory Network (NAHLN) developed an electronic messaging system that allows LIMSs to automatically send standardized data streams for 14 select agents to a national repository. This messaging enables the U.S. Department of Agriculture to track and respond to high-consequence animal disease outbreaks such as highly pathogenic avian influenza. Because data collection is not standardized across the LIMSs used at VDLs, there is, to date, no means of summarizing large VDL data streams for multi-state and national animal health studies, or of providing near-real-time tracking for the hundreds of other important animal diseases that VDLs detect routinely in the United States. Further, VDLs are the only state and federal resources that can provide early detection and identification of endemic and emerging zoonotic diseases. Zoonotic diseases are estimated to be responsible for 2.5 billion cases of human illness and 2.7 million deaths worldwide every year. The economic and health impact of the SARS-CoV-2 pandemic is self-evident. We review here the history and progress of data management in VDLs and discuss ways of seizing unexplored opportunities to advance data leveraging to better serve animal health, public health, and One Health.


2010 ◽  
Vol 19 (04) ◽  
pp. 393-415 ◽  
Author(s):  
Martin Molina ◽  
Amanda Stent

In this article we describe a method for automatically generating text summaries of data corresponding to traces of spatial movement in geographical areas. The method can help humans to understand large data streams, such as the amounts of GPS data recorded by a variety of sensors in mobile phones, cars, etc. We describe the knowledge representations we designed for our method and the main components of our method for generating the summaries: a discourse planner, an abstraction module and a text generator. We also present evaluation results that show the ability of our method to generate certain types of geospatial and temporal descriptions.
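As an illustration of the kind of abstraction-plus-generation step such a method relies on (not the authors' implementation), a minimal sketch might segment a trace into stopped/moving episodes and verbalize them. The Fix class, the speed threshold, and the use of projected coordinates below are assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Fix:
    t: float  # timestamp in seconds
    x: float  # easting in metres (projected coordinates, an assumption)
    y: float  # northing in metres

def summarize_trace(fixes: list[Fix], stop_speed: float = 0.5) -> str:
    """Abstract a GPS trace into stopped/moving episodes and verbalize them.

    Only an illustrative sketch; the paper's discourse planner and text
    generator are far richer than this.
    """
    episodes: list[tuple[str, float]] = []  # (label, duration in seconds)
    for prev, cur in zip(fixes, fixes[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue
        speed = hypot(cur.x - prev.x, cur.y - prev.y) / dt
        label = "stopped" if speed < stop_speed else "moving"
        if episodes and episodes[-1][0] == label:
            episodes[-1] = (label, episodes[-1][1] + dt)  # extend current episode
        else:
            episodes.append((label, dt))
    parts = [f"{label} for {dur / 60:.1f} min" for label, dur in episodes]
    return "The traveller was " + ", then ".join(parts) + "."
```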


Author(s):  
Maroua Bahri ◽  
Albert Bifet ◽  
Silviu Maniu ◽  
Heitor Murilo Gomes

Mining high-dimensional data streams poses a fundamental challenge to machine learning, as a large number of attributes can markedly degrade the performance of any mining task. In the past several years, dimension reduction (DR) approaches have been applied successfully for different purposes (e.g., visualization). Because of their high computational costs and the numerous passes they require over large data, these approaches are a hindrance when processing potentially high-dimensional, infinite data streams; the high dimensionality also increases the resource usage of algorithms that can suffer from the curse of dimensionality. To cope with these issues, several techniques for incremental DR have been proposed. In this paper, we survey reduction approaches designed to handle data streams and highlight the key benefits of using them in stream mining algorithms.
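As a minimal illustration of incremental DR on a stream (one common technique, not a method taken from the survey itself), the sketch below updates a projection chunk by chunk with scikit-learn's IncrementalPCA; the chunk sizes, dimensionalities, and synthetic generator are assumptions.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def stream_chunks(n_chunks=50, chunk_size=200, n_features=500, seed=0):
    """Synthetic stand-in for a high-dimensional, effectively unbounded stream."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        yield rng.normal(size=(chunk_size, n_features))

ipca = IncrementalPCA(n_components=10)
for chunk in stream_chunks():
    ipca.partial_fit(chunk)           # update the projection incrementally, single pass
    reduced = ipca.transform(chunk)   # low-dimensional view for a downstream stream miner
```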


2019 ◽  
Vol 15 (7) ◽  
pp. 155014771986220
Author(s):  
Youngkuk Kim ◽  
Siwoon Son ◽  
Yang-Sae Moon

In this article, we address dynamic workflow management for sampling and filtering data streams in Apache Storm. As many sensors generate data streams continuously, we often use sampling to choose representative data or filtering to remove unnecessary data. Apache Storm is a real-time distributed processing platform suitable for handling large data streams. Storm, however, must stop an entire job when the input data structure or the processing algorithm changes, because the programs need to be modified, redistributed, and restarted. In addition, for effective data processing, Storm is often used together with Kafka and databases, but it is difficult to use these platforms in an integrated manner. In this article, we identify the problems that arise when applying sampling and filtering algorithms in Storm and propose a dynamic workflow management model that solves them. First, we present the concept of a plan, consisting of the input, processing, and output modules of a data stream. Second, we propose Storm Plan Manager, which operates Storm, Kafka, and a database as a single integrated system. Storm Plan Manager is an integrated workflow manager that dynamically controls the sampling and filtering of data streams through plans. Third, as a key feature, Storm Plan Manager provides a Web client interface to visually create, execute, and monitor plans. We demonstrate the usefulness of the proposed Storm Plan Manager by presenting, in turn, its design, implementation, and experimental results.
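The plan concept can be illustrated with a small, hypothetical sketch (a local Python loop, not the actual Storm/Kafka implementation) in which an input module, a processing module such as reservoir sampling, and an output module are wired together; all names below are made up for the example.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Iterable

@dataclass
class Plan:
    """A plan wires an input module, a processing module, and an output module."""
    source: Iterable[Any]
    process: Callable[[Any], Any]   # returns a record to emit, or None to drop it
    sink: Callable[[Any], None]

    def run(self) -> None:
        for record in self.source:
            out = self.process(record)
            if out is not None:
                self.sink(out)

def make_reservoir_sampler(k: int):
    """Classic reservoir sampling: keep a uniform sample of k records seen so far."""
    reservoir, seen = [], 0
    def process(record):
        nonlocal seen
        seen += 1
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            j = random.randrange(seen)
            if j < k:
                reservoir[j] = record
        return None  # nothing emitted per record; inspect `reservoir` afterwards
    process.reservoir = reservoir
    return process

sampler = make_reservoir_sampler(k=100)
Plan(source=range(10_000), process=sampler, sink=print).run()
print(len(sampler.reservoir))  # 100 uniformly sampled records
```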


2009 ◽  
Author(s):  
Ming C. Hao ◽  
Umeshwar Dayal ◽  
Daniel A. Keim ◽  
Ratnesh K. Sharma ◽  
Abhay Mehta

2006 ◽  
Vol 16 (1) ◽  
pp. 68-70 ◽  
Author(s):  
S. A. Sharov ◽  
Yu. V. Orlov ◽  
I. G. Persiantsev

2020 ◽  
Vol 11 (1) ◽  
pp. 61
Author(s):  
Stavros Souravlas ◽  
Sofia Anastasiadou ◽  
Stefanos Katsavounis

An important and challenging task in modern applications is managing and processing very large data volumes with very short delays. Quite often, such volumes exceed the capabilities of individual machines. It is therefore important to develop efficient task scheduling algorithms that reduce stream processing costs. What makes the situation more difficult is that both the applications and the processing systems are prone to change during runtime: processing nodes may go down, temporarily or permanently, an application may need more resources, and so on. It is therefore necessary to develop dynamic schedulers that can deal effectively with these changes at runtime. In this work, we provide a fast and fair task migration policy that maintains load balancing and low latency. The experimental results show that our scheme offers better load balancing and lower overall latency than state-of-the-art strategies, owing to the stepwise communication and the pipeline-based processing it employs.
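To illustrate the general idea of task migration for load balancing (a toy greedy policy, not the authors' scheme), the sketch below moves one task at a time from the most loaded node to the least loaded node as long as the move still narrows the load gap; node names, task names, and loads are hypothetical.

```python
def rebalance(nodes: dict[str, dict[str, float]], max_moves: int = 10):
    """nodes maps node name -> {task name: task load}; the dict is mutated in place."""
    migrations = []
    for _ in range(max_moves):
        loads = {n: sum(tasks.values()) for n, tasks in nodes.items()}
        hottest = max(loads, key=loads.get)
        coolest = min(loads, key=loads.get)
        if not nodes[hottest]:
            break
        task = min(nodes[hottest], key=nodes[hottest].get)  # lightest task on the hot node
        cost = nodes[hottest][task]
        if loads[coolest] + cost >= loads[hottest]:
            break  # migrating would no longer narrow the imbalance
        nodes[coolest][task] = nodes[hottest].pop(task)
        migrations.append((task, hottest, coolest))
    return migrations

cluster = {"n1": {"t1": 8.0, "t2": 6.0}, "n2": {"t3": 1.0}, "n3": {}}
print(rebalance(cluster))  # e.g. [('t2', 'n1', 'n3')]
```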


2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Agustín Ortíz Díaz ◽  
José del Campo-Ávila ◽  
Gonzalo Ramos-Jiménez ◽  
Isvani Frías Blanco ◽  
Yailé Caballero Mota ◽  
...  

The treatment of large data streams in the presence of concept drift is one of the main challenges in the field of data mining, particularly when the algorithms have to deal with concepts that disappear and then reappear. This paper presents a new algorithm, called Fast Adapting Ensemble (FAE), which adapts very quickly to both abrupt and gradual concept drifts and has been specifically designed to deal with recurring concepts. FAE processes the learning examples in blocks of the same size, but it does not have to wait for a block to be complete before adapting its base classification mechanism. FAE incorporates a drift detector to improve the handling of abrupt concept drifts, and it stores a set of inactive classifiers that represent old concepts, which are activated very quickly when those concepts reappear. We compare the new algorithm with several well-known learning algorithms on common benchmark datasets. The experiments show promising results for the proposed algorithm in terms of accuracy and runtime when handling different types of concept drift.
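The recurring-concept mechanism can be sketched in a simplified, hypothetical form (this is not FAE itself): process examples in fixed-size blocks, score the stored inactive models on the latest block, and reactivate the best one, or train a new model when no stored concept fits. The classifier interface, the block size, and the 0.6 accuracy threshold are assumptions.

```python
import statistics
from collections import deque

class ConceptPool:
    """Keep inactive models for old concepts and reactivate the best-fitting one."""

    def __init__(self, block_size=200):
        self.block_size = block_size
        self.active = None          # model currently used for prediction
        self.inactive = []          # stored models representing old concepts
        self.block = deque(maxlen=block_size)

    def observe(self, x, y, train_new_model):
        self.block.append((x, y))
        if len(self.block) == self.block_size:
            self._maybe_switch(train_new_model)
            self.block.clear()

    def _accuracy(self, model):
        return statistics.mean(1.0 if model.predict(x) == y else 0.0
                               for x, y in self.block)

    def _maybe_switch(self, train_new_model):
        candidates = ([self.active] if self.active is not None else []) + self.inactive
        scored = [(self._accuracy(m), m) for m in candidates]
        best_acc, best = max(scored, default=(0.0, None), key=lambda p: p[0])
        if best is None or best_acc < 0.6:           # no stored concept fits: learn a new one
            best = train_new_model(list(self.block))
        if self.active is not None and self.active is not best:
            self.inactive.append(self.active)        # keep the old concept for later reuse
        self.active = best
```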


Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1414
Author(s):  
Krzysztof Gajowniczek ◽  
Marcin Bator ◽  
Tomasz Ząbkowski

Data from smart grids are challenging to analyze due to their very large size, high dimensionality, skewness, sparsity, and numerous seasonal fluctuations, including daily and weekly effects. Because the data arrive sequentially, the underlying distribution is subject to change over time. Time series data streams also have their own specifics in terms of data processing and analysis: it is usually not possible to hold the whole data in memory, because large data volumes are generated quickly, so processing and analysis should be done incrementally using sliding windows. Although many clustering techniques have been proposed for grouping the observations of a single data stream, only a few of them focus on splitting whole data streams into clusters. In this article, we aim to explore the individual characteristics of electricity usage and recommend the most suitable tariff to each customer so that they can benefit from lower prices. This work investigates various algorithms (and their improvements) that allow us to form the clusters, in real time, from smart meter data.
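A minimal sketch of this sliding-window, incremental setting (not one of the algorithms evaluated in the article) might update a MiniBatchKMeans model with each weekly window of daily load profiles; the window length, the number of clusters, and the synthetic profiles are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def daily_profiles(n_days=365, n_customers=100, seed=0):
    """Synthetic stand-in for a smart meter stream: one 24-dim load profile per customer per day."""
    rng = np.random.default_rng(seed)
    for _ in range(n_days):
        yield rng.gamma(shape=2.0, scale=1.0, size=(n_customers, 24))

model = MiniBatchKMeans(n_clusters=4, random_state=0)
window = []
for day in daily_profiles():
    window.append(day)
    if len(window) == 7:                          # weekly sliding window
        model.partial_fit(np.vstack(window))      # incremental update, no full pass over history
        tariff_groups = model.predict(day)        # latest cluster (tariff group) per customer
        window.pop(0)                             # slide the window forward by one day
```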

