Fractal methods in intelligent technologies for processing large data streams

2019 ◽  
Vol 1189 ◽  
pp. 012045
Author(s):  
Mikhail I Turitsyn ◽  
Alexei V Myshev

2021 ◽  
pp. 104063872110030
Author(s):  
Craig N. Carter ◽  
Jacqueline L. Smith

Test data generated by ~60 accredited member laboratories of the American Association of Veterinary Laboratory Diagnosticians (AAVLD) are of exceptional quality. These data are captured by 1 of 13 laboratory information management systems (LIMSs) developed specifically for veterinary diagnostic laboratories (VDLs). Beginning in ~2000, the National Animal Health Laboratory Network (NAHLN) developed an electronic messaging system that allows LIMSs to automatically send standardized data streams for 14 select agents to a national repository. This messaging enables the U.S. Department of Agriculture to track and respond to high-consequence animal disease outbreaks such as highly pathogenic avian influenza. Because data collection is not standardized across the LIMSs used at VDLs, there is, to date, no means of summarizing large VDL data streams for multi-state and national animal health studies, or of providing near-real-time tracking for the hundreds of other important animal diseases that VDLs detect routinely in the United States. Further, VDLs are the only state and federal resources that can provide early detection and identification of endemic and emerging zoonotic diseases. Zoonotic diseases are estimated to be responsible for 2.5 billion cases of human illness and 2.7 million deaths worldwide every year. The economic and health impact of the SARS-CoV-2 pandemic is self-evident. We review here the history and progress of data management in VDLs and discuss ways of seizing unexplored opportunities to advance data leveraging to better serve animal health, public health, and One Health.


2010 ◽  
Vol 19 (04) ◽  
pp. 393-415 ◽  
Author(s):  
Martin Molina ◽  
Amanda Stent

In this article we describe a method for automatically generating text summaries of data corresponding to traces of spatial movement in geographical areas. The method can help humans to understand large data streams, such as the amounts of GPS data recorded by a variety of sensors in mobile phones, cars, etc. We describe the knowledge representations we designed for our method and the main components of our method for generating the summaries: a discourse planner, an abstraction module and a text generator. We also present evaluation results that show the ability of our method to generate certain types of geospatial and temporal descriptions.
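As an illustration of the kind of abstraction-plus-generation step such a method relies on (not the authors' implementation), a minimal sketch might segment a trace into stopped/moving episodes and verbalize them. The Fix class, the speed threshold, and the use of projected coordinates below are assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Fix:
    t: float  # timestamp in seconds
    x: float  # easting in metres (projected coordinates, an assumption)
    y: float  # northing in metres

def summarize_trace(fixes: list[Fix], stop_speed: float = 0.5) -> str:
    """Abstract a GPS trace into stopped/moving episodes and verbalize them.

    Only an illustrative sketch; the paper's discourse planner and text
    generator are far richer than this.
    """
    episodes: list[tuple[str, float]] = []  # (label, duration in seconds)
    for prev, cur in zip(fixes, fixes[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue
        speed = hypot(cur.x - prev.x, cur.y - prev.y) / dt
        label = "stopped" if speed < stop_speed else "moving"
        if episodes and episodes[-1][0] == label:
            episodes[-1] = (label, episodes[-1][1] + dt)  # extend current episode
        else:
            episodes.append((label, dt))
    parts = [f"{label} for {dur / 60:.1f} min" for label, dur in episodes]
    return "The traveller was " + ", then ".join(parts) + "."
```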


Author(s):  
Maroua Bahri ◽  
Albert Bifet ◽  
Silviu Maniu ◽  
Heitor Murilo Gomes

Mining high-dimensional data streams poses a fundamental challenge to machine learning, as a large number of attributes can markedly degrade the performance of any mining task. In the past several years, dimension reduction (DR) approaches have been applied successfully for different purposes (e.g., visualization). Because of their high computational costs and the numerous passes they require over large data, these approaches are a hindrance when processing potentially high-dimensional, infinite data streams; the high dimensionality also increases the resource usage of algorithms that can suffer from the curse of dimensionality. To cope with these issues, several techniques for incremental DR have been proposed. In this paper, we survey reduction approaches designed to handle data streams and highlight the key benefits of using them in stream mining algorithms.
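As a minimal illustration of incremental DR on a stream (one common technique, not a method taken from the survey itself), the sketch below updates a projection chunk by chunk with scikit-learn's IncrementalPCA; the chunk sizes, dimensionalities, and synthetic generator are assumptions.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def stream_chunks(n_chunks=50, chunk_size=200, n_features=500, seed=0):
    """Synthetic stand-in for a high-dimensional, effectively unbounded stream."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        yield rng.normal(size=(chunk_size, n_features))

ipca = IncrementalPCA(n_components=10)
for chunk in stream_chunks():
    ipca.partial_fit(chunk)           # update the projection incrementally, single pass
    reduced = ipca.transform(chunk)   # low-dimensional view for a downstream stream miner
```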


2019 ◽  
Vol 15 (7) ◽  
pp. 155014771986220
Author(s):  
Youngkuk Kim ◽  
Siwoon Son ◽  
Yang-Sae Moon

In this article, we address dynamic workflow management for sampling and filtering data streams in Apache Storm. As many sensors generate data streams continuously, we often use sampling to choose representative data or filtering to remove unnecessary data. Apache Storm is a real-time distributed processing platform suitable for handling large data streams. Storm, however, must stop an entire job when the input data structure or the processing algorithm changes, because the programs need to be modified, redistributed, and restarted. In addition, for effective data processing, Storm is often used together with Kafka and databases, but it is difficult to use these platforms in an integrated manner. In this article, we identify the problems that arise when applying sampling and filtering algorithms in Storm and propose a dynamic workflow management model that solves them. First, we present the concept of a plan, consisting of the input, processing, and output modules of a data stream. Second, we propose Storm Plan Manager, which operates Storm, Kafka, and a database as a single integrated system. Storm Plan Manager is an integrated workflow manager that dynamically controls the sampling and filtering of data streams through plans. Third, as a key feature, Storm Plan Manager provides a Web client interface to visually create, execute, and monitor plans. We demonstrate the usefulness of the proposed Storm Plan Manager by presenting, in turn, its design, implementation, and experimental results.
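The plan concept can be illustrated with a small, hypothetical sketch (a local Python loop, not the actual Storm/Kafka implementation) in which an input module, a processing module such as reservoir sampling, and an output module are wired together; all names below are made up for the example.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Iterable

@dataclass
class Plan:
    """A plan wires an input module, a processing module, and an output module."""
    source: Iterable[Any]
    process: Callable[[Any], Any]   # returns a record to emit, or None to drop it
    sink: Callable[[Any], None]

    def run(self) -> None:
        for record in self.source:
            out = self.process(record)
            if out is not None:
                self.sink(out)

def make_reservoir_sampler(k: int):
    """Classic reservoir sampling: keep a uniform sample of k records seen so far."""
    reservoir, seen = [], 0
    def process(record):
        nonlocal seen
        seen += 1
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            j = random.randrange(seen)
            if j < k:
                reservoir[j] = record
        return None  # nothing emitted per record; inspect `reservoir` afterwards
    process.reservoir = reservoir
    return process

sampler = make_reservoir_sampler(k=100)
Plan(source=range(10_000), process=sampler, sink=print).run()
print(len(sampler.reservoir))  # 100 uniformly sampled records
```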


2009 ◽  
Author(s):  
Ming C. Hao ◽  
Umeshwar Dayal ◽  
Daniel A. Keim ◽  
Ratnesh K. Sharma ◽  
Abhay Mehta

2006 ◽  
Vol 16 (1) ◽  
pp. 68-70 ◽  
Author(s):  
S. A. Sharov ◽  
Yu. V. Orlov ◽  
I. G. Persiantsev

2020 ◽  
Vol 11 (1) ◽  
pp. 61
Author(s):  
Stavros Souravlas ◽  
Sofia Anastasiadou ◽  
Stefanos Katsavounis

An important and challenging task in modern applications is managing and processing very large data volumes with very short delays. Quite often, such volumes exceed the capabilities of individual machines. It is therefore important to develop efficient task scheduling algorithms that reduce stream processing costs. What makes the situation more difficult is that both the applications and the processing systems are prone to change during runtime: processing nodes may go down, temporarily or permanently, an application may need more resources, and so on. It is therefore necessary to develop dynamic schedulers that can deal effectively with these changes at runtime. In this work, we provide a fast and fair task migration policy that maintains load balancing and low latency. The experimental results show that our scheme offers better load balancing and lower overall latency than state-of-the-art strategies, owing to the stepwise communication and the pipeline-based processing it employs.
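To illustrate the general idea of task migration for load balancing (a toy greedy policy, not the authors' scheme), the sketch below moves one task at a time from the most loaded node to the least loaded node as long as the move still narrows the load gap; node names, task names, and loads are hypothetical.

```python
def rebalance(nodes: dict[str, dict[str, float]], max_moves: int = 10):
    """nodes maps node name -> {task name: task load}; the dict is mutated in place."""
    migrations = []
    for _ in range(max_moves):
        loads = {n: sum(tasks.values()) for n, tasks in nodes.items()}
        hottest = max(loads, key=loads.get)
        coolest = min(loads, key=loads.get)
        if not nodes[hottest]:
            break
        task = min(nodes[hottest], key=nodes[hottest].get)  # lightest task on the hot node
        cost = nodes[hottest][task]
        if loads[coolest] + cost >= loads[hottest]:
            break  # migrating would no longer narrow the imbalance
        nodes[coolest][task] = nodes[hottest].pop(task)
        migrations.append((task, hottest, coolest))
    return migrations

cluster = {"n1": {"t1": 8.0, "t2": 6.0}, "n2": {"t3": 1.0}, "n3": {}}
print(rebalance(cluster))  # e.g. [('t2', 'n1', 'n3')]
```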


2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Agustín Ortíz Díaz ◽  
José del Campo-Ávila ◽  
Gonzalo Ramos-Jiménez ◽  
Isvani Frías Blanco ◽  
Yailé Caballero Mota ◽  
...  

The treatment of large data streams in the presence of concept drift is one of the main challenges in the field of data mining, particularly when the algorithms have to deal with concepts that disappear and then reappear. This paper presents a new algorithm, called Fast Adapting Ensemble (FAE), which adapts very quickly to both abrupt and gradual concept drifts and has been specifically designed to deal with recurring concepts. FAE processes the learning examples in blocks of the same size, but it does not have to wait for a block to be complete before adapting its base classification mechanism. FAE incorporates a drift detector to improve the handling of abrupt concept drifts, and it stores a set of inactive classifiers that represent old concepts, which are activated very quickly when those concepts reappear. We compare the new algorithm with several well-known learning algorithms on common benchmark datasets. The experiments show promising results for the proposed algorithm in terms of accuracy and runtime when handling different types of concept drift.
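The recurring-concept mechanism can be sketched in a simplified, hypothetical form (this is not FAE itself): process examples in fixed-size blocks, score the stored inactive models on the latest block, and reactivate the best one, or train a new model when no stored concept fits. The classifier interface, the block size, and the 0.6 accuracy threshold are assumptions.

```python
import statistics
from collections import deque

class ConceptPool:
    """Keep inactive models for old concepts and reactivate the best-fitting one."""

    def __init__(self, block_size=200):
        self.block_size = block_size
        self.active = None          # model currently used for prediction
        self.inactive = []          # stored models representing old concepts
        self.block = deque(maxlen=block_size)

    def observe(self, x, y, train_new_model):
        self.block.append((x, y))
        if len(self.block) == self.block_size:
            self._maybe_switch(train_new_model)
            self.block.clear()

    def _accuracy(self, model):
        return statistics.mean(1.0 if model.predict(x) == y else 0.0
                               for x, y in self.block)

    def _maybe_switch(self, train_new_model):
        candidates = ([self.active] if self.active is not None else []) + self.inactive
        scored = [(self._accuracy(m), m) for m in candidates]
        best_acc, best = max(scored, default=(0.0, None), key=lambda p: p[0])
        if best is None or best_acc < 0.6:           # no stored concept fits: learn a new one
            best = train_new_model(list(self.block))
        if self.active is not None and self.active is not best:
            self.inactive.append(self.active)        # keep the old concept for later reuse
        self.active = best
```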


Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1414
Author(s):  
Krzysztof Gajowniczek ◽  
Marcin Bator ◽  
Tomasz Ząbkowski

Data from smart grids are challenging to analyze due to their very large size, high dimensionality, skewness, sparsity, and numerous seasonal fluctuations, including daily and weekly effects. Because the data arrive sequentially, the underlying distribution is subject to change over time. Time series data streams also have their own specifics in terms of data processing and analysis: it is usually not possible to hold the whole data in memory, because large data volumes are generated quickly, so processing and analysis should be done incrementally using sliding windows. Although many clustering techniques have been proposed for grouping the observations of a single data stream, only a few of them focus on splitting whole data streams into clusters. In this article, we aim to explore the individual characteristics of electricity usage and recommend the most suitable tariff to each customer so that they can benefit from lower prices. This work investigates various algorithms (and their improvements) that allow us to form the clusters, in real time, from smart meter data.
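A minimal sketch of this sliding-window, incremental setting (not one of the algorithms evaluated in the article) might update a MiniBatchKMeans model with each weekly window of daily load profiles; the window length, the number of clusters, and the synthetic profiles are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def daily_profiles(n_days=365, n_customers=100, seed=0):
    """Synthetic stand-in for a smart meter stream: one 24-dim load profile per customer per day."""
    rng = np.random.default_rng(seed)
    for _ in range(n_days):
        yield rng.gamma(shape=2.0, scale=1.0, size=(n_customers, 24))

model = MiniBatchKMeans(n_clusters=4, random_state=0)
window = []
for day in daily_profiles():
    window.append(day)
    if len(window) == 7:                          # weekly sliding window
        model.partial_fit(np.vstack(window))      # incremental update, no full pass over history
        tariff_groups = model.predict(day)        # latest cluster (tariff group) per customer
        window.pop(0)                             # slide the window forward by one day
```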

