Learning from ubiquitous data streams: Clustering data and data sources

2012 ◽  
Vol 25 (1) ◽  
pp. 69-71
Author(s):  
Pedro Pereira Rodrigues
2020 ◽  
Author(s):  
Carrie Manore ◽  
Geoffrey Fairchild ◽  
Amanda Ziemann ◽  
Nidhi Parikh ◽  
Katherine Kempfert ◽  
...  

ABSTRACT: Predicting an infectious disease can help reduce its impact by advising public health interventions and personal preventive measures. While the availability of heterogeneous data streams and sensors such as satellite imagery and the Internet has increased the opportunity to indirectly measure, understand, and predict global dynamics, the data may be prohibitively large and/or require intensive data management while also requiring subject matter experts to properly exploit the data sources (e.g., deriving features from fundamentally different data sets). Few efforts have quantitatively assessed the predictive benefit of novel data streams in comparison to more traditional data sources, especially at fine spatio-temporal resolutions. We have combined multiple traditional and non-traditional data streams (satellite imagery, Internet, weather, census, and clinical surveillance data) and assessed their combined ability to predict dengue in Brazil’s 27 states on a weekly and yearly basis over seven years. For each state, we nowcast dengue based on several time series models, which vary in complexity and inclusion of exogenous data. We also predict yearly cumulative risk by municipality and state. The top-performing model and the utility of predictive data vary by state, implying that future forecasting and nowcasting efforts may be made more robust by the use of multiple data streams and models. One size does not fit all, particularly when considering state-level predictions as opposed to the whole country. Our first-of-its-kind, high-resolution, flexible system for predicting dengue incidence with heterogeneous (and still sometimes sparse) data can be extended to multiple applications and regions.


2015 ◽  
Vol 6 (1) ◽  
pp. 33-52 ◽  
Author(s):  
Geoffrey Hill ◽  
Pratim Datta ◽  
William Acar

This paper proposes that, in the context of generating actionable knowledge, uncertainties pertaining to big data streams should be recognized, categorized and accounted for at the appropriate level of knowledge management process models. Arguing that sensemaking from big data sources is a complex series of processes extending beyond just the application of sophisticated analytics, this paper proposes a big data reengineering (BDR) framework to guide requisite categorization, contextualization and remediation processes. The authors discuss the characteristics that uncertainty presents to organizations using big data streams as potential knowledge sources – surfacing relationships between the underlying knowledge flows and uncertainty and presenting typologies that categorize the effects of several common sources of uncertainty. These typologies also serve to provide guidance to transformation agent(s) regarding appropriate actions ultimately aimed at the generation of actionable knowledge.


2002 ◽  
Vol 3 (2) ◽  
pp. 23-27 ◽  
Author(s):  
Daniel Barbará

2017 ◽  
Vol 67 ◽  
pp. 228-238 ◽  
Author(s):  
Jonathan de Andrade Silva ◽  
Eduardo Raul Hruschka ◽  
João Gama

2003 ◽  
Vol 15 (3) ◽  
pp. 515-528 ◽  
Author(s):  
S. Guha ◽  
A. Meyerson ◽  
N. Mishra ◽  
R. Motwani ◽  
L. O'Callaghan

2021 ◽  
Vol 27 (11) ◽  
pp. 1203-1221
Author(s):  
Amal Rekik ◽  
Salma Jamoussi

Clustering data streams in order to detect trending topics on social networks is a challenging task that interests researchers in the big data field. Analyzing such data requires addressing several requirements that arise from its large volume and evolving nature. For this purpose, we propose in this paper a new evolving clustering method that takes into account the incremental nature of the data and meets its principal requirements. Our method uses a deep learning technique to learn incrementally from unlabelled examples that are generated at high speed and need to be clustered instantly. To evaluate the performance of our method, we conducted several experiments using the Sanders, HCR and Terr-Attacks datasets.

