Event Prediction in the Big Data Era

Events are occurrences at specific locations and times, with specific semantics, that nontrivially impact either society or nature, such as earthquakes, civil unrest, system failures, pandemics, and crimes. It is highly desirable to be able to anticipate the occurrence of such events in advance to reduce the potential social upheaval and damage caused. Event prediction, which has traditionally been prohibitively challenging, is now becoming a viable option in the big data era and is thus experiencing rapid growth, thanks also to advances in high-performance computers and new artificial intelligence techniques. There is a large amount of existing work that focuses on addressing the challenges involved, including heterogeneous multi-faceted outputs, complex (e.g., spatial, temporal, and semantic) dependencies, and streaming data feeds. Due to the strong interdisciplinary nature of event prediction problems, most existing event prediction methods were initially designed to deal with specific application domains, though the techniques and evaluation procedures utilized are usually generalizable across different domains. However, it is imperative yet difficult to cross-reference the techniques across different domains, given the absence of a comprehensive literature survey for event prediction. This article aims to provide a systematic and comprehensive survey of the technologies, applications, and evaluations of event prediction in the big data era. First, a systematic categorization and summary of existing techniques is presented, which facilitates domain experts’ searches for suitable techniques and helps model developers consolidate their research at the frontiers. Then, a comprehensive categorization and summary of major application domains is provided to introduce wider applications to model developers and help them expand the impact of their research. Evaluation metrics and procedures are summarized and standardized to unify the understanding of model performance among stakeholders, model developers, and domain experts in various application domains. Finally, open problems and future directions are discussed. Additional resources related to event prediction are available on the paper website: http://cs.emory.edu/~lzhao41/projects/event_prediction_site.html.


Event Prediction in Big Data Era: A Systematic Survey

10.36227/techrxiv.12733049.v1 ◽

2020 ◽

Author(s):

Liang Zhao

Keyword(s):

Big Data ◽

High Performance ◽

Research Evaluation ◽

Model Performance ◽

Streaming Data ◽

Open Problems ◽

Domain Experts ◽

Event Prediction ◽

Semantic Dependencies ◽

Prediction Problems

Events are occurrences at specific locations and times, with specific semantics, that nontrivially impact either society or nature, such as earthquakes, civil unrest, system failures, pandemics, and crimes. It is highly desirable to be able to anticipate the occurrence of such events in advance in order to reduce the potential social upheaval and damage caused. Event prediction, which has traditionally been prohibitively challenging, is now becoming a viable option in the big data era and is thus experiencing rapid growth, thanks also to advances in high-performance computers and new artificial intelligence techniques. There is a large amount of existing work that focuses on addressing the challenges involved, including heterogeneous multi-faceted outputs, complex (e.g., spatial, temporal, and semantic) dependencies, and streaming data feeds. Due to the strong interdisciplinary nature of event prediction problems, most existing event prediction methods were initially designed to deal with specific application domains, though the techniques and evaluation procedures utilized are usually generalizable across different domains. However, it is imperative yet difficult to cross-reference the techniques across different domains, given the absence of a comprehensive literature survey for event prediction. This paper aims to provide a systematic and comprehensive survey of the technologies, applications, and evaluations of event prediction in the big data era. First, a systematic categorization and summary of existing techniques is presented, which facilitates domain experts’ searches for suitable techniques and helps model developers consolidate their research at the frontiers. Then, a comprehensive categorization and summary of major application domains is provided to introduce wider applications to model developers and help them expand the impact of their research. Evaluation metrics and procedures are summarized and standardized to unify the understanding of model performance among stakeholders, model developers, and domain experts in various application domains. Finally, open problems and future directions for this promising and important domain are elucidated and discussed.


Event Prediction in Big Data Era: A Systematic Survey

10.36227/techrxiv.12733049 ◽

2020 ◽

Author(s):

Liang Zhao

Keyword(s):

Big Data ◽

High Performance ◽

Research Evaluation ◽

Model Performance ◽

Streaming Data ◽

Open Problems ◽

Domain Experts ◽

Event Prediction ◽

Semantic Dependencies ◽

Prediction Problems

Events are occurrences at specific locations and times, with specific semantics, that nontrivially impact either society or nature, such as earthquakes, civil unrest, system failures, pandemics, and crimes. It is highly desirable to be able to anticipate the occurrence of such events in advance in order to reduce the potential social upheaval and damage caused. Event prediction, which has traditionally been prohibitively challenging, is now becoming a viable option in the big data era and is thus experiencing rapid growth, thanks also to advances in high-performance computers and new artificial intelligence techniques. There is a large amount of existing work that focuses on addressing the challenges involved, including heterogeneous multi-faceted outputs, complex (e.g., spatial, temporal, and semantic) dependencies, and streaming data feeds. Due to the strong interdisciplinary nature of event prediction problems, most existing event prediction methods were initially designed to deal with specific application domains, though the techniques and evaluation procedures utilized are usually generalizable across different domains. However, it is imperative yet difficult to cross-reference the techniques across different domains, given the absence of a comprehensive literature survey for event prediction. This paper aims to provide a systematic and comprehensive survey of the technologies, applications, and evaluations of event prediction in the big data era. First, a systematic categorization and summary of existing techniques is presented, which facilitates domain experts’ searches for suitable techniques and helps model developers consolidate their research at the frontiers. Then, a comprehensive categorization and summary of major application domains is provided to introduce wider applications to model developers and help them expand the impact of their research. Evaluation metrics and procedures are summarized and standardized to unify the understanding of model performance among stakeholders, model developers, and domain experts in various application domains. Finally, open problems and future directions for this promising and important domain are elucidated and discussed.


RE-STORM: REAL-TIME ENERGY EFFICIENT DATA ANALYSIS ADAPTING STORM PLATFORM

Jurnal Teknologi ◽

10.11113/jt.v78.7672 ◽

2016 ◽

Vol 78 (10) ◽

Author(s):

Rizwan Patan ◽

Rajasekhara Babu M.

Keyword(s):

Energy Efficiency ◽

Big Data ◽

Response Time ◽

Energy Efficient ◽

Data Stream ◽

High Performance ◽

High Energy ◽

Streaming Data ◽

Stream Computing ◽

High Energy Efficiency

Big data stream computing needs an energy-efficient stream optimization model that achieves high energy efficiency for streaming data without degrading response time. This paper proposes energy-efficient, traffic-aware resource scheduling and a re-streaming stream structure, together called Re-Storm, to replace Storm's default scheduling strategy. The model is described in three parts. First, a mathematical relation is established among energy consumption, low response time, and high-traffic streams. Second, various approaches are provided for reducing energy consumption without affecting response time while delivering high overall performance in big data stream computing. Third, Re-Storm deploys energy-efficient, traffic-aware scheduling on the Storm platform. It allocates worker nodes online using a hot-swapping technique, with task utilization handled by energy consolidation through graph partitioning. Moreover, Re-Storm achieves high energy efficiency and low response time at all data arrival speeds, and it is suitable for allocating worker nodes in a Storm topology. Experimental results compare Re-Storm with existing strategies that address energy issues without affecting or reducing response time at different data stream speed levels. Finally, the results show that the Re-Storm platform achieves high energy efficiency and low response time compared with all existing approaches.


Managing, Analysing, and Integrating Big Data in Medical Bioinformatics: Open Problems and Future Perspectives

BioMed Research International ◽

10.1155/2014/134023 ◽

2014 ◽

Vol 2014 ◽

pp. 1-13 ◽

Cited By ~ 63

Author(s):

Ivan Merelli ◽

Horacio Pérez-Sánchez ◽

Sandra Gesing ◽

Daniele D’Agostino

Keyword(s):

Big Data ◽

High Performance ◽

Biomedical Data ◽

Huge Amount ◽

Open Problems ◽

Starting Point ◽

Data Driven Approach ◽

Access Data ◽

Performance Computing ◽

Open Issues

The explosion of data in both biomedical research and healthcare systems demands urgent solutions. In particular, research in the omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare, moreover, continually calls for tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to a better understanding of diseases and to the development of better and personalized diagnostics and therapeutics. However, such progress is directly related to the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, to annotate and integrate it, and finally to infer knowledge and make it available to researchers. Bioinformatics can be viewed as the “glue” for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represents the starting point to meet this challenge.


High Performance Query Execution and Graph Visualization for Effectual Big Data Analytics with Cloud

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9110 ◽

2020 ◽

Vol 17 (6) ◽

pp. 2713-2715

Author(s):

Prachi Garg ◽

Sandip Kumar Goel ◽

Sakshi Sachdeva ◽

Neelam Oberoi

Keyword(s):

Big Data ◽

High Performance ◽

Data Science ◽

Predictive Analytics ◽

Big Data Analytics ◽

Streaming Data ◽

Graph Visualization ◽

Multiple Sources ◽

Need To Evaluate ◽

Usage Patterns

The domain of data science contains a wide range of approaches and high-performance techniques, in which data must be evaluated along multiple dimensions so that effective outcomes and predictive knowledge can be extracted. Data science and analytics is nowadays one of the most prominent streams of advanced knowledge discovery. The key constituents required in data science for in-depth, multidimensional analytics of datasets include streaming of data from multiple sources and channels, preprocessing and cleaning of real-time streaming data, feature engineering and extraction of prime elements from datasets, numerical analysis and scientific computations, statistical analytics on datasets, data engineering visualization, plotting, and predictive analytics. The paper presents the usage patterns and in-depth analytics of big data with high-performance visualization using Grafana.


Enabling deeper learning on big data for materials informatics applications

Scientific Reports ◽

10.1038/s41598-021-83193-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Dipendra Jha ◽

Vishu Gupta ◽

Logan Ward ◽

Zijiang Yang ◽

Christopher Wolverton ◽

...

Keyword(s):

Neural Networks ◽

Big Data ◽

Deep Learning ◽

Deep Neural Networks ◽

Materials Science ◽

Prediction Models ◽

Model Performance ◽

Materials Informatics ◽

Learning Framework ◽

Significant Attention

The application of machine learning (ML) techniques in materials science has attracted significant attention in recent years, due to their impressive ability to efficiently extract data-driven linkages from various input materials representations to their output properties. While the application of traditional ML techniques has become quite ubiquitous, there have been limited applications of more advanced deep learning (DL) techniques, primarily because big materials datasets are relatively rare. Given the demonstrated potential and advantages of DL and the increasing availability of big materials datasets, it is attractive to go for deeper neural networks in a bid to boost model performance, but in reality, it leads to performance degradation due to the vanishing gradient problem. In this paper, we address the question of how to enable deeper learning for cases where big materials data is available. Here, we present a general deep learning framework based on Individual Residual learning (IRNet) composed of very deep neural networks that can work with any vector-based materials representation as input to build accurate property prediction models. We find that the proposed IRNet models can not only successfully alleviate the vanishing gradient problem and enable deeper learning, but also lead to significantly (up to 47%) better model accuracy as compared to plain deep neural networks and traditional ML techniques for a given input materials representation in the presence of big data.
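
The residual idea mentioned in this abstract can be illustrated with a small sketch. It is not the authors' IRNet implementation: it assumes that "individual" residual learning means one shortcut connection around each fully connected layer, which the abstract does not spell out, and the helper names (`dense`, `plain_stack`, `residual_stack`, `zero_layers`) are hypothetical. With all weights set to zero, a plain 50-layer stack loses the input entirely, while the residual stack still carries it through the shortcuts, which is the intuition for why such connections help signals and gradients survive in very deep networks.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def dense(weights, bias, v):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def plain_stack(layers, v):
    """Plain deep network: each layer replaces the previous activation."""
    for weights, bias in layers:
        v = relu(dense(weights, bias, v))
    return v


def residual_stack(layers, v):
    """Same layers, but each one adds its output to its own input
    (assumed reading of 'individual' residual learning)."""
    for weights, bias in layers:
        out = relu(dense(weights, bias, v))
        v = [o + x for o, x in zip(out, v)]  # shortcut connection
    return v


def zero_layers(depth, width):
    """Hypothetical worst case: every layer has zero weights and biases."""
    return [([[0.0] * width for _ in range(width)], [0.0] * width)
            for _ in range(depth)]


x = [1.0, -2.0, 3.0]
layers = zero_layers(depth=50, width=3)

plain_out = plain_stack(layers, x)        # input signal is lost
residual_out = residual_stack(layers, x)  # input survives via shortcuts

print("plain:   ", plain_out)
print("residual:", residual_out)
```

The zero-weight case is deliberately extreme; it only shows that the shortcut path keeps an identity route open regardless of what the layers themselves compute.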


Perspectives on High-Performance Computing in a Big Data World

Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing - HPDC '19 ◽

10.1145/3307681.3325410 ◽

2019 ◽

Author(s):

Geoffrey C. Fox

Keyword(s):

Big Data ◽

High Performance Computing ◽

High Performance ◽

Performance Computing


Computational storage: an efficient and scalable platform for big data and HPC applications

Journal Of Big Data ◽

10.1186/s40537-019-0265-5 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 2

Author(s):

Mahdi Torabzadehkashi ◽

Siavash Rezaei ◽

Ali HeydariGorji ◽

Hosein Bobarshad ◽

Vladimir Alves ◽

...

Keyword(s):

Big Data ◽

High Performance ◽

Distributed Processing ◽

Data Access ◽

Distributed Applications ◽

Process Data ◽

Storage Devices ◽

Hadoop Mapreduce ◽

Big Data Applications ◽

Application Processor

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data to the system. This cost has a direct relation with the distance of processing engines from the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move process closer to data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform, that provides a seamless environment to process data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for the applications. Thus, a vast spectrum of applications can be ported for running on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place without any modifications on the underlying distributed processing framework. For the proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in the distributed processing environments. The experimental results show up to 2.2× improvement in performance and 4.3× reduction in energy consumption, respectively, for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved up to 5.4× and 8.9×, respectively.
