Real-time stream processing for Big Data

2016 ◽  
Vol 58 (4) ◽  
Author(s):  
Wolfram Wingerath ◽  
Felix Gessert ◽  
Steffen Friedrich ◽  
Norbert Ritter

Abstract With the rise of Web 2.0 and the Internet of Things, it has become feasible to track all kinds of information over time, in particular fine-grained user activities as well as sensor data on users' environment and even their biometrics. However, while efficiency remains mandatory for any application trying to cope with huge amounts of data, only part of the potential of today's Big Data repositories can be exploited using traditional batch-oriented approaches, as the value of data often decays quickly and high latency becomes unacceptable in some applications. In the last couple of years, several distributed data processing systems have emerged that deviate from the batch-oriented approach and tackle data items as they arrive, thus acknowledging the growing importance of timeliness and velocity in Big Data analytics. In this article, we give an overview of the state of the art of stream processors for low-latency Big Data analytics and conduct a qualitative comparison of the most popular contenders, namely Storm and its abstraction layer Trident, Samza, and Spark Streaming. We describe their respective underlying rationales and the guarantees they provide, and discuss the trade-offs that come with selecting one of them for a particular task.
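To make the batch-versus-stream distinction concrete, the following minimal sketch (not from the article) shows the micro-batch model that Spark Streaming uses: incoming records are grouped into short, fixed-length batches and processed with batch-style operators. The socket source on localhost:9999 is an assumption for illustration only.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Assumed source: a plain text socket on localhost:9999 (e.g. started with `nc -lk 9999`).
sc = SparkContext("local[2]", "MicroBatchWordCount")
ssc = StreamingContext(sc, batchDuration=1)  # group incoming records into 1-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())  # split each line into words
               .map(lambda word: (word, 1))         # emit (word, 1) pairs
               .reduceByKey(lambda a, b: a + b))    # sum counts within each micro-batch
counts.pprint()                                     # print the result of every batch

ssc.start()
ssc.awaitTermination()
```

In contrast, Storm and Samza process each record individually as it arrives; the choice between per-record latency and micro-batch throughput is one of the trade-offs the article discusses.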

Author(s):  
Dawn E. Holmes

‘Big data analytics’ argues that big data is only valuable if we can extract useful information from it. It looks at some of the techniques used to discover useful information in big data, such as customer preferences or how fast an epidemic is spreading. Big data analytics is changing rapidly as dataset sizes increase and classical statistics makes room for this new paradigm. An example of big data analytics is the algorithmic method called MapReduce, a distributed data processing model that forms part of the core functionality of the Hadoop ecosystem. Amazon, Google, Facebook, and many others use Hadoop to store and process their data.
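As a point of reference (not from the chapter), the MapReduce pattern can be illustrated in a few lines of single-process Python: a map phase emits key-value pairs, a shuffle phase groups them by key, and a reduce phase aggregates each group. Hadoop runs the same three phases, but distributed across a cluster and backed by HDFS.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input split
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by their key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for a single key
    return key, sum(values)

documents = ["big data analytics", "big data at scale"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'analytics': 1, 'at': 1, 'scale': 1}
```

Because each map call and each reduce call is independent, the framework can spread them over many machines, which is what makes the approach suitable for web-scale datasets.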


2018 ◽  
Vol 15 (3) ◽  
Author(s):  
Blagoj Ristevski ◽  
Ming Chen

Abstract This paper surveys big data, highlighting big data analytics in medicine and healthcare. The big data characteristics value, volume, velocity, variety, veracity, and variability are described. Big data analytics in medicine and healthcare covers the integration and analysis of large amounts of complex heterogeneous data, such as various omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenomics, diseasomics), biomedical data, and electronic health record data. We underline the challenging issues of big data privacy and security. With regard to the big data characteristics, some directions for choosing suitable and promising open-source distributed data processing software platforms are given.


2019 ◽  
Vol 35 (4) ◽  
pp. 893-903 ◽  
Author(s):  
Seemu Sharma ◽  
Seema Bawa

Abstract Cultural data and information on the web are continuously increasing, evolving, and reshaping into big data due to globalization, digitization, and their vast exploration, as more people realize the importance of ancient values. Therefore, before these data become unwieldy and too complex to manage, their integration into big data repositories is essential. This article analyzes the complexity of the growing cultural data and presents a Cultural Big Data Repository as an efficient way to store and retrieve cultural big data. The repository is highly scalable and provides integrated high-performance methods for big data analytics in cultural heritage. Experimental results demonstrate that the proposed repository outperforms existing approaches in terms of space as well as storage and retrieval time for Cultural Big Data.


Author(s):  
Chien-Lung Chan ◽  
Chi-Chang Chang

Unlike most daily decisions, medical decision making often has substantial consequences and trade-offs. Recently, big data analytics techniques such as statistical analysis, data mining, machine learning, and deep learning have been applied to construct innovative decision models. With complex decision making, it can be difficult to comprehend and compare the benefits and risks of all available options before making a decision. For these reasons, this Special Issue focuses on the use of big data analytics and on forms of public health decision making based on decision models, spanning from theory to practice. A total of 64 submissions were carefully blind peer reviewed by at least two referees, and 23 papers were finally selected for this Special Issue.


10.2196/19540 ◽  
2020 ◽  
Vol 22 (5) ◽  
pp. e19540 ◽  
Author(s):  
Chi-Mai Chen ◽  
Hong-Wei Jyan ◽  
Shih-Chieh Chien ◽  
Hsiao-Hsuan Jen ◽  
Chen-Yang Hsu ◽  
...  

Background Low infection and case-fatality rates have thus far been observed in Taiwan. One of the reasons for this major success is the effective use of big data analytics in efficient contact tracing and in the management and surveillance of those who require quarantine and isolation. Objective We present here a unique application of big data analytics among Taiwanese people who had contact with more than 3000 passengers who disembarked at Keelung harbor in Taiwan for a one-day tour on January 31, 2020, five days before the outbreak of coronavirus disease (COVID-19) on the Diamond Princess cruise ship on February 5, 2020, after an index case was identified on January 20, 2020. Methods The smart contact tracing–based mobile sensor data, cross-validated with other big sensor surveillance data, were analyzed with a mobile geopositioning method and rapid analysis to identify 627,386 potential contact persons. Information on self-monitoring and self-quarantine was provided via SMS, and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) tests were offered to symptomatic contacts. National Health Insurance claims big data were linked in order to follow up on COVID-19-related outcomes among those who were hospitalized due to pneumonia and advised to undergo screening for SARS-CoV-2. Results As of February 29, a total of 67 contacts tested by reverse transcription–polymerase chain reaction were all negative, and no confirmed COVID-19 cases were found. Fewer cases of respiratory syndrome and pneumonia were found in the follow-up of the contact population compared with the general population up to March 10, 2020. Conclusions Big data analytics with smart contact tracing, automated alert messaging for self-restriction, and follow-up of COVID-19-related outcomes using health insurance data could curtail the resources required for conventional epidemiological contact tracing.


2019 ◽  
Vol 8 (3) ◽  
pp. 4384-4392

Big data is being generated in a wide variety of formats at an exponential rate. Big data analytics deals with processing and analyzing voluminous data to provide useful insights for guided decision making. Traditional data storage and management tools are not well equipped to handle big data and its applications. Apache Hadoop is a popular open-source platform that supports the storage and processing of extremely large datasets. For the purposes of big data analytics, the Hadoop ecosystem provides a variety of tools. However, there is a need to select the tool that is best suited to a specific requirement of big data analytics. Each tool has its own advantages and drawbacks relative to the others. Some of them have overlapping business use cases, yet they differ in critical functional areas. There is therefore a need to consider the trade-offs between usability and suitability while selecting a tool from the Hadoop ecosystem. This paper identifies the requirements for Big Data Analytics (BDA) and maps the tools of the Hadoop framework that are best suited to them. For this, we have categorized Hadoop tools according to their functionality and usage. Different Hadoop tools are discussed from the users’ perspective along with their pros and cons, if any. Also, for each identified category, a comparison of Hadoop tools based on important parameters is presented. The tools have been thoroughly studied and analyzed based on their suitability for the different requirements of big data analytics. A mapping of big data analytics requirements to the Hadoop tools has been established for use by data analysts and predictive modelers.
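For illustration (not taken from the paper), the sketch below shows how one Hadoop-ecosystem tool, Spark through its Python API, might serve a batch-analytics requirement over data stored in HDFS. The HDFS path and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Hypothetical requirement: summarize event records stored as Parquet files on HDFS.
spark = SparkSession.builder.appName("HadoopToolSelectionDemo").getOrCreate()

events = spark.read.parquet("hdfs:///data/events")    # hypothetical dataset path
daily_counts = (events.groupBy("event_date")           # hypothetical column name
                      .count()
                      .orderBy("event_date"))
daily_counts.show()

spark.stop()
```

A different requirement, such as ad hoc SQL over warehouse tables or low-latency key lookups, would point to a different tool (e.g. Hive or HBase), which is exactly the kind of requirement-to-tool mapping the paper sets out.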


2019 ◽  
Vol 8 (S3) ◽  
pp. 90-93
Author(s):  
K. Rohitha ◽  
V. Bhagyasree ◽  
K. Kusuma ◽  
S. Kokila

Big data analytics plays a major role in today’s industry, which increasingly relies on it for the analysis of previously collected data. Patient record keeping is very important for tracking a patient’s history, and decisions are made according to the patient’s previous records. Large volumes of data are created on a daily basis, and these data are used in the decision-making process. However, the healthcare industry has not yet realized the potential benefits of big data analytics. To address this need, four big data analytics capabilities were identified; in addition to these four, five more capabilities are proposed that provide practical insights for administrators. At the same time, data security plays a key role in the healthcare industry. To address this, a new architecture is proposed for implementing IoT and processing scalable sensor data in healthcare systems. This paper focuses on data security so that the potential capabilities and benefits of big data analytics can be exploited in a better way.

