A Real-Time Log Analyzer Based on MongoDB

Real-time log analysis on large-scale data is important for applications. Here, real-time refers to a UI latency within 100 ms. Techniques that efficiently support real-time analysis over large log data sets are therefore desirable. MongoDB provides good query performance, an aggregation framework, and a distributed architecture, which make it suitable for real-time data queries and massive log analysis. In this paper, a novel implementation approach for an event-driven file log analyzer is presented, and the performance of query, scan, and aggregation operations over MongoDB, HBase, and MySQL is compared. Our experimental results show that HBase has the most balanced performance across all operations, while MongoDB achieves query times under 10 ms in some operations, which makes it the most suitable for real-time applications.
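As a rough illustration of the kind of aggregation the abstract above refers to, the following sketch builds a MongoDB aggregation pipeline that counts recent log events per source. The field names (`level`, `timestamp`, `source`), the database and collection names, and the helper `build_error_count_pipeline` are hypothetical and are not taken from the paper; only the `$match`, `$group`, `$sort`, and `$limit` stages are standard MongoDB operators.

```python
from datetime import datetime, timedelta, timezone


def build_error_count_pipeline(since, level="ERROR", limit=10):
    """Build an aggregation pipeline counting log events per source.

    Field names are assumptions about a hypothetical log schema.
    """
    return [
        # Keep only events of the requested severity at or after `since`.
        {"$match": {"level": level, "timestamp": {"$gte": since}}},
        # Count the matching events per originating source.
        {"$group": {"_id": "$source", "count": {"$sum": 1}}},
        # Most frequent sources first.
        {"$sort": {"count": -1}},
        # Return only the top `limit` sources.
        {"$limit": limit},
    ]


if __name__ == "__main__":
    since = datetime.now(timezone.utc) - timedelta(minutes=5)
    pipeline = build_error_count_pipeline(since)
    # With PyMongo and a running server, the pipeline could be run as:
    #   from pymongo import MongoClient
    #   rows = MongoClient()["logs"]["events"].aggregate(pipeline)
    print(pipeline)
```

Building the pipeline as plain data keeps the sketch runnable without a database; whether such a query stays within the 100 ms budget depends on indexes and data volume, which the abstract does not detail.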


A Framework for International Collaboration on ITER Using Large-Scale Data Transfer to Enable Near-Real-Time Analysis

Fusion Science & Technology ◽

10.1080/15361055.2020.1851073 ◽

2021 ◽

Vol 77 (2) ◽

pp. 98-108

Author(s):

R. M. Churchill ◽

C. S. Chang ◽

J. Choi ◽

J. Wong ◽

S. Klasky ◽

...

Keyword(s):

Real Time ◽

International Collaboration ◽

Large Scale ◽

Data Transfer ◽

Time Analysis ◽

Real Time Analysis ◽

Large Scale Data ◽

Scale Data


Pattern Recognition for Large-Scale Data Processing

Strategic Data-Based Wisdom in the Big Data Era - Advances in Knowledge Acquisition, Transfer, and Management ◽

10.4018/978-1-4666-8122-4.ch011 ◽

2015 ◽

pp. 198-208 ◽

Cited By ~ 2

Author(s):

Amir Basirat ◽

Asad I. Khan ◽

Heinz W. Schmidt

Keyword(s):

Large Scale ◽

Distributed Processing ◽

Data Sets ◽

Distributed Data ◽

Time Data ◽

Deterministic Learning ◽

Large Scale Data ◽

Future Data ◽

Large Scale Data Processing ◽

Learning Schemes

One of the main challenges for large-scale computer clouds dealing with massive real-time data is coping with the rate at which unprocessed data accumulates. Transforming big data into valuable information requires a fundamental rethink of the way in which future data management models will need to be developed on the Internet. Unlike existing relational schemes, pattern-matching approaches can analyze data in ways similar to how our brain links information. Such interactions, when implemented in voluminous data clouds, can assist in finding overarching relations in complex and highly distributed data sets. In this chapter, a different perspective on data recognition is considered. Rather than looking at conventional approaches, such as statistical computations and deterministic learning schemes, this chapter focuses on a distributed processing approach for scalable data recognition and processing.


Let the Data Do the Talking: Hypothesis Discovery from Large-Scale Data Sets in Real Time

Data-Intensive Computing ◽

10.1017/cbo9780511844409.009 ◽

2012 ◽

pp. 235-257

Author(s):

Christopher Oehmen ◽

Scott Dowson ◽

Wes Hatley ◽

Justin Almquist ◽

Bobbie-Jo Webb-Robertson ◽

...

Keyword(s):

Real Time ◽

Large Scale ◽

Data Sets ◽

Large Scale Data ◽

Scale Data ◽

Large Scale Data Sets


A Dynamic Scaling Approach in Hadoop YARN

International Journal of Organizational and Collective Intelligence ◽

10.4018/ijoci.286176 ◽

2022 ◽

Vol 12 (2) ◽

pp. 0-0

Keyword(s):

Large Scale ◽

Distributed Processing ◽

Dynamic Scaling ◽

Data Sets ◽

Large Scale Data ◽

The People ◽

Big Data Applications ◽

Scaling Process ◽

And Performance

In cloud-based Big Data applications, Hadoop has been widely adopted for distributed processing of large-scale data sets. However, wasted energy in data centers remains an important research topic because of resource overuse and extra overhead costs. Dynamic scaling of resources in a Hadoop YARN cluster is a practical way to address this challenge. This paper proposes a dynamic scaling approach in Hadoop YARN (DSHYARN) that adds or removes nodes automatically based on workload. It is based on two algorithms (scaling up and scaling down) that automate the scaling process in the cluster. This article aims to ensure the energy efficiency and performance of Hadoop YARN clusters. To validate the effectiveness of DSHYARN, a case study with sentiment analysis of tweets about the COVID-19 vaccine is provided. The goal is to analyze tweets that people posted on Twitter. The results showed improvements in CPU utilization, RAM utilization, and job completion time. In addition, energy consumption was reduced by 16% under an average workload.
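The abstract above does not spell out the two scaling algorithms, so the following is only a minimal sketch of a workload-based scale-up/scale-down decision, not the DSHYARN method itself. The thresholds, the use of the larger of CPU and RAM utilization as the load signal, and the function name `scaling_decision` are all assumptions made for illustration.

```python
def scaling_decision(cpu_util, ram_util, nodes,
                     min_nodes=1, max_nodes=10, high=0.80, low=0.30):
    """Return the node count to use after one scaling step.

    Assumed policy (not from the paper): add a node when the busier
    resource exceeds `high`, remove one when it falls below `low`,
    and always stay within [min_nodes, max_nodes].
    """
    load = max(cpu_util, ram_util)
    if load > high and nodes < max_nodes:
        return nodes + 1  # scale up
    if load < low and nodes > min_nodes:
        return nodes - 1  # scale down
    return nodes  # workload is in the comfortable band


if __name__ == "__main__":
    print(scaling_decision(0.90, 0.50, nodes=3))
    print(scaling_decision(0.10, 0.20, nodes=3))
```

Changing the cluster by one node per step is a deliberately conservative choice in this sketch; a real controller would also need a cool-down period so that it does not oscillate around a threshold.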


Research and Implementation of the Secure Database-Update Mechanism

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.1752 ◽

2014 ◽

Vol 513-517 ◽

pp. 1752-1755 ◽

Cited By ~ 1

Author(s):

Chun Liu ◽

Kun Tan

Keyword(s):

Real Time ◽

Large Scale ◽

Data Transfer ◽

Time Data ◽

Safety Critical ◽

Large Scale Data ◽

Data Transfer Protocol ◽

Safety And Reliability ◽

Computer Based ◽

Short Time

For a safety-critical computer, large-scale data such as a database that must be transferred almost instantly cannot be voted on directly. This paper proposes a database update algorithm for safety-critical computers based on a status vote, which votes on the database status instead of the database itself. This algorithm solves the problem of voting on too much data in a short time and compares the database versions of different modules in real time. A Markov model is built to calculate the safety and reliability of this algorithm. The results show that this algorithm meets the update requirements of safety-critical computers.

1. Communication protocol for database update

1.1 TFTP protocol

TFTP is a simple protocol for transferring files. It is usually implemented on top of UDP, but TFTP does not mandate a specific underlying protocol and can be implemented over TCP in special cases. The protocol is designed for small file transfers, so it lacks many functions that FTP usually provides: it can only read files from or write files to the server, and it cannot list directories or authenticate users. It transfers 8-bit bytes of data in one of three modes: netascii, an eight-bit ASCII form; octet, raw eight-bit source data; and mail, which is no longer supported and returned the data directly to a user rather than saving it as a file.

1.2 SRTP Ethernet security real-time data transfer protocol
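The status-vote idea from the abstract above, voting on a compact database status rather than on the database contents, can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: representing the status as a `(version, SHA-256 digest)` pair, the strict-majority rule, and the names `database_status` and `vote_status` are all hypothetical.

```python
import hashlib
from collections import Counter


def database_status(version, payload):
    """Summarize a database copy as a small, comparable status tuple.

    Assumption: a version number plus a digest of the contents is
    enough to tell two copies apart.
    """
    return (version, hashlib.sha256(payload).hexdigest())


def vote_status(statuses):
    """Return the status reported by a strict majority, or None."""
    if not statuses:
        return None
    status, votes = Counter(statuses).most_common(1)[0]
    return status if votes > len(statuses) // 2 else None


if __name__ == "__main__":
    good = database_status(7, b"route table v7")
    stale = database_status(6, b"route table v6")
    # Two of three modules agree, so their status wins the vote.
    print(vote_status([good, good, stale]) == good)
```

Each module exchanges only a few dozen bytes per vote regardless of the database size, which is the property the abstract relies on to keep voting within a short time window.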


The joint contribution of participation and performance to learning functions: Exploring the effects of age in large-scale data sets

Behavior Research Methods ◽

10.3758/s13428-018-1128-2 ◽

2018 ◽

Vol 51 (4) ◽

pp. 1531-1543 ◽

Cited By ~ 5

Author(s):

Mark Steyvers ◽

Aaron S. Benjamin

Keyword(s):

Large Scale ◽

Data Sets ◽

Large Scale Data ◽

Learning Functions ◽

And Performance ◽

Joint Contribution ◽

Scale Data ◽

Large Scale Data Sets


Real-time volume splatter for large-scale data sets

10.1117/12.526164 ◽

2004 ◽

Author(s):

Jiawan Zhang ◽

Jizhou Sun ◽

Xiaotu Li ◽

Mingchu Li ◽

Xiaobing Sun ◽

...

Keyword(s):

Real Time ◽

Large Scale ◽

Data Sets ◽

Large Scale Data ◽

Scale Data ◽

Large Scale Data Sets


HydroGFD3.0: a 25 km global near real-time updated precipitation and temperature data set

10.5194/essd-2020-236 ◽

2020 ◽

Cited By ~ 1

Author(s):

Peter Berg ◽

Fredrik Almén ◽

Denica Bozhinova

Keyword(s):

Real Time ◽

Large Scale ◽

Hydrological Modeling ◽

Horizontal Resolution ◽

Maximum Temperature ◽

Historical Period ◽

Data Sets ◽

Monthly Precipitation ◽

Time Data ◽

Data Set

Abstract. HydroGFD (Hydrological Global Forcing Data) is a data set of bias-adjusted reanalysis data for daily precipitation, and minimum, mean, and maximum temperature. It is mainly intended for large-scale hydrological modeling, but is also suitable for other impact modeling. The data set has an almost global land area coverage, excluding the Antarctic continent, at a horizontal resolution of 0.25°, i.e. about 25 km. It is available for the complete ERA5 reanalysis time period, currently 1979 until five days ago. This period will be extended back to 1950 once the back catalogue of ERA5 is available. The historical period is adjusted using global gridded observational data sets, and to acquire real-time data, a collection of several reference data sets is used. Consistency in time is attempted by relying on a background climatology, and only making use of anomalies from the different data sets. Precipitation is adjusted for mean bias as well as the number of wet days in a month. The latter relies on a calibrated statistical method with input only of the monthly precipitation anomaly, such that no additional input data about the number of wet days is necessary. The daily mean temperature is adjusted toward the monthly mean of the observations, and applied to 1 h timesteps of the ERA5 reanalysis. Daily mean, minimum and maximum temperature are then calculated. The performance of the HydroGFD3 data set is on par with other similar products, although there are significant differences in different parts of the globe, especially where observations are uncertain. Further, HydroGFD3 tends to have higher precipitation extremes, partly due to its higher spatial resolution. In this paper, we present the methodology, evaluation results, and how to access the data set at https://doi.org/10.5281/zenodo.3871707.


Using data-driven approaches to improve delivery of animal health care interventions for public health

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2003722118 ◽

2021 ◽

Vol 118 (5) ◽

pp. e2003722118

Author(s):

Stella Mazeri ◽

Jordana L. Burdon Bailey ◽

Dagmar Mayer ◽

Patrick Chikungwa ◽

Julius Chulu ◽

...

Keyword(s):

Real Time ◽

Vaccination Coverage ◽

Large Scale ◽

Animal Health ◽

Vaccination Campaign ◽

Sub Saharan Africa ◽

Data Driven ◽

Time Data ◽

Real Time Analysis ◽

Vaccination Campaigns

Rabies kills ∼60,000 people per year. Annual vaccination of at least 70% of dogs has been shown to eliminate rabies in both human and canine populations. However, delivery of large-scale mass dog vaccination campaigns remains a challenge in many rabies-endemic countries. In sub-Saharan Africa, where the vast majority of dogs are owned, mass vaccination campaigns have typically depended on a combination of static point (SP) and door-to-door (D2D) approaches since SP-only campaigns often fail to achieve 70% vaccination coverage. However, D2D approaches are expensive, labor-intensive, and logistically challenging, raising the need to develop approaches that increase attendance at SPs. Here, we report a real-time, data-driven approach to improve efficiency of an urban dog vaccination campaign. Historically, we vaccinated ∼35,000 dogs in Blantyre city, Malawi, every year over a 20-d period using combined fixed SP (FSP) and D2D approaches. To enhance cost effectiveness, we used our historical vaccination dataset to define the barriers to FSP attendance. Guided by these insights, we redesigned our vaccination campaign by increasing the number of FSPs and eliminating the expensive and labor-intensive D2D component. Combined with roaming SPs, whose locations were defined through the real-time analysis of vaccination coverage data, this approach resulted in the vaccination of near-identical numbers of dogs in only 11 d. This approach has the potential to act as a template for successful and sustainable future urban SP-only dog vaccination campaigns.


HydroGFD3.0 (Hydrological Global Forcing Data): a 25 km global precipitation and temperature data set updated in near-real time

Earth System Science Data ◽

10.5194/essd-13-1531-2021 ◽

2021 ◽

Vol 13 (4) ◽

pp. 1531-1545

Author(s):

Peter Berg ◽

Fredrik Almén ◽

Denica Bozhinova

Keyword(s):

Real Time ◽

Large Scale ◽

Horizontal Resolution ◽

Precipitation Anomaly ◽

Maximum Temperature ◽

Historical Period ◽

Data Sets ◽

Monthly Precipitation ◽

Time Data ◽

Data Set

Abstract. HydroGFD3 (Hydrological Global Forcing Data) is a data set of bias-adjusted reanalysis data for daily precipitation and minimum, mean, and maximum temperature. It is mainly intended for large-scale hydrological modelling but is also suitable for other impact modelling. The data set has an almost global land area coverage, excluding the Antarctic continent and small islands, at a horizontal resolution of 0.25°, i.e. about 25 km. It is available for the complete ERA5 reanalysis time period, currently 1979 until 5 d ago. This period will be extended back to 1950 once the back catalogue of ERA5 is available. The historical period is adjusted using global gridded observational data sets, and to acquire real-time data, a collection of several reference data sets is used. Consistency in time is attempted by relying on a background climatology and only making use of anomalies from the different data sets. Precipitation is adjusted for mean bias as well as the number of wet days in a month. The latter relies on a calibrated statistical method with input only of the monthly precipitation anomaly such that no additional input data about the number of wet days are necessary. The daily mean temperature is adjusted toward the monthly mean of the observations and applied to 1 h time steps of the ERA5 reanalysis. Daily mean, minimum, and maximum temperature are then calculated. The performance of the HydroGFD3 data set is on par with other similar products, although there are significant differences in different parts of the globe, especially where observations are uncertain. Further, HydroGFD3 tends to have higher precipitation extremes, partly due to its higher spatial resolution. In this paper, we present the methodology, evaluation results, and how to access the data set at https://doi.org/10.5281/zenodo.3871707 (Berg et al., 2020).
