A Real-Time Log Analyzer Based on MongoDB

2014 ◽  
Vol 571-572 ◽  
pp. 497-501 ◽  
Author(s):  
Qi Lv ◽  
Wei Xie

Real-time log analysis on large-scale data is important for applications. Specifically, real-time here refers to UI latency within 100 ms. Therefore, techniques that efficiently support real-time analysis over large log data sets are desired. MongoDB provides good query performance, an aggregation framework, and a distributed architecture, which make it suitable for real-time data query and massive log analysis. In this paper, a novel implementation approach for an event-driven file log analyzer is presented, and the performance of query, scan, and aggregation operations over MongoDB, HBase, and MySQL is compared. Our experimental results show that HBase delivers the most balanced performance across all operations, while MongoDB achieves query times below 10 ms in some operations, making it the most suitable for real-time applications.
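
The abstract does not show the analyzer's queries, so the following is only a minimal sketch of what a log aggregation over MongoDB might look like. The field names `level`, `timestamp`, and `source` and the function name are hypothetical; only the `$match`, `$group`, and `$sort` stages are standard MongoDB aggregation operators.

```python
from datetime import datetime


def error_count_pipeline(since, level="ERROR"):
    """Build an aggregation pipeline that counts log entries per source.

    The document fields (level, timestamp, source) are assumptions made
    for illustration; they are not taken from the paper.
    """
    return [
        # Keep only entries of the requested level logged at or after `since`.
        {"$match": {"level": level, "timestamp": {"$gte": since}}},
        # Count the matching entries for each log source.
        {"$group": {"_id": "$source", "count": {"$sum": 1}}},
        # Put the noisiest sources first.
        {"$sort": {"count": -1}},
    ]


pipeline = error_count_pipeline(datetime(2014, 1, 1))
```

With a driver such as PyMongo, a pipeline of this shape would be passed to `collection.aggregate(pipeline)`; whether it meets a 100 ms budget depends on indexes and data volume, which this sketch does not address.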

2021 ◽  
Vol 77 (2) ◽  
pp. 98-108
Author(s):  
R. M. Churchill ◽  
C. S. Chang ◽  
J. Choi ◽  
J. Wong ◽  
S. Klasky ◽  
...  

Author(s):  
Amir Basirat ◽  
Asad I. Khan ◽  
Heinz W. Schmidt

One of the main challenges for large-scale computer clouds dealing with massive real-time data is in coping with the rate at which unprocessed data is being accumulated. Transforming big data into valuable information requires a fundamental re-think of the way in which future data management models will need to be developed on the Internet. Unlike the existing relational schemes, pattern-matching approaches can analyze data in similar ways to which our brain links information. Such interactions when implemented in voluminous data clouds can assist in finding overarching relations in complex and highly distributed data sets. In this chapter, a different perspective of data recognition is considered. Rather than looking at conventional approaches, such as statistical computations and deterministic learning schemes, this chapter focuses on distributed processing approach for scalable data recognition and processing.


2012 ◽  
pp. 235-257
Author(s):  
Christopher Oehmen ◽  
Scott Dowson ◽  
Wes Hatley ◽  
Justin Almquist ◽  
Bobbie-Jo Webb-Robertson ◽  
...  

In cloud-based big data applications, Hadoop has been widely adopted for distributed processing of large-scale data sets. However, energy waste in data centers remains an important research topic because of resource overuse and extra overhead costs. Dynamic scaling of resources in a Hadoop YARN cluster is a practical way to address this challenge. This paper proposes a dynamic scaling approach for Hadoop YARN (DSHYARN) that adds or removes nodes automatically based on workload. It is based on two algorithms (scaling up and scaling down) that automate the scaling process in the cluster. The aim is to ensure the energy efficiency and performance of Hadoop YARN clusters. To validate the effectiveness of DSHYARN, a case study on sentiment analysis of tweets about the COVID-19 vaccine is provided; the goal is to analyze tweets that people posted on Twitter. The results show improvements in CPU utilization, RAM utilization, and job completion time. In addition, energy consumption was reduced by 16% under an average workload.
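
The abstract does not give the two scaling algorithms, so the sketch below is not DSHYARN itself. It only illustrates the general idea of a workload-based scale-up/scale-down decision, assuming a single CPU-utilization signal and hypothetical thresholds and node limits.

```python
def scaling_decision(cpu_utilization, active_nodes, min_nodes=1, max_nodes=10,
                     scale_up_threshold=0.80, scale_down_threshold=0.30):
    """Return the node-count change (+1, -1, or 0) for one control step.

    All thresholds and limits are illustrative assumptions, not values
    from the paper.
    """
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be between 0 and 1")
    # Scale up: the cluster is busy and there is room to add a node.
    if cpu_utilization > scale_up_threshold and active_nodes < max_nodes:
        return 1
    # Scale down: the cluster is mostly idle and can give up a node.
    if cpu_utilization < scale_down_threshold and active_nodes > min_nodes:
        return -1
    # Otherwise keep the current size.
    return 0
```

Keeping a gap between the two thresholds is a common way to avoid adding and removing the same node on consecutive steps.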


2014 ◽  
Vol 513-517 ◽  
pp. 1752-1755 ◽  
Author(s):  
Chun Liu ◽  
Kun Tan

For a safety-critical computer, large-scale data such as a database, which has to be transferred in a very short time, cannot be voted on directly. This paper proposes a database update algorithm for safety-critical computers based on status voting, which votes on the database status instead of the database itself. This algorithm solves the problem of voting on too much data in a short time and compares the database versions of different modules in real time. A Markov model is built to calculate the safety and reliability of this algorithm. The results show that this algorithm meets the update requirements of a safety-critical computer.

1. Communication protocol for database update

1.1 TFTP protocol

TFTP is a simple protocol for transferring files. It is usually implemented over UDP, but TFTP does not mandate a specific underlying protocol and can be implemented over TCP in special cases. The protocol is designed for small file transfers, so it lacks many functions that FTP usually provides; it can only read a file from or write a file to the server, and it cannot list directories or authenticate users. It transfers 8-bit data in one of three modes: netascii, the 8-bit ASCII form; octet, the 8-bit raw source data; and mail, which is no longer supported and returns the data directly to the user rather than saving it to a file.

1.2 SRTP Ethernet security real-time data transfer protocol
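
Returning to the TFTP modes described in section 1.1, the following minimal sketch shows how a read or write request carries the file name and transfer mode. The packet layout follows the public TFTP specification (RFC 1350) rather than the excerpt above, and the function name and example file name are hypothetical.

```python
import struct

TFTP_OPCODE_RRQ = 1  # read request: fetch a file from the server
TFTP_OPCODE_WRQ = 2  # write request: store a file on the server

# "mail" is obsolete, so this sketch only accepts the two modes still in use.
SUPPORTED_MODES = ("netascii", "octet")


def build_request(opcode, filename, mode="octet"):
    """Build a TFTP RRQ/WRQ packet: opcode, filename, NUL, mode, NUL."""
    if opcode not in (TFTP_OPCODE_RRQ, TFTP_OPCODE_WRQ):
        raise ValueError("opcode must be RRQ (1) or WRQ (2)")
    mode = mode.lower()
    if mode not in SUPPORTED_MODES:
        raise ValueError("unsupported transfer mode: " + mode)
    if "\0" in filename:
        raise ValueError("filename must not contain a NUL byte")
    # The opcode is a 2-byte big-endian integer; both strings are NUL-terminated.
    return (struct.pack("!H", opcode)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


packet = build_request(TFTP_OPCODE_RRQ, "db.bin")
```

The request contains no credentials and no directory operation, which matches the description of TFTP as a protocol that can only read or write a named file.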


2004 ◽  
Author(s):  
Jiawan Zhang ◽  
Jizhou Sun ◽  
Xiaotu Li ◽  
Mingchu Li ◽  
Xiaobing Sun ◽  
...  

2020 ◽  
Author(s):  
Peter Berg ◽  
Fredrik Almén ◽  
Denica Bozhinova

Abstract. HydroGFD (Hydrological Global Forcing Data) is a data set of bias adjusted reanalysis data for daily precipitation, and minimum, mean, and maximum temperature. It is mainly intended for large scale hydrological modeling, but is also suitable for other impact modeling. The data set has an almost global land area coverage, excluding the Antarctic continent, at a horizontal resolution of 0.25°, i.e. about 25 km. It is available for the complete ERA5 reanalysis time period; currently 1979 until five days ago. This period will be extended back to 1950 once the back catalogue of ERA5 is available. The historical period is adjusted using global gridded observational data sets, and to acquire real-time data, a collection of several reference data sets is used. Consistency in time is attempted by relying on a background climatology, and only making use of anomalies from the different data sets. Precipitation is adjusted for mean bias as well as the number of wet days in a month. The latter is relying on a calibrated statistical method with input only of the monthly precipitation anomaly, such that no additional input data about the number of wet days is necessary. The daily mean temperature is adjusted toward the monthly mean of the observations, and applied to 1 h timesteps of the ERA5 reanalysis. Daily mean, minimum and maximum temperature are then calculated. The performance of the HydroGFD3 data set is on par with other similar products, although there are significant differences in different parts of the globe, especially where observations are uncertain. Further, HydroGFD3 tends to have higher precipitation extremes, partly due to its higher spatial resolution. In this paper, we present the methodology, evaluation results, and how to access the data set at https://doi.org/10.5281/zenodo.3871707.


2021 ◽  
Vol 118 (5) ◽  
pp. e2003722118
Author(s):  
Stella Mazeri ◽  
Jordana L. Burdon Bailey ◽  
Dagmar Mayer ◽  
Patrick Chikungwa ◽  
Julius Chulu ◽  
...  

Rabies kills ∼60,000 people per year. Annual vaccination of at least 70% of dogs has been shown to eliminate rabies in both human and canine populations. However, delivery of large-scale mass dog vaccination campaigns remains a challenge in many rabies-endemic countries. In sub-Saharan Africa, where the vast majority of dogs are owned, mass vaccination campaigns have typically depended on a combination of static point (SP) and door-to-door (D2D) approaches since SP-only campaigns often fail to achieve 70% vaccination coverage. However, D2D approaches are expensive, labor-intensive, and logistically challenging, raising the need to develop approaches that increase attendance at SPs. Here, we report a real-time, data-driven approach to improve efficiency of an urban dog vaccination campaign. Historically, we vaccinated ∼35,000 dogs in Blantyre city, Malawi, over a 20-d period each year using combined fixed SP (FSP) and D2D approaches. To enhance cost effectiveness, we used our historical vaccination dataset to define the barriers to FSP attendance. Guided by these insights, we redesigned our vaccination campaign by increasing the number of FSPs and eliminating the expensive and labor-intensive D2D component. Combined with roaming SPs, whose locations were defined through the real-time analysis of vaccination coverage data, this approach resulted in the vaccination of near-identical numbers of dogs in only 11 d. This approach has the potential to act as a template for successful and sustainable future urban SP-only dog vaccination campaigns.


2021 ◽  
Vol 13 (4) ◽  
pp. 1531-1545
Author(s):  
Peter Berg ◽  
Fredrik Almén ◽  
Denica Bozhinova

Abstract. HydroGFD3 (Hydrological Global Forcing Data) is a data set of bias-adjusted reanalysis data for daily precipitation and minimum, mean, and maximum temperature. It is mainly intended for large-scale hydrological modelling but is also suitable for other impact modelling. The data set has an almost global land area coverage, excluding the Antarctic continent and small islands, at a horizontal resolution of 0.25°, i.e. about 25 km. It is available for the complete ERA5 reanalysis time period, currently 1979 until 5 d ago. This period will be extended back to 1950 once the back catalogue of ERA5 is available. The historical period is adjusted using global gridded observational data sets, and to acquire real-time data, a collection of several reference data sets is used. Consistency in time is attempted by relying on a background climatology and only making use of anomalies from the different data sets. Precipitation is adjusted for mean bias as well as the number of wet days in a month. The latter is relying on a calibrated statistical method with input only of the monthly precipitation anomaly such that no additional input data about the number of wet days are necessary. The daily mean temperature is adjusted toward the monthly mean of the observations and applied to 1 h time steps of the ERA5 reanalysis. Daily mean, minimum, and maximum temperature are then calculated. The performance of the HydroGFD3 data set is on par with other similar products, although there are significant differences in different parts of the globe, especially where observations are uncertain. Further, HydroGFD3 tends to have higher precipitation extremes, partly due to its higher spatial resolution. In this paper, we present the methodology, evaluation results, and how to access the data set at https://doi.org/10.5281/zenodo.3871707 (Berg et al., 2020).
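
The temperature step described above can be illustrated with a small sketch. It assumes a purely additive correction (one constant offset per month) and plain Python lists of hourly values; the abstract does not state the exact form of the adjustment, and the function names are hypothetical.

```python
def adjust_to_monthly_mean(hourly_temps, observed_monthly_mean):
    """Shift hourly temperatures so their mean equals the observed monthly mean.

    The additive offset is an assumption for illustration only.
    """
    model_mean = sum(hourly_temps) / len(hourly_temps)
    offset = observed_monthly_mean - model_mean
    return [t + offset for t in hourly_temps]


def daily_statistics(hourly_temps, steps_per_day=24):
    """Return a (mean, minimum, maximum) tuple for each complete day."""
    stats = []
    for start in range(0, len(hourly_temps) - steps_per_day + 1, steps_per_day):
        day = hourly_temps[start:start + steps_per_day]
        stats.append((sum(day) / len(day), min(day), max(day)))
    return stats


# Two synthetic days of hourly values (0..23 repeated), shifted to a 13.5 mean.
hourly = [float(h % 24) for h in range(48)]
adjusted = adjust_to_monthly_mean(hourly, observed_monthly_mean=13.5)
print(daily_statistics(adjusted))
```

Applying the offset at the hourly level before computing the daily values keeps the daily mean, minimum, and maximum mutually consistent.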

