Improving Lookup and Query Execution Performance in Distributed Big Data Systems using Cuckoo Filter

Author(s):  
Sharafat Ibn Mollah Mosharraf ◽  
Muhammad Abdullah Adnan

Abstract: Performance is a critical concern when reading and writing data among billions of records stored in a Big Data warehouse. We identify two opportunities for improving query performance. The first is to improve the performance of lookup queries after data deletion in Big Data systems that use eventual consistency; we propose a scheme that uses a Cuckoo Filter for this purpose. The second is to avoid unnecessary network round trips to remote nodes in a distributed Big Data cluster when those nodes are known not to hold the requested partition of data. We propose a scheme in which probabilistic filters are consulted before querying remote nodes, so that queries that would return no data are never sent over the network. We evaluate our schemes with Cassandra using a real dataset and show that each scheme can improve the performance of lookup queries by up to 100%.
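The abstract does not describe how the filters are wired into Cassandra, so the following is only a minimal sketch of the general idea, not the authors' implementation. It shows a small cuckoo filter, which supports deleting items (a standard Bloom filter does not), and a hypothetical helper `nodes_to_query` that consults one filter per node before any remote query is issued. The class name, parameters, node names and key format are all assumptions made for illustration.

```python
import hashlib
import random


class CuckooFilter:
    """Illustrative cuckoo filter: stores 16-bit fingerprints, supports delete."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate index
        # in range and makes it its own inverse.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, key: str) -> int:
        # Non-zero fingerprint in the range 1..65535.
        return self._hash(b"fp:" + key.encode()) % 65535 + 1

    def _indexes(self, key: str, fp: int):
        i1 = self._hash(key.encode()) % self.num_buckets
        return i1, self._alt_index(i1, fp)

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other bucket depends only on the
        # current bucket and the fingerprint, never on the original key.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.num_buckets

    def insert(self, key: str) -> bool:
        fp = self._fingerprint(key)
        i1, i2 = self._indexes(key, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            victim = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][victim] = self.buckets[i][victim], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Treated as full; in this simple sketch the last evicted
        # fingerprint is dropped when this happens.
        return False

    def contains(self, key: str) -> bool:
        fp = self._fingerprint(key)
        i1, i2 = self._indexes(key, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, key: str) -> bool:
        # Only delete keys that were actually inserted; otherwise a colliding
        # fingerprint belonging to another key could be removed.
        fp = self._fingerprint(key)
        for i in self._indexes(key, fp):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


def nodes_to_query(partition_key: str, node_filters: dict) -> list:
    """Hypothetical helper: keep only nodes whose filter may hold the key."""
    return [node for node, f in node_filters.items() if f.contains(partition_key)]


# Hypothetical two-node cluster with one filter per node.
node_filters = {"node-a": CuckooFilter(), "node-b": CuckooFilter()}
node_filters["node-a"].insert("user:1")
node_filters["node-b"].insert("user:3")
candidates = nodes_to_query("user:1", node_filters)
```

A filter answering "no" is always correct, so skipping that node is safe; a "yes" may be a false positive, in which case the remote query is simply sent as it would have been without the filter. When a record is deleted, calling `delete` on the corresponding filter lets later lookups for that key be rejected locally.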

2016 ◽  
Vol 14 (37) ◽  
pp. 23-44
Author(s):  
Sonia Ordóñez Salinas ◽  
Alba Consuelo Nieto Lemus

Until recently, analytical data was the concern of the Data Warehouse, but the need to analyze new types of unstructured data, both repetitive and non-repetitive, gave rise to Big Data. Although the subject has been widely studied, no reference architecture is available for Big Data systems that process large volumes of raw data, aggregated and non-aggregated. There are no complete proposals for managing the data lifecycle, no standardized terminology, and still less a methodology supporting the design and development of such an architecture. Existing architectures are small-scale, industrial and product-oriented; they limit their scope to solutions for a single company or group of companies and focus on technology while omitting functionality. This paper explores the requirements for formulating an architectural model that supports the analysis and management of structured data and of repetitive and non-repetitive unstructured data. It reviews existing architectural proposals, of industrial or technological type, in order to propose a logical model of a multi-layered tiered architecture that aims to meet requirements covering both Data Warehouse and Big Data.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Mohammed Anouar Naoui ◽  
Brahim Lejdel ◽  
Mouloud Ayad ◽  
Abdelfattah Amamra ◽  
Okba Kazar

Purpose: The purpose of this paper is to propose a distributed deep learning architecture for smart cities in big data systems.
Design/methodology/approach: We propose a multilayer architecture to describe distributed deep learning for smart cities in big data systems. The components of our system are the Smart city layer, the big data layer, and the deep learning layer. The Smart city layer is responsible for the Smart city components, its Internet of Things, sensors and effectors, and their integration into the system. The big data layer concerns data characteristics and the distribution of data over the system. The deep learning layer is the model of our system and is responsible for data analysis.
Findings: We apply our proposed architecture to a Smart environment and to Smart energy. For the Smart environment, we study toluene forecasting in the Madrid Smart city. For Smart energy, we study wind energy forecasting in Australia. Our proposed architecture can reduce execution time and improve the deep learning model, such as Long Short-Term Memory.
Research limitations/implications: This research needs the application of other deep learning models, such as convolutional neural networks and autoencoders.
Practical implications: The findings of this research will be helpful in Smart city architecture. They can provide a clear view into a Smart city, its data storage, and its data analysis. Toluene forecasting in a Smart environment can help decision-makers ensure environmental safety. The Smart energy application of our proposed model can give a clear prediction of power generation.
Originality/value: The findings of this study are expected to contribute valuable information to decision-makers for a better understanding of the key to Smart city architecture and its relation to data storage, processing, and data analysis.

