Big Data Query Engines

The amount of data in our industry and the world is exploding, and data is being collected and stored at unprecedented rates. The challenge is not only to store and manage this vast volume of data, known as big data, but also to analyze and query it. To propose a universal method for answering mobile big data queries, queries over massive numbers of mobile objects in space are separated and grouped by query type. The grid-based indexing method for grouping mobile objects (GG TPR-tree) manages a large number of mobile objects within a limited area very efficiently, but used alone it meets only part of the requirements of mobile big data queries. This thesis offers solutions to simple immediate queries, simple continuous queries, active window queries, continuous window queries, dynamic condition queries, and other query requests by employing the DTDI index structure. The experiments show that, with the support of the DTDI index structure, queries over massive numbers of mobile objects achieve higher precision and better performance.
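
The idea of grouping mobile objects by grid cell and answering a window query can be illustrated with a minimal sketch. It is not the GG TPR-tree or the DTDI structure from the abstract, whose details are not given here; it is a plain uniform grid built for one snapshot time, and the names `MovingObject`, `GridIndex`, `build_index`, and `window_query` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MovingObject:
    """A mobile object with a reference position and a constant velocity (assumed model)."""
    oid: str
    x: float
    y: float
    vx: float
    vy: float

    def position_at(self, t: float) -> Tuple[float, float]:
        # Linear motion: position = reference position + velocity * time.
        return (self.x + self.vx * t, self.y + self.vy * t)


class GridIndex:
    """Uniform grid over object positions at one fixed snapshot time."""

    def __init__(self, cell_size: float, snapshot_time: float) -> None:
        self.cell_size = cell_size
        self.snapshot_time = snapshot_time
        self.cells: Dict[Tuple[int, int], List[MovingObject]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, obj: MovingObject) -> None:
        px, py = obj.position_at(self.snapshot_time)
        self.cells[self._cell(px, py)].append(obj)

    def window_query(self, xmin: float, ymin: float,
                     xmax: float, ymax: float) -> List[str]:
        """Return sorted ids of objects inside the window at the snapshot time."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        # Visit only the cells that overlap the window, then filter exactly.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj in self.cells.get((cx, cy), []):
                    px, py = obj.position_at(self.snapshot_time)
                    if xmin <= px <= xmax and ymin <= py <= ymax:
                        hits.append(obj.oid)
        return sorted(hits)


def build_index(objects: Iterable[MovingObject], cell_size: float,
                snapshot_time: float) -> GridIndex:
    index = GridIndex(cell_size, snapshot_time)
    for obj in objects:
        index.insert(obj)
    return index
```

In this sketch a continuous query would simply rebuild the index for each new snapshot time; a structure designed for moving objects would avoid that cost, which is the kind of problem the indexes named in the abstract address.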

Exploiting soft and hard correlations in big data query optimization

Proceedings of the VLDB Endowment ◽

10.14778/2994509.2994519 ◽

2016 ◽

Vol 9 (12) ◽

pp. 1005-1016 ◽

Cited By ~ 7

Author(s):

Hai Liu ◽

Dongqing Xiao ◽

Pankaj Didwania ◽

Mohamed Y. Eltabakh

Keyword(s):

Big Data ◽

Query Optimization ◽

Data Query

Using Model Driven Engineering to transform Big Data query languages to MapReduce jobs

International Journal of Computing and Digital Systems ◽

10.12785/ijcds/100160 ◽

2021 ◽

Vol 10 (1) ◽

pp. 619-628

Author(s):

Allae Erraissi

Keyword(s):

Big Data ◽

Query Languages ◽

Model Driven Engineering ◽

Model Driven ◽

Data Query

Analysis and Research on Big Data Query Technology Based on NOSQL Database

2018 4th World Conference on Control, Electronics and Computer Engineering (WCCECE 2018) ◽

10.25236/wccece.2018.02 ◽

2018 ◽

Keyword(s):

Big Data ◽

Data Query ◽

NoSQL Database

Business-oriented customized big data query system and its SQL parser design and implementation

MATEC Web of Conferences ◽

10.1051/matecconf/201823201004 ◽

2018 ◽

Vol 232 ◽

pp. 01004

Author(s):

Wenshuai Ge ◽

Gang He ◽

Xinwen Liu

Keyword(s):

Big Data ◽

User Behavior ◽

The Internet ◽

Query Interface ◽

User Behavior Analysis ◽

Design And Implementation ◽

Data Query ◽

Query System ◽

Internet User Behavior ◽

Analysis Platform

This paper proposes a big data query system for customized queries based on specific business needs and introduces the components and structure of the system. ANTLR is used as the language recognizer to design and implement a customized SQL dialect. The system builds a simpler, easier-to-use query interface on top of Spark SQL, which satisfies the query requirements of an Internet user behavior analysis platform.
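
A rough sketch of what a business-oriented dialect on top of Spark SQL might look like is given below. The paper's actual grammar is not reproduced in this listing, so the dialect (`COUNT USERS WHERE ... LAST n DAYS`), the table `events`, and the columns `user_id` and `event_date` are all hypothetical. The sketch also swaps ANTLR for a single regular expression to stay self-contained; only `date_sub` and `current_date` are real Spark SQL functions.

```python
import re

# Hypothetical one-statement dialect: COUNT USERS WHERE <condition> LAST <n> DAYS
_PATTERN = re.compile(
    r"^\s*COUNT\s+USERS\s+WHERE\s+(?P<cond>.+?)\s+LAST\s+(?P<days>\d+)\s+DAYS\s*$",
    re.IGNORECASE,
)


def translate(query: str) -> str:
    """Rewrite the hypothetical dialect into a Spark SQL string."""
    match = _PATTERN.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    days = int(match.group("days"))
    # Table and column names below are assumptions made for illustration.
    return (
        "SELECT COUNT(DISTINCT user_id) FROM events "
        f"WHERE ({match.group('cond')}) "
        f"AND event_date >= date_sub(current_date(), {days})"
    )
```

The returned string could then be passed to a Spark session for execution. A grammar-based recognizer such as ANTLR becomes worthwhile once the dialect grows beyond what one pattern can express.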

Self-adaptive Based Model for Ambiguity Resolution of The Linked Data Query for Big Data Analytics

International Journal of Integrated Engineering ◽

10.30880/ijie.2018.10.06.025 ◽

2018 ◽

Vol 10 (6) ◽

Author(s):

Nurfadhlina Mohd Sharef ◽

Yasser M. Shafazand ◽

Mohd Zakree Ahmad Nazri ◽

Nor Azura Husin ◽

...

Keyword(s):

Big Data ◽

Ambiguity Resolution ◽

Linked Data ◽

Data Analytics ◽

Big Data Analytics ◽

Data Query ◽

Self Adaptive

Large-Scale Data Mining and Distributed Processing in Big Data Internet

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.989-994.4594 ◽

2014 ◽

Vol 989-994 ◽

pp. 4594-4597

Author(s):

Chun Zhi Xing

Keyword(s):

Data Mining ◽

Big Data ◽

Decision Tree ◽

Large Scale ◽

Distributed Processing ◽

Processing Method ◽

Decision Tree Algorithm ◽

Data Query ◽

Large Scale Data ◽

Scale Data

With the development of the Internet, various kinds of Internet-based large-scale data face increasing competition. To satisfy the need for data query, it is necessary to use data mining and distributed processing. Accordingly, this paper proposes a large-scale data mining and distributed processing method based on a decision tree algorithm.

Research of CouchDB Storage Plugin for Big Data Query Engine Apache Drill

Big Data - Communications in Computer and Information Science ◽

10.1007/978-981-15-1899-7_16 ◽

2019 ◽

pp. 224-239

Author(s):

Yulei Liao ◽

Liang Tan

Keyword(s):

Big Data ◽

Data Query ◽

Query Engine

Big Data Query Optimization - Literature Survey

10.21203/rs.3.rs-655386/v1 ◽

2021 ◽

Author(s):

Anuja S. ◽

Malathy C.

Keyword(s):

Big Data ◽

Query Optimization ◽

Small Businesses ◽

Large Data ◽

Efficient Manner ◽

Raw Data ◽

Data Query ◽

Big Data Applications ◽

Large Databases ◽

Private And Public

In today's world, most private and public sector organizations deal with massive amounts of raw data, which holds information and knowledge in its hidden layers. In addition, the format, scale, variety, and velocity of the generated data make it more difficult to use algorithms efficiently. This complexity necessitates sophisticated methods, strategies, and algorithms to solve the challenges of managing raw data. Big data query optimization (BDQO) requires businesses to define, diagnose, forecast, prescribe, and cognize hidden growth opportunities, and it guides them toward achieving market value. BDQO uses advanced analytical methods to extract information from a rapidly growing volume of data, which reduces the difficulty of the decision-making process. Hadoop, Apache Hive, NoSQL, MapReduce, and HPCC are the technologies used in big data applications to manage large data. It is less costly to consume data for query processing because big data provides scalability. However, small businesses will never be able to query large databases: joining tables with millions of tuples could take hours. Parallelism, which addresses the problem by using more processors, may be a potential solution. Unfortunately, small businesses operating on a shoestring budget cannot afford it. There are many techniques to tackle the problem. The technologies used in the big data query optimization process are discussed in depth in this paper.

PoBery: Possibly-complete Big Data Queries with Probabilistic Data Placement and Scanning

ACM/IMS Transactions on Data Science ◽

10.1145/3465375 ◽

2021 ◽

Vol 2 (3) ◽

pp. 1-28

Author(s):

Jie Song ◽

Qiang He ◽

Feifei Chen ◽

Ye Yuan ◽

Ge Yu

Keyword(s):

Big Data ◽

Query Processing ◽

State Of The Art ◽

Data Placement ◽

Probabilistic Data ◽

Trade Off ◽

Query Performance ◽

Data Query ◽

Query Efficiency ◽

The Given

In big data query processing, there is a trade-off between query accuracy and query efficiency; for example, sampling query approaches trade off query completeness for efficiency. In this article, we argue that query performance can be significantly improved by slightly lowering the possibility of query completeness, that is, the chance that a query is complete. To quantify this possibility, we define a new concept, Probability of query Completeness (hereinafter referred to as PC). For example, if a query is executed 100 times, PC = 0.95 guarantees that there are no more than 5 incomplete results among the 100 results. Leveraging probabilistic data placement and scanning, we trade off PC for query performance. We propose PoBery (POssibly-complete Big data quERY), a method that supports neither complete queries nor incomplete queries, but possibly-complete queries. Experiments conducted on HiBench show that PoBery can significantly accelerate queries while ensuring the PC. Specifically, it is guaranteed that the percentage of complete queries is larger than the given PC confidence. Through comparison with state-of-the-art key-value stores, we show that while Drill-based PoBery performs as fast as Drill on complete queries, it is 1.7×, 1.1×, and 1.5× faster on average than Drill, Impala, and Hive, respectively, on possibly-complete queries.
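
The PC definition above can be made concrete with a small sketch. The first two functions only measure PC empirically over repeated runs, as in the 100-run example. The last two use a toy model that is an assumption of this sketch and not PoBery's placement scheme: each of the `relevant_records` a query needs is placed independently and uniformly at random in one of `total_blocks` blocks, and the query scans only `scanned_blocks` of them. All function names are hypothetical.

```python
from typing import Sequence


def empirical_pc(complete_flags: Sequence[bool]) -> float:
    """Fraction of query runs whose result was complete."""
    if not complete_flags:
        raise ValueError("need at least one query run")
    return sum(complete_flags) / len(complete_flags)


def meets_pc(complete_flags: Sequence[bool], target_pc: float) -> bool:
    """True if the observed share of complete runs reaches the target PC."""
    return empirical_pc(complete_flags) >= target_pc


def toy_completeness_probability(scanned_blocks: int, total_blocks: int,
                                 relevant_records: int) -> float:
    """Toy model (assumption): every relevant record must land in a scanned block."""
    if not 0 <= scanned_blocks <= total_blocks or total_blocks <= 0:
        raise ValueError("require 0 <= scanned_blocks <= total_blocks, total_blocks > 0")
    return (scanned_blocks / total_blocks) ** relevant_records


def toy_min_blocks_for_pc(target_pc: float, total_blocks: int,
                          relevant_records: int) -> int:
    """Smallest number of blocks to scan so the toy-model PC reaches the target."""
    for k in range(total_blocks + 1):
        if toy_completeness_probability(k, total_blocks, relevant_records) >= target_pc:
            return k
    return total_blocks
```

Under the toy model, scanning fewer blocks lowers the completeness probability, which is the general shape of the trade-off the abstract describes; the actual guarantees depend on the placement and scanning scheme defined in the paper.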
