PoBery: Possibly-complete Big Data Queries with Probabilistic Data Placement and Scanning

Big data query processing involves a trade-off between query accuracy and query efficiency; for example, sampling-based query approaches trade query completeness for efficiency. In this article, we argue that query performance can be significantly improved by slightly reducing the possibility of query completeness, that is, the chance that a query is complete. To quantify this possibility, we define a new concept, Probability of query Completeness (hereinafter PC). For example, if a query is executed 100 times, PC = 0.95 guarantees that no more than 5 of the 100 results are incomplete. Leveraging probabilistic data placement and scanning, we trade off PC for query performance. We propose PoBery (POssibly-complete Big data quERY), a method that supports neither complete queries nor incomplete queries, but possibly-complete queries. Experimental results on HiBench show that PoBery can significantly accelerate queries while ensuring the PC. Specifically, the percentage of complete queries is guaranteed to be larger than the given PC confidence. Through comparison with state-of-the-art key-value stores, we show that while Drill-based PoBery performs as fast as Drill on complete queries, it is 1.7×, 1.1×, and 1.5× faster on average than Drill, Impala, and Hive, respectively, on possibly-complete queries.
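To make the notion of PC concrete, here is a minimal sketch under a deliberately simplified model that is not taken from the paper: each of the k records matching a query is assumed to land independently, with probability p, in the partition that the query actually scans, and the query counts as complete only if all k records are there. The function names `analytic_pc` and `simulate_pc` and the parameters `p`, `k`, and `runs` are hypothetical, chosen only for illustration.

```python
import random


def analytic_pc(p: float, k: int) -> float:
    """PC under the toy model: all k matching records must be in the scanned partition."""
    return p ** k


def simulate_pc(p: float, k: int, runs: int, seed: int = 0) -> float:
    """Estimate PC empirically as the fraction of simulated queries that are complete."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    complete = 0
    for _ in range(runs):
        # A run is complete only if every matching record was placed
        # in the scanned partition (assumed independent placement).
        if all(rng.random() < p for _ in range(k)):
            complete += 1
    return complete / runs


if __name__ == "__main__":
    p, k = 0.99, 5
    print("analytic PC:", analytic_pc(p, k))
    print("simulated PC:", simulate_pc(p, k, runs=20000))
```

Under this toy model, p = 0.99 and k = 5 give a PC of roughly 0.951, so about 95 of every 100 executions would be expected to return a complete result. How PoBery itself places and scans data is described in the full paper, not here.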

Spatial Big Data Query Processing System Supporting SQL-based Query Language in Hadoop

The Journal of Korea Institute of Information Electronics and Communication Technology ◽

10.17661/jkiiect.2017.10.1.1 ◽

2017 ◽

Vol 10 (1) ◽

pp. 1-8

Author(s):

In-Hak Joo

Keyword(s):

Big Data ◽

Query Processing ◽

Query Language ◽

Processing System ◽

Data Query ◽

Spatial Big Data

Materialized view selection using evolutionary algorithm for speeding up big data query processing

Journal of Intelligent Information Systems ◽

10.1007/s10844-017-0455-6 ◽

2017 ◽

Vol 49 (3) ◽

pp. 407-433 ◽

Cited By ~ 6

Author(s):

Rajib Goswami ◽

D. K. Bhattacharyya ◽

Malayananda Dutta

Keyword(s):

Big Data ◽

Query Processing ◽

Evolutionary Algorithm ◽

View Selection ◽

Materialized View ◽

Data Query ◽

Materialized View Selection

Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data ◽

10.1145/3318464.3380584 ◽

2020 ◽

Cited By ~ 3

Author(s):

Tarique Siddiqui ◽

Alekh Jindal ◽

Shi Qiao ◽

Hiren Patel ◽

Wangchao Le

Keyword(s):

Big Data ◽

Query Processing ◽

Cost Models ◽

Data Query

Public Sentiment Big Data Query Processing and Optimization with Unified Storage of Source and Meta Data

Journal of Physics Conference Series ◽

10.1088/1742-6596/1828/1/012116 ◽

2021 ◽

Vol 1828 (1) ◽

pp. 012116

Author(s):

Donglei Yan ◽

Jiaxin Li ◽

Shengnan Lei ◽

Junri Tang ◽

Kaiqi Kou ◽

...

Keyword(s):

Big Data ◽

Query Processing ◽

Meta Data ◽

Data Query ◽

Public Sentiment ◽

Query Processing And Optimization

Digitalization in the system of anti-corruption measures

Public Administration ◽

10.22394/2070-8378-2020-22-5-51-55 ◽

2020 ◽

Vol 22 (5) ◽

pp. 51-55

Author(s):

OLEG N. KORCHAGIN ◽

ANASTASIA V. LYADSKAYA

Keyword(s):

Big Data ◽

Law Enforcement ◽

Public Administration ◽

Network Architecture ◽

Digital Technologies ◽

Current State ◽

Foreign Countries ◽

Separate Area ◽

Private Business Sector

The article examines the current state of digitalization aimed at solving urgent problems of combating corruption in public administration and the private business sector. The work considers the experience of foreign countries and the influence of digital technologies on the fight against corruption. It notes that the digitalization of public administration is becoming one of the decisive factors in increasing the efficiency of the anti-corruption system and improving management mechanisms. Big Data, if integrated and structured according to the given parameters, allows legislative, law enforcement, and control and supervisory activities to be carried out reliably and transparently. Big Data tools make it possible to analyze processes, identify dependencies, and predict corruption risks. The author describes the most significant problems that complicate the transfer of offline technologies into the online environment. The paper analyzes promising directions for the development of digital technologies that would help solve these problems and accomplish tasks that previously seemed unreachable. The article also describes current developments in collecting and managing large amounts of data, the “Internet of Things”, modern network architecture, and other advances in IT, and provides applied examples of their potential use in combating corruption. The study argues that, in the context of combating corruption, digitalization should be treated as a separate area of activity that is controlled and regulated by the state.

An Efficient, Secure, and Queryable Encryption for NoSQL-Based Databases Hosted on Untrusted Cloud Environments

International Journal of Information Security and Privacy ◽

10.4018/ijisp.2019040102 ◽

2019 ◽

Vol 13 (2) ◽

pp. 14-31

Author(s):

Mamdouh Alenezi ◽

Muhammad Usama ◽

Khaled Almustafa ◽

Waheed Iqbal ◽

Muhammad Ali Raza ◽

...

Keyword(s):

Query Processing ◽

State Of The Art ◽

Data Communication ◽

Nosql Databases ◽

High Concern ◽

Cloud Environments ◽

High Scalability ◽

Secure Query Processing ◽

Security Concern

NoSQL-based databases are attractive for storing and managing big data, mainly due to their high scalability and data modeling flexibility. However, security in NoSQL-based databases is weak, which raises concerns for users. Specifically, the security of data at rest is a major concern for users who have deployed their NoSQL-based solutions on the cloud, because unauthorized access to the servers will easily expose the data. There have been some efforts to enable encryption of data at rest for NoSQL databases. However, existing solutions do not support secure query processing or secure data communication over the Internet, and their performance is also poor. In this article, the authors address the security of NoSQL data at rest by introducing a system that is capable of dynamically encrypting and decrypting data, supporting secure query processing, and integrating seamlessly with any NoSQL-based database. The proposed solution is based on a combination of chaotic encryption and Order Preserving Encryption (OPE). The experimental evaluation showed excellent results when the solution was integrated with MongoDB and compared with existing state-of-the-art work.

UBAR

ACM Transactions on Embedded Computing Systems ◽

10.1145/3441644 ◽

2021 ◽

Vol 20 (3) ◽

pp. 1-25

Author(s):

Elham Shamsa ◽

Alma Pröbstl ◽

Nima TaheriNejad ◽

Anil Kanduri ◽

Samarjit Chakraborty ◽

...

Keyword(s):

Resource Management ◽

Quality Of Experience ◽

State Of The Art ◽

State Of Charge ◽

User Preference ◽

Management Approach ◽

High Quality ◽

Trade Off ◽

Run Time

Smartphone users require high Battery Cycle Life (BCL) and high Quality of Experience (QoE) during their usage. These two objectives can be conflicting based on the user preference at run-time. Finding the best trade-off between QoE and BCL requires an intelligent resource management approach that considers and learns user preference at run-time. Current approaches focus on one of these two objectives and neglect the other, limiting their efficiency in meeting users’ needs. In this article, we present UBAR, User- and Battery-aware Resource management, which considers dynamic workload, user preference, and user plug-in/out pattern at run-time to provide a suitable trade-off between BCL and QoE. UBAR personalizes this trade-off by learning the user’s habits and using that to satisfy QoE, while considering battery temperature and State of Charge (SOC) pattern to maximize BCL. The evaluation results show that UBAR achieves 10% to 40% improvement compared to the existing state-of-the-art approaches.

Performance vs. hardware requirements in state-of-the-art automatic speech recognition

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00217-4 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Alexandru-Lucian Georgescu ◽

Alessandro Pappalardo ◽

Horia Cucu ◽

Michaela Blott

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

State Of The Art ◽

Decision Makers ◽

Computing Power ◽

Trade Off ◽

Speech Features ◽

Commercial Applications ◽

Guided Tour ◽

Embedded Applications

The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, which modeled hand-crafted speech features with probabilistic frameworks and generated phone posteriors, to end-to-end (E2E) systems, which translate the raw waveform directly into words using one deep neural network (DNN). Transcription accuracy increased greatly, leading to ASR technology being integrated into many commercial applications. However, few of the existing ASR technologies are suitable for integration into embedded applications, due to their hard constraints on computing power and memory usage. This overview paper serves as a guided tour through the recent literature on speech recognition and compares the most popular ASR implementations. The comparison emphasizes the trade-off between ASR performance and hardware requirements, to help decision makers choose the system that best fits their embedded application. To the best of our knowledge, this is the first study to provide this kind of trade-off analysis for state-of-the-art ASR systems.

A novel optimal multi-pattern matching method with wildcards for DNA sequence

Technology and Health Care ◽

10.3233/thc-218012 ◽

2021 ◽

Vol 29 ◽

pp. 115-124

Author(s):

Xinlu Wang ◽

Ahmed A.F. Saif ◽

Dayou Liu ◽

Yungang Zhu ◽

Jon Atli Benediktsson

Keyword(s):

Dna Sequence ◽

Pattern Matching ◽

Health Informatics ◽

State Of The Art ◽

Machine Language ◽

Data Sets ◽

Fundamental Issue ◽

Matching Method ◽

Dna Sequence Alignment

BACKGROUND: DNA sequence alignment is one of the most fundamental and important operations for identifying which gene family may contain a given sequence, and pattern matching for DNA sequences has been a fundamental issue in biomedical engineering, biotechnology, and health informatics. OBJECTIVE: To solve this problem, this study proposes an optimal multi-pattern matching method with wildcards for DNA sequences. METHODS: The proposed method packs the patterns and a sliding window of the text; the window slides along the given packed text and is matched against the stored packed patterns. RESULTS: Three data sets were used to test the performance of the proposed algorithm, and the algorithm was found to be more efficient than its competitors because its operation is close to machine language. CONCLUSIONS: Theoretical analysis and experimental results both demonstrate that the proposed method outperforms state-of-the-art methods and is especially effective for DNA sequences.
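The general idea of packing patterns and a sliding text window can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: it assumes a 2-bit encoding per base, uses `N` as the wildcard symbol, assumes the text contains only A, C, G, and T, and scans the text once per pattern. The names `CODE`, `pack_pattern`, and `find_matches` are hypothetical.

```python
# Assumed 2-bit encoding of the four bases.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_pattern(pattern: str) -> tuple[int, int]:
    """Pack a pattern into (value, mask); wildcard 'N' positions get mask bits 00."""
    value = 0
    mask = 0
    for ch in pattern:
        value <<= 2
        mask <<= 2
        if ch != "N":
            value |= CODE[ch]
            mask |= 0b11  # this position must match exactly
    return value, mask


def find_matches(text: str, patterns: list[str]) -> dict[str, list[int]]:
    """Return, for each pattern, the start offsets where it matches the text."""
    hits: dict[str, list[int]] = {p: [] for p in patterns}
    for p in patterns:
        m = len(p)
        if m == 0 or m > len(text):
            continue
        value, mask = pack_pattern(p)
        window_mask = (1 << (2 * m)) - 1  # keep only the last m bases
        window = 0
        for i, ch in enumerate(text):
            # Slide the packed window one base to the right.
            window = ((window << 2) | CODE[ch]) & window_mask
            # Compare whole windows with one XOR and one AND.
            if i >= m - 1 and (window ^ value) & mask == 0:
                hits[p].append(i - m + 1)
    return hits


if __name__ == "__main__":
    print(find_matches("ACGTACGA", ["ACG", "GNA"]))
```

Because each comparison reduces to an XOR and an AND on packed integers, a whole window is checked in a couple of machine-level operations, which is the kind of efficiency the abstract attributes to operating close to machine language.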

Accelerating organic solar cell material's discovery: high-throughput screening and big data

Energy & Environmental Science ◽

10.1039/d1ee00559f ◽

2021 ◽

Author(s):

Xabier Rodríguez-Martínez ◽

Enrique Pascual-San-José ◽

Mariano Campoy-Quiles

Keyword(s):

Machine Learning ◽

Big Data ◽

High Throughput ◽

Organic Solar Cells ◽

High Throughput Screening ◽

Organic Solar Cell ◽

State Of The Art ◽

Review Article ◽

Machine Learning Algorithms ◽

Device Optimization

This review article presents the state-of-the-art in high-throughput computational and experimental screening routines with application in organic solar cells, including materials discovery, device optimization and machine-learning algorithms.
