scholarly journals PoBery: Possibly-complete Big Data Queries with Probabilistic Data Placement and Scanning

2021 ◽  
Vol 2 (3) ◽  
pp. 1-28
Author(s):  
Jie Song ◽  
Qiang He ◽  
Feifei Chen ◽  
Ye Yuan ◽  
Ge Yu

In big data query processing, there is a trade-off between query accuracy and query efficiency, for example, sampling query approaches trade-off query completeness for efficiency. In this article, we argue that query performance can be significantly improved by slightly losing the possibility of query completeness, that is, the chance that a query is complete. To quantify the possibility, we define a new concept, Probability of query Completeness (hereinafter referred to as PC). For example, If a query is executed 100 times, PC = 0.95 guarantees that there are no more than 5 incomplete results among 100 results. Leveraging the probabilistic data placement and scanning, we trade off PC for query performance. In the article, we propose PoBery (POssibly-complete Big data quERY), a method that supports neither complete queries nor incomplete queries, but possibly-complete queries. The experimental results conducted on HiBench prove that PoBery can significantly accelerate queries while ensuring the PC. Specifically, it is guaranteed that the percentage of complete queries is larger than the given PC confidence. Through comparison with state-of-the-art key-value stores, we show that while Drill-based PoBery performs as fast as Drill on complete queries, it is 1.7 ×, 1.1 ×, and 1.5 × faster on average than Drill, Impala, and Hive, respectively, on possibly-complete queries.

2021 ◽  
Vol 1828 (1) ◽  
pp. 012116
Author(s):  
Donglei Yan ◽  
Jiaxin Li ◽  
Shengnan Lei ◽  
Junri Tang ◽  
Kaiqi Kou ◽  
...  

2020 ◽  
Vol 22 (5) ◽  
pp. 51-55
Author(s):  
OLEG N. KORCHAGIN ◽  
◽  
ANASTASIA V. LYADSKAYA ◽  

The article is devoted to the current state of digitalization aimed at solving urgent problems of combating corruption in the field of public administration and private business sector. The work considers the experience of foreign countries and the influence of digital technologies on the fight against corruption. It is noted that the digitalization of public administration is becoming one of the decisive factors for increasing the efficiency of the anti-corruption system and improving management mechanisms. Big Data, if integrated and structured according to the given parameters, allows the implementation of legislative, law enforcement, control and supervisory and law enforcement activities reliably and transparently. Big Data tools allow us to analyze processes, identify dependencies and predict corruption risks. The author describes the most significant problems that complicate the transfer of offline technologies into the online environment. The paper analyzes promising directions for the development of digital technologies that would lead to solving the arising problems, as well as to implement tasks that previously seemed unreachable. The article also describes current developments in the field of collecting and managing large amounts of data, the “Internet of Things”, modern network architecture, and other advances in the field of IT; the work provides applied examples of their potential use in the field of combating corruption. The study gives reasons that, in the context of combating corruption, digitalization should be allocated in a separate area of activity that is controlled and regulated by the state.


2019 ◽  
Vol 13 (2) ◽  
pp. 14-31
Author(s):  
Mamdouh Alenezi ◽  
Muhammad Usama ◽  
Khaled Almustafa ◽  
Waheed Iqbal ◽  
Muhammad Ali Raza ◽  
...  

NoSQL-based databases are attractive to store and manage big data mainly due to high scalability and data modeling flexibility. However, security in NoSQL-based databases is weak which raises concerns for users. Specifically, security of data at rest is a high concern for the users deployed their NoSQL-based solutions on the cloud because unauthorized access to the servers will expose the data easily. There have been some efforts to enable encryption for data at rest for NoSQL databases. However, existing solutions do not support secure query processing, and data communication over the Internet and performance of the proposed solutions are also not good. In this article, the authors address NoSQL data at rest security concern by introducing a system which is capable to dynamically encrypt/decrypt data, support secure query processing, and seamlessly integrate with any NoSQL- based database. The proposed solution is based on a combination of chaotic encryption and Order Preserving Encryption (OPE). The experimental evaluation showed excellent results when integrated the solution with MongoDB and compared with the state-of-the-art existing work.


2021 ◽  
Vol 20 (3) ◽  
pp. 1-25
Author(s):  
Elham Shamsa ◽  
Alma Pröbstl ◽  
Nima TaheriNejad ◽  
Anil Kanduri ◽  
Samarjit Chakraborty ◽  
...  

Smartphone users require high Battery Cycle Life (BCL) and high Quality of Experience (QoE) during their usage. These two objectives can be conflicting based on the user preference at run-time. Finding the best trade-off between QoE and BCL requires an intelligent resource management approach that considers and learns user preference at run-time. Current approaches focus on one of these two objectives and neglect the other, limiting their efficiency in meeting users’ needs. In this article, we present UBAR, User- and Battery-aware Resource management, which considers dynamic workload, user preference, and user plug-in/out pattern at run-time to provide a suitable trade-off between BCL and QoE. UBAR personalizes this trade-off by learning the user’s habits and using that to satisfy QoE, while considering battery temperature and State of Charge (SOC) pattern to maximize BCL. The evaluation results show that UBAR achieves 10% to 40% improvement compared to the existing state-of-the-art approaches.


Author(s):  
Alexandru-Lucian Georgescu ◽  
Alessandro Pappalardo ◽  
Horia Cucu ◽  
Michaela Blott

AbstractThe last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, that modeled hand-crafted speech features with probabilistic frameworks and generated phone posteriors, to end-to-end (E2E) systems, that translate the raw waveform directly into words using one deep neural network (DNN). The transcription accuracy greatly increased, leading to ASR technology being integrated into many commercial applications. However, few of the existing ASR technologies are suitable for integration in embedded applications, due to their hard constrains related to computing power and memory usage. This overview paper serves as a guided tour through the recent literature on speech recognition and compares the most popular ASR implementations. The comparison emphasizes the trade-off between ASR performance and hardware requirements, to further serve decision makers in choosing the system which fits best their embedded application. To the best of our knowledge, this is the first study to provide this kind of trade-off analysis for state-of-the-art ASR systems.


2021 ◽  
Vol 29 ◽  
pp. 115-124
Author(s):  
Xinlu Wang ◽  
Ahmed A.F. Saif ◽  
Dayou Liu ◽  
Yungang Zhu ◽  
Jon Atli Benediktsson

BACKGROUND: DNA sequence alignment is one of the most fundamental and important operation to identify which gene family may contain this sequence, pattern matching for DNA sequence has been a fundamental issue in biomedical engineering, biotechnology and health informatics. OBJECTIVE: To solve this problem, this study proposes an optimal multi pattern matching with wildcards for DNA sequence. METHODS: This proposed method packs the patterns and a sliding window of texts, and the window slides along the given packed text, matching against stored packed patterns. RESULTS: Three data sets are used to test the performance of the proposed algorithm, and the algorithm was seen to be more efficient than the competitors because its operation is close to machine language. CONCLUSIONS: Theoretical analysis and experimental results both demonstrate that the proposed method outperforms the state-of-the-art methods and is especially effective for the DNA sequence.


Author(s):  
Xabier Rodríguez-Martínez ◽  
Enrique Pascual-San-José ◽  
Mariano Campoy-Quiles

This review article presents the state-of-the-art in high-throughput computational and experimental screening routines with application in organic solar cells, including materials discovery, device optimization and machine-learning algorithms.


Sign in / Sign up

Export Citation Format

Share Document