MapReduce Algorithm for Variants of Skyline Queries: Skyband and Dominating Queries

Algorithms ◽  
2019 ◽  
Vol 12 (8) ◽  
pp. 166
Author(s):  
Md. Anisuzzaman Siddique ◽  
Hao Tian ◽  
Mahboob Qaosar ◽  
Yasuhiko Morimoto

The skyline query and its variants are useful in the early stages of a knowledge-discovery process. They select a set of important objects that are better than the other, common objects in the dataset. To handle big data, such knowledge-discovery queries must be computed in parallel distributed environments. In this paper, we consider an efficient parallel algorithm for the “K-skyband query” and the “top-k dominating query”, which are popular variants of the skyline query. We propose a method for computing both queries simultaneously in MapReduce, a popular parallel distributed framework for processing “big data” problems. Our extensive evaluation results validate the effectiveness and efficiency of the proposed algorithm on both real and synthetic datasets.
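
To make the two query types concrete, here is a minimal single-machine sketch of their usual definitions. It is a brute-force illustration, not the MapReduce algorithm proposed in the paper. It assumes smaller values are better in every dimension, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse everywhere and strictly better somewhere.

    Assumption for this sketch: smaller is better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_skyband(points: List[Point], K: int) -> List[Point]:
    """Points dominated by fewer than K other points (K = 1 is the skyline)."""
    return [p for p in points if sum(dominates(q, p) for q in points) < K]


def top_k_dominating(points: List[Point], k: int) -> List[Point]:
    """The k points that dominate the largest number of other points."""
    ranked = sorted(points, key=lambda p: -sum(dominates(p, q) for q in points))
    return ranked[:k]


points = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(k_skyband(points, 1))         # ordinary skyline
print(k_skyband(points, 2))         # 2-skyband
print(top_k_dominating(points, 2))  # two strongest dominators
```

Both functions compare every pair of points, so they run in quadratic time; a parallel algorithm such as the one described above is needed for large datasets.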

Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 500 ◽  
Author(s):  
Ping Sun ◽  
Caimei Liang ◽  
Guohui Li ◽  
Ling Yuan

This paper aims to answer “why-not” questions in skyline queries based on the orthogonal query range (i.e., ORSQ). These queries retrieve skyline points within a rectangular query range, which improves query efficiency. Answering why-not questions in ORSQ can help users analyze query results and make decisions. We discuss the causes of why-not questions in ORSQ. Then, we outline how to modify the why-not point and the orthogonal query range so that the why-not point is included in the result of the skyline query based on the orthogonal range. When the why-not point is in the orthogonal range, we show how to modify the why-not point and narrow the orthogonal range. We also present how to expand the orthogonal range when the why-not point is not in the orthogonal range. We effectively combine query refinement and data modification techniques to produce meaningful answers. The experimental results demonstrate that the proposed algorithms produce high-quality explanations for why-not questions in ORSQ on both real and synthetic datasets.
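
The idea of a skyline restricted to a rectangular range, and of asking why a point is missing from it, can be sketched as follows. This is a brute-force illustration that only diagnoses the cause; it does not implement the paper's refinement or modification strategies. It assumes smaller values are better, and the function names and return labels are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Smaller is better (an assumption of this sketch)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def in_range(p: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> bool:
    """True if p lies inside the axis-aligned rectangle [lo, hi]."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def range_skyline(points: List[Point], lo: Point, hi: Point) -> List[Point]:
    """Skyline of the points that fall inside the orthogonal query range."""
    inside = [p for p in points if in_range(p, lo, hi)]
    return [p for p in inside if not any(dominates(q, p) for q in inside)]


def why_not(points: List[Point], lo: Point, hi: Point, p: Point):
    """Report why p is absent from the range skyline, with the blocking points."""
    if not in_range(p, lo, hi):
        return "outside_range", []
    blockers = [q for q in points if in_range(q, lo, hi) and dominates(q, p)]
    return ("dominated", blockers) if blockers else ("in_result", [])


points = [(1, 5), (2, 2), (5, 1), (3, 3), (6, 6)]
print(range_skyline(points, (2, 1), (5, 5)))
print(why_not(points, (2, 1), (5, 5), (3, 3)))
print(why_not(points, (2, 1), (5, 5), (1, 5)))
```

The two non-trivial outcomes correspond to the two cases in the abstract: a point outside the range calls for expanding the range, while a dominated point inside it calls for modifying the point or narrowing the range.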


2021 ◽  
Vol 14 (11) ◽  
pp. 2244-2257
Author(s):  
Otmar Ertl

MinHash and HyperLogLog are sketching algorithms that have become indispensable for set summaries in big data applications. While HyperLogLog allows counting different elements with very little space, MinHash is suitable for the fast comparison of sets as it allows estimating the Jaccard similarity and other joint quantities. This work presents a new data structure called SetSketch that is able to continuously fill the gap between both use cases. Its commutative and idempotent insert operation and its mergeable state make it suitable for distributed environments. Fast, robust, and easy-to-implement estimators for cardinality and joint quantities, as well as the ability to use SetSketch for similarity search, enable versatile applications. The presented joint estimator can also be applied to other data structures such as MinHash, HyperLogLog, or Hyper-MinHash, where it even performs better than the corresponding state-of-the-art estimators in many cases.
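
As background for the MinHash side of this comparison, the following is a minimal sketch of classic MinHash and its Jaccard estimator. It is not SetSketch. The seeded BLAKE2b hashing scheme, the signature length of 256, and all function names are assumptions made for illustration.

```python
import hashlib
from typing import Iterable, List


def _hash(item: str, seed: int) -> int:
    """64-bit hash of item under a given seed (hypothetical scheme for this sketch)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items: Iterable[str], num_hashes: int = 256) -> List[int]:
    """One minimum per seeded hash function; items must be non-empty."""
    items = list(items)
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def merge(sig_a: List[int], sig_b: List[int]) -> List[int]:
    """Signature of the union: element-wise minimum, so the state is mergeable."""
    return [min(a, b) for a, b in zip(sig_a, sig_b)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching components estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


a = [str(i) for i in range(100)]
b = [str(i) for i in range(50, 150)]
# True Jaccard similarity of a and b is 50 / 150 = 1/3.
print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

Because each component is a minimum, inserting an element twice or in a different order leaves the signature unchanged, which is the commutative and idempotent behaviour the abstract highlights for distributed use.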


Author(s):  
Xingxing Xiao ◽  
Jianzhong Li

Nowadays, big data is coming to the fore in many applications. Processing a skyline query on big data in more than linear time is far too expensive, and often even linear time may be too slow. It is obviously not possible to compute an exact solution to a skyline query in sublinear time, since an exact solution may itself have linear size. Fortunately, in many situations, a fast approximate solution is more useful than a slower exact solution. This paper proposes two sampling-based approximate algorithms for processing skyline queries. The first algorithm obtains a fixed-size sample and computes the approximate skyline on it. The error of the algorithm is not only relatively small in most cases, but is also almost unaffected by the input size. The second algorithm efficiently returns an [Formula: see text]-approximation for the exact skyline. In practice, the running time of the algorithm is independent of the input size, achieving the goal of sublinearity on big data. Experiments verify the error analysis of the first algorithm, and show that the second is much faster than the existing skyline algorithms.
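
The first idea, computing the skyline on a fixed-size sample, can be sketched as below. The abstract does not specify the sampling scheme, so uniform sampling without replacement is an assumption here, as are the function names and the convention that smaller values are better.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Smaller is better (an assumption of this sketch)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Exact skyline by pairwise comparison (quadratic in len(points))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def sampled_skyline(points: List[Point], sample_size: int, seed: int = 0) -> List[Point]:
    """Approximate skyline: exact skyline of a fixed-size uniform sample.

    The cost of the skyline step depends on sample_size, not on len(points).
    """
    rng = random.Random(seed)
    if len(points) <= sample_size:
        sample = list(points)
    else:
        sample = rng.sample(points, sample_size)
    return skyline(sample)


rng = random.Random(1)
data = [(rng.random(), rng.random()) for _ in range(500)]
print(len(skyline(data)), len(sampled_skyline(data, 50)))
```

Every returned point comes from the input, but a sampled point may still be dominated by an unsampled one, which is the source of the approximation error the paper analyses.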


Author(s):  
Yue Liu ◽  
Hongyan Bai

With the development of the big data era and the opening of translation majors in colleges and universities, translation teaching is gradually receiving attention. However, there are still many problems in the training of translators in colleges and universities in terms of teachers, teaching time and teaching mode. In the context of the big data era, this article uses questionnaires and data analysis, starting from the PACTE translation ability model and combining constructivist learning theory, blended learning theory, and instructional design theory, to analyze the problems of undergraduate translation ability. This article conducts a questionnaire survey on the 2018 students of XX University’s a major, and analyzes their English scores. Students’ bilingual ability is weak, and they find it difficult to take the influence of context into account during translation; their strategic ability is not ideal, and they lack the ability to solve specific translation problems when they encounter them. The English performance of the experimental class students, who received English translation teaching for one semester, is significantly better than that of the control class students, who did not. Teachers can combine teaching theories to design English translation teaching and cultivate students’ awareness of comparative analysis in English learning. Teachers can cultivate students’ English thinking ability, help them master English better, and help them improve their English application ability.


2016 ◽  
Vol 16 (6) ◽  
pp. 245-255 ◽  
Author(s):  
Li Xie ◽  
Wenbo Zhou ◽  
Yaosen Li

In the era of big data, people have to face the problem of information filtering. When users do not or cannot express their demands clearly, a recommender system can analyse user information more proactively and intelligently to filter out what users want. This property gives recommender systems a very important role in fields such as e-commerce and social networks. The collaborative filtering recommendation algorithm based on Alternating Least Squares (ALS) is one of the common recommendation algorithms that use the matrix factorization technique. In this paper, we design a parallel implementation of the recommendation algorithm on the Spark platform and review related recommender-system techniques. To address the shortcomings of the recommendation algorithm based on the ALS model, a new loss function is designed. Before the model is trained, the similarity information of users and items is fused. The experimental results show that the performance of the proposed algorithm is better than that of the algorithm based on ALS.
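
For reference, the standard regularized matrix-factorization objective that ALS minimizes can be written as follows. This is the textbook formulation, not the new loss function the paper designs. The notation is assumed here: $r_{ui}$ is an observed rating, $x_u$ and $y_i$ are the user and item factor vectors, $\Omega$ is the set of observed (user, item) pairs, $\Omega_u$ is the set of items rated by user $u$, and $\lambda$ is the regularization weight.

```latex
\min_{X,\,Y} \; \sum_{(u,i)\in\Omega} \bigl(r_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr)

% With Y held fixed, each user vector has a closed-form least-squares update:
x_u \leftarrow \Bigl( \sum_{i\in\Omega_u} y_i y_i^{\top} + \lambda I \Bigr)^{-1}
  \sum_{i\in\Omega_u} r_{ui}\, y_i
```

The item update is symmetric, with the roles of $x_u$ and $y_i$ exchanged. With $Y$ fixed, each $x_u$ update is independent of the others, which is why ALS lends itself to parallel platforms such as Spark.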


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Asmaa El Hannani ◽  
Rahhal Errattahi ◽  
Fatima Zahra Salmam ◽  
Thomas Hain ◽  
Hassan Ouahmane

Speech-based human-machine interaction and natural language understanding applications have seen rapid development and wide adoption over the last few decades. This has led to a proliferation of studies that investigate error detection and classification in Automatic Speech Recognition (ASR) systems. However, different data sets and evaluation protocols are used, making direct comparisons of the proposed approaches (e.g. features and models) difficult. In this paper we perform an extensive evaluation of the effectiveness and efficiency of state-of-the-art approaches in a unified framework for both error detection and error type classification. We make three primary contributions: (1) we compare our Variant Recurrent Neural Network (V-RNN) model with three other state-of-the-art neural models, and show that the V-RNN model is the most effective classifier for ASR error detection in terms of accuracy and speed; (2) we compare four feature settings, corresponding to different categories of predictor features, and show that the generic features are particularly suitable for real-time ASR error detection applications; and (3) we examine the generalization ability of our error detection framework and perform a detailed post-detection analysis in order to identify the recognition errors that are difficult to detect.


2018 ◽  
Vol 10 (12) ◽  
pp. 4863 ◽  
Author(s):  
Chao Huang ◽  
Longpeng Cao ◽  
Nanxin Peng ◽  
Sijia Li ◽  
Jing Zhang ◽  
...  

Photovoltaic (PV) modules convert renewable and sustainable solar energy into electricity. However, the uncertainty of PV power production brings challenges for grid operation. To facilitate the management and scheduling of PV power plants, forecasting is an essential technique. In this paper, a robust multilayer perceptron (MLP) neural network was developed for day-ahead forecasting of hourly PV power. A generic MLP is usually trained by minimizing the mean squared loss. The mean squared error is sensitive to a few particularly large errors, which can lead to a poor estimator. To tackle the problem, the pseudo-Huber loss function, which combines the best properties of squared loss and absolute loss, was adopted in this paper. The effectiveness and efficiency of the proposed method were verified by benchmarking against a generic MLP network with real PV data. Numerical experiments illustrated that the proposed method performed better than the generic MLP network in terms of root mean squared error (RMSE) and mean absolute error (MAE).
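
The behaviour of the pseudo-Huber loss can be illustrated with a short sketch using its usual definition, delta^2 * (sqrt(1 + (r / delta)^2) - 1) for a residual r. The value of delta, the toy data, and the function names are assumptions; the abstract does not state the settings used in the paper.

```python
import math
from typing import Sequence


def pseudo_huber(residual: float, delta: float = 1.0) -> float:
    """Roughly r^2 / 2 for small residuals, roughly delta * |r| for large ones."""
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


def mean_pseudo_huber(y_true: Sequence[float], y_pred: Sequence[float],
                      delta: float = 1.0) -> float:
    """Average pseudo-Huber loss over paired targets and predictions."""
    return sum(pseudo_huber(t - p, delta) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared error, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (hypothetical): two small errors and one large outlier.
y_true = [1.0, 2.0, 3.0]
y_pred = [0.9, 1.9, 13.0]
print(mean_squared_error(y_true, y_pred))  # dominated by the outlier
print(mean_pseudo_huber(y_true, y_pred))   # grows only linearly in the outlier
```

Since the loss grows linearly rather than quadratically for large residuals, a single bad forecast contributes far less to the training objective than it would under mean squared error.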

