Distributed classification for imbalanced big data in distributed environments

2021 ◽  
Author(s):  
Huihui Wang ◽  
Mingfei Xiao ◽  
Changsheng Wu ◽  
Jing Zhang
Algorithms ◽  
2019 ◽  
Vol 12 (8) ◽  
pp. 166
Author(s):  
Md. Anisuzzaman Siddique ◽  
Hao Tian ◽  
Mahboob Qaosar ◽  
Yasuhiko Morimoto

The skyline query and its variants are useful functions in the early stages of a knowledge-discovery process. They select a set of important objects that are better than the other, more common objects in the dataset. To handle big data, such knowledge-discovery queries must be computed in parallel distributed environments. In this paper, we consider an efficient parallel algorithm for the “K-skyband query” and the “top-k dominating query”, which are popular variants of the skyline query. We propose a method for computing both queries simultaneously in MapReduce, a parallel distributed framework widely used for processing “big data” problems. Our extensive evaluation results validate the effectiveness and efficiency of the proposed algorithm on both real and synthetic datasets.
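To make the two query types concrete, the following Python sketch computes both from the same pairwise dominance counts, which is why they can be evaluated together. It is not the authors' MapReduce algorithm. The names `skyband_and_topk`, `map_partition` and `merge_counts` are hypothetical, smaller values are assumed to be better in every dimension, and the "map" and "reduce" steps are only simulated in-process by splitting the dataset into chunks and summing per-chunk counts. A point p dominates q if p is no worse than q in every dimension and strictly better in at least one. The K-skyband keeps points dominated by fewer than K others, and the top-k dominating query returns the k points that dominate the most others.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Counts = List[Tuple[int, int]]  # per point: (dominated_by, dominates)


def dominates(p: Point, q: Point) -> bool:
    # Assumption: smaller is better in every dimension.
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def map_partition(points: Sequence[Point], chunk: Sequence[Point]) -> Counts:
    # Hypothetical "map" step: compare every point against one chunk of the data.
    return [
        (sum(dominates(c, p) for c in chunk), sum(dominates(p, c) for c in chunk))
        for p in points
    ]


def merge_counts(a: Counts, b: Counts) -> Counts:
    # Hypothetical "reduce" step: per-chunk counts simply add up.
    return [(x1 + x2, y1 + y2) for (x1, y1), (x2, y2) in zip(a, b)]


def skyband_and_topk(points: Sequence[Point], K: int, k: int, num_chunks: int = 2):
    chunks = [points[i::num_chunks] for i in range(num_chunks)]
    counts = reduce(merge_counts, (map_partition(points, ch) for ch in chunks))
    # K-skyband: points dominated by fewer than K other points (K = 1 gives the skyline).
    skyband = [p for p, (dom_by, _) in zip(points, counts) if dom_by < K]
    # Top-k dominating: the k points that dominate the most other points.
    ranked = sorted(zip(points, counts), key=lambda pc: pc[1][1], reverse=True)
    topk = [p for p, _ in ranked[:k]]
    return skyband, topk
```

For example, with the points (1, 4), (2, 2), (3, 3), (4, 1) and (5, 5), K = 1 yields the ordinary skyline [(1, 4), (2, 2), (4, 1)], K = 2 additionally admits (3, 3), and (2, 2) is the top-1 dominating point because it dominates both (3, 3) and (5, 5). The naive pairwise comparison is O(n²) and is meant only as an illustration of the definitions.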


Author(s):  
K. Radha ◽  
B.Thirumala Rao ◽  
Shaik Masthan Babu ◽  
K.Thirupathi Rao ◽  
V.Krishna Reddy ◽  
...  

Nowadays, most industries hold large volumes of data, ranging from terabytes to petabytes, and organizations are looking for ways to handle this growth. Enterprises are using cloud deployments to address big data and analytics, taking advantage of the interaction between cloud computing and big data. This paper presents big data issues and research directions for ongoing work on processing big data in distributed environments.


2020 ◽  
pp. 776-789
Author(s):  
Wei Li ◽  
◽  
William W. Guo

In contrast to HPC clusters, when big data is processed in a distributed environment, particularly a dynamic and opportunistic one, overall performance is inevitably impaired, and may even be bottlenecked, by the dynamics of the overlay and the opportunism of the computing nodes. These dynamics and this opportunism are caused by churn and the unreliability of a generic distributed environment, and they cannot be ignored or avoided. Understanding the impact factors, their impact strength and the relevance between these impacts is the foundation of any potential optimization. This paper derives the research background, methodology and results by reasoning about the necessity of distributed environments for big data processing, scrutinizing the dynamics and opportunism of distributed environments, classifying impact factors, proposing evaluation metrics and carrying out a series of intensive experiments. The analysis of the results provides important insights into the impact strength of the factors and the relevance of impact across the factors. These results aim to pave the way for future optimization, or the avoidance of potential bottlenecks, in big data processing in distributed environments.


2021 ◽  
Vol 14 (11) ◽  
pp. 2244-2257
Author(s):  
Otmar Ertl

MinHash and HyperLogLog are sketching algorithms that have become indispensable for set summaries in big data applications. While HyperLogLog allows counting distinct elements with very little space, MinHash is suitable for the fast comparison of sets as it allows estimating the Jaccard similarity and other joint quantities. This work presents a new data structure called SetSketch that is able to continuously fill the gap between both use cases. Its commutative and idempotent insert operation and its mergeable state make it suitable for distributed environments. Fast, robust, and easy-to-implement estimators for cardinality and joint quantities, as well as the ability to use SetSketch for similarity search, enable versatile applications. The presented joint estimator can also be applied to other data structures such as MinHash, HyperLogLog, or Hyper-MinHash, where it even performs better than the corresponding state-of-the-art estimators in many cases.
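The properties the abstract relies on (a commutative, idempotent insert and a mergeable state) can be illustrated with plain MinHash, one of the two structures whose use cases SetSketch is designed to bridge. The following is a minimal Python sketch of classic MinHash, not of SetSketch itself; the class name, the register count `m` and the use of salted BLAKE2b from `hashlib` as the per-register hash are assumptions made for illustration.

```python
import hashlib
from typing import Iterable, List

MAX64 = 2**64 - 1  # initial register value, not exceeded by any 64-bit hash


class MinHash:
    """Classic MinHash with m registers (illustrative; not SetSketch)."""

    def __init__(self, m: int = 128) -> None:
        self.m = m
        self.registers: List[int] = [MAX64] * m

    @classmethod
    def from_items(cls, items: Iterable[str], m: int = 128) -> "MinHash":
        sketch = cls(m)
        for item in items:
            sketch.add(item)
        return sketch

    @staticmethod
    def _hash(item: str, i: int) -> int:
        # Register i uses BLAKE2b salted with i, truncated to 64 bits (assumed hash choice).
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                                 salt=i.to_bytes(16, "little")).digest()
        return int.from_bytes(digest, "little")

    def add(self, item: str) -> None:
        # Keeping the minimum makes insertion commutative and idempotent.
        for i in range(self.m):
            h = self._hash(item, i)
            if h < self.registers[i]:
                self.registers[i] = h

    def merge(self, other: "MinHash") -> "MinHash":
        # Element-wise minimum yields exactly the sketch of the union of both sets.
        if self.m != other.m:
            raise ValueError("sketches must have the same number of registers")
        merged = MinHash(self.m)
        merged.registers = [min(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def jaccard(self, other: "MinHash") -> float:
        # Fraction of registers holding the same minimum estimates |A ∩ B| / |A ∪ B|.
        if self.m != other.m:
            raise ValueError("sketches must have the same number of registers")
        equal = sum(a == b for a, b in zip(self.registers, other.registers))
        return equal / self.m
```

Because each register keeps a minimum, inserting items in any order or more than once gives the same state, and merging two sketches with an element-wise minimum gives the sketch of the set union, which is what makes such structures convenient when partial sketches are built on different nodes. The Jaccard estimate works because, for a random hash, two sets share a register's minimum with probability equal to their Jaccard similarity.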


2017 ◽  
pp. 297-332 ◽  
Author(s):  
Alfredo Cuzzocrea ◽  
Carson Kai-Sang Leung ◽  
Fan Jiang ◽  
Richard Kyle MacKinnon
