Experimental evaluation of ensemble classifiers for imbalance in Big Data

Big data-based clouds health-care and risk predictions based on ensemble classifiers and subjective projection

2016 IEEE 17th International Symposium on Computational Intelligence and Informatics (CINTI) ◽

10.1109/cinti.2016.7846371 ◽

2016 ◽

Cited By ~ 1

Author(s):

Hamido Fujita

Keyword(s):

Health Care ◽

Big Data ◽

Ensemble Classifiers

Large Iterative Multitier Ensemble Classifiers for Security of Big Data

IEEE Transactions on Emerging Topics in Computing ◽

10.1109/tetc.2014.2316510 ◽

2014 ◽

Vol 2 (3) ◽

pp. 352-363 ◽

Cited By ~ 26

Author(s):

Jemal H. Abawajy ◽

Andrei Kelarev ◽

Morshed Chowdhury

Keyword(s):

Big Data ◽

Ensemble Classifiers

An experimental evaluation of garbage collectors on big data applications

Proceedings of the VLDB Endowment ◽

10.14778/3303753.3303762 ◽

2019 ◽

Vol 12 (5) ◽

pp. 570-583 ◽

Cited By ~ 4

Author(s):

Lijie Xu ◽

Tian Guo ◽

Wensheng Dou ◽

Wei Wang ◽

Jun Wei

Keyword(s):

Big Data ◽

Experimental Evaluation ◽

Big Data Applications

Experimental evaluation of two new GEP-based ensemble classifiers

Expert Systems with Applications ◽

10.1016/j.eswa.2011.02.135 ◽

2011 ◽

Vol 38 (9) ◽

pp. 10932-10939 ◽

Cited By ~ 16

Author(s):

Joanna Jędrzejowicz ◽

Piotr Jędrzejowicz

Keyword(s):

Experimental Evaluation ◽

Ensemble Classifiers

SE-PSI: Fog/Cloud server-aided enhanced secure and effective private set intersection on scalable datasets with Bloom Filter

Mathematical Biosciences and Engineering ◽

10.3934/mbe.2022087 ◽

2021 ◽

Vol 19 (2) ◽

pp. 1861-1876

Author(s):

Shuo Qiu ◽

Zheng Zhang ◽

Yanan Liu ◽

Hao Yan ◽

...

Keyword(s):

Big Data ◽

Experimental Evaluation ◽

High Efficiency ◽

Bloom Filter ◽

Medical System ◽

Set Intersection ◽

Cloud Server ◽

Private Set Intersection ◽

Enhance Efficiency ◽

Application Requirements

Private Set Intersection (PSI), a hot topic in recent years, has been widely used in credit evaluation, medical systems, and similar applications. However, in the big data era, traditional PSI protocols cannot meet application requirements for performance and scalability. In this work, we propose two secure and effective PSI (SE-PSI) protocols for scalable datasets that leverage deterministic encryption and Bloom filters. Specifically, the first protocol focuses on high efficiency and is secure against a semi-honest server, while the second achieves security against an economically driven malicious server and hides the set/intersection size from the server. In experimental evaluation, the two protocols need only around 15 and 24 seconds, respectively, on datasets of one million elements. Moreover, as a novelty, a multi-round mechanism is proposed for both protocols to improve efficiency. The implementation demonstrates that the two-round mechanism nearly doubles efficiency compared with the two basic protocols.

Handling of Class Imbalanced Problem in Big Data Sets: An Experimental Evaluation (UCPMOT)

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6si1.19 ◽

2018 ◽

Vol 06 (01) ◽

pp. 1-9

Author(s):

S.S. Patil ◽

S. P. Sonavane

Keyword(s):

Big Data ◽

Experimental Evaluation ◽

Data Sets

Towards HPC and Big Data Analytics Convergence: Design and Experimental Evaluation of a HPDA Framework for eScience at Scale

IEEE Access ◽

10.1109/access.2021.3079139 ◽

2021 ◽

pp. 1-1

Author(s):

Donatello Elia ◽

Sandro Fiore ◽

Giovanni Aloisio

Keyword(s):

Big Data ◽

Experimental Evaluation ◽

Data Analytics ◽

Big Data Analytics

New and Efficient Algorithms for Producing Frequent Itemsets with the Map-Reduce Framework

Algorithms ◽

10.3390/a11120194 ◽

2018 ◽

Vol 11 (12) ◽

pp. 194

Author(s):

Yaron Gonen ◽

Ehud Gudes ◽

Kirill Kandalov

Keyword(s):

Data Mining ◽

Big Data ◽

Experimental Evaluation ◽

Distributed Databases ◽

Frequent Itemsets ◽

Parallel Architectures ◽

Efficient Algorithms ◽

Map Reduce ◽

Closed Frequent Itemsets ◽

New Algorithms

The Map-Reduce (MR) framework has become a popular framework for developing new parallel algorithms for Big Data. Developing efficient algorithms for mining big data and distributed databases has become an important problem. In this paper we focus on algorithms that produce association rules and frequent itemsets. After reviewing the most recent algorithms that perform this task within the MR framework, we present two new algorithms: one for producing closed frequent itemsets, and a second for producing frequent itemsets when the database is updated and new data is added to the old database. Both algorithms include novel optimizations that are suited to the MR framework as well as to other parallel architectures. A detailed experimental evaluation shows the effectiveness and advantages of the algorithms over existing methods on large distributed databases.

Handling of Class Imbalanced Problem in Big Data Sets: An Experimental Evaluation (UCPMOT)

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6si2.19 ◽

2018 ◽

Vol 6 (2) ◽

pp. 1-9

Author(s):

S.S. Patil ◽

S. P. Sonavane

Keyword(s):

Big Data ◽

Experimental Evaluation ◽

Data Sets

RGen: Data Generator for Benchmarking Big Data Workloads

Engineering Proceedings ◽

10.3390/engproc2021007013 ◽

2021 ◽

Vol 7 (1) ◽

pp. 13

Author(s):

Rubén Pérez-Jove ◽

Roberto R. Expósito ◽

Juan Touriño

Keyword(s):

Big Data ◽

Experimental Evaluation ◽

Text Generation ◽

Parallel Data ◽

Data Generator ◽

Graph Generation

This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities in a standalone tool. The main functionalities developed in this work are the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. On the one hand, text generation uses the LDA model, which extracts the topics or themes covered in a series of documents. On the other hand, graph generation is based on the Kronecker model. The experimental evaluation, carried out on a 16-node cluster, shows that RGen provides very good weak and strong scalability results. RGen is publicly available to download at https://github.com/rubenperez98/RGen, accessed on 30 September 2021.