Experimental evaluation of ensemble classifiers for imbalance in Big Data

2021 ◽  
pp. 107447
Author(s):  
Mario Juez-Gil ◽  
Álvar Arnaiz-González ◽  
Juan J. Rodríguez ◽  
César García-Osorio

2014 ◽  
Vol 2 (3) ◽  
pp. 352-363 ◽  
Author(s):  
Jemal H. Abawajy ◽  
Andrei Kelarev ◽  
Morshed Chowdhury

2019 ◽  
Vol 12 (5) ◽  
pp. 570-583 ◽  
Author(s):  
Lijie Xu ◽  
Tian Guo ◽  
Wensheng Dou ◽  
Wei Wang ◽  
Jun Wei

2011 ◽  
Vol 38 (9) ◽  
pp. 10932-10939 ◽  
Author(s):  
Joanna Jędrzejowicz ◽  
Piotr Jędrzejowicz

2021 ◽  
Vol 19 (2) ◽  
pp. 1861-1876
Author(s):  
Shuo Qiu ◽  
Zheng Zhang ◽  
Yanan Liu ◽  
Hao Yan ◽  
...  

Private Set Intersection (PSI), which has been a hot topic in recent years, is widely used in credit evaluation, medical systems and other applications. However, in the big data era, existing traditional PSI protocols cannot meet application requirements in terms of performance and scalability. In this work, we propose two secure and effective PSI (SE-PSI) protocols for scalable datasets that leverage deterministic encryption and Bloom filters. Specifically, our first protocol focuses on high efficiency and is secure against a semi-honest server, while the second protocol achieves security against an economically driven malicious server and hides the set and intersection sizes from the server. In our experimental evaluation, the two protocols need only about 15 and 24 seconds, respectively, on datasets of one million elements. Moreover, we propose a novel multi-round mechanism for both protocols to further improve efficiency. Our implementation shows that the two-round mechanism nearly doubles the efficiency of the two basic protocols.
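
The abstract combines deterministic encryption with a Bloom filter. The following is a minimal, non-authoritative Python sketch of that general idea only, not the SE-PSI protocols themselves: it runs both parties in one process and omits the server entirely. As an assumption, HMAC-SHA256 under a shared key stands in for deterministic encryption, so equal elements always map to equal tokens; one party inserts its tokens into a Bloom filter and the other tests its own tokens against it. The names `BloomFilter`, `det_token`, `psi_candidates`, `m_bits` and `k_hashes` are hypothetical.

```python
import hashlib
import hmac


class BloomFilter:
    """Hypothetical minimal Bloom filter: m_bits bits, k_hashes bit positions per item."""

    def __init__(self, m_bits: int = 8192, k_hashes: int = 5) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions by hashing the item with a per-index prefix.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True if every derived bit is set: no false negatives, possible false positives.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def det_token(key: bytes, element: str) -> bytes:
    # Assumption: a keyed PRF (HMAC-SHA256) stands in for deterministic encryption;
    # the same element under the same key always yields the same token.
    return hmac.new(key, element.encode("utf-8"), hashlib.sha256).digest()


def psi_candidates(key: bytes, set_a, set_b, m_bits: int = 8192, k_hashes: int = 5):
    """Return the elements of set_b whose tokens test positive in a Bloom filter of set_a.

    Because Bloom filters can report false positives, the result is a
    superset of the true intersection (usually equal to it for large m_bits).
    """
    bf = BloomFilter(m_bits, k_hashes)
    for element in set_a:
        bf.add(det_token(key, element))
    return {element for element in set_b if det_token(key, element) in bf}


if __name__ == "__main__":
    shared_key = b"key shared by both parties (illustrative only)"
    party_a = {"alice@example.com", "bob@example.com", "carol@example.com"}
    party_b = {"bob@example.com", "dave@example.com", "carol@example.com"}
    print(sorted(psi_candidates(shared_key, party_a, party_b)))
```

The filter size and hash count trade memory for the false-positive rate; a real protocol would also have to decide who holds the key and what the server is allowed to learn, which this sketch does not model.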


Algorithms ◽  
2018 ◽  
Vol 11 (12) ◽  
pp. 194
Author(s):  
Yaron Gonen ◽  
Ehud Gudes ◽  
Kirill Kandalov

The Map-Reduce (MR) framework has become a popular framework for developing new parallel algorithms for Big Data. Designing efficient algorithms for mining big data and distributed databases has become an important problem. In this paper we focus on algorithms that produce association rules and frequent itemsets. After reviewing the most recent algorithms that perform this task within the MR framework, we present two new algorithms: one for producing closed frequent itemsets, and another for producing frequent itemsets when new data is added to an existing database. Both algorithms include novel optimizations that suit the MR framework as well as other parallel architectures. A detailed experimental evaluation shows the effectiveness of the algorithms and their advantages over existing methods on large distributed databases.
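
To make the MR setting concrete, here is a minimal single-process Python sketch of how support counting for frequent itemsets maps onto the map and reduce phases, and how stored counts can be reused when new transactions are appended. It is a generic illustration under stated assumptions, not either of the paper's algorithms: it only counts itemsets up to a hypothetical `max_size`, emulates the shuffle with a `Counter`, and the names `map_phase`, `reduce_phase`, `count_supports`, `frequent_itemsets` and `incremental_update` are illustrative.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, max_size):
    # Mapper: emit (itemset, 1) for every itemset of size 1..max_size in one transaction.
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_phase(pairs):
    # Shuffle + reducer, emulated with a Counter: sum the counts emitted per itemset key.
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return counts


def count_supports(transactions, max_size):
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    return reduce_phase(pairs)


def frequent_itemsets(counts, min_support):
    # Keep only itemsets whose support reaches the threshold.
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


def incremental_update(old_counts, new_transactions, max_size):
    # Only the new transactions are mapped; their counts are added to the stored ones.
    merged = Counter(old_counts)
    merged.update(count_supports(new_transactions, max_size))
    return merged


if __name__ == "__main__":
    old_db = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]]
    counts = count_supports(old_db, max_size=2)
    print(frequent_itemsets(counts, min_support=2))
    counts = incremental_update(counts, [["milk", "butter"]], max_size=2)
    print(frequent_itemsets(counts, min_support=2))
```

With `min_support=2`, the pair `("butter", "milk")` has support 1 in `old_db`; after the update with `["milk", "butter"]` its support is 2 and it becomes frequent. Keeping every candidate count, as this sketch does, avoids rescanning old data at the cost of memory; practical incremental algorithms must handle itemsets that cross the threshold more economically.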


2021 ◽  
Vol 7 (1) ◽  
pp. 13
Author(s):  
Rubén Pérez-Jove ◽  
Roberto R. Expósito ◽  
Juan Touriño

This paper presents RGen, a parallel data generator for benchmarking Big Data workloads, which integrates existing features and new functionalities into a standalone tool. The main functionalities developed in this work are the generation of text and graphs that meet the characteristics defined by the 4 Vs of Big Data. Text generation uses the LDA (Latent Dirichlet Allocation) model, which extracts the topics covered in a collection of documents, while graph generation is based on the Kronecker model. An experimental evaluation on a 16-node cluster shows that RGen achieves very good weak and strong scalability. RGen is publicly available to download at https://github.com/rubenperez98/RGen, accessed on 30 September 2021.
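
RGen's graph generation is based on the Kronecker model. As a rough illustration only (the abstract does not describe RGen's implementation, parameters or parallel design), here is a Python sketch of stochastic Kronecker edge sampling: a small initiator matrix is implicitly raised to its k-th Kronecker power, and each edge is placed by descending k levels, choosing one initiator cell per level with probability proportional to its weight. The initiator values, the edge count and the function name `sample_kronecker_edges` are assumptions; sampling edges this way, as R-MAT-style generators do, avoids testing all n^2 possible edges individually.

```python
import random


def sample_kronecker_edges(initiator, k, num_edges, seed=None):
    """Sample directed edges of a graph with n0**k nodes from a stochastic Kronecker model.

    initiator: n0 x n0 matrix of non-negative weights (hypothetical example values).
    Each edge descends k levels; at each level one initiator cell (r, c) is drawn
    with probability proportional to its weight and appended as the next base-n0
    digit of the row and column indices. Duplicates collapse, so fewer than
    num_edges distinct edges may be returned.
    """
    rng = random.Random(seed)
    n0 = len(initiator)
    cells = [(r, c) for r in range(n0) for c in range(n0)]
    weights = [initiator[r][c] for r, c in cells]
    edges = set()
    for _ in range(num_edges):
        row = col = 0
        for _level in range(k):
            r, c = rng.choices(cells, weights=weights, k=1)[0]
            row = row * n0 + r
            col = col * n0 + c
        edges.add((row, col))
    return edges


if __name__ == "__main__":
    seed_matrix = [[0.9, 0.5],
                   [0.5, 0.1]]
    edges = sample_kronecker_edges(seed_matrix, k=10, num_edges=5000, seed=42)
    print(f"{2 ** 10} nodes, {len(edges)} distinct edges")
```

Because each edge is drawn independently from a seeded generator, the edge budget can be split across workers with distinct seeds, which is one plausible way a generator like this parallelizes; whether RGen does so is not stated in the abstract.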

