Big Data Performance in Private Clouds. Some Initial Findings on Apache Spark Clusters Deployed in OpenStack

Mapping Intimacies ◽

10.1109/roedunet54112.2021.9638296 ◽

2021 ◽

Author(s):

Marin Fotache ◽

Marius-Iulian Cluci

Keyword(s):

Big Data ◽

Download Full-text

Exploring Apache Spark Data APIs for Water Big Data Management

Advances in Intelligent Systems and Computing - Advanced Intelligent Systems for Sustainable Development (AI2SD’2018) ◽

10.1007/978-3-030-11881-5_10 ◽

2019 ◽

pp. 105-117

Author(s):

Nassif El Hassane ◽

Hicham Hajji

Keyword(s):

Big Data ◽

Data Management ◽

Download Full-text

Big data Predictive Analytics for Apache Spark using Machine Learning

2020 Global Conference on Wireless and Optical Technologies (GCWOT) ◽

10.1109/gcwot49901.2020.9391620 ◽

2020 ◽

Author(s):

Muhammad Junaid ◽

Shiraz Ali Wagan ◽

Nawab Muhammad Faseeh Qureshi ◽

Choon Sung Nam ◽

Dong Ryeol Shin

Keyword(s):

Machine Learning ◽

Big Data ◽

Predictive Analytics ◽

Download Full-text

Approx-SMOTE: Fast SMOTE for Big Data on Apache Spark

Neurocomputing ◽

10.1016/j.neucom.2021.08.086 ◽

2021 ◽

Vol 464 ◽

pp. 432-437

Author(s):

Mario Juez-Gil ◽

Álvar Arnaiz-González ◽

Juan J. Rodríguez ◽

Carlos López-Nozal ◽

César García-Osorio

Keyword(s):

Big Data ◽

Download Full-text

Sentiment Classification Using Paragraph Vector and Cognitive Big Data Semantics on Apache Spark

2018 IEEE 17th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC) ◽

10.1109/icci-cc.2018.8482085 ◽

2018 ◽

Author(s):

Kumar Ravi ◽

Vadlamani Ravi ◽

B. Shivakrishna

Keyword(s):

Big Data ◽

Sentiment Classification ◽

Apache Spark ◽

Data Semantics ◽

Paragraph Vector

Download Full-text

SIDELOADING – INGESTION OF LARGE POINT CLOUDS INTO THE APACHE SPARK BIG DATA ENGINE

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b2-343-2016 ◽

2016 ◽

Vol XLI-B2 ◽

pp. 343-348

Author(s):

J. Boehm ◽

K. Liu ◽

C. Alis

Keyword(s):

Big Data ◽

Point Cloud ◽

Point Clouds ◽

Geospatial Data ◽

Apache Spark ◽

Binary File ◽

Data Framework ◽

File Formats ◽

In the geospatial domain we have now reached the point where data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore naturally lucrative to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not naturally supported by the existing big data frameworks. Instead such file formats are supported by software libraries that are restricted to single CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of a cluster. We test the capabilities of the proposed method to load billions of points into a commodity hardware compute cluster and we discuss the implications on scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.

Download Full-text

Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification

10.1145/3481646.3481649 ◽

2021 ◽

Author(s):

Taha Tekdogan ◽

Ali Cakmak

Keyword(s):

Big Data ◽

Data Classification ◽

Apache Spark ◽

Hadoop Mapreduce ◽

Big Data Classification

Download Full-text

Big Data Platform for Oil and Gas Production Based on Apache Spark

Modern Industrial IoT, Big Data and Supply Chain - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-33-6141-6_14 ◽

2021 ◽

pp. 129-141

Author(s):

Peng Qing ◽

Yi Li ◽

Shuqin Luo ◽

Zhuoqun Xu

Keyword(s):

Big Data ◽

Oil And Gas ◽

Gas Production ◽

Apache Spark ◽

Oil And Gas Production ◽

Download Full-text

A big data driven distributed density based hesitant fuzzy clustering using Apache spark with application to gene expression microarray

Engineering Applications of Artificial Intelligence ◽

10.1016/j.engappai.2019.01.006 ◽

2019 ◽

Vol 79 ◽

pp. 100-113 ◽

Author(s):

Behrooz Hosseini ◽

Kourosh Kiani

Keyword(s):

Gene Expression ◽

Big Data ◽

Fuzzy Clustering ◽

Apache Spark ◽

Data Driven ◽

Gene Expression Microarray ◽

Expression Microarray

Download Full-text

An Overview of Apache Spark for Structured Big Data and Simulation of Enhancing Security Using Asymmetric Cryptosystems

2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering (IC4ME2) ◽

10.1109/ic4me2.2018.8465666 ◽

2018 ◽

Author(s):

Abu Zafar Md. Nuruzzaman Abir ◽

Kazi Md. Rokibul Alam ◽

Nahid Hasan Kakon ◽

Yasuhiko Morimoto

Keyword(s):

Big Data ◽

Download Full-text

Fuzzy Based Clustering Algorithms to Handle Big Data with Implementation on Apache Spark

2016 IEEE Second International Conference on Big Data Computing Service and Applications (BigDataService) ◽

10.1109/bigdataservice.2016.34 ◽

2016 ◽

Author(s):

Neha Bharill ◽

Aruna Tiwari ◽

Aayushi Malviya

Keyword(s):

Big Data ◽

Clustering Algorithms ◽

Download Full-text