Implementing a Distributed Volumetric Data Analytics Toolkit on Apache Spark

Author(s):  
Chao Chen ◽  
Yuzhong Yan ◽  
Lei Huang ◽  
Lijun Qian
2021 ◽  
Vol 12 ◽  
Author(s):  
Muhammad Usman Tariq ◽  
Muhammad Babar ◽  
Marc Poulin ◽  
Akmal Saeed Khattak ◽  
Mohammad Dahman Alshehri ◽  
...  

Intelligent big data analysis is an evolving practice in the age of big data science and artificial intelligence (AI). Analysis of structured data has been very successful, but analyzing human behavior through social media data remains challenging. Social media data comprises vast, unstructured sources that can include likes, comments, tweets, shares, and views. Analytics on such data has become a challenging task for companies such as Dailymotion, which serve billions of daily users and accumulate vast numbers of comments, likes, and views. Social media data is created in significant volume and at a tremendous pace, and must be stored, sorted, processed, and carefully studied to support decision making. This article proposes an architecture that uses a big data analytics mechanism to process huge social media datasets efficiently and logically. The proposed architecture is composed of three layers. The main objective of the project is to demonstrate the Apache Spark parallel-processing and distributed framework together with other storage and processing mechanisms. Social media data generated from Dailymotion is used in this article to demonstrate the benefits of the architecture. The project utilized the Dailymotion application programming interface (API), incorporating functions suitable for fetching and viewing information. An API key is generated to fetch public channel data in the form of text files. The Hive storage mechanism is utilized with Apache Spark for efficient data processing. The effectiveness of the proposed architecture is also highlighted.
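At its core, the Hive-backed Spark pipeline the abstract describes groups raw interaction records by channel and aggregates counts. A minimal plain-Python sketch of that aggregation step follows; the record fields and channel names are hypothetical, and in the proposed architecture this step would run as a distributed Spark job over Hive tables rather than in local Python:

```python
from collections import defaultdict

# Hypothetical raw records as they might be parsed from the text files
# fetched via the Dailymotion API: (channel, interaction_type) pairs.
records = [
    ("news", "like"), ("news", "view"), ("music", "view"),
    ("news", "comment"), ("music", "like"), ("music", "view"),
]

# Map/reduce-style aggregation: count each interaction type per channel.
counts = defaultdict(lambda: defaultdict(int))
for channel, interaction in records:
    counts[channel][interaction] += 1

summary = {ch: dict(kinds) for ch, kinds in counts.items()}
print(summary)
```

The same shape of computation maps directly onto a Spark `groupBy`/`count` over a Hive table, which is what lets the architecture scale the aggregation across a cluster.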


IEEE Network ◽  
2016 ◽  
Vol 30 (3) ◽  
pp. 22-29 ◽  
Author(s):  
Mohammad Abu Alsheikh ◽  
Dusit Niyato ◽  
Shaowei Lin ◽  
Hwee-pink Tan ◽  
Zhu Han

2020 ◽  
Vol 8 (6) ◽  
pp. 1609-1615

Constant innovation and rapid development in the IT industry have revolutionized the thinking and mindset of people throughout the world. Government departments have also been computerized to provide transparent, efficient, and responsible government through e-governance. Governments provide access to various websites and portals for filing complaints and uploading or downloading forms, pictures, data, or PDFs to avail of government services. Enlightened citizens frequently use these portals to access government services. Thus, the size and volume of data that government departments must manage have increased drastically under e-governance. The traditional database management system is not designed to deal with such mixed types of data. Moreover, the speed at which e-governance data must be processed is another big challenge faced by traditional database systems. All of these concerns can be addressed by an emerging technology: big data analytics. Big data analytics techniques can make government more efficient and transparent by processing structured, unstructured, or mixed-type data at great speed. In this paper, we examine the scenario behind the need for big data analytics in e-governance and the capabilities of Apache Spark. The paper proposes a practical approach to integrating big data analytics with e-governance using Apache Spark, and shows how the major issues of the traditional database management system (mixed-type datasets, speed, and accuracy) can be resolved through this integration.
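The mixed-type processing the paper attributes to Spark can be illustrated with a small plain-Python sketch: structured form submissions and unstructured free-text complaints arrive through the same portal and are routed by shape. The field names and record contents here are hypothetical; in the proposed integration, Spark DataFrames would perform this branching and tallying at scale:

```python
# Hypothetical e-governance records: structured form submissions and
# unstructured free-text complaints from the same portal.
records = [
    {"kind": "form", "department": "revenue", "status": "filed"},
    {"kind": "complaint", "text": "street light broken on main road"},
    {"kind": "form", "department": "transport", "status": "filed"},
    {"kind": "complaint", "text": "water supply irregular"},
]

# Route each record by type, the way a Spark job would branch on schema.
forms = [r for r in records if r["kind"] == "form"]
complaints = [r["text"] for r in records if r["kind"] == "complaint"]

# Simple keyword tally over the unstructured complaint text.
word_counts = {}
for text in complaints:
    for word in text.split():
        word_counts[word] = word_counts.get(word, 0) + 1

print(len(forms), len(complaints), word_counts)
```

The point of the sketch is the split: structured records keep their schema for relational-style queries, while unstructured text falls through to token-level processing, and Spark lets both paths run over the same ingested dataset.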


Author(s):  
Li Chen ◽  
Lala Aicha Coulibaly

Data science and big data analytics remain at the center of computer science and information technology. Students and researchers outside computer science often find real data analytics difficult when it requires programming languages such as Python and Scala, especially when they attempt to use Apache Spark in cloud computing environments (Spark Scala and PySpark). At the same time, students in information technology can find the mathematical background of data science algorithms difficult. To overcome these difficulties, this chapter provides a practical guideline for different users in this area. The authors cover the main algorithms for data science and machine learning, including principal component analysis (PCA), support vector machines (SVM), k-means, k-nearest neighbors (kNN), regression, neural networks, and decision trees. A brief description of each algorithm is given, and related code is selected to fit both simple and real data sets. Some visualization methods, including 2D and 3D displays, are also presented in the chapter.
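As a taste of the algorithm coverage listed above, here is a minimal one-dimensional k-means sketch in plain Python with k=2. The data values and initialization are made up for illustration; the chapter itself would express this through Spark MLlib-style code, but the assignment/update iteration is the same:

```python
# Minimal 1-D k-means, k=2. In Spark MLlib the same algorithm runs
# distributed, but the core loop below is identical in spirit.
data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids = [data[0], data[-1]]  # naive initialization from the extremes

for _ in range(10):
    # Assignment step: each point joins its nearest centroid's cluster.
    clusters = [[], []]
    for x in data:
        idx = min(range(2), key=lambda i: abs(x - centroids[i]))
        clusters[idx].append(x)
    # Update step: recompute each centroid as its cluster mean.
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)  # converges to [1.5, 8.5] for this data
```

With this data the algorithm converges after one iteration, which makes it a convenient classroom example before moving to real, high-dimensional data sets on Spark.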


2019 ◽  
Vol 28 (1) ◽  
pp. 355-360
Author(s):  
Elhossiny Ibrahim ◽  
Marwa Shouman ◽  
Hanaa Torkey ◽  
Ezz El-din Hendan ◽  
Ayman EL-SAYED
