Implementing a Distributed Volumetric Data Analytics Toolkit on Apache Spark

Author(s):  
Chao Chen ◽  
Yuzhong Yan ◽  
Lei Huang ◽  
Lijun Qian
2021 ◽  
Vol 12 ◽  
Author(s):  
Muhammad Usman Tariq ◽  
Muhammad Babar ◽  
Marc Poulin ◽  
Akmal Saeed Khattak ◽  
Mohammad Dahman Alshehri ◽  
...  

Intelligent big data analysis is an evolving practice in the age of big data science and artificial intelligence (AI). Analysis of structured data has been very successful, but analyzing human behavior through social media data remains challenging. Social media data comprises vast, unstructured sources that can include likes, comments, tweets, shares, and views. Analytics on such data has become a challenging task for companies such as Dailymotion, which serve billions of daily users and accumulate vast numbers of comments, likes, and views. Social media data is created in significant volume and at a tremendous pace, and must be stored, sorted, processed, and carefully studied to support decision making. This article proposes an architecture that uses a big data analytics mechanism to process huge social media datasets efficiently and logically. The proposed architecture is composed of three layers. The main objective of the project is to demonstrate the Apache Spark parallel-processing and distributed framework together with other storage and processing mechanisms. Social media data generated from Dailymotion is used in this article to demonstrate the benefits of the architecture. The project utilized the Dailymotion application programming interface (API), incorporating functions suitable for fetching and viewing information. An API key is generated to fetch public channel data in the form of text files. The Hive storage mechanism is utilized with Apache Spark for efficient data processing. The effectiveness of the proposed architecture is also highlighted.
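At its core, the Hive-backed Spark pipeline the abstract describes groups raw interaction records by channel and aggregates counts. A minimal plain-Python sketch of that aggregation step follows; the record fields and channel names are hypothetical, and in the proposed architecture this step would run as a distributed Spark job over Hive tables rather than in local Python:

```python
from collections import defaultdict

# Hypothetical raw records as they might be parsed from the text files
# fetched via the Dailymotion API: (channel, interaction_type) pairs.
records = [
    ("news", "like"), ("news", "view"), ("music", "view"),
    ("news", "comment"), ("music", "like"), ("music", "view"),
]

# Map/reduce-style aggregation: count each interaction type per channel.
counts = defaultdict(lambda: defaultdict(int))
for channel, interaction in records:
    counts[channel][interaction] += 1

summary = {ch: dict(kinds) for ch, kinds in counts.items()}
print(summary)
```

The same shape of computation maps directly onto a Spark `groupBy`/`count` over a Hive table, which is what lets the architecture scale the aggregation across a cluster.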


IEEE Network ◽  
2016 ◽  
Vol 30 (3) ◽  
pp. 22-29 ◽  
Author(s):  
Mohammad Abu Alsheikh ◽  
Dusit Niyato ◽  
Shaowei Lin ◽  
Hwee-pink Tan ◽  
Zhu Han

2020 ◽  
Vol 8 (6) ◽  
pp. 1609-1615

Constant innovation and rapid development in the IT industry have revolutionized the thinking and mindset of people throughout the world. Government departments have also been computerized to provide transparent, efficient, and responsible government through e-governance. Governments provide access to various websites and portals for filing complaints and uploading or downloading forms, pictures, data, or PDFs to avail of government services. Enlightened citizens frequently use these portals to access government services. Thus, the size and volume of data that government departments must manage have increased drastically under e-governance. The traditional database management system is not designed to deal with such mixed types of data. Moreover, the speed at which e-governance data must be processed is another big challenge faced by traditional database systems. All of these concerns can be addressed by an emerging technology: big data analytics. Big data analytics techniques can make government more efficient and transparent by processing structured, unstructured, or mixed-type data at great speed. In this paper, we examine the scenario behind the need for big data analytics in e-governance and the capabilities of Apache Spark. The paper proposes a practical approach to integrating big data analytics with e-governance using Apache Spark, and shows how the major issues of the traditional database management system (mixed-type datasets, speed, and accuracy) can be resolved through this integration.
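The mixed-type processing the paper attributes to Spark can be illustrated with a small plain-Python sketch: structured form submissions and unstructured free-text complaints arrive through the same portal and are routed by shape. The field names and record contents here are hypothetical; in the proposed integration, Spark DataFrames would perform this branching and tallying at scale:

```python
# Hypothetical e-governance records: structured form submissions and
# unstructured free-text complaints from the same portal.
records = [
    {"kind": "form", "department": "revenue", "status": "filed"},
    {"kind": "complaint", "text": "street light broken on main road"},
    {"kind": "form", "department": "transport", "status": "filed"},
    {"kind": "complaint", "text": "water supply irregular"},
]

# Route each record by type, the way a Spark job would branch on schema.
forms = [r for r in records if r["kind"] == "form"]
complaints = [r["text"] for r in records if r["kind"] == "complaint"]

# Simple keyword tally over the unstructured complaint text.
word_counts = {}
for text in complaints:
    for word in text.split():
        word_counts[word] = word_counts.get(word, 0) + 1

print(len(forms), len(complaints), word_counts)
```

The point of the sketch is the split: structured records keep their schema for relational-style queries, while unstructured text falls through to token-level processing, and Spark lets both paths run over the same ingested dataset.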


Author(s):  
Li Chen ◽  
Lala Aicha Coulibaly

Data science and big data analytics remain at the center of computer science and information technology. Students and researchers outside computer science often find real data analytics difficult when it requires programming languages such as Python and Scala, especially when they attempt to use Apache Spark in cloud computing environments (Spark Scala and PySpark). At the same time, students in information technology can find the mathematical background of data science algorithms difficult. To overcome these difficulties, this chapter provides a practical guideline for different users in this area. The authors cover the main algorithms for data science and machine learning, including principal component analysis (PCA), support vector machines (SVM), k-means, k-nearest neighbors (kNN), regression, neural networks, and decision trees. A brief description of each algorithm is given, and related code is selected to fit both simple and real data sets. Some visualization methods, including 2D and 3D displays, are also presented in the chapter.
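As a taste of the algorithm coverage listed above, here is a minimal one-dimensional k-means sketch in plain Python with k=2. The data values and initialization are made up for illustration; the chapter itself would express this through Spark MLlib-style code, but the assignment/update iteration is the same:

```python
# Minimal 1-D k-means, k=2. In Spark MLlib the same algorithm runs
# distributed, but the core loop below is identical in spirit.
data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids = [data[0], data[-1]]  # naive initialization from the extremes

for _ in range(10):
    # Assignment step: each point joins its nearest centroid's cluster.
    clusters = [[], []]
    for x in data:
        idx = min(range(2), key=lambda i: abs(x - centroids[i]))
        clusters[idx].append(x)
    # Update step: recompute each centroid as its cluster mean.
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)  # converges to [1.5, 8.5] for this data
```

With this data the algorithm converges after one iteration, which makes it a convenient classroom example before moving to real, high-dimensional data sets on Spark.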


2019 ◽  
Vol 28 (1) ◽  
pp. 355-360
Author(s):  
Elhossiny Ibrahim ◽  
Marwa Shouman ◽  
Hanaa Torkey ◽  
Ezz El-din Hendan ◽  
Ayman EL-SAYED
