Cloud Computing-Based Socially Important Locations Discovery on Social Media Big Datasets

2020 ◽  
Vol 19 (02) ◽  
pp. 469-497 ◽  
Author(s):  
Ahmet Sakir Dokuz ◽  
Mete Celik

Socially important locations are places that are frequently visited by social media users during their social media lifetime. Discovering socially important locations provides valuable information, such as which locations are frequently visited by a social media user, which locations are common to a group of social media users, and which locations are socially important for a group of urban residents. However, discovering socially important locations is challenging due to the huge volume, velocity, and variety of social media datasets, the inefficiency of current interest measures and algorithms on social media big datasets, and the need for massive spatial and temporal calculations in spatial social media analyses. Cloud computing, in contrast, provides infrastructure and platforms to scale compute-intensive jobs. In the literature, only a limited number of studies on socially important locations discovery take cloud computing systems into account to scale with increasing dataset sizes and to handle massive calculations. This study proposes a cloud-based socially important locations discovery algorithm, Cloud SS-ILM, to handle the volume and variety of social media big datasets. In particular, we use the Apache Hadoop framework and the Hadoop MapReduce programming model to scale dataset size and handle massive spatial and temporal calculations. The performance of the proposed algorithm is evaluated in a cloud computing environment on a Turkey Twitter social media big dataset. The experimental results show that using cloud computing systems for socially important locations discovery yields results much faster than classical algorithms. Moreover, the results show that cloud computing systems are necessary for analyzing social media big datasets that cannot be handled with traditional stand-alone computer systems. The proposed Cloud SS-ILM algorithm can be applied in many application areas, such as targeted advertisement for businesses, social media-based analysis of cities for city planners and local governments, and handling emergency situations.
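The abstract above includes no code, but as a rough illustration of the kind of spatial aggregation Hadoop MapReduce enables here, the following Python sketch (written for Hadoop Streaming) counts per-user visit frequencies per grid cell from tweet records. The record layout, grid cell size, and visit threshold are assumptions for illustration, not the actual Cloud SS-ILM interest measure.

```python
#!/usr/bin/env python3
"""Hadoop Streaming sketch: per-user location visit counts.

Illustrative invocation (paths and jar name are placeholders):
  hadoop jar hadoop-streaming.jar \
    -mapper "visit_counts.py map" -reducer "visit_counts.py reduce" \
    -input tweets.tsv -output visit_counts
"""
import sys

CELL = 0.01          # grid cell size in degrees (assumed spatial granularity)
MIN_VISITS = 10      # assumed threshold for a "socially important" location


def mapper():
    # Assumed input record layout: user_id \t latitude \t longitude \t timestamp
    for line in sys.stdin:
        try:
            user, lat, lon, _ts = line.rstrip("\n").split("\t")
            cell = (round(float(lat) / CELL), round(float(lon) / CELL))
        except ValueError:
            continue  # skip malformed records
        print(f"{user}|{cell[0]}:{cell[1]}\t1")


def reducer():
    # Hadoop sorts by key, so counts for one (user, cell) pair arrive together.
    key, count = None, 0
    for line in sys.stdin:
        k, v = line.rstrip("\n").split("\t")
        if k != key:
            if key is not None and count >= MIN_VISITS:
                print(f"{key}\t{count}")
            key, count = k, 0
        count += int(v)
    if key is not None and count >= MIN_VISITS:
        print(f"{key}\t{count}")


if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```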

Author(s):  
L. M. Almutairi ◽  
S. Shetty ◽  
H. G. Momm

Evolutionary computation, in the form of genetic programming, is used to aid the information extraction process from high-resolution satellite imagery in a semi-automatic fashion. Distributing and parallelizing the task of evaluating all candidate solutions during the evolutionary process can significantly reduce the inherent computational cost of evolving solutions composed of multichannel large images. In this study, we present the design and implementation of a system that leverages cloud-computing technology to expedite supervised solution development in a centralized evolutionary framework. The system uses the MapReduce programming model to implement a distributed version of the existing framework on a cloud-computing platform. The proposed system has two major subsystems: (i) data preparation, the generation of random spectral indices; and (ii) distributed processing, the distributed implementation of genetic programming, which spectrally distinguishes the features of interest from the remaining image background in the cloud computing environment in order to improve scalability. The proposed system reduces response time by leveraging the vast computational and storage resources of a cloud computing environment. The results demonstrate that distributing the candidate solutions reduces the execution time by 91.58%. These findings indicate that such technology could be applied to more complex problems that involve a larger population size and number of generations.
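As a hedged illustration of how candidate-solution evaluation maps onto a distributed setting, the Python sketch below scores each candidate spectral index against labelled pixel samples in a parallel map phase and keeps the fittest half; the candidate encoding, fitness function, and synthetic data are assumptions, not the authors' framework.

```python
"""Sketch: distributing genetic-programming fitness evaluation across workers."""
from multiprocessing import Pool
import random

# Synthetic labelled pixel samples: (band values, class label). Stand-in data,
# not the satellite imagery used in the paper.
SAMPLES = [((random.random(), random.random(), random.random()),
            random.randint(0, 1)) for _ in range(1000)]


def spectral_index(weights, bands):
    """A candidate solution: a weighted combination of spectral bands."""
    return sum(w * b for w, b in zip(weights, bands))


def fitness(weights):
    """Map step: score one candidate over all labelled samples (accuracy)."""
    correct = 0
    for bands, label in SAMPLES:
        predicted = 1 if spectral_index(weights, bands) > 0.5 else 0
        correct += int(predicted == label)
    return correct / len(SAMPLES)


def evaluate_generation(population, workers=4):
    """Distribute fitness evaluation across workers, then keep the best half."""
    with Pool(workers) as pool:
        scores = pool.map(fitness, population)     # the distributed map phase
    ranked = sorted(zip(scores, population), reverse=True)
    return [cand for _, cand in ranked[: len(ranked) // 2]]


if __name__ == "__main__":
    population = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(40)]
    survivors = evaluate_generation(population)
    print("best fitness this generation:", fitness(survivors[0]))
```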


2021 ◽  
Vol 27 (2) ◽  
Author(s):  
Osuolale A. Festus ◽  
Adewale O. Sunday ◽  
Alese K. Boniface

The introduction of computers has been a huge benefit to human life in its entirety because it gives both businesses and private individuals an easy and fast means to process, generate, and exchange information. However, the proliferation of networked devices and internet services, and the amount of data being generated, are enormous. This poses a major challenge: the procurement cost of high-performing computers and servers capable of processing and housing big data. This has driven the migration of organizational and institutional data to the cloud, which offers a high level of productivity at low cost. With the resulting high demand for cloud services and resources from users who have migrated to the cloud, cloud computing systems have experienced an increase in outages and failures in real-time cloud computing environments, affecting their reliability and availability. This paper proposes and simulates a system comprising four components (the user, task controller, fault detector, and fault tolerance layers) that mitigates the occurrence of faults by combining checkpointing and replication techniques, using the CloudSim cloud simulator.
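The paper's simulation is built in CloudSim (a Java toolkit); the Python sketch below is only a conceptual stand-in for how checkpointing and replication combine: a task periodically persists progress, and when the fault detector flags a failure a replica resumes from the last checkpoint. The failure model and all names are illustrative.

```python
"""Conceptual sketch of checkpointing plus replication (not the CloudSim model)."""
import random


class Task:
    def __init__(self, total_steps):
        self.total_steps = total_steps
        self.checkpoint = 0                  # last saved progress

    def run(self, fail_prob, checkpoint_every=10):
        """Run from the last checkpoint; return True if the task finishes."""
        step = self.checkpoint
        while step < self.total_steps:
            if random.random() < fail_prob:
                return False                 # fault detector reports a failure
            step += 1
            if step % checkpoint_every == 0:
                self.checkpoint = step       # persist a checkpoint
        return True


def execute_with_replication(total_steps, replicas=2, fail_prob=0.001):
    """Primary plus replicas share checkpoints; a replica resumes on failure."""
    task = Task(total_steps)
    for attempt in range(replicas + 1):
        if task.run(fail_prob):
            return f"completed on attempt {attempt + 1}"
    return "failed on all replicas"


if __name__ == "__main__":
    print(execute_with_replication(total_steps=1000))
```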


2015 ◽  
Vol 742 ◽  
pp. 726-729
Author(s):  
Hong Xia Tian ◽  
Xue We Cui ◽  
Jing Wang ◽  
Ying Jie Wang

This paper presents a lightweight index update scheme that performs online updates without suspending services, and demonstrates the performance of the update scheme in two ways: through theoretical analysis and through experimental data. Based on this design, a new MapReduce-based method for updating existing indexes is proposed and discussed further, covering the feasibility of index updates on MapReduce and Hadoop MapReduce and exposing the design's flaws through experimentation.
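One common way to realize an online (non-suspending) index update, sketched loosely in Python below, is to keep serving queries from the base index while a small delta index built from newly arrived documents is merged in periodically; this is a generic pattern offered for orientation, not the paper's specific scheme.

```python
"""Generic sketch: non-suspending index update via a small delta index."""
from collections import defaultdict


def build_delta_index(new_docs):
    """Map-like step: tokenize only newly arrived documents into a delta index."""
    delta = defaultdict(set)
    for doc_id, text in new_docs.items():
        for term in text.lower().split():
            delta[term].add(doc_id)
    return delta


def merge_indexes(base, delta):
    """Reduce-like step: fold the delta posting lists into the base index.

    The base index stays queryable the whole time; only the (much smaller)
    delta is rebuilt and merged, so the service is never suspended.
    """
    for term, postings in delta.items():
        base.setdefault(term, set()).update(postings)
    return base


if __name__ == "__main__":
    base = {"cloud": {1, 2}, "index": {2}}
    delta = build_delta_index({3: "cloud index update", 4: "online update"})
    print(merge_indexes(base, delta))
```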


2017 ◽  
Vol 10 (13) ◽  
pp. 445
Author(s):  
Purvi Pathak ◽  
Kumar R

High-performance computing (HPC) applications require high-end computing systems, but not all scientists have access to such powerful systems. Cloud computing provides an opportunity to run these applications on the cloud without investing in high-end parallel computing systems. We can analyze the performance of HPC applications on private as well as public clouds. The performance of a workload on the cloud can be measured using benchmarking tools such as the NAS Parallel Benchmarks and Rally. HPC workloads normally require many parallel computing systems in a physical setup, but this capacity is available in a cloud computing environment without the need to invest in physical machines. We aim to analyze the ability of the cloud to perform well when running HPC workloads, obtain detailed performance measurements of a private cloud running these applications, and identify the pros and cons of running HPC workloads in a cloud environment.


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1882
Author(s):  
Chiu-Han Hsiao ◽  
Frank Yeong-Sung Lin ◽  
Evana Szu-Han Fang ◽  
Yu-Fang Chen ◽  
Yean-Fu Wen ◽  
...  

A combined edge and core cloud computing environment is a novel solution in 5G network slicing. Clients' high-availability requirements are a challenge because they limit the admission control that is possible in front of the edge cloud. This work proposes an orchestrator with a mathematical programming model that takes a global viewpoint to solve the resource management problem while satisfying clients' high-availability requirements. A Lagrangian relaxation-based approach is adopted to solve the problem near-optimally and increase system revenue. A promising and straightforward resource management approach and several experimental cases are used to evaluate its efficiency and effectiveness. Preliminary results are presented as performance evaluations to verify the proposed approach's suitability for edge and core cloud computing environments. The proposed orchestrator significantly enables network slicing services and efficiently enhances clients' satisfaction with high availability.
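As a toy illustration of the Lagrangian relaxation idea used here, the Python sketch below relaxes the capacity constraint of a small admission problem with a multiplier and tightens the bound with a subgradient loop; the revenue and demand numbers and the single-constraint model are assumptions, not the paper's formulation.

```python
"""Toy Lagrangian relaxation: admit requests to maximize revenue under capacity."""

revenues = [9, 7, 6, 4, 3]     # revenue earned if request i is admitted (made up)
demands  = [5, 4, 3, 3, 2]     # resource units request i consumes (made up)
capacity = 10                  # shared resource budget


def solve_relaxed(lam):
    """For a fixed multiplier lam, admit a request iff its reduced profit is positive."""
    x = [1 if r - lam * d > 0 else 0 for r, d in zip(revenues, demands)]
    bound = (sum(xi * (r - lam * d) for xi, r, d in zip(x, revenues, demands))
             + lam * capacity)          # Lagrangian dual value: an upper bound
    return x, bound


lam, step = 0.0, 0.5
best_bound = float("inf")
for it in range(50):
    x, bound = solve_relaxed(lam)
    best_bound = min(best_bound, bound)                     # tightest bound so far
    violation = sum(xi * d for xi, d in zip(x, demands)) - capacity
    lam = max(0.0, lam + step / (it + 1) * violation)       # subgradient update

print("upper bound on achievable revenue:", round(best_bound, 2))
```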


2021 ◽  
Vol 18 (2) ◽  
pp. 517-534
Author(s):  
Pei Tian

With the advent of the era of cloud computing, the amount of application data has increased dramatically, and personalized recommendation technology has become more and more important. This paper mainly studies collaborative filtering recommendation algorithms in the cloud computing environment. The approach migrates collaborative filtering technology to the cloud computing environment and shortens recommendation time by exploiting the advantages of clustering. To improve recommendation accuracy, the paper proposes an item-based parallel collaborative filtering recommendation algorithm, designed with a parallel programming model. The experimental results show that the proposed algorithm has a shorter running time and better scalability than existing parallel algorithms.
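For orientation, the Python sketch below computes item-item similarities in a map/reduce style on toy ratings: a map-like phase groups ratings by user and emits co-rated item pairs, and a reduce-like phase computes cosine similarity per pair. The similarity measure and data are generic choices, not necessarily those used in the paper.

```python
"""Sketch of item-based collaborative filtering in a map/reduce style."""
from collections import defaultdict
from itertools import combinations
from math import sqrt

ratings = [  # (user, item, rating) -- toy data
    ("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4),
    ("u2", "b", 4), ("u2", "c", 2), ("u3", "b", 5), ("u3", "c", 4),
]

# Map-like phase: group ratings by user so co-rated item pairs can be emitted.
by_user = defaultdict(dict)
for user, item, r in ratings:
    by_user[user][item] = r

# Emit (item_i, item_j) -> (rating_i, rating_j) for every pair a user co-rated.
pair_ratings = defaultdict(list)
for items in by_user.values():
    for i, j in combinations(sorted(items), 2):
        pair_ratings[(i, j)].append((items[i], items[j]))

# Reduce-like phase: cosine similarity per item pair.
similarity = {}
for (i, j), pairs in pair_ratings.items():
    dot = sum(a * b for a, b in pairs)
    norm = sqrt(sum(a * a for a, _ in pairs)) * sqrt(sum(b * b for _, b in pairs))
    similarity[(i, j)] = dot / norm if norm else 0.0

print(similarity)
```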


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 01) ◽  
pp. 127-136
Author(s):  
A.S. Manekar ◽  
Dr. Pradeepini Gera

Task scheduling and resource allocation is a prominent research topic in cloud computing. Several objectives are associated with optimizing task scheduling and resource allocation, as cloud computing systems are more complex than traditional distributed systems. There are several challenges, such as determining the node on which each task should be executed. Simplified but near-optimal nature-inspired algorithms are the focus of this paper. Basic ideas about optimization, reliability, and complexity are considered while designing a solution for modern BDAs (Big Data Applications). Detailed analysis of the experimental results shows that the proposed algorithm has a better optimization effect than the fair-share policies presently available in most BDAs. We focus on the Dragonfly and Sea Lion algorithms, which are nature-inspired algorithms efficient for solving the task scheduling and resource allocation optimization problem. Finally, the performance of the hybrid Dragonfly and Sea Lion algorithm is compared with traditional techniques used for modern BDAs on Hadoop MapReduce. Simulation results prove the efficacy of the suggested algorithms.
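As a hedged sketch of the shared skeleton behind such metaheuristics, the Python snippet below searches task-to-VM assignments by pulling a population of random candidates toward the best makespan found so far; the real Dragonfly and Sea Lion operators are more elaborate, and the task lengths and VM speeds are made-up numbers.

```python
"""Generic population-based skeleton for task-to-VM scheduling (not hybrid DA/SLnO)."""
import random

task_len = [40, 10, 30, 25, 15, 50, 20]   # instruction counts per task (made up)
vm_speed = [10, 20, 15]                   # instructions per second per VM (made up)


def makespan(assignment):
    """Fitness: completion time of the busiest VM under this task-to-VM mapping."""
    load = [0.0] * len(vm_speed)
    for task, vm in enumerate(assignment):
        load[vm] += task_len[task] / vm_speed[vm]
    return max(load)


def random_candidate():
    return [random.randrange(len(vm_speed)) for _ in task_len]


population = [random_candidate() for _ in range(20)]
best = min(population, key=makespan)

for _ in range(100):
    for idx, cand in enumerate(population):
        # Move each candidate partly toward the best-so-far, with some random
        # exploration; real Dragonfly / Sea Lion operators refine this step.
        new = [cand[i] if random.random() < 0.4
               else (best[i] if random.random() < 0.8
                     else random.randrange(len(vm_speed)))
               for i in range(len(cand))]
        if makespan(new) < makespan(cand):
            population[idx] = new
    best = min(population + [best], key=makespan)

print("best makespan:", round(makespan(best), 2), "assignment:", best)
```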


2019 ◽  
Vol 6 (5) ◽  
pp. 519
Author(s):  
Aminudin Aminudin ◽  
Eko Budi Cahyono

Apache Spark is a platform that can be used to process data of relatively large size (big data), with the ability to divide the data across a set of predetermined cluster nodes, a concept called parallel computing. Apache Spark has advantages over other similar frameworks such as Apache Hadoop in that it can process data in a streaming fashion, meaning that data entering the Apache Spark environment can be processed directly without waiting for other data to be collected. To enable machine learning processing within Apache Spark, this paper conducts an experiment that integrates Apache Spark, acting as a large-scale parallel data processing environment, with the H2O library, which is dedicated to data processing with machine learning algorithms. Based on the results of testing Apache Spark in a cloud computing environment, Apache Spark is able to process weather data obtained from the largest weather data archive, the NCDC dataset, with data sizes of up to 6 GB. The data is processed using a deep learning model, one of the available machine learning models, by dividing the work across several nodes formed in the cloud computing environment using the H2O library. This success can be seen from the test parameters evaluated, including running time, throughput, average memory, and average CPU, obtained from the HiBench benchmark. All of these values are influenced by the amount of data and the number of nodes.
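A rough PySpark plus Sparkling Water outline of the pipeline described above is sketched below; the file path, column names, and network layout are placeholders, and exact pysparkling/H2O API names vary by version, so it should be read as an outline rather than the authors' implementation.

```python
"""Outline: Spark + H2O (Sparkling Water) deep learning on distributed weather data."""
from pyspark.sql import SparkSession
from pysparkling import H2OContext
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

spark = SparkSession.builder.appName("ncdc-deep-learning").getOrCreate()
hc = H2OContext.getOrCreate()          # attach H2O to the Spark cluster (API varies by version)

# Load the (assumed CSV-formatted) NCDC weather records distributed across nodes.
weather = spark.read.csv("hdfs:///data/ncdc/*.csv", header=True, inferSchema=True)

frame = hc.asH2OFrame(weather)         # hand the Spark DataFrame to H2O
train, test = frame.split_frame(ratios=[0.8])

# "temperature" is a placeholder target column, not necessarily the paper's label.
model = H2ODeepLearningEstimator(hidden=[64, 64], epochs=10)
model.train(x=[c for c in frame.columns if c != "temperature"],
            y="temperature", training_frame=train)

print(model.model_performance(test))
```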

