Current Trends in Cloud Computing for Data Science Experiments

Syed Imran Jami; Siraj Munir

doi:10.4018/ijcac.2021100105

Current Trends in Cloud Computing for Data Science Experiments

International Journal of Cloud Applications and Computing ◽

10.4018/ijcac.2021100105 ◽

2021 ◽

Vol 11 (4) ◽

pp. 80-99

Author(s):

Syed Imran Jami ◽

Siraj Munir

Keyword(s):

Cloud Computing ◽

Resource Sharing ◽

Data Science ◽

Job Scheduling ◽

Resource Scheduling ◽

Data Intensive ◽

Organizational Issues ◽

Current Trends ◽

Recent Trends ◽

And Storage

Recent trends in data-intensive experiments require extensive computing and storage resources, which are now provided through the cloud. Industry experts and researchers use cloud-based services and resources to analyze their data and to avoid in-house organizational issues such as power overhead on local machines and the cost of maintaining and running infrastructure. This article provides a detailed review of selected metrics for cloud computing according to the requirements of data science and big data: (1) load balancing, (2) resource scheduling, (3) resource allocation, (4) resource sharing, and (5) job scheduling. The major contribution of this review is the collective inclusion of these metrics, which is the first attempt at evaluating the latest systems in the context of data science. The detailed analysis shows that cloud computing needs further research on its association with data-intensive experiments, with emphasis on resource scheduling.


Time and Cost Efficient Cloud Resource Allocation for Real-Time Data-Intensive Smart Systems

Energies ◽

10.3390/en13215706 ◽

2020 ◽

Vol 13 (21) ◽

pp. 5706

Author(s):

Muhammad Shuaib Qureshi ◽

Muhammad Bilal Qureshi ◽

Muhammad Fayaz ◽

Muhammad Zakarya ◽

Sheraz Aslam ◽

...

Keyword(s):

Resource Allocation ◽

Cloud Computing ◽

Real Time ◽

Large Scale ◽

Smart Devices ◽

Budget Constraints ◽

Smart Systems ◽

Data Intensive ◽

Cost Efficient ◽

And Storage

Cloud computing is the de facto platform for deploying resource- and data-intensive real-time applications due to the collaboration of large-scale resources operating in cross-administrative domains. For example, real-time systems are generated by smart devices (e.g., sensors in smart homes that monitor surroundings in real time, security cameras that produce video streams in real time, cloud gaming, social media streams, etc.). Such low-end devices form a microgrid that has low computational and storage capacity and hence offloads its data onto the cloud for processing. Cloud computing still lacks mature time-oriented scheduling and resource allocation strategies that thoroughly account for stringent QoS. Traditional approaches are sufficient only when applications have real-time and data constraints, and cloud storage resources are co-located with computational resources so that the data are locally available for task execution. Such approaches mainly focus on resource provision and latency, and are prone to missing deadlines during task execution due to the urgency of the tasks and limited user budgets. The timing and data requirements exacerbate the problems of efficient task scheduling and resource allocation. To close these gaps, we propose a time- and cost-efficient resource allocation strategy for smart systems that periodically offload computational and data-intensive load to the cloud. The proposed strategy minimizes the overhead of transferring data files to computing resources by selecting appropriate pairs of computing and storage resources. The results show the effectiveness of the proposed technique in terms of resource selection and task processing within time and budget constraints when compared with its counterparts.
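The idea of selecting a compute/storage pair under time and budget constraints can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `pick_pair`, the cost model (price per second multiplied by total time), and all resource names and numbers are assumptions made up for illustration.

```python
from typing import Dict, Optional, Tuple

def pick_pair(
    data_gb: float,
    work_units: float,
    compute: Dict[str, Tuple[float, float]],    # name -> (units per second, price per second)
    bandwidth: Dict[Tuple[str, str], float],    # (compute, storage) -> GB per second
    deadline_s: float,
    budget: float,
) -> Optional[Tuple[float, float, str, str]]:
    """Return (cost, total_time, compute, storage) for the cheapest feasible pair.

    Hypothetical model: total time = transfer time + execution time,
    and cost = total time * price of the compute resource.
    """
    best = None
    for (c, s), gb_per_s in bandwidth.items():
        speed, price = compute[c]
        transfer = data_gb / gb_per_s           # time to move the input data
        total = transfer + work_units / speed   # plus time to execute the task
        cost = total * price
        # Keep only pairs that respect both the deadline and the budget.
        if total <= deadline_s and cost <= budget:
            candidate = (cost, total, c, s)
            if best is None or candidate < best:
                best = candidate
    return best

# Made-up resources: prices are in cents per second.
compute = {"fast": (10.0, 2.0), "slow": (4.0, 1.0)}
bandwidth = {("fast", "near"): 1.0, ("fast", "far"): 0.25, ("slow", "near"): 0.5}

choice = pick_pair(10.0, 200.0, compute, bandwidth, deadline_s=60.0, budget=100.0)
print(choice)
```

In this toy input the pair ("fast", "near") is the only one that meets both constraints: the distant storage makes the transfer too expensive, and the slow machine misses the deadline.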


Fault-Tolerant and Data-Intensive Resource Scheduling and Management for Scientific Applications in Cloud Computing

Sensors ◽

10.3390/s21217238 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7238

Author(s):

Zulfiqar Ahmad ◽

Ali Imran Jehangiri ◽

Mohammed Alaa Ala’anzy ◽

Mohamed Othman ◽

Arif Iqbal Umar

Keyword(s):

Cloud Computing ◽

Fault Tolerant ◽

Research Work ◽

Resource Scheduling ◽

Scientific Workflow ◽

Scientific Workflows ◽

Scientific Applications ◽

Data Intensive ◽

Computing Paradigm ◽

Cost Constraints

Cloud computing is a fully fledged, mature and flexible computing paradigm that provides services to scientific and business applications in a subscription-based environment. Scientific applications such as Montage and CyberShake are organized as scientific workflows with data- and compute-intensive tasks and have some special characteristics. In particular, the tasks of scientific workflows are executed in terms of integration, disintegration, pipeline, and parallelism, and thus require special attention to task management and data-oriented resource scheduling and management. The tasks executed in a pipeline are considered bottleneck executions, the failure of which renders the whole execution futile and therefore calls for fault-tolerance-aware execution. The tasks executed in parallel require similar instances of cloud resources, and thus cluster-based execution may improve system performance in terms of make-span and execution cost. Therefore, this research work presents a cluster-based, fault-tolerant and data-intensive (CFD) scheduling strategy for scientific applications in cloud environments. The CFD strategy addresses the data intensiveness of the tasks of scientific workflows with cluster-based, fault-tolerant mechanisms. The Montage scientific workflow was used for the simulation, and the results of the CFD strategy were compared with three well-known heuristic scheduling policies: (a) MCT, (b) Max-min, and (c) Min-min. The simulation results showed that the CFD strategy reduced the make-span by 14.28%, 20.37%, and 11.77%, respectively, compared with the three existing policies. Similarly, the CFD strategy reduced the execution cost by 1.27%, 5.3%, and 2.21%, respectively. With the CFD strategy, the SLA is not violated with regard to time and cost constraints, whereas it is violated numerous times by the existing policies.
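For readers unfamiliar with the three baseline policies named above, the following minimal sketch shows MCT, Min-min, and Max-min in their textbook form for independent tasks. It does not reproduce the CFD strategy or the paper's simulation setup; the function name `schedule` and the task times are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

def schedule(etc: List[List[float]], policy: str = "min-min") -> Tuple[Dict[int, int], float]:
    """Map independent tasks to machines; etc[t][m] is the run time of task t on machine m.

    Returns (task -> machine assignment, make-span).
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines            # time at which each machine becomes free
    assignment: Dict[int, int] = {}
    unscheduled = list(range(len(etc)))

    while unscheduled:
        # Earliest (completion time, machine) for every task not yet placed.
        best = {
            t: min((ready[m] + etc[t][m], m) for m in range(n_machines))
            for t in unscheduled
        }
        if policy == "mct":               # tasks in arrival order
            task = unscheduled[0]
        elif policy == "min-min":         # smallest earliest-completion time first
            task = min(unscheduled, key=lambda t: best[t][0])
        elif policy == "max-min":         # largest earliest-completion time first
            task = max(unscheduled, key=lambda t: best[t][0])
        else:
            raise ValueError(f"unknown policy: {policy}")

        finish, machine = best[task]
        ready[machine] = finish
        assignment[task] = machine
        unscheduled.remove(task)

    return assignment, max(ready)

# Three tasks on two identical machines (made-up run times).
times = [[1.0, 1.0], [2.0, 2.0], [10.0, 10.0]]
for name in ("mct", "min-min", "max-min"):
    print(name, schedule(times, name)[1])
```

On this toy input Max-min places the long task first and so finishes earlier than the other two policies, which is the usual motivation for it when a few tasks dominate the workload.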


A Hybrid Approach for Scheduling based on Multi-criteria Decision Method in Data Grid

Computer Engineering and Applications Journal ◽

10.18495/comengapp.v3i1.44 ◽

2014 ◽

Vol 3 (1) ◽

pp. 1-12

Author(s):

Najme Mansouri

Keyword(s):

Job Scheduling ◽

Hybrid Approach ◽

The Other ◽

Computing Power ◽

Data Intensive ◽

A Value ◽

Performance Problem ◽

Computing Environments ◽

And Storage ◽

Very High

Grid computing environments have emerged in response to scientists' demand for very high computing power and storage capacity. One of the challenges in using these environments is performance, and scheduling techniques are used to improve it. Most existing scheduling strategies in Grids focus on only one kind of Grid job, either data-intensive or computation-intensive. However, considering only one kind of job does not yield suitable scheduling from the viewpoint of the whole system, and sometimes wastes resources on the other side. To address the challenge of considering both kinds of jobs simultaneously, a new Hybrid Job Scheduling (HJS) strategy is proposed in this paper. On the one hand, the HJS algorithm considers the availability of both data and computational resources in the network; on the other hand, it considers the corresponding requirements of each job and assigns the job a value called W. The W value determines the relative importance of the two aspects (being data-intensive or computation-intensive) for each job, and the job is then assigned to the available resources. Simulation results with OptorSim show that HJS outperforms the existing algorithms reported in the literature as the number of jobs increases.


Threshold based VM Placement Technique for Load Balanced Resource Provisioning using Priority Scheme in Cloud Computing

International journal of Computer Networks & Communications ◽

10.5121/ijcnc.2021.13501 ◽

2021 ◽

Vol 13 (5) ◽

pp. 01-18

Author(s):

Mayank Sohani ◽

Dr. S. C. Jain

Keyword(s):

Cloud Computing ◽

Resource Sharing ◽

Virtual Machines ◽

Resource Scheduling ◽

Resource Provisioning ◽

Scheduling Model ◽

Dynamic Resource Provisioning ◽

Two Factors ◽

Cloud Framework ◽

Resource Cost

Load imbalance is a multi-variable, multi-constraint problem that degrades the performance and efficiency of computing resources. Load balancing techniques address its two undesirable facets, overloading and underloading. Cloud computing relies on scheduling and load balancing for resource sharing in a virtualized cloud infrastructure. These two factors must be handled well to achieve optimal resource sharing. Hence, efficient resource reservation is required to guarantee load optimization in the cloud. This work presents an integrated resource reservation and load balancing algorithm for effective cloud provisioning. The strategy develops a Priority-based Resource Scheduling Model that obtains resource reservations with threshold-based load balancing to improve the efficiency of the cloud framework. Utilization of Virtual Machines is then increased through suitable workload adjustment, achieved by picking a job from the submitted jobs using the Priority-based Resource Scheduling Model to obtain a resource reservation. Experimental evaluations show that the proposed scheme gives better results by reducing execution time, with minimum resource cost and improved resource utilization under dynamic resource provisioning conditions.
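A minimal sketch of how priority-ordered job selection can be combined with a utilization threshold is given below. It is only an illustration of the general idea, not the paper's model: the `VM` class, the `place_jobs` function, the single upper threshold of 0.8, and the job data are all assumptions.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class VM:
    name: str
    capacity: float
    load: float = 0.0

    def utilization_with(self, demand: float) -> float:
        """Utilization this VM would have after accepting `demand`."""
        return (self.load + demand) / self.capacity

def place_jobs(
    jobs: List[Tuple[int, str, float]],   # (priority, job name, demand); lower number = more urgent
    vms: List[VM],
    upper: float = 0.8,                   # hypothetical overload threshold
) -> Tuple[Dict[str, str], List[str]]:
    """Place jobs in priority order on the least-utilized VM that stays under `upper`."""
    # The index breaks ties so that equal priorities keep submission order.
    heap = [(prio, i, name, demand) for i, (prio, name, demand) in enumerate(jobs)]
    heapq.heapify(heap)

    placement: Dict[str, str] = {}
    deferred: List[str] = []
    while heap:
        _, _, name, demand = heapq.heappop(heap)
        candidates = [vm for vm in vms if vm.utilization_with(demand) <= upper]
        if not candidates:
            deferred.append(name)         # no VM can take it without overloading
            continue
        target = min(candidates, key=lambda vm: vm.load / vm.capacity)
        target.load += demand
        placement[name] = target.name
    return placement, deferred

vms = [VM("vm-a", 10.0), VM("vm-b", 10.0)]
jobs = [(2, "batch", 6.0), (1, "urgent", 5.0), (3, "report", 4.0)]
print(place_jobs(jobs, vms))
```

Here the lowest-priority job is deferred because placing it on either VM would push utilization past the threshold; a fuller scheme would also use a lower threshold to detect underloaded VMs.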


Research on Resource Scheduling Algorithm in Cloud Computing Data Center

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.2050 ◽

2014 ◽

Vol 926-930 ◽

pp. 2050-2053 ◽

Cited By ~ 2

Author(s):

Yi Zhang ◽

Yi Min Su

Keyword(s):

Cloud Computing ◽

Energy Consumption ◽

Data Center ◽

Resource Sharing ◽

Rapid Development ◽

Resource Scheduling ◽

Research Direction ◽

Computing Resource ◽

High Resource ◽

Cloud Resource

In recent years, with the rapid development of Internet and virtualization technology, cloud computing, which provides users with on-demand services, has become a research hotspot. In a cloud computing environment, the datacenter, consisting of hardware and software, is a loosely coupled resource sharing architecture. Existing cloud computing has three main inadequacies: 1. Because it lacks adequate and effective transactions based on global two-way selection, the revenue of most cloud resource providers is too low. 2. Because the scheduling of multi-dimensional cloud resources is not fully considered, the utilization of multi-dimensional cloud resources is too low. 3. Because existing cloud datacenters do not fully consider the energy consumed by communication between cloud tasks, their energy consumption is too high. Resource scheduling is a major research direction of cloud computing. First, we make an in-depth investigation and analysis of the research status of cloud computing resource scheduling, and then focus on resource scheduling methods that reduce the energy consumption of cloud computing data centers. Finally, we set out an important future research direction for cloud computing resource management in order to provide a useful reference for cloud computing research.


Modelling and Resource Scheduling approaches on Cloud Computing

2020 European Control Conference (ECC) ◽

10.23919/ecc51009.2020.9143767 ◽

2020 ◽

Author(s):

Dimitrios Dechouniotis ◽

Symeon Papavassiliou

Keyword(s):

Cloud Computing ◽

Resource Scheduling


Issues in security and privacy of big data

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i12.482 ◽

2018 ◽

Vol 7 (12) ◽

pp. 1

Author(s):

Shaveta Bhatia

Keyword(s):

Cloud Computing ◽

Big Data ◽

Approximate Method ◽

Biomedical Research ◽

Cyber Security ◽

Data Science ◽

Third Party ◽

Security And Privacy ◽

Security Threats ◽

The Third

The era of big data presents many opportunities for development in data science, biomedical research, cyber security, and cloud computing. Big data has recently gained popularity, but it also brings many challenges and consequences for security and privacy. Various types of threats and attacks stand against the security of big data, such as data leakage, access attempts by third parties, viruses, and vulnerabilities. This paper discusses these security threats and their approximate methods in the fields of biomedical research, cyber security, and cloud computing.


A Novel Topology Optimization Theory and Parallel Data Analysis Model based Resource Scheduling Algorithm for Cloud Computing

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096511666180213111403 ◽

2018 ◽

Vol 11 (4) ◽

pp. 449-456

Author(s):

Yucheng Zhang ◽

Wenzhun Huang ◽

Ting Zhang ◽

Tuo Zhang

Keyword(s):

Cloud Computing ◽

Topology Optimization ◽

Data Analysis ◽

Scheduling Algorithm ◽

Resource Scheduling ◽

Optimization Theory ◽

Analysis Model ◽

Model Based ◽

Parallel Data


Virtualized Resource Scheduling in Cloud Computing Environments: An Review

2020 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS) ◽

10.1109/tocs50858.2020.9339736 ◽

2020 ◽

Author(s):

Jianpeng Lin ◽

Delong Cui ◽

Zhiping Peng ◽

Qirui Li ◽

Jieguang He ◽

...

Keyword(s):

Cloud Computing ◽

Resource Scheduling ◽

Computing Environments


Econometrics Pedagogy and Cloud Computing: Training the Next Generation of Economists and Data Scientists

Journal of Econometric Methods ◽

10.1515/jem-2020-0012 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Danielle V. Handel ◽

Anson T. Y. Ho ◽

Kim P. Huynh ◽

David T. Jacho-Chávez ◽

Carson H. Rea

Keyword(s):

Cloud Computing ◽

Data Science ◽

Internet Access ◽

Next Generation ◽

Web Browser ◽

Computer Laboratory ◽

Economics Students ◽

Computing Platforms

This paper describes how cloud computing tools widely used in the instruction of data scientists can be introduced and taught to economics students as part of their curriculum. The demonstration centers around a workflow where the instructor creates a virtual server and the students only need Internet access and a web browser to complete in-class tutorials, assignments, or exams. Given how prevalent cloud computing platforms are becoming for data science, introducing these techniques into students’ econometrics training would prepare them to be more competitive when job hunting, while making instructors and administrators re-think what a computer laboratory means on campus.
