Current Trends in Cloud Computing for Data Science Experiments

2021 ◽  
Vol 11 (4) ◽  
pp. 80-99
Author(s):  
Syed Imran Jami ◽  
Siraj Munir

Recent trends in data-intensive experiments require extensive computing and storage resources that are now handled using cloud resources. Industry experts and researchers use cloud-based services and resources to get analytics of their data to avoid inter-organizational issues including power overhead on local machines, cost associated with maintaining and running infrastructure, etc. This article provides detailed review of selected metrics for cloud computing according to the requirements of data science and big data that includes (1) load balancing, (2) resource scheduling, (3) resource allocation, (4) resource sharing, and (5) job scheduling. The major contribution of this review is the inclusion of these metrics collectively which is the first attempt towards evaluating the latest systems in the context of data science. The detailed analysis shows that cloud computing needs research in its association with data-intensive experiments with emphasis on the resource scheduling area.

Energies ◽  
2020 ◽  
Vol 13 (21) ◽  
pp. 5706
Author(s):  
Muhammad Shuaib Qureshi ◽  
Muhammad Bilal Qureshi ◽  
Muhammad Fayaz ◽  
Muhammad Zakarya ◽  
Sheraz Aslam ◽  
...  

Cloud computing is the de facto platform for deploying resource- and data-intensive real-time applications due to the collaboration of large scale resources operating in cross-administrative domains. For example, real-time systems are generated by smart devices (e.g., sensors in smart homes that monitor surroundings in real-time, security cameras that produce video streams in real-time, cloud gaming, social media streams, etc.). Such low-end devices form a microgrid which has low computational and storage capacity and hence offload data unto the cloud for processing. Cloud computing still lacks mature time-oriented scheduling and resource allocation strategies which thoroughly deliberate stringent QoS. Traditional approaches are sufficient only when applications have real-time and data constraints, and cloud storage resources are located with computational resources where the data are locally available for task execution. Such approaches mainly focus on resource provision and latency, and are prone to missing deadlines during tasks execution due to the urgency of the tasks and limited user budget constraints. The timing and data requirements exacerbate the efficient task scheduling and resource allocation problems. To cope with the aforementioned gaps, we propose a time- and cost-efficient resource allocation strategy for smart systems that periodically offload computational and data-intensive load to the cloud. The proposed strategy minimizes the data files transfer overhead to computing resources by selecting appropriate pairs of computing and storage resources. The celebrated results show the effectiveness of the proposed technique in terms of resource selection and tasks processing within time and budget constraints when compared with the other counterparts.


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 7238
Author(s):  
Zulfiqar Ahmad ◽  
Ali Imran Jehangiri ◽  
Mohammed Alaa Ala’anzy ◽  
Mohamed Othman ◽  
Arif Iqbal Umar

Cloud computing is a fully fledged, matured and flexible computing paradigm that provides services to scientific and business applications in a subscription-based environment. Scientific applications such as Montage and CyberShake are organized scientific workflows with data and compute-intensive tasks and also have some special characteristics. These characteristics include the tasks of scientific workflows that are executed in terms of integration, disintegration, pipeline, and parallelism, and thus require special attention to task management and data-oriented resource scheduling and management. The tasks executed during pipeline are considered as bottleneck executions, the failure of which result in the wholly futile execution, which requires a fault-tolerant-aware execution. The tasks executed during parallelism require similar instances of cloud resources, and thus, cluster-based execution may upgrade the system performance in terms of make-span and execution cost. Therefore, this research work presents a cluster-based, fault-tolerant and data-intensive (CFD) scheduling for scientific applications in cloud environments. The CFD strategy addresses the data intensiveness of tasks of scientific workflows with cluster-based, fault-tolerant mechanisms. The Montage scientific workflow is considered as a simulation and the results of the CFD strategy were compared with three well-known heuristic scheduling policies: (a) MCT, (b) Max-min, and (c) Min-min. The simulation results showed that the CFD strategy reduced the make-span by 14.28%, 20.37%, and 11.77%, respectively, as compared with the existing three policies. Similarly, the CFD reduces the execution cost by 1.27%, 5.3%, and 2.21%, respectively, as compared with the existing three policies. In case of the CFD strategy, the SLA is not violated with regard to time and cost constraints, whereas it is violated by the existing policies numerous times.


2014 ◽  
Vol 3 (1) ◽  
pp. 1-12
Author(s):  
Najme Mansouri

Grid computing environments have emerged following the demand of scientists to have a very high computing power and storage capacity. One among the challenges imposed in the use of these environments is the performance problem. To improve performance, scheduling technique is used. Most existing scheduling strategies in Grids only focus on one kind of Grid jobs which can be data-intensive or computation-intensive. However, only considering one kind of jobs in scheduling does not result in suitable scheduling in the viewpoint of all system, and sometimes causes wasting of resources on the other side. To address the challenge of simultaneously considering both kinds of jobs, a new Hybrid Job Scheduling (HJS) strategy is proposed in this paper. At one hand, HJS algorithm considers both data and computational resource availability of the network, and on the other hand, considering the corresponding requirements of each job, it determines a value called W to the job. Using the W value, the importance of two aspects (being data or computation intensive) for each job is determined, and then the job is assigned to the available resources. The simulation results with OptorSim show that HJS outperforms comparing to the existing algorithms mentioned in literature as number of jobs increases.


2021 ◽  
Vol 13 (5) ◽  
pp. 01-18
Author(s):  
Mayank Sohani ◽  
Dr. S. C. Jain

The unbalancing load issue is a multi-variation, multi-imperative issue that corrupts the execution and productivity of processing assets. Workload adjusting methods give solutions of load unbalancing circumstances for two bothersome aspects over-burdening and under-stacking. Cloud computing utilizes planning and workload balancing for a virtualized environment, resource partaking in cloud foundation. These two factors must be handled in an improved way in cloud computing to accomplish ideal resource sharing. Henceforth, there requires productive resource, asset reservation for guaranteeing load advancement in the cloud. This work aims to present an incorporated resource, asset reservation, and workload adjusting calculation for effective cloud provisioning. The strategy develops a Priority-based Resource Scheduling Model to acquire the resource, asset reservation with threshold-based load balancing for improving the proficiency in cloud framework. Extending utilization of Virtual Machines through the suitable and sensible outstanding task at hand modifying is then practiced by intensely picking a job from submitting jobs using Priority-based Resource Scheduling Model to acquire resource asset reservation. Experimental evaluations represent, the proposed scheme gives better results by reducing execution time, with minimum resource cost and improved resource utilization in dynamic resource provisioning conditions.


2014 ◽  
Vol 926-930 ◽  
pp. 2050-2053 ◽  
Author(s):  
Yi Zhang ◽  
Yi Min Su

In recent years, with the rapid development of Internet and virtualization technology, cloud computing, which providing users with on-demand services, has become a research hotspot. Under the environment of cloud computing, the datacenter, consisted by hardware and software, is a loosely coupled resource sharing architecture. The existing cloud computing's inadequacies are as following three aspects: 1. For lacking of real adequate and effective transaction of global bidirectional-way selection, the revenue of most of cloud resource provider is too low. 2. Since not fully considering the scheduling of multi-dimensional cloud resources, existing cloud computing's utilization for multi-dimensional cloud resource is too low. 3. Because existing cloud datacenter does not fully consider the energy consumption of communication between the cloud tasks, its energy consumption is too high. Resource scheduling is a major research direction of cloud computing. First, we make a in-depth investigation and analysis of the research status of cloud computing resource scheduling, and then focus on resource scheduling method to reduce the energy consumption of cloud computing data center. Finally we set an important future research direction of cloud computing resource management research in order to provide a useful reference for cloud computing research.


Author(s):  
Shaveta Bhatia

 The epoch of the big data presents many opportunities for the development in the range of data science, biomedical research cyber security, and cloud computing. Nowadays the big data gained popularity.  It also invites many provocations and upshot in the security and privacy of the big data. There are various type of threats, attacks such as leakage of data, the third party tries to access, viruses and vulnerability that stand against the security of the big data. This paper will discuss about the security threats and their approximate method in the field of biomedical research, cyber security and cloud computing.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Danielle V. Handel ◽  
Anson T. Y. Ho ◽  
Kim P. Huynh ◽  
David T. Jacho-Chávez ◽  
Carson H. Rea

AbstractThis paper describes how cloud computing tools widely used in the instruction of data scientists can be introduced and taught to economics students as part of their curriculum. The demonstration centers around a workflow where the instructor creates a virtual server and the students only need Internet access and a web browser to complete in-class tutorials, assignments, or exams. Given how prevalent cloud computing platforms are becoming for data science, introducing these techniques into students’ econometrics training would prepare them to be more competitive when job hunting, while making instructors and administrators re-think what a computer laboratory means on campus.


Sign in / Sign up

Export Citation Format

Share Document