Workflow Scheduling for Scientific Application in Homogeneous Cloud Environment

Author(s):  
. Monika ◽  
Pardeep Kumar ◽  
Sanjay Tyagi

In a cloud computing environment, Quality of Service (QoS) and cost are the key elements to be taken care of. In the era of big data, data must be handled properly while each request is satisfied. When handling requests that involve large data sets or scientific applications, the flow of information must be sustained. This paper gives a brief introduction to workflow scheduling and presents a detailed survey of various scheduling algorithms across various parameters.

2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Yifeng Zheng ◽  
Zaixiang Huang ◽  
Tianzhong He

In recent years, more and more people have paid attention to cloud computing. Users need to deal with massive data in the cloud computing environment. Classification can predict the needs of users from large data in the cloud computing environment. Traditional classification methods frequently adopt one of two approaches: one removes an instance after it is covered by a rule; the other decreases the tuple weight of an instance after it is covered by a rule. The quality of these traditional classifiers may not be high, and as a result they cannot achieve high classification accuracy on some data. In this paper, we present a new classification approach, called classification based on both attribute value weight and tuple weight (CATW). CATW is distinguished from traditional classifiers in two respects. First, CATW uses both attribute value weight and tuple weight. Second, CATW proposes a new measure to select the best attribute values and generate a high-quality classification rule set. Our experimental results indicate that CATW can achieve higher classification accuracy than some traditional classifiers.
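The two traditional covering strategies mentioned in the abstract above can be illustrated with a minimal sketch. This is not the CATW method, whose attribute value weights and selection measure are not specified here. The function names, the decay factor of 0.5, and the toy data are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Instance = Dict[str, str]
Rule = Callable[[Instance], bool]


def remove_covered(instances: List[Instance], rule: Rule) -> List[Instance]:
    """Strategy 1: drop every instance that the rule covers."""
    return [x for x in instances if not rule(x)]


def decay_covered(
    instances: List[Instance],
    weights: List[float],
    rule: Rule,
    factor: float = 0.5,  # hypothetical decay factor
) -> List[float]:
    """Strategy 2: keep all instances, but shrink the weight of covered ones."""
    return [w * factor if rule(x) else w for x, w in zip(instances, weights)]


# Toy data and rule, purely illustrative.
data = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
]


def sunny_rule(x: Instance) -> bool:
    return x["outlook"] == "sunny"


remaining = remove_covered(data, sunny_rule)
new_weights = decay_covered(data, [1.0, 1.0, 1.0], sunny_rule)
```

With removal, covered instances can no longer influence later rules; with weight decay they still can, only less strongly.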


Cloud computing faces the challenge of handling huge amounts of data. Users keep pushing data without being aware of the challenge that growing storage poses. Task scheduling deals with allocating each task to a suitable resource pool on demand. Approaches have been built that handle user requests with deadlines on the number of requests that can be handled, so it is important to understand what mechanisms are available to handle those deadlines. The experimental results show that the proposed algorithm produces a remarkable improvement in total execution cost and total transfer time while meeting the deadline constraint. In view of the experimental results, the proposed algorithm provides a better-quality scheduling solution that is suitable for scientific application task execution in the cloud computing environment.
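The abstract above does not describe its algorithm, so the following is only a minimal sketch of the general idea of deadline-constrained, cost-aware task placement. The `Resource` fields, the linear cost model, and the greedy selection rule are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    name: str
    speed: float  # work units per second (hypothetical model)
    price: float  # cost per second of use (hypothetical model)


def cheapest_within_deadline(
    work: float, deadline: float, resources: List[Resource]
) -> Optional[Resource]:
    """Return the cheapest resource that finishes `work` by `deadline`, or None."""
    best: Optional[Resource] = None
    best_cost = float("inf")
    for r in resources:
        runtime = work / r.speed
        cost = runtime * r.price
        if runtime <= deadline and cost < best_cost:
            best, best_cost = r, cost
    return best


# Illustrative resource pool: a slow cheap machine and a fast expensive one.
pool = [Resource("small", speed=1.0, price=1.0), Resource("large", speed=4.0, price=6.0)]

tight = cheapest_within_deadline(work=100, deadline=50, resources=pool)
loose = cheapest_within_deadline(work=100, deadline=200, resources=pool)
impossible = cheapest_within_deadline(work=100, deadline=10, resources=pool)
```

A tight deadline forces the faster, more expensive resource, while a loose one lets the cheaper resource win. A real workflow scheduler would also have to account for task dependencies and data transfer time, which this sketch ignores.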


2019 ◽  
Vol 6 (5) ◽  
pp. 519
Author(s):  
Aminudin Aminudin ◽  
Eko Budi Cahyono

Apache Spark is a platform for processing relatively large data sets (big data), with the ability to divide the data among the designated clusters; this concept is called parallel computing. Apache Spark has an advantage over similar frameworks such as Apache Hadoop in that it can process data as a stream, meaning that data entering the Apache Spark environment can be processed directly without waiting for other data to be collected. To enable machine learning in Apache Spark, this paper conducts an experiment that integrates Apache Spark, acting as the large-scale data processing environment with parallel computing, with the H2O library, which is dedicated to data processing with machine learning algorithms. Based on tests of Apache Spark in a cloud computing environment, Apache Spark is able to process weather data obtained from the largest weather data archive, namely NCDC data, with data sizes of up to 6 GB. The data is processed with one machine learning model, deep learning, distributed across several nodes formed in the cloud computing environment using the H2O library. This success can be seen in the tested parameters, which include running time, throughput, average memory, and average CPU, obtained from the HiBench benchmark. All of these values are influenced by the amount of data and the number of nodes.


Author(s):  
Lavanya S. ◽  
Susila N. ◽  
Venkatachalam K.

In recent times, the cloud has become a leading technology, with demand for its functionality in every business. According to studies by research firms IDC and Gartner, nearly one-third of the worldwide enterprise application market will be SaaS-based by 2018, driving annual SaaS revenue to $50.8 billion, from $22.6 billion in 2013. Downtime is treated as the primary drawback, one that can affect major business deals. Service unavailability leads to major disruption in the business environment. Hence, utmost care should be taken to scale the availability of services. As cloud computing involves considerable uncertainty with respect to network bandwidth and resource accessibility, the delegation of computing resources as services should be scheduled accordingly. This chapter presents a study of the cloud of clouds and its impact on a business enterprise. It also proposes a suitable scheduling algorithm for the cloud-of-clouds environment so as to reduce the downtime problem faced by the cloud computing environment.


2020 ◽  
pp. 1499-1521
Author(s):  
Sukhpal Singh Gill ◽  
Inderveer Chana ◽  
Rajkumar Buyya

Cloud computing has emerged as a new model for managing and delivering applications as services efficiently. The convergence of cloud computing with technologies such as wireless sensor networking, the Internet of Things (IoT), and Big Data analytics offers new applications of cloud services. This paper proposes a cloud-based autonomic information system for delivering Agriculture-as-a-Service (AaaS) through the use of cloud and big data technologies. The proposed system gathers information from various users through preconfigured devices and IoT sensors, processes it in the cloud using big data analytics, and provides the required information to users automatically. The performance of the proposed system has been evaluated in a cloud environment, and experimental results show that the proposed system offers better service and better Quality of Service (QoS) in terms of QoS parameters.

