Data Mining and Knowledge Discovery for Big Data in Cloud Environment

Webology ◽  
2021 ◽  
Vol 18 (Special Issue 04) ◽  
pp. 1118-1131
Author(s):  
Raid Abd Alreda Shekan ◽  
Ahmed Mahdi Abdulkadium ◽  
Hiba Ameer Jabir

In the past few decades, big data has evolved into a modern framework that offers huge volumes of data and opportunities for applying and promoting analysis and decision-making technologies of unparalleled importance for digital processes in organizations, engineering, and science. In light of the new methods in these domains, this paper discusses the history of big data mining in the cloud computing environment. Beyond the pursuit of knowledge discovery, the big data revolution gives companies many exciting possibilities in relation to new vision, decision making, and business growth strategies. The prospect of developing large-scale data processing, data analytics, and evaluation through a cloud computing model is explored. The key component of this paper is a technical description of how to use cloud computing, together with the uses of data mining techniques and analytics methods in predictive and decision support systems.

Author(s):  
Monika ◽  
Pardeep Kumar ◽  
Sanjay Tyagi

In a cloud computing environment, quality of service (QoS) and cost are the key elements to be taken care of. In today's era of big data, data must be handled properly while satisfying each request; when handling requests involving large data volumes or scientific applications, the flow of information must be sustained. This paper gives a brief introduction to workflow scheduling and presents a detailed survey of various scheduling algorithms across several parameters.
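The trade-off the survey is concerned with can be illustrated with a minimal sketch of one common family of scheduling heuristics: a greedy list scheduler that assigns each ready workflow task to the VM minimizing a weighted sum of finish time (QoS) and monetary cost. All task names, runtimes, VM speeds, and prices below are hypothetical, and this is not any specific algorithm from the surveyed literature.

```python
# Minimal greedy workflow scheduler: assigns each ready task to the VM
# that minimizes a weighted sum of finish time (QoS) and monetary cost.
# All task/VM figures are illustrative.

# task -> (dependencies, base runtime in seconds)
TASKS = {
    "ingest":  ([], 40),
    "clean":   (["ingest"], 60),
    "analyze": (["clean"], 120),
    "report":  (["analyze"], 20),
}
# vm -> (speed factor, cost per second)
VMS = {"small": (1.0, 0.01), "large": (2.0, 0.04)}

def schedule(alpha=0.5, beta=0.5):
    vm_free = {vm: 0.0 for vm in VMS}   # time at which each VM becomes idle
    finish = {}                         # task -> finish time
    plan = {}                           # task -> (vm, finish time, cost)
    for task, (deps, base) in TASKS.items():  # dict order is topological here
        best = None
        for vm, (speed, price) in VMS.items():
            start = max(vm_free[vm], max((finish[d] for d in deps), default=0.0))
            run = base / speed
            end, cost = start + run, run * price
            score = alpha * end + beta * cost
            if best is None or score < best[0]:
                best = (score, vm, end, cost)
        _, vm, end, cost = best
        vm_free[vm], finish[task], plan[task] = end, end, (vm, end, cost)
    return plan

for task, (vm, end, cost) in schedule().items():
    print(f"{task:8s} -> {vm:5s} ends {end:6.1f}s  cost ${cost:.2f}")
```

Tuning `alpha` and `beta` moves the schedule between makespan-optimal and cost-optimal extremes, which is exactly the parameter space such surveys compare algorithms over.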


2015 ◽  
Vol 719-720 ◽  
pp. 924-928 ◽  
Author(s):  
Xiao Chun Sheng ◽  
Xiao Feng Xue ◽  
Yan Ping Cheng

Cloud computing distributes computing tasks across the resources of a large number of computers in a subnet, providing users with cheap and efficient computing power, storage capacity, and services. Data mining is the search for useful information in large data repositories. Quickly and accurately finding frequent items in large, fast-flowing data is an important basis for forecasting and decision making; therefore, a parallelized frequent-item data mining strategy for the cloud computing environment, providing efficient solutions for storing and analyzing vast amounts of data, has important theoretical significance and application value.
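The parallelization pattern the abstract describes can be sketched in a map-reduce style: each partition of a transaction log is counted independently (in principle, on its own cloud node), and the partial counts are then merged and filtered by a minimum-support threshold. The transaction data below is illustrative, not from the paper.

```python
# Map-reduce style frequent-pair counting: count per partition, then
# merge and filter by minimum support. Each count_pairs call stands in
# for work that would run on a separate node in a cloud deployment.

from collections import Counter
from itertools import combinations

transactions = [
    {"milk", "bread"}, {"milk", "eggs"}, {"bread", "eggs"},
    {"milk", "bread", "eggs"}, {"milk", "bread"}, {"eggs"},
]

def count_pairs(partition):
    """Map step: count co-occurring item pairs within one partition."""
    c = Counter()
    for t in partition:
        for pair in combinations(sorted(t), 2):
            c[pair] += 1
    return c

def frequent_pairs(data, n_parts=3, min_support=2):
    """Split the data, count each partition, merge, and filter."""
    parts = [data[i::n_parts] for i in range(n_parts)]
    merged = Counter()
    for partial in map(count_pairs, parts):  # each call could be a node
        merged.update(partial)               # reduce step
    return {p: n for p, n in merged.items() if n >= min_support}

print(frequent_pairs(transactions))
```

Because counting is associative, the merged result is identical regardless of how the data is partitioned, which is what makes this strategy safe to parallelize across nodes.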


Author(s):  
Meenu Gupta ◽  
Neha Singla

Data can be anything, but the extraction of useful information from a large database is known as data mining. Cloud computing is a term that represents a collection of huge amounts of data, and it can be correlated with data mining and big data Hadoop. Big data is a high-volume, high-velocity, and/or high-variety information asset that requires new forms of processing to enable enhanced decision making, insight discovery, and process optimization. Data growth, speed, and complexity are being accompanied by the deployment of smart sensors and devices that transmit data (commonly called the Internet of Things), by multimedia, and by other sources of semi-structured and structured data. Big data is the core element of nearly every digital transformation today.
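The "new form of processing" associated with Hadoop is typified by the MapReduce model. The toy word count below mimics its three phases (map, shuffle, reduce) in plain Python purely for illustration; a real Hadoop job distributes these phases across a cluster.

```python
# Toy MapReduce word count: map emits (word, 1) pairs, shuffle groups
# values by key, reduce sums each group. Input lines are illustrative.

from collections import defaultdict

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big insight", "data mining in the cloud"]
print(reduce_phase(shuffle(map_phase(lines))))
```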


Web Services ◽  
2019 ◽  
pp. 1601-1622 ◽  
Author(s):  
Meenu Gupta ◽  
Neha Singla



2019 ◽  
Vol 6 (5) ◽  
pp. 519
Author(s):  
Aminudin Aminudin ◽  
Eko Budi Cahyono

Apache Spark is a platform that can be used to process data of relatively large size (big data), with the ability to divide the data across predetermined clusters; this concept is called parallel computing. Apache Spark has an advantage over similar frameworks such as Apache Hadoop in that it is able to process data as a stream, meaning that data entering the Spark environment can be processed immediately without waiting for other data to be collected. To enable machine learning processes within Apache Spark, this paper conducts an experiment integrating Apache Spark, acting as the large-scale, parallel data processing environment, with the H2O library, which specifically handles data processing using machine learning algorithms. Based on tests of Apache Spark in a cloud computing environment, Spark was able to process weather data obtained from the largest weather data archive, the NCDC dataset, at sizes up to 6 GB. The data were processed using one machine learning model, deep learning, by dividing the work among several nodes formed in the cloud computing environment using the H2O library. This success can be seen in the test parameters evaluated, including running time, throughput, average memory, and average CPU, obtained from the HiBench benchmark. All of these values are influenced by the amount of data and the number of nodes.
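The relationship the abstract reports between data size, node count, running time, and throughput can be sketched with a back-of-the-envelope model. The per-node processing rate and coordination overhead below are hypothetical placeholders; real figures would come from a harness such as HiBench.

```python
# Toy model of the benchmark quantities reported (running time and
# throughput) as a function of data size and node count. Rate and
# overhead values are hypothetical.

def run_time(data_gb, nodes, rate_gb_s=0.05, overhead_s=30.0):
    """Ideal parallel processing time plus a fixed coordination overhead."""
    return data_gb / (rate_gb_s * nodes) + overhead_s

def throughput(data_gb, nodes):
    """Throughput in GB/s = data processed / wall-clock time."""
    return data_gb / run_time(data_gb, nodes)

for nodes in (1, 2, 4):
    t = run_time(6.0, nodes)  # 6 GB, matching the NCDC experiment's scale
    print(f"{nodes} node(s): {t:6.1f} s, {throughput(6.0, nodes):.4f} GB/s")
```

The fixed overhead term is why throughput grows sublinearly with node count, consistent with the abstract's observation that both data volume and node count influence every measured value.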


Author(s):  
Kiran Kumar S V N Madupu

Big data has a tremendous influence on scientific discovery and value creation. This paper presents approaches in data mining and modern technologies for big data. The difficulties of data mining, and of data mining with big data, are discussed, and some technological developments in data mining with big data are also presented.


2019 ◽  
Author(s):  
Meghana Bastwadkar ◽  
Carolyn McGregor ◽  
S Balaji

BACKGROUND This paper presents a systematic literature review of existing remote health monitoring systems, with special reference to the neonatal intensive care unit (NICU). Articles on NICU clinical decision support systems (CDSSs) that used cloud computing and big data analytics were surveyed. OBJECTIVE The aim of this study is to review the technologies used to provide NICU CDSSs. The literature review highlights the gaps within frameworks providing the HAaaS paradigm for big data analytics. METHODS Literature searches were performed in Google Scholar, IEEE Digital Library, JMIR Medical Informatics, JMIR Human Factors, and JMIR mHealth; only English articles published in or after 2015 were included. The overall search strategy was to retrieve articles that included terms related to "health analytics" and "as a service", or "internet of things"/"IoT" and "neonatal intensive care unit"/"NICU". Titles and abstracts were reviewed to assess relevance. RESULTS In total, 17 full papers met all criteria and were selected for full review. Results showed that in most cases bedside medical devices such as pulse oximeters were used as the sensor device. Results revealed great diversity in the data acquisition techniques used; however, in most cases the same physiological data (heart rate, respiratory rate, blood pressure, blood oxygen saturation) were acquired. In most cases, data analytics involved data mining classification techniques, fuzzy-logic NICU decision support systems (DSSs), and similar methods, whereas big data analytics involving Artemis cloud data analysis used the CRISP-TDM and STDM temporal data mining techniques to support clinical research studies. In most scenarios, both real-time and retrospective analytics were performed. Most of the research has been performed within small and medium-sized urban hospitals, so there is wide scope for research within rural and remote hospitals with NICU setups. Results have also shown that creating an HAaaS approach in which data acquisition and data analytics are not tightly coupled remains an open research area. The reviewed articles described architectures and base technologies for neonatal health monitoring with an IoT approach. CONCLUSIONS The current work supports implementation of the expanded Artemis cloud as a commercial offering to healthcare facilities in Canada and worldwide to provide cloud computing services to critical care. However, no work to date has been completed for low-resource settings within healthcare facilities in India, which leaves scope for research. All of the big data analytics frameworks reviewed in this study exhibit tight coupling of components, so there is a need for a framework with functional decoupling of components.

