Optimasi Algoritma K-Means Clustering dengan Parallel Processing menggunakan Framework R (Optimization of the K-Means Clustering Algorithm with Parallel Processing using the R Framework)

2021 · Vol 7 (1) · pp. 70
Author(s): Mastura Diana Marieska, Suci Lestari, Calvin Mahendra, Nabila Rizky Oktadini, Muhammad Ali Buchari

Parallel processing is often used to optimize the execution time of data mining algorithms. In this study, parallel processing is used to optimize the K-Means clustering algorithm. K-Means is implemented using packages available in the R framework and is run both serially and in parallel. To obtain the optimization percentage, the execution time of parallel processing is compared with that of serial processing. The study uses the Boston Housing dataset, which is commonly used in data mining. Test scenarios are distinguished by the number of cores and the number of centroids. The results show that, in every scenario, parallel processing has a shorter execution time than serial processing. The resulting optimization is significant, ranging from 20% to 52%, with the highest optimization obtained at the largest number of cores and the largest number of centroids.
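As an illustration of the serial-versus-parallel comparison described above, the sketch below times K-Means with independent random restarts run first serially and then distributed across worker processes. It is a minimal Python sketch, not the authors' R implementation; the synthetic data (standing in for Boston Housing) and the numbers of restarts, centroids, and cores are illustrative assumptions.

```python
# Minimal sketch (not the authors' R code): comparing serial vs. parallel
# K-Means by distributing independent random restarts across worker processes.
import time
from multiprocessing import Pool

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def run_kmeans(args):
    data, k, seed = args
    # One full K-Means run with a single random initialization.
    return KMeans(n_clusters=k, n_init=1, random_state=seed).fit(data).inertia_


if __name__ == "__main__":
    # Synthetic data stands in for the Boston Housing dataset used in the paper.
    X, _ = make_blobs(n_samples=50_000, n_features=13, centers=5, random_state=0)
    n_restarts, k = 16, 5

    t0 = time.perf_counter()
    serial = [run_kmeans((X, k, s)) for s in range(n_restarts)]
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool(processes=4) as pool:  # number of cores is illustrative
        parallel = pool.map(run_kmeans, [(X, k, s) for s in range(n_restarts)])
    t_parallel = time.perf_counter() - t0

    print(f"serial  : {t_serial:.2f}s  best inertia {min(serial):.1f}")
    print(f"parallel: {t_parallel:.2f}s  best inertia {min(parallel):.1f}")
    print(f"speedup : {t_serial / t_parallel:.2f}x")
```

Any speedup printed by this sketch depends on the machine and dataset size; the 20% to 52% figures above are the paper's measured results, not something the sketch reproduces.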

2011 · Vol 145 · pp. 292-296
Author(s): Lee Wen Huang

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data in databases. Mining closed large itemsets is a further step beyond mining association rules; it aims to find the set of necessary subsets of large itemsets that can represent all large itemsets. In this paper, we design a hybrid approach that takes the character of the data into account to mine closed large itemsets efficiently. Two features of market basket analysis are considered: the number of items is large, and the number of items associated with each item is small. By combining the cut-point method with the hash concept, the new algorithm finds closed large itemsets efficiently. The simulation results show that the new algorithm outperforms the FP-CLOSE algorithm in both execution time and storage space.
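Since the abstract does not spell out the cut-point/hash hybrid, the sketch below only illustrates the definition it builds on: a frequent (large) itemset is closed when no proper superset has the same support. It is a brute-force Python sketch over a tiny illustrative transaction set, not the paper's algorithm.

```python
# Brute-force illustration of closed large itemsets: frequent itemsets with no
# proper superset of equal support. Transactions and min_support are made up.
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
min_support = 2


def support(itemset):
    return sum(1 for t in transactions if itemset <= t)


items = sorted(set().union(*transactions))
frequent = {
    frozenset(c): support(set(c))
    for r in range(1, len(items) + 1)
    for c in combinations(items, r)
    if support(set(c)) >= min_support
}

closed = {
    iset: sup
    for iset, sup in frequent.items()
    if not any(iset < other and sup == frequent[other] for other in frequent)
}

for iset, sup in sorted(closed.items(), key=lambda kv: -kv[1]):
    print(sorted(iset), "support =", sup)
```

The brute force enumerates every candidate itemset, which is exactly what algorithms such as FP-CLOSE or the paper's hybrid avoid on realistic market-basket data.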


2014 · Vol 631-632 · pp. 1053-1056
Author(s): Hui Xia

The paper addresses the issues of limited resources for data optimization with respect to efficiency, reliability, scalability, and security of data in distributed cluster systems with huge datasets. The experimental results show that the MapReduce tool developed improves data optimization. The system exhibits poor speedup on smaller datasets, but reasonable speedup is achieved once the dataset is large enough to complement the number of computing nodes, reducing execution time by 30% compared with conventional data mining and processing. The MapReduce tool handles data growth well, especially with a larger number of computing nodes, and scaleup grows gracefully as the data and the number of computing nodes increase. Security of data is guaranteed at all computing nodes, since data is replicated at various nodes of the cluster system, which also makes it reliable. Our implementation of MapReduce runs on the distributed cluster computing environment of a national education web portal and is highly scalable.
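For readers unfamiliar with the programming model, here is a minimal map/reduce sketch in Python: partial word counts are produced in parallel worker processes (the map phase) and merged into totals (the reduce phase). It illustrates the paradigm only, not the authors' tool; the documents and pool size are made up.

```python
# Minimal map/reduce sketch: parallel map over documents, then a fold/reduce
# that merges the partial word counts.
from collections import Counter
from functools import reduce
from multiprocessing import Pool

documents = [
    "big data needs scalable processing",
    "mapreduce splits work across computing nodes",
    "scalable processing across nodes reduces execution time",
]


def map_phase(doc):
    # Emit a partial word count for one document.
    return Counter(doc.split())


def reduce_phase(a, b):
    # Merge two partial counts.
    return a + b


if __name__ == "__main__":
    with Pool(processes=3) as pool:
        partials = pool.map(map_phase, documents)
    totals = reduce(reduce_phase, partials, Counter())
    print(totals.most_common(5))
```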


2020 · Vol 45 (9) · pp. 720-726
Author(s): Kariem El-Boghdadly, Ganeshkrishna Nair, Amit Pawa, Desire N. Onwochei

Block rooms allow parallel processing of surgical patients, with the purported benefits of improving resource utilization and patient outcomes. The literature supporting these suppositions is inconsistent. We aimed to synthesize the evidence base for parallel processing by conducting a systematic review and meta-analysis. A systematic search was undertaken of Medline, Embase, Web of Science, Cumulative Index to Nursing and Allied Health Literature (CINAHL), the National Health Service (NHS) National Institute for Health Research Centre for Reviews and Dissemination database, and Google Scholar for terms relating to regional anesthesia and block rooms. The primary outcome was anesthesia-controlled time (ACT; time from entry of the patient into the operating room (OR) until the start of surgical prep, plus surgical closure to exit of the patient from the OR). Secondary outcomes of interest included other resource-utilization parameters such as turnover time (TOT; time between the exit of one patient from the OR and the entry of another), time spent in the postanesthesia care unit (PACU), and OR throughput, as well as clinical outcomes such as pain scores, nausea and vomiting, and patient satisfaction. Fifteen studies were included involving 8888 patients, of whom 3364 received care using a parallel processing model. Parallel processing reduced ACT by a mean difference (95% CI) of 10.4 min (16.3 to 4.5; p<0.0001), TOT by 16.1 min (27.4 to 4.8; p<0.0001), and PACU stay by 26.6 min (47.1 to 6.1; p=0.01) when compared with serial processing. Moreover, parallel processing increased daily OR throughput by 1.7 cases per day (p<0.0001). Clinical outcomes all favored parallel processing models. All studies showed moderate-to-critical levels of bias. Parallel processing in regional anesthesia appears to reduce ACT, TOT, and PACU time and to improve OR throughput when compared with serial processing. PROSPERO registration: CRD42018085184.


2018
Author(s): Jesus Lopez, Joseph M Orr

Given the prevalence of multitasking today, it is critical to understand how multitasking affects the mind. Recent studies have suggested that frequent multitaskers perform worse on tasks requiring cognitive control. Nevertheless, others have suggested that frequent multitasking may lead to an improvement of parallel processing abilities, perhaps at the expense of serial processing. The current study examined whether the degree to which a person engages in media multitasking affects the balance between serial and parallel processing styles. Moreover, we examined the idea that heavy multitaskers would be biased toward the parallel processing of tasks. For this study, parallel processing was indexed by a divergent thinking paradigm, the Alternative Uses Task (AUT), and serial processing by a convergent thinking paradigm, the Remote Associates Test (RAT). Our hypothesis was that people who frequently media multitask would display higher measures of divergent thinking, while those who media multitask to a lesser degree would in turn display higher measures of convergent thinking. 528 college students completed the Media Use Questionnaire, used to compute their Media Multitasking Inventory (MMI) score, as well as the RAT and AUT. A negative relationship between MMI score and AUT scores was found, indicating that more time spent media multitasking was associated with less divergent thinking. There was no significant relationship between MMI and RAT scores. Subjects who completed the AUT online performed significantly worse than their in-person counterparts. These results suggest that the more an individual media multitasks, the poorer their cognitive flexibility. Further, the context and environment in which these heavier media multitaskers operate may influence their degree of cognitive flexibility.


2020 · Vol 17 (1) · pp. 513-518
Author(s): Shashi Pal Singh, Ajai Kumar, Rachna Awasthi, Neetu Yadav, Shikha Jain

Today data exists in many sources and file formats, with different structures and types, much of it a huge collection of unstructured content on the internet and social media. This gives rise to the categorization of data as unstructured, semi-structured, and structured. Data that exists in an irregular manner without any particular schema is referred to as unstructured data, which is very difficult to process because of its irregularities and ambiguities. We therefore focus on an intelligent processing unit that converts unstructured big data into meaningful information. Intelligent text extraction is a technique that automatically identifies and extracts text from a file format. The system consists of several stages, including pre-processing, key-phrase extraction, and transformation, to extract text and retrieve structured data from unstructured data; combining multiple methods and approaches gives better results. We currently work with various file formats, converting each into DOCX; the file arrives in unstructured form, and we then obtain it in structured form with the help of intelligent pre-processing. The pre-processing stages turn the unstructured data or corpus into structured, meaningful data. In the initial stage the system removes stop words, unwanted symbols, noisy data, and line spacing. In the second stage, data is extracted from the various source files into properly formatted plain text. In the third stage, the data is transformed from one format to another so that the user can understand it. The final step is rebuilding the file in its original format while maintaining the tags of the file. Large files are divided into small sub-files so that parallel processing algorithms can process them quickly; parallel processing is an important concept for text extraction, since breaking a big file into small files improves the result. Extraction of data is done bilingually and represents the most relevant information contained in the document. Key-phrase extraction is an important problem in data mining, knowledge retrieval, and natural language processing; keyword extraction is used to obtain keywords that uniquely identify a document. Rebuilding is an important part of this project: the entire process is applied to a given file format, and at the end the file must be returned in that same format. These concepts are widely used, but little work has been done on bringing the many functionalities together under one tool, which motivates a tool that can easily and efficiently convert unstructured files into structured ones.
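The sketch below shows the kind of pre-processing and keyword extraction the pipeline describes: strip noise and stop words, then rank the remaining terms by frequency. It is a minimal Python illustration, not the authors' system; the stop-word list and input text are made up.

```python
# Toy pre-processing + frequency-based keyword extraction.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "for", "on", "it", "has"}


def preprocess(text):
    # Remove unwanted symbols and collapse extra whitespace/line spacing.
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text.lower())
    return [t for t in text.split() if t not in STOP_WORDS]


def extract_keywords(text, top_n=5):
    # Simple frequency-based key-phrase candidates.
    return Counter(preprocess(text)).most_common(top_n)


raw = """Unstructured   data, found on the internet and social media,
is difficult to process!! It has irregularities & ambiguities..."""

print(extract_keywords(raw))
```

A production pipeline would add the format conversion (to DOCX and back), bilingual handling, and the file-splitting step for parallel processing described above.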


2014 · Vol 2014 · pp. 1-12
Author(s): Chun-Wei Lin, Tzung-Pei Hong, Hung-Chuan Hsu

Data mining is traditionally adopted to retrieve and analyze knowledge from large amounts of data. Private or confidential data may be sanitized or suppressed before it is shared or published in public. Privacy-preserving data mining (PPDM) has thus become an important issue in recent years. The most general way of performing PPDM is to sanitize the database so as to hide the sensitive information. In this paper, a novel hiding-missing-artificial utility (HMAU) algorithm is proposed to hide sensitive itemsets through transaction deletion. The transaction with the maximal ratio of sensitive to nonsensitive items is selected to be deleted in its entirety. Three side effects, namely hiding failures, missing itemsets, and artificial itemsets, are considered when evaluating whether transactions need to be deleted to hide the sensitive itemsets. Three weights are also assigned to reflect the importance of these three factors and can be set according to user requirements. Experiments are then conducted to show the performance of the proposed algorithm in terms of execution time, number of deleted transactions, and number of side effects.
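The sketch below illustrates only the ratio-based selection idea behind transaction deletion: repeatedly delete the transaction with the highest ratio of sensitive to nonsensitive items until no sensitive itemset remains frequent. It is not the full HMAU algorithm (in particular, it ignores the weighted side-effect evaluation); the transactions, sensitive itemsets, and threshold are made up.

```python
# Toy transaction-deletion sanitization driven by a sensitive/nonsensitive ratio.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c", "d"},
    {"a", "c", "d"},
    {"a", "b", "d"},
]
sensitive_itemsets = [{"a", "b"}]
min_support = 2


def support(itemset, db):
    return sum(1 for t in db if itemset <= t)


def sensitive_ratio(t):
    sensitive = {i for s in sensitive_itemsets if s <= t for i in s}
    nonsensitive = t - sensitive
    return len(sensitive) / (len(nonsensitive) + 1)  # +1 avoids division by zero


while any(support(s, transactions) >= min_support for s in sensitive_itemsets):
    # Only transactions that actually contain a sensitive itemset are candidates.
    candidates = [t for t in transactions
                  if any(s <= t for s in sensitive_itemsets)]
    victim = max(candidates, key=sensitive_ratio)
    transactions.remove(victim)

print("sanitized database:", transactions)
```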


2017 · Vol 6 (2) · pp. 1-17
Author(s): Suparna Dasgupta, Soumyabrata Saha, Suman Kumar Das

This article describes how, as the number of everyday Android users increases, the Internet has become the environment preferred by attackers for injecting malicious packages: content intended to gather critical information, spy on user details, credentials, call logs, and contact details, and track user location. Regrettably, such malware is very hard to detect, even with antivirus software and packages, and this type of attack is increasing day by day. In this article the authors have chosen supervised-learning, classification-tree-based algorithms to detect malware in the dataset. A comparison among all the classifiers on the basis of accuracy and execution time is used to build the classifier model with the highest detection rate.
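The sketch below shows the general shape of such a comparison: tree-based classifiers evaluated on accuracy and training time. It is a hedged Python illustration using synthetic feature vectors in place of the authors' malware dataset; the models and split are illustrative choices, not the paper's exact setup.

```python
# Compare tree-based classifiers on accuracy and execution time
# (synthetic data stands in for benign/malicious feature vectors).
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in [("decision tree", DecisionTreeClassifier(random_state=0)),
                  ("random forest", RandomForestClassifier(random_state=0))]:
    t0 = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - t0
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name:13s}  accuracy={acc:.3f}  train time={elapsed:.3f}s")
```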


2018
Author(s): Kang Li, Mikiko Kadohisa, Makoto Kusunoki, John Duncan, Claus Bundesen, ...

Abstract: Serial and parallel processing in visual search have long been debated in psychology, but the processing mechanism remains an open issue. Serial processing allows only one object at a time to be processed, whereas parallel processing assumes that various objects are processed simultaneously. Here we present novel neural models for the two types of processing mechanism, based on analysis of simultaneously recorded spike trains using electrophysiological data from the prefrontal cortex of rhesus monkeys processing task-relevant visual displays. We combine mathematical models describing neuronal attention with point process models for spike trains. The same model can explain both serial and parallel processing by adopting different parameter regimes. We present statistical methods to distinguish between serial and parallel processing, based both on maximum likelihood estimates and on decoding the momentary focus of attention when two stimuli are presented simultaneously. Results show that both processing mechanisms are in play for the simultaneously recorded neurons, but neurons tend to follow parallel processing in the beginning, after the onset of the stimulus pair, whereas they tend toward serial processing later on. This could be explained by parallel processing being related to sensory bottom-up signals or feedforward processing, which typically occur right after stimulus onset, whereas top-down signals, related to cognitive modulatory influences guiding attentional effects in recurrent feedback connections, occur after a small delay and are related to serial processing, in which all processing capacity is directed toward the attended object.

Author summary: A fundamental question concerning the processing of visual objects in our brain is how a population of cortical cells responds when presented with more than a single object in their receptive fields. Is one object processed at a time (serial processing), or are all objects processed simultaneously (parallel processing)? Inferring the dynamics of attentional states in simultaneously recorded spike trains from sensory neurons exposed to a pair of visual stimuli is key to advancing our understanding of visual cognition. We propose novel statistical models and measures to quantify and follow the time evolution of visual cognition processes right after stimulus onset. We find that in the beginning processing appears to be predominantly parallel, developing into serial processing 150-200 ms after stimulus onset in the prefrontal cortex of rhesus monkeys.
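To make the distinction concrete, the sketch below simulates Poisson spike counts for a single neuron under the two regimes the paper contrasts: a parallel regime in which both stimuli drive the cell at an averaged rate in every time bin, and a serial regime in which attention dwells on one stimulus at a time so the firing rate alternates. It is a minimal Python/NumPy illustration under assumed rates and dwell times, not the paper's point process model.

```python
# Simulated spike counts under "parallel" (mixed rate) vs. "serial"
# (attention-switching rate) regimes; all rates and timings are made up.
import numpy as np

rng = np.random.default_rng(0)
rate_pref, rate_nonpref = 40.0, 10.0     # spikes/s for each stimulus alone
bin_ms, n_bins = 10, 100                 # 10 ms bins, 1 s trial
dt = bin_ms / 1000.0

# Parallel regime: a fixed mixture of the two rates in every bin.
parallel_rate = np.full(n_bins, 0.5 * (rate_pref + rate_nonpref))

# Serial regime: attention dwells on one stimulus at a time (200 ms dwells).
attended = (np.arange(n_bins) // 20) % 2          # 0 = preferred, 1 = nonpreferred
serial_rate = np.where(attended == 0, rate_pref, rate_nonpref)

parallel_spikes = rng.poisson(parallel_rate * dt)
serial_spikes = rng.poisson(serial_rate * dt)

# Serial processing mixes two distinct rates across bins, which inflates the
# variance-to-mean ratio of the counts relative to the parallel regime.
for name, counts in [("parallel", parallel_spikes), ("serial", serial_spikes)]:
    print(f"{name:8s} mean={counts.mean():.2f}  var/mean={counts.var() / counts.mean():.2f}")
```

The inflated count variability under switching is one simple signature that likelihood-based methods like those in the paper can exploit to decode the momentary focus of attention.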

