Research on Data Mining Optimization and Security Based on MapReduce

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.631-632.1053 ◽

2014 ◽

Vol 631-632 ◽

pp. 1053-1056

Author(s):

Hui Xia

Keyword(s):

Data Mining ◽

Execution Time ◽

Cluster Computing ◽

Limited Resource ◽

Experimental Results ◽

Computing Environment ◽

Cluster Systems ◽

National Education ◽

Distributed Cluster ◽

Data Optimization

The paper addressed the issues of limited resource for data optimization for efficiency, reliability, scalability and security of data in distributed, cluster systems with huge datasets. The study’s experimental results predicted that the MapReduce tool developed improved data optimization. The system exhibits undesired speedup with smaller datasets, but reasonable speedup is achieved with a larger enough datasets that complements the number of computing nodes reducing the execution time by 30% as compared to normal data mining and processing. The MapReduce tool is able to handle data growth trendily, especially with larger number of computing nodes. Scaleup gracefully grows as data and number of computing nodes increases. Security of data is guaranteed at all computing nodes since data is replicated at various nodes on the cluster system hence reliable. Our implementation of the MapReduce runs on distributed cluster computing environment of a national education web portal and is highly scalable.

Download Full-text

HIGH CANDIDATES GENERATION: A NEW EFFICIENT METHOD FOR MINING SHARE-FREQUENT PATTERNS

Jurnal Teknologi ◽

10.11113/jt.v79.10292 ◽

2017 ◽

Vol 79 (7) ◽

Author(s):

Chayanan Nawapornanan ◽

Sarun Intakosum ◽

Veera Boonjing

Keyword(s):

Data Mining ◽

Efficient Method ◽

Execution Time ◽

Experimental Results ◽

Closure Property ◽

Frequent Patterns ◽

Research Issue ◽

Important Research ◽

Useful Knowledge ◽

Downward Closure

The share frequent patterns mining is more practical than the traditional frequent patternset mining because it can reflect useful knowledge such as total costs and profits of patterns. Mining share-frequent patterns becomes one of the most important research issue in the data mining. However, previous algorithms extract a large number of candidate and spend a lot of time to generate and test a large number of useless candidate in the mining process. This paper proposes a new efficient method for discovering share-frequent patterns. The new method reduces a number of candidates by generating candidates from only high transaction-measure-value patterns. The downward closure property of transaction-measure-value patterns assures correctness of the proposed method. Experimental results on dense and sparse datasets show that the proposed method is very efficient in terms of execution time. Also, it decreases the number of generated useless candidates in the mining process by at least 70%.

Download Full-text

Research and application on algorithms of data mining for EMU malfunction’s data under a cloud computing environment

Artificial Intelligence and Industrial Application ◽

10.2495/aiia140571 ◽

2015 ◽

Author(s):

C. Zhang ◽

H. Hu

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Computing Environment ◽

Cloud Computing Environment

Download Full-text

OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data ◽

10.3390/data6010001 ◽

2020 ◽

Vol 6 (1) ◽

pp. 1

Author(s):

Ahmed Elmogy ◽

Hamada Rizk ◽

Amany M. Sarhan

Keyword(s):

Data Mining ◽

Image Processing ◽

Intrusion Detection ◽

Real Time ◽

Outlier Detection ◽

Real World ◽

Medical Data ◽

Experimental Results ◽

Real Time Applications ◽

Real World Datasets

In data mining, outlier detection is a major challenge as it has an important role in many applications such as medical data, image processing, fraud detection, intrusion detection, and so forth. An extensive variety of clustering based approaches have been developed to detect outliers. However they are by nature time consuming which restrict their utilization with real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering based outlier detection framework, (On the Fly Clustering Based Outlier Detection (OFCOD)) is presented. OFCOD enables analysts to effectively find out outliers on time with request even within huge datasets. The proposed framework has been tested and evaluated using two real world datasets with different features and applications; one with 699 records, and another with five millions records. The experimental results show that the performance of the proposed framework outperforms other existing approaches while considering several evaluation metrics.

Download Full-text

Lightweight Blockchain Processing. Case Study: Scanned Document Tracking on Tezos Blockchain

Applied Sciences ◽

10.3390/app11157169 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7169

Author(s):

Mohamed Allouche ◽

Tarek Frikha ◽

Mihai Mitrea ◽

Gérard Memmi ◽

Faten Chaabane

Keyword(s):

Load Balancing ◽

Relative Error ◽

Execution Time ◽

General Purpose ◽

Experimental Results ◽

Raspberry Pi ◽

Embedded Platform ◽

Memory Resources ◽

Processing Solution

To bridge the current gap between the Blockchain expectancies and their intensive computation constraints, the present paper advances a lightweight processing solution, based on a load-balancing architecture, compatible with the lightweight/embedding processing paradigms. In this way, the execution of complex operations is securely delegated to an off-chain general-purpose computing machine while the intimate Blockchain operations are kept on-chain. The illustrations correspond to an on-chain Tezos configuration and to a multiprocessor ARM embedded platform (integrated into a Raspberry Pi). The performances are assessed in terms of security, execution time, and CPU consumption when achieving a visual document fingerprint task. It is thus demonstrated that the advanced solution makes it possible for a computing intensive application to be deployed under severely constrained computation and memory resources, as set by a Raspberry Pi 3. The experimental results show that up to nine Tezos nodes can be deployed on a single Raspberry Pi 3 and that the limitation is not derived from the memory but from the computation resources. The execution time with a limited number of fingerprints is 40% higher than using a classical PC solution (value computed with 95% relative error lower than 5%).

Download Full-text

Algorithm of Text Categorization Based on Cloud Computing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.311.158 ◽

2013 ◽

Vol 311 ◽

pp. 158-163 ◽

Cited By ~ 1

Author(s):

Li Qin Huang ◽

Li Qun Lin ◽

Yan Huang Liu

Keyword(s):

Cloud Computing ◽

Text Categorization ◽

Experimental Results ◽

Support Vector ◽

Computing Environment ◽

Mapreduce Framework ◽

Cloud Computing Environment ◽

Environment Map ◽

Vector Machines ◽

Parallel Text

MapReduce framework of cloud computing has an effective way to achieve massive text categorization. In this paper a distributed parallel text training algorithm in cloud computing environment based on multi-class Support Vector Machines(SVM) is designed. In cloud computing environment Map tasks realize distributing various types of samples and Reduce tasks realize the specific SVM training. Experimental results show that the execution time of text training decreases with the number of Reduce tasks increasing. Also a parallel text classifying based on cloud computing is designed and implemented, which classify the unknown type texts. Experimental results show that the speed of text classifying increases with the number of Map tasks increasing.

Download Full-text

A UMDA-Based Discretization Method for Continuous Attributes

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.403-408.1834 ◽

2011 ◽

Vol 403-408 ◽

pp. 1834-1838

Author(s):

Jing Zhao ◽

Chong Zhao Han ◽

Bin Wei ◽

De Qiang Han

Keyword(s):

Machine Learning ◽

Data Mining ◽

Evolutionary Algorithms ◽

Marginal Distribution ◽

Convergence Speed ◽

Fast Convergence ◽

Experimental Results ◽

Discretization Method ◽

Bottom Up ◽

Global Dynamic

Discretization of continuous attributes have played an important role in machine learning and data mining. They can not only improve the performance of the classifier, but also reduce the space of the storage. Univariate Marginal Distribution Algorithm is a modified Evolutionary Algorithms, which has some advantages over classical Evolutionary Algorithms such as the fast convergence speed and few parameters need to be tuned. In this paper, we proposed a bottom-up, global, dynamic, and supervised discretization method on the basis of Univariate Marginal Distribution Algorithm.The experimental results showed that the proposed method could effectively improve the accuracy of classifier.

Download Full-text

A Survey Paper on Task Scheduling Methods in Cluster Computing Environment for High Performance

2015 Fifth International Conference on Advanced Computing & Communication Technologies ◽

10.1109/acct.2015.64 ◽

2015 ◽

Author(s):

Harvinder Singh ◽

Gurdev Singh

Keyword(s):

Task Scheduling ◽

High Performance ◽

Cluster Computing ◽

Computing Environment ◽

Survey Paper

Download Full-text

On Construction of a Diskless Cluster Computing Environment in a Computer Classroom

International Journal of Grid and High Performance Computing ◽

10.4018/jghpc.2012100105 ◽

2012 ◽

Vol 4 (4) ◽

pp. 68-88

Author(s):

Chao-Tung Yang ◽

Wen-Feng Hsieh

Keyword(s):

High Performance ◽

Cluster Computing ◽

Relevant Information ◽

Computing Environment ◽

Cluster Architecture ◽

Computer Classroom ◽

Computation Node ◽

Cluster Environment ◽

Computing Performance ◽

Performance Computing

This paper’s objective is to implement and evaluate a high-performance computing environment by clustering idle PCs (personal computers) with diskless slave nodes on campuses to obtain the effectiveness of the largest computer potency. Two sets of Cluster platforms, BCCD and DRBL, are used to compare computing performance. It’s to prove that DRBL has better performance than BCCD in this experiment. Originally, DRBL was created to facilitate instructions for a Free Software Teaching platform. In order to achieve the purpose, DRBL is applied to the computer classroom with 32 PCs so to enable PCs to be switched manually or automatically among different OS (operating systems). The bioinformatics program, mpiBLAST, is executed smoothly in the Cluster architecture as well. From management’s view, the state of each Computation Node in Clusters is monitored by “Ganglia”, an existing Open Source. The authors gather the relevant information of CPU, Memory, and Network Load for each Computation Node in every network section. Through comparing aspects of performance, including performance of Swap and different network environment, they attempted to find out the best Cluster environment in a computer classroom at the school. Finally, HPL of HPCC is used to demonstrate cluster performance.

Download Full-text

A Hybrid Algorithm of Mining Closed Itemsets for Large Databases

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.145.292 ◽

2011 ◽

Vol 145 ◽

pp. 292-296

Author(s):

Lee Wen Huang

Keyword(s):

Data Mining ◽

Association Rules ◽

Execution Time ◽

Hybrid Algorithm ◽

Hybrid Approach ◽

Market Basket Analysis ◽

Market Basket ◽

Large Databases ◽

Closed Itemsets ◽

Simulation Results

Data Mining means a process of nontrivial extraction of implicit, previously and potentially useful information from data in databases. Mining closed large itemsets is a further work of mining association rules, which aims to find the set of necessary subsets of large itemsets that could be representative of all large itemsets. In this paper, we design a hybrid approach, considering the character of data, to mine the closed large itemsets efficiently. Two features of market basket analysis are considered – the number of items is large; the number of associated items for each item is small. Combining the cut-point method and the hash concept, the new algorithm can find the closed large itemsets efficiently. The simulation results show that the new algorithm outperforms the FP-CLOSE algorithm in the execution time and the space of storage.

Download Full-text

Service Composition with Data Mining in Ubiquitous Computing Environment

The KIPS Transactions PartD ◽

10.3745/kipstd.2006.13d.4.491 ◽

2006 ◽

Vol 13D (4) ◽

pp. 491-500

Author(s):

Sun-Young Lee ◽

Jong-Yun Lee

Keyword(s):

Data Mining ◽

Ubiquitous Computing ◽

Service Composition ◽

Computing Environment ◽

Ubiquitous Computing Environment

Download Full-text