Parallel Computing for Mining Association Rules in Distributed P2P Networks

Author(s):  
Huiwei Guan

Distributed computing and Peer-to-Peer (P2P) systems have emerged as an active research field that combines techniques from networking, distributed computing, distributed databases, and a variety of distributed applications. Such systems support information systems that scale to very large volumes of data spread over very large numbers of participating nodes. Data mining on large distributed databases is an important research area, yet most recent work on mining association rules has focused on a single machine or a client-server network model. This traditional approach does not meet the requirements of large distributed databases and applications in a P2P computing system. Two challenges arise: how to implement data mining for large distributed databases in P2P computing systems, and how to develop parallel data mining algorithms and tools that improve efficiency on such systems. In this chapter, a parallel association rule mining approach for a P2P computing system is designed and implemented. The approach fits the distributed nature of the P2P system and makes parallel computation practical. The parallel algorithm is evaluated and compared with the sequential algorithm, and the analysis shows that it offers a consistent implementation, higher performance, and good scalability.
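The abstract does not describe the chapter's algorithm, so the following is only a minimal sketch of one common way to parallelize association rule mining, often called count distribution: each peer counts candidate itemsets in its own partition, and the partial counts are then merged and compared with a global support threshold. The function names, the sample transactions, and the 0.5 threshold are assumptions made for illustration, and the peers are simulated by a plain loop rather than by real network nodes.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, k):
    """Count every k-item subset that occurs in one peer's local partition."""
    counts = Counter()
    for transaction in transactions:
        # Sorting gives each itemset a single canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def global_frequent(partitions, k, min_support):
    """Merge per-peer counts and keep itemsets that meet the global support."""
    total = Counter()
    for partition in partitions:  # each iteration stands in for one peer's work
        total.update(local_counts(partition, k))
    n_transactions = sum(len(p) for p in partitions)
    return {
        itemset: count / n_transactions
        for itemset, count in total.items()
        if count / n_transactions >= min_support
    }


# Hypothetical data: two peers, each holding two transactions.
peers = [
    [["bread", "milk"], ["bread", "butter", "milk"]],
    [["milk", "butter"], ["bread", "milk"]],
]
frequent_pairs = global_frequent(peers, 2, 0.5)
print(frequent_pairs)
```

Because only counts cross partition boundaries, the raw transactions never have to leave the peer that stores them, which is the property that makes this style of counting attractive in a distributed setting.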

Data Mining ◽  
2013 ◽  
pp. 107-124


Author(s):  
Ghada Farouk Elkabbany ◽  
Mohamed Rasslan

Distributed computing systems allow homogeneous or heterogeneous computers and workstations to act as a single computing environment. In this environment, users can uniformly access local and remote resources in order to run processes, without knowing which computers their processes are running on. This can pose complicated security problems. This chapter provides a security review of distributed systems. It begins with a survey of the diverse definitions of distributed computing systems in the literature. Different systems are then discussed, with emphasis on the most recent. Finally, different aspects of distributed systems security and prominent research directions are explored.


2010 ◽  
Vol 108-111 ◽  
pp. 50-56 ◽  
Author(s):  
Liang Zhong Shen

Due to the popularity of knowledge discovery and data mining, in practice as well as among academic and corporate professionals, association rule mining is receiving increasing attention. Data mining technology is applied to analyze data in databases. This paper puts forward a new method that is suited to the design of distributed databases.


Author(s):  
A. F. Zadorozhny ◽  
V. A. Melent’ev

This contribution investigates the topological compatibility of parallel computing systems and tasks. Based on an original topological model of parallel computations and on an unconventional description of a graph by its projections, appropriate indexes are proposed and explained. Using the example of a hypercubic computing system (CS) and tasks with ring and star information topologies, we demonstrate how the indexes are determined and how they are used in a comparative analysis of the applicability of an interconnect with a given topology to tasks with the same and with different types of information topologies.


Author(s):  
Vidya S. Handur et al.

The development and widespread use of technologies such as cloud computing has led to an exponential increase in the volume of traffic. With this increase, the resources in the network may be insufficient to handle the traffic, or some resources may become overutilized while others are underutilized. Either condition reduces the performance of the system. To improve performance, the traffic needs to be regulated so that all resources are utilized according to their capacity, which is known as load balancing. Load balancing has been a concern in distributed computing systems, where the computing nodes do not have a global view of the network. There have been constant efforts to provide an efficient solution for load balancing through approaches such as game theory, fuzzy logic, heuristics, and metaheuristics. Even though various solutions exist, the issue remains challenging because there is no single best-fit solution. The paper studies how the Particle Swarm Optimization approach is used to achieve an optimal solution for load balancing in a distributed computing system.
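The abstract does not give the paper's formulation, so the following is only a rough sketch of how Particle Swarm Optimization could be applied to a toy load-balancing problem. It assumes that each particle is a real-valued vector with one coordinate per task, that the integer part of a coordinate selects the node, and that the objective is the gap between the most and least loaded node. The function names, the coefficients `w`, `c1`, and `c2`, and the task costs are illustrative assumptions.

```python
import random


def imbalance(position, task_costs, n_nodes):
    """Gap between the most and least loaded node for one particle position."""
    loads = [0.0] * n_nodes
    for x, cost in zip(position, task_costs):
        loads[min(int(x), n_nodes - 1)] += cost  # integer part selects the node
    return max(loads) - min(loads)


def pso_balance(task_costs, n_nodes, n_particles=20, iters=100, seed=0):
    """Search for a task-to-node assignment with a small load imbalance."""
    rng = random.Random(seed)
    dim = len(task_costs)
    pos = [[rng.uniform(0, n_nodes) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [imbalance(p, task_costs, n_nodes) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, and social weights (assumed)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so the coordinate always maps to a valid node index.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), n_nodes - 1e-9)
            cost = imbalance(pos[i], task_costs, n_nodes)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    assignment = [min(int(x), n_nodes - 1) for x in gbest]
    return assignment, gbest_cost


# Hypothetical workload: six tasks with the given costs, three nodes.
assignment, cost = pso_balance([4, 3, 3, 2, 2, 2], 3)
print(assignment, cost)
```

The fixed seed only makes the run repeatable. PSO is a heuristic, so the returned assignment is the best one found by the swarm and is not guaranteed to be optimal.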


Author(s):  
Nutan Kumari Chauhan ◽  
Harendra Kumar

The distributed computing system (DCS) is a very popular topic in computer science. A DCS consists of various computers (processors), possibly located at different sites and connected by communication links in such a manner that they appear as one system to the user. Task scheduling is an active field of research in DCS. The main objectives of task scheduling problems are balancing the load across processors, maximizing system reliability, minimizing system cost, and minimizing response time. It is very difficult to satisfy all of these objectives simultaneously, so most researchers have solved the task scheduling problem with one or more of them. The purpose of this chapter is to give an overview of many (certainly not all) task scheduling algorithms. The chapter covers a brief survey, task scheduling strategies, and different approaches used for task scheduling with one or more objectives.


Author(s):  
Grzegorz Chmaj ◽  
Krzysztof Walkowiak ◽  
Michał Tarnawski ◽  
Michał Kucharzak

Recently, distributed computing systems have been gaining much attention due to a growing demand for various kinds of effective computations in both industry and academia. In this paper, we focus on Peer-to-Peer (P2P) computing systems, also called public-resource computing systems or global computing systems. P2P computing systems, in contrast to grids, use personal computers and other relatively simple electronic equipment (e.g., the PlayStation console) to process sophisticated computational projects. A significant example of the P2P computing idea is the BOINC (Berkeley Open Infrastructure for Network Computing) project. To improve the performance of the computing system, we propose to use the P2P approach to distribute the results of computational projects, i.e., results are transmitted in the system as in P2P file sharing systems (e.g., BitTorrent). In this work, we concentrate on offline optimization of the P2P computing system, covering two elements: scheduling of computations and data distribution. The objective is to minimize the system OPEX cost related to data processing and data transmission. We formulate an Integer Linear Problem (ILP) to model the system and apply this formulation to obtain optimal results using the CPLEX solver. Next, we propose two heuristic algorithms that provide results very


2019 ◽  
Vol 214 ◽  
pp. 03048
Author(s):  
Xiaomei Zhang ◽  
Kang Li ◽  
Xiang Hu Zhao ◽  
Tian Yan ◽  
Yong Sun

The Jiangmen Underground Neutrino Observatory (JUNO) is going to apply parallel computing in its software to accelerate JUNO data processing and to make full use of multi-core and many-core CPUs. The JUNO distributed computing system therefore needs to support single-core and multi-core jobs in a consistent way. Supporting multi-core jobs requires a series of changes to job description, scheduling, and monitoring, of which pilot-based scheduling for a mix of single-core and multi-core jobs is the most complicated part. This paper presents and compares two scheduling modes and their efficiency, and also provides a way to optimize efficiency.


2019 ◽  
Vol 23 (2) ◽  
pp. 153-173
Author(s):  
M. Sadeq Jaafar

Purpose of research. The object of the study is a network cloud service built on the basis of a replicated database. Data in distributed computing systems are replicated in order to ensure reliable storage, to facilitate access to data, and to improve the performance of the storage system. This raises the problem of analyzing the effectiveness of query processing for replicated databases in a network-based cloud environment and, in particular, the problem of organizing priority queues for requests that update database copies (update-requests) and for requests that search and read information in databases (query-requests). The purpose of this work is to study and organize priority modes in a networked distributed computing system with a cloud service architecture. Methods. The study was conducted on the basis of two types of behavioural models: models based on Petri nets, used to describe and verify the functioning of a distributed computing system with replicated databases represented as a pool of several resource units, and models based on the GPSS simulation language, used to estimate the time that queries of each type spend in queues depending on query priority. Results. Based on two simulation methods, the operation of a cloud system with database replicas was analyzed. In this system two distributed cloud computing systems interact: MANET Cloud, based on a wireless network, and Internet Cloud, based on the Internet. These databases together are the basis of the DBaaSoD (Data Bases as a Service on Demand) cloud service (databases as a service organized at the user's request). To study this system, models of two classes were developed. The model based on Petri nets is designed to test the simulated distributed application for proper functioning. Decisions on mapping Petri nets onto the architecture of computer networks are discussed. The statistical simulation model is used to compare the priority and non-priority service modes for query-requests and update-requests by the criterion of average time spent in queues. Conclusion. The system models based on Petri nets were tested and shown to have liveness and security, which makes it possible to move from models to formalized specifications of network applications for network cloud services in distributed computing systems with replicated databases. The study of the GPSS model showed that with priority service of update-requests, their time in the queue is reduced by a factor of about 2 to 4 compared with query-requests, depending on the intensity of the query-requests. In the non-priority mode, the service conditions for update-requests deteriorate, and their time in the queue increases by a factor of about 2 to 6 compared with query-requests, depending on the intensity of the query-requests.
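The study above used GPSS, and its model is not reproduced in the abstract. Purely as an illustration of the comparison it describes, the sketch below simulates a single non-preemptive server in Python and reports the mean queueing time of update-requests and query-requests with and without priority for updates. The single-server assumption, the fixed service time, the function name `mean_waits`, and the arrival list are all hypothetical.

```python
import heapq


def mean_waits(arrivals, service_time, update_priority):
    """Mean time in queue per request kind for one non-preemptive server.

    arrivals is a list of (arrival_time, kind) with kind "update" or "query".
    When update_priority is True, waiting update-requests are served first;
    otherwise requests are served in arrival order.
    """
    arrivals = sorted(arrivals, key=lambda a: a[0])  # stable: ties keep list order
    waiting = []  # heap of (rank, arrival_time, sequence, kind)
    waits = {"update": [], "query": []}
    clock, i = 0.0, 0
    while i < len(arrivals) or waiting:
        if not waiting and arrivals[i][0] > clock:
            clock = arrivals[i][0]  # server idles until the next arrival
        while i < len(arrivals) and arrivals[i][0] <= clock:
            t, kind = arrivals[i]
            rank = 0 if (update_priority and kind == "update") else 1
            heapq.heappush(waiting, (rank, t, i, kind))
            i += 1
        _, t, _, kind = heapq.heappop(waiting)
        waits[kind].append(clock - t)  # time spent waiting before service starts
        clock += service_time
    return {kind: sum(v) / len(v) for kind, v in waits.items() if v}


# Hypothetical burst: three query-requests and one update-request at time 0.
burst = [(0, "query"), (0, "query"), (0, "query"), (0, "update")]
print(mean_waits(burst, 1.0, update_priority=False))
print(mean_waits(burst, 1.0, update_priority=True))
```

In this toy burst, giving updates priority removes their queueing delay at the cost of a longer average wait for queries. The specific ratios reported in the abstract depend on the arrival intensities used in the original GPSS model and are not reproduced by this sketch.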

