Advances in Systems Analysis, Software Engineering, and High Performance Computing - Large-Scale Distributed Computing and Applications
Latest Publications


Total documents: 11 (five years: 0)

H-index: 0 (five years: 0)

Published By IGI Global

9781615207039, 9781615207046

Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

Large scale distributed systems are used for executing a wide variety of applications; while the first distributed applications came from the scientific area, today many are dedicated to businesses or even to home users. The constantly increasing demand for large scale distributed applications has created a need for tools and frameworks that ease their development. The main role of these tools and frameworks is to assist the developer in implementing common functionalities and patterns that are specific to distributed applications, for example dividing a large computational task into smaller subtasks to be executed on multiple machines, sending e-mails automatically, or managing access to resources in a secure way. One of the most important issues that application development frameworks have to address is the abstraction of the underlying middleware: their main objective is to relieve the application programmer of the effort of dealing with lower-level components. Another important aspect is the performance of the communication among the application components; hence, some development tools specifically target the optimization of communication performance. We also observe an increasing interest in interoperability among applications developed with different platforms, which has led to many standardization initiatives. This chapter discusses the issues introduced above and gives an overview of the current tools and frameworks for developing various types of distributed applications. We start with web applications, which are the most frequently used nowadays; we introduce some general design issues and present tools for server-side and client-side programming. Then we discuss developing applications in grids, clouds and peer-to-peer systems; we present the specific aspects of programming applications in these types of systems and introduce some of the most widely used tools and frameworks. The last section is dedicated to distributed workflows, complex applications that are composed of multiple smaller applications or services; the development and execution of workflows pose more challenges than traditional applications do, requiring specific tools and runtime environments.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

Resource management is an important component of large scale distributed systems (LSDS) and is implemented for a variety of architectures and services. This chapter considers the management of distributed and virtual resources and provides the requirements for resource management in large scale distributed systems. A resource management system is defined as a service, provided by a distributed network component system, that manages a pool of available named resources such that a system-centric or job-centric performance metric is optimized. Due to issues such as extensibility, adaptability, site autonomy, QoS, and co-allocation, resource management is more challenging in large scale distributed computing environments. The taxonomy of resource management systems (RMS) for very large-scale network computing systems presents the variety of requirements for this tool. The taxonomy can be used to identify architectural approaches and issues that have not been fully explored in research. A resource management system may have to support different user constraints, so multiple policies are provided. In general, requiring the RMS to support multiple policies can compel the scheduling mechanisms to solve a multi-criteria optimization problem. An important subject presented in this chapter is agent frameworks for resource management, which offer a mechanism for distributed resource management. The chapter ends with a presentation of WSRF (Web Services Resource Framework), which is the new solution for resource management based on SOA (OGSA, the Open Grid Services Architecture). Resource management in Grids involves a large number of functionalities, from resource discovery to scheduling, execution management, status monitoring and accounting. In this section, we focus on scheduling systems; the monitoring functionalities and the Grid information systems are presented in a later section. We first introduce some general issues, and then present a taxonomy of scheduling systems and some details regarding the scheduling mechanisms used in the most important current Grid projects.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

This chapter covers applications in LSDS. It is organized in two parts, which present two aspects of applications in LSDS: an overview of applications around the world and methods of application development. It also describes current projects and applications in large scale distributed systems, such as applications from the OSG projects in the USA, the EGEE and SEE-GRID applications in Europe and Asia, and the DEISA initiative (Distributed European Infrastructure for Supercomputing Applications). This part also presents relevant applications in LSDS such as Grids and P2P systems.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

Security in distributed systems is a combination of confidentiality, integrity and availability of their components. It mainly targets the communication channels between users and/or processes located on different computers, the access control of users and processes to resources and services, and the management of keys, users and user groups. Distributed systems are more vulnerable to security threats due to several characteristics such as their large scale, the distributed nature of the control, and the remote nature of the access. In addition, an increasing number of distributed applications (such as Internet banking) manipulate sensitive information and have special security requirements. After discussing important security concepts in the Background section, this chapter addresses several important problems that are the focus of current research in the security of large scale distributed systems: security models (which represent the theoretical foundation for solving security problems), access control (more specifically, access control in distributed multi-organizational platforms), secure communication (with emphasis on secure group communication, which is a hot topic in security research today), security management (especially key management for collaborative environments), secure distributed architectures (which are the blueprints for designing and building security systems), and security environments and frameworks.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

This chapter introduces the macroscopic views of distributed systems' components and their inter-relations. The importance of the architecture for understanding, designing, implementing, and maintaining distributed systems is presented first. Then the currently used architectures and their derivatives are analyzed. The presentation covers client-server architectures (with details about Multi-tiered, REST, Remote Evaluation, and Code-on-Demand architectures), hierarchical architectures (with insights into the protocol-oriented Grid architecture), service-oriented architectures including OGSA (Open Grid Services Architecture), cloud, cluster, and peer-to-peer architectures (with its versions: hierarchical, decentralized, distributed, and event-based integration architectures). Due to the relation between an architecture and the application categories it supports, the chapter's structure is similar to that of Chapter 1. Nevertheless, the focus is different. In the current chapter, the model, advantages, disadvantages and areas of applicability are presented for each architecture. The chapter also includes concrete use cases (namely actual distributed systems and platforms), and clarifies the relation between the architecture and the enabling technology used in its instantiation. Finally, Chapter 2 frames the discussion in the other chapters, which refer to specific components and services for large scale distributed systems.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

A general presentation of Large Scale Distributed Computing and Applications can be made from different perspectives: historical, conceptual, architectural, technological, social, and others. This Introduction takes a pragmatic approach. It starts with a short presentation of definitions, goals, and fundamental concepts that frame the subjects targeted in the book: the Internet, the Web, Enterprise Information Systems, Peer-to-Peer Systems, Grids, Utility Computer Systems, and others. Then, each of these current large scale distributed system categories is characterized in terms of typical applications, motivation of use, requirements and problems posed by their development: specific concepts, models, paradigms, and technologies. The focus is on describing Large Scale Distributed Computing as it appears today. Nevertheless, presenting solutions in actual use offers the opportunity to find that older theoretical results can still be exploited to build high performance artifacts. Also, the never-ending, stimulating relationship between users, who require better computing services, and providers, who discover new ways to satisfy them, is the motivation to introduce future trends in the domain, which pave the way towards the next generation Cyberinfrastructure. The chapter introduces a comprehensive set of concepts, models, and technologies, which are discussed in detail in the next chapters.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The domains of usage of large scale distributed systems have been extending during the past years from scientific to commercial applications. Together with the extension of the application domains, new requirements have emerged for large scale distributed systems. Among these requirements, fault tolerance is needed by more and more modern distributed applications, not only by the critical ones. In this chapter we analyze existing work on enabling fault tolerance in large scale distributed systems, presenting specific problems and existing solutions, as well as several future trends. The characteristics of these systems pose problems for ensuring fault tolerance, especially because of their complexity, involving many geographically distributed resources and users, because of the volatility of resources that are available only for limited amounts of time, and because of the constraints imposed by the applications and resource owners. A general fault tolerant architecture should, at a minimum, comprise a mechanism to detect failures and a component capable of recovering from and handling the detected failures, usually using some form of replication mechanism. In this chapter we analyze existing fault tolerance implementations, as well as solutions adopted in real world large scale distributed systems. We also analyze the fault tolerance architectures proposed for particular distributed architectures, such as Grid or P2P systems.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

Communication in large scale distributed systems has a major impact on the overall performance and wide acceptance of such systems. In this chapter we analyze existing work on enabling high-performance communications in large scale distributed systems, presenting specific problems and existing solutions, as well as several future trends. Because applications running in Grids, P2P systems and other types of large scale distributed systems have specific communication requirements, we discuss separately the problem of delivering efficient communication in P2P and in Grid systems. We present existing work on enabling high-speed networks to support research worldwide, together with problems related to traffic engineering, QoS assurance, and protocols designed to overcome the current limitations of TCP in the context of high-bandwidth traffic. We next analyze several group communication models, based on hybrid multicast delivery frameworks, path diversity, multicast trees, and distributed communication. Finally, we analyze data communication solutions specifically designed for P2P and Grid systems.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The latest advances in network and distributed system technologies now allow the integration of a vast variety of services with almost unlimited processing power, using large amounts of data. Sharing of resources is often viewed as the key goal for distributed systems, and in this context the sharing of stored data appears as the most important aspect of distributed resource sharing. Scientific applications are the first to take advantage of such environments, as the requirements of current and future high performance computing experiments are pressing in terms of ever higher volumes of data to be stored and managed. While these new environments reveal huge opportunities for large-scale distributed data storage and management, they also raise important technical challenges, which need to be addressed. The ability to support persistent storage of data on behalf of users, the consistent distribution of up-to-date data, the reliable replication of fast changing datasets and the efficient management of large data transfers are just some of these new challenges. In this chapter we discuss how adequate the existing distributed computing infrastructure is for supporting the required data storage and management functionalities. We highlight the issues raised by storing data over large distributed environments and discuss the recent research efforts dealing with the challenges of data retrieval, replication and fast data transfers. The interaction of data management with other data-sensitive, emerging technologies such as workflow management is also addressed.


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The architectural shift presented in the previous chapters towards high performance computers assembled from large numbers of commodity resources raises numerous design issues and assumptions pertaining to traceability, fault tolerance and scalability. Hence, one of the key challenges faced by high performance distributed systems is scalable monitoring of system state. The aim of this chapter is to survey existing work and trends in distributed systems monitoring by introducing the involved concepts and requirements, techniques, models and related standardization activities. Monitoring can be defined as the process of dynamic collection, interpretation and presentation of information concerning the characteristics and status of resources of interest. It is needed for various purposes such as debugging, testing, program visualization and animation. It may also be used for general management activities, which have a more permanent and continuous nature (performance management, configuration management, fault management, security management, etc.). In this case the behavior of the system is observed and monitoring information is gathered. This information is used to make management decisions and perform the appropriate control actions on the system. Unlike monitoring, which is generally a passive process, control actively changes the behavior of the managed system and has to be considered and modeled separately. Monitoring proves to be an essential process for observing and improving the reliability and the performance of large-scale distributed systems.

