MemBox: Shared Memory Device for Memory-Centric Computing Applicable to Deep Learning Problems

Large-scale computational problems that need to be addressed in modern computers, such as deep learning or big data analysis, cannot be solved in a single computer, but can be solved with distributed computer systems. Since most distributed computing systems, consisting of a large number of networked computers, should propagate their computational results to each other, they can suffer the problem of an increasing overhead, resulting in lower computational efficiencies. To solve these problems, we proposed an architecture of a distributed system that used a shared memory that is simultaneously accessible by multiple computers. Our architecture aimed to be implemented in FPGA or ASIC. Using an FPGA board that implemented our architecture, we configured the actual distributed system and showed the feasibility of our system. We compared the results of the deep learning application test using our architecture with that using Google Tensorflow’s parameter server mechanism. We showed improvements in our architecture beyond Google Tensorflow’s parameter server mechanism and we determined the future direction of research by deriving the expected problems.

Download Full-text

Shared mini/microcomputer memory performance at remote computer network nodes in large scale distributed computing systems

Microprocessing and Microprogramming ◽

10.1016/0165-6074(87)90266-3 ◽

1987 ◽

Vol 19 (2) ◽

pp. 143-152

Author(s):

Arumalla V Reddi

Keyword(s):

Distributed Computing ◽

Large Scale ◽

Computer Network ◽

Memory Performance ◽

Distributed Computing Systems ◽

Computing Systems ◽

Network Nodes ◽

Remote Computer

Download Full-text

Models for Configuring Large-Scale Distributed Computing Systems

AT&T Technical Journal ◽

10.1002/j.1538-7305.1985.tb00360.x ◽

1985 ◽

Vol 64 (2) ◽

pp. 491-532 ◽

Cited By ~ 18

Author(s):

B. Gavish

Keyword(s):

Distributed Computing ◽

Large Scale ◽

Distributed Computing Systems ◽

Computing Systems

Download Full-text

Exploiting P2P and Grid Computing Technologies for Resource Sharing to Support High Performance Distributed System

Handbook of Research on P2P and Grid Systems for Service-Oriented Computing ◽

10.4018/978-1-61520-686-5.ch019 ◽

2010 ◽

pp. 450-475

Author(s):

Liangxiu Han

Keyword(s):

Resource Sharing ◽

High Performance ◽

Large Scale ◽

Fault Tolerant ◽

Resource Discovery ◽

Distributed Computing Systems ◽

Computing Systems ◽

Efficient Resource ◽

Service Oriented ◽

Important Design

This chapter identifies challenges and requirements for resource sharing to support high performance distributed Service-Oriented Computing (SOC) systems. The chapter draws attention to two popular and important design paradigms: Grid and Peer-to-Peer (P2P) computing systems, which are evolving as two practical solutions to supporting wide-area resource sharing over the Internet. As a fundamental task of resource sharing, the efficient resource discovery is playing an important role in the context of the SOC setting. The chapter presents the resource discovery in Grid and P2P environments through an overview of related systems, both historical and emerging. The chapter then discusses the exploitation of both technologies for facilitating the resource discovery within large-scale distributed computing systems in a flexible, scalable, fault-tolerant, interoperable and security fashion.

Download Full-text

Designing Fault Tolerance Strategy by Iterative Redundancy for Component-Based Distributed Computing Systems

Mathematical Problems in Engineering ◽

10.1155/2014/197423 ◽

2014 ◽

Vol 2014 ◽

pp. 1-11

Author(s):

Hui Wang ◽

Yun Wang

Keyword(s):

Fault Tolerance ◽

Distributed Computing ◽

Large Scale ◽

Critical Issue ◽

Distributed Computing Systems ◽

Computing Systems ◽

Tolerance Strategy ◽

Use Of Resources ◽

Distributed Components ◽

Modular Redundancy

Reliability is a critical issue for component-based distributed computing systems, some distributed software allows the existence of large numbers of potentially faulty components on an open network. Faults are inevitable in this large-scale, complex, distributed components setting, which may include a lot of untrustworthy parts. How to provide highly reliable component-based distributed systems is a challenging problem and a critical research. Generally, redundancy and replication are utilized to realize the goal of fault tolerance. In this paper, we propose a CFI (critical fault iterative) redundancy technique, by which the efficiency can be guaranteed to make use of resources (e.g., computation and storage) and to create fault-tolerance applications. When operating in an environment with unknown components’ reliability, CFI redundancy is more efficient and adaptive than other techniques (e.g., K-Modular Redundancy and N-Version Programming). In the CFI strategy of redundancy, the function invocation relationships and invocation frequencies are employed to rank the functions’ importance and identify the most vulnerable function implemented via functionally equivalent components. A tradeoff has to be made between efficiency and reliability. In this paper, a formal theoretical analysis and an experimental analysis are presented. Compared with the existing methods, the reliability of components-based distributed system can be greatly improved by tolerating a small part of significant components.

Download Full-text

An assessment of accountability policies for large-scale distributed computing systems

Proceedings of the 5th Annual Workshop on Cyber Security and Information Intelligence Research Cyber Security and Information Intelligence Challenges and Strategies - CSIIRW '09 ◽

10.1145/1558607.1558652 ◽

2009 ◽

Author(s):

Wonjun Lee ◽

Anna C. Squicciarini ◽

Elisa Bertino

Keyword(s):

Distributed Computing ◽

Large Scale ◽

Distributed Computing Systems ◽

Computing Systems

Download Full-text

REQUIREMENTS FOR AN EVENT-BASED SIMULATION PACKAGE FOR GRID SYSTEMS

Journal of Interconnection Networks ◽

10.1142/s0219265907001965 ◽

2007 ◽

Vol 08 (02) ◽

pp. 163-178 ◽

Cited By ~ 29

Author(s):

FATOS XHAFA ◽

JAVIER CARRETERO ◽

LEONARD BAROLLI ◽

ARJAN DURRESI

Keyword(s):

Distributed Systems ◽

Distributed Computing ◽

Large Scale ◽

Experimental Studies ◽

Distributed Applications ◽

Distributed Computing Systems ◽

Computing Systems ◽

Grid Systems ◽

Recent Developments ◽

Simulation Package

In this paper we present a study on the requirements for the design and implementation of simulation packages for Grid systems. Grids are emerging as new distributed computing systems whose main objective is to manage and allocate geographically distributed computing resources to applications and users in an efficient and transparent manner. Grid systems are at present very difficult and complex to use for experimental studies of large-scale distributed applications. Although the field of simulation of distributed computing systems is mature, recent developments in large-scale distributed systems are raising needs not present in the simulation of the traditional distributed systems. Motivated by this, we present in this work a set of basic requirements that any simulation package for Grid computing should offer. This set of functionalities is obtained after a careful review of most important existing Grid simulation packages and includes new requirements not considered in such simulation packages. Based on the identified set of requirements, a Grid simulator is developed and exemplified for the Grid scheduling problem.

Download Full-text

A Hierarchical Load Balancing Strategy Considering Communication Delay Overhead for Large Distributed Computing Systems

Mathematical Problems in Engineering ◽

10.1155/2016/5641831 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 1

Author(s):

Jixiang Yang ◽

Ling Ling ◽

Haibin Liu

Keyword(s):

Distributed Systems ◽

Distributed Computing ◽

Load Balancing ◽

Distributed System ◽

Communication Delay ◽

Time Varying ◽

Random Delay ◽

Distributed Computing Systems ◽

Computing Systems ◽

And Migration

Load balancing technology can effectively exploit potential enormous compute power available on distributed systems and achieve scalability. Communication delay overhead on distributed system, which is time-varying and is usually ignored or assumed to be deterministic for traditional load balancing strategies, can greatly degrade the load balancing performance. Considering communication delay overhead and its time-varying feature, a hierarchical load balancing strategy based on generalized neural network (HLBSGNN) is presented for large distributed systems. The novelty of the HLBSGNN is threefold: (1) the hierarchy with optimized communication is employed to reduce load balancing overhead for large distributed computing systems, (2) node computation rate and communication delay randomness imposed by the communication medium are considered, and (3) communication and migration overheads are optimized via forecasting delay. Comparisons with traditional strategies, such as centralized, distributed, and random delay strategies, indicate that the HLBSGNN is more effective and efficient.

Download Full-text

Privacy Protection in Service Discovery for Large-Scale Distributed Computing Systems

2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum ◽

10.1109/ipdps.2011.250 ◽

2011 ◽

Cited By ~ 1

Author(s):

Jun Yeol Choi ◽

Zhong Yuan Li ◽

Hee Yong Youn ◽

Ohyoung Song

Keyword(s):

Distributed Computing ◽

Privacy Protection ◽

Service Discovery ◽

Large Scale ◽

Distributed Computing Systems ◽

Computing Systems

Download Full-text

Parallel Coordinates Visualization in the ELK Stack

10.51130/graphicon-2020-2-3-10 ◽

2020 ◽

pp. paper10-1-paper10-12

Author(s):

Timofei Galkin ◽

Maria Grigorieva

Keyword(s):

Data Analysis ◽

Large Scale ◽

Visual Exploration ◽

Visual Data ◽

Parallel Coordinates ◽

Distributed Computing Systems ◽

Computing Systems ◽

Multiple Parameters ◽

Interactive Analysis ◽

Visualization Tools

Modern large-scale distributed computing systems, processing large volumes of data, require mature monitoring systems able to control and track in re-sources, networks, computing tasks, queues and other components. In recent years, the ELK stack has become very popular for the monitoring of computing environment, largely due to the efficiency and flexibility of the Elastic Search storage and wide variety of Kibana visualization tools. The analysis of computing infrastructure metadata often requires the visual exploration of multiple parameters simultaneously on one graphical image. Stacked bar charts, heat maps, radar charts are widely used for the multivariate visual data analysis, but these methods have limitations on the number of parameters. In this research the authors propose to enhance the capacity of Kibana, adding Parallel Coordinates diagram - one of the most powerful method for visual interactive analysis of high-dimensional data. It allows to compare many variables together and observe correlations between them. This work describes the development process of Parallel Coordinates as a Kibana plugin, and demonstrates an example of visual data analysis based on the Nginx logs metadata.

Download Full-text