MemBox: Shared Memory Device for Memory-Centric Computing Applicable to Deep Learning Problems

Electronics, 2021, Vol 10 (21), pp. 2720
Author(s): Yongseok Choi, Eunji Lim, Jaekwon Shin, Cheol-Hoon Lee

Large-scale computational problems addressed by modern computers, such as deep learning or big data analysis, cannot be solved on a single machine and must instead be solved on distributed computing systems. Because most distributed computing systems, which consist of large numbers of networked computers, must propagate their computational results to one another, they suffer from growing communication overhead that lowers computational efficiency. To address this problem, we proposed a distributed system architecture built around a shared memory that is simultaneously accessible by multiple computers. The architecture is designed to be implemented in an FPGA or ASIC. Using an FPGA board implementing our architecture, we configured an actual distributed system and demonstrated its feasibility. We compared the results of a deep learning application test using our architecture with those using Google TensorFlow's parameter server mechanism, showed that our architecture improves on the parameter server approach, and determined the direction of future research by deriving the expected problems.
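As a rough software analogy of the idea, the sketch below uses Python's multiprocessing.shared_memory to let several workers apply SGD-style updates directly to one shared parameter buffer instead of routing gradients through a parameter server. The buffer size, learning rate, and worker logic are illustrative assumptions, not details of the MemBox device, which realizes this access pattern in FPGA/ASIC hardware.

```python
# Hypothetical software analogy of the shared-memory idea: workers update
# one shared parameter buffer in place instead of shipping gradients
# through a parameter server. Illustrative only, not the MemBox device.
import numpy as np
from multiprocessing import Process, Lock, shared_memory

DIM = 4  # toy parameter vector size (assumption)

def worker(name, lock, seed):
    shm = shared_memory.SharedMemory(name=name)      # attach to shared buffer
    params = np.ndarray((DIM,), dtype=np.float64, buffer=shm.buf)
    grad = np.random.default_rng(seed).normal(size=DIM)  # stand-in gradient
    with lock:                    # serialize the read-modify-write
        params -= 0.01 * grad     # SGD-style update applied in place
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=DIM * 8)
    params = np.ndarray((DIM,), dtype=np.float64, buffer=shm.buf)
    params[:] = 0.0
    lock = Lock()
    procs = [Process(target=worker, args=(shm.name, lock, s)) for s in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print("parameters after shared-memory updates:", params)
    shm.close(); shm.unlink()
```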

Author(s): Liangxiu Han

This chapter identifies challenges and requirements for resource sharing to support high-performance distributed Service-Oriented Computing (SOC) systems. The chapter draws attention to two popular and important design paradigms, Grid and Peer-to-Peer (P2P) computing, which are evolving as two practical solutions for supporting wide-area resource sharing over the Internet. As a fundamental task in resource sharing, efficient resource discovery plays an important role in the SOC setting. The chapter presents resource discovery in Grid and P2P environments through an overview of related systems, both historical and emerging. The chapter then discusses the exploitation of both technologies for facilitating resource discovery within large-scale distributed computing systems in a flexible, scalable, fault-tolerant, interoperable, and secure fashion.
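To make the discovery task concrete, the following minimal sketch shows one common P2P approach, consistent-hashing lookup as used in structured overlays such as Chord. The node names, hash width, and ring layout are illustrative assumptions rather than any specific system surveyed in the chapter.

```python
# A minimal sketch of P2P-style resource discovery via consistent hashing,
# the mechanism behind structured overlays such as Chord. Names and the
# 32-bit ring are illustrative assumptions.
import hashlib
from bisect import bisect_right

def h(key: str) -> int:
    """Map a node id or resource name onto a 32-bit identifier ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

class Ring:
    def __init__(self, nodes):
        # Each node is responsible for keys between its predecessor and itself.
        self.points = sorted((h(n), n) for n in nodes)

    def locate(self, resource: str) -> str:
        """Return the node responsible for a resource (its successor on the ring)."""
        ids = [p for p, _ in self.points]
        i = bisect_right(ids, h(resource)) % len(self.points)
        return self.points[i][1]

ring = Ring(["node-A", "node-B", "node-C", "node-D"])
print("cpu-cluster/42 ->", ring.locate("cpu-cluster/42"))
print("dataset/genome ->", ring.locate("dataset/genome"))
```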


2014, Vol 2014, pp. 1-11
Author(s): Hui Wang, Yun Wang

Reliability is a critical issue for component-based distributed computing systems: some distributed software allows large numbers of potentially faulty components to exist on an open network. Faults are inevitable in such a large-scale, complex, distributed-component setting, which may include many untrustworthy parts. How to provide highly reliable component-based distributed systems is therefore a challenging and critical research problem. Generally, redundancy and replication are utilized to achieve fault tolerance. In this paper, we propose a CFI (critical fault iterative) redundancy technique that guarantees efficient use of resources (e.g., computation and storage) when creating fault-tolerant applications. When operating in an environment where component reliability is unknown, CFI redundancy is more efficient and adaptive than other techniques (e.g., K-Modular Redundancy and N-Version Programming). In the CFI redundancy strategy, function invocation relationships and invocation frequencies are employed to rank the importance of functions and to identify the most vulnerable functions, which are then implemented via functionally equivalent components. A tradeoff has to be made between efficiency and reliability. In this paper, a formal theoretical analysis and an experimental analysis are presented. Compared with existing methods, the reliability of component-based distributed systems can be greatly improved by making only a small set of significant components fault-tolerant.
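The ranking idea can be sketched as follows: score each function by how often it is invoked, directly and through its callers, then replicate only the top-ranked (most significant) ones. The call graph, propagation weights, and top-k cutoff below are illustrative assumptions loosely based on the abstract, not the authors' CFI algorithm.

```python
# Hedged sketch: rank functions by invocation relationships and frequencies,
# then pick the most significant ones for redundant implementation.
from collections import defaultdict

# call graph: caller -> {callee: invocation frequency} (illustrative)
calls = {
    "ui":     {"auth": 50, "search": 120},
    "search": {"index": 120, "rank": 110},
    "auth":   {"db": 50},
    "rank":   {"db": 90},
}

def importance(calls, rounds=20, damping=0.5):
    """Fixed-point iteration: direct weight plus damped weight from callers."""
    base = defaultdict(float)
    for callees in calls.values():
        for fn, freq in callees.items():
            base[fn] += freq                        # direct invocation weight
    score = dict(base)
    for _ in range(rounds):                         # propagate along call edges
        nxt = defaultdict(float, base)
        for caller, callees in calls.items():
            total = sum(callees.values())
            for fn, freq in callees.items():
                nxt[fn] += damping * score.get(caller, 0.0) * freq / total
        score = nxt
    return score

score = importance(calls)
top = sorted(score, key=score.get, reverse=True)[:2]  # top-k cutoff (assumed)
print("replicate via functionally equivalent components:", top)
```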


2007, Vol 08 (02), pp. 163-178
Author(s): Fatos Xhafa, Javier Carretero, Leonard Barolli, Arjan Durresi

In this paper we present a study of the requirements for the design and implementation of simulation packages for Grid systems. Grids are emerging as new distributed computing systems whose main objective is to manage and allocate geographically distributed computing resources to applications and users in an efficient and transparent manner. Grid systems are at present very difficult and complex to use for experimental studies of large-scale distributed applications. Although the field of simulation of distributed computing systems is mature, recent developments in large-scale distributed systems are raising needs not present in the simulation of traditional distributed systems. Motivated by this, we present in this work a set of basic requirements that any simulation package for Grid computing should offer. This set of functionalities was obtained after a careful review of the most important existing Grid simulation packages and includes new requirements not considered in those packages. Based on the identified set of requirements, a Grid simulator is developed and exemplified on the Grid scheduling problem.
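The flavor of such a simulator can be conveyed by a minimal sketch: heterogeneous resources, a batch of tasks, and a scheduling policy driven by a discrete-event queue. The resource speeds, task sizes, and the minimum-completion-time policy below are illustrative assumptions, not the simulator developed in the paper.

```python
# Minimal discrete-event sketch of Grid scheduling: heterogeneous sites,
# a task batch, and a minimum-completion-time (MCT) assignment heuristic.
import heapq

resources = {"site-A": 4.0, "site-B": 2.5, "site-C": 1.0}  # ops per time unit
free_at = {r: 0.0 for r in resources}                      # next idle time
tasks = [120.0, 60.0, 200.0, 45.0, 90.0]                   # workload in ops

events = []  # (finish_time, task_id, resource) min-heap of completions
for tid, work in enumerate(tasks):
    # policy: assign each task where it would finish earliest (MCT heuristic)
    best = min(resources, key=lambda r: free_at[r] + work / resources[r])
    finish = free_at[best] + work / resources[best]
    free_at[best] = finish
    heapq.heappush(events, (finish, tid, best))

while events:  # replay completions in simulated-time order
    t, tid, res = heapq.heappop(events)
    print(f"t={t:6.1f}  task {tid} completes on {res}")
print("makespan:", max(free_at.values()))
```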


2016, Vol 2016, pp. 1-9
Author(s): Jixiang Yang, Ling Ling, Haibin Liu

Load balancing technology can effectively exploit the enormous potential compute power available on distributed systems and achieve scalability. Communication delay overhead on distributed systems, which is time-varying and is usually ignored or assumed to be deterministic by traditional load balancing strategies, can greatly degrade load balancing performance. Considering communication delay overhead and its time-varying nature, a hierarchical load balancing strategy based on a generalized neural network (HLBSGNN) is presented for large distributed systems. The novelty of the HLBSGNN is threefold: (1) a hierarchy with optimized communication is employed to reduce load balancing overhead in large distributed computing systems, (2) node computation rates and the randomness of communication delay imposed by the communication medium are considered, and (3) communication and migration overheads are optimized by forecasting delay. Comparisons with traditional strategies, such as centralized, distributed, and random delay strategies, indicate that the HLBSGNN is more effective and efficient.
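The decision rule at the heart of delay-aware balancing can be sketched as: migrate a task only when its forecast remote finish time, including the predicted transfer delay, beats the local one. The paper forecasts delay with a generalized neural network; the sketch below substitutes a simple exponential moving average so the example stays self-contained, and all loads, rates, and delays are illustrative.

```python
# Hedged sketch of delay-aware migration. The EMA forecaster stands in for
# the paper's generalized-neural-network delay predictor.
class DelayForecaster:
    """Exponential moving average of observed link delays."""
    def __init__(self, alpha=0.3):
        self.alpha, self.est = alpha, None
    def observe(self, delay):
        self.est = delay if self.est is None else (
            self.alpha * delay + (1 - self.alpha) * self.est)
    def forecast(self):
        return self.est if self.est is not None else 0.0

def should_migrate(src_load, dst_load, dst_rate, task_cost, forecaster):
    """Migrate only if the forecast remote finish time beats staying local."""
    local_finish = src_load + task_cost  # work queued ahead of us, same rate
    remote_finish = forecaster.forecast() + dst_load + task_cost / dst_rate
    return remote_finish < local_finish

f = DelayForecaster()
for d in [1.2, 0.8, 2.0, 1.5]:  # measured link delays (illustrative)
    f.observe(d)
print("forecast delay:", round(f.forecast(), 3))
print("migrate?", should_migrate(src_load=10.0, dst_load=2.0,
                                 dst_rate=1.5, task_cost=4.0, forecaster=f))
```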


2020, pp. paper10-1-paper10-12
Author(s): Timofei Galkin, Maria Grigorieva

Modern large-scale distributed computing systems, which process large volumes of data, require mature monitoring systems able to control and track resources, networks, computing tasks, queues, and other components. In recent years, the ELK stack has become very popular for monitoring computing environments, largely due to the efficiency and flexibility of Elasticsearch storage and the wide variety of Kibana visualization tools. The analysis of computing infrastructure metadata often requires the visual exploration of multiple parameters simultaneously in one graphical image. Stacked bar charts, heat maps, and radar charts are widely used for multivariate visual data analysis, but these methods limit the number of parameters that can be shown. In this research, the authors propose to enhance the capabilities of Kibana by adding a Parallel Coordinates diagram, one of the most powerful methods for visual interactive analysis of high-dimensional data, which allows many variables to be compared together and correlations between them to be observed. This work describes the development of Parallel Coordinates as a Kibana plugin and demonstrates an example of visual data analysis based on Nginx log metadata.
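The underlying visualization technique is easy to illustrate outside Kibana (the plugin itself targets Kibana's JavaScript APIs): min-max normalize every parameter to [0, 1] and draw each record as a polyline across one vertical axis per parameter. The standalone matplotlib sketch below does exactly that on made-up aggregates resembling Nginx log fields; it shows the technique, not the authors' plugin.

```python
# Minimal parallel-coordinates sketch: one polyline per record, one vertical
# axis per parameter, every axis min-max normalized to [0, 1].
import matplotlib.pyplot as plt

dims = ["status", "bytes", "latency_ms", "requests"]
records = [  # illustrative per-endpoint aggregates (assumption)
    [200, 512, 35, 900],
    [200, 2048, 80, 450],
    [404, 128, 12, 120],
    [500, 64, 300, 30],
]

lo = [min(r[i] for r in records) for i in range(len(dims))]
hi = [max(r[i] for r in records) for i in range(len(dims))]
norm = [[(r[i] - lo[i]) / ((hi[i] - lo[i]) or 1) for i in range(len(dims))]
        for r in records]

fig, ax = plt.subplots()
for row in norm:
    ax.plot(range(len(dims)), row, alpha=0.7)  # one polyline per record
for x in range(len(dims)):
    ax.axvline(x, color="grey", lw=0.5)        # the parallel axes
ax.set_xticks(range(len(dims)))
ax.set_xticklabels(dims)
ax.set_yticks([])
plt.show()
```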

