Adjusting Thread Parallelism Dynamically to Accelerate Dynamic Programming with Irregular Workload Distribution on GPGPUs

Author(s):  
Chao-Chin Wu ◽  
Jenn-Yang Ke ◽  
Heshan Lin ◽  
Syun-Sheng Jhan

Dynamic Programming (DP) is an important and popular method for solving a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management. DP breaks a problem into simpler subproblems and combines their solutions into a solution to the original problem. This paper focuses on one type of dynamic programming called Nonserial Polyadic Dynamic Programming (NPDP). To run NPDP applications efficiently on an emerging General-Purpose Graphics Processing Unit (GPGPU), the authors have to exploit more parallelism to fully utilize the computing power of the hundreds of processing units in it. However, the degree of parallelism varies significantly across the different phases of an NPDP application. To address the problem, the authors propose a method that adjusts the thread-level parallelism to provide a sufficient and steadier degree of parallelism across phases. If a phase has insufficient parallelism, the authors split threads into subthreads. Conversely, they can limit the total number of threads in a phase by merging threads. The authors also examine the difference between the conventional problem of finding the minimum on a GPU and the NPDP-specific problem of finding the minimums of many independent sets on a GPU. Finally, the authors examine how to design an appropriate data structure to apply the memory-coalescing optimization technique. The experimental results demonstrate that the method obtains a best speedup of 13.40 over the previously published algorithm.
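As a rough illustration of the splitting and merging idea, here is a minimal CUDA sketch; it is our reading of the technique, not the authors' code. Each subproblem of a phase is computed by a group of `threadsPerProblem` cooperating subthreads, and the host raises that number in late phases, where few subproblems remain but each contains a long reduction. The matrix-chain-style recurrence, the launch heuristic, and all names are illustrative assumptions; the application-specific weight term and base-case initialization are omitted.

```cuda
#include <cuda_runtime.h>
#include <cfloat>

// One NPDP phase: subproblem (i, i+phase) is reduced cooperatively by
// threadsPerProblem subthreads (threadsPerProblem == 1 recovers the
// conventional one-thread-per-subproblem mapping). blockDim.x must be a
// multiple of threadsPerProblem, itself a power of two.
__global__ void phaseKernel(float *cost, int n, int phase, int threadsPerProblem)
{
    extern __shared__ float part[];
    int tid     = blockIdx.x * blockDim.x + threadIdx.x;
    int problem = tid / threadsPerProblem;     // which subproblem
    int lane    = tid % threadsPerProblem;     // subthread index inside it
    bool active = problem < n - phase;

    int i = problem, j = problem + phase;
    float best = FLT_MAX;
    if (active)
        for (int k = i + 1 + lane; k < j; k += threadsPerProblem)
            best = fminf(best, cost[i * n + k] + cost[k * n + j]);

    // Merge the subthreads' partial minima via a shared-memory reduction.
    part[threadIdx.x] = best;
    __syncthreads();
    for (int s = threadsPerProblem / 2; s > 0; s >>= 1) {
        if (lane < s)
            part[threadIdx.x] = fminf(part[threadIdx.x], part[threadIdx.x + s]);
        __syncthreads();
    }
    if (active && lane == 0) cost[i * n + j] = part[threadIdx.x];
}

// Host loop (sketch): more subthreads per subproblem once few subproblems
// remain. cost[i][i+1] is assumed initialized on the host.
void runPhases(float *d_cost, int n)
{
    for (int phase = 2; phase < n; ++phase) {
        int tpp   = (n - phase < 2048) ? 32 : 1;   // illustrative heuristic
        int total = (n - phase) * tpp;
        int block = 128;                           // multiple of tpp
        int grid  = (total + block - 1) / block;
        phaseKernel<<<grid, block, block * sizeof(float)>>>(d_cost, n, phase, tpp);
    }
}
```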

2019 ◽  
Vol 27 (2) ◽  
pp. 55-66

Parallelization of Non-Serial Polyadic Dynamic Programming (NPDP) on high-throughput manycore architectures, such as NVIDIA GPUs, suffers from load imbalance, i.e., a non-optimal mapping between the subproblems of NPDP and the processing elements of the GPU. NPDP exhibits non-uniformity in the number of subproblems as well as in computational complexity across phases. In NPDP parallelization, phases are computed sequentially, whereas the subproblems of each phase are computed concurrently. It is therefore essential to map the subproblems of each phase onto the processing elements effectively when implementing thread-level parallelism. We propose an adaptive Generalized Mapping Method (GMM) for NPDP parallelization that efficiently maps subproblems onto processing threads in each phase. The input size and the targeted GPU together determine the available computing power and the best mapping for each phase. The performance of GMM is compared with different conventional parallelization approaches. For sufficiently large inputs, our technique outperforms the state-of-the-art conventional parallelization approach and achieves a significant speedup of a factor of 30. We also summarize general heuristics for achieving better gains in NPDP parallelization.
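A minimal sketch of the mapping decision, under our own assumptions rather than the paper's actual GMM code: the launch configuration for a phase is derived from the number of subproblems in that phase and the capacity of the targeted GPU, merging subproblems into one thread when they outnumber the hardware threads. The struct and heuristics are illustrative; the real GMM also accounts for the per-subproblem computational complexity, which this sketch ignores.

```cuda
#include <cuda_runtime.h>

struct PhaseMapping { int grid, block, workPerThread; };

// Choose a per-phase launch configuration from the input size and the
// properties of the targeted GPU (filled by cudaGetDeviceProperties).
PhaseMapping mapPhase(int nSubproblems, const cudaDeviceProp &p)
{
    PhaseMapping m;
    m.block = 128;                                   // illustrative block size
    int maxThreads = p.multiProcessorCount *
                     p.maxThreadsPerMultiProcessor;  // resident-thread capacity
    if (nSubproblems >= maxThreads) {
        // More subproblems than hardware threads: merge several into one
        // thread so scheduling overhead stays bounded.
        m.workPerThread = (nSubproblems + maxThreads - 1) / maxThreads;
        int threads = (nSubproblems + m.workPerThread - 1) / m.workPerThread;
        m.grid = (threads + m.block - 1) / m.block;
    } else {
        // Few subproblems: one thread each (a fuller scheme would split
        // them across several threads to avoid underfilling the GPU).
        m.workPerThread = 1;
        m.grid = (nSubproblems + m.block - 1) / m.block;
    }
    return m;
}
```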


2018 ◽  
Vol 10 (4) ◽  
pp. 83-98 ◽  
Author(s):  
Xue Sun ◽  
Chao-Chin Wu ◽  
Liang-Rui Chen ◽  
Jian-You Lin

As one of the most widely used parallel processors, the general-purpose graphics processing unit (GPU) has been adopted to accelerate various time-consuming algorithms. Dynamic programming (DP) is a popular method for solving a particular class of complex problems. This article focuses on mapping serial-monadic DP problems onto NVIDIA GPUs. The 0/1 knapsack problem is one of the most representative problems in this category and often arises in many other fields of application. Previous work proposed a compression method to reduce the amount of data transferred, but it cannot reuse data in shared memory. This article demonstrates how to apply a more condensed data structure and inter-block synchronization to efficiently map serial-monadic DP onto GPUs. Computational experiments reveal that the best performance improvement of the approach is about 100% compared with the previous work.
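The row dependency of this DP is what makes inter-block synchronization attractive: all blocks must finish row j-1 before any block starts row j, and relaunching a kernel per row pays the launch overhead and discards data cached on chip. The sketch below is ours, with the paper's hand-rolled inter-block synchronization replaced by the modern cooperative-groups grid barrier, which plays the same role; the article's condensed data structure is not reproduced here.

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// 0/1 knapsack DP, one kernel for all rows:
//   row_j[c] = max(row_{j-1}[c], row_{j-1}[c - w[j]] + v[j]).
// Must be launched with cudaLaunchCooperativeKernel and a grid small enough
// for all blocks to be co-resident; rowPrev is zero-initialized by the host.
__global__ void knapsackDP(const int *w, const int *v, int nItems,
                           int capacity, int *rowPrev, int *rowCur)
{
    cg::grid_group grid = cg::this_grid();
    int stride = gridDim.x * blockDim.x;

    for (int j = 0; j < nItems; ++j) {
        for (int c = blockIdx.x * blockDim.x + threadIdx.x;
             c <= capacity; c += stride) {
            int keep = rowPrev[c];
            int take = (c >= w[j]) ? rowPrev[c - w[j]] + v[j] : -1;
            rowCur[c] = max(keep, take);
        }
        grid.sync();   // inter-block barrier: row j done before row j+1 starts
        int *tmp = rowPrev; rowPrev = rowCur; rowCur = tmp;  // local swap
    }
    // The host reads the final row from the buffer given by nItems' parity.
}
```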


2016 ◽  
Vol 25 (06) ◽  
pp. 1650033 ◽  
Author(s):  
Hossam Faris ◽  
Ibrahim Aljarah ◽  
Nailah Al-Madi ◽  
Seyedali Mirjalili

Evolutionary Neural Networks have proven beneficial on challenging datasets, mainly due to their strong ability to avoid local optima. Stochastic operators in such techniques reduce the probability of stagnation in local solutions and help them supersede conventional training algorithms such as Back Propagation (BP) and Levenberg-Marquardt (LM). According to the No-Free-Lunch (NFL) theorem, however, no single optimization technique can solve all optimization problems. This means that a Neural Network trained by a new algorithm has the potential to solve a new set of problems or outperform current techniques on existing ones. This motivates our investigation of the recently proposed evolutionary algorithm called the Lightning Search Algorithm (LSA) for training Neural Networks, for the first time in the literature. The LSA-based trainer is benchmarked on 16 popular medical diagnosis problems and compared to BP, LM, and six other evolutionary trainers. The quantitative and qualitative results show that the LSA-based trainer achieves not only better avoidance of local solutions but also faster convergence than the other algorithms employed. In addition, the statistical tests conducted show that the LSA-based trainer is significantly superior to the current algorithms on the majority of datasets.
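To make the trainer's interface concrete, here is a minimal host-side sketch (ours, not the paper's code) of the piece every evolutionary trainer in such comparisons shares: a candidate vector is decoded into the weights of a one-hidden-layer MLP and scored by mean squared error, which the metaheuristic then minimizes. The LSA update rules themselves (its projectile-based search moves) are omitted, and all names and the weight layout are assumptions.

```cuda
#include <vector>
#include <cmath>
#include <cstddef>

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Candidate layout (assumed): [W1 (hidden*in) | b1 (hidden) | W2 (hidden) | b2],
// i.e. dimension = nHidden*nIn + 2*nHidden + 1. Lower fitness is better.
double mseFitness(const std::vector<double> &cand,
                  const std::vector<std::vector<double>> &X,  // samples
                  const std::vector<double> &y,               // 0/1 labels
                  std::size_t nIn, std::size_t nHidden)
{
    const double *W1 = cand.data();
    const double *b1 = W1 + nHidden * nIn;
    const double *W2 = b1 + nHidden;
    const double  b2 = W2[nHidden];

    double sse = 0.0;
    for (std::size_t s = 0; s < X.size(); ++s) {
        double out = b2;
        for (std::size_t h = 0; h < nHidden; ++h) {
            double a = b1[h];                     // hidden-unit pre-activation
            for (std::size_t i = 0; i < nIn; ++i)
                a += W1[h * nIn + i] * X[s][i];
            out += W2[h] * sigmoid(a);
        }
        double e = out - y[s];
        sse += e * e;
    }
    return sse / X.size();
}
```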


Author(s):  
Badr O. Johar ◽  
Surendra M. Gupta

Reverse logistics is a critical topic that has captured the attention of government, private entities, and researchers in recent years. This increased concern has been driven by the current set of government regulations, growing public awareness, and attractive economic opportunities. Environmentalists have also long demanded that Original Equipment Manufacturers (OEMs) be more involved in, and responsible for, their products at the end of their life cycles. However, uncertainty in the quality and quantity of returned items discourages OEMs from participating in such programs. Because of the unique problems and the complex nature of reverse logistics activities, numerous studies have been carried out in this field. One crucial area is inventory management of End-of-Life (EOL) products. A take-back program can impose a financial burden on an OEM if it is not managed well. Thus, an efficient yet cost-effective system should be implemented to appropriately manage the overwhelming number of returns. Previously, we analyzed the problem under the assumption that the numbers of returned core products and of disassembled parts and subassemblies are known in advance. In this paper, we introduce a probabilistic approach that considers different quality levels for every disassembled component, with the probabilities of these quality levels conditioned on the quality of the returned product. The model uses multi-period stochastic dynamic programming in a disassembly-line context to solve the problem and generate the option that maximizes total system profit. A numerical example is given to illustrate the approach. Finally, directions for future research are suggested.
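As an illustration of the solution machinery only, the sketch below sets up a multi-period stochastic DP over an inventory state; all numbers are invented, and the paper's two-level quality structure (component quality conditioned on the quality of the returned core) is collapsed into a single yield probability, so this is a toy stand-in rather than the authors' model.

```cuda
#include <vector>
#include <algorithm>
#include <cmath>

// Binomial pmf: probability of g good components out of a disassembled cores.
static double binom(int a, int g, double p)
{
    double c = 1.0;
    for (int i = 1; i <= g; ++i) c = c * (a - g + i) / i;
    return c * std::pow(p, g) * std::pow(1.0 - p, a - g);
}

int main()
{
    const int T = 4, maxInv = 10, maxDis = 5, demand = 3;   // invented data
    const double pGood = 0.7, price = 8.0, disCost = 3.0, holdCost = 0.5;

    // V[s]: optimal expected profit-to-go with s good components in stock.
    std::vector<double> V(maxInv + 1, 0.0), Vnext(maxInv + 1, 0.0);

    for (int t = T - 1; t >= 0; --t) {          // backward over periods
        for (int s = 0; s <= maxInv; ++s) {
            double best = -1e18;
            for (int a = 0; a <= maxDis; ++a) { // cores to disassemble
                double ev = 0.0;                // expectation over yields
                for (int g = 0; g <= a; ++g) {
                    int stock = std::min(maxInv, s + g);
                    int sold  = std::min(stock, demand);
                    int rest  = stock - sold;
                    ev += binom(a, g, pGood) *
                          (sold * price - a * disCost - rest * holdCost
                           + Vnext[rest]);
                }
                best = std::max(best, ev);
            }
            V[s] = best;
        }
        Vnext = V;
    }
    // V[0] now holds the maximal expected total profit from empty stock.
}
```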


Author(s):  
A. Meghdari ◽  
H. Sayyaadi

Abstract An optimization technique based on the well-known Dynamic Programming Algorithm is applied to the motion-control trajectories and path planning of multi-jointed fingers in dextrous hand designs. A three-fingered hand, with each finger containing four degrees of freedom, is considered for analysis. After generating the kinematics and dynamics equations of such a hand, optimum values of the joint torques and velocities are computed such that the fingertips of the hand move through their prescribed trajectories with the least time and/or energy to reach the object being grasped. Finally, optimal as well as feasible solutions for the multi-jointed fingers are identified and the results are presented.
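A toy sketch of the DP structure, under our own simplifications: the prescribed fingertip path is discretized into stages, the state is a discretized speed at each waypoint, and the stage cost trades time off against an energy-like term. The real four-degree-of-freedom finger dynamics would replace the one-dimensional toy model used here; every constant is an assumption.

```cuda
#include <vector>
#include <algorithm>
#include <limits>

int main()
{
    const int stages = 50;       // waypoints along the prescribed trajectory
    const int nV = 21;           // speed grid per waypoint
    const double ds = 0.01;      // path length between waypoints (m)
    const double vMax = 1.0;     // speed bound (m/s)
    const double alpha = 0.5;    // time vs. energy trade-off weight

    auto vel = [&](int k) { return vMax * k / (nV - 1) + 1e-3; };

    std::vector<double> cost(nV, 0.0), next(nV, 0.0);
    for (int s = stages - 1; s >= 0; --s) {     // backward over waypoints
        for (int i = 0; i < nV; ++i) {          // speed entering the segment
            double best = std::numeric_limits<double>::infinity();
            for (int j = 0; j < nV; ++j) {      // speed leaving the segment
                double dt = ds / (0.5 * (vel(i) + vel(j)));  // segment time
                double u  = (vel(j) - vel(i)) / dt;          // toy "torque"
                double c  = alpha * dt + (1.0 - alpha) * u * u * dt;
                best = std::min(best, c + next[j]);
            }
            cost[i] = best;
        }
        next = cost;
    }
    // cost[0] is the optimal time/energy objective starting from rest.
}
```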


Games ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 6
Author(s):  
Jochen Staudacher ◽  
Felix Wagner ◽  
Jan Filipp

We study the efficient computation of power indices for weighted voting games with precoalitions amongst subsets of players (reflecting, e.g., ideological proximity) using the paradigm of dynamic programming. Starting from the state-of-the-art algorithms for computing the Banzhaf and Shapley–Shubik indices for weighted voting games, we present a framework for fast algorithms for the three most common power indices with precoalitions, i.e., the Owen index, the Banzhaf–Owen index and the symmetric coalitional Banzhaf index, and point out why our new algorithms are applicable for large numbers of players. We discuss implementations of our algorithms for the three power indices with precoalitions in C++ and review computing times, as well as storage requirements.
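For readers unfamiliar with the dynamic-programming backbone these algorithms build on, here is a minimal sketch of the classic count behind the (raw) Banzhaf index of a weighted voting game, which is the starting point the precoalition algorithms extend; the example data and names are ours. Here c[x] counts coalitions of the other players with total weight x, built with a 0/1-knapsack-style update, and player i swings in exactly those coalitions with q - w[i] <= x < q.

```cuda
#include <vector>
#include <algorithm>
#include <cstdint>
#include <cstdio>

int main()
{
    const int q = 6;                            // quota (example data)
    const std::vector<int> w = {4, 3, 2, 1};    // voting weights
    const int n = (int)w.size();
    int total = 0;
    for (int x : w) total += x;

    for (int i = 0; i < n; ++i) {
        // c[x]: number of coalitions of players other than i with weight x.
        std::vector<uint64_t> c(total + 1, 0);
        c[0] = 1;
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            for (int x = total; x >= w[j]; --x)   // knapsack-style DP update
                c[x] += c[x - w[j]];
        }
        uint64_t swings = 0;
        for (int x = std::max(0, q - w[i]); x < q; ++x)
            swings += c[x];                       // i turns these into winners
        std::printf("raw Banzhaf count of player %d: %llu\n",
                    i, (unsigned long long)swings);
    }
}
```

Normalizing the counts (dividing by 2^(n-1), or by their sum) yields the usual index; the precoalition variants in the paper restructure this count around the coalition blocks.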


2021 ◽  
Vol 2 (1) ◽  
pp. 1
Author(s):  
Kwek Benny Kurniawan ◽  
YB Dwi Setianto

A GPU, or Graphics Processing Unit, can be used on many platforms. In general, GPUs have been used for rendering graphics, but they are now general-purpose parallel processors with support for easily accessible programming interfaces and industry-standard languages such as C, Python, and Fortran. In this study, the authors compare the CPU and the GPU on a set of matrix calculations. To do so, the authors ran tests that observe processing-unit utilization, memory usage, and computing time while varying matrix sizes and dimensions. The results show that asynchronous GPU execution is faster than sequential execution. Furthermore, the number of GPU threads needs to be tuned to use the GPU efficiently.
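A minimal sketch of the asynchronous pattern such comparisons measure, under our own assumptions about the setup: the matrix is split into chunks and each chunk's copy-in, kernel, and copy-out are issued on a separate CUDA stream so transfers overlap with computation, in contrast to one sequential copy-compute-copy pass. The element-wise kernel stands in for a real matrix calculation.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *a, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= k;           // stand-in for a real matrix operation
}

int main()
{
    const int n = 1 << 22, nStreams = 4, chunk = n / nStreams;
    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float));  // pinned memory -> true async copies
    cudaMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t st[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&st[s]);
    for (int s = 0; s < nStreams; ++s) {
        int off = s * chunk;        // each stream owns one chunk
        cudaMemcpyAsync(d + off, h + off, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, st[s]);
        scale<<<(chunk + 255) / 256, 256, 0, st[s]>>>(d + off, 2.0f, chunk);
        cudaMemcpyAsync(h + off, d + off, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, st[s]);
    }
    cudaDeviceSynchronize();        // all chunks done; h holds the results
    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(st[s]);
    cudaFreeHost(h); cudaFree(d);
}
```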


Author(s):  
Álinson S. Xavier ◽  
Ricardo Fukasawa ◽  
Laurent Poirrier

When generating multirow intersection cuts for mixed-integer linear optimization problems, an important practical question is deciding which intersection cuts to use. Even when restricted to cuts that are facet defining for the corner relaxation, the number of potential candidates is still very large, especially for instances of large size. In this paper, we introduce a subset of intersection cuts based on the infinity norm that is very small, works for relaxations having an arbitrary number of rows, and, unlike many subclasses studied in the literature, takes into account the entire data from the simplex tableau. We describe an algorithm for generating these inequalities and run extensive computational experiments to evaluate their practical effectiveness on real-world instances. We conclude that this subset of inequalities yields, in terms of gap closure, around 50% of the benefit of using all valid inequalities for the corner relaxation simultaneously, but at a small fraction of the computational cost and with a very small number of cuts. Summary of Contribution: Cutting planes are one of the most important techniques used by modern mixed-integer linear programming solvers when solving a variety of challenging operations research problems. This paper advances the state of the art on general-purpose multirow intersection cuts by proposing a practical and computationally friendly method to generate them.
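For orientation, here is a sketch of the textbook intersection-cut template that infinity-norm cuts instantiate; it is our illustration, not the paper's algorithm. For a lattice-free convex body B containing the fractional point f in its interior, the cut reads sum_j psi(r_j) s_j >= 1, where the r_j are the tableau rays and psi is the gauge of B - f; when B is an infinity-norm ball of radius rho around f, psi(r) = ||r||_inf / rho. Choosing the body so it stays lattice-free and the cut facet defining is the paper-specific part this sketch does not attempt.

```cuda
#include <vector>
#include <algorithm>
#include <cmath>

// Cut coefficients alpha_j = psi(r_j) = ||r_j||_inf / rho for the intersection
// cut  sum_j alpha_j * s_j >= 1  derived from an infinity-norm ball of radius
// rho centered at the fractional point (rho assumed small enough that the
// ball contains no integer point in its interior).
std::vector<double> infNormCutCoefficients(
    const std::vector<std::vector<double>> &rays,  // one tableau ray per entry
    double rho)
{
    std::vector<double> alpha;
    alpha.reserve(rays.size());
    for (const auto &r : rays) {
        double norm = 0.0;
        for (double ri : r) norm = std::max(norm, std::fabs(ri));
        alpha.push_back(norm / rho);   // gauge of the centered ball at r
    }
    return alpha;
}
```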

