Multi-GPU Support on Single Node Using Directive-Based Programming Model

2015 ◽  
Vol 2015 ◽  
pp. 1-15 ◽  
Author(s):  
Rengan Xu ◽  
Xiaonan Tian ◽  
Sunita Chandrasekaran ◽  
Barbara Chapman

Existing studies show that using a single GPU can yield significant performance gains. We should be able to achieve further speedup if we use more than one GPU. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential and are often considered leading candidates for porting complex scientific applications. Unfortunately, programming heterogeneous systems requires more effort than programming traditional multicore systems. Directive-based programming approaches are being widely adopted since they make it easy to use, port, and maintain application code. OpenMP and OpenACC are two popular models used to port applications to accelerators. However, neither model provides support for multiple GPUs. A plausible solution is to combine OpenMP and OpenACC into a hybrid model; however, building this model has its own limitations due to the lack of necessary compiler support. Moreover, the model also lacks support for direct device-to-device communication. To overcome these limitations, an alternate strategy is to extend OpenACC with extensions that follow a task-based implementation for supporting multiple GPUs. We critically analyze the applicability of the hybrid model approach, evaluate the proposed strategy using several case studies, and demonstrate the effectiveness of the proposed extensions.


Author(s):  
Ramon Amela ◽  
Cristian Ramon-Cortes ◽  
Jorge Ejarque ◽  
Javier Conejero ◽  
Rosa M. Badia

Python is a popular programming language thanks to the simplicity of its syntax, and it still achieves good performance despite being an interpreted language. Its adoption by multiple scientific communities has led to the emergence of a large number of libraries and modules, which has helped put Python at the top of the list of programming languages [1]. Task-based programming has been proposed in recent years as an alternative parallel programming model. PyCOMPSs follows this approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. We also present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python.
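
The task-based idea mentioned in this abstract can be illustrated with a minimal sketch. It does not use the PyCOMPSs API: it assumes the standard-library concurrent.futures module as a stand-in task runtime, and the names multiply_block and blocked_matmul are hypothetical. Each row block of a matrix product is submitted as an independent task, and the results are gathered in submission order.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_block(a_rows, b):
    """Task: multiply a block of rows of A by the full matrix B."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def blocked_matmul(a, b, block_size=2, workers=4):
    """Split A into row blocks, run one task per block, gather in order.

    ThreadPoolExecutor is only a stand-in for a task runtime (an assumption
    of this sketch); it is not how PyCOMPSs schedules tasks.
    """
    blocks = [a[i:i + block_size] for i in range(0, len(a), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(multiply_block, blk, b) for blk in blocks]
        result = []
        for fut in futures:  # iterate in submission order to keep row order
            result.extend(fut.result())
    return result
```

The sketch shows only the structure of the decomposition (independent tasks plus a gather step); with pure-Python arithmetic in threads it should not be expected to show a speedup.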



2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Jayaraman J. Thiagarajan ◽  
Deepta Rajan ◽  
Sameeksha Katoch ◽  
Andreas Spanias

Effective patient care mandates rapid, yet accurate, diagnosis. With the abundance of non-invasive diagnostic measurements and electronic health records (EHR), manual interpretation for differential diagnosis has become time-consuming and challenging. This has led to wide-spread adoption of AI-powered tools, in pursuit of improving accuracy and efficiency of this process. While the unique challenges presented by each modality and clinical task demand customized tools, the cumbersome process of making problem-specific choices has triggered the critical need for a generic solution to enable rapid development of models in practice. In this spirit, we develop DDxNet, a deep architecture for time-varying clinical data, which we demonstrate to be well-suited for diagnostic tasks involving different modalities (ECG/EEG/EHR), required level of characterization (abnormality detection/phenotyping) and data fidelity (single-lead ECG/22-channel EEG). Using multiple benchmark problems, we show that DDxNet produces high-fidelity predictive models, and sometimes even provides significant performance gains over problem-specific solutions.





Author(s):  
Christopher K. Allen ◽  
Andrew J. Goupee ◽  
Jeffrey Lindner ◽  
Robert Berry

This work investigates the implementation of a novel, NASA-developed Fluid Harmonic Absorber (FHA) technology to mitigate platform motions and structural loads that can lead to lighter platforms, increased turbine performance, and ultimately, a lower LCOE. The novel damping strategy takes advantage of existing water ballast in the VolturnUS semi-submersible platform to achieve significant performance gains with minimal additional equipment and complexity. NREL’s FOWT software FAST is modified to include the primary features of the FHA technology. A study of the University of Maine-developed VolturnUS semi-submersible FOWT augmented with FHA technology is undertaken to quantify global performance of the system. When compared to the baseline technology, numerical simulations of a redesigned platform utilizing the FHA dampers indicate a reduction of 15.8% in hull structural material. Finally, the improvements in LCOE resulting from this mass reduction are assessed to demonstrate the advantages of NASA’s FHA technology for FOWT applications.



Electronics ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 648 ◽  
Author(s):  
Xiangpeng Wan ◽  
Hakim Ghazzai ◽  
Yehia Massoud

Modern taxi services are usually classified into two major categories: traditional taxicabs and ride-hailing services. Both require highly efficient recommendation systems to satisfy passengers’ quality of experience and drivers’ benefits. Customers want to minimize their waiting time before rides, while drivers aim to find customers faster. In this paper, we propose to improve taxi service efficiency by designing a generic and smart recommendation system that exploits the benefits of Vehicular Social Networks (VSNs). Aiming to optimize three key performance metrics for both taxi services (number of pick-ups, customer waiting time, and vacant traveled distance), the proposed recommendation system starts by efficiently estimating the future customer demand in different clusters of the area of interest. Then, it proposes an optimal taxi-to-region matching according to the location of each taxi and the future demand of each region. Finally, an optimized geo-routing algorithm is developed to minimize the navigation time spent by drivers. Our simulation model is applied to the borough of Manhattan and is validated with realistic data. Selected results show that significant performance gains are achieved thanks to the additional cooperation among taxi drivers enabled by VSNs, as compared to traditional cases.



Author(s):  
Ximing Li ◽  
Jiaojiao Zhang ◽  
Jihong Ouyang

Conventional topic models suffer from a severe sparsity problem when facing extremely short texts such as social media posts. The family of Dirichlet multinomial mixture (DMM) models can handle the sparsity problem; however, they are still very sensitive to ordinary and noisy words, resulting in inaccurate topic representations at the document level. In this paper, we alleviate this problem by preserving the local neighborhood structure of short texts, which allows topical signals to spread among neighboring documents and so correct the inaccurate topic representations. This is achieved with variational manifold regularization, which constrains close short texts to have similar variational topic representations. Building on this idea, we propose a novel Laplacian DMM (LapDMM) topic model. During document graph construction, we further use the word mover’s distance with word embeddings to measure document similarities at the semantic level. To evaluate LapDMM, we compare it against state-of-the-art short text topic models on several traditional tasks. Experimental results demonstrate that LapDMM achieves very significant performance gains over baseline models, e.g., scores about 0.2 higher on clustering and classification tasks in many cases.
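
The abstract does not give the exact form of the regularizer, so the following is only a sketch of the generic graph-Laplacian (manifold) penalty that this kind of method builds on, not LapDMM's variational objective. It assumes a symmetric document-similarity matrix w and one topic vector per document in theta; all function names are hypothetical. The penalty is small when neighboring documents have similar topic vectors, and for symmetric w the pairwise form equals tr(Theta^T L Theta) with L = D - W.

```python
def laplacian(w):
    """Graph Laplacian L = D - W, where D holds the row sums of W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


def pairwise_penalty(w, theta):
    """0.5 * sum_ij W_ij * ||theta_i - theta_j||^2."""
    n = len(w)
    total = 0.0
    for i in range(n):
        for j in range(n):
            dist2 = sum((a - b) ** 2 for a, b in zip(theta[i], theta[j]))
            total += w[i][j] * dist2
    return 0.5 * total


def trace_penalty(w, theta):
    """tr(Theta^T L Theta); matches pairwise_penalty when W is symmetric."""
    lap = laplacian(w)
    n, k = len(theta), len(theta[0])
    return sum(theta[i][c] * lap[i][j] * theta[j][c]
               for c in range(k) for i in range(n) for j in range(n))


# Illustrative data (made up): documents 0 and 1 are neighbors, 2 is isolated.
w = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
theta = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
penalty = pairwise_penalty(w, theta)
```

In this toy data the two neighboring documents disagree strongly on their topics, so the penalty is positive; moving their topic vectors closer together drives it toward zero, which is the smoothing effect the regularizer is meant to produce.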



2020 ◽  
Vol 34 (04) ◽  
pp. 6267-6274
Author(s):  
Xiao Wang ◽  
Ruijia Wang ◽  
Chuan Shi ◽  
Guojie Song ◽  
Qingyong Li

The interactions of users and items in a recommender system can be naturally modeled as a user-item bipartite graph. In recent years, we have witnessed an emerging research effort in exploring the user-item graph for collaborative filtering methods. Nevertheless, user-item interactions typically arise from highly complex latent purchasing motivations, such as high cost performance or eye-catching appearance, which are indistinguishably represented by the edges. Existing approaches leave the differences between these purchasing motivations unexplored and are therefore unable to capture fine-grained user preferences. In this paper, we therefore propose a novel Multi-Component graph convolutional Collaborative Filtering (MCCF) approach to distinguish the latent purchasing motivations underneath the observed explicit user-item interactions. Specifically, there are two elaborately designed modules inside MCCF: a decomposer and a combiner. The former first decomposes the edges in the user-item graph to identify the latent components that may cause the purchasing relationship; the latter then recombines these latent components automatically to obtain unified embeddings for prediction. Furthermore, a sparse regularizer and a weighted random sample strategy are used to alleviate the overfitting problem and accelerate the optimization. Empirical results on three real datasets and a synthetic dataset not only show the significant performance gains of MCCF but also demonstrate the necessity of considering multiple components.




