ZenLDA: Large-scale topic model training on distributed data-parallel platform

Abstract: This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and tests simultaneously and thus yields improvements in the prediction and size of the resulting classifiers in many situations. However, it is a population-based and iterative approach that can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient use of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
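The data-parallel decomposition mentioned in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' CUDA implementation: the tuple-based tree encoding, the function names, and the use of worker threads as stand-ins for GPUs are all assumptions made for illustration. Each worker counts misclassified instances on its own slice of the data, and the partial counts are summed into a single fitness value.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tree encoding (an assumption, not from the paper):
# internal node = (feature_index, threshold, left_subtree, right_subtree)
# leaf          = class label (any non-tuple value)

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def chunk_errors(tree, rows, labels):
    """Count misclassified instances in one data chunk (one worker's share)."""
    return sum(1 for x, y in zip(rows, labels) if predict(tree, x) != y)

def split(seq, parts):
    """Split seq into at most `parts` contiguous chunks of near-equal size."""
    size = -(-len(seq) // parts)  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def fitness(tree, rows, labels, workers=4):
    """Accuracy of `tree`, computed from per-chunk error counts."""
    if not rows:
        raise ValueError("fitness needs at least one instance")
    row_chunks = split(rows, workers)
    label_chunks = split(labels, workers)
    # Threads stand in for GPUs here; each one processes its own chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(chunk_errors,
                           [tree] * len(row_chunks),
                           row_chunks,
                           label_chunks)
        errors = sum(partial)
    return 1.0 - errors / len(rows)
```

Because the per-chunk error counts are simply added, the result does not depend on how many workers the data is split across, which is the property a data-parallel fitness evaluation relies on.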

CHORD: Distributed Data-sharing via Hybrid ROS 1 and 2 for Multi-robot Exploration of Large-scale Complex Environments

IEEE Robotics and Automation Letters ◽ 10.1109/lra.2021.3061393 ◽ 2021 ◽ pp. 1-1

Author(s): Muhammad Fadhil Ginting ◽ Kyohei Otsu ◽ Jeffrey Edlund ◽ Jay Gao ◽ Ali-akbar Agha-mohammadi

Keyword(s): Data Sharing ◽ Large Scale ◽ Distributed Data ◽ Complex Environments ◽ Multi Robot

Distributed Data Processing for Large-Scale Simulations on Cloud

10.1109/emc/si/pi/emceurope52599.2021.9559316 ◽ 2021

Author(s): Tianjian Lu ◽ Stephan Hoyer ◽ Qing Wang ◽ Lily Hu ◽ Yi-Fan Chen

Keyword(s): Data Processing ◽ Large Scale ◽ Distributed Data ◽ Distributed Data Processing ◽ Large Scale Simulations

The Management Strategy of Metadata in Large-scale Network Storage System

MATEC Web of Conferences ◽ 10.1051/matecconf/201822801011 ◽ 2018 ◽ Vol 228 ◽ pp. 01011

Author(s): Haifeng Zhong ◽ Jianying Xiong

Keyword(s): Large Scale ◽ Storage System ◽ Hash Table ◽ Distributed Hash Table ◽ Distributed Data ◽ Mass Storage ◽ Large Scale Network ◽ Network Storage System ◽ Mass Storage System ◽ Scale Network

The WAN Internet storage system based on a Distributed Hash Table (DHT) uses fully distributed data and metadata management, and constructs an extensible and efficient mass storage system for Internet-based applications. However, such systems work in highly dynamic environments, and the frequent entry and exit of nodes leads to huge communication costs. Therefore, this paper proposes a new hierarchical metadata routing management mechanism based on DHT, which makes full use of node stabilization points to reduce the maintenance overhead of the overlay. Analysis shows that the algorithm can effectively improve efficiency and enhance stability.
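For readers unfamiliar with DHT-based metadata placement, a minimal sketch of a generic consistent-hashing ring follows. It is not the hierarchical routing mechanism proposed in the paper; the class name `HashRing`, the node names, and the choice of SHA-1 are assumptions made for illustration. It shows why node churn matters: when a node leaves, the metadata keys it owned must be reassigned to its successor on the ring.

```python
import bisect
import hashlib

def ring_hash(value):
    """Map a string to a position on the ring (SHA-1 is an arbitrary choice)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Generic consistent-hashing ring; a sketch, not the paper's scheme."""

    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions of the live nodes
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.join(node)

    def join(self, node):
        """Add a node to the ring (no-op if it is already present)."""
        point = ring_hash(node)
        if point not in self._owner:
            bisect.insort(self._points, point)
            self._owner[point] = node

    def leave(self, node):
        """Remove a node; its keys fall to the next node clockwise."""
        point = ring_hash(node)
        if point in self._owner:
            self._points.remove(point)
            del self._owner[point]

    def lookup(self, key):
        """Return the node responsible for a metadata key."""
        if not self._points:
            raise LookupError("ring has no nodes")
        index = bisect.bisect_left(self._points, ring_hash(key))
        # Wrap around to the first node when the key hashes past the last one.
        return self._owner[self._points[index % len(self._points)]]
```

In this sketch, a departure only moves the keys owned by the departing node, but every such event still triggers reassignment work, which is the kind of maintenance overhead the abstract says grows with frequent node entry and exit.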

A distributed data management system to support large-scale data analysis

Journal of Systems and Software ◽ 10.1016/j.jss.2018.11.007 ◽ 2019 ◽ Vol 148 ◽ pp. 105-115 ◽ Cited By ~ 6

Author(s): Tamer Z. Emara ◽ Joshua Zhexue Huang

Keyword(s): Data Analysis ◽ Data Management ◽ Management System ◽ Large Scale ◽ Data Management System ◽ Distributed Data ◽ Distributed Data Management ◽ Large Scale Data ◽ Scale Data

Parallel Object-Oriented Computation Applied to a Finite Element Problem

Scientific Programming ◽ 10.1155/1993/859092 ◽ 1993 ◽ Vol 2 (4) ◽ pp. 133-144 ◽ Cited By ~ 2

Author(s): Jon B. Weissman ◽ Andrew S. Grimshaw ◽ R.D. Ferraro

Keyword(s): Finite Element ◽ Message Passing ◽ Large Scale ◽ Processing System ◽ Object Oriented ◽ Data Parallel ◽ Programming Tools ◽ Comparable Performance ◽ Oriented Parallel ◽ Performance Results

The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.
