Evaluation of Clustering Algorithms on HPC Platforms

Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2156
Author(s):  
Juan M. Cebrian ◽  
Baldomero Imbernón ◽  
Jesús Soto ◽  
José M. Cecilia

Clustering algorithms are among the most widely used kernels for generating knowledge from large datasets. These algorithms group a set of data elements (i.e., images, points, patterns, etc.) into clusters to identify patterns or common features of a sample. However, they are computationally expensive, as they often involve fitness functions that must be evaluated for every point in the dataset. This cost is even higher for fuzzy methods, where each data point may belong to more than one cluster. In this paper, we evaluate different parallelisation strategies on heterogeneous platforms for state-of-the-art fuzzy clustering algorithms such as Fuzzy C-means (FCM), Gustafson–Kessel FCM (GK-FCM) and Fuzzy Minimals (FM). The experimental evaluation includes performance and energy trade-offs. Our results show that, depending on each algorithm's computational pattern, its mathematical foundation and the amount of data to be processed, each algorithm performs best on a different platform.
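To make the cost pattern concrete, here is a minimal NumPy sketch of the textbook Fuzzy C-means loop (our illustration, not the authors' parallel implementations); every iteration scores every point against every centroid, which is the workload the paper maps onto heterogeneous platforms:

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Textbook FCM: alternate centroid and membership updates until stable."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)             # each point's memberships sum to 1
    for _ in range(n_iter):
        Um = U ** m                               # fuzzified memberships
        C = (Um.T @ X) / Um.sum(axis=0)[:, None]  # fuzzy-weighted centroids
        # Distance of every point to every centroid: the all-points cost
        # the abstract refers to, repeated at every iteration.
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                     # guard against zero distances
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            break
        U = U_new
    return C, U_new
```

The membership update is the step where each point receives a weight for every cluster, which is precisely what distinguishes fuzzy methods from hard clustering and drives the extra cost noted above.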

2020 ◽  
Vol 34 (04) ◽  
pp. 5700-5708 ◽  
Author(s):  
Jianghao Shen ◽  
Yue Wang ◽  
Pengfei Xu ◽  
Yonggan Fu ◽  
Zhangyang Wang ◽  
...  

While increasingly deep networks are generally desired for achieving state-of-the-art performance, for many specific inputs a simpler network may already suffice. Existing works exploit this observation by learning to skip convolutional layers in an input-dependent manner. However, we argue that their binary decision scheme, i.e., either fully executing or completely bypassing one layer for a specific input, can be enhanced by introducing finer-grained, “softer” decisions. We therefore propose a Dynamic Fractional Skipping (DFS) framework. The core idea of DFS is to hypothesize layer-wise quantization (to different bitwidths) as intermediate “soft” choices between fully utilizing and skipping a layer. For each input, DFS dynamically assigns a bitwidth to both the weights and activations of each layer, where full execution and skipping can be viewed as the two “extremes” (i.e., full bitwidth and zero bitwidth). In this way, DFS can “fractionally” exploit a layer's expressive power during input-adaptive inference, enabling finer-grained accuracy-computational cost trade-offs. It presents a unified view linking input-adaptive layer skipping and input-adaptive hybrid quantization. Extensive experimental results demonstrate the superior trade-off between computational cost and model expressive power (accuracy) achieved by DFS. Visualizations further indicate a smooth and consistent transition in DFS behaviors, especially in the learned choices between layer skipping and different quantizations as the total computational budget varies, validating our hypothesis that layer quantization can be viewed as an intermediate variant of layer skipping. Our source code and supplementary material are available at https://github.com/Torment123/DFS.
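As a reading aid, the fractional-skipping idea can be sketched in PyTorch as follows; the gate design, bitwidth choices, and fake-quantizer here are our assumptions, not the released DFS code (see the repository linked above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x, bits):
    """Uniform symmetric fake-quantization; a stand-in for the paper's scheme."""
    if bits >= 32:
        return x
    scale = x.abs().max().clamp(min=1e-8)
    levels = 2 ** (bits - 1) - 1
    return torch.round(x / scale * levels) / levels * scale

class FractionalSkipBlock(nn.Module):
    """Hypothetical residual block whose gate picks a bitwidth per input:
    0 = skip entirely, 32 = full execution, anything between = "fractional"."""
    def __init__(self, channels, choices=(0, 4, 8, 32)):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.choices = choices
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, len(choices)))

    def forward(self, x):
        # Hard per-input choice; one decision for the whole batch for simplicity.
        # (Training such discrete gates typically needs Gumbel-softmax or RL.)
        bits = self.choices[self.gate(x).argmax(dim=1)[0].item()]
        if bits == 0:
            return x                              # zero bitwidth: bypass the layer
        y = F.conv2d(fake_quantize(x, bits),
                     fake_quantize(self.conv.weight, bits),
                     self.conv.bias, padding=1)
        return x + y                              # residual connection
```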


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 511
Author(s):  
Syed Mohammad Minhaz Hossain ◽  
Kaushik Deb ◽  
Pranab Kumar Dhar ◽  
Takeshi Koshiba

Proper plant leaf disease (PLD) detection is challenging in complex backgrounds and under different capture conditions. For this reason, modified adaptive centroid-based segmentation (ACS) is first used to trace the proper region of interest (ROI). Automatic initialization of the number of clusters (K) using modified ACS before recognition increases the scalability of ROI tracing, even for symmetrical features across various plants. Convolutional neural network (CNN)-based PLD recognition models achieve adequate accuracy to some extent; however, their memory requirements (large-scale parameters) and high computational cost are pressing issues for memory-restricted mobile and IoT devices. Therefore, after tracing ROIs, three proposed depth-wise separable convolutional PLD (DSCPLD) models, namely segmented modified DSCPLD (S-modified MobileNet), segmented reduced DSCPLD (S-reduced MobileNet), and segmented extended DSCPLD (S-extended MobileNet), are used to strike a constructive trade-off among accuracy, model size, and computational latency. Moreover, we compare our proposed DSCPLD recognition models with state-of-the-art models such as MobileNet, VGG16, VGG19, and AlexNet. Among the segmented DSCPLD models, S-modified MobileNet achieves the best accuracy of 99.55% and F1-score of 97.07%. We also evaluate our DSCPLD models on both full and segmented plant leaf images and conclude that, after applying modified ACS, all models improve in accuracy and F1-score. Furthermore, a new plant leaf dataset containing 6580 images of eight plants was used to experiment with several depth-wise separable convolution models.
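The model-size savings in the DSCPLD variants stem from the standard depthwise separable factorization popularized by MobileNet; a minimal PyTorch sketch of such a block (ours, not the paper's exact architecture):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """MobileNet-style block: per-channel 3x3 conv, then 1x1 channel mixing."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch gives each input channel its own 3x3 spatial filter.
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))
```

For a 3x3 convolution with 256 input and 256 output channels, the standard form needs about 9·256·256 ≈ 590K weights, while the factorized form needs 9·256 + 256·256 ≈ 68K, roughly a 9x reduction, which is why such blocks suit memory-restricted devices.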


2021 ◽  
Vol 14 (5) ◽  
pp. 785-798
Author(s):  
Daokun Hu ◽  
Zhiwen Chen ◽  
Jianbing Wu ◽  
Jianhua Sun ◽  
Hao Chen

Persistent memory (PM) is increasingly being leveraged to build hash-based indexing structures featuring cheap persistence, high performance, and instant recovery, especially with the recent release of Intel Optane DC Persistent Memory Modules. However, most of these structures are evaluated on DRAM-based emulators under unrealistic assumptions, or focus on specific metrics while sidestepping important properties. It is therefore essential to understand how well the proposed hash indexes perform on real PM and how they differ from each other when a wider range of performance metrics is considered. To this end, this paper provides a comprehensive evaluation of persistent hash tables. In particular, we focus on the evaluation of six state-of-the-art hash tables, including Level hashing, CCEH, Dash, PCLHT, Clevel, and SOFT, on real PM hardware. Our evaluation was conducted using a unified benchmarking framework and representative workloads. Besides characterizing common performance properties, we also explore how hardware configurations (such as PM bandwidth, CPU instructions, and NUMA) affect the performance of PM-based hash tables. With our in-depth analysis, we identify design trade-offs and sound paradigms in prior art, and suggest desirable optimizations and directions for the future development of PM-based hash tables.
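The paper's benchmarking framework is not reproduced here, but the shape of such a unified driver can be sketched as follows; this toy Python version runs against any dict-like index and, unlike the real experiments, does not model PM-specific costs:

```python
import random
import time

def run_workload(table, n_ops=1_000_000, insert_ratio=0.5,
                 key_space=10_000_000, seed=0):
    """YCSB-style mixed driver against any dict-like index.

    Note: a DRAM dict ignores what the paper actually measures on Optane,
    e.g. cache-line flushes (clwb), fences, and NUMA placement.
    """
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(n_ops):
        key = rng.randrange(key_space)
        if rng.random() < insert_ratio:
            table[key] = key        # insert/update path
        else:
            table.get(key)          # search path (may miss)
    return n_ops / (time.perf_counter() - start)   # throughput in ops/sec

if __name__ == "__main__":
    for ratio in (0.0, 0.5, 1.0):   # search-only, mixed, insert-only
        print(f"insert_ratio={ratio}: {run_workload({}, insert_ratio=ratio):,.0f} ops/s")
```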


2018 ◽  
Vol 11 (10) ◽  
pp. 4155-4174 ◽  
Author(s):  
Benjamin Brown-Steiner ◽  
Noelle E. Selin ◽  
Ronald Prinn ◽  
Simone Tilmes ◽  
Louisa Emmons ◽  
...  

Abstract. While state-of-the-art complex chemical mechanisms expand our understanding of atmospheric chemistry, their sheer size and computational requirements often limit simulations to short lengths or ensembles to only a few members. Here we present and compare three 25-year present-day offline simulations with chemical mechanisms of different levels of complexity using the Community Earth System Model (CESM) Version 1.2 CAM-chem (CAM4): the Model for Ozone and Related Chemical Tracers, version 4 (MOZART-4) mechanism, the Reduced Hydrocarbon mechanism, and the Super-Fast mechanism. We show that, for most regions and time periods, differences in simulated ozone chemistry between these three mechanisms are smaller than the model–observation differences themselves. The MOZART-4 and Reduced Hydrocarbon mechanisms are in close agreement in their representation of ozone throughout the troposphere during all time periods (annual, seasonal, and diurnal). While the Super-Fast mechanism tends to have higher simulated ozone variability and differs from the MOZART-4 mechanism over regions of high biogenic emissions, it is surprisingly capable of simulating ozone adequately given its simplicity. We explore the trade-offs between chemical mechanism complexity and computational cost by identifying regions where the simpler mechanisms are comparable to the MOZART-4 mechanism and regions where they are not. The Super-Fast mechanism is 3 times as fast as the MOZART-4 mechanism, which allows for longer simulations or ensembles with more members that may not be feasible with the MOZART-4 mechanism given limited computational resources.
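The practical consequence of the 3x speedup can be seen with back-of-envelope arithmetic; the per-run cost and budget below are hypothetical placeholders, and only the speedup ratio comes from the paper:

```python
# Back-of-envelope ensemble sizing under a fixed compute budget.
mozart4_cost = 90_000                 # core-hours per 25-year run (assumed)
superfast_cost = mozart4_cost / 3     # Super-Fast is ~3x faster (from the paper)

budget = 270_000                      # fixed allocation in core-hours (assumed)
print(int(budget // mozart4_cost))    # -> 3 MOZART-4 ensemble members
print(int(budget // superfast_cost))  # -> 9 Super-Fast ensemble members
```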


2021 ◽  
Author(s):  
Shikha Suman ◽  
Ashutosh Karna ◽  
Karina Gibert

Hierarchical clustering is one of the preferred choices for understanding the underlying structure of a dataset and defining typologies, with multiple applications in real life. Among existing clustering algorithms, the hierarchical family is one of the most popular, as it reveals the inner structure of the dataset and yields the number of clusters as an output, unlike popular methods such as k-means, and the granularity of the final clustering can be adjusted to the goals of the analysis. The number of clusters in a hierarchical method relies on the analysis of the resulting dendrogram. Experts have criteria to visually inspect the dendrogram and determine the number of clusters, but finding automatic criteria that imitate experts in this task is still an open problem, and dependence on the expert to cut the tree is a limitation in real applications such as Industry 4.0 and additive manufacturing. This paper analyses several cluster validity indexes in the context of determining the suitable number of clusters in hierarchical clustering. A new Cluster Validity Index (CVI) is proposed that properly captures the implicit criteria used by experts when analyzing dendrograms. The proposal has been applied to a range of datasets and validated against expert ground truth, outperforming the state of the art while significantly reducing the computational cost.
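As an illustration of the task a CVI automates, the following sketch scores candidate dendrogram cuts with an off-the-shelf index; the silhouette score stands in for the paper's proposed CVI, which is not reproduced here:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_score

def best_k_by_cvi(X, k_range=range(2, 11), method="ward"):
    """Score each candidate cut of the dendrogram and keep the best one."""
    Z = linkage(X, method=method)                        # build the tree once
    scores = {k: silhouette_score(X, fcluster(Z, t=k, criterion="maxclust"))
              for k in k_range}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three well-separated blobs; the selected cut should yield k = 3.
    X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0, 4, 8)])
    print("selected k =", best_k_by_cvi(X)[0])
```

Replacing the scoring function with an index that mimics how experts read dendrogram merge heights is, in essence, what the proposed CVI aims for.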


Author(s):  
Yan Bai ◽  
Yihang Lou ◽  
Yongxing Dai ◽  
Jun Liu ◽  
Ziqian Chen ◽  
...  

Vehicle Re-Identification (ReID) has attracted considerable research effort due to its great significance to public security. In vehicle ReID, we aim to learn features that are powerful in discriminating the subtle differences between visually similar vehicles and also robust to different orientations of the same vehicle. However, these two characteristics are hard to encapsulate in a single feature representation with unified supervision. Here we propose a Disentangled Feature Learning Network (DFLNet) to learn orientation-specific and common features concurrently, which are discriminative at the level of details and invariant to orientation, respectively. Moreover, to effectively use these two types of features for ReID, we further design a feature metric alignment scheme to ensure the consistency of the metric scales. Experiments show the effectiveness of our method, which achieves state-of-the-art performance on three challenging datasets.
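The abstract gives no architectural details, but the two-branch idea can be caricatured as follows; everything in this sketch (dimensions, the orientation classifier, L2 normalization as a stand-in for metric alignment) is our assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchReIDHead(nn.Module):
    """Caricature of the disentangling idea: one branch learns orientation-
    specific detail cues, the other orientation-invariant (common) cues."""
    def __init__(self, backbone_dim=2048, embed_dim=256, n_orientations=8):
        super().__init__()
        self.specific = nn.Linear(backbone_dim, embed_dim)
        self.common = nn.Linear(backbone_dim, embed_dim)
        # Auxiliary head supervising the specific branch with orientation labels.
        self.orient_cls = nn.Linear(embed_dim, n_orientations)

    def forward(self, feats):                  # feats: (batch, backbone_dim)
        f_spec = F.normalize(self.specific(feats), dim=1)
        f_comm = F.normalize(self.common(feats), dim=1)
        # Unit-norm embeddings keep the two distance scales comparable,
        # a crude stand-in for the paper's metric alignment scheme.
        return torch.cat([f_spec, f_comm], dim=1), self.orient_cls(f_spec)
```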

