A New Solution for Visibility Culling of Generic Scenes, Based on Replication, Heuristics, and Draw Call Reduction

2018
Author(s): Yvens R. Serpa, Mária Andréia F. Rodrigues

Graphics applications with high visual quality and increasing levels of interactivity are of fundamental interest. Within this context, visibility culling algorithms restrict processing to the objects actually visible to the observer, speeding up scene visualization. However, state-of-the-art solutions still incur a high computational cost, do not scale to complex scenarios, and generalize poorly. In contrast, this work presents RHView, an innovative generic solution for static and dynamic scenes, based on a replicated space-partitioning structure and heuristics. RHView uses novel heuristics to estimate rendering time and to balance processing cost against triangle-removal accuracy, while maintaining interactive frame rates even in scenes with billions of triangles. It is the only solution currently available that reduces draw calls, one of the factors with the greatest impact on graphics processing. Systematic tests have shown that RHView can be up to 2.8 times faster than state-of-the-art algorithms.
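
The abstract does not describe RHView's internals, so the following is only a generic, simplified sketch of the two ideas it names: culling against a space-partitioning structure and batching visible objects to cut draw calls. All names are hypothetical and the structure here is a plain uniform grid, not RHView's replicated hierarchy.

# Simplified sketch: grid-based visibility culling plus draw-call batching.
# All names are illustrative; this is not the RHView algorithm itself.
from collections import defaultdict

class Object2D:
    def __init__(self, name, material, x0, y0, x1, y1):
        self.name, self.material = name, material
        self.aabb = (x0, y0, x1, y1)          # axis-aligned bounding box

def overlaps(a, b):
    """True if two AABBs (x0, y0, x1, y1) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def build_grid(objects, cell=10.0):
    """Bucket objects into a uniform grid keyed by cell coordinates."""
    grid = defaultdict(list)
    for obj in objects:
        x0, y0, x1, y1 = obj.aabb
        for i in range(int(x0 // cell), int(x1 // cell) + 1):
            for j in range(int(y0 // cell), int(y1 // cell) + 1):
                grid[(i, j)].append(obj)
    return grid

def visible_batches(grid, view, cell=10.0):
    """Cull cells outside the view rectangle, then group the survivors by
    material so that each material corresponds to one (hypothetical) draw call."""
    x0, y0, x1, y1 = view
    seen, batches = set(), defaultdict(list)
    for i in range(int(x0 // cell), int(x1 // cell) + 1):
        for j in range(int(y0 // cell), int(y1 // cell) + 1):
            for obj in grid.get((i, j), ()):
                if id(obj) not in seen and overlaps(obj.aabb, view):
                    seen.add(id(obj))
                    batches[obj.material].append(obj)
    return batches   # one draw call per material instead of one per object

objs = [Object2D("tree", "bark", 2, 2, 4, 6), Object2D("rock", "stone", 50, 50, 52, 52)]
print({m: [o.name for o in v] for m, v in visible_batches(build_grid(objs), (0, 0, 20, 20)).items()})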

Symmetry, 2021, Vol 13 (3), pp. 511
Author(s): Syed Mohammad Minhaz Hossain, Kaushik Deb, Pranab Kumar Dhar, Takeshi Koshiba

Proper plant leaf disease (PLD) detection is challenging in complex backgrounds and under different capture conditions. For this reason, modified adaptive centroid-based segmentation (ACS) is first used to trace the proper region of interest (ROI). Automatic initialization of the number of clusters (K) in modified ACS before recognition increases the scalability of ROI tracing, even for symmetrical features across various plants. Convolutional neural network (CNN)-based PLD recognition models achieve adequate accuracy to some extent; however, their memory requirements (large-scale parameters) and high computational cost are pressing issues for memory-restricted mobile and IoT devices. Therefore, after tracing the ROIs, three proposed depth-wise separable convolutional PLD (DSCPLD) models, namely segmented modified DSCPLD (S-modified MobileNet), segmented reduced DSCPLD (S-reduced MobileNet), and segmented extended DSCPLD (S-extended MobileNet), are used to strike a constructive trade-off among accuracy, model size, and computational latency. Moreover, we compared our proposed DSCPLD recognition models with state-of-the-art models such as MobileNet, VGG16, VGG19, and AlexNet. Among the segmentation-based DSCPLD models, S-modified MobileNet achieves the best accuracy of 99.55% and F1-score of 97.07%. We also evaluated our DSCPLD models on both full and segmented plant leaf images and conclude that, with modified ACS, all models improve in accuracy and F1-score. Furthermore, a new plant leaf dataset containing 6580 images of eight plants was used to experiment with several depth-wise separable convolution models.
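
For readers unfamiliar with the building block behind these models, the sketch below shows a generic depth-wise separable convolution block in PyTorch, the MobileNet-style primitive the DSCPLD models build on; it illustrates the primitive only, not the S-modified/S-reduced/S-extended architectures themselves.

# Minimal sketch of a depth-wise separable convolution block in PyTorch.
# Generic MobileNet-style primitive, not the paper's exact architectures.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depth-wise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Point-wise: 1x1 convolution mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

# A standard 3x3 convolution from 32 to 64 channels uses 3*3*32*64 = 18,432
# weights; the separable pair uses 3*3*32 + 32*64 = 2,336, roughly 8x fewer.
block = DepthwiseSeparableConv(32, 64)
print(sum(p.numel() for m in (block.depthwise, block.pointwise)
          for p in m.parameters()))   # 2336 convolution weights (BatchNorm excluded)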


2021, Vol 5 (OOPSLA), pp. 1-26
Author(s): Arjun Pitchanathan, Christian Ulmann, Michel Weber, Torsten Hoefler, Tobias Grosser

Presburger arithmetic provides the mathematical core for the polyhedral compilation techniques that drive analytical cache models, loop optimization for ML and HPC, formal verification, and even hardware design. Polyhedral compilation is widely regarded as being slow due to the potentially high computational cost of the underlying Presburger libraries. Researchers typically use these libraries as powerful black-box tools, but the perceived internal complexity of these libraries, caused by the use of C as the implementation language and a focus on end-user-facing documentation, holds back broader performance-optimization efforts. With FPL, we introduce a new library for Presburger arithmetic built from the ground up in modern C++. We carefully document its internal algorithmic foundations, use lightweight C++ data structures to minimize memory management costs, and deploy transprecision computing across the entire library to effectively exploit machine integers and vector instructions. On a newly developed comprehensive benchmark suite for Presburger arithmetic, we show a 5.4x speedup in total runtime over the state-of-the-art library isl in its default configuration and 3.6x over a variant of isl optimized with element-wise transprecision computing. We expect that the availability of a well-documented and fast Presburger library will accelerate the adoption of polyhedral compilation techniques in production compilers.
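
The transprecision idea mentioned above, computing in fast machine integers and escalating to arbitrary precision only on overflow, can be sketched roughly as follows; the class and its one-way "big" switch are hypothetical simplifications, not FPL's actual data structures.

# Rough sketch of the transprecision idea: arithmetic is done in fixed-width
# machine integers and falls back to arbitrary precision only on overflow.
# The real FPL library does this in C++ over vectors and matrices; this is
# only a stand-in for the overflow check and the mode switch.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def checked_mul(a: int, b: int):
    """Return the exact product and a flag saying whether it overflows signed 64-bit."""
    result = a * b
    return result, not (INT64_MIN <= result <= INT64_MAX)

class TransprecisionRow:
    """A row of coefficients that starts in 'small' (64-bit) mode and flips a
    one-way flag to 'big' mode on the first overflow. Python keeps exact values
    either way; a C++ implementation would switch its storage from int64 to a
    big-integer type when the flag flips."""
    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.big = False

    def scale(self, factor: int):
        for i, c in enumerate(self.coeffs):
            value, overflow = checked_mul(c, factor)
            if overflow and not self.big:
                self.big = True          # one-way switch to the slow, exact path
            self.coeffs[i] = value

row = TransprecisionRow([3, 7, 2**40])
row.scale(2**30)
print(row.big, row.coeffs)   # True: 2**40 * 2**30 exceeds the 64-bit range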


2022, pp. 1-10
Author(s): Daniel Trevino-Sanchez, Vicente Alarcon-Aquino

The need to detect and classify objects correctly is a constant challenge; recognizing them at different scales and in different scenarios, sometimes cropped or poorly lit, is not an easy task. Convolutional neural networks (CNNs) have become a widely applied technique since they are fully trainable and well suited to feature extraction. However, the growing number of CNN applications constantly pushes for further accuracy improvements. Initially, those improvements involved the use of large datasets, augmentation techniques, and complex algorithms, methods that may carry a high computational cost. Nevertheless, feature extraction is known to be the heart of the problem. As a result, other approaches combine different techniques to extract better features and improve accuracy without requiring more powerful hardware. In this paper, we propose a hybrid pooling method that incorporates multiresolution analysis within the CNN layers to reduce the feature-map size without losing details. To prevent relevant information from being lost during downsampling, an existing pooling method is combined with a wavelet transform, keeping those details "alive" and enriching later stages of the CNN. Better-quality features improve CNN accuracy. To validate this study, ten pooling methods, including the proposed model, are tested on four benchmark datasets. The results are compared with four of the evaluated methods, which are also considered state-of-the-art.
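
As a rough illustration of the kind of hybrid pooling described here, the sketch below blends ordinary 2x2 max pooling with the low-frequency (approximation) band of a one-level Haar wavelet transform; the 50/50 blend and the function names are assumptions for illustration, not the paper's exact formulation.

# Hybrid pooling sketch: mix max pooling with the Haar approximation band so
# that downsampling keeps a smoothed copy of details the max would discard.
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def haar_ll_2x2(x: np.ndarray) -> np.ndarray:
    """One-level Haar approximation (LL band), up to scaling: the average of each 2x2 block."""
    h, w = x.shape
    blocks = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

def hybrid_pool(x: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend max pooling with the Haar LL band; both halve each spatial dimension."""
    return alpha * max_pool_2x2(x) + (1.0 - alpha) * haar_ll_2x2(x)

feature_map = np.arange(16, dtype=float).reshape(4, 4)
print(hybrid_pool(feature_map))     # 2x2 output mixing local peaks and local averages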


Author(s): Wonkyung Jung, Sangpyo Kim, Jung Ho Ahn, Jung Hee Cheon, Younho Lee

Fully homomorphic encryption (FHE) has been gaining popularity as an emerging means of enabling an unlimited number of operations on an encrypted message without decryption. A major drawback of FHE is its high computational cost. Specifically, the bootstrapping step that refreshes the noise accumulated through consecutive FHE operations on a ciphertext can take minutes. This significantly limits the practical use of FHE in numerous real applications. By exploiting the massive parallelism available in FHE, we demonstrate the first GPU implementation of bootstrapping for CKKS, one of the most promising FHE schemes supporting arithmetic on approximate numbers. Through analyzing CKKS operations, we discover that the major performance bottleneck is their high main-memory bandwidth requirement, which is exacerbated by existing optimizations targeted at reducing the required computation. These observations motivate us to make extensive use of memory-centric optimizations such as kernel fusion and reordering of primary functions. Our GPU implementation shows a 7.02× speedup for a single CKKS multiplication compared to the state-of-the-art GPU implementation and an amortized bootstrapping time of 0.423 µs per bit, which corresponds to a speedup of 257× over a single-threaded CPU implementation. By applying this to logistic regression model training, we achieved a 40.0× speedup compared to the previous 8-thread CPU implementation on the same data.
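
Kernel fusion, the memory-centric optimization named above, can be illustrated in miniature as follows; the toy numpy version merely reuses one output buffer instead of allocating temporaries, whereas real fused GPU kernels go further and perform all the operations in a single pass over memory.

# Toy illustration of why fusing element-wise steps reduces memory traffic.
# Real CKKS kernels fuse GPU kernels, not Python calls; this only shows the idea.
import numpy as np

def unfused(x: np.ndarray) -> np.ndarray:
    t1 = x * 2.0          # allocates and writes a first temporary
    t2 = t1 + 1.0         # allocates and writes a second temporary
    return np.sqrt(t2)    # allocates and writes the result

def fused(x: np.ndarray) -> np.ndarray:
    out = np.empty_like(x)
    np.multiply(x, 2.0, out=out)   # reuse a single output buffer throughout,
    np.add(out, 1.0, out=out)      # so no intermediate arrays are allocated;
    np.sqrt(out, out=out)          # a fused GPU kernel would also make one pass
    return out

x = np.linspace(0.0, 4.0, 5)
print(np.allclose(unfused(x), fused(x)))   # True: same result, fewer allocations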


2021
Author(s): Vinícius Nogueira, Lucas Amorim, Igor Baratta, Gabriel Pereira, Renato Mesquita

Meshless methods are increasingly gaining space in the study of electromagnetic phenomena as an alternative to traditional mesh-based methods. One of their biggest advantages is the absence of a mesh to describe the simulation domain; instead, the domain is discretized by spreading nodes along the domain and its boundaries. Meshless methods are thus based on the interactions of each node with all its neighbors, and determining the neighborhood of the nodes becomes a fundamental task. The k-nearest neighbors (kNN) search is a well-known algorithm used for this purpose, but it becomes a bottleneck for these methods due to its high computational cost. One alternative for reducing this cost is to use spatial-partitioning data structures (e.g., a planar grid) that allow pruning during the k-nearest neighbors search. Furthermore, many of the strategies employed for kNN search have been adapted to graphics processing units (GPUs) and can take advantage of their high potential for parallelism. This paper therefore proposes a multi-GPU version of the grid method for solving the kNN problem. Speedups of up to 1.99x and 3.94x were achieved with two and four GPUs, respectively, compared against the single-GPU version of the grid method.
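
A single-threaded sketch of the underlying grid method, bucketing points into a uniform grid and expanding the search ring by ring with a pruning bound, is given below; the multi-GPU partitioning that is the paper's actual contribution is not shown.

# Simplified, single-threaded sketch of grid-based kNN with pruning: points are
# bucketed into a uniform grid and the search expands ring by ring, stopping
# once the k-th best distance is closer than any unvisited ring can be.
import math
from collections import defaultdict

def build_grid(points, cell):
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(idx)
    return grid

def knn_grid(query, points, grid, k, cell):
    qx, qy = query
    qi, qj = int(qx // cell), int(qy // cell)
    best = []                                   # list of (distance, index)
    ring = 0
    while True:
        # Visit all cells on the square ring at Chebyshev distance `ring`.
        for i in range(qi - ring, qi + ring + 1):
            for j in range(qj - ring, qj + ring + 1):
                if max(abs(i - qi), abs(j - qj)) != ring:
                    continue
                for idx in grid.get((i, j), ()):
                    px, py = points[idx]
                    best.append((math.hypot(px - qx, py - qy), idx))
        best.sort()
        best = best[:k]
        # Prune: any point in an unvisited ring is at least ring*cell away from
        # the query, so stop when that bound cannot improve the current result.
        if len(best) == k and best[-1][0] <= ring * cell:
            return best
        ring += 1

points = [(0.5, 0.5), (1.2, 1.1), (5.0, 5.0), (0.9, 0.4)]
print(knn_grid((1.0, 1.0), points, build_grid(points, cell=1.0), k=2, cell=1.0))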


2012, Vol 2 (1), pp. 7-9
Author(s): Satinderjit Singh

Median filtering is a commonly used technique in image processing. The main problem of the median filter is its high computational cost: sorting N pixels has temporal complexity O(N·log N), even with the most efficient sorting algorithms. When the median filter must run in real time, a software implementation on general-purpose processors does not usually give good results. This paper presents an efficient algorithm for median filtering with a 3x3 kernel that needs only about 9 comparisons per pixel by exploiting spatial coherence between neighboring filter computations. The basic algorithm calculates two medians in one step and reuses sorted slices of three vertically neighboring pixels. An extension of this algorithm to 2D spatial coherence is also examined, which calculates four medians per step.
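
A simplified version of the column-reuse idea can be sketched as follows: each 3-pixel column is sorted once per row of windows and shared by the three windows that contain it, and the nine values are combined with the classic identity med9 = med3(max of the column minima, med3 of the column medians, min of the column maxima). The comparison count here is not tuned to the paper's 9-comparison figure.

# Sketch of a 3x3 median filter that exploits spatial coherence by sorting each
# 3-pixel column once and reusing it across horizontally adjacent windows.
def median3x3(img):
    """img: list of equal-length rows of numbers; returns the filtered interior."""
    h, w = len(img), len(img[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        # Sort every 3-pixel column once for this row of windows.
        cols = [sorted((img[y - 1][x], img[y][x], img[y + 1][x])) for x in range(w)]
        for x in range(1, w - 1):
            a, b, c = cols[x - 1], cols[x], cols[x + 1]   # three sorted columns
            lo = max(a[0], b[0], c[0])
            mid = sorted((a[1], b[1], c[1]))[1]
            hi = min(a[2], b[2], c[2])
            out[y - 1][x - 1] = sorted((lo, mid, hi))[1]  # median of the 3x3 window
    return out

noisy = [[1, 2, 3, 4],
         [5, 250, 7, 8],          # impulse noise at (1, 1)
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(median3x3(noisy))           # the 250 outlier is suppressed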


1995, Vol 32 (2), pp. 95-103
Author(s): José A. Revilla, Kalin N. Koev, Rafael Díaz, César Álvarez, Antonio Roldán

One factor in determining the transport capacity of coastal interceptors in Combined Sewer Systems (CSS) is the reduction of Dissolved Oxygen (DO) in coastal waters caused by the overflows. The study of the evolution of DO in coastal zones is complex, and the high computational cost of mathematical models makes the required probabilistic analysis impractical to undertake directly. Alternative methods, based on such mathematical modelling applied to a limited number of cases, are therefore needed. In this paper two alternative methods are presented for studying the oxygen deficit resulting from CSS overflows. In the first, statistical analyses focus on the causes of the deficit (the volume discharged); the second concentrates on the effects (the concentrations of oxygen in the sea). Both methods have been applied in a study of the coastal interceptor at Pasajes Estuary (Guipúzcoa, Spain), with similar results.


Mathematics, 2021, Vol 9 (8), pp. 891
Author(s): Aurea Grané, Alpha A. Sow-Barry

This work provides a procedure for constructing and visualizing profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques: multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known drawback of classical MDS on large datasets is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower's interpolation formula is then used to project the remaining individuals onto the resulting configuration. Throughout the process, Gower's distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results indicate that the new proposal overcomes the high computational cost of classical MDS with low error.
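
The two-stage mechanics, classical MDS on a sample followed by Gower's interpolation of the remaining individuals, can be sketched in numpy as follows; plain Euclidean distance stands in for Gower's mixed-data distance, so this shows only the projection step, not the full weighted mixed-data procedure.

# Sketch: classical MDS on a small random sample, then Gower interpolation to
# project the remaining individuals onto the sample's configuration.
import numpy as np

def classical_mds(D, dims=2):
    """D: (n, n) distance matrix of the sample. Returns coordinates X, B and eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                      # doubly centred Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]            # keep the largest eigenvalues
    L, V = vals[order], vecs[:, order]
    return V * np.sqrt(np.clip(L, 0, None)), B, L

def gower_interpolate(d2_new, X, B, L):
    """Gower's formula: coordinates of a new point from its squared distances to the sample."""
    g = np.diag(B)                                   # squared norms of the sample coordinates
    return 0.5 * (X.T @ (g - d2_new)) / L

rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 5))                    # the small random sample
rest = rng.normal(size=(3, 5))                       # individuals to project afterwards
D = np.linalg.norm(sample[:, None] - sample[None, :], axis=-1)
X, B, L = classical_mds(D)
for p in rest:
    d2 = np.sum((sample - p) ** 2, axis=1)
    print(gower_interpolate(d2, X, B, L))            # 2-D coordinates for p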


Author(s): Seyede Vahide Hashemi, Mahmoud Miri, Mohsen Rashki, Sadegh Etedali

This paper carries out sensitivity analyses to study the effect of each design variable on the performance of the self-centering buckling restrained brace (SC-BRB) and the corresponding buckling restrained brace (BRB) without shape memory alloy (SMA) rods. Furthermore, reliability analyses of the BRB and SC-BRB are performed. Considering the high computational cost of the simulation methods, three meta-models, namely Kriging, radial basis function (RBF), and the polynomial response surface method (PRSM), are used to construct the surrogate models. To this end, nonlinear dynamic analyses are conducted on both the BRB and SC-BRB using OpenSees software. The results show that the SMA area, SMA length ratio, and BRB core area have the largest effect on the failure probability of the SC-BRB. It is concluded that Kriging-based Monte Carlo simulation (MCS) gives the best performance for estimating the limit state function (LSF) of the BRB and SC-BRB in the reliability analysis procedures. Considering the effect of changing the maximum cyclic loading on the computed failure probability, and comparing the failure probability for different LSFs, it is also found that the reliability indices of the SC-BRB were always higher than the corresponding indices determined for the BRB, which confirms the superior performance of the SC-BRB over the BRB.
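
A minimal sketch of Kriging-based MCS is given below: a Gaussian-process surrogate is fitted to a small design of experiments on a limit state function, and the failure probability is then estimated by cheap Monte Carlo sampling of the surrogate. The quadratic-free toy limit state and all names are hypothetical stand-ins for the expensive OpenSees analyses used in the paper.

# Kriging-based Monte Carlo simulation sketch for reliability analysis.
# The limit state g(x) here is a cheap toy function, not a structural model.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def limit_state(x):
    """Toy limit state; failure is the event g(x) <= 0 (hypothetical stand-in)."""
    return 2.5 - x[:, 0] - x[:, 1] + 0.2 * x[:, 0] * x[:, 1]

rng = np.random.default_rng(1)
X_doe = rng.normal(size=(40, 2))                   # small design of experiments
y_doe = limit_state(X_doe)                         # the "expensive" evaluations

surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                     alpha=1e-6, normalize_y=True)
surrogate.fit(X_doe, y_doe)                        # Kriging surrogate of g(x)

X_mc = rng.normal(size=(100_000, 2))               # cheap samples on the surrogate
g_hat = surrogate.predict(X_mc)
pf = np.mean(g_hat <= 0.0)                         # estimated failure probability
print(f"estimated P_f = {pf:.4f}, reliability index beta = {-norm.ppf(pf):.2f}")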

