High-Performance Image Filters via Sparse Approximations

Author(s):  
Kersten Schuster ◽  
Philip Trettner ◽  
Leif Kobbelt

We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.
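
The benefit of interpolated samples can be illustrated in one dimension: two adjacent taps with non-negative weights can be replaced by a single linearly interpolated fetch placed between them. The following is a minimal sketch of this well-known linear-sampling trick, not of the optimization method described above; the function names (`lerp_sample`, `merge_taps`, `filter_dense`, `filter_sparse`) and the test kernel are illustrative assumptions.

```python
import math

def lerp_sample(signal, x):
    """Linearly interpolated read at fractional position x (clamped at the borders)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(math.floor(x)), len(signal) - 2)
    t = x - i
    return (1.0 - t) * signal[i] + t * signal[i + 1]

def merge_taps(weights):
    """Merge adjacent non-negative taps pairwise into (offset, weight) fetches.

    weights[k] belongs to the integer offset k - len(weights) // 2.
    """
    r = len(weights) // 2
    taps = []
    for k in range(0, len(weights), 2):
        w0 = weights[k]
        w1 = weights[k + 1] if k + 1 < len(weights) else 0.0
        w = w0 + w1
        if w > 0.0:
            # A fetch at fraction w1 / w between the two taps, scaled by w,
            # reproduces w0 * s[o] + w1 * s[o + 1] exactly.
            taps.append((k - r + w1 / w, w))
    return taps

def filter_dense(signal, weights, center):
    """Reference result: one read per kernel tap."""
    r = len(weights) // 2
    return sum(w * signal[center + k - r] for k, w in enumerate(weights))

def filter_sparse(signal, taps, center):
    """Same result with one interpolated read per merged tap."""
    return sum(w * lerp_sample(signal, center + off) for off, w in taps)

if __name__ == "__main__":
    signal = [float(i * i) for i in range(9)]
    kernel = [w / 16.0 for w in (1, 4, 6, 4, 1)]  # 5-tap binomial blur
    taps = merge_taps(kernel)
    print(len(kernel), "taps ->", len(taps), "fetches")
    print(filter_dense(signal, kernel, 4), filter_sparse(signal, taps, 4))
```

On a GPU the interpolated read is performed by the texture unit, so each merged pair costs one fetch instead of two. The sketch only pairs fixed neighbours; choosing sample positions and weights under a performance budget is the part the paper solves by optimization.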

2017 ◽  
Vol 20 (4) ◽  
pp. 1151-1159 ◽  
Author(s):  
Folker Meyer ◽  
Saurabh Bagchi ◽  
Somali Chaterji ◽  
Wolfgang Gerlach ◽  
Ananth Grama ◽  
...  

As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large-volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1–3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners that will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, to facilitate both development and efficient high-performance implementation of the community’s data analysis tasks.


2012 ◽  
Vol 7 (1) ◽  
pp. 37-46
Author(s):  
Gustavo Sanchez ◽  
Marcelo Porto ◽  
Diego Noble ◽  
Sergio Bampi ◽  
Luciano Agostini

This paper presents an efficient hardware design for two new Motion Estimation (ME) algorithms: Multi-point Diamond Search (MPDS) and Dynamic Multi-Point Diamond Search (DMPDS). These algorithms are more effective at avoiding local minima than traditional fast algorithms. This contributes to higher-quality motion vectors, especially in High Definition (HD) videos, where the number of local minima is considerably higher. Two versions of the MPDS algorithm were proposed. The first, focused on high performance, can process QFHD videos at 30 frames per second when synthesized for Altera Stratix 4 and 90 nm TSMC technology, with a power consumption of only 18 mW. The second version is focused on quality enhancement and can process HD 1080p videos in real time. The DMPDS architecture was developed with a focus on high performance and was synthesized for Altera Stratix 4. This architecture can process QFHD videos at 34 frames per second. In comparison to related works, our solutions obtained the highest processing rates and a good trade-off among power consumption, area, memory bits and performance.
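
For background, the local-minimum problem of fast block-matching searches can be sketched with the classic diamond search, which greedily follows decreasing sum-of-absolute-differences (SAD) cost. The abstract does not describe how MPDS or DMPDS choose their points, so the `multi_start_search` wrapper below is only an assumed illustration of the general idea of searching from several starting points, not the published algorithm; all names and frame layouts (lists of rows) are hypothetical.

```python
# Large and small diamond patterns of the classic diamond search, as (dx, dy).
LDSP = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0),
        (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]

def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref; infinite if the candidate leaves the frame."""
    h, w = len(ref), len(ref[0])
    if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
        return float("inf")
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def diamond_search(cur, ref, bx, by, n, start=(0, 0), max_steps=32):
    """Greedy descent: repeat the large pattern until its centre is best,
    then refine once with the small pattern. May stop in a local minimum."""
    cx, cy = start
    for _ in range(max_steps):
        step = min(LDSP, key=lambda d: sad(cur, ref, bx, by, cx + d[0], cy + d[1], n))
        if step == (0, 0):
            break
        cx, cy = cx + step[0], cy + step[1]
    step = min(SDSP, key=lambda d: sad(cur, ref, bx, by, cx + d[0], cy + d[1], n))
    cx, cy = cx + step[0], cy + step[1]
    return (cx, cy), sad(cur, ref, bx, by, cx, cy, n)

def multi_start_search(cur, ref, bx, by, n, starts):
    """Assumed illustration: run the search from several start vectors and
    keep the cheapest result, which lowers the risk of a poor local minimum."""
    results = [diamond_search(cur, ref, bx, by, n, start=s) for s in starts]
    return min(results, key=lambda r: r[1])

if __name__ == "__main__":
    ref = [[x * x + 7 * y for x in range(16)] for y in range(16)]
    cur = [[(x + 2) * (x + 2) + 7 * y for x in range(16)] for y in range(16)]
    print(diamond_search(cur, ref, 6, 6, 4))
    print(multi_start_search(cur, ref, 6, 6, 4, [(0, 0), (-4, 0)]))
```

A single greedy descent only sees the cost surface around its current centre, which is why high-resolution content with many local minima can mislead it; evaluating more than one starting point trades extra SAD computations for robustness.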


2015 ◽  
Vol 25 (03) ◽  
pp. 1541005
Author(s):  
Alexandra Vintila Filip ◽  
Ana-Maria Oprescu ◽  
Stefania Costache ◽  
Thilo Kielmann

High-Performance Computing (HPC) systems consume large amounts of energy. As the energy consumption predictions for HPC show increasing numbers, it is important to make users aware of the energy spent on the execution of their applications. Drawing from our experience with exposing cost and performance in public clouds, in this paper we present a generic mechanism to compute fast and accurate estimates for the trade-offs between the performance (expressed as makespan) and the energy consumption of applications running on HPC clusters. We validate our approach by implementing it in a prototype, called E-BaTS, and evaluating it with a wide variety of HPC bags-of-tasks. Our experiments show that E-BaTS produces conservative estimates with errors below 5%, while requiring at most 12% of the energy and time of an exhaustive search for providing configurations close to the optimal ones in terms of trade-offs between energy consumption and makespan.
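
A trade-off between makespan and energy is commonly presented as a Pareto front: the set of configurations for which no other configuration is at least as good on both criteria. The sketch below shows that selection step only; it is not the E-BaTS estimation mechanism, and the configuration names and numbers are made-up example data.

```python
def pareto_front(configs):
    """Return the configurations that are not dominated in (makespan, energy).

    configs is a list of (name, makespan, energy) tuples; lower is better
    for both criteria. Exact duplicates are reported only once.
    """
    front = []
    # Sorted by makespan, a configuration is non-dominated exactly when its
    # energy is strictly lower than that of every faster configuration.
    for name, makespan, energy in sorted(configs, key=lambda c: (c[1], c[2])):
        if not front or energy < front[-1][2]:
            front.append((name, makespan, energy))
    return front

if __name__ == "__main__":
    # Hypothetical estimates: (configuration, makespan in s, energy in kJ).
    estimates = [
        ("4 nodes", 100.0, 50.0),
        ("8 nodes", 60.0, 70.0),
        ("8 nodes, low frequency", 80.0, 75.0),
        ("16 nodes", 40.0, 120.0),
    ]
    for name, makespan, energy in pareto_front(estimates):
        print(f"{name}: {makespan} s, {energy} kJ")
```

In this example the low-frequency configuration is dropped because the plain 8-node run is both faster and cheaper in energy; the remaining three are the choices a user would weigh against each other.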


1998 ◽  
Vol 525 ◽  
Author(s):  
Pushkar P. Apte ◽  
Sharad Saxena ◽  
Suraj Rao ◽  
Karthik Vasanth ◽  
Douglas A. Prinslow ◽  
...  

In integrated circuit (IC) fabrication, understanding and optimizing process interactions and variability is critical for swift process integration and performance enhancement, especially at dimensions ≤0.25 μm. We present here an approach to address this challenge, and we apply it to improve the process design for two critical modules in a typical CMOS IC process: salicide and source/drain. Together, these modules impact the silicide-to-diffusion contact resistance (Rc) and the gate sheet resistance (Rs), which, in turn, significantly affect transistor series resistance and circuit delays, respectively. In our approach, we have investigated a process domain consisting of both silicide and source/drain process variables, and we have developed a quantitative framework for analysis and optimization, along with qualitative insight into the underlying physical mechanisms. We demonstrate that the transistor drive current (Id) improves by ≈5%, and circuit performance, as measured by the figure-of-merit (FOM), by ≈4%. This improvement is significant, and an added benefit is that other transistor characteristics such as effective channel length, off-current, and substrate current are affected minimally. Finally, we use this approach to optimize trade-offs such as Rc vs Rs and performance vs manufacturability, thus enabling manufacturable processes that meet the requirements for high performance.


Author(s):  
Thiago S. Hallak ◽  
Manuel Ventura ◽  
C. Guedes Soares

A methodology for the initial dimensioning and evaluation of semi-submersible offshore accommodation units by use of a novel optimization procedure is presented. The method has two main steps. The first is the optimization procedure itself, in which a set of different configurations is considered and each is optimized by a genetic algorithm in terms of cost estimates, while several other characteristics of the units are evaluated according to an embedded synthesis model. In the second step, the set of optimized geometries is run in commercial software for accurate seakeeping and stability analysis, where compliance with the requirements is checked. The final solution may then be chosen considering the trade-offs between cost and performance. In the last section of this paper, a case study is presented and the results obtained are discussed.


Author(s):  
D. E. Newbury ◽  
R. D. Leapman

Trace constituents, which can be very loosely defined as those present at concentration levels below 1 percent, often exert influence on structure, properties, and performance far greater than what might be estimated from their proportion alone. Defining the role of trace constituents in the microstructure, or indeed even determining their location, makes great demands on the available array of microanalytical tools. These demands become increasingly challenging as the dimensions of the volume element to be probed become smaller. For example, a cubic volume element of silicon with an edge dimension of 1 micrometer contains approximately 5×10^10 atoms. High performance secondary ion mass spectrometry (SIMS) can be used to measure trace constituents to levels of hundreds of parts per billion from such a volume element (e.g., detection of at least 100 atoms to give 10% reproducibility with an overall detection efficiency of 1%, considering ionization, transmission, and counting).
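
The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes a silicon density of 2.329 g/cm³ and a molar mass of 28.0855 g/mol (values not given in the text), and it assumes that the 10% reproducibility refers to Poisson counting statistics, where the relative uncertainty of N counts is 1/√N.

```python
import math

AVOGADRO = 6.02214076e23    # atoms per mole
SI_DENSITY_G_CM3 = 2.329    # assumed density of crystalline silicon
SI_MOLAR_MASS_G = 28.0855   # assumed molar mass of silicon, g/mol

def atoms_in_cube(edge_um):
    """Number of silicon atoms in a cube with the given edge length in micrometers."""
    volume_cm3 = (edge_um * 1e-4) ** 3  # 1 micrometer = 1e-4 cm
    return SI_DENSITY_G_CM3 * volume_cm3 / SI_MOLAR_MASS_G * AVOGADRO

def detection_limit(atoms_in_volume, counts_needed=100, efficiency=0.01):
    """Smallest measurable atom fraction when counts_needed ions must be
    detected and only a fraction `efficiency` of trace atoms is detected."""
    trace_atoms_required = counts_needed / efficiency
    return trace_atoms_required / atoms_in_volume

if __name__ == "__main__":
    n_si = atoms_in_cube(1.0)
    limit = detection_limit(n_si)
    print(f"atoms in 1 um^3 of Si: {n_si:.3e}")
    print(f"detection limit: {limit * 1e9:.0f} ppb")
    print(f"relative counting uncertainty for 100 counts: {1 / math.sqrt(100):.0%}")
```

Under these assumptions the cube holds roughly 5×10^10 atoms, and detecting 100 ions at 1% efficiency requires 10^4 trace atoms, which corresponds to about 200 parts per billion, consistent with the "hundreds of parts per billion" stated above.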


2020 ◽  
Vol 12 (2) ◽  
pp. 19-50 ◽  
Author(s):  
Muhammad Siddique ◽  
Shandana Shoaib ◽  
Zahoor Jan

A key aspect of work processes in service sector firms is the interconnection between tasks and performance. Relational coordination can play an important role in coordinating organizational activities, given the high level of interdependence complexity in service sector firms. Research has primarily supported the view that well-devised high performance work systems (HPWS) can enhance organizational performance. There is a growing debate, however, about the “mechanism” linking HPWS and performance outcomes. Using relational coordination theory, this study examines a model of the effects of subsets of HPWS, such as motivation-, skills- and opportunity-enhancing HR practices, on relational coordination among employees working in reciprocally interdependent job settings. Data were gathered from multiple sources, including managers and employees at individual, functional and unit levels, to capture their understanding of HPWS and relational coordination (RC) in 218 bank branches in Pakistan. Data were analyzed via structural equation modelling, and the results suggest that HPWS predicted RC among officers at the unit level. The findings of the study contribute to both theory and practice.


2019 ◽  
Vol 14 ◽  
pp. 155892501989525
Author(s):  
Yu Yang ◽  
Yanyan Jia

Ultrafine crystallization gives industrial pure titanium higher tensile strength, corrosion resistance, and thermal stability, and the material is therefore widely used in medical instrumentation, aerospace, and passenger vehicle manufacturing. However, the batch preparation of ultrafine-crystalline tubular industrial pure titanium is limited by the development of the spinning process and has remained at the theoretical research stage. In this article, tubular TA2 industrial pure titanium is taken as the research object, and an ultrafine crystal forming process based on “5-pass strong spin-heat treatment-3 pass-spreading-heat treatment” is proposed. Based on the spinning process test, the ultimate thinning rate of the method is explored and the evolution of the surface microstructure is analyzed by metallographic microscope. The research suggests that multi-pass spinning with small-to-medium thinning amounts causes the grain structure to be elongated in the axial and tangential directions and then refined, and that the axial fiber uniformity is improved. The results have scientific significance for reducing the consumption of high-performance metals and improving material utilization and performance, and they also promote the development of preparation technology for ultrafine-grain metals.


2021 ◽  
Author(s):  
Santiago Bouzas ◽  
María F. Barbarich ◽  
Eduardo M. Soto ◽  
Julián Padró ◽  
Valeria P. Carreira ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5287
Author(s):  
Hiwa Mahmoudi ◽  
Michael Hofbauer ◽  
Bernhard Goll ◽  
Horst Zimmermann

Being ready-to-detect over a certain portion of time makes the time-gated single-photon avalanche diode (SPAD) an attractive candidate for low-noise photon-counting applications. A careful SPAD noise and performance characterization, however, is critical to avoid time-consuming experimental optimization and redesign iterations for such applications. Here, we present an extensive empirical study of the breakdown voltage, as well as the dark-count and afterpulsing noise mechanisms for a fully integrated time-gated SPAD detector in 0.35-μm CMOS based on experimental data acquired in a dark condition. An “effective” SPAD breakdown voltage is introduced to enable efficient characterization and modeling of the dark-count and afterpulsing probabilities with respect to the excess bias voltage and the gating duration time. The presented breakdown and noise models will allow for accurate modeling and optimization of SPAD-based detector designs, where the SPAD noise can impose severe trade-offs with speed and sensitivity as is shown via an example.

