On the Parallelization of Stream Compaction on a Low-Cost SDC Cluster

Many highly parallel algorithms generate large volumes of data containing both valid and invalid elements, and high-performance solutions to the stream compaction problem prove extremely important in such scenarios. Although parallel stream compaction has been extensively studied on GPU-based platforms and, more recently, on the Intel Xeon Phi platform, no study has yet considered its parallelization on a low-cost computing cluster, even though general-purpose single-board computing devices are gaining popularity in the scientific community due to their high performance per dollar and per watt. In this work, we consider the case of an extremely low-cost cluster composed of four Odroid C2 single-board computers (SDCs), showing that stream compaction can also obtain important speedups on this kind of platform. To do so, we derive two parallel implementations of the stream compaction problem using MPI. Then, we evaluate them with varying numbers of processes and/or SDCs, as well as different input sizes. In general, we see that unless the number of elements in the stream is too small, the best results are obtained when eight MPI processes are distributed among the four SDCs that make up the cluster. To add value to the obtained results, we also run the two parallel implementations on a very high-performance but power-hungry 18-core Intel Xeon E5-2695 v4 multicore processor, finding that the Odroid C2 SDC cluster constitutes a much more efficient alternative when both execution time and required energy are taken into account. Finally, we also implement and evaluate a parallel version of the stream split problem, which additionally stores the invalid elements after the valid ones. Our implementation shows good scalability on the Odroid C2 SDC cluster and a more balanced computation/communication ratio than the stream compaction problem.
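To make the two problems named in this abstract concrete, the following is a minimal sequential sketch in Python. It is not the authors' MPI implementation. It assumes the common formulation in which an exclusive prefix sum over validity flags gives each valid element its output position; the abstract does not say which formulation the paper uses. The names `exclusive_scan`, `compact`, and `split_stream` are hypothetical.

```python
from itertools import accumulate


def exclusive_scan(flags):
    """Exclusive prefix sum: result[i] is the sum of flags[0..i-1]."""
    if not flags:
        return []
    inclusive = list(accumulate(flags))
    return [0] + inclusive[:-1]


def compact(stream, is_valid):
    """Stream compaction: keep only valid elements, preserving order."""
    flags = [1 if is_valid(x) else 0 for x in stream]
    offsets = exclusive_scan(flags)
    out = [None] * sum(flags)
    for x, flag, offset in zip(stream, flags, offsets):
        if flag:
            out[offset] = x  # offset = number of valid elements before x
    return out


def split_stream(stream, is_valid):
    """Stream split: valid elements first, invalid ones after them."""
    flags = [1 if is_valid(x) else 0 for x in stream]
    offsets = exclusive_scan(flags)
    n_valid = sum(flags)
    out = [None] * len(stream)
    for i, (x, flag, offset) in enumerate(zip(stream, flags, offsets)):
        if flag:
            out[offset] = x
        else:
            # i - offset = number of invalid elements before position i
            out[n_valid + (i - offset)] = x
    return out


if __name__ == "__main__":
    data = [3, -1, 4, -2, 5]
    print(compact(data, lambda v: v >= 0))
    print(split_stream(data, lambda v: v >= 0))
```

In a distributed setting, each process would typically apply this to its local chunk and then exchange counts to find where its chunk starts in the global output. That step is an assumption about how such a scheme is usually organized, not something stated in the abstract.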


Structural color coatings for high performance BIPV

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/855/1/012011 ◽

2021 ◽

Vol 855 (1) ◽

pp. 012011

Author(s):

R Habets ◽

Z Vroon ◽

B Erich ◽

N Meulendijks ◽

D Mann ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Light Transmission ◽

Low Cost ◽

Solution Process ◽

Structural Color ◽

Colored Glass ◽

Solar Panels ◽

Cell Efficiency ◽

Very High

Building-integrated photovoltaics (BIPV) offer aesthetics and freedom of design for architects and home owners. This can accelerate implementation and free up new spaces for solar energy harvesting at building level, which is a necessary step towards a climate-neutral built environment. Colored solar panels with high conversion efficiency and low cost are an important development for large-scale market penetration of BIPV. Here we report a solution-processed structural color coating for solar panels and solar collectors. We show that virtually any color can be prepared, that the desired coating stack can be designed using optical calculations, and that the exact color can be produced via a low-cost solution process. Furthermore, we show that the light transmission of the colored glass plates is still very high, exceeding that of commonly used absorbing colors, and enables very high solar cell efficiency. The colored PV panels have been tested in a real environment and via accelerated lifetime testing for 3 years without any performance decline or degradation.


Accelerating IDCT Algorithm on Xeon Phi Coprocessor

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.3114 ◽

2013 ◽

Vol 756-759 ◽

pp. 3114-3120

Author(s):

Jin Qi ◽

Can Qun Yang ◽

Cheng Chen ◽

Qiang Wu ◽

Tao Tang

Keyword(s):

Discrete Cosine Transform ◽

High Performance ◽

Xeon Phi ◽

Cosine Transform ◽

Inverse Discrete Cosine Transform ◽

The Many ◽

Many Integrated Core ◽

Beta Version ◽

Very High ◽

Intel Xeon

The Inverse Discrete Cosine Transform (IDCT) is an important operation for image and video decompression, and how to accelerate the IDCT algorithm has been frequently studied. Recently, Intel has proposed Xeon Phi coprocessors based on the many integrated core (MIC) architecture. Xeon Phi integrates 61 cores with a 512-bit SIMD extension in each core, thus providing very high performance. In this paper, we employ the Knights Corner (a beta version of Xeon Phi) to accelerate the IDCT algorithm. By employing the 512-bit SIMD instructions and a data prefetching optimization, our implementation achieves (1) an average speedup of 5.82 over the non-SIMD version, (2) an average performance benefit of 27.3% from the data prefetching optimization, and (3) an average speedup of 1.53 on one Knights Corner coprocessor over the implementation on one eight-core Intel Xeon E5-2670 CPU.


Applications of Machine Learning and High-Performance Computing in the Era of COVID-19

Applied System Innovation ◽

10.3390/asi4030040 ◽

2021 ◽

Vol 4 (3) ◽

pp. 40

Author(s):

Abdul Majeed

Keyword(s):

Machine Learning ◽

High Performance Computing ◽

High Performance ◽

Low Cost ◽

General Purpose ◽

Prototype System ◽

Successful Implementation ◽

Privacy And Security ◽

Applications Of Machine Learning ◽

Performance Computing

During the ongoing pandemic of the novel coronavirus disease 2019 (COVID-19), the latest technologies such as artificial intelligence (AI), blockchain, learning paradigms (machine, deep, smart, few-shot, extreme learning, etc.), high-performance computing (HPC), Internet of Medical Things (IoMT), and Industry 4.0 have played a vital role. These technologies helped to contain the disease’s spread by predicting contaminated people/places, as well as forecasting future trends. In this article, we provide insights into the applications of machine learning (ML) and HPC in the era of COVID-19. We discuss the person-specific data that are being collected to lower the COVID-19 spread and highlight the remarkable opportunities they provide for knowledge extraction leveraging low-cost ML and HPC techniques. We demonstrate the role of ML and HPC in the COVID-19 era through successful implementations or propositions in three contexts: (i) ML and HPC use in the data life cycle, (ii) ML and HPC use in analytics on COVID-19 data, and (iii) the general-purpose applications of both techniques in the COVID-19 arena. In addition, we discuss the privacy and security issues and the architecture of the prototype system to demonstrate the proposed research. Finally, we discuss the challenges of the available data and highlight the issues that hinder the applicability of ML and HPC solutions to it.


Investigation of Various Commercial PEDOT:PSS (Poly(3,4-Ethylenedioxythiophene)Polystyrene Sulfonate) As A Hole Transport Layer in Lead Iodide Based Inverted Planar Perovskite Solar Cells

10.21203/rs.3.rs-269998/v1 ◽

2021 ◽

Author(s):

Hamed Moeini Alishah ◽

Fatma Pinar Gokdemir Choi ◽

Serap Gunes

Keyword(s):

Solar Cells ◽

High Performance ◽

Low Cost ◽

Perovskite Solar Cells ◽

Vital Role ◽

Hole Transport ◽

Transport Layer ◽

Hole Transport Layer ◽

Lead Iodide ◽

Very High

Inverted-type perovskite solar cells have drawn remarkable attention due to their solution processability, straightforward configuration, low-cost processing, and manufacturability at very high throughput, even on top of flexible materials. The hole transport material (HTM) plays a vital role in achieving high performance in inverted-type perovskite solar cells. Herein, we report on the effect of different commercial PEDOT:PSS formulations, such as PH 1000, PH 500, P VP AI, and P T2, on the performance of CH3NH3PbI3-based planar perovskite solar cells.


Performance Analysis of Parallelized Bioinformatics Applications

Asian Journal of Computer Science and Technology ◽

10.51983/ajcst-2018.7.2.1881 ◽

2018 ◽

Vol 7 (2) ◽

pp. 70-74

Author(s):

Dhruv Chander Pant ◽

O. P. Gupta

Keyword(s):

Distributed Computing ◽

High Performance ◽

Low Cost ◽

General Purpose ◽

Computational Power ◽

Genome Data ◽

Technological Developments ◽

General Purpose Computer ◽

Performance Computing ◽

Purpose Computer

The main challenges facing bioinformatics applications today are managing, analyzing and processing huge volumes of genome data. This type of analysis and processing is very difficult on general-purpose computer systems, so the need arises for distributed computing, cloud computing and high-performance computing in bioinformatics applications. Distributed computers, cloud computers and multi-core processors are now available at very low cost to deal with bulk amounts of genome data. Along with these technological developments in distributed computing, scientists and bioinformaticians are making many efforts to parallelize and implement algorithms so as to take maximum advantage of the additional computational power. In this paper, a few bioinformatics algorithms are discussed, their parallelized implementations are explained, and the performance of these parallelized algorithms is analyzed. It is also observed that, in parallel implementations of the various bioinformatics algorithms, the impact of communication subsystems with respect to job sizes should also be analyzed.


Significance of Logic Synthesis in FPGA-Based Design of Image and Signal Processing Systems

Pattern Recognition Technologies and Applications ◽

10.4018/978-1-59904-807-9.ch012 ◽

2008 ◽

pp. 265-283 ◽

Cited By ~ 3

Author(s):

Mariusz Rawski ◽

Henry Selvaraj ◽

Bogdan J. Falkowski ◽

Tadeusz Luba

Keyword(s):

Signal Processing ◽

High Performance ◽

Digital Filters ◽

Logic Synthesis ◽

General Purpose ◽

Digital Systems ◽

Fir Filters ◽

Distributed Arithmetic ◽

Embedded Dsp ◽

Very High

Taking FIR filters as an example, this chapter discusses the efficiency of different methodologies for implementing DSP algorithms on modern FPGA architectures. Nowadays, programmable technology makes it possible to implement digital systems using specialized embedded DSP blocks. However, this technology also lets the designer increase the efficiency of designed systems by exploiting the parallelism of the implemented algorithms. Moreover, it is possible to apply special techniques, such as distributed arithmetic (DA). Since in this approach general-purpose multipliers are replaced by combinational LUT blocks, it is possible to construct digital filters of very high performance. Additionally, the application of the functional decomposition-based method to the optimization and mapping of LUT blocks has been investigated. The chapter presents the results of a comparison of various design approaches in these areas.
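The idea of replacing multipliers with lookup tables can be illustrated with a small software model of distributed arithmetic. This is only a sketch under simplifying assumptions: unsigned integer samples of a fixed bit width, integer coefficients, and a single LUT covering all taps. It is not the chapter's FPGA design, and the names `build_lut` and `da_dot` are hypothetical.

```python
def build_lut(coeffs):
    """Precompute, for every bit pattern over the taps, the sum of the
    coefficients whose tap bit is set (one LUT entry per pattern)."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_dot(lut, samples, bits):
    """Compute sum(c_k * x_k) for unsigned `bits`-wide samples using only
    LUT lookups, shifts and additions (no multiplications)."""
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        # The LUT entry is weighted by 2**b through a shift.
        acc += lut[addr] << b
    return acc


if __name__ == "__main__":
    coeffs = [2, -3, 5, 1]        # hypothetical 4-tap filter coefficients
    window = [7, 0, 12, 255]      # hypothetical 8-bit unsigned samples
    lut = build_lut(coeffs)
    print(da_dot(lut, window, bits=8))
    print(sum(c * x for c, x in zip(coeffs, window)))  # direct reference
```

The LUT grows as 2^n with the number of taps n, which is one reason hardware designs usually partition the taps into several smaller tables. How the chapter handles that partitioning is not described in the abstract.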


Salt-free Chromium Tanning: Practical Approaches

Journal of the American Leather Chemists Association ◽

10.34314/jalca.v117i1.4690 ◽

2022 ◽

Vol 117 (1) ◽

Author(s):

M. Sathish ◽

R. Aravindhan ◽

J. Raghava Rao

Keyword(s):

High Performance ◽

Current Practice ◽

Raw Materials ◽

Low Cost ◽

Physical Strength ◽

Environmentally Sustainable ◽

Environmental Threats ◽

Matrix Stabilization ◽

Efficient Alternative ◽

Alternative Approaches

Chromium tanning holds a prominent place in leather manufacturing for permanent stabilization of the hide/skin matrix. Though it has multiple advantages, such as high thermal stability, an easy process and low cost, the current practice is not environmentally sustainable. Poor chromium exhaustion and TDS load generation are the major environmental threats of conventional chromium tanning systems. On the other hand, salt-free chromium tanning has been identified as one of the efficient alternative approaches for hide/skin matrix stabilization. However, it has not been practiced commercially due to several practical difficulties. In this work, attempts have been made to develop a practically viable high-performance salt-free chromium tanning system using deliming liquor as the tanning float and changing the order of addition of the masking salt. The developed methodologies completely avoid the use of the salt/basification process and are suitable for all kinds of raw materials and tannery houses. Besides, the process achieves a 71-77% reduction in TDS load, and the uptake of chromium is around 90%. The physical strength characteristics are on par with the conventional process, and the leathers exhibit good grain tightness and roundness. The developed methodologies are simple and do not require any specialty chemicals.


Performance Comparisons of Timing Techniques in a Non-Real-Time Environment

Volume 3: 25th Computers and Information in Engineering Conference, Parts A and B ◽

10.1115/detc2005-85234 ◽

2005 ◽

Author(s):

Frederick M. Proctor ◽

Justin R. Hibbits

Keyword(s):

Operating System ◽

Real Time ◽

Operating Systems ◽

High Performance ◽

Low Cost ◽

General Purpose ◽

Control Applications ◽

Real Time Operating System ◽

Performance Comparisons

General-purpose computers are increasingly being used for serious control applications, due to their prevalence, low cost and high performance. Real-time operating systems are available for PCs that overcome the nondeterminism inherent in desktop operating systems. Depending on the timing requirements, however, many users can get by with a non-real-time operating system. This paper discusses timing techniques applicable to non-real-time operating systems, using Linux as an example, and compares them with the performance that can be obtained with true real-time OSes.
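One simple timing technique on a non-real-time operating system is a periodic loop that sleeps until an absolute deadline and records how late each wake-up is. The sketch below illustrates that idea in Python using only `time.perf_counter` and `time.sleep`. It is not taken from the paper, a real control application would more likely be written in C, and the function name `measure_lateness` and its default parameters are hypothetical.

```python
import time


def measure_lateness(period_s=0.001, iterations=200):
    """Run a sleep-based periodic loop and return, for each cycle, how far
    the wake-up time was from its deadline (in seconds)."""
    lateness = []
    next_deadline = time.perf_counter() + period_s
    for _ in range(iterations):
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)  # the OS may wake us later than requested
        lateness.append(time.perf_counter() - next_deadline)
        # Advance by a fixed period so that errors do not accumulate.
        next_deadline += period_s
    return lateness


if __name__ == "__main__":
    samples = measure_lateness()
    print(f"mean lateness: {sum(samples) / len(samples) * 1e6:.1f} us")
    print(f"max lateness:  {max(samples) * 1e6:.1f} us")
```

The spread between the mean and maximum lateness gives a rough picture of the nondeterminism the abstract refers to. The actual figures depend heavily on the machine, the scheduler, and the system load.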


An approach towards tailoring interfacial structures and properties of multiphase renewable thermoplastics from lignin–nitrile rubber

Green Chemistry ◽

10.1039/c6gc01067a ◽

2016 ◽

Vol 18 (20) ◽

pp. 5423-5437 ◽

Cited By ~ 21

Author(s):

Tony Bova ◽

Chau D. Tran ◽

Mikhail Y. Balakshin ◽

Jihua Chen ◽

Ewellyn A. Capanema ◽

...

Keyword(s):

High Performance ◽

Low Cost ◽

General Purpose ◽

Nitrile Rubber ◽

Butadiene Rubber ◽

Acrylonitrile Butadiene Rubber ◽

Interfacial Structures ◽

Structures And Properties ◽

Reactive Mixing

High-performance multiphase thermoplastics were synthesized by reactive mixing of unmodified industrial lignin and low-cost additives in a matrix of general-purpose acrylonitrile-butadiene rubber (NBR).


Special Issue on Computer Architecture for Robotics

Journal of Robotics and Mechatronics ◽

10.20965/jrm.1990.p0417 ◽

1990 ◽

Vol 2 (6) ◽

pp. 417-417

Author(s):

Michitaka Kameyama ◽

Keyword(s):

Computer Architecture ◽

Integrated Circuit ◽

High Speed ◽

High Performance ◽

System Development ◽

General Purpose ◽

System Configuration ◽

Design Development ◽

Very High ◽

Vlsi Chips

In the realization of intelligent robots, highly intelligent manipulation and movement techniques are required, such as intelligent man-machine interfaces, intelligent information processing for path planning and problem solving, practical robot vision, and high-speed sensor signal processing. Thus, very high-speed processing to cope with vast amounts of data, as well as the development of various algorithms, have become important subjects. To fulfill such requirements, the development of high-performance computer architecture using advanced microelectronics technology is required. For these purposes, approaches to implementing computer systems for robots can be classified as follows:

(a) Use of general-purpose computers. As the performance of workstations and personal computers increases year by year, software development is the major task, and no hardware development is required except for the interfaces with peripheral equipment. Since current high-level languages and software can be applied, the approach is excellent in ease of system development, but the processing performance is limited.

(b) Use of commercially available (V)LSI chips. This is an approach to designing a computer system by combining commercially available LSIs. Since the development of both hardware and software is involved, the development period tends to be longer than in (a). These chips include general-purpose microprocessors, memory chips, digital signal processors (DSPs) and multiply-adder LSIs. Though the kinds of available chips are limited to some degree, the approach can cope with considerably high-performance specifications because a number of chips can be used flexibly.

(c) Design, development and system configuration of VLSI chips. This is an approach to developing new special-purpose VLSI chips using ASIC (Application Specific Integrated Circuit) technology, that is, semicustom or full-custom technology. If these attain practical use and are marketed, they will be widely used as high-performance VLSI chips at the level of (b). Since a very high-performance specification must be satisfied, the study of very high-performance VLSI computer architecture becomes very important. But this approach, which involves chip development, requires a very long design and development period, from the determination of processor specifications to the system configuration using the fabricated chips.

For the above three approaches, the order from the viewpoint of ease of development is (a), (b), (c), while that from the viewpoint of performance is (c), (b), (a). The approaches are not mutually exclusive but complement each other. For example, new chips developed by (c) can also have an impact as components of (a) and (b). Further, the common point of these approaches is that performance improvement through highly parallel architecture becomes important. This special edition introduces, from the above standpoint, the latest information on the present state and future prospects of computer techniques in Japan. We hope that this edition will contribute to the development of this field.
