Reconfigurable Framework for Resilient Semantic Segmentation for Space Applications

Deep learning (DL) presents new opportunities for enabling spacecraft autonomy, onboard analysis, and intelligent applications for space missions. However, DL applications are computationally intensive and often infeasible to deploy on radiation-hardened (rad-hard) processors, which traditionally harness a fraction of the computational capability of their commercial-off-the-shelf counterparts. Commercial FPGAs and system-on-chips present numerous architectural advantages and provide the computation capabilities to enable onboard DL applications; however, these devices are highly susceptible to radiation-induced single-event effects (SEEs) that can degrade the dependability of DL applications. In this article, we propose Reconfigurable ConvNet (RECON), a reconfigurable acceleration framework for dependable, high-performance semantic segmentation for space applications. In RECON, we propose both selective and adaptive approaches to enable efficient SEE mitigation. In our selective approach, control-flow parts are selectively protected by triple-modular redundancy to minimize SEE-induced hangs, and in our adaptive approach, partial reconfiguration is used to adapt the mitigation of dataflow parts in response to a dynamic radiation environment. Combined, both approaches enable RECON to maximize system performability subject to mission availability constraints. We perform fault injection and neutron irradiation to observe the susceptibility of RECON and use dependability modeling to evaluate RECON in various orbital case studies to demonstrate a 1.5–3.0× performability improvement in both performance and energy efficiency compared to static approaches.
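
RECON's adaptive approach trades performance against availability as the radiation environment changes. The following is a minimal sketch of that trade-off under a deliberately simple model. The model is our assumption and not RECON's dependability model:
- Each hypothetical mitigation configuration has a throughput and a "sensitivity" (the fraction of upsets that cause a hang).
- Availability is MTBF / (MTBF + MTTR).
- Performability is throughput weighted by availability.

The names `Config`, `availability` and `select_config`, and all numbers, are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    """A hypothetical accelerator configuration (e.g., one partial bitstream)."""
    name: str
    throughput: float   # frames per second while the accelerator is up
    sensitivity: float  # assumed fraction of upsets that cause a hang


def availability(cfg: Config, upset_rate: float, recovery_s: float) -> float:
    """Steady-state availability MTBF / (MTBF + MTTR) = 1 / (1 + hang_rate * MTTR)."""
    hang_rate = cfg.sensitivity * upset_rate  # hangs per second
    return 1.0 / (1.0 + hang_rate * recovery_s)


def select_config(configs: Iterable[Config], upset_rate: float,
                  recovery_s: float, min_availability: float) -> Optional[Config]:
    """Return the configuration with the highest performability
    (throughput x availability) that still meets the availability floor."""
    best: Optional[Config] = None
    best_perf = float("-inf")
    for cfg in configs:
        a = availability(cfg, upset_rate, recovery_s)
        if a < min_availability:
            continue  # violates the mission availability constraint
        perf = cfg.throughput * a
        if perf > best_perf:
            best, best_perf = cfg, perf
    return best


if __name__ == "__main__":
    configs = [Config("unprotected", 30.0, 0.10), Config("tmr", 18.0, 0.01)]
    for rate in (1e-5, 1e-2):  # hypothetical quiet orbit vs. solar event (upsets/s)
        choice = select_config(configs, rate, recovery_s=10.0, min_availability=0.995)
        print(rate, choice.name if choice else None)
```

With these made-up numbers, the unprotected configuration wins in the quiet environment because its higher throughput dominates. During the event, only the TMR configuration still meets the 0.995 availability floor.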

Fault-Tolerant FPGA-Based Nanosatellite Balancing High-Performance and Safety for Cryptography Application

Electronics ◽

10.3390/electronics10172148 ◽

2021 ◽

Vol 10 (17) ◽

pp. 2148

Author(s):

Laurent Gantel ◽

Quentin Berthet ◽

Emna Amri ◽

Alexandre Karlov ◽

Andres Upegui

Keyword(s):

High Performance ◽

Fault Tolerant ◽

Fault Injection ◽

Detection Mechanism ◽

Radiation Level ◽

Dynamically Reconfigurable ◽

Reconfigurable Platform ◽

Commercial Off The Shelf ◽

Modular Redundancy ◽

The Impact

With the growth of the nano-satellite market, commercial off-the-shelf FPGAs are increasingly used for payload applications. Because these commercial devices are not radiation-tolerant, they must be enhanced with fault-mitigation mechanisms against Single Event Upsets (SEUs). Mechanisms such as memory scrubbing, triple modular redundancy (TMR) and Dynamic and Partial Reconfiguration (DPR) can help detect, isolate and recover from SEU faults. In this paper, we introduce a dynamically reconfigurable platform equipped with configuration memory scrubbing and TMR mechanisms. We study their impact when combined with DPR, providing three execution modes: low-power, safe and high-performance. The fault detection mechanism lets the system measure the radiation level and estimate the risk of future faults. This makes it possible to select the appropriate execution mode dynamically and adopt the best trade-off between performance and reliability. The relevance of the platform is demonstrated with a nano-satellite cryptographic application running on a Zynq UltraScale+ MPSoC device. A fault injection campaign was performed to evaluate the impact of faulty configuration bits and to assess the efficiency of the proposed mitigation and the overall system reliability.
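
The abstract describes using detected faults to estimate the radiation level and then choosing among low-power, safe and high-performance modes. The following is a minimal sketch of such a policy, assuming a sliding-window count of scrubber-detected upsets and a single threshold. The class `ModeSelector`, its parameters, the threshold values and the mapping of modes to hardware (in the comments) are hypothetical and not taken from the paper.

```python
from collections import deque


class ModeSelector:
    """Chooses an execution mode from the recent rate of scrubber-detected upsets.

    Window length, threshold and mode policy are hypothetical.
    """

    def __init__(self, window_s: float = 60.0, safe_threshold: float = 0.05,
                 power_constrained: bool = False) -> None:
        self.window_s = window_s
        self.safe_threshold = safe_threshold  # detected upsets per second
        self.power_constrained = power_constrained
        self._events: deque = deque()  # detection timestamps, in increasing order

    def record_upset(self, t: float) -> None:
        """Log an upset reported by the configuration-memory scrubber at time t."""
        self._events.append(t)

    def upset_rate(self, now: float) -> float:
        """Upsets per second over the sliding window ending at `now`."""
        while self._events and self._events[0] < now - self.window_s:
            self._events.popleft()  # drop detections older than the window
        return len(self._events) / self.window_s

    def mode(self, now: float) -> str:
        if self.upset_rate(now) >= self.safe_threshold:
            return "safe"              # e.g., TMR across reconfigurable regions
        if self.power_constrained:
            return "low-power"         # e.g., a single unreplicated instance
        return "high-performance"      # e.g., independent parallel instances
```

For example, six detections within a 10 s window push the estimated rate to 0.6 upsets/s, above a 0.5 threshold, so the selector returns "safe". Once those detections age out of the window, it falls back to "high-performance".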

ScOSA system software: the reliable and scalable middleware for a heterogeneous and distributed on-board computer architecture

CEAS Space Journal ◽

10.1007/s12567-021-00371-7 ◽

2021 ◽

Author(s):

Andreas Lund ◽

Zain Alabedin Haj Hammadeh ◽

Patrick Kenny ◽

Vishav Vishav ◽

Andrii Kovalov ◽

...

Keyword(s):

Computer Architecture ◽

Distributed System ◽

High Performance ◽

State Of The Art ◽

Space Applications ◽

Trade Off ◽

Flight Experiment ◽

Performance Space ◽

Commercial Off The Shelf ◽

The Individual

Designing on-board computers (OBCs) for future space missions is governed by the trade-off between reliability and performance. Space applications with higher computational demands are not supported by currently available, state-of-the-art, space-qualified computing hardware, since their requirements exceed the capabilities of these components. Such applications include Earth observation with high-resolution cameras, on-orbit real-time servicing, and autonomous spacecraft and rover missions on distant celestial bodies. An alternative to state-of-the-art space-qualified computing hardware is the use of commercial-off-the-shelf (COTS) components for the OBC. These components are not only cheap and widely available but also achieve high performance. Unfortunately, they are also significantly more vulnerable to radiation-induced errors than space-qualified components. The ScOSA (Scalable On-board Computing for Space Avionics) Flight Experiment project aims to develop an OBC architecture that avoids this trade-off by combining space-qualified radiation-hardened components (the reliable computing nodes, RCNs) with COTS components (the high-performance nodes, HPNs) in a single distributed system. To abstract this heterogeneous architecture for application developers, we are developing a middleware for it. Besides providing a monolithic abstraction of the distributed system, the middleware shall also enhance the architecture with additional reliability and fault tolerance. In this paper, we present the individual components that make up the middleware, alongside the features it offers. Since the ScOSA Flight Experiment project is a successor of the OBC-NG and ScOSA projects, its middleware is a further development of the existing middleware. Therefore, we present and discuss our contributions and plans for enhancing the middleware in the course of the current project. Finally, we present first results on the scalability of the middleware, obtained from software-in-the-loop experiments with differently sized scenarios.

Research on Distance Transform and Neural Network Lidar Information Sampling Classification-Based Semantic Segmentation of 2D Indoor Room Maps

Sensors ◽

10.3390/s21041365 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1365

Author(s):

Tao Zheng ◽

Zhizhao Duan ◽

Jin Wang ◽

Guodong Lu ◽

Shengjie Li ◽

...

Keyword(s):

Neural Network ◽

High Speed ◽

High Performance ◽

High Efficiency ◽

Semantic Segmentation ◽

Raspberry Pi ◽

Distance Transform ◽

Testing Stage ◽

Sampling Points ◽

Information Sampling

Semantic segmentation of room maps is essential for mobile robots executing tasks. In this work, we propose a new approach to obtaining semantic labels for 2D lidar room maps that combines distance-transform watershed-based pre-segmentation with a purpose-designed neural network that classifies sampled lidar information. To label room maps with high efficiency, precision and speed, we designed a low-power, high-performance method that can be deployed on low-computing-power Raspberry Pi devices. In the training stage, a simulated lidar collects the lidar detection line maps of each point in a manually labelled map, and these line maps and the corresponding labels are used to train the neural network. In the testing stage, a new map is first pre-segmented into simple cells with the distance-transform watershed method, and the lidar detection line maps are then classified with the trained network. Optimized regions for sparse sampling points are derived from the distance transform computed during pre-segmentation, which prevents sampling points in boundary regions from influencing the semantic labels. A prototype mobile robot was developed to verify the proposed method, and a series of tests confirmed its feasibility, validity, robustness and high efficiency. The proposed method achieved higher scores in recall and precision: the mean recall is 0.965 and the mean precision is 0.943.
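
The pre-segmentation step relies on a distance transform of the occupancy map. Cells far from walls (room centres) become local maxima, which are natural seeds for a watershed. The following is a minimal sketch of the classic two-pass raster distance transform on a binary grid. It assumes the city-block (L1) metric for simplicity; the paper may use a different metric. The function name is hypothetical.

```python
from typing import List

Grid = List[List[int]]


def city_block_distance_transform(grid: Grid) -> Grid:
    """Return, for every cell, the L1 (city-block) distance to the nearest
    obstacle cell (value 1), using the two-pass raster algorithm."""
    rows, cols = len(grid), len(grid[0])
    inf = rows + cols  # exceeds any L1 distance that can occur on this grid
    d = [[0 if grid[r][c] else inf for c in range(cols)] for r in range(rows)]
    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < cols - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d


if __name__ == "__main__":
    room = [[1, 1, 1, 1, 1],
            [1, 0, 0, 0, 1],
            [1, 0, 0, 0, 1],
            [1, 0, 0, 0, 1],
            [1, 1, 1, 1, 1]]  # 1 = wall, 0 = free space
    for row in city_block_distance_transform(room):
        print(row)
```

For the small walled room in the usage example, the centre cell gets distance 2 and is the single local maximum, so a watershed seeded there would grow one region covering the room.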

Supervised Domain Adaptation for Automated Semantic Segmentation of the Atrial Cavity

Entropy ◽

10.3390/e23070898 ◽

2021 ◽

Vol 23 (7) ◽

pp. 898

Author(s):

Marta Saiz-Vivó ◽

Adrián Colomer ◽

Carles Fonfría ◽

Luis Martí-Bonmatí ◽

Valery Naranjo

Keyword(s):

High Performance ◽

Computational Models ◽

Domain Adaptation ◽

Semantic Segmentation ◽

Patient Specific ◽

Mr Images ◽

Training Samples ◽

Volumetric Images ◽

Acquisition Costs ◽

Left And Right

Atrial fibrillation (AF) is the most common cardiac arrhythmia. At present, cardiac ablation is the main treatment procedure for AF. To guide and plan this procedure, clinicians need patient-specific 3D geometrical models of the atria. There is therefore interest in automatic image segmentation algorithms, such as deep learning (DL) methods, as opposed to manual segmentation, which is error-prone and time-consuming. However, optimizing DL algorithms requires many annotated examples, which increases acquisition costs. The aim of this work is to develop automatic, high-performance computational models for left and right atrium (LA and RA) segmentation from a few labelled MRI volumetric images with a 3D Dual U-Net algorithm. To this end, a supervised domain adaptation (SDA) method is introduced to transfer knowledge from late gadolinium enhanced (LGE) MRI volumetric training samples (80 annotated LA samples) to a network trained with balanced steady-state free precession (bSSFP) MR images with a limited number of annotations (19 annotated RA and LA samples). The resulting knowledge-transferred SDA model outperformed the same network trained from scratch in both the RA (Dice = 0.9160) and LA (Dice = 0.8813) segmentation tasks.
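
The results above are reported as Dice scores. The following is a minimal sketch of how the Dice coefficient, 2|P ∩ T| / (|P| + |T|), is computed for a predicted mask P and a ground-truth mask T, assuming both volumes are flattened to binary voxel sequences. Returning 1.0 when both masks are empty is a convention we chose, and the function name is hypothetical.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly; treat that case as Dice = 1 by convention.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum
```

For instance, a prediction [1, 1, 0, 0] against ground truth [1, 0, 0, 0] shares one voxel out of 2 + 1 labelled voxels, giving Dice = 2/3.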

Custom Built of Smart Computing Platform for Supporting Optimization Methods and Artificial Intelligence Research

Proceedings of the Pakistan Academy of Sciences: A. Physical and Computational Sciences ◽

10.53560/ppasa(58-sp1)733 ◽

2021 ◽

Vol 58 (S) ◽

pp. 59-64

Author(s):

Indar Sugiarto ◽

Doddy Prayogo ◽

Henry Palit ◽

Felix Pasila ◽

Resmana Lim ◽

...

Keyword(s):

Artificial Intelligence ◽

Message Passing ◽

Graphics Processing Units ◽

High Performance ◽

Message Passing Interface ◽

Optimization Methods ◽

Computer Hardware ◽

Production Environment ◽

Computing Platform ◽

Commercial Off The Shelf

This paper describes a prototype of a computing platform dedicated to artificial intelligence exploration. The platform, dubbed PakCarik, is essentially a high-throughput computing platform with GPU (graphics processing unit) acceleration. PakCarik is an Indonesian acronym for Platform Komputasi Cerdas Ramah Industri Kreatif, which can be translated as “Creative Industry friendly Intelligence Computing Platform”. The platform aims to provide a complete development and production environment for AI-based projects, especially those that rely on machine learning and multiobjective optimization paradigms. PakCarik was built by assembling commercial off-the-shelf computer hardware and was tested on several AI-related application scenarios. The tests include High-Performance Linpack (HPL) benchmarking, Message Passing Interface (MPI) benchmarking, and TensorFlow (TF) benchmarking. The experiments show that PakCarik's performance is quite similar to commonly used cloud computing services such as Google Compute Engine and Amazon EC2, although it falls slightly behind dedicated AI platforms such as the Nvidia DGX-1 used in the benchmarking experiment. Its maximum computing performance was measured at 326 GFLOPS. The authors conclude that PakCarik is ready to be deployed in real-world applications and that it can be made even more powerful by adding more GPU cards.
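
A GFLOPS figure such as the 326 GFLOPS above comes from dividing the operation count of the benchmark's dense LU solve by the wall-clock time. The following is a minimal sketch of that conversion. It assumes the 2/3 n³ + 3/2 n² operation count that HPL uses when reporting results for an n x n system. The function name is hypothetical, and no problem size or runtime from the paper is implied.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """GFLOPS for solving a dense n x n system in `seconds`, assuming the
    operation count 2/3 n^3 + 3/2 n^2 used when HPL reports results."""
    flops = (2.0 / 3.0) * n ** 3 + 1.5 * n ** 2
    return flops / seconds / 1e9
```

Because the n³ term dominates, doubling n multiplies the runtime by roughly eight at a fixed GFLOPS rate. This is why HPL runs typically choose n as large as memory allows, so that the compute-bound phase dominates.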

Total Ionizing Dose Effects on a Delay-Based Physical Unclonable Function Implemented in FPGAs

Electronics ◽

10.3390/electronics7090163 ◽

2018 ◽

Vol 7 (9) ◽

pp. 163 ◽

Cited By ~ 5

Author(s):

Honorio Martin ◽

Pedro Martin-Holgado ◽

Yolanda Morilla ◽

Luis Entrena ◽

Enrique San-Millan

Keyword(s):

Low Cost ◽

Ring Oscillator ◽

Radiation Environment ◽

Key Generation ◽

Total Ionizing Dose ◽

Space Applications ◽

Physical Unclonable Functions ◽

Dose Effects ◽

Total Ionizing Dose Effects

Physical Unclonable Functions (PUFs) are hardware security primitives that are increasingly used for authentication and key generation in ICs and FPGAs. For space systems, they are a promising low-cost approach to meeting the need for secure communications. To this end, it is essential to determine whether they are reliable in the space radiation environment. In this work, we evaluate the Total Ionizing Dose (TID) effects on a delay-based PUF implemented in an SRAM-based FPGA, namely a Ring Oscillator PUF. Several major quality metrics were used to analyze how the PUF response evolves with total ionizing dose. Experimental results demonstrate that total ionizing dose has a perceptible effect on the quality of the PUF response, but that the PUF could still be used for space applications with appropriate corrections.
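
A Ring Oscillator PUF derives its response bits by comparing the frequencies of nominally identical oscillators. Dose-induced delay shifts can therefore flip bits. The following is a minimal sketch of two of the metrics commonly used to track this:
- The fractional Hamming distance between an enrolled reference response and a later measurement (reliability).
- The fraction of ones (uniformity).

It assumes disjoint adjacent oscillator pairs. The paper's pairing scheme, metrics and corrections may differ, and all function names are hypothetical.

```python
from typing import List, Sequence


def ro_puf_response(freqs: Sequence[float]) -> List[int]:
    """One response bit per disjoint pair of oscillators (0,1), (2,3), ...:
    1 if the first oscillator of the pair is faster, else 0."""
    return [1 if freqs[i] > freqs[i + 1] else 0 for i in range(0, len(freqs) - 1, 2)]


def fractional_hamming_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of differing bits; 0.0 means the response is perfectly reproduced."""
    if len(a) != len(b) or not a:
        raise ValueError("responses must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def uniformity(bits: Sequence[int]) -> float:
    """Fraction of 1s in a response; about 0.5 is ideal."""
    return sum(bits) / len(bits)
```

As a made-up example, reference frequencies [100.0, 99.0, 98.0, 101.0, 103.0, 102.5, 97.0, 97.2] give the response [1, 0, 1, 0]. Slightly shifted post-irradiation frequencies [99.5, 99.0, 97.6, 100.4, 102.2, 102.3, 96.5, 96.6] give [1, 0, 0, 0], a fractional Hamming distance of 0.25. Bits from pairs with small frequency margins are the most fragile. Discarding such pairs at enrollment is one common mitigation, not necessarily the correction the authors apply.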

Buffer Placement and Sizing for High-Performance Dataflow Circuits

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3477053 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-32

Author(s):

Lana Josipović ◽

Shabnam Sheikhha ◽

Andrea Guerrieri ◽

Paolo Ienne ◽

Jordi Cortadella

Keyword(s):

Performance Optimization ◽

Optimization Model ◽

High Performance ◽

Control Flow ◽

High Level Synthesis ◽

Software Applications ◽

Marked Graphs ◽

Variable Latency ◽

High Level ◽

Strong Contrast

Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can efficiently handle variable latencies (e.g., caches), unpredictable memory dependencies, and irregular control flow. Dataflow circuits exhibit an unconventional property: registers (usually referred to as “buffers”) can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Yet, although functionally irrelevant, this placement has a significant impact on the circuit’s timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. Our performance optimization model supports important high-level synthesis features such as pipelined computational units, units with variable latency and throughput, and if-conversion. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
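
The marked-graph view explains why buffer placement matters for throughput. In a marked graph, each transition on a cycle can fire at most (tokens on the cycle) / (latency of the cycle) times per clock, so the slowest cycle bounds the whole circuit. Adding a buffer to a cycle adds latency and can lower that bound. The following is a minimal sketch that evaluates only this bound for a given list of cycles. It assumes cycles are already enumerated as (tokens, latency) pairs and caps throughput at one firing per clock. The paper's optimization model is considerably richer, and the function name is hypothetical.

```python
from fractions import Fraction
from typing import Iterable, Tuple


def max_throughput(cycles: Iterable[Tuple[int, int]]) -> Fraction:
    """Upper bound on firings per clock of a marked graph.

    Each cycle is (tokens, latency): the tokens circulating on the cycle and
    its total latency in clock cycles. The bound is the minimum of
    tokens / latency over all cycles, capped at 1 firing per clock.
    """
    bound = Fraction(1)
    for tokens, latency in cycles:
        if latency <= 0:
            raise ValueError("cycle latency must be positive")
        bound = min(bound, Fraction(tokens, latency))  # slowest cycle dominates
    return bound
```

For example, a circuit with one loop carrying 1 token through 3 cycles of latency and another carrying 2 tokens through 4 is limited to 1/3 firing per clock. A cycle with no tokens yields 0, that is, deadlock.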

META-pipe cloud setup and execution

F1000Research ◽

10.12688/f1000research.13204.1 ◽

2017 ◽

Vol 6 ◽

pp. 2060

Author(s):

Aleksandr Agafonov ◽

Kimmo Mattila ◽

Cuong Duong Tuan ◽

Lars Tiede ◽

Inge Alexander Raknes ◽

...

Keyword(s):

Functional Annotation ◽

High Performance ◽

Sequence Data ◽

Metagenomic Data ◽

Taxonomic Profiling ◽

Geographically Distributed ◽

Computationally Intensive ◽

High Performance Computing Cluster ◽

And Storage ◽

Performance Computing

META-pipe is a complete service for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling. The functional annotation is computationally demanding and is therefore currently run on a high-performance computing cluster in Norway. However, additional compute resources are necessary to open the service to all ELIXIR users. We describe our approach for setting up and executing the functional analysis of META-pipe on additional academic and commercial clouds. Our goal is to provide a powerful analysis service that is easy to use and to maintain. Our design therefore uses a distributed architecture that combines central servers with multiple distributed backends that execute the computationally intensive jobs. We believe our experiences developing and operating META-pipe provide a useful model for others who plan to provide a portal-based data analysis service in ELIXIR and other organizations with geographically distributed compute and storage resources.

High-Performance Computing of BEAST/BEAGLE in Bayesian Phylogenetics using SDumont Hybrid Resources

10.5753/bresci.2020.11190 ◽

2020 ◽

Author(s):

Kary Ocaña ◽

Micaella Coelho ◽

Guilherme Freire ◽

Carla Osthoff

Keyword(s):

High Performance Computing ◽

High Performance ◽

Current Knowledge ◽

Data Sets ◽

Length Estimation ◽

Computationally Intensive ◽

Mcmc Chain ◽

Performance Computing ◽

Insight Into

Bayesian phylogenetic algorithms are computationally intensive. BEAST 1.10 inferences made use of the BEAGLE 3 high-performance library for efficient likelihood computations. This strategy enables phylogenetic inference and dating relevant to current knowledge of SARS-CoV-2 transmission. Through follow-up simulations on the hybrid resources of the Santos Dumont supercomputer using four phylogenomic data sets, we characterize the scaling performance of BEAST 1.10. Our results provide insight into species tree and MCMC chain length estimation, identifying preferable requirements for making better use of high-performance computing resources. Ongoing steps involve analyses of SARS-CoV-2 using BEAST 1.8 on multiple GPUs.
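
Characterizing scaling behaviour usually comes down to two standard strong-scaling quantities. Speedup is S_p = T_1 / T_p, where T_1 is the single-resource runtime and T_p the runtime on p resources. Parallel efficiency is E_p = S_p / p. The following is a minimal sketch of that computation. The runtimes in the example are hypothetical and not results from this study, and the study may define its resource units (CPU cores, GPUs, nodes) differently.

```python
from typing import Dict, Tuple


def scaling_metrics(t1: float, timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map each resource count p to (speedup T_1 / T_p, efficiency speedup / p)."""
    result: Dict[int, Tuple[float, float]] = {}
    for p, tp in sorted(timings.items()):
        speedup = t1 / tp
        result[p] = (speedup, speedup / p)
    return result
```

For instance, with a hypothetical 100 s baseline, runs of 60 s on 2 resources and 40 s on 4 give speedups of about 1.67 and 2.5. The corresponding efficiencies are about 0.83 and 0.625, the kind of diminishing return that guides how many resources are worth requesting.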

Environmental Sound Recognition on Embedded Systems: From FPGAs to TPUs

Electronics ◽

10.3390/electronics10212622 ◽

2021 ◽

Vol 10 (21) ◽

pp. 2622

Author(s):

Jurgen Vandendriessche ◽

Nick Wouters ◽

Bruno da Silva ◽

Mimoun Lamrini ◽

Mohamed Yassin Chkouri ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Machine Learning Techniques ◽

Sound Recognition ◽

Learning Approaches ◽

Environmental Sound ◽

Embedded Devices ◽

Power Efficient ◽

Computationally Intensive ◽

Environmental Sound Recognition

In recent years, Environmental Sound Recognition (ESR) has become a relevant capability for urban monitoring applications. Automated sound recognition techniques often rely on machine learning approaches, which have grown in complexity to achieve higher accuracy. However, such techniques often have to be deployed on resource- and power-constrained embedded devices, which has become a challenge with the adoption of deep learning approaches based on Convolutional Neural Networks (CNNs). Field-Programmable Gate Arrays (FPGAs) are power efficient and well suited to computationally intensive algorithms like CNNs. By fully exploiting their parallel nature, they have the potential to reduce inference time compared to other embedded devices. Similarly, dedicated architectures for accelerating Artificial Intelligence (AI), such as Tensor Processing Units (TPUs), promise high accuracy while achieving high performance. In this work, we evaluate existing tool flows for deploying CNN models on FPGAs as well as on TPU platforms. We propose and adapt several CNN-based sound classifiers to be embedded on such hardware accelerators. The results demonstrate the maturity of the existing tools and show how FPGAs can be exploited to outperform TPUs.