Low latency Montgomery multiplier for cryptographic applications

Author(s):  
Khalid Javeed ◽  
Muhammad Huzaifa ◽  
Safiullah Khan ◽  
Atif Raza Jafri

In this modern era, data protection is very important. To achieve it, data must be secured using either private-key or public-key cryptography (PKC). PKC eliminates the need to share a key at the start of communication. PKC systems such as ECC and RSA are deployed for different security services, such as key exchange between sender and receiver, key distribution among network nodes, and authentication protocols. PKC is based on computationally intensive finite field arithmetic operations. In PKC schemes, modular multiplication (MM) is the most critical operation. Usually, this operation is performed as an integer multiplication (IM) followed by a reduction modulo M. However, the reduction step involves a long division operation that is expensive in terms of area, time and resources. The Montgomery multiplication algorithm enables faster MM without the division operation. In this paper, a low-latency hardware implementation of the Montgomery multiplier is proposed. Several novel optimization strategies are adopted in the proposed design, which is based on the school-book multiplier, the Karatsuba-Ofman algorithm and fast adder techniques. The Karatsuba-Ofman algorithm and the school-book multiplier decompose the operands into smaller chunks, while the adders provide fast addition of large operands. The proposed design is simulated, synthesized and implemented using the Xilinx ISE Design Suite, targeting different Xilinx FPGA devices for operand sizes from 64 to 1024 bits. The proposed design is evaluated on the basis of computational time, area consumption, and throughput. The implementation results show that the proposed design easily outperforms the state of the art.
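As a concrete illustration of the division-free reduction described above, the following is a minimal Python sketch of Montgomery multiplication (REDC). The variable names and the small example modulus are illustrative only and do not correspond to the proposed hardware design.

```python
def montgomery_multiply(a_bar, b_bar, M, n, m_prime):
    """Return a_bar * b_bar * R^{-1} mod M, where R = 2^n (REDC)."""
    R_mask = (1 << n) - 1
    t = a_bar * b_bar                   # plain integer multiplication
    u = (t * m_prime) & R_mask          # m' = -M^{-1} mod R, precomputed
    result = (t + u * M) >> n           # dividing by R is just a shift
    return result - M if result >= M else result

# Example with a small modulus (illustrative values only)
M, n = 97, 8
R = 1 << n
m_prime = (-pow(M, -1, R)) % R               # computed once per modulus
a, b = 42, 77
a_bar, b_bar = (a * R) % M, (b * R) % M      # map into the Montgomery domain
p_bar = montgomery_multiply(a_bar, b_bar, M, n, m_prime)
p = montgomery_multiply(p_bar, 1, M, n, m_prime)   # map back
assert p == (a * b) % M
```

In hardware, the large integer multiplications inside this routine are exactly what schoolbook and Karatsuba-Ofman decomposition break into smaller chunks.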

2021 ◽  
Vol 11 (5) ◽  
pp. 2177
Author(s):  
Zuo Xiang ◽  
Patrick Seeling ◽  
Frank H. P. Fitzek

With increasing numbers of computer vision and object detection application scenarios, those requiring ultra-low service latency have become increasingly prominent, e.g., autonomous and connected vehicles or smart city applications. Incorporating machine learning through trained models in these scenarios can pose a computational challenge. The softwarization of networks provides opportunities to incorporate computing into the network, increasing flexibility by distributing workloads through offloading from client and edge nodes over in-network nodes to servers. In this article, we present an example of splitting the inference component of the YOLOv2 trained machine learning model between client, network, and service side processing to reduce the overall service latency. Assuming a client has 20% of the server's computational resources, we observe a more than 12-fold reduction of service latency when incorporating our service split compared to on-client processing, and an increase in speed of more than 25% compared to performing everything on the server. Our approach is not only applicable to object detection, but can also be applied in a broad variety of machine learning-based applications and services.
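A hedged sketch of the split-placement idea: given per-layer compute costs and the size of each intermediate tensor, pick the split index minimizing client compute plus transfer plus server compute. All numbers and names below are hypothetical and are not taken from the paper.

```python
def best_split(costs, sizes, client_speed, server_speed, bandwidth):
    """costs[i]: work of layer i; sizes[k]: bytes crossing a split placed
    before layer k (len(sizes) == len(costs) + 1). Returns (index, latency)."""
    best_k, best_t = 0, float("inf")
    for k in range(len(costs) + 1):
        t = (sum(costs[:k]) / client_speed      # layers 0..k-1 on the client
             + sizes[k] / bandwidth             # ship the intermediate tensor
             + sum(costs[k:]) / server_speed)   # remaining layers on the server
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t

# Toy example: client has 20% of the server's speed, as in the evaluation above
costs = [1.0] * 20                     # uniform per-layer work (hypothetical)
sizes = [8.0] + [2.0] * 19 + [0.1]     # early tensors are large, output is tiny
print(best_split(costs, sizes, client_speed=0.2, server_speed=1.0, bandwidth=1.0))
```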


2011 ◽  
Vol 11 (04) ◽  
pp. 571-587 ◽  
Author(s):  
WILLIAM ROBSON SCHWARTZ ◽  
HELIO PEDRINI

Fractal image compression is one of the most promising techniques for image compression due to advantages such as resolution independence and fast decompression. It exploits the fact that natural scenes present self-similarity to remove redundancy and obtain high compression rates with less quality degradation than traditional compression methods. The main drawback of fractal compression is its computationally intensive encoding process, due to the need to search for regions with high similarity in the image. Several approaches have been developed to reduce the computational cost of locating similar regions. In this work, we propose a method based on robust feature descriptors to speed up the encoding time. The use of robust features provides more discriminative and representative information for regions of the image. When the regions are better represented, the search for similar parts of the image can be reduced to focus only on the most likely matching candidates, which leads to a reduction in computational time. Our experimental results show that the use of robust feature descriptors reduces the encoding time while keeping high compression rates and reconstruction quality.
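To make the candidate-pruning idea concrete, here is a minimal sketch (using simple hand-picked statistics, not the authors' descriptors) in which each block is summarized by a cheap feature vector and only the nearest domain blocks undergo the full affine match:

```python
import numpy as np

def block_features(block):
    # hypothetical descriptor: mean, variance, and gradient energy of the block
    gy, gx = np.gradient(block.astype(float))
    return np.array([block.mean(), block.var(), (gx ** 2 + gy ** 2).mean()])

def candidate_domains(range_block, domain_feats, k=8):
    """Indices of the k domain blocks whose descriptors best match the range block."""
    d = np.linalg.norm(domain_feats - block_features(range_block), axis=1)
    return np.argsort(d)[:k]

# The expensive contractive-affine search then runs only over these k candidates
# instead of over every domain block in the image.
```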


2013 ◽  
Vol 22 (06) ◽  
pp. 1350045 ◽  
Author(s):  
MACIEJ WIELGOSZ ◽  
MAURITZ PANGGABEAN ◽  
JIANG WANG ◽  
LEIF ARNE RØNNINGEN

The background that underlies this work is the envisioned real-time tele-immersive collaboration system of the future that supports delay-sensitive applications involving participants from remote places via their collaboration spaces (CSs). An end-to-end delay of at most 20 ms is required for good synchronization of such applications, for example collaborative dancing and remote choir conducting. This is much lower than what existing teleconference systems provide. A novel network architecture with delay guarantee, namely Distributed Multimedia Plays (DMP), has been proposed and designed to realize this vision. The maximum latency is guaranteed because DMP network nodes can drop DMP packets of multimedia data from the CSs depending on instantaneous traffic conditions. Besides ultrafast processing time, modularity and scalability must be taken into account in the hardware design and implementation of the nodes for seamless incorporation of the modules. This leads us to employ field-programmable gate arrays (FPGAs) due to their substantial computational power and flexibility. This paper presents an FPGA-based platform for the design and implementation of DMP network nodes. It provides a detailed introduction to the platform architecture and the simulation-implementation environment for the design. The modularity of the implemented node is shown by addressing three important modules for packet dropping, 3D warping, and image transform. Our compact implementation of the network node on a Xilinx Virtex-6 ML605 consumes only a small fraction of the available resources. Moreover, the elementary operations in our implementation take (much) less than 5 μs, as desired to meet the low-latency requirement.
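The packet-dropping behavior that bounds latency can be sketched in a few lines; the policy below (evict the least important queued packet once the queue would exceed its delay budget) is a simplified stand-in for the DMP node logic, not the implemented hardware:

```python
class DropNode:
    """Toy delay-bounded queue: occupancy above max_queue would exceed the
    delay budget, so a packet is dropped (hypothetical eviction policy)."""
    def __init__(self, max_queue):
        self.max_queue = max_queue
        self.queue = []                       # list of (packet, priority) pairs

    def enqueue(self, packet, priority):
        if len(self.queue) >= self.max_queue:
            worst = min(range(len(self.queue)), key=lambda i: self.queue[i][1])
            if self.queue[worst][1] < priority:
                del self.queue[worst]         # evict a less important packet
            else:
                return False                  # the arrival itself is dropped
        self.queue.append((packet, priority))
        return True
```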


Processes ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1184
Author(s):  
Geraldine Cáceres Sepulveda ◽  
Silvia Ochoa ◽  
Jules Thibault

It is paramount to optimize the performance of a chemical process in order to maximize its yield and productivity and to minimize the production cost and the environmental impact. The various optimization objectives are often in conflict, and the best compromise solution must be determined, usually using a representative model of the process. However, solving first-principles models can be computationally intensive, making model-based multi-objective optimization (MOO) a time-consuming task. In this work, a methodology is proposed to perform multi-objective optimization of a two-reactor system for the production of acrylic acid, using artificial neural networks (ANNs) as meta-models, in an effort to reduce the computational time required to circumscribe the Pareto domain. The meta-model showed good agreement between the data and the model-predicted values of the relationships between the eight decision variables and the nine performance criteria of the process. Once the meta-model was built, the Pareto domain was circumscribed using a genetic algorithm (GA) and ranked with the net flow method (NFM). Using the ANN surrogate model, the optimization time decreased by a factor of 15.5.
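The workflow can be sketched as: evaluate many candidate decision vectors through the cheap surrogate and keep the nondominated set. The sketch below uses random sampling in place of the paper's GA and a toy function in place of the trained ANN; both substitutions are deliberate simplifications.

```python
import numpy as np

def surrogate(x):
    # stand-in for the trained ANN meta-model: 8 decision variables -> 9 criteria
    return np.array([np.sum((x - k / 9.0) ** 2) for k in range(9)])

def dominates(f, g):
    # f dominates g (minimization): no worse everywhere, strictly better somewhere
    return np.all(f <= g) and np.any(f < g)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 8))      # candidate decision vectors
F = np.array([surrogate(x) for x in X])       # cheap meta-model evaluations
pareto = [i for i, f in enumerate(F)
          if not any(dominates(g, f) for j, g in enumerate(F) if j != i)]
```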


2014 ◽  
Vol 7 (1) ◽  
pp. 303-315 ◽  
Author(s):  
S. M. Miller ◽  
A. M. Michalak ◽  
P. J. Levi

Abstract. Many inverse problems in the atmospheric sciences involve parameters with known physical constraints. Examples include nonnegativity (e.g., emissions of some urban air pollutants) or upper limits implied by reaction or solubility constants. However, probabilistic inverse modeling approaches based on Gaussian assumptions cannot incorporate such bounds and thus often produce unrealistic results. The atmospheric literature lacks consensus on the best means to overcome this problem, and existing atmospheric studies rely on a limited number of the possible methods with little examination of the relative merits of each. This paper investigates the applicability of several approaches to bounded inverse problems. A common method of data transformations is found to unrealistically skew estimates for the examined example application. The method of Lagrange multipliers and two Markov chain Monte Carlo (MCMC) methods yield more realistic and accurate results. In general, the examined MCMC approaches produce the most realistic result but can require substantial computational time. Lagrange multipliers offer an appealing option for large, computationally intensive problems when exact uncertainty bounds are less central to the analysis. A synthetic data inversion of US anthropogenic methane emissions illustrates the strengths and weaknesses of each approach.
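As a minimal illustration of why bounds matter (using simple bounded least squares rather than the transformations, Lagrange multipliers, or MCMC samplers compared in the paper), the sketch below contrasts an unconstrained estimate, which can go negative, with a nonnegativity-bounded one:

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(1)
H = rng.normal(size=(50, 20))            # hypothetical forward (transport) operator
s_true = np.abs(rng.normal(size=20))     # true nonnegative fluxes/emissions
y = H @ s_true + 0.05 * rng.normal(size=50)

s_unc = np.linalg.lstsq(H, y, rcond=None)[0]     # may contain unphysical negatives
s_bnd = lsq_linear(H, y, bounds=(0.0, np.inf)).x  # bounds enforced directly
assert np.all(s_bnd >= 0.0)
```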


2013 ◽  
Vol 2013 ◽  
pp. 1-16 ◽  
Author(s):  
Anuj V. Prakash ◽  
Anwesha Chaudhury ◽  
Rohit Ramachandran

Computer-aided modeling and simulation are a crucial step in developing, integrating, and optimizing unit operations and, subsequently, entire processes in the chemical/pharmaceutical industry. This study details two methods of reducing the computational time needed to solve complex process models, namely the population balance model, which, depending on the source terms, can be very computationally intensive. Population balance models are widely used to describe the time evolution and distributions of many particulate processes, and their efficient and quick simulation would be very beneficial. The first method utilizes MATLAB's Parallel Computing Toolbox (PCT) and the second makes use of another toolbox, JACKET, to speed up computations on the CPU and GPU, respectively. Results indicate a significant reduction in computational time for the same accuracy using multicore CPUs. Many-core platforms such as GPUs are also promising for computational time reduction on larger problems, despite the limitations of lower clock speed and device memory. This lends credence to the use of high-fidelity models (in place of reduced-order models) for control and optimization of particulate processes.
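The parallelization pattern translates directly outside MATLAB; the Python multiprocessing sketch below (an analog of PCT's parfor, not the authors' code) distributes the per-bin source-term evaluations of one time step across CPU cores:

```python
import numpy as np
from multiprocessing import Pool

def source_term(args):
    # toy stand-in for an expensive aggregation/breakage kernel for one size bin
    n_i, rate = args
    return n_i + 1e-3 * rate * n_i

if __name__ == "__main__":
    n = np.random.rand(100_000)          # number density over discretized size bins
    rates = np.random.rand(100_000)
    with Pool() as pool:                 # analogous to a parfor over the bins
        n_next = pool.map(source_term, zip(n, rates))
```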


Author(s):  
K. Somasundaram ◽  
S. Radhakrishnan

Grid computing is, by nature, a combination of parallel and distributed computing, where running computationally intensive applications such as sequence alignment and weather forecasting requires a proficient scheduler to solve problems extremely fast. Most Grid tasks are scheduled based on First Come First Served (FCFS), FCFS with advance reservation, Shortest Job First (SJF), etc. However, these traditional algorithms consume more computational time due to the soaring waiting time of jobs in the job queue. In Grid scheduling, resource selection is NP-complete. To overcome this problem, we propose a new dynamic scheduling algorithm, called the swift scheduler, which combines a heuristic search algorithm with the traditional SJF algorithm. The proposed algorithm takes into account jobs' memory and CPU requirements along with the priority of jobs and resources. Our experimental results show that our scheduler reduces the average waiting time in the job queue as well as the overall computational time.
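A hedged sketch of such a resource-aware SJF variant (the ranking formula below is hypothetical, not the paper's exact heuristic): jobs are ranked by estimated runtime weighted by priority, and the best-ranked job that fits the free CPU and memory is dispatched.

```python
def pick_next(jobs, free_cpu, free_mem):
    """jobs: list of dicts with est_runtime, cpu, mem, priority (lower = more urgent).
    Returns the best-ranked job that fits the currently free resources, or None."""
    fitting = [j for j in jobs if j["cpu"] <= free_cpu and j["mem"] <= free_mem]
    if not fitting:
        return None
    # hypothetical heuristic: shortest-job-first scaled by priority
    return min(fitting, key=lambda j: j["est_runtime"] * j["priority"])

queue = [
    {"name": "blast", "est_runtime": 30, "cpu": 4, "mem": 8,  "priority": 1},
    {"name": "wrf",   "est_runtime": 90, "cpu": 8, "mem": 32, "priority": 2},
    {"name": "align", "est_runtime": 10, "cpu": 2, "mem": 4,  "priority": 4},
]
print(pick_next(queue, free_cpu=4, free_mem=16))   # -> the 'blast' job
```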


Author(s):  
Suraj Jain Megharaja ◽  
Javid Bayandor

Abstract Aircraft emergency water landings (ditching) are uncommon but remain an ever-present possibility. Therefore, crashworthiness standards as part of the Federal Aviation Regulations demand that such situations be accounted for during the certification phase. The criteria require an aircraft to prove its ability to survive ditching and to float after impact for a duration long enough for the passengers to be rescued. In emergency scenarios, an open body of water is preferred as the landing location over hard terrain. It would be prohibitively expensive to test impacts of this nature to cover all required certification cases, and the data collection can be a tedious process. Due to these hindrances, numerical validation of aircraft water-ditching (fluid-solid) interactions has become more important than ever. In the case of a hard-terrain impact, most of the energy is absorbed by the frame of an aircraft. In water impacts, however, the initial load is distributed over the skin. As a result, the ability of an aircraft to withstand a crash depends on the strength of the shear panels to allow an effective transfer of impact energy to damage-absorbing members and mechanisms before failing. Large full-scale simulations to capture the structural response of an aircraft under severe impact loading, however, can be computationally intensive. This work focuses on a comparative analysis of numerical strategies for assessing fluid-structural interactions. Two of the methods considered are the Lagrangian and the Arbitrary Lagrangian-Eulerian (ALE) schemes. For preliminary validation, experimental studies performed by other research groups have been used to investigate the effect of mesh refinement and computational time on the Lagrangian and ALE schemes. These simulations provide a basis for selecting the right formulations when developing fluid-solid interactive models for aerospace applications. Based on the results of the studies conducted, the most computationally efficient scheme was then used to simulate an aircraft fuselage section impacting water in an emergency landing situation. The fuselage model used in this project was pre-validated against a rigid-terrain experimental drop test before it was applied to the ditching studies. Overall, this investigation aims at assessing advanced modeling techniques and approaches that can pave the way for analysis-assisted water impact certification and, ultimately, certification by analysis.


Energies ◽  
2019 ◽  
Vol 12 (5) ◽  
pp. 783 ◽  
Author(s):  
Klemen Drobnič ◽  
Lovrenc Gašparin ◽  
Rastko Fišer

A high-fidelity two-axis model of an interior permanent-magnet synchronous machine (IPM) presents a convenient way to characterize and validate motor dynamic performance during the design stage. In order to capture the nonlinear nature of the IPM, the model is parameterized with a standard dataset calculated beforehand by finite-element analysis. Of the two possible model implementations, the current model (CM) seems preferable to the flux-linkage model (FLM), a particular reason being the rather complex and time-demanding parameterization of the FLM in comparison with the CM. For this reason, a procedure for fast and reliable parameterization of the FLM is presented. The proposed procedure is significantly faster than comparable methods, providing considerable improvement in computational time. Additionally, the execution time of the FLM was demonstrated to be up to 20% shorter than that of the CM. The FLM should therefore be used in computationally intensive simulation scenarios that involve a significant number of iterations or an extended real-time span.
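Parameterizing the FLM amounts to inverting the FEA flux maps ψd(id, iq) and ψq(id, iq) into current maps id(ψd, ψq) and iq(ψd, ψq). A toy sketch of that inversion via scattered interpolation follows; the linear, saturation-free stand-in maps are hypothetical and this is not the paper's fast procedure:

```python
import numpy as np
from scipy.interpolate import griddata

# hypothetical FEA dataset: flux linkages sampled on a regular current grid
id_ax = np.linspace(-200.0, 0.0, 41)          # d-axis current samples (A)
iq_ax = np.linspace(0.0, 200.0, 41)           # q-axis current samples (A)
ID, IQ = np.meshgrid(id_ax, iq_ax, indexing="ij")
PSI_D = 0.8e-3 * ID + 0.05                    # toy map standing in for FEA psi_d
PSI_Q = 1.6e-3 * IQ                           # toy map standing in for FEA psi_q

# invert the maps: currents as functions of flux linkage
pts = np.column_stack([PSI_D.ravel(), PSI_Q.ravel()])
def currents_of_flux(psi_d, psi_q):
    i_d = griddata(pts, ID.ravel(), (psi_d, psi_q), method="linear")
    i_q = griddata(pts, IQ.ravel(), (psi_d, psi_q), method="linear")
    return i_d, i_q

print(currents_of_flux(0.01, 0.16))           # ~(-50 A, 100 A) for the toy maps
```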


2007 ◽  
Vol 16 (04) ◽  
pp. 603-611 ◽  
Author(s):  
ARSHAD AZIZ ◽  
NASSAR IKRAM

Optimized implementation of computationally intensive cryptographic transformations is an area of active research, mainly focused on the Advanced Encryption Standard (AES). Byte substitution, implemented using substitution boxes (S-boxes), is the main transformation in AES that strains the enabling embedded platform, e.g., field-programmable gate arrays (FPGAs). We present a novel clocking technique enabling an optimized implementation of byte substitution that enhances processing speed and reduces the area required for S-boxes in Xilinx FPGA Block RAM (BRAM).
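Byte substitution itself is a 256-entry table lookup, which is why it maps naturally onto BRAM. The sketch below derives the standard AES S-box from its definition (multiplicative inverse in GF(2^8) followed by the affine transform) and applies it as a lookup; it illustrates the transformation, not the proposed clocking technique:

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def gf_inv(a):
    # brute-force inverse is fine for a one-time 256-entry table build
    return next((x for x in range(1, 256) if gf_mul(a, x) == 1), 0)

def affine(b):
    # AES affine transform: b'_i = b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ c_i
    r = 0
    for i in range(8):
        bit = ((b >> i) ^ (b >> (i + 4) % 8) ^ (b >> (i + 5) % 8)
               ^ (b >> (i + 6) % 8) ^ (b >> (i + 7) % 8) ^ (0x63 >> i)) & 1
        r |= bit << i
    return r

SBOX = [affine(gf_inv(x)) for x in range(256)]    # the table a BRAM would hold
assert SBOX[0x00] == 0x63 and SBOX[0x53] == 0xED  # known AES S-box values

def sub_bytes(state):
    return bytes(SBOX[b] for b in state)          # SubBytes as pure lookup
```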

