Polynomial multiplication on embedded vector architectures

Author(s):  
Hanno Becker ◽  
Jose Maria Bermudo Mera ◽  
Angshuman Karmakar ◽  
Joseph Yiu ◽  
Ingrid Verbauwhede

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice-based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms are an active area of research, and this article contributes to this line of work. Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on the Arm® Cortex®-M4 CPU, as well as the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that, despite the Cortex-M55 being single-issue, in-order and offering only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE), careful register management and instruction scheduling yield a 3× to 5× performance improvement over already highly optimized implementations on Cortex-M4, while maintaining the low area and energy profile necessary for use in the embedded market. Finally, as a real-world application, we integrate our multiplication techniques into the post-quantum key-encapsulation mechanism Saber.
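
For readers unfamiliar with the Karatsuba strategy named in this abstract, the following is a minimal Python sketch of recursive Karatsuba multiplication with a schoolbook base case. It is not the authors' implementation: the function names, the `threshold` parameter, and the modulus 2^13 used in the usage note are illustrative assumptions, and nothing here models vector registers or instruction scheduling.

```python
def schoolbook(a, b, mod):
    """Quadratic-time product of two coefficient lists, reduced modulo `mod`."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % mod
    return res


def karatsuba(a, b, mod, threshold=4):
    """Karatsuba product of two equal-length coefficient lists modulo `mod`.

    `threshold` (a hypothetical tuning knob) is the length at or below which
    the recursion falls back to the schoolbook method.
    """
    n = len(a)
    assert n == len(b), "this sketch assumes equal-length operands"
    if n <= threshold or n % 2:
        return schoolbook(a, b, mod)
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    low = karatsuba(a0, b0, mod, threshold)    # a0 * b0
    high = karatsuba(a1, b1, mod, threshold)   # a1 * b1
    mid = karatsuba([x + y for x, y in zip(a0, a1)],
                    [x + y for x, y in zip(b0, b1)],
                    mod, threshold)            # (a0 + a1) * (b0 + b1)
    res = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        res[i] = (res[i] + low[i]) % mod
        res[i + 2 * h] = (res[i + 2 * h] + high[i]) % mod
        # Middle term: (a0 + a1)(b0 + b1) - a0*b0 - a1*b1 = a0*b1 + a1*b0
        res[i + h] = (res[i + h] + mid[i] - low[i] - high[i]) % mod
    return res
```

Calling `karatsuba(a, b, 1 << 13)` on two length-16 lists performs three half-size multiplications per level instead of four, which is the source of the sub-quadratic cost.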

Author(s):  
Martin R. Albrecht ◽  
Christian Hanser ◽  
Andrea Hoeller ◽  
Thomas Pöppelmann ◽  
Fernando Virdia ◽  
...  

We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers and hardware security modules (HSM). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor, we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed up symmetric operations in our Kyber variant using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.
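
Kronecker substitution, as mentioned in this abstract, turns a polynomial product into a single long-integer product by packing coefficients into fixed-width bit slots. The sketch below illustrates the idea in Python, whose built-in big integers stand in for the co-processor's multiplier. The function name `kronecker_mul` and its `coeff_bound` parameter are hypothetical, and the sketch performs a plain product with no ring reduction, so it should not be read as the scheme described in the paper.

```python
def kronecker_mul(a, b, coeff_bound):
    """Multiply two polynomials with coefficients in [0, coeff_bound)
    using one big-integer multiplication (Kronecker substitution)."""
    # Each product coefficient is a sum of at most min(len(a), len(b)) terms,
    # each below coeff_bound**2, so this slot width prevents carries
    # from spilling into the neighbouring slot.
    terms = min(len(a), len(b))
    bits = max(1, (terms * (coeff_bound - 1) ** 2).bit_length())

    def pack(poly):
        # Evaluate the polynomial at x = 2**bits.
        return sum(c << (bits * i) for i, c in enumerate(poly))

    product = pack(a) * pack(b)  # the single long-integer multiplication
    mask = (1 << bits) - 1
    # Read the product coefficients back out of their slots.
    return [(product >> (bits * i)) & mask
            for i in range(len(a) + len(b) - 1)]
```

For example, `kronecker_mul([1, 2, 3], [4, 5, 6], 7)` packs both inputs into 7-bit slots and recovers the coefficients of (1 + 2x + 3x^2)(4 + 5x + 6x^2). A modular scheme would reduce each recovered coefficient afterwards.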


Author(s):  
Jose Maria Bermudo Mera ◽  
Angshuman Karmakar ◽  
Ingrid Verbauwhede

Since the introduction of the ring-learning with errors problem, the number theoretic transform (NTT) based polynomial multiplication algorithm has been studied extensively. Due to its faster quasilinear time complexity, it has been the preferred choice of cryptographers to realize ring-learning with errors cryptographic schemes. Compared to NTT, Toom-Cook or Karatsuba based polynomial multiplication algorithms, though being known for a long time, still have a fledgling presence in the context of post-quantum cryptography. In this work, we observe that the pre- and post-processing steps in Toom-Cook based multiplications can be expressed as linear transformations. Based on this observation we propose two novel techniques that can increase the efficiency of Toom-Cook based polynomial multiplications. Evaluation is reduced by a factor of 2, and we call this method precomputation, and interpolation is reduced from quadratic to linear, and we call this method lazy interpolation. As a practical application, we applied our algorithms to the Saber post-quantum key-encapsulation mechanism. We discuss in detail the various implementation aspects of applying our algorithms to Saber. We show that our algorithm can improve the efficiency of the computationally costly matrix-vector multiplication by 12−37% compared to previous methods on their respective platforms. Secondly, we propose different methods to reduce the memory footprint of Saber for Cortex-M4 microcontrollers. Our implementation shows between 2.6 and 5.7 KB reduction in the memory usage with respect to the smallest implementation in the literature.
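
The observation that Toom-Cook evaluation and interpolation are linear maps can be illustrated with a small Python sketch. It uses 3-way Toom-Cook over exact rationals, with the evaluation points 0, 1, -1, 2 and infinity chosen here as an assumption, and with each "limb" simplified to a single integer rather than a sub-polynomial. The function `lazy_inner_product` shows one plausible reading of lazy interpolation: pointwise products of an inner product are accumulated first and interpolated only once. All names are hypothetical, and this is not the authors' code.

```python
from fractions import Fraction

POINTS = [0, 1, -1, 2]  # finite evaluation points; infinity is added below


def eval_matrix(ncols):
    """Matrix that evaluates a polynomial with `ncols` coefficients at POINTS
    and at infinity (which simply picks out the leading coefficient)."""
    rows = [[Fraction(p ** j) for j in range(ncols)] for p in POINTS]
    rows.append([Fraction(0)] * (ncols - 1) + [Fraction(1)])
    return rows


def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def invert(m):
    """Exact Gauss-Jordan inversion of a square matrix of Fractions."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv_p = 1 / aug[col][col]
        aug[col] = [x * inv_p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


E = eval_matrix(3)               # evaluation: 3 limbs -> 5 point values
V_INV = invert(eval_matrix(5))   # interpolation: 5 point values -> 5 limbs


def toom3(a, b):
    """Product of two 3-limb polynomials: evaluate, multiply pointwise, interpolate."""
    pointwise = [x * y for x, y in zip(mat_vec(E, a), mat_vec(E, b))]
    return [int(c) for c in mat_vec(V_INV, pointwise)]


def lazy_inner_product(polys_a, polys_b):
    """Sum of products a_i * b_i, interpolating once at the end."""
    acc = [Fraction(0)] * 5
    for a, b in zip(polys_a, polys_b):
        ea, eb = mat_vec(E, a), mat_vec(E, b)
        acc = [s + x * y for s, x, y in zip(acc, ea, eb)]
    return [int(c) for c in mat_vec(V_INV, acc)]
```

Because interpolation is linear, interpolating the accumulated point values gives the same result as interpolating each product separately and summing, while applying `V_INV` only once. The same linearity allows the evaluation `mat_vec(E, a)` of a reused operand to be stored and reapplied rather than recomputed.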


MRS Bulletin ◽  
1997 ◽  
Vol 22 (10) ◽  
pp. 49-54 ◽  
Author(s):  
E. Todd Ryan ◽  
Andrew J. McKerrow ◽  
Jihperng Leu ◽  
Paul S. Ho

Continuing improvement in device density and performance has significantly affected the dimensions and complexity of the wiring structure for on-chip interconnects. These enhancements have led to a reduction in the wiring pitch and an increase in the number of wiring levels to fulfill demands for density and performance improvements. As device dimensions shrink to less than 0.25 μm, the propagation delay, crosstalk noise, and power dissipation due to resistance-capacitance (RC) coupling become significant. Accordingly, the interconnect delay now constitutes a major fraction of the total delay limiting the overall chip performance. Equally important is the processing complexity due to an increase in the number of wiring levels. This inevitably drives cost up by lowering the manufacturing yield due to an increase in defects and processing complexity. To address these problems, new materials for use as metal lines and interlayer dielectrics (ILDs) and alternative architectures have surfaced to replace the current Al(Cu)/SiO2 interconnect technology. These alternative architectures will require the introduction of low-dielectric-constant k materials as the interlayer dielectrics and/or low-resistivity conductors such as copper. The electrical and thermomechanical properties of SiO2 are ideal for ILD applications, and a change to material with different properties has important process-integration implications. To facilitate the choice of an alternative ILD, it is necessary to establish general criteria for evaluating thin-film properties of candidate low-k materials, which can be later correlated with process-integration problems.


Energies ◽  
2021 ◽  
Vol 14 (14) ◽  
pp. 4089
Author(s):  
Kaiqiang Zhang ◽  
Dongyang Ou ◽  
Congfeng Jiang ◽  
Yeliang Qiu ◽  
Longchuan Yan

In terms of power and energy consumption, DRAMs, as well as processors, play a key role in a modern server system. Although power-aware scheduling is based on the proportion of energy between DRAM and other components, when running memory-intensive applications, the energy consumption of the whole server system will be significantly affected by the non-energy proportion of DRAM. Furthermore, modern servers usually use NUMA architecture to replace the original SMP architecture to increase their memory bandwidth. It is of great significance to study the energy efficiency of these two different memory architectures. Therefore, in order to explore the power consumption characteristics of servers under memory-intensive workload, this paper evaluates the power consumption and performance of memory-intensive applications in different generations of real rack servers. Through analysis, we find that: (1) Workload intensity and concurrent execution threads affect server power consumption, but a fully utilized memory system may not necessarily bring good energy efficiency indicators. (2) Even if the memory system is not fully utilized, the memory capacity of each processor core has a significant impact on application performance and server power consumption. (3) When running memory-intensive applications, memory utilization is not always a good indicator of server power consumption. (4) The reasonable use of the NUMA architecture will improve the memory energy efficiency significantly. The experimental results show that reasonable use of NUMA architecture can improve memory efficiency by 16% compared with SMP architecture, while unreasonable use of NUMA architecture reduces memory efficiency by 13%. The findings we present in this paper provide useful insights and guidance for system designers and data center operators to help them in energy-efficiency-aware job scheduling and energy conservation.


Author(s):  
Xiaomo Jiang ◽  
Craig Foster

Gas turbine simple or combined cycle plants are built and operated with higher availability, reliability, and performance in order to provide the customer with sufficient operating revenues and reduced fuel costs while enhancing customer dispatch competitiveness. A tremendous amount of operational data is usually collected from the everyday operation of a power plant. Turning this data into knowledge and further solutions by developing advanced, state-of-the-art analytics has become an increasingly important but challenging issue. This paper presents an integrated system and methodology for this purpose that automates multi-level, multi-paradigm, multi-facet performance monitoring and anomaly detection for heavy duty gas turbines. The system provides an intelligent platform to drive site-specific performance improvements, mitigate outage risk, rationalize operational patterns, and enhance maintenance schedules and service offerings by taking appropriate proactive actions. In addition, the paper presents the components of the system, including data sensing, hardware, and operational anomaly detection, expertise proactive act of company, site specific degradation assessment, and water wash effectiveness monitoring and analytics. As demonstrated in two examples, this remote performance monitoring aims to improve equipment efficiency by converting data into knowledge and solutions in order to drive value for customers, including lowering operating fuel cost and increasing customer power sales and life cycle value.


Author(s):  
Roberto Dieci ◽  
Xue-Zhong He

This paper presents a stylized model of interaction among boundedly rational heterogeneous agents in a multi-asset financial market to examine how agents’ impatience, extrapolation, and switching behaviors can affect cross-section market stability. Besides extrapolation and performance based switching between fundamental and extrapolative trading documented in single asset market, we show that a high degree of ‘impatience’ of agents who are ready to switch to more profitable trading strategy in the short run provides a further cross-section destabilizing mechanism. Though the ‘fundamental’ steady-state values, which reflect the standard present-value of the dividends, represent an unbiased equilibrium market outcome in the long run (to a certain extent), the price deviation from the fundamental price in one asset can spill-over to other assets, resulting in cross-section instability. Based on a (Neimark–Sacker) bifurcation analysis, we provide explicit conditions on how agents’ impatience, extrapolation, and switching can destabilize the market and result in a variety of short and long-run patterns for the cross-section asset price dynamics.


AIHA Journal ◽  
2003 ◽  
Vol 64 (5) ◽  
pp. 660-667 ◽  
Author(s):  
Katharyn A. Grant ◽  
John G. Garland ◽  
Todd C. Joachim ◽  
Andrew Wallen ◽  
Twyla Vital

2003 ◽  
Vol 51 (5) ◽  
pp. 543 ◽  
Author(s):  
María A. Pérez-Fernández ◽  
Byron B. Lamont

Six Spanish legumes, Cytisus balansae, C. multiflorus, C. scoparius, C. striatus, Genista hystrix and Retama sphaerocarpa, were able to form effective nodules when grown in six south-western Australian soils. Soils and nodules were collected from beneath natural stands of six native Australian legumes, Jacksonia floribunda, Gompholobium tomentosum, Bossiaea aquifolium, Daviesia horrida, Gastrolobium spinosum and Templetonia retusa. Four combinations of soils and bacterial treatments were used as the soil treatments: sterile soil (S), sterile inoculated soils (SI), non-treated soil (N) and non-treated inoculated soils (NI). Seedlings of the Australian species were inoculated with rhizobia cultured from nodules of the same species, while seedlings of the Spanish species were inoculated with cultures from each of the Australian species. All Australian rhizobia infected all the Spanish species, suggesting a high degree of 'promiscuity' among the bacteria and plant species. The results from comparing six Spanish and six Australian species according to their biomass and total nitrogen in the presence (NI) or absence (S) of rhizobia showed that all species benefitted from nodulation (1.02–12.94 times), with R. sphaerocarpa and C. striatus benefiting more than the native species. Inoculation (SI and NI) was just as effective as, or more effective than the non-treated soil (i.e. non-sterile) in inducing nodules. Nodules formed on the Spanish legumes were just as efficient at fixing N2 as were those formed on the Australian legumes. Inoculation was less effective than non-treated soil at increasing biomass but just as effective as the soil at increasing nitrogen content. Promiscuity in the legume–bacteria symbiosis should increase the ability of legumes to spread into new habitats throughout the world.


PEDIATRICS ◽  
1996 ◽  
Vol 97 (2) ◽  
pp. 179-184
Author(s):  
Bahman Joorabchi ◽  
Jeffrey M. Devries

Objective. To evaluate a 3-year experience with the Objective Structured Clinical Examinations (OSCEs) and to compare faculty expectations with resident performance. Design. Descriptive analysis of measures of resident performance. Setting. Community-based pediatric residency program in Michigan. Participants. One hundred twenty-six pediatric residents at all levels of training. Methods. The three examinations consisted of 36 to 42 5-minute stations, testing skills in physical examination, history, counseling, telephone management, and test interpretation. A committee of faculty and chief residents predetermined minimum pass levels for each resident level. Results were compared with other indices of resident performance. Results. There was evidence for content, construct, and concurrent validity, as well as a high degree of reliability. However, 40% to 96% of residents scored below the minimum pass levels for their levels. In each examination, third-year residents had the highest failure rates, yet they scored well on the American Board of Pediatrics in-training examination and on their monthly clinical evaluations. Furthermore, for residents at all levels, the scores reflecting application of data were significantly lower than those assessing data gathering. Conclusions. The gaps between expectations and performance, and between data gathering and application, have important implications for institutional educational philosophy, suggesting a shift toward more clinically oriented and learner-directed strategies in the design of instructional and evaluation methods.

