Neon NTT: Faster Dilithium, Kyber, and Saber on Cortex-A72 and Apple M1

We present new speed records on the Armv8-A architecture for the lattice-based schemes Dilithium, Kyber, and Saber. The core novelty in this paper is the combination of Montgomery multiplication and Barrett reduction, resulting in “Barrett multiplication”, which allows particularly efficient modular one-known-factor multiplication using the Armv8-A Neon vector instructions. These novel techniques, combined with fast two-unknown-factor Montgomery multiplication, Barrett reduction sequences, and interleaved multi-stage butterflies, result in significantly faster code. We also introduce “asymmetric multiplication”, an improved technique for caching the results of the incomplete NTT, used e.g. for matrix-to-vector polynomial multiplication. Our implementations target the Arm Cortex-A72 CPU, on which our speed is 1.7× that of the state-of-the-art matrix-to-vector polynomial multiplication in kyber768 [Nguyen–Gaj 2021]. For Saber, NTTs are far superior to Toom–Cook multiplication on the Armv8-A architecture, outrunning the Toom–Cook-based matrix-to-vector polynomial multiplication by 2.0×. On the Apple M1, our matrix-vector products run 2.1× and 1.9× faster for Kyber and Saber, respectively.
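The idea behind one-known-factor Barrett multiplication can be illustrated with a scalar model. The following is a minimal Python sketch, not the authors' Neon code: it assumes Kyber's modulus q = 3329, a 16-bit lane width modeled as R = 2^16, and round-half-up rounding, and the names `precompute` and `barrett_mul` are hypothetical. Because b is known ahead of time, b' = round(b·R/q) is computed once; each multiplication then needs only a high-half product a·b' to estimate the quotient t, plus the difference a·b − t·q.

```python
# Scalar model of Barrett multiplication by a known constant b modulo q.
# Assumptions (not the paper's code): Kyber's q = 3329, a 16-bit lane width
# modeled as R = 2**16, and round-half-up rounding of the high product.

Q = 3329      # Kyber modulus
R = 1 << 16   # assumed lane width: 2**16


def precompute(b: int, q: int = Q, r: int = R) -> int:
    """Return b' = round(b * R / q); computed once per known factor b."""
    return (b * r + q // 2) // q


def barrett_mul(a: int, b: int, b_prime: int, q: int = Q, r: int = R) -> int:
    """Return a value congruent to a*b mod q; |result| <= 3q/4 when |a| <= R/2."""
    # High-half product: t approximates round(a*b/q) without any division.
    t = (a * b_prime + r // 2) // r
    # The exact result is small, so on hardware a*b and t*q could both be
    # computed modulo 2**16 (low halves) and subtracted without loss.
    return a * b - t * q


if __name__ == "__main__":
    b = 17                # hypothetical twiddle factor
    b_prime = precompute(b)
    print(barrett_mul(1234, b, b_prime), (1234 * b) % Q)  # prints: 1004 1004
```

For |a| ≤ R/2 the estimate t differs from a·b/q by at most 3/4, so the result lies within ±3q/4; a final conditional correction would still be needed if a canonical representative in [0, q) is required.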

The Structure and Development of Microbodies in the Fat Body and other Tissues of an Insect (Calpodes ethlius)

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s042482010006773x ◽

1970 ◽

Vol 28 ◽

pp. 148-149

Author(s):

M. Locke ◽

J. T. McMahon

Keyword(s):

Electron Microscopy ◽

Liquid Crystals ◽

Fat Body ◽

Competitive Inhibitor ◽

Dense Core ◽

Urate Oxidase ◽

Blood Proteins ◽

Free Base ◽

The Core ◽

The Matrix

The fat body of insects has long been compared functionally to the liver of vertebrates. Both synthesize and store glycogen and lipid and are concerned with the formation of blood proteins. The comparison becomes even more apt with the discovery of microbodies and the localization of urate oxidase and catalase in insect fat body. The microbodies are oval to spherical bodies about 1 μm across with a depression and dense core on one side. The core is made of coiled tubules together with dense material close to the depressed membrane. The tubules may appear loose or densely packed but are always intertwined like liquid crystals, never straight as in solid crystals (Fig. 1). When fat body is reacted with diaminobenzidine free base and H2O2 at pH 9.0 to determine the distribution of catalase, electron microscopy shows the enzyme in the matrix of the microbodies (Fig. 2). The reaction is abolished by 3-amino-1,2,4-triazole, a competitive inhibitor of catalase. The fat body is the only tissue that consistently reacts positively for urate oxidase. The reaction product is sharply localized in granules of about the same size and distribution as the microbodies. The reaction is inhibited by 2,6,8-trichloropurine, a competitive inhibitor of urate oxidase.

Efficient Three-Way Split Formulas for Binary Polynomial Multiplication and Toeplitz Matrix Vector Product

IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences ◽

10.1587/transfun.e101.a.239 ◽

2018 ◽

Vol E101.A (1) ◽

pp. 239-248

Author(s):

Sun-Mi PARK ◽

Ku-Young CHANG ◽

Dowon HONG ◽

Changho SEO

Keyword(s):

Toeplitz Matrix ◽

Vector Product ◽

Polynomial Multiplication ◽

Matrix Vector

Selecting optimal SpMV realizations for GPUs via machine learning

The International Journal of High Performance Computing Applications ◽

10.1177/1094342021990738 ◽

2021 ◽

pp. 109434202199073

Author(s):

Ernesto Dufrechou ◽

Pablo Ezzatti ◽

Enrique S Quintana-Ortí

Keyword(s):

Machine Learning ◽

Sparse Matrix ◽

Machine Learning Techniques ◽

Optimal Method ◽

Learning Techniques ◽

General Rules ◽

Machine Learning Approach ◽

The Matrix ◽

Time And Energy ◽

Matrix Vector

More than 10 years of research on efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent publicly available routines on more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods behave so differently depending on the matrix structure that identifying general rules to select the optimal method for a given matrix becomes extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
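For readers unfamiliar with SpMV, the kernel that all of these realizations implement is y = A·x for a sparse matrix A. Below is a minimal sequential Python sketch using the compressed sparse row (CSR) format, one widely used storage format; it is an illustration under that assumption, not one of the GPU routines evaluated in the paper, and the helper names `dense_to_csr` and `spmv_csr` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(a: Sequence[Sequence[float]]) -> Tuple[List[int], List[int], List[float]]:
    """Convert a dense matrix (list of rows) to CSR arrays (row_ptr, col_idx, values)."""
    row_ptr: List[int] = [0]
    col_idx: List[int] = []
    values: List[float] = []
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))  # end offset of this row's nonzeros
    return row_ptr, col_idx, values


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for A stored in CSR form."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[4, 0, 1],
         [0, 0, 2],
         [3, 5, 0]]
    print(spmv_csr(*dense_to_csr(A), [1, 2, 3]))  # prints: [7.0, 6.0, 13.0]
```

GPU realizations typically differ in how they map this row loop onto threads (for example, one thread or one warp per row) and in the storage format itself, which is one reason their relative speed depends on the sparsity structure of each matrix.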

Microwave Assisted Multi-Stage Countercurrent Extraction of Dihydromyricetin from Ampelopsis grossedentata

International Journal of Food Engineering ◽

10.2202/1556-3758.2137 ◽

2011 ◽

Vol 7 (4) ◽

Cited By ~ 3

Author(s):

Wei Li ◽

Cheng Zheng ◽

Jian Zhao ◽

Zhengxiang Ning

Keyword(s):

Extraction Efficiency ◽

Extraction Process ◽

Extraction Time ◽

Countercurrent Extraction ◽

Bioactive Substances ◽

Extraction Solvent ◽

Microwave Assisted ◽

Multi Stage ◽

The Matrix ◽

Substantial Concentration

A novel microwave assisted multi-stage countercurrent extraction (MAMCE) technique was developed for the extraction of dihydromyricetin from Chinese rattan tea, Ampelopsis grossedentata. The technique combined the advantages of microwave heating and dynamic multi-stage countercurrent extraction and achieved a marked improvement in extraction efficiency over microwave assisted batch extraction. Analysis of dihydromyricetin concentrations in the solvent and matrix throughout the extraction process showed that dividing the extraction into multiple stages and exchanging solvents between stages established steady and substantial concentration gradients between the matrix and solvent, enabling high extraction efficiency. The yield of dihydromyricetin was significantly affected by temperature, pH, solvent/material ratio, and extraction time; the optimal extraction conditions were 80-100°C, acidic pH, a solvent/material ratio of 25-30 to 1, and an extraction time of 5-10 min. With its high extraction efficiency and low solvent usage, MAMCE could prove to be a promising technique for extracting dihydromyricetin and other bioactive substances from natural materials.

Representing Deep Neural Networks Latent Space Geometries with Graphs

Algorithms ◽

10.3390/a14020039 ◽

2021 ◽

Vol 14 (2) ◽

pp. 39

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Objective Function ◽

Learning Process ◽

Deep Neural Networks ◽

State Of The Art ◽

The Core ◽

Learning Tasks ◽

Latent Space

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are usually unconstrained during the learning process, as it is unclear which properties should be favored. However, when a batch of inputs is processed concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to input deviations is achieved by enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
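Building a similarity graph over one batch of intermediate representations can be sketched as follows. This is a minimal Python illustration under stated assumptions: cosine similarity between flattened representations, k-nearest-neighbor sparsification, and a sum-of-squared-differences loss between two graphs as a stand-in for "mimicking" a teacher's geometry. The abstract does not specify these choices, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def latent_geometry_graph(batch: Sequence[Sequence[float]], k: int) -> Matrix:
    """Symmetric k-NN similarity graph over one batch of representations (assumed construction)."""
    n = len(batch)
    sim = [[cosine(batch[i], batch[j]) for j in range(n)] for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Keep only the k most similar other inputs (no self-loops).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        for j in others[:k]:
            adj[i][j] = adj[j][i] = sim[i][j]  # symmetrize the graph
    return adj


def geometry_loss(g_student: Matrix, g_teacher: Matrix) -> float:
    """Sum of squared differences between two same-sized adjacency matrices (hypothetical loss)."""
    return sum((a - b) ** 2
               for rs, rt in zip(g_student, g_teacher)
               for a, b in zip(rs, rt))


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    for row in latent_geometry_graph(batch, k=1):
        print([round(w, 3) for w in row])
```

In this toy batch, the first two inputs end up connected to each other, as do the last two, with no edges across the groups; a distillation-style use would compute such a graph at matching layers of a student and a teacher and penalize their difference.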

Tactile Occupant Detection Sensor for Automotive Airbag

Energies ◽

10.3390/en14175288 ◽

2021 ◽

Vol 14 (17) ◽

pp. 5288

Author(s):

Naveen Shirur ◽

Christian Birkner ◽

Roman Henze ◽

Thomas M. Deserno

Keyword(s):

High Speed ◽

Tactile Sensors ◽

Low Velocity ◽

Speed Test ◽

Multi Stage ◽

The Matrix ◽

Contact Data ◽

Single Sensor ◽

Matrix Sensor ◽

Contact Position

Automotive airbags protect occupants from crash forces during severe vehicle collisions. They absorb energy and restrain the occupants by providing a soft cushion effect known as the restraint effect. Modern airbags offer partial restraint effect control by controlling the bag’s vent holes and providing multi-stage deployment. Full restraint effect control is still a challenge because the closed-loop restraint control system needs airbag–occupant contact and interaction feedback. In this work, we have developed novel single and matrix capacitive tactile sensors to measure the occupant’s contact data. They can be integrated with the airbag surface and folded to follow the dynamic airbag shape during the deployment. The sensors are tested under a low-velocity pendulum impact and benchmarked with high-speed test videos. The results reveal that the single sensor can successfully measure occupant–airbag contact time and estimate the area, while the contact position is additionally identified from the matrix sensor.

Random Fourier Features via Fast Surrogate Leverage Weighted Sampling

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5920 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4844-4851

Author(s):

Fanghui Liu ◽

Xiaolin Huang ◽

Yudong Chen ◽

Jie Yang ◽

Johan Suykens

Keyword(s):

State Of The Art ◽

Sampling Strategy ◽

Prediction Performance ◽

Current State ◽

Kernel Approximation ◽

Sampling Process ◽

New Strategy ◽

The Matrix ◽

Benchmark Datasets ◽

Random Fourier Features

In this paper, we propose a fast surrogate leverage weighted sampling strategy to generate refined random Fourier features for kernel approximation. Compared to the current state-of-the-art method that uses the leverage weighted scheme (Li et al. 2019), our new strategy is simpler and more effective. It uses kernel alignment to guide the sampling process, and it avoids matrix inversion when computing the leverage function. Given n observations and s random features, our strategy reduces the time complexity of sampling from O(ns² + s³) to O(ns²), while achieving comparable (or even slightly better) prediction performance when applied to kernel ridge regression (KRR). In addition, we provide theoretical guarantees on the generalization performance of our approach, and in particular characterize the number of random features required to achieve statistical guarantees in KRR. Experiments on several benchmark datasets demonstrate that our algorithm achieves comparable prediction performance in less time than (Li et al. 2019).
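For context, plain random Fourier features (the Monte Carlo baseline that leverage-weighted schemes aim to improve on) can be sketched as below. This is not the paper's surrogate leverage weighted sampling: frequencies here are drawn directly from the Gaussian kernel's spectral density N(0, σ⁻²I), and the function names are hypothetical. The inner product of two feature vectors approximates the kernel value, with error shrinking as the number of features s grows.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_gaussian_rff(dim: int, s: int, sigma: float,
                        rng: random.Random) -> Tuple[List[List[float]], List[float]]:
    """Draw s frequencies w ~ N(0, sigma^-2 I) and phases b ~ U[0, 2*pi]."""
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(s)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(s)]
    return omegas, phases


def rff_features(x: Sequence[float], omegas: List[List[float]],
                 phases: List[float]) -> List[float]:
    """Map x to z(x) with entries sqrt(2/s) * cos(w_i . x + b_i)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(omegas, phases)]


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Exact Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = [0.3, -0.2], [0.1, 0.4]
    omegas, phases = sample_gaussian_rff(dim=2, s=5000, sigma=1.0, rng=rng)
    zx, zy = rff_features(x, omegas, phases), rff_features(y, omegas, phases)
    approx = sum(a * b for a, b in zip(zx, zy))
    print(approx, gaussian_kernel(x, y, 1.0))  # the two values should be close
```

Here every frequency is drawn with equal weight; leverage-weighted schemes instead reweight the sampling distribution, and it is that sampling step whose cost the abstract reports reducing.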

Optimization Learning: Perspective, Method, and Applications

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/728 ◽

2020 ◽

Author(s):

Risheng Liu

Keyword(s):

Inverse Problems ◽

Theoretical Analysis ◽

Iterative Methods ◽

State Of The Art ◽

Learning Paradigm ◽

Rigorous Analysis ◽

The Core ◽

Globally Convergent ◽

Ill Posed ◽

Art Performance

Numerous tasks at the core of statistics, learning, and vision are specific cases of ill-posed inverse problems. Recently, learning-based (e.g., deep) iterative methods have been shown empirically to be useful for these problems. Nevertheless, integrating learnable structures into iterations is still a laborious process that can only be guided by intuition or empirical insight. Moreover, there is a lack of rigorous analysis of the convergence behavior of these reimplemented iterations, so the significance of such methods remains unclear. We move beyond these limits and propose an optimization learning paradigm, a generic paradigm with provable guarantees for nonconvex inverse problems, and develop a series of convergent deep models. Our theoretical analysis reveals that the proposed optimization learning paradigm allows us to generate globally convergent trajectories for learning-based iterative methods. Building on this framework, we achieve state-of-the-art performance on different real applications.

A Review of Hydraulic Fracturing Simulation

Archives of Computational Methods in Engineering ◽

10.1007/s11831-021-09653-z ◽

2021 ◽

Author(s):

Bin Chen ◽

Beatriz Ramos Barboza ◽

Yanan Sun ◽

Jie Bai ◽

Hywel R Thomas ◽

...

Keyword(s):

Hydraulic Fracturing ◽

State Of The Art ◽

Numerical Models ◽

Gas Production ◽

Physical Processes ◽

Horizontal Drilling ◽

Improve Treatment ◽

Multi Stage ◽

Pros And Cons ◽

Treatment Designs

Along with horizontal drilling techniques, multi-stage hydraulic fracturing has significantly improved shale gas production in recent decades. To understand the mechanism of hydraulic fracturing and improve treatment designs, it is critical to conduct modelling to predict stimulated fractures. In this paper, the physical processes involved in hydraulic fracturing are first discussed and their effects on hydraulic fracturing processes are analysed. Then historical and state-of-the-art numerical models for hydraulic fracturing are reviewed to highlight the pros and cons of different numerical methods. Next, commercially available software for hydraulic fracturing design is discussed and its key features are summarised. Finally, we draw conclusions from the previous discussions in relation to physics, methods, and applications and provide recommendations for further research.

Asymmetric Distribution Measure for Few-shot Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/409 ◽

2020 ◽

Author(s):

Wenbin Li ◽

Lei Wang ◽

Jing Huo ◽

Yinghuan Shi ◽

Yang Gao ◽

...

Keyword(s):

State Of The Art ◽

Asymmetric Distribution ◽

Query Image ◽

Local Descriptor ◽

Innovative Design ◽

Feature Representations ◽

The Core ◽

Measure Function ◽

Asymmetric Relation ◽

Core Idea

The core idea of metric-based few-shot image classification is to directly measure the relations between query images and support classes to learn transferable feature embeddings. Previous work mainly focuses on image-level feature representations, which cannot effectively estimate a class's distribution due to the scarcity of samples. Some recent work shows that local descriptor based representations can be richer than image-level representations. However, such works still rely on a less effective instance-level metric, especially a symmetric metric, to measure the relation between a query image and a support class. Given the naturally asymmetric relation between a query image and a support class, we argue that an asymmetric measure is more suitable for metric-based few-shot learning. To that end, we propose a novel Asymmetric Distribution Measure (ADM) network for few-shot learning that calculates a joint local and global asymmetric measure between two multivariate local distributions of a query and a class. Moreover, a task-aware Contrastive Measure Strategy (CMS) is proposed to further enhance the measure function. On the popular miniImageNet and tieredImageNet benchmarks, ADM achieves state-of-the-art results, validating our innovative design of asymmetric distribution measures for few-shot learning. The source code can be downloaded from https://github.com/WenbinLee/ADM.git.
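To make "an asymmetric measure between two multivariate local distributions" concrete, the sketch below fits a diagonal Gaussian to each set of local descriptors and compares them with the Kullback–Leibler (KL) divergence, a standard asymmetric measure between distributions. This is an illustrative assumption: the abstract does not say exactly which measure ADM uses or how its distributions are parameterized, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, variance) per dimension


def fit_diag_gaussian(descriptors: Sequence[Sequence[float]], eps: float = 1e-6) -> Gaussian:
    """Estimate a diagonal Gaussian from local descriptors; eps keeps variances positive."""
    n, d = len(descriptors), len(descriptors[0])
    mean = [sum(v[j] for v in descriptors) / n for j in range(d)]
    var = [sum((v[j] - mean[j]) ** 2 for v in descriptors) / n + eps for j in range(d)]
    return mean, var


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) for diagonal Gaussians; KL(p || q) != KL(q || p) in general."""
    (m0, v0), (m1, v1) = p, q
    return 0.5 * sum(
        a / b + (mb - ma) ** 2 / b - 1.0 + math.log(b / a)
        for ma, a, mb, b in zip(m0, v0, m1, v1)
    )


if __name__ == "__main__":
    # Hypothetical 2-D local descriptors: a few from one query image,
    # more pooled from the support images of one class.
    query = fit_diag_gaussian([[0.0, 0.0], [1.0, 1.0]])
    support = fit_diag_gaussian([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
    print(kl_diag(query, support), kl_diag(support, query))  # two different values
```

The two printed directions differ, which is exactly the property a symmetric metric cannot express; which direction (query to class or class to query) to use is a modeling choice.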