Efficient Neural Architecture Search via Proximal Iterations

Neural architecture search (NAS) attracts much research attention because of its ability to identify better architectures than handcrafted ones. Recently, differentiable search methods have become the state of the art in NAS; they can obtain high-performance architectures in several days. However, they still suffer from huge computation costs and inferior performance due to the construction of the supernet. In this paper, we propose an efficient NAS method based on proximal iterations (denoted as NASP). Unlike previous works, NASP reformulates the search process as an optimization problem with a discrete constraint on architectures and a regularizer on model complexity. As the new objective is hard to solve, we further propose an efficient algorithm inspired by proximal iterations for optimization. In this way, NASP is not only much faster than existing differentiable search methods, but can also find better architectures and balance the model complexity. Finally, extensive experiments on various tasks demonstrate that NASP can obtain high-performance architectures with a more than 10-fold speedup over the state of the art.
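The idea of searching under a discrete constraint with proximal steps can be illustrated with a toy sketch. This is not the NASP algorithm itself: the abstract does not give its update rules, so the one-hot projection `prox_one_hot`, the box projection `prox_box`, the quadratic toy loss behind `toy_grad`, and all parameter values are assumptions made for illustration. Network weight training and the model-complexity regularizer are omitted.

```python
def prox_one_hot(a):
    """Project onto the discrete set: keep only the largest entry and set it to 1."""
    best = max(range(len(a)), key=lambda i: a[i])
    return [1.0 if i == best else 0.0 for i in range(len(a))]


def prox_box(a):
    """Project each entry onto the interval [0, 1]."""
    return [min(1.0, max(0.0, x)) for x in a]


def proximal_search(a, grad_fn, lr=0.1, steps=10):
    """Keep continuous parameters `a`, but evaluate gradients at a discrete point."""
    for _ in range(steps):
        a_bar = prox_one_hot(a)          # discrete architecture used for evaluation
        g = grad_fn(a_bar)               # gradient taken at the discrete point
        a = prox_box([x - lr * gx for x, gx in zip(a, g)])
    return prox_one_hot(a)


# Hypothetical toy loss 0.5 * ||a - TARGET||^2, whose best operation is index 1.
TARGET = [0.0, 1.0, 0.0]


def toy_grad(a_bar):
    return [x - t for x, t in zip(a_bar, TARGET)]


chosen = proximal_search([0.4, 0.3, 0.3], toy_grad)
print(chosen)
```

In this toy setting only one discrete candidate is evaluated per step, which is the kind of saving over evaluating a full mixture of operations that a discrete constraint makes possible.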


The State of the Arts, 1999-2000: School Year 1999-00

PsycEXTRA Dataset ◽ 10.1037/e573782006-001 ◽ 2001

Keyword(s): The State ◽ School Year ◽ The Arts


About Measures of the State Support for Technical Equipment of Agricultural Producers

Economy of agricultural and processing enterprises ◽ 10.31442/0235-2494-2020-0-9-20-27 ◽ 2020 ◽ pp. 20-27

Author(s): A.Ya. Kibirov

Keyword(s): Statistical Analysis ◽ Energy Saving ◽ High Performance ◽ The State ◽ Agricultural Products ◽ Technical Equipment ◽ Agricultural Machinery ◽ State Support ◽ Agricultural Producers ◽ Technical Base

The article uses methods of statistical analysis, deduction and analogy to consider programs at the federal, regional and economic levels that provide for measures aimed at improving the technical equipment of agricultural producers. Particular attention is paid to the acquisition of energy-saving, high-performance agricultural machinery and equipment used in the production and processing of agricultural products. The effectiveness of state support for updating the material and technical base of agriculture is assessed. Based on the results of the study, conclusions and recommendations are formulated.


A Report to the Minister for Communications, the Information Economy and the Arts on the State of Competition in Australian Telecommunications Services One Year After Deregulation

SSRN Electronic Journal ◽ 10.2139/ssrn.972477 ◽ 2007 ◽ Cited By ~ 1

Author(s): Gregory Sidak

Keyword(s): The State ◽ Information Economy ◽ The Arts ◽ Telecommunications Services ◽ One Year


Location- and Person-Independent Activity Recognition with WiFi, Deep Neural Networks, and Reinforcement Learning

ACM Transactions on Internet of Things ◽ 10.1145/3424739 ◽ 2021 ◽ Vol 2 (1) ◽ pp. 1-25

Author(s): Yongsen Ma ◽ Sheheryar Arshad ◽ Swetha Muniraju ◽ Eric Torkildson ◽ Enrico Rantala ◽ ...

Keyword(s): Neural Network ◽ Neural Networks ◽ Reinforcement Learning ◽ Activity Recognition ◽ Deep Neural Networks ◽ State Machine ◽ Recognition Algorithm ◽ The State ◽ Neural Architecture ◽ Learning Agent

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from the history of classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations and orientations, and multiple persons. It achieves 97% average accuracy when the test devices and persons are not seen during training. It is also evaluated on two public datasets, with accuracies of 80% and 83%. The proposed design needs very little human effort for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
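The role of the state machine, which uses the history of classification results to stabilise the current decision, can be sketched with a single 1D convolution over per-frame class probabilities. This is only an illustration under strong assumptions: the paper's state machine is a trained 1D CNN whose layers and weights are not given in the abstract, whereas the sketch uses a fixed averaging kernel, and the names `smooth_predictions` and `history` are hypothetical.

```python
def smooth_predictions(history, kernel):
    """Slide `kernel` over the time axis of `history` and return one label per window.

    history: list of per-frame class-probability lists (time x classes).
    kernel:  list of weights applied to consecutive frames (stand-in for learned weights).
    """
    k = len(kernel)
    num_classes = len(history[0])
    labels = []
    for start in range(len(history) - k + 1):
        window = history[start:start + k]
        # Weighted sum over time, computed separately for each class.
        scores = [
            sum(w * frame[c] for w, frame in zip(kernel, window))
            for c in range(num_classes)
        ]
        labels.append(max(range(num_classes), key=lambda c: scores[c]))
    return labels


# Five frames, two classes; the third frame is an isolated misclassification.
history = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.9, 0.1], [0.85, 0.15]]
raw = [max(range(2), key=lambda c: frame[c]) for frame in history]
smoothed = smooth_predictions(history, kernel=[1 / 3, 1 / 3, 1 / 3])
print(raw, smoothed)
```

The isolated flip in the raw labels disappears once neighbouring frames contribute to each decision, which is the kind of temporal dependency the abstract attributes to the state machine.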


An Adaptive Throughput-First Packet Scheduling Algorithm for DPDK-Based Packet Processing Systems

Future Internet ◽ 10.3390/fi13030078 ◽ 2021 ◽ Vol 13 (3) ◽ pp. 78

Author(s): Chuanhong Li ◽ Lei Song ◽ Xuewen Zeng

Keyword(s): Packet Loss ◽ High Performance ◽ Packet Scheduling ◽ Scheduling Algorithm ◽ Processing System ◽ System Throughput ◽ Packet Processing ◽ Research Attention ◽ Continuous Increase ◽ Packet Scheduling Algorithm

The continuous increase in network traffic has sharply increased the demand for high-performance packet processing systems. For a high-performance packet processing system based on multi-core processors, the packet scheduling algorithm is critical because of the significant role it plays in load distribution, which is related to system throughput, and it has therefore attracted intensive research attention. However, it is not an easy task, since the canonical flow-level packet scheduling algorithm is vulnerable to traffic locality, while the packet-level packet scheduling algorithm fails to maintain cache affinity. In this paper, we propose an adaptive throughput-first packet scheduling algorithm for DPDK-based packet processing systems. Building on DPDK's burst-oriented packet receiving and transmitting, we propose using Subflow as both the scheduling unit and the adjustment unit, so that the proposed algorithm not only maintains the advantages of flow-level packet scheduling algorithms when no adjustment happens but also avoids packet loss as much as possible when the target core may be overloaded. Experimental results show that the proposed method outperforms Round-Robin, HRW (High Random Weight), and CRC32 on system throughput and packet loss rate.
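The trade-off between flow-level and packet-level scheduling described above can be sketched with two of the baselines the abstract names, CRC32 hashing and Round-Robin. This is not the proposed Subflow algorithm, whose details the abstract does not give, and it does not use DPDK; the five-tuple layout and the names `flow_level_core` and `RoundRobin` are assumptions for illustration.

```python
import zlib


def flow_level_core(five_tuple, num_cores):
    """Flow-level scheduling: hash the flow identifier so a flow always hits one core."""
    key = "|".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % num_cores


class RoundRobin:
    """Packet-level scheduling: spread packets evenly, ignoring which flow they belong to."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self._next = 0

    def pick(self):
        core = self._next
        self._next = (self._next + 1) % self.num_cores
        return core


flow = ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")  # hypothetical five-tuple

# Every packet of the flow lands on the same core: good cache affinity,
# but one heavy flow can overload that core (traffic locality).
flow_cores = {flow_level_core(flow, 4) for _ in range(8)}

# Packets of the same flow are spread across all cores: balanced load,
# but cache affinity for the flow is lost.
rr = RoundRobin(4)
rr_cores = [rr.pick() for _ in range(8)]
print(flow_cores, rr_cores)
```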


Application of High Performance Parallel Computing Based on GPU

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.411-414.585 ◽ 2013 ◽ Vol 411-414 ◽ pp. 585-588

Author(s): Liu Yang ◽ Tie Ying Liu

Keyword(s): Particle Swarm Optimization ◽ Parallel Computing ◽ Parallel Computation ◽ High Performance ◽ Search Process ◽ Search Rate ◽ Swarm Optimization ◽ Path Search ◽ Parallel Feature ◽ Time And Space Complexity

This paper introduces the parallel features of the GPU and uses GPU parallel computation to parallelize the path search process of Particle Swarm Optimization (PSO), reducing the growing time and space complexity of PSO. The experimental results show that, compared with CPU mode, computation on the GPU platform improves the search rate and shortens the calculation time.
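For reference, a minimal sequential PSO is sketched below. It is a CPU-only, standard-library sketch and not the paper's GPU implementation or its path-search objective; the sphere objective, the inertia and acceleration coefficients, and the function name `pso` are assumptions chosen for illustration. The per-particle loop body is independent across particles within an iteration (apart from the shared global best), which is the part one would typically map to parallel GPU threads.

```python
import random


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective, dim=2, particles=30, iters=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm

    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best, best_val = pso(sphere)
print(best, best_val)
```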


Thermodynamic analyses and optimization for thermoelectric devices: The state of the arts

Science China Technological Sciences ◽ 10.1007/s11431-015-5970-5 ◽ 2016 ◽ Vol 59 (3) ◽ pp. 442-455 ◽ Cited By ~ 87

Author(s): LinGen Chen ◽ FanKai Meng ◽ FengRui Sun

Keyword(s): The State ◽ Thermoelectric Devices ◽ The Arts


A NEW LAYOUT DESIGN SYSTEM FOR MULTICHIP MODULES

International Journal of High Speed Electronics and Systems ◽ 10.1142/s0129156495000171 ◽ 1995 ◽ Vol 06 (03) ◽ pp. 509-538 ◽ Cited By ~ 1

Author(s): BERNHARD M. RIESS ◽ ANDREAS A. SCHOENE

Keyword(s): Efficient Algorithm ◽ State Of The Art ◽ Layout Design ◽ The State ◽ Design System ◽ Multichip Modules ◽ Analytical Technique ◽ Cut Method ◽ First Time ◽ Three Components

A new layout design system for multichip modules (MCMs) consisting of three components is described. It includes a k-way partitioning approach, an algorithm for pin assignment, and a placement package. For partitioning, we propose an analytical technique combined with a problem-specific multi-way ratio cut method. This method considers fixed module-level pad positions and assigns the cells to regularly arranged chips on the MCM substrate. In the subsequent pin assignment step the chip-level pads resulting from cut nets are positioned on the chip borders. Pin assignment is performed by an efficient algorithm, which profits from the cell coordinates generated by the analytical technique. Global and final placement for each chip is computed by the state-of-the-art placement tools GORDIANL and DOMINO. For the first time, results for MCM layout designs of benchmark circuits with up to 100,000 cells are presented. They show a small number of required chip-level pads, which is the most restricted resource in MCM design, and short total wire lengths.
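The ratio cut objective mentioned for the partitioning step can be illustrated in its classic two-way form, where the number of cut nets is divided by the product of the partition sizes. The paper's multi-way, problem-specific variant is not specified in the abstract, so the function `ratio_cut`, the cell names, and the tiny netlist below are assumptions for illustration only.

```python
def ratio_cut(nets, part_a, part_b):
    """Two-way ratio cut: (number of nets spanning both parts) / (|A| * |B|)."""
    a, b = set(part_a), set(part_b)
    cut = sum(1 for net in nets if a & set(net) and b & set(net))
    return cut / (len(a) * len(b))


# Hypothetical netlist: each net is the set of cells it connects.
nets = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c1", "c2", "c4")]

balanced = ratio_cut(nets, {"c1", "c2"}, {"c3", "c4"})    # 2 cut nets / (2 * 2)
lopsided = ratio_cut(nets, {"c1"}, {"c2", "c3", "c4"})    # 2 cut nets / (1 * 3)
print(balanced, lopsided)
```

Both partitions cut two nets, but the balanced one scores lower because the denominator rewards evenly sized parts. Fewer cut nets also mean fewer chip-level pads, the resource the abstract identifies as most restricted.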


Advancing the state of the art in high-performance logic and array technology

IBM Journal of Research and Development ◽ 10.1147/rd.365.0821 ◽ 1992 ◽ Vol 36 (5) ◽ pp. 821-828 ◽ Cited By ~ 9

Author(s): K. H. Brown ◽ D. A. Grose ◽ R. C. Lange ◽ T. H. Ning ◽ P. A. Totta

Keyword(s): High Performance ◽ State Of The Art ◽ The State ◽ Array Technology


BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization

ACM Transactions on Reconfigurable Technology and Systems ◽ 10.1145/3467476 ◽ 2021 ◽ Vol 14 (4) ◽ pp. 1-28

Author(s): Tao Yang ◽ Zhezhi He ◽ Tengchuan Kou ◽ Qingzheng Li ◽ Qi Han ◽ ...

Keyword(s): High Performance ◽ State Of The Art ◽ The State ◽ Optimization Approach ◽ Quantization Scheme ◽ Model Accuracy ◽ Sparsity Pattern ◽ Computing Platform ◽ Energy Efficiency Improvement ◽ Mixed Precision

The Field-programmable Gate Array (FPGA) is a high-performance computing platform for Convolutional Neural Network (CNN) inference. The Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain; however, this results in irregular sparse patterns, leading to low parallelism and reduced utilization of resources. Besides, few works discuss a suitable quantization scheme for Winograd. In this article, we propose a regular sparse pruning pattern for Winograd-based CNNs, namely the Sub-row-balanced Sparsity (SRBS) pattern, to overcome the challenge of the irregular sparse pattern. Then, we develop a two-step hardware co-optimization approach to improve the model accuracy using the SRBS pattern. Based on the pruned model, we implement a mixed precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that takes advantage of both the SRBS pattern, to eliminate low-parallelism computation and irregular memory accesses, and the mixed precision quantization, to obtain a layer-wise bit width. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup, and 12.74×/9.19× and 8.75×/8.81×/11.1× energy efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator [20], with negligible loss of model accuracy. We also show that our design has a 4.11× speedup compared with the state-of-the-art sparse Winograd accelerator [19] on VGG16.
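As background for why the Winograd algorithm reduces arithmetic overhead, the smallest 1D case, F(2,3), produces two outputs of a 3-tap filter with four multiplications instead of six. The sketch below shows only this standard minimal-filtering identity in plain Python; it does not implement the SRBS pruning pattern, the mixed precision quantization, or any FPGA mapping, and the function names are chosen for illustration.

```python
def direct_fir(d, g):
    """Two outputs of a 3-tap filter computed directly: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2,3): 4 multiplications."""
    # Filter transform; it depends only on g, so it can be precomputed once per filter.
    u = [g[0],
         (g[0] + g[1] + g[2]) / 2,
         (g[0] - g[1] + g[2]) / 2,
         g[2]]
    # Input transform: additions and subtractions only.
    v = [d[0] - d[2],
         d[1] + d[2],
         d[2] - d[1],
         d[1] - d[3]]
    # Element-wise products in the Winograd domain (the 4 multiplications).
    m = [vi * ui for vi, ui in zip(v, u)]
    # Output transform: additions and subtractions only.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


d = [1, 2, 3, 4]   # four consecutive input samples
g = [1, 2, 3]      # 3-tap filter
print(direct_fir(d, g), winograd_f23(d, g))
```

A zero in the transformed filter `u` lets the matching element-wise product be skipped, which is why pruning in the Winograd domain saves work and why the position of the zeros affects how regularly the hardware can exploit them.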
