Device Placement Optimization for Deep Neural Networks via One-shot Model and Reinforcement Learning

Author(s):  
Zixiang Ding ◽  
Yaran Chen ◽  
Nannan Li ◽  
Dongbin Zhao
2021 ◽  
Vol 2 (1) ◽  
pp. 1-25


Author(s):  
Yongsen Ma ◽  
Sheheryar Arshad ◽  
Swetha Muniraju ◽  
Eric Torkildson ◽  
Enrico Rantala ◽  
...  

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of the CSI data. The state machine learns temporal dependency information from the history of classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. It achieves 97% average accuracy when the test devices and persons are not seen during training, and accuracies of 80% and 83% on two public datasets. The proposed design requires very little human effort for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
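As a rough illustration of the recognition algorithm described above, the following is a minimal sketch of a 2D CNN classifier for CSI windows in PyTorch. The input layout (antenna pairs × subcarriers × time steps), the number of activity classes, and all layer widths are assumptions for illustration, not the architecture found by the search agent.

```python
# Minimal sketch of a 2D-CNN recognition network for CSI windows.
# Input layout and layer sizes are illustrative placeholders.
import torch
import torch.nn as nn

class CSIRecognizer(nn.Module):
    def __init__(self, in_channels: int = 3, num_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, antenna pairs, subcarriers, time steps)
        h = self.features(x).flatten(1)
        return self.classifier(h)

# Example: a batch of 8 CSI windows, 3 antenna pairs, 30 subcarriers, 100 time steps.
logits = CSIRecognizer()(torch.randn(8, 3, 30, 100))
print(logits.shape)  # torch.Size([8, 6])
```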


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Tiago Pereira ◽  
Maryam Abbasi ◽  
Bernardete Ribeiro ◽  
Joel P. Arrais

In this work, we explore the potential of deep learning to streamline the process of identifying new potential drugs through the computational generation of molecules with interesting biological properties. Two deep neural networks compose our targeted generation framework: the Generator, which is trained to learn the building rules of valid molecules using the SMILES string notation, and the Predictor, which evaluates the newly generated compounds by predicting their affinity for the desired target. The Generator is then optimized through Reinforcement Learning to produce molecules with bespoke properties. The innovation of this approach is the exploratory strategy applied during the reinforcement training process, which seeks to add novelty to the generated compounds. This training strategy employs two Generators interchangeably to sample new SMILES: the initially trained model, which remains fixed, and a copy of it, which is updated during training to uncover the most promising molecules. The evolution of the reward assigned by the Predictor determines how often each one is employed to select the next token of the molecule. This strategy strikes a balance between acquiring more information about the chemical space and sampling new molecules using the experience gained so far. To demonstrate the effectiveness of the method, the Generator is trained to design molecules with an optimized partition coefficient and high inhibitory power against the Adenosine $A_{2A}$ and $\kappa$ opioid receptors. The results reveal that the model can effectively steer the newly generated molecules in the desired direction. More importantly, it was possible to find promising sets of unique and diverse molecules, which was the main purpose of the newly implemented strategy.
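A hedged sketch of the two-Generator sampling idea follows: a frozen pre-trained Generator and an updated copy take turns proposing the next SMILES token, with the choice probability driven by how the Predictor's reward has recently evolved. The names sample_token and mix_probability, and the threshold values, are illustrative assumptions, not the authors' implementation.

```python
# Sketch: alternate between a frozen and an updated Generator per token,
# weighted by the recent trend of the Predictor's reward.
import random

def mix_probability(reward_history, window=10):
    """Favour the updated Generator when the Predictor's recent rewards are improving."""
    if len(reward_history) < 2 * window:
        return 0.5                              # not enough history: use both evenly
    recent = sum(reward_history[-window:]) / window
    earlier = sum(reward_history[-2 * window:-window]) / window
    return 0.9 if recent > earlier else 0.1

def sample_smiles(frozen_gen, updated_gen, reward_history, max_len=80):
    """Build one SMILES string, choosing which Generator samples each token."""
    tokens = ["<start>"]
    p_updated = mix_probability(reward_history)
    while len(tokens) < max_len:
        generator = updated_gen if random.random() < p_updated else frozen_gen
        token = generator.sample_token(tokens)  # assumed interface: next-token sampler
        tokens.append(token)
        if token == "<end>":
            break
    return "".join(t for t in tokens if t not in ("<start>", "<end>"))
```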


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 195608-195621
Author(s):  
Aleksey Staroverov ◽  
Dmitry A. Yudin ◽  
Ilya Belkin ◽  
Vasily Adeshkin ◽  
Yaroslav K. Solomentsev ◽  
...  

Author(s):  
Yusuke Taguchi ◽  
Hideitsu Hino ◽  
Keisuke Kameyama

There are many situations in supervised learning where the acquisition of data is very expensive and sometimes constrained by a user's budget. One way to address this limitation is active learning. In this study, we focus on the fixed-budget regime and propose a novel algorithm for the pool-based active learning problem. The proposed method performs active learning with a pre-trained acquisition function so that maximum performance can be achieved when the number of samples that can be acquired is fixed. To implement this, the proposed method uses reinforcement learning based on deep neural networks as a pre-trained acquisition function tailored to the fixed-budget situation. Using the pre-trained deep Q-learning-based acquisition function, we can realize an active learner that selects samples for annotation from the pool of unlabeled samples while taking the fixed budget into account. The proposed method is experimentally shown to be comparable with or superior to existing active learning methods, suggesting the effectiveness of the proposed approach for fixed-budget active learning.
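The selection loop implied by this description could look roughly like the sketch below: a pre-trained Q-network scores every candidate in the pool, conditioned on the remaining budget, and the highest-scoring sample is sent for annotation until the budget is exhausted. The featurize and q_net.predict interfaces are assumptions made for illustration, not the paper's state encoding.

```python
# Sketch of fixed-budget pool-based selection with a pre-trained Q-network
# acting as the acquisition function.
import numpy as np

def select_with_budget(q_net, featurize, pool, labeled, budget):
    """Greedily annotate `budget` samples, one per step, ranked by learned Q-values."""
    pool = list(pool)
    for step in range(budget):
        remaining = budget - step
        # One feature vector per unlabeled candidate, conditioned on the remaining budget.
        features = np.stack([featurize(x, labeled, remaining) for x in pool])
        q_values = q_net.predict(features)      # assumed: one acquisition score per row
        best = int(np.argmax(q_values))
        labeled.append(pool.pop(best))          # query the oracle for this sample's label
    return labeled
```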


Author(s):  
Nicholas D. Kullman ◽  
Martin Cousineau ◽  
Justin C. Goodson ◽  
Jorge E. Mendoza

We consider the problem of an operator controlling a fleet of electric vehicles for use in a ride-hailing service. The operator, seeking to maximize profit, must assign vehicles to requests as they arise as well as recharge and reposition vehicles in anticipation of future requests. To solve this problem, we employ deep reinforcement learning, developing policies whose decision making uses value function approximations learned by deep neural networks. We compare these policies against a reoptimization-based policy and against dual bounds on the value of an optimal policy, including the value of an optimal policy with perfect information, which we establish using a Benders-based decomposition. We assess performance on instances derived from real data for the island of Manhattan in New York City. We find that, across instances of varying size, our best policy trained with deep reinforcement learning outperforms the reoptimization approach. We also provide evidence that this policy may be effectively scaled and deployed on larger instances without retraining.
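A minimal sketch of value-based dispatch in this spirit is shown below: each feasible (vehicle, action) pair for an incoming request is scored with a learned value approximation, and the fleet acts greedily on the best score. The encode and value_net.predict interfaces and the battery-feasibility check are illustrative assumptions, not the paper's implementation.

```python
# Sketch: greedy dispatch using a neural value approximation to score
# serve / recharge / reposition decisions for each vehicle.
import numpy as np

def dispatch(value_net, encode, vehicles, request):
    """Return the (vehicle, action) pair with the highest approximated value."""
    best, best_value = None, -np.inf
    for vehicle in vehicles:
        for action in ("serve", "recharge", "reposition"):
            # Skip assignments the vehicle cannot complete on its current charge
            # (battery and energy_needed are assumed vehicle attributes).
            if action == "serve" and vehicle.battery < vehicle.energy_needed(request):
                continue
            value = value_net.predict(encode(vehicle, request, action))  # assumed interface
            if value > best_value:
                best, best_value = (vehicle, action), value
    return best
```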

