Dynamic selective auditory attention detection using RNN and reinforcement learning

AbstractThe cocktail party phenomenon describes the ability of the human brain to focus auditory attention on a particular stimulus while ignoring other acoustic events. Selective auditory attention detection (SAAD) is an important issue in the development of brain-computer interface systems and cocktail party processors. This paper proposes a new dynamic attention detection system to process the temporal evolution of the input signal. The proposed dynamic SAAD is modeled as a sequential decision-making problem, which is solved by recurrent neural network (RNN) and reinforcement learning methods of Q-learning and deep Q-learning. Among different dynamic learning approaches, the evaluation results show that the deep Q-learning approach with RNN as agent provides the highest classification accuracy (94.2%) with the least detection delay. The proposed SAAD system is advantageous, in the sense that the detection of attention is performed dynamically for the sequential inputs. Also, the system has the potential to be used in scenarios, where the attention of the listener might be switched in time in the presence of various acoustic events.

Download Full-text

Selective auditory attention detection using dynamic learning systems: The study of RNN and reinforcement learning

10.1101/2021.02.18.431748 ◽

2021 ◽

Author(s):

Masoud Geravanchizadeh ◽

Hossein Roushan

Keyword(s):

Reinforcement Learning ◽

Detection System ◽

Auditory Attention ◽

Final Decision ◽

Learning Approaches ◽

Cocktail Party ◽

Dynamic Learning ◽

Learning Stage ◽

Q Learning ◽

Markov Decision

AbstractThe cocktail party phenomenon describes the ability of the human brain to focus auditory attention on a particular stimulus while ignoring other acoustic events. Selective auditory attention detection (SAAD) is an important issue in the development of brain-computer interface systems and cocktail party processors. This paper proposes a new dynamic attention detection system to process the temporal evolution of the input signal. In the proposed dynamic system, after preprocessing of the input signals, the probabilistic state space of the system is formed. Then, in the learning stage, different dynamic learning methods, including recurrent neural network (RNN) and reinforcement learning (Markov decision process (MDP) and deep Q-learning) are applied to make the final decision as to the attended speech. Among different dynamic learning approaches, the evaluation results show that the deep Q-learning approach (MDP+RNN) provides the highest classification accuracy (94.2%) with the least detection delay. The proposed SAAD system is advantageous, in the sense that the detection of attention is performed dynamically for the sequential inputs. Also, the system has the potential to be used in scenarios, where the attention of the listener might be switched in time in the presence of various acoustic events.

Download Full-text

Selective network discovery via deep reinforcement learning on embedded spaces

Applied Network Science ◽

10.1007/s41109-021-00365-8 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Peter Morales ◽

Rajmonda Sulo Caceres ◽

Tina Eliassi-Rad

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Sequential Decision ◽

Network Discovery ◽

Learning Tasks ◽

Partially Observed ◽

Decision Making Problem ◽

Resource Collection ◽

Improved Performance ◽

Discovery Algorithms

AbstractComplex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.

Download Full-text

Deep Reinforcement Learning for Energy Microgrids Management Considering Flexible Energy Sources

EPJ Web of Conferences ◽

10.1051/epjconf/201921701016 ◽

2019 ◽

Vol 217 ◽

pp. 01016 ◽

Cited By ~ 2

Author(s):

Nikita Tomin ◽

Alexey Zhukov ◽

Alexander Domyshev

Keyword(s):

Reinforcement Learning ◽

Electricity Consumption ◽

Energy Sources ◽

Sequential Decision ◽

Time Step ◽

Long Term Storage ◽

Decision Making Problem ◽

Short And Long Term ◽

Term Storage

The problem of optimally activating the flexible energy sources (short- and long-term storage capacities) of electricity microgrid is formulated as a sequential decision making problem under uncertainty where, at every time-step, the uncertainty comes from the lack of knowledge about future electricity consumption and weather dependent PV production. This paper proposes to address this problem using deep reinforcement learning. To this purpose, a specific deep learning architecture has been used in order to extract knowledge from past consumption and production time series as well as any available forecasts. The approach is empirically illustrated in the case of off-grid microgrids located in Belgium and Russia.

Download Full-text

Optimal Policies for Quantum Markov Decision Processes

International Journal of Automation and Computing ◽

10.1007/s11633-021-1278-z ◽

2021 ◽

Author(s):

Ming-Sheng Ying ◽

Yuan Feng ◽

Sheng-Gang Ying

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Quantum Systems ◽

Sequential Decision Making ◽

Mathematical Framework ◽

Sequential Decision ◽

Learning Techniques ◽

Optimal Policies ◽

Markov Decision ◽

Programming Algorithms

AbstractMarkov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and finding optimal policies for qMDPs in the case of finite-horizon. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.

Download Full-text

An Efficient DenseNet-Based Deep Learning Model for Malware Detection

Entropy ◽

10.3390/e23030344 ◽

2021 ◽

Vol 23 (3) ◽

pp. 344

Author(s):

Jeyaprakash Hemalatha ◽

S. Abijah Roseline ◽

Subbiah Geetha ◽

Seifedine Kadry ◽

Robertas Damaševičius

Keyword(s):

Deep Learning ◽

Detection System ◽

Malware Detection ◽

Class Imbalance ◽

Learning Model ◽

Computational Time ◽

Learning Approaches ◽

Security Threat ◽

Static And Dynamic Analysis ◽

Deep Learning Model

Recently, there has been a huge rise in malware growth, which creates a significant security threat to organizations and individuals. Despite the incessant efforts of cybersecurity research to defend against malware threats, malware developers discover new ways to evade these defense techniques. Traditional static and dynamic analysis methods are ineffective in identifying new malware and pose high overhead in terms of memory and time. Typical machine learning approaches that train a classifier based on handcrafted features are also not sufficiently potent against these evasive techniques and require more efforts due to feature-engineering. Recent malware detectors indicate performance degradation due to class imbalance in malware datasets. To resolve these challenges, this work adopts a visualization-based method, where malware binaries are depicted as two-dimensional images and classified by a deep learning model. We propose an efficient malware detection system based on deep learning. The system uses a reweighted class-balanced loss function in the final classification layer of the DenseNet model to achieve significant performance improvements in classifying malware by handling imbalanced data issues. Comprehensive experiments performed on four benchmark malware datasets show that the proposed approach can detect new malware samples with higher accuracy (98.23% for the Malimg dataset, 98.46% for the BIG 2015 dataset, 98.21% for the MaleVis dataset, and 89.48% for the unseen Malicia dataset) and reduced false-positive rates when compared with conventional malware mitigation techniques while maintaining low computational time. The proposed malware detection solution is also reliable and effective against obfuscation attacks.

Download Full-text

Goal-driven active learning

Autonomous Agents and Multi-Agent Systems ◽

10.1007/s10458-021-09527-5 ◽

2021 ◽

Vol 35 (2) ◽

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Process ◽

Real World ◽

Imitation Learning ◽

Learning Approaches ◽

Wide Range ◽

Fixed Set ◽

Complex Decision Making ◽

Complex Decision

AbstractDeep reinforcement learning methods have achieved significant successes in complex decision-making problems. In fact, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to lack of state coverage or distribution mismatch—when the learner’s goal deviates from the demonstrated behaviors. Besides, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the Mujoco domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.

Download Full-text

Adaptive and Reinforcement Learning Approaches for Online Network Monitoring and Analysis

IEEE Transactions on Network and Service Management ◽

10.1109/tnsm.2020.3037486 ◽

2020 ◽

pp. 1-1

Author(s):

Sarah Wassermann ◽

Thibaut Cuvelier ◽

Pavol Mulinka ◽

Pedro Casas

Keyword(s):

Reinforcement Learning ◽

Network Monitoring ◽

Learning Approaches

Download Full-text

IoT Botnet Anomaly Detection Using Unsupervised Deep Learning

Electronics ◽

10.3390/electronics10161876 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1876

Author(s):

Ioana Apostol ◽

Marius Preda ◽

Constantin Nila ◽

Ion Bica

Keyword(s):

Deep Learning ◽

Detection System ◽

False Positive Rate ◽

Empirical Evaluation ◽

Learning Approaches ◽

Promising Alternative ◽

Positive Rate ◽

Unsupervised Deep Learning ◽

Attack Surface ◽

Iot Devices

The Internet of Things has become a cutting-edge technology that is continuously evolving in size, connectivity, and applicability. This ecosystem makes its presence felt in every aspect of our lives, along with all other emerging technologies. Unfortunately, despite the significant benefits brought by the IoT, the increased attack surface built upon it has become more critical than ever. Devices have limited resources and are not typically created with security features. Lately, a trend of botnet threats transitioning to the IoT environment has been observed, and an army of infected IoT devices can expand quickly and be used for effective attacks. Therefore, identifying proper solutions for securing IoT systems is currently an important and challenging research topic. Machine learning-based approaches are a promising alternative, allowing the identification of abnormal behaviors and the detection of attacks. This paper proposes an anomaly-based detection solution that uses unsupervised deep learning techniques to identify IoT botnet activities. An empirical evaluation of the proposed method is conducted on both balanced and unbalanced datasets to assess its threat detection capability. False-positive rate reduction and its impact on the detection system are also analyzed. Furthermore, a comparison with other unsupervised learning approaches is included. The experimental results reveal the performance of the proposed detection method.

Download Full-text

Reinforcement Learning Approaches in Social Robotics

Sensors ◽

10.3390/s21041292 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1292

Author(s):

Neziha Akalin ◽

Amy Loutfi

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Social Robotics ◽

Research Field ◽

Social Robots ◽

Learning Approaches ◽

Reward Function ◽

Optimal Behavior ◽

Learning Challenges ◽

Starting Point

This article surveys reinforcement learning approaches in social robotics. Reinforcement learning is a framework for decision-making problems in which an agent interacts through trial-and-error with its environment to discover an optimal behavior. Since interaction is a key component in both reinforcement learning and social robotics, it can be a well-suited approach for real-world interactions with physically embodied social robots. The scope of the paper is focused particularly on studies that include social physical robots and real-world human-robot interactions with users. We present a thorough analysis of reinforcement learning approaches in social robotics. In addition to a survey, we categorize existent reinforcement learning approaches based on the used method and the design of the reward mechanisms. Moreover, since communication capability is a prominent feature of social robots, we discuss and group the papers based on the communication medium used for reward formulation. Considering the importance of designing the reward function, we also provide a categorization of the papers based on the nature of the reward. This categorization includes three major themes: interactive reinforcement learning, intrinsically motivated methods, and task performance-driven methods. The benefits and challenges of reinforcement learning in social robotics, evaluation methods of the papers regarding whether or not they use subjective and algorithmic measures, a discussion in the view of real-world reinforcement learning challenges and proposed solutions, the points that remain to be explored, including the approaches that have thus far received less attention is also given in the paper. Thus, this paper aims to become a starting point for researchers interested in using and applying reinforcement learning methods in this particular research field.

Download Full-text

Reinforcement Learning approaches to Economic Dispatch problem

International Journal of Electrical Power & Energy Systems ◽

10.1016/j.ijepes.2010.12.008 ◽

2011 ◽

Vol 33 (4) ◽

pp. 836-845 ◽

Cited By ~ 28

Author(s):

E.A. Jasmin ◽

T.P. Imthias Ahamed ◽

V.P. Jagathy Raj

Keyword(s):

Reinforcement Learning ◽

Economic Dispatch ◽

Learning Approaches ◽

Economic Dispatch Problem

Download Full-text