An Overview of Audio Event Detection Methods from Feature Extraction to Classification

2017 ◽  
Vol 31 (9-10) ◽  
pp. 661-714 ◽  
Author(s):  
Elham Babaee ◽  
Nor Badrul Anuar ◽  
Ainuddin Wahid Abdul Wahab ◽  
Shahaboddin Shamshirband ◽  
Anthony T. Chronopoulos


Author(s):
M. N. Favorskaya ◽  
L. C. Jain

Introduction: Saliency detection is a fundamental task in computer vision. Its ultimate aim is to localize the objects of interest that grab human visual attention with respect to the rest of the image. A great variety of saliency models based on different approaches has been developed since the 1990s. In recent years, saliency detection has become one of the most actively studied topics in the theory of Convolutional Neural Networks (CNNs), and many original CNN-based solutions have been proposed for salient object detection and even event detection. Purpose: A detailed survey of saliency detection methods in the deep learning era clarifies what the CNN approach can currently offer for visual analysis based on human eye tracking and digital image processing. Results: The survey reflects recent advances in saliency detection using CNNs. Different models available in the literature, such as static and dynamic 2D CNNs for salient object detection and 3D CNNs for salient event detection, are discussed in chronological order. Notably, automatic salient event detection in long videos became possible with recently introduced 3D CNNs combined with 2D CNNs for salient audio detection. The article also provides a short description of public image and video datasets with annotated salient objects or events, as well as the metrics commonly used to evaluate the results. Practical relevance: This survey contributes to the study of rapidly developing deep learning methods for saliency detection in images and videos.
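To make the kind of architecture discussed above concrete, the following is a minimal, hypothetical sketch of a fully convolutional 2D CNN that maps an RGB image to a per-pixel saliency map; it is not any specific model from the survey, and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinySaliencyNet(nn.Module):
    """Toy encoder-decoder that outputs a one-channel saliency map in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # H/2 x W/2
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # H/4 x W/4
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1),         # one-channel saliency logits
        )

    def forward(self, x):
        return torch.sigmoid(self.decoder(self.encoder(x)))

saliency = TinySaliencyNet()(torch.randn(1, 3, 224, 224))  # -> (1, 1, 224, 224)
```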


2018 ◽  
Author(s):  
Pankaj Joshi ◽  
Digvijaysingh Gautam ◽  
Ganesh Ramakrishnan ◽  
Preethi Jyothi

2021 ◽  
Author(s):  
Hansi Hettiarachchi ◽  
Mariam Adedoyin-Olowe ◽  
Jagdev Bhogal ◽  
Mohamed Medhat Gaber

Abstract: Social media is becoming a primary medium for discussing what is happening around the world. Therefore, the data generated by social media platforms contain rich information describing ongoing events. Further, the timeliness of these data makes immediate insights possible. However, considering the dynamic nature and high volume of data production in social media streams, it is impractical to filter events manually, and automated event detection mechanisms are therefore invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features of the data and has lacked the underlying semantics, which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media, combining the characteristics of word embeddings with hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantic features into event detection and to overcome a major limitation of previous approaches. We evaluated our method on two recent real social media data sets representing the sports and political domains and compared the results with several state-of-the-art methods. The results show that Embed2Detect is capable of effective and efficient event detection and outperforms recent event detection methods. For the sports data set, Embed2Detect achieved a 27% higher F-measure than the best-performing baseline, and for the political data set the increase was 29%.
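As a rough illustration of the general idea behind Embed2Detect (clustering word embeddings with hierarchical agglomerative clustering to surface candidate events), the sketch below groups toy word vectors by cosine distance; the embeddings, distance threshold, and vocabulary are placeholders, and this is not the authors' actual implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy vocabulary and stand-in embeddings; a real run would use word vectors
# trained on the social-media stream for the current time window.
words = ["goal", "penalty", "referee", "election", "ballot", "vote"]
emb = np.random.default_rng(0).normal(size=(len(words), 50))

# Hierarchical agglomerative clustering on cosine distances between word
# vectors; each resulting cluster is treated as a candidate event.
Z = linkage(pdist(emb, metric="cosine"), method="average")
labels = fcluster(Z, t=0.9, criterion="distance")

for cluster_id in sorted(set(labels)):
    keywords = [w for w, lab in zip(words, labels) if lab == cluster_id]
    print(f"candidate event {cluster_id}: {keywords}")
```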


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5272
Author(s):  
Nicole Zahradka ◽  
Khushboo Verma ◽  
Ahad Behboodi ◽  
Barry Bodt ◽  
Henry Wright ◽  
...  

Video- and sensor-based gait analysis systems are rapidly emerging for use in 'real world' scenarios outside of typical instrumented motion analysis laboratories. Unlike laboratory systems, such systems do not use kinetic data from force plates; rather, gait events such as initial contact (IC) and terminal contact (TC) are estimated from video and sensor signals. There are, however, detection errors inherent in kinematic gait event detection methods (GEDMs), and a comparative study between classic laboratory and video/sensor-based systems is warranted. In this study, three kinematic methods, the coordinate-based treadmill algorithm (CBTA), shank angular velocity (SK), and the foot velocity algorithm (FVA), were compared with 'gold standard' force plate methods (GS) for determining IC and TC in adults (n = 6), typically developing children (n = 5), and children with cerebral palsy (n = 6). The root mean square error (RMSE) values for CBTA, SK, and FVA were 27.22, 47.33, and 78.41 ms, respectively. On average, gait events were detected earlier by CBTA and SK (CBTA: −9.54 ± 0.66 ms, SK: −33.41 ± 0.86 ms) and later by FVA (21.00 ± 1.96 ms). The statistical model was insensitive to variations in group, side, and individual. Of the three kinematic GEDMs, the SK method is best suited for sensor-based gait event detection.
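The comparison above boils down to extracting event timings from a kinematic signal and measuring their offset against force-plate timings. The sketch below illustrates that evaluation with a synthetic foot-velocity signal and a simplified zero-crossing rule for initial contact; the signal, the detection rule, and the simulated 'gold standard' timings are all assumptions, not the study's algorithms.

```python
import numpy as np

fs = 100.0                                      # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)
foot_vel = np.sin(2 * np.pi * t)                # stand-in for vertical foot velocity

# Toy rule: take initial contact at negative-to-positive zero crossings of the
# vertical foot velocity (a stand-in, not the study's FVA).
ic_kinematic = np.where((foot_vel[:-1] < 0) & (foot_vel[1:] >= 0))[0] / fs

# Simulated force-plate ("gold standard") timings for the same strides.
ic_gold = ic_kinematic + np.random.default_rng(1).normal(0.0, 0.02, ic_kinematic.size)

errors_ms = (ic_kinematic - ic_gold) * 1000.0
rmse_ms = np.sqrt(np.mean(errors_ms ** 2))
print(f"mean offset = {errors_ms.mean():.1f} ms, RMSE = {rmse_ms:.1f} ms")
```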


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5037
Author(s):  
Hisham ElMoaqet ◽  
Mohammad Eid ◽  
Martin Glos ◽  
Mutaz Ryalat ◽  
Thomas Penzel

Sleep apnea is a common sleep disorder that causes repeated breathing interruptions during sleep. The performance of automated apnea detection methods based on respiratory signals depends on the signals considered and the feature extraction methods. Moreover, feature engineering techniques depend heavily on the experts' experience and their prior knowledge of the different physiological signals and the subjects' conditions. To overcome these problems, a novel deep recurrent neural network (RNN) framework is developed for automated feature extraction and detection of apnea events from single respiratory channel inputs. Long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) networks are investigated to develop the proposed deep RNN model. The proposed framework is evaluated on three respiration signals: oronasal thermal airflow (FlowTh), nasal pressure (NPRE), and abdominal respiratory inductance plethysmography (ABD). To demonstrate our results, we use polysomnography (PSG) data from 17 patients with obstructive, central, and mixed apnea events. Our results indicate the effectiveness of the proposed framework in automatic extraction of temporal features and automated detection of apneic events across the different respiratory signals considered in this study. Using a deep BiLSTM-based detection model, the NPRE signal achieved the highest overall detection results, with a true positive rate (sensitivity) of 90.3%, a true negative rate (specificity) of 83.7%, and an area under the receiver operating characteristic curve of 92.4%. These results contribute a new deep learning approach for automated detection of sleep apnea events from single-channel respiration signals that can potentially serve as a helpful alternative to the traditional PSG method.
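A minimal sketch of a BiLSTM classifier over a single respiratory channel, in the spirit of the framework described above, is shown below; the layer sizes, segment length, and sampling rate are assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class ApneaBiLSTM(nn.Module):
    """Classify a single-channel respiration segment as apneic vs. normal."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=2, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)     # one logit per segment

    def forward(self, x):                        # x: (batch, time, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])          # logits from the last time step

# A batch of eight 30 s segments of, e.g., nasal pressure sampled at 32 Hz.
segments = torch.randn(8, 30 * 32, 1)
logits = ApneaBiLSTM()(segments)                 # -> (8, 1)
```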


2016 ◽  
Author(s):  
Milan Flach ◽  
Fabian Gans ◽  
Alexander Brenning ◽  
Joachim Denzler ◽  
Markus Reichstein ◽  
...  

Abstract. Today, many processes at the Earth's surface are constantly monitored by multiple data streams. These observations have become central to advancing our understanding of, for example, vegetation dynamics in response to climate or land use change. Another set of important applications is monitoring the effects of climatic extreme events, other disturbances such as fires, or abrupt land transitions. One important methodological question is how to reliably detect anomalies in an automated and generic way within multivariate data streams, which typically vary seasonally and are interconnected across variables. Although many algorithms have been proposed for detecting anomalies in multivariate data, only a few have been investigated in the context of Earth system science applications. In this study, we systematically combine and compare feature extraction and anomaly detection algorithms for detecting anomalous events. Our aim is to identify suitable workflows for automatically detecting anomalous patterns in multivariate Earth system data streams. We rely on artificial data that mimic typical properties and anomalies in multivariate spatiotemporal Earth observations. This artificial experiment is needed because there is no 'gold standard' for the identification of anomalies in real Earth observations. Our results show that a well-chosen feature extraction step (e.g. subtracting seasonal cycles, or dimensionality reduction) is more important than the choice of a particular anomaly detection algorithm. Nevertheless, we identify three detection algorithms (k-nearest neighbours mean distance, kernel density estimation, and a recurrence approach) and their combinations (ensembles) that outperform other multivariate approaches as well as univariate extreme event detection methods. Our results therefore provide an effective workflow to automatically detect anomalies in Earth system science data.
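The kind of workflow the study recommends, a feature extraction step (for example subtracting the seasonal cycle) followed by an anomaly detection algorithm such as k-nearest-neighbour mean distance, can be sketched as follows on synthetic data; the signal construction, injected event, and choice of k are illustrative assumptions only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
days = np.arange(3650)                               # ten years of daily data
seasonal = np.sin(2 * np.pi * days / 365.25)
x = np.column_stack([seasonal + 0.1 * rng.normal(size=days.size),
                     0.5 * seasonal + 0.1 * rng.normal(size=days.size)])
x[2000:2010] += 1.5                                  # injected anomalous event

# Feature extraction: subtract the mean seasonal cycle per day of year.
doy = days % 365
deseasonalized = x - np.array([x[doy == d].mean(axis=0) for d in doy])

# Anomaly detection: mean distance to the k nearest neighbours.
k = 10
nn_model = NearestNeighbors(n_neighbors=k + 1).fit(deseasonalized)
dist, _ = nn_model.kneighbors(deseasonalized)
score = dist[:, 1:].mean(axis=1)                     # drop the self-distance
print("most anomalous days:", np.sort(np.argsort(score)[-10:]))
```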


Processes ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 122
Author(s):  
Yang Li ◽  
Fangyuan Ma ◽  
Cheng Ji ◽  
Jingde Wang ◽  
Wei Sun

Feature extraction plays a key role in fault detection methods. Most existing methods focus on comprehensive and accurate feature extraction from normal operation data to achieve better detection performance. However, discriminative features based on historical fault data are usually ignored. To address this point, a global-local marginal discriminant preserving projection (GLMDPP) method is proposed for feature extraction. Given its joint consideration of global and local features, global-local preserving projection (GLPP) is used to extract the inherent features of the data. Then, multiple marginal Fisher analysis (MMFA) is introduced to extract discriminative features, which better separate normal data from fault data. Within the Fisher framework, GLPP and MMFA are integrated to extract inherent and discriminative features of the data simultaneously. Furthermore, fault detection methods based on GLMDPP are constructed and applied to the Tennessee Eastman (TE) process. Compared with the PCA and GLPP methods, the effectiveness of the proposed method in fault detection is validated by the results on the TE process.
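For context, the PCA baseline that the proposed GLMDPP method is compared against can be sketched as follows: fit PCA on normal operating data, then flag faults with Hotelling's T² and SPE statistics. The data, number of components, and control limits here are illustrative assumptions, and the sketch does not implement GLMDPP itself.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 10))              # stand-in for normal operating data
test = rng.normal(size=(100, 10))
test[50:] += 2.0                                 # simulated fault after sample 50

# Standardize with normal-data statistics, then fit PCA on normal data only.
mean, std = normal.mean(axis=0), normal.std(axis=0)
pca = PCA(n_components=4).fit((normal - mean) / std)

def t2_and_spe(x):
    z = pca.transform((x - mean) / std)
    t2 = np.sum(z ** 2 / pca.explained_variance_, axis=1)      # Hotelling's T^2
    residual = (x - mean) / std - pca.inverse_transform(z)
    return t2, np.sum(residual ** 2, axis=1)                   # SPE (Q statistic)

# Empirical 99th-percentile control limits from the normal data (an illustration;
# practice usually derives limits from the assumed score distributions).
t2_lim, spe_lim = (np.percentile(s, 99) for s in t2_and_spe(normal))
t2, spe = t2_and_spe(test)
print("flagged samples:", np.where((t2 > t2_lim) | (spe > spe_lim))[0])
```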


Entropy ◽  
2020 ◽  
Vol 22 (2) ◽  
pp. 249
Author(s):  
Weiguo Zhang ◽  
Chenggang Zhao ◽  
Yuxing Li

The quality and efficiency of generating face-swap images have been markedly strengthened by deep learning. For instance, the face-swap manipulations produced by DeepFake are so realistic that it is difficult to determine authenticity through automatic or manual detection. To improve the efficiency of distinguishing face-swap images generated by DeepFake from real facial images, a novel counterfeit feature extraction technique was developed based on deep learning and error level analysis (ELA). The approach is related to entropy and information theory, for example through the cross-entropy loss function in the final softmax layer. The DeepFake algorithm can only generate images of limited resolution. As a result, it produces two different image compression ratios between the fake face area (the foreground) and the original area (the background), which leaves distinctive counterfeit traces. The ELA method detects whether such different image compression ratios are present. A convolutional neural network (CNN), one of the representative technologies of deep learning, can then extract the counterfeit feature and determine whether an image is fake. Experiments show that the training efficiency of the CNN model can be significantly improved by the ELA method. In addition, the proposed technique accurately extracts the counterfeit feature and therefore outperforms direct detection methods in simplicity and efficiency. Specifically, without loss of accuracy, the amount of computation can be significantly reduced (the required floating-point computing power is reduced by more than 90%).
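The ELA step described above can be sketched in a few lines: re-save the image as a JPEG at a fixed quality and take the per-pixel difference, so that regions compressed at a different ratio (such as a spliced face) stand out before being passed to a CNN. The quality and amplification values below are assumptions, not the paper's settings.

```python
import io
from PIL import Image, ImageChops, ImageEnhance

def error_level_analysis(path, quality=90, scale=15):
    """Return an ELA image highlighting regions whose compression level differs."""
    original = Image.open(path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)    # re-save at a fixed quality
    buffer.seek(0)
    resaved = Image.open(buffer)
    diff = ImageChops.difference(original, resaved)   # counterfeit traces live here
    return ImageEnhance.Brightness(diff).enhance(scale)

# The amplified ELA image would then be fed to a CNN classifier (real vs. fake):
# ela = error_level_analysis("face.jpg"); ela.save("face_ela.png")
```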

