Open set recognition algorithm based on Conditional Gaussian Encoder

<abstract> <p>For the existing Closed Set Recognition (CSR) methods mistakenly identify unknown jamming signals as a known class, a Conditional Gaussian Encoder (CG-Encoder) for 1-dimensional signal Open Set Recognition (OSR) is designed. The network retains the original form of the signal as much as possible and deep neural network is used to extract useful information. CG-Encoder adopts residual network structure and a new Kullback-Leibler (KL) divergence is defined. In the training phase, the known classes are approximated to different Gaussian distributions in the latent space and the discrimination between classes is increased to improve the recognition performance of the known classes. In the testing phase, a specific and effective OSR algorithm flow is designed. Simulation experiments are carried out on 9 jamming types. The results show that the CSR and OSR performance of CG-Encoder is better than that of the other three kinds of network structures. When the openness is the maximum, the open set average accuracy of CG-Encoder is more than 70%, which is about 30% higher than the worst algorithm, and about 20% higher than the better one. When the openness is the minimum, the average accuracy of OSR is more than 95%.</p> </abstract>

Download Full-text

Recognition of Synthetic Speech by Hearing-Impaired Elderly Listeners

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3405.1180 ◽

1991 ◽

Vol 34 (5) ◽

pp. 1180-1184 ◽

Cited By ~ 27

Author(s):

Larry E. Humes ◽

Kathleen J. Nelson ◽

David B. Pisoni

Keyword(s):

Hearing Loss ◽

Speech Recognition ◽

Recognition Performance ◽

The Elderly ◽

Hearing Impaired ◽

The Other ◽

Natural Speech ◽

Synthetic Speech ◽

Closed Set ◽

Open Set

The Modified Rhyme Test (MRT), recorded using natural speech and two forms of synthetic speech, DECtalk and Votrax, was used to measure both open-set and closed-set speech-recognition performance. Performance of hearing-impaired elderly listeners was compared to two groups of young normal-hearing adults, one listening in quiet, and the other listening in a background of spectrally shaped noise designed to simulate the peripheral hearing loss of the elderly. Votrax synthetic speech yielded significant decrements in speech recognition compared to either natural or DECtalk synthetic speech for all three subject groups. There were no differences in performance between natural speech and DECtalk speech for the elderly hearing-impaired listeners or the young listeners with simulated hearing loss. The normal-hearing young adults listening in quiet out-performed both of the other groups, but there were no differences in performance between the young listeners with simulated hearing loss and the elderly hearing-impaired listeners. When the closed-set identification of synthetic speech was compared to its open-set recognition, the hearing-impaired elderly gained as much from the reduction in stimulus/response uncertainty as the two younger groups. Finally, among the elderly hearing-impaired listeners, speech-recognition performance was correlated negatively with hearing sensitivity, but scores were correlated positively among the different talker conditions. Those listeners with the greatest hearing loss had the most difficulty understanding speech and those having the most trouble understanding natural speech also had the greatest difficulty with synthetic speech.

Download Full-text

A Picture Identification Task as an Estimate of the Word-Recognition Performance of Nonverbal Adults

Journal of Speech and Hearing Disorders ◽

10.1044/jshd.4502.223 ◽

1980 ◽

Vol 45 (2) ◽

pp. 223-238 ◽

Cited By ~ 13

Author(s):

Richard H. Wilson ◽

June K. Antablin

Keyword(s):

Word Recognition ◽

Recognition Performance ◽

Test Word ◽

Stimulus Presentation ◽

Identification Task ◽

Closed Set ◽

Open Set ◽

Northwestern University ◽

Monosyllabic Words ◽

Picture Identification

The Picture Identification Task was developed to estimate the word-recognition performance of nonverbal adults. Four lists of 50 monosyllabic words each were assembled and recorded. Each test word and three rhyming alternatives were illustrated and photographed in a quadrant arrangement. The task of the patient was to point to the picture representing the recorded word that was presented through the earphone. In the first experiment with young adults, no significant differences were found between the Picture Identification Task and the Northwestern University Auditory Test No. 6 materials in an open-set response paradigm. In the second experiment, the Picture Identification Task with the picture-pointing response was compared with the Northwestern University Auditory Test No. 6 in both an open-set and a closed-set response paradigm. The results from this experiment demonstrated significant differences among the three response tasks. The easiest task was a closed-set response to words, the next was a closed-set response to pictures, and the most difficult task was an open-set response. At high stimulus-presentation levels, however, the three tasks produced similar results. Finally, the clinical use of the Picture Identification Task is described along with preliminary results obtained from 30 patients with various communicative impairments.

Download Full-text

Multi-modal Open World User Identification

ACM Transactions on Human-Robot Interaction ◽

10.1145/3477963 ◽

2022 ◽

Vol 11 (1) ◽

pp. 1-50

Author(s):

Bahar Irfan ◽

Michael Garcia Ortiz ◽

Natalia Lyubova ◽

Tony Belpaeme

Keyword(s):

Face Recognition ◽

Recognition Performance ◽

Human Robot Interaction ◽

User Identification ◽

Closed Set ◽

Identification Rate ◽

Open World ◽

Open Set ◽

User Recognition

User identification is an essential step in creating a personalised long-term interaction with robots. This requires learning the users continuously and incrementally, possibly starting from a state without any known user. In this article, we describe a multi-modal incremental Bayesian network with online learning, which is the first method that can be applied in such scenarios. Face recognition is used as the primary biometric, and it is combined with ancillary information, such as gender, age, height, and time of interaction to improve the recognition. The Multi-modal Long-term User Recognition Dataset is generated to simulate various human-robot interaction (HRI) scenarios and evaluate our approach in comparison to face recognition, soft biometrics, and a state-of-the-art open world recognition method (Extreme Value Machine). The results show that the proposed methods significantly outperform the baselines, with an increase in the identification rate up to 47.9% in open-set and closed-set scenarios, and a significant decrease in long-term recognition performance loss. The proposed models generalise well to new users, provide stability, improve over time, and decrease the bias of face recognition. The models were applied in HRI studies for user recognition, personalised rehabilitation, and customer-oriented service, which showed that they are suitable for long-term HRI in the real world.

Download Full-text

Open Set Audio Classification Using Autoencoders Trained on Few Data

Sensors ◽

10.3390/s20133741 ◽

2020 ◽

Vol 20 (13) ◽

pp. 3741 ◽

Cited By ~ 1

Author(s):

Javier Naranjo-Alcazar ◽

Sergi Perez-Castanos ◽

Pedro Zuccarello ◽

Fabio Antonacci ◽

Maximo Cobos

Keyword(s):

Recognition System ◽

Superior Performance ◽

Learning Problem ◽

Open Set ◽

Latent Space ◽

Audio Data ◽

Baseline System ◽

Few Data ◽

High Level ◽

Open Set Recognition

Open-set recognition (OSR) is a challenging machine learning problem that appears when classifiers are faced with test instances from classes not seen during training. It can be summarized as the problem of correctly identifying instances from a known class (seen during training) while rejecting any unknown or unwanted samples (those belonging to unseen classes). Another problem arising in practical scenarios is few-shot learning (FSL), which appears when there is no availability of a large number of positive samples for training a recognition system. Taking these two limitations into account, a new dataset for OSR and FSL for audio data was recently released to promote research on solutions aimed at addressing both limitations. This paper proposes an audio OSR/FSL system divided into three steps: a high-level audio representation, feature embedding using two different autoencoder architectures and a multi-layer perceptron (MLP) trained on latent space representations to detect known classes and reject unwanted ones. An extensive set of experiments is carried out considering multiple combinations of openness factors (OSR condition) and number of shots (FSL condition), showing the validity of the proposed approach and confirming superior performance with respect to a baseline system based on transfer learning.

Download Full-text

Cochlear Implants in Children, Adolescents, and Prelinguistically Deafened Adults

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3502.401 ◽

1992 ◽

Vol 35 (2) ◽

pp. 401-417 ◽

Cited By ~ 68

Author(s):

Pam W. Dawson ◽

Peter J. Blamey ◽

Louise C. Rowland ◽

Shani J. Dettman ◽

Graeme M. Clark ◽

...

Keyword(s):

Cochlear Implant ◽

Cochlear Implants ◽

Hearing Aids ◽

Age At Onset ◽

Closed Set ◽

Open Set ◽

Profound Deafness ◽

Monosyllabic Words ◽

Open Set Recognition ◽

The University

A group of 10 children, adolescents, and prelinguistically deafened adults were implanted with the 22-electrode cochlear implant (Cochlear Pty Ltd) at the University of Melbourne Cochlear Implant Clinic and have used the prosthesis for periods from 12 to 65 months. Postoperative performance on the majority of closed-set speech perception tests was significantly greater than chance, and significantly better than preoperative performance for all of the patients. Five of the children have achieved substantial scores on open-set speech tests using hearing without lipreading. Phoneme scores in monosyllabic words ranged from 30% to 72%; word scores in sentences ranged from 26% to 74%. Four of these 5 children were implanted during preadolescence (aged 5:5 to 10:2 years) and the fifth, who had a progressive loss, was implanted during adolescence (aged 14:8 years). The duration of profound deafness before implantation varied from 2 to 8 years. Improvements were also noted over postoperative data collection times for the younger children. The remaining 5 patients who did not demonstrate open-set recognition were implanted after a longer duration of profound deafness (aged 13:11 to 20:1 years). The results are discussed with reference to variables that may affect implant performance, such as age at onset of loss, duration of profound loss, age at implantation, and duration of implantation. They are compared with results for similar groups of children using hearing aids and cochlear implants.

Download Full-text

ON A NEW CLASS OF SEMI GENERALIZED CLOSED SETS IN STRONG GENERALIZED TOPOLOGICAL SPACES

Advances in Mathematics: Scientific Journal ◽

10.37418/amsj.9.11.41 ◽

2020 ◽

Vol 9 (11) ◽

pp. 9353-9360

Author(s):

G. Selvi ◽

I. Rajasekaran

Keyword(s):

Topological Spaces ◽

Closed Set ◽

Basic Properties ◽

Closed Sets ◽

New Class ◽

Open Set ◽

Continuous Maps

This paper deals with the concepts of semi generalized closed sets in strong generalized topological spaces such as $sg^{\star \star}_\mu$-closed set, $sg^{\star \star}_\mu$-open set, $g^{\star \star}_\mu$-closed set, $g^{\star \star}_\mu$-open set and studied some of its basic properties included with $sg^{\star \star}_\mu$-continuous maps, $sg^{\star \star}_\mu$-irresolute maps and $T_\frac{1}{2}$-space in strong generalized topological spaces.

Download Full-text

CorrNet: Fine-Grained Emotion Recognition for Video Watching Using Wearable Physiological Sensors

Sensors ◽

10.3390/s21010052 ◽

2020 ◽

Vol 21 (1) ◽

pp. 52

Author(s):

Tianyi Zhang ◽

Abdallah El Ali ◽

Chen Wang ◽

Alan Hanjalic ◽

Pablo Cesar

Keyword(s):

Emotion Recognition ◽

Electrodermal Activity ◽

Short Form ◽

Recognition Performance ◽

Binary Classification ◽

Wearable Sensors ◽

Recognition Algorithm ◽

Fine Grained ◽

Physiological Sensors ◽

Valence And Arousal

Recognizing user emotions while they watch short-form videos anytime and anywhere is essential for facilitating video content customization and personalization. However, most works either classify a single emotion per video stimuli, or are restricted to static, desktop environments. To address this, we propose a correlation-based emotion recognition algorithm (CorrNet) to recognize the valence and arousal (V-A) of each instance (fine-grained segment of signals) using only wearable, physiological signals (e.g., electrodermal activity, heart rate). CorrNet takes advantage of features both inside each instance (intra-modality features) and between different instances for the same video stimuli (correlation-based features). We first test our approach on an indoor-desktop affect dataset (CASE), and thereafter on an outdoor-mobile affect dataset (MERCA) which we collected using a smart wristband and wearable eyetracker. Results show that for subject-independent binary classification (high-low), CorrNet yields promising recognition accuracies: 76.37% and 74.03% for V-A on CASE, and 70.29% and 68.15% for V-A on MERCA. Our findings show: (1) instance segment lengths between 1–4 s result in highest recognition accuracies (2) accuracies between laboratory-grade and wearable sensors are comparable, even under low sampling rates (≤64 Hz) (3) large amounts of neutral V-A labels, an artifact of continuous affect annotation, result in varied recognition performance.

Download Full-text

Discovery of novel chemical reactions by deep generative recurrent neural network

Scientific Reports ◽

10.1038/s41598-021-81889-y ◽

2021 ◽

Vol 11 (1) ◽

Cited By ~ 1

Author(s):

William Bort ◽

Igor I. Baskin ◽

Timur Gimadiev ◽

Artem Mukanov ◽

Ramil Nugmanov ◽

...

Keyword(s):

Chemical Reactions ◽

Short Term Memory ◽

De Novo ◽

Molecular Structures ◽

Topographic Map ◽

Short Term ◽

Suzuki Reactions ◽

Class A ◽

Latent Space ◽

Long Short Term Memory

AbstractThe “creativity” of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that “creative” AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on on-purpose developed “SMILES/CGR” strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, herewith enlarging the synthetic purpose of popular synthetic pathways.

Download Full-text

Synthetic speech detection through short-term and long-term prediction traces

EURASIP Journal on Information Security ◽

10.1186/s13635-021-00116-3 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Clara Borrelli ◽

Paolo Bestagini ◽

Fabio Antonacci ◽

Augusto Sarti ◽

Stefano Tubaro

Keyword(s):

Deep Learning ◽

Speech Processing ◽

Synthetic Speech ◽

Opinion Formation ◽

Closed Set ◽

Speech Detection ◽

Open Set ◽

Technological Advances ◽

Speech Generation ◽

Long Term Prediction

AbstractSeveral methods for synthetic audio speech generation have been developed in the literature through the years. With the great technological advances brought by deep learning, many novel synthetic speech techniques achieving incredible realistic results have been recently proposed. As these methods generate convincing fake human voices, they can be used in a malicious way to negatively impact on today’s society (e.g., people impersonation, fake news spreading, opinion formation). For this reason, the ability of detecting whether a speech recording is synthetic or pristine is becoming an urgent necessity. In this work, we develop a synthetic speech detector. This takes as input an audio recording, extracts a series of hand-crafted features motivated by the speech-processing literature, and classify them in either closed-set or open-set. The proposed detector is validated on a publicly available dataset consisting of 17 synthetic speech generation algorithms ranging from old fashioned vocoders to modern deep learning solutions. Results show that the proposed method outperforms recently proposed detectors in the forensics literature.

Download Full-text

Mosquito species identification using convolutional neural networks with a multitiered ensemble model for novel species detection

Scientific Reports ◽

10.1038/s41598-021-92891-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Adam Goodwin ◽

Sanket Padmanabhan ◽

Sanchit Hira ◽

Margaret Glancey ◽

Monet Slinowsky ◽

...

Keyword(s):

Neural Networks ◽

Species Identification ◽

Convolutional Neural Networks ◽

Novel Species ◽

Novelty Detection ◽

Mosquito Species ◽

Image Database ◽

Closed Set ◽

Open Set

AbstractWith over 3500 mosquito species described, accurate species identification of the few implicated in disease transmission is critical to mosquito borne disease mitigation. Yet this task is hindered by limited global taxonomic expertise and specimen damage consistent across common capture methods. Convolutional neural networks (CNNs) are promising with limited sets of species, but image database requirements restrict practical implementation. Using an image database of 2696 specimens from 67 mosquito species, we address the practical open-set problem with a detection algorithm for novel species. Closed-set classification of 16 known species achieved 97.04 ± 0.87% accuracy independently, and 89.07 ± 5.58% when cascaded with novelty detection. Closed-set classification of 39 species produces a macro F1-score of 86.07 ± 1.81%. This demonstrates an accurate, scalable, and practical computer vision solution to identify wild-caught mosquitoes for implementation in biosurveillance and targeted vector control programs, without the need for extensive image database development for each new target region.

Download Full-text