Enrichment of Oesophageal Speech: Voice Conversion with Duration-Matched Synthetic Speech as Target

2021 ◽  
Vol 11 (13) ◽  
pp. 5940
Author(s):  
Sneha Raman ◽  
Xabier Sarasola ◽  
Eva Navas ◽  
Inma Hernaez

Pathological speech such as Oesophageal Speech (OS) is difficult to understand due to the presence of undesired artefacts and the lack of normal healthy speech characteristics. Modern speech technologies and machine learning enable us to transform pathological speech to improve intelligibility and quality. We have used a neural network based voice conversion method with the aim of improving the intelligibility and reducing the listening effort (LE) of four OS speakers of varying speaking proficiency. The novelty of this method is the use of synthetic speech matched in duration with the source OS as the target, instead of parallel aligned healthy speech. We evaluated the converted samples from this system using a collection of Automatic Speech Recognition (ASR) systems, an objective intelligibility metric (STOI) and a subjective test. The ASR evaluation shows that the proposed system had significantly better word recognition accuracy than unprocessed OS and than baseline systems that used aligned healthy speech as the target. STOI scores improved by at least 15%, indicating higher intelligibility for the proposed system compared to unprocessed OS and higher target similarity compared to the baseline systems. The subjective test reveals a significant preference for the proposed system over unprocessed OS for all OS speakers except one, who was the least proficient OS speaker in the data set.
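A minimal sketch of the objective evaluation step described above, assuming the `pystoi` and `soundfile` packages are available; the file paths and the trimming of both signals to a common length are illustrative choices, not the authors' exact pipeline.

```python
import soundfile as sf
from pystoi import stoi

def stoi_score(reference_path, converted_path):
    """Compute the STOI intelligibility score of a converted utterance
    against a duration-matched reference signal."""
    ref, fs_ref = sf.read(reference_path)
    conv, fs_conv = sf.read(converted_path)
    assert fs_ref == fs_conv, "both signals must share the same sampling rate"
    # Trim to the shorter signal so the frame-wise comparison is well defined.
    n = min(len(ref), len(conv))
    return stoi(ref[:n], conv[:n], fs_ref, extended=False)

# Hypothetical usage: score unprocessed OS and the converted output against the
# same healthy reference to check for the reported improvement.
# score_os = stoi_score("healthy_ref.wav", "oesophageal.wav")
# score_vc = stoi_score("healthy_ref.wav", "converted.wav")
```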

2021 ◽  
pp. 1-13
Author(s):  
Shikhar Tyagi ◽  
Bhavya Chawla ◽  
Rupav Jain ◽  
Smriti Srivastava

Single biometric modalities such as facial features and vein patterns, despite being reliable characteristics, show limitations that restrict them from offering high performance and robustness. Multimodal biometric systems have gained interest due to their ability to overcome the inherent limitations of the underlying single biometric modalities and have generally been shown to improve overall performance for identification and recognition purposes. This paper proposes highly accurate and robust multimodal biometric identification and recognition systems based on the fusion of face and finger vein modalities. Feature extraction for both face and finger vein is carried out by exploiting deep convolutional neural networks. The fusion process combines the extracted relevant features from the two modalities at score level. Experimental results over all considered public databases show a significant improvement in terms of identification and recognition accuracy as well as equal error rates.
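A minimal sketch of score-level fusion, assuming each modality's deep network has already produced one matching score per claim; the min-max normalisation, equal weighting and decision threshold below are illustrative choices, not necessarily the rule used in the paper.

```python
import numpy as np

def min_max_normalise(scores):
    """Map raw matcher scores to [0, 1] so the two modalities are comparable."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + 1e-12)

def fuse_scores(face_scores, vein_scores, w_face=0.5):
    """Weighted-sum fusion of normalised face and finger-vein matching scores."""
    face = min_max_normalise(face_scores)
    vein = min_max_normalise(vein_scores)
    return w_face * face + (1.0 - w_face) * vein

# Hypothetical scores from the two CNN matchers for three identity claims.
fused = fuse_scores([0.91, 0.32, 0.78], [0.85, 0.41, 0.66])
decisions = fused >= 0.6   # accept a claim when the fused score clears a threshold
```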


2021 ◽  
Vol 25 ◽  
pp. 233121652110093
Author(s):  
Patrycja Książek ◽  
Adriana A. Zekveld ◽  
Dorothea Wendt ◽  
Lorenz Fiedler ◽  
Thomas Lunner ◽  
...  

In hearing research, pupillometry is an established method of studying listening effort. The focus of this study was to evaluate several pupil measures extracted from Task-Evoked Pupil Responses (TEPRs) in a speech-in-noise test. A range of analysis approaches was applied to extract these pupil measures, namely (a) pupil peak dilation (PPD); (b) mean pupil dilation (MPD); (c) index of pupillary activity; (d) growth curve analysis (GCA); and (e) principal component analysis (PCA). The effects of signal-to-noise ratio (SNR; Data Set A: –20 dB, –10 dB, +5 dB SNR) and luminance (Data Set B: 0.1 cd/m², 360 cd/m²) on the TEPRs were investigated. Data Sets A and B were recorded during a speech-in-noise test and included TEPRs from 33 and 27 normal-hearing native Dutch speakers, respectively. The main results were as follows: (a) a significant effect of SNR was revealed for all pupil measures extracted in the time domain (PPD, MPD, GCA, PCA); (b) the two time-series analysis approaches provided modeled temporal profiles of TEPRs (GCA) and time windows spanning the subtasks performed in a speech-in-noise test (PCA); and (c) all pupil measures revealed a significant effect of luminance. In conclusion, multiple pupil measures showed similar effects of SNR, suggesting that effort may be reflected in multiple aspects of the TEPR. Moreover, a direct analysis of the pupil time course seems to provide a more holistic view of TEPRs, yet further research is needed to understand and interpret its measures. Further research is also required to find pupil measures that are less sensitive to changes in luminance.
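A minimal sketch of two of the time-domain pupil measures named above (peak dilation and mean dilation), computed from a baseline-corrected pupil trace; the baseline window length and sampling rate are illustrative assumptions, not the study's exact preprocessing.

```python
import numpy as np

def pupil_measures(trace, fs=60.0, baseline_s=1.0):
    """Return (PPD, MPD) for one trial.

    trace      : pupil size samples for one trial, baseline period first
    fs         : sampling rate in Hz
    baseline_s : length of the pre-stimulus baseline in seconds
    """
    trace = np.asarray(trace, dtype=float)
    n_base = int(round(baseline_s * fs))
    baseline = trace[:n_base].mean()
    response = trace[n_base:] - baseline   # task-evoked change relative to baseline
    ppd = response.max()                   # peak pupil dilation
    mpd = response.mean()                  # mean pupil dilation
    return ppd, mpd
```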


2021 ◽  
Vol 39 (1B) ◽  
pp. 1-10
Author(s):  
Iman H. Hadi ◽  
Alia K. Abdul-Hassan

Speaker recognition depends on specific predefined steps, the most important of which are feature extraction and feature matching. In addition, the category of the speaker's voice features has an impact on the recognition process. The proposed speaker recognition system makes use of biometric (voice) attributes to recognize the identity of the speaker. Long-term features were used, namely maximum frequency, pitch and zero crossing rate (ZCR). In the feature matching step, the fuzzy inner product between feature vectors was used to compute the matching value between a claimed speaker's voice utterance and test voice utterances. The experiments were implemented using the ELSDSR data set. These experiments showed that the recognition accuracy is 100% when using text-dependent speaker recognition.
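A minimal sketch of the long-term features listed above and of matching two feature vectors by an inner product; the normalised (cosine-style) product is a stand-in for the paper's fuzzy inner product, whose exact definition is not given in the abstract, and the pitch estimator is a deliberately crude autocorrelation peak picker.

```python
import numpy as np

def long_term_features(signal, fs):
    """Extract utterance-level features: maximum frequency, pitch estimate, ZCR."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    max_freq = freqs[np.argmax(spectrum)]                # dominant spectral peak
    # Crude pitch estimate: autocorrelation peak in the 60-400 Hz lag range.
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(fs / 400), int(fs / 60)
    pitch = fs / (lo + np.argmax(ac[lo:hi]))
    zcr = np.mean(np.abs(np.diff(np.sign(signal)))) / 2  # zero crossing rate
    return np.array([max_freq, pitch, zcr])

def match_score(claimed_vec, test_vec):
    """Normalised inner product between claimed and test feature vectors."""
    a, b = np.asarray(claimed_vec), np.asarray(test_vec)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```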


Geofluids ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Dongsheng Wang ◽  
Jun Feng ◽  
Xinpeng Zhao ◽  
Yeping Bai ◽  
Yujie Wang ◽  
...  

It is difficult to establish a method for recognizing the degree of water infiltration of a tunnel lining. To solve this problem, we propose a recognition method based on a deep convolutional neural network. We carry out laboratory tests, prepare cement mortar specimens with different saturation levels, simulate different degrees of infiltration of tunnel concrete linings, and establish an infrared thermal image data set covering these degrees of infiltration. Then, based on a deep learning method, the data set is trained using the Faster R-CNN+ResNet101 network, and a recognition model is established. The experiments show that the recognition model established by the deep learning method can identify cement mortar specimens with different degrees of infiltration and localize them with tightly fitted rectangular bounding boxes. This demonstrates that the classification and recognition model for tunnel concrete lining infiltration established by the indoor experimental method has high recognition accuracy.
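A minimal sketch of fine-tuning a Faster R-CNN detector on thermal-image data, in the spirit of the Faster R-CNN+ResNet101 model described above; torchvision's readily available ResNet-50 FPN variant is used here as a stand-in for the ResNet-101 backbone, and the number of infiltration classes, image sizes and boxes are illustrative assumptions.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

NUM_CLASSES = 5  # hypothetical: background + four degrees of infiltration

model = fasterrcnn_resnet50_fpn(num_classes=NUM_CLASSES)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_step(images, targets):
    """One training step; `targets` holds per-image boxes and infiltration labels."""
    model.train()
    loss_dict = model(images, targets)   # in train mode the detector returns losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

# Hypothetical single-image batch: one box labelled with infiltration degree 2.
images = [torch.rand(3, 480, 640)]
targets = [{"boxes": torch.tensor([[50.0, 60.0, 300.0, 360.0]]),
            "labels": torch.tensor([2])}]
# train_step(images, targets)
```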


2020 ◽  
Vol 7 ◽  
Author(s):  
James Garforth ◽  
Barbara Webb

Forests present one of the most challenging environments for computer vision due to traits such as complex texture, rapidly changing lighting, and high dynamicity. Loop closure by place recognition is a crucial part of successfully deploying robotic systems to map forests for the purpose of automating conservation. Modern CNN-based place recognition systems like NetVLAD have reported promising results, but the datasets used to train and test them are primarily of urban scenes. In this paper, we investigate how well NetVLAD generalizes to forest environments and find that it outperforms state-of-the-art loop closure approaches. Finally, integrating NetVLAD with ORBSLAM2 and evaluating on a novel forest data set, we find that, although suitable locations for loop closure can be identified, the SLAM system is unable to resolve matched places with feature correspondences. We discuss additional considerations to be addressed in future work to deal with this challenging problem.
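A minimal sketch of how global place-recognition descriptors (such as NetVLAD outputs) can be used to propose loop-closure candidates for a SLAM system; the random stand-in descriptors, the similarity threshold and the temporal exclusion window are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def loop_closure_candidates(descriptors, sim_threshold=0.85, min_frame_gap=100):
    """Return (query_idx, match_idx, similarity) triples for likely revisited places.

    descriptors : (N, D) array of global image descriptors, in keyframe order
    """
    D = np.asarray(descriptors, dtype=float)
    D = D / (np.linalg.norm(D, axis=1, keepdims=True) + 1e-12)   # L2-normalise
    sims = D @ D.T                                               # cosine similarities
    candidates = []
    for i in range(len(D)):
        for j in range(i - min_frame_gap):   # ignore temporally close frames
            if sims[i, j] >= sim_threshold:
                candidates.append((i, j, float(sims[i, j])))
    return candidates

# Hypothetical usage with random stand-in descriptors for 500 keyframes.
rng = np.random.default_rng(0)
cands = loop_closure_candidates(rng.normal(size=(500, 4096)))
```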


Author(s):  
F. ROLI

Recently, a kind of structured neural network (SNN) explicitly devoted to multisensor image recognition and aimed at allowing the interpretation of the "network behavior" was presented in Ref. 1. Experiments reported in Ref. 1 pointed out that SNNs provide a trade-off between recognition accuracy and interpretation of the network behavior. In this paper, the combination of multiple SNNs, each trained on the same data set, is proposed as a means to improve recognition results while keeping the possibility of interpreting the network behavior. A simple method for interpreting the "collective behaviors" of such SNN ensembles is described. This interpretation method can be used to understand the different kinds of "solutions" learned by the SNNs belonging to an ensemble. In addition, compared with the interpretation method presented in Ref. 1, it is shown that the knowledge embodied in an SNN can be translated into a set of understandable "recognition rules". Experimental results on the recognition of multisensor remote-sensing images (optical and radar images) are reported in terms of both recognition accuracy and network-behavior interpretation. An additional experiment on a multisource remote-sensing data set is described to show that SNNs can also be effectively used for multisource recognition tasks.
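A minimal sketch of combining an ensemble of classifiers trained on the same data set, in the spirit of the SNN ensembles discussed above; averaging of per-member class posteriors is used here as an illustrative combination rule, not necessarily the paper's exact one.

```python
import numpy as np

def combine_ensemble(member_outputs):
    """Average per-member class posteriors and return the winning class per sample.

    member_outputs : (n_members, n_samples, n_classes) array of soft outputs
    """
    outputs = np.asarray(member_outputs, dtype=float)
    mean_posterior = outputs.mean(axis=0)   # collective "opinion" of the ensemble
    return mean_posterior.argmax(axis=1), mean_posterior

# Hypothetical: three SNNs classifying four pixels into three land-cover classes.
members = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]],
    [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.2, 0.5, 0.3], [0.5, 0.4, 0.1]],
    [[0.8, 0.1, 0.1], [0.1, 0.6, 0.3], [0.1, 0.4, 0.5], [0.4, 0.4, 0.2]],
])
labels, posterior = combine_ensemble(members)
```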


Author(s):  
B. H. Shekar ◽  
S. S. Bhat ◽  
A. Maysuradze

Iris code matching is an important stage of iris biometric systems: it compares the input iris code with stored patterns of enrolled iris codes and classifies the code into one of the classes so that the claim is accepted or rejected. Several classifier-based approaches have been proposed by researchers to improve recognition accuracy. In this paper, we discuss the factors affecting an iris classifier's performance and propose a reliability index for iris matching techniques to quantitatively measure the extent of system reliability, based on false acceptance and false rejection rates estimated using Monte Carlo simulation. Experiments are carried out on benchmark databases such as IITD, MMU v-2, CASIA v-4 Distance and UBIRIS v.2.
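A minimal sketch of estimating false acceptance and false rejection rates by Monte Carlo simulation over matcher score distributions; the Gaussian score model and the simple reliability index (one minus the average error rate) are illustrative assumptions, since the abstract does not spell out the exact definition of the proposed index.

```python
import numpy as np

def monte_carlo_reliability(threshold, n_trials=100_000, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical matcher: genuine comparisons score high, impostor comparisons low.
    genuine = rng.normal(loc=0.75, scale=0.10, size=n_trials)
    impostor = rng.normal(loc=0.35, scale=0.12, size=n_trials)
    frr = np.mean(genuine < threshold)     # genuine users wrongly rejected
    far = np.mean(impostor >= threshold)   # impostors wrongly accepted
    reliability = 1.0 - 0.5 * (far + frr)  # illustrative index: 1 - mean error rate
    return far, frr, reliability

far, frr, rel = monte_carlo_reliability(threshold=0.55)
```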


2021 ◽  
Vol 15 ◽  
Author(s):  
Lixing Huang ◽  
Jietao Diao ◽  
Hongshan Nie ◽  
Wei Wang ◽  
Zhiwei Li ◽  
...  

The memristor-based convolutional neural network (CNN) gives full play to the advantages of memristive devices, such as low power consumption, high integration density, and strong network recognition capability. Consequently, it is very suitable for building wearable embedded application systems and has broad application prospects in image classification, speech recognition, and other fields. However, limited by the manufacturing process of memristive devices, high-precision weight devices are currently difficult to apply at large scale. At the same time, high-precision neuron activation functions further increase the complexity of hardware implementation of the network. In response, this paper proposes a configurable full-binary convolutional neural network (CFB-CNN) architecture, whose inputs, weights, and neurons are all binary values. The neurons are proportionally configured to two modes for different non-ideal situations. The architecture's performance is verified on the MNIST data set, and the influence of device yield and resistance fluctuations under different neuron configurations on network performance is also analyzed. The results show that the recognition accuracy of the 2-layer network is about 98.2%. When the yield rate is about 64% and the hidden neuron mode is configured as −1 and +1, namely ±1 MD, the CFB-CNN architecture achieves about 91.28% recognition accuracy, whereas when the resistance variation is about 26% and the hidden neuron mode is configured as 0 and 1, namely 01 MD, it achieves about 93.43% recognition accuracy. Furthermore, memristors have been demonstrated to be among the most promising devices for neuromorphic computing because of their synaptic plasticity. Therefore, the memristor-based CFB-CNN architecture is SNN-compatible, which is verified in this paper by encoding pixel values with the number of pulses.
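A minimal sketch of the binarisation idea behind the CFB-CNN: weights and neuron activations are constrained to binary values, and the hidden-neuron mode can be configured as {−1, +1} (±1 MD) or {0, 1} (01 MD); the tiny fully connected layer below illustrates the quantisation rule only, not the paper's full memristor-mapped architecture.

```python
import numpy as np

def binarise_weights(w):
    """Map real-valued weights to {-1, +1} by their sign."""
    return np.where(np.asarray(w) >= 0, 1.0, -1.0)

def binary_activation(x, mode="pm1"):
    """Binary neuron activation: 'pm1' gives {-1, +1}, '01' gives {0, 1}."""
    x = np.asarray(x)
    if mode == "pm1":
        return np.where(x >= 0, 1.0, -1.0)
    return np.where(x >= 0, 1.0, 0.0)

# Hypothetical layer: binary inputs times binarised weights, then a binary activation.
rng = np.random.default_rng(0)
x = np.where(rng.random(16) > 0.5, 1.0, -1.0)        # binary input vector
W = binarise_weights(rng.normal(size=(8, 16)))       # binarised weight matrix
hidden_pm1 = binary_activation(W @ x, mode="pm1")    # ±1 MD configuration
hidden_01 = binary_activation(W @ x, mode="01")      # 01 MD configuration
```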


2018 ◽  
Vol 119 (9/10) ◽  
pp. 529-544 ◽  
Author(s):  
Ihab Zaqout ◽  
Mones Al-Hanjori

Purpose: The face recognition problem has a long history and significant practical importance, and it is one of the practical applications of pattern recognition theory: to automatically localize the face in an image and, if necessary, identify the person. Interest in the procedures underlying localization and recognition of individuals is considerable because of their wide practical application in areas such as security systems, verification, forensic expertise, teleconferencing, computer games, etc. This paper aims to recognize facial images efficiently. An averaged-feature based technique is proposed to reduce the dimensions of the multi-expression facial features. The classifier model is generated using a supervised learning algorithm, a back-propagation neural network (BPNN), implemented in MatLab R2017. The recognition rate and accuracy of the proposed methodology are comparable with other methods, such as principal component analysis and linear discriminant analysis, on the same data set. In total, 150 face subjects were selected from the Olivetti Research Laboratory (ORL) data set, resulting in a 95.6 per cent recognition rate and 85 per cent accuracy, and 165 face subjects from the Yale data set, resulting in a 95.5 per cent recognition rate and 84.4 per cent accuracy.
Design/methodology/approach: An averaged-feature based approach (dimension reduction) and a BPNN (to generate a supervised classifier).
Findings: The recognition rate is 95.6 per cent and the recognition accuracy is 85 per cent for the ORL data set, whereas the recognition rate is 95.5 per cent and the recognition accuracy is 84.4 per cent for the Yale data set.
Originality/value: An averaged-feature based method.
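A minimal sketch of the averaged-feature idea: the feature vectors of each subject's multiple expression images are averaged into one representative vector before a back-propagation network is trained; scikit-learn's MLPClassifier stands in for the paper's MatLab BPNN, and the random feature data below is purely illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def averaged_features(per_subject_features):
    """Average each subject's expression features into a single vector.

    per_subject_features : dict mapping subject id -> (n_expressions, n_dims) array
    """
    X, y = [], []
    for subject, feats in per_subject_features.items():
        X.append(np.asarray(feats, dtype=float).mean(axis=0))
        y.append(subject)
    return np.vstack(X), np.array(y)

# Hypothetical data: 10 subjects, 8 expression images each, 64-dimensional features.
rng = np.random.default_rng(0)
data = {s: rng.normal(loc=s, size=(8, 64)) for s in range(10)}
X, y = averaged_features(data)

bpnn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
bpnn.fit(X, y)   # back-propagation training of the classifier
```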

