Evaluation of Feature Learning Methods for Voice Disorder Detection

2019 ◽  
Vol 13 (04) ◽  
pp. 453-470
Author(s):  
Hongzhao Guan ◽  
Alexander Lerch

Voice disorders are a frequently encountered health issue. Many people, however, either cannot afford to visit a professional doctor or neglect to take good care of their voice. Previous research has shown that voice disorders can be detected by applying machine learning to acoustic features extracted from voice recordings, which makes it possible to give a patient a preliminary diagnosis without professional medical devices. Considering the increasing popularity of deep learning, feature learning and transfer learning, this study explores the possibilities of using these methods to assign voice recordings to one of two classes: Normal and Pathological. While the results show the general viability of deep learning and feature learning for the automatic recognition of voice disorders, they also lead to discussions on how to choose a pre-trained model when using transfer learning for this task. Furthermore, the results demonstrate the shortcomings of the existing datasets for voice disorder detection, such as insufficient dataset size and lack of generality.

2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
Hao Zhang ◽  
Qiang Zhang ◽  
Siyu Shao ◽  
Tianlin Niu ◽  
Xinyu Yang ◽  
...  

Deep learning has a strong feature learning ability, which has proved effective in fault prediction and remaining useful life (RUL) prediction for rotating machinery. However, training a deep network from scratch requires a large amount of training data and is time-consuming. In practice, a deep model struggles to converge when its parameters are poorly initialized, which results in poor prediction performance. In this paper, a novel deep learning framework is proposed to predict the remaining useful life of rotating machinery with high accuracy. First, the parameters and feature learning ability of a pretrained model are transferred to the new network by means of transfer learning to achieve a reasonable initialization. Then, the sensor signals are converted to RGB images, which serve as task-specific data for fine-tuning the parameters of the high-level network layers. The features extracted from the pretrained network are then fed into a Bidirectional Long Short-Term Memory (BiLSTM) network to obtain the RUL predictions. The ability of LSTM to model sequential signals, together with the ability of bidirectional propagation to learn temporal information dynamically, contributes to accurate RUL prediction. Finally, the proposed model is tested on sensor signal datasets from bearings and gearboxes. The high-accuracy predictions show the superiority of the transfer learning-based sequential network in RUL prediction.
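The abstract does not say how the sensor signals are turned into RGB images. As a rough illustration only, the sketch below assumes one simple possibility: min-max scaling a window of samples to 0..255, reshaping it into a square grid, and copying the intensity into all three channels. The function name `signal_to_rgb_image` and the `side` parameter are hypothetical and do not come from the paper.

```python
def signal_to_rgb_image(signal, side):
    """Map the first side*side samples of a 1-D signal to a side x side RGB image.

    Assumed encoding (not taken from the paper): min-max scaling to 0..255,
    row-by-row reshaping, and the same intensity copied into R, G and B.
    """
    needed = side * side
    if len(signal) < needed:
        raise ValueError("signal too short for the requested image size")
    window = signal[:needed]
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    # Scale every sample to an integer intensity between 0 and 255.
    levels = [round(255 * (x - lo) / span) for x in window]
    # Reshape row by row; each pixel is an (R, G, B) tuple.
    return [
        [(levels[r * side + c],) * 3 for c in range(side)]
        for r in range(side)
    ]


# Four samples become a 2 x 2 image.
image = signal_to_rgb_image([0.0, 0.5, 1.0, 0.25], side=2)
```

An image in this form has the three-channel layout that networks pretrained on natural images expect, which is presumably why an RGB encoding is used before fine-tuning.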


2022 ◽  
Vol 22 (1) ◽  
pp. 1-16
Author(s):  
Laura Verde ◽  
Nadia Brancati ◽  
Giuseppe De Pietro ◽  
Maria Frucci ◽  
Giovanna Sannino

Edge Analytics and Artificial Intelligence are important features of the current smart connected living community. In a society where people, homes, cities, and workplaces are simultaneously connected through various devices, primarily mobile devices, a considerable amount of data is exchanged, and the processing and storage of these data are laborious and difficult tasks. Edge Analytics allows the collection and analysis of such data on mobile devices, such as smartphones and tablets, without involving any cloud-centred architecture that cannot guarantee real-time responsiveness. Meanwhile, Artificial Intelligence techniques can constitute a valid instrument to process data, limiting the computation time and optimising decisional processes and predictions in several sectors, such as healthcare. Within this field, this article proposes an approach for evaluating the condition of a person's voice. A fully automatic algorithm, based on Deep Learning, classifies a voice as healthy or pathological by analysing spectrogram images extracted from a recording of the vowel /a/, in compliance with the traditional medical protocol. A light Convolutional Neural Network is embedded in a mobile health application in order to provide an instrument capable of assessing voice disorders in a fast, easy, and portable way. Thus, a straightforward mobile device becomes a screening tool useful for the early diagnosis, monitoring, and treatment of voice disorders. The proposed approach has been tested on a broad set of voice samples, not limited to the most common voice diseases but including all the pathologies present in three different databases, achieving F1-scores on the testing set of 80%, 90%, and 73%. Although the proposed network consists of a reduced number of layers, the results are very competitive compared to those of other “cutting edge” approaches constructed using more complex neural networks, and compared to classic deep neural networks such as VGG-16 and ResNet-50.
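For readers unfamiliar with the metric reported above, the following sketch shows how an F1-score can be computed for a two-class healthy/pathological task. The labels and predictions are made-up toy values, not data from the article, and treating "pathological" as the positive class is an assumption.

```python
def f1_score(true_labels, predicted_labels, positive="pathological"):
    """Return the F1-score (harmonic mean of precision and recall) for one class."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive and truth == positive:
            tp += 1  # correctly flagged as pathological
        elif pred == positive:
            fp += 1  # healthy voice flagged as pathological
        elif truth == positive:
            fn += 1  # pathological voice missed
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical labels): 2 true positives, 1 false positive, 1 miss.
truth = ["pathological", "pathological", "healthy", "healthy", "pathological"]
preds = ["pathological", "healthy", "healthy", "pathological", "pathological"]
score = f1_score(truth, preds)
```

Because F1 combines precision and recall, it is less flattering than plain accuracy when one class dominates the test set, which is common in pathological voice databases.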


Diagnostics ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1972
Author(s):  
Abul Bashar ◽  
Ghazanfar Latif ◽  
Ghassen Ben Brahim ◽  
Nazeeruddin Mohammad ◽  
Jaafar Alghazo

It has become apparent that mankind has to learn to live with and adapt to COVID-19, especially because the vaccines developed thus far do not prevent infection but rather just reduce the severity of the symptoms. The manual classification and diagnosis of COVID-19 pneumonia requires specialized personnel and is time-consuming and very costly. Automatic diagnosis, on the other hand, would allow for real-time diagnosis without human intervention, resulting in reduced costs. Therefore, the objective of this research is to propose a novel optimized Deep Learning (DL) approach for the automatic classification and diagnosis of COVID-19 pneumonia using X-ray images. For this purpose, a publicly available dataset of chest X-rays on Kaggle was used in this study. The dataset was developed over three stages in a quest to have a unified COVID-19 entities dataset available for researchers. The dataset consists of 21,165 anterior-to-posterior and posterior-to-anterior chest X-ray images classified as: Normal (48%), COVID-19 (17%), Lung Opacity (28%) and Viral Pneumonia (6%). Data Augmentation was also applied to increase the dataset size and to enhance the reliability of the results by preventing overfitting. An optimized DL approach is implemented in which chest X-ray images go through a three-stage process: Image Enhancement is performed in the first stage, Data Augmentation in the second, and in the final stage the results are fed to the Transfer Learning algorithms (AlexNet, GoogleNet, VGG16, VGG19, and DenseNet), where the images are classified and diagnosed. Extensive experiments were performed under various scenarios; the highest classification accuracy, 95.63%, was achieved by applying the VGG16 transfer learning algorithm with frozen weights to the augmented, enhanced dataset. This accuracy was found to be better than the results reported by other methods in the recent literature. Although the results achieved so far are promising, further work is planned to correlate the results of the proposed approach with clinical observations to further enhance the efficiency and accuracy of COVID-19 diagnosis.
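To make the class balance of the dataset concrete, the sketch below converts the percentages quoted in the abstract into approximate image counts. The percentages are rounded in the abstract (they add up to 99%), so the counts are estimates and not figures reported by the authors.

```python
TOTAL_IMAGES = 21165  # dataset size stated in the abstract

# Class shares in percent, as quoted in the abstract (rounded values).
CLASS_PERCENT = {
    "Normal": 48,
    "COVID-19": 17,
    "Lung Opacity": 28,
    "Viral Pneumonia": 6,
}

# Approximate number of images per class implied by the rounded percentages.
approx_counts = {
    name: round(TOTAL_IMAGES * percent / 100)
    for name, percent in CLASS_PERCENT.items()
}

# Ratio between the largest and smallest class, a rough measure of imbalance.
imbalance_ratio = max(approx_counts.values()) / min(approx_counts.values())

for name, count in approx_counts.items():
    print(f"{name}: about {count} images")
```

On these estimates the Normal class is roughly eight times larger than Viral Pneumonia, which helps explain why data augmentation was applied before training.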


2017 ◽  
Vol 2 (3) ◽  
pp. 49-56
Author(s):  
Jana Childes ◽  
Alissa Acker ◽  
Dana Collins

Children with voice disorders are typically a low-incidence population in the average caseload of clinicians working within school and general clinic settings. This occurs despite evidence of a fairly high prevalence of childhood voice disorders and the multiple impacts a voice disorder may have on a child's social development, the perception of the child by others, and the child's academic success. There are multiple barriers that affect the identification of children with abnormal vocal qualities and their access to services. These include the reliance on school personnel, the ability of parents and caretakers to identify abnormal vocal qualities and signs of misuse, access to specialized medical services for appropriate diagnosis and treatment planning, and issues related to Speech-Language Pathologists' perception of their skills and competence regarding voice management for pediatric populations. These barriers and possible solutions to them are discussed with perspectives from the school, clinic and university settings.


2020 ◽  
Author(s):  
Pathikkumar Patel ◽  
Bhargav Lad ◽  
Jinan Fiaidhi

During the last few years, RNN models have been used extensively and have proven well suited to sequence and text data. RNNs have achieved state-of-the-art performance in several applications, such as text classification, sequence-to-sequence modelling and time series forecasting. In this article we review different Machine Learning and Deep Learning based approaches for text data and look at the results obtained from these methods. This work also explores the use of transfer learning in NLP and how it affects the performance of models on a specific application: sentiment analysis.

