BACKGROUND
Dysphonia influences the quality of life by interfering with communication. However, laryngoscopic examination is expensive and not readily accessible in primary care units. Experienced laryngologists are required to achieve an accurate diagnosis.
OBJECTIVE
This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence.
METHODS
We collected 29 normal voice samples and 527 samples of individuals with voice disorders, including vocal atrophy (n=210), unilateral vocal paralysis (n=43), organic vocal fold lesions (n=244), and adductor spasmodic dysphonia (n=30). The 556 samples were divided into two sets: 440 samples as the training set and 116 samples as the testing set. A convolutional neural network approach was applied to train the model and findings were compared with human specialists.
RESULTS
The convolutional neural network model achieved a sensitivity of 0.70, a specificity of 0.90, and an overall accuracy of 65.5% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. Compared to human specialists, the overall accuracy was 58.6% and 49.1% for the two laryngologists, and 38.8% and 34.5% for the two general ear, nose, and throat doctors.
CONCLUSIONS
We developed an artificial intelligence-based screening tool for common vocal fold diseases, which possessed high specificity after training with our Mandarin pathological voice database. This approach has clinical potential to use artificial intelligence for general vocal fold disease screening via voice and includes a quick survey during a general health examination. It can be applied in telemedicine for areas that lack laryngoscopic abilities in primary care units.