Speaker Segmentation
Recently Published Documents

Total documents: 102 (five years: 6)
H-index: 11 (five years: 1)

2021, pp. 1-17
Author(s): Sethuram V, Ande Prasad, R. Rajeswara Rao

Speaker diarization plays a pivotal role in speech technology. In general, speaker diarization is the process of partitioning an input audio stream into homogeneous segments according to speaker identity. It improves the readability of automatic transcription because it organizes the audio stream into speaker turns and often provides the true speaker identity. In this research work, a novel speaker diarization approach is introduced in three major phases: feature extraction, Speech Activity Detection (SAD), and speaker segmentation and clustering. Initially, Mel Frequency Cepstral Coefficient (MFCC) based features are extracted from the collected input audio stream (Telugu language). Subsequently, in the Speech Activity Detection (SAD) phase, music and silence signals are removed. The acquired speech signals are then segmented for each individual speaker. Finally, the segmented signals are subjected to a speaker clustering process based on an optimized Convolutional Neural Network (CNN). To make the clustering more appropriate, the weights and activation function of the CNN are fine-tuned by a new Self Adaptive Sea Lion Algorithm (SA-SLnO). A comparative analysis is then carried out to exhibit the superiority of the proposed speaker diarization work. Accordingly, the accuracy of the proposed method is 0.8073, which exceeds the existing works by 5.255, 2.45%, and 0.075, respectively.
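As a rough illustration of the front end described above, the following Python sketch computes MFCC features and applies a crude energy-based speech activity mask. It assumes the librosa library; the frame sizes, the dB threshold, and the helper names extract_mfcc and simple_sad are illustrative choices, not the authors' configuration.

```python
# Minimal sketch of MFCC feature extraction plus a simple energy-based SAD,
# assuming librosa. Parameter values are illustrative only.
import numpy as np
import librosa


def extract_mfcc(path, sr=16000, n_mfcc=13):
    """Load an audio file and return frame-level MFCC features (frames x n_mfcc)."""
    audio, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160)  # 25 ms windows, 10 ms hop
    return mfcc.T


def simple_sad(path, sr=16000, threshold_db=-35.0):
    """Crude speech activity detection: keep frames whose RMS energy is above a dB floor."""
    audio, _ = librosa.load(path, sr=sr)
    rms = librosa.feature.rms(y=audio, frame_length=400, hop_length=160)[0]
    db = librosa.amplitude_to_db(rms, ref=np.max)
    return db > threshold_db  # boolean mask of speech frames
```

Speaker segmentation and clustering would then operate on the masked frames, for example by detecting change points in the MFCC stream and grouping the resulting segments with the optimized CNN.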


2020, Vol. 92 (1), pp. 12-15
Author(s): Ronan Peter Daniel O'Malley, Bahman Mirheidari, Kirsty Harkness, Markus Reuber, Annalena Venneri, ...

Introduction: Recent years have seen an almost sevenfold rise in referrals to specialist memory clinics. This has been associated with an increased proportion of patients referred with functional cognitive disorder (FCD), that is, non-progressive cognitive complaints. These patients are likely to benefit from a range of interventions (e.g., psychotherapy) distinct from the requirements of patients with neurodegenerative cognitive disorders. We have developed a fully automated system, 'CognoSpeak', which enables risk stratification at the primary-secondary care interface and ongoing monitoring of patients with memory concerns.

Methods: We recruited 15 participants to each of four groups: Alzheimer's disease (AD), mild cognitive impairment (MCI), FCD and healthy controls. Participants responded to 12 questions posed by a computer-presented talking head. Automatic analysis of the audio and speech data involved speaker segmentation, automatic speech recognition and machine learning classification.

Results: CognoSpeak distinguished participants in the AD or MCI groups from those in the FCD or healthy control groups with a sensitivity of 86.7%. Patients with MCI were identified with a sensitivity of 80%.

Discussion: Our fully automated system achieved levels of accuracy comparable to currently available, manually administered assessments. Greater accuracy should be achievable through further system training with more users, the inclusion of verbal fluency tasks and repeat assessments. The current data support CognoSpeak's promise as a screening and monitoring tool for patients with MCI. Pending confirmation of these findings, it may allow clinicians to offer patients at low risk of dementia earlier reassurance and relieve pressures on specialist memory services.
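To make the final classification and sensitivity reporting concrete, here is a minimal sketch of how per-participant feature vectors could be classified into dementia-spectrum (AD/MCI) versus FCD/control groups and a sensitivity computed. It uses scikit-learn on synthetic placeholder data; the feature set, classifier, and cross-validation scheme are assumptions for illustration, not the CognoSpeak implementation.

```python
# Sketch of the downstream classification step: placeholder features, not CognoSpeak data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 24))        # 60 participants x 24 illustrative features
y = np.repeat([1, 1, 0, 0], 15)      # 1 = AD/MCI, 0 = FCD/healthy control (15 per group)

clf = LogisticRegression(max_iter=1000)
pred = cross_val_predict(clf, X, y, cv=5)   # cross-validated predictions
sensitivity = recall_score(y, pred)         # recall on the AD/MCI class
print(f"sensitivity: {sensitivity:.1%}")
```

In practice the feature vectors would come from the speaker-segmented, transcribed responses to the talking-head questions, and sensitivity would be reported per diagnostic group as in the abstract.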


Author(s): Yi Xin Sun, Yong Ma, Kai Bo Shi, Jiang Ping Hu, Yi Yi Zhao, ...
