Robust speech detection and segmentation for real-time ASR applications

Author(s):  
I. Shafran ◽  
R. Rose
2019 ◽  
Author(s):  
Chi-Te Wang ◽  
Ji-Yan Han ◽  
Shih-Hau Fang ◽  
Ying-Hui Lai

BACKGROUND Voice disorders mainly result from chronic overuse or abuse, particularly in occupational voice users such as teachers. Previous studies proposed a contact microphone attached to the anterior neck for ambulatory voice monitoring; however, the inconvenience associated with taping and wiring, along with the lack of real-time processing, has limited its clinical application. OBJECTIVE This study aims to (1) propose an automatic speech detection system using wireless microphones for real-time ambulatory voice monitoring, (2) examine the detection accuracy under controlled and noisy conditions, and (3) report the results of the phonation ratio in practical scenarios. METHODS We designed an adaptive threshold function to detect the presence of speech based on the energy envelope. We invited 10 teachers to participate in this study and tested the performance of the proposed automatic speech detection system regarding detection accuracy and phonation ratio. Moreover, we investigated whether an unsupervised noise reduction algorithm (i.e., log minimum mean square error) can overcome the influence of environmental noise in the proposed system. RESULTS The proposed system exhibited an average speech detection accuracy of 89.9%, ranging from 81.0% (67,357/83,157 frames) to 95.0% (199,201/209,685 frames). Subsequent analyses revealed a phonation ratio between 44.0% (33,019/75,044 frames) and 78.0% (68,785/88,186 frames) during teaching sessions of 40-60 minutes; most phonation segments lasted less than 10 seconds. Background noise reduced the accuracy of the automatic speech detection system, and an adjuvant noise reduction function could effectively improve the accuracy, especially under stable noise conditions. CONCLUSIONS This study demonstrated an average detection accuracy of 89.9% in the proposed automatic speech detection system with wireless microphones. The preliminary results for the phonation ratio were comparable to those of previous studies. Although the wireless microphones are susceptible to background noise, an additional noise reduction function can alleviate this limitation. These results indicate that the proposed system can be applied for ambulatory voice monitoring in occupational voice users.
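The abstract does not specify the adaptive threshold function, so the following is only a minimal sketch of the general idea: compute a frame-wise energy envelope, track a running noise-floor estimate, and flag frames whose energy exceeds a multiple of that floor. The function names, the `ratio` and `alpha` parameters, the 10 ms frame length, and the synthetic test signal are all assumptions made for illustration, not the authors' method. The phonation ratio is then the fraction of frames flagged as speech.

```python
import math
import random


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_speech(energies, ratio=4.0, alpha=0.95):
    """Flag frames whose energy exceeds `ratio` times a running noise floor.

    Assumption: the first frame contains no speech and seeds the noise floor.
    The floor is updated (exponential smoothing) only on non-speech frames,
    so the threshold adapts to slow changes in background noise.
    """
    if not energies:
        return []
    noise = max(energies[0], 1e-12)
    flags = []
    for e in energies:
        is_speech = e > ratio * noise
        if not is_speech:
            noise = alpha * noise + (1 - alpha) * e
        flags.append(is_speech)
    return flags


def phonation_ratio(flags):
    """Fraction of frames classified as speech."""
    return sum(flags) / len(flags) if flags else 0.0


def synth_signal(sr=8000, seed=0):
    """Hypothetical test signal: 1 s noise, 1 s of a 200 Hz tone in noise, 1 s noise."""
    rng = random.Random(seed)
    out = []
    for n in range(3 * sr):
        x = rng.gauss(0.0, 0.01)
        if sr <= n < 2 * sr:
            x += 0.5 * math.sin(2 * math.pi * 200 * n / sr)
        out.append(x)
    return out


def demo():
    energies = frame_energies(synth_signal(), frame_len=80)  # 10 ms frames at 8 kHz
    flags = detect_speech(energies)
    return flags, phonation_ratio(flags)


if __name__ == "__main__":
    flags, ratio = demo()
    print(f"{sum(flags)} of {len(flags)} frames flagged; phonation ratio = {ratio:.3f}")
```

The percentages reported in the abstract are ratios of exactly this kind; for example, 33,019 speech frames out of 75,044 gives a phonation ratio of 44.0%. A real deployment would also need the noise reduction front end that the abstract mentions, which this sketch omits.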


10.2196/16746 ◽  
2020 ◽  
Vol 8 (12) ◽  
pp. e16746
Author(s):  
Chi-Te Wang ◽  
Ji-Yan Han ◽  
Shih-Hau Fang ◽  
Ying-Hui Lai



1979 ◽  
Vol 44 ◽  
pp. 41-47
Author(s):  
Donald A. Landman

This paper describes some recent results of our quiescent prominence spectrometry program at the Mees Solar Observatory on Haleakala. The observations were made with the 25 cm coronagraph/coudé spectrograph system using a silicon vidicon detector. This detector consists of 500 contiguous channels covering approximately 6 or 80 Å, depending on the grating used. The instrument is interfaced to the Observatory’s PDP 11/45 computer system, and has the important advantages of wide spectral response, linearity and signal-averaging with real-time display. Its principal drawback is the relatively small target size. For the present work, the aperture was about 3″ × 5″. Absolute intensity calibrations were made by measuring quiet regions near sun center.


Author(s):  
Alan S. Rudolph ◽  
Ronald R. Price

We have employed cryoelectron microscopy with real-time video capture to visualize events that occur during the freeze-drying of artificial membranes. Artificial membranes, or liposomes, are spherical structures with an internal aqueous space; they are stabilized by water, which provides the driving force for their spontaneous self-assembly. Previous assays of freeze-drying damage to these structures reveal that the two principal deleterious events are 1) fusion of liposomes and 2) leakage of contents trapped within the liposome [1]. In the past, the only way to assess these events was to examine the liposomes after dehydration. This technique allows the event to be monitored in real time as the liposomes destabilize and as water is sublimed at cryo temperatures in the vacuum of the microscope. The mechanisms by which liposomes are compromised by freeze-drying are largely unknown. This technique has shown that cryoprotectants such as glycerol and carbohydrates are able to maintain liposomal structure throughout the drying process.


Author(s):  
R.P. Goehner ◽  
W.T. Hatfield ◽  
Prakash Rao

Computer programs are now available in various laboratories for the indexing and simulation of transmission electron diffraction patterns. Although these programs address various aspects of the indexing and simulation process, the ultimate goal is to perform real-time diffraction pattern analysis directly from the imaging screen of the transmission electron microscope. The program to be described in this paper represents one step prior to real-time analysis. It involves the combination of two programs, described in an earlier paper (1), into a single program for use on an interactive basis with a minicomputer. In our case, the minicomputer is an INTERDATA 70 equipped with a Tektronix 4010-1 graphical display terminal and hard copy unit. A simplified flow diagram of the combined program, written in Fortran IV, is shown in Figure 1. It consists of two programs, INDEX and TEDP, which index and simulate electron diffraction patterns, respectively. The user has the option of choosing either the indexing or simulating aspects of the combined program.


Author(s):  
R. Rajesh ◽  
R. Droopad ◽  
C. H. Kuo ◽  
R. W. Carpenter ◽  
G. N. Maracas

Knowledge of material pseudodielectric functions at MBE growth temperatures is essential for achieving in situ, real-time growth control. This allows us to accurately monitor and control the thicknesses of the layers during growth. Undesired effusion cell temperature fluctuations during growth can thus be compensated for in real time by spectroscopic ellipsometry. The accuracy in determining pseudodielectric functions is increased if one does not need to apply a structure model to correct for the presence of an unknown surface layer such as a native oxide. Performing these measurements in an MBE reactor on as-grown material gives us this advantage. Thus, a simple three-phase model (vacuum/thin film/substrate) can be used to obtain thin film data without uncertainties arising from a surface oxide layer of unknown composition and temperature dependence. In this study, we obtain the pseudodielectric functions of MBE-grown AlAs from growth temperature (650°C) to room temperature (30°C). The profile of the wavelength-dependent function from the ellipsometry data indicated a rough surface after growth of 0.5 μm of AlAs at a substrate temperature of 600°C, which is typical for MBE growth of GaAs.


Author(s):  
K. Harada ◽  
T. Matsuda ◽  
J.E. Bonevich ◽  
M. Igarashi ◽  
S. Kondo ◽  
...  

Previous observations of magnetic flux-lines (vortex lattices) in superconductors, such as the field distribution of a flux-line, and flux-line dynamics activated by heat and current, have employed the high spatial resolution and magnetic sensitivity of electron holography. Recently, the 2-D static distribution of vortices was also observed by this technique. However, real-time observations of the vortex lattice, in spite of scientific and technological interest, have not been possible due to experimental difficulties. Here, we report the real-time observation of vortex lattices in a thin superconductor, by means of Lorentz microscopy using a 300 kV field emission electron microscope. This technique allows us to observe the dynamic motion of individual vortices and record the events on a VTR system. The experimental arrangement is shown in Fig. 1. A Nb thin film for transmission observation was prepared by chemical etching. The grain size of the film was increased by annealing, and single crystals were observed with a thickness of 50∼90 nm.


2001 ◽  
Vol 7 (S2) ◽  
pp. 1012-1013
Author(s):  
Uyen Tram ◽  
William Sullivan

Embryonic development is a dynamic event and is best studied in live animals in real time. Much of our knowledge of the early events of embryogenesis, however, comes from immunofluorescent analysis of fixed embryos. While these studies provide an enormous amount of information about the organization of different structures during development, they can give only a static glimpse of a very dynamic event. More recently, real-time fluorescent studies of living embryos have become much more routine and have given new insights into how different structures and organelles (chromosomes, centrosomes, cytoskeleton, etc.) are coordinately regulated. This is in large part due to the development of commercially available fluorescent probes, GFP technology, and newly developed sensitive fluorescent microscopes. For example, live confocal fluorescent analysis proved essential in determining the primary defect in mutations that disrupt early nuclear divisions in Drosophila melanogaster. For organisms in which GFP transgenics is not available, fluorescent probes that label DNA, microtubules, and actin are available for microinjection.


2019 ◽  
Vol 4 (2) ◽  
pp. 356-362
Author(s):  
Jennifer W. Means ◽  
Casey McCaffrey

Purpose The use of real-time recording technology for clinical instruction allows student clinicians to more easily collect data, self-reflect, and move toward independence as supervisors continue to provide supportive methods. This article discusses how the use of high-definition real-time recording, Bluetooth technology, and embedded annotation may enhance the supervisory process. It also reports results of graduate students' perception of the benefits of, and satisfaction with, the types of technology used. Method Survey data were collected from graduate students about their use and perceived benefits of advanced technology to support supervision during their 1st clinical experience. Results Survey results indicate that students found their video recordings useful for self-evaluation, data collection, and therapy preparation. The students also perceived an increase in self-confidence through the use of the Bluetooth headsets, as their supervisors could provide guidance and encouragement without interrupting the flow of their therapy sessions by entering the room to redirect them. Conclusions The use of video recording technology can provide opportunities for students to review videos of prospective clients they will be treating, to review their own treatment videos for self-assessment, and to collect additional data. Bluetooth technology provides immediate communication between the clinical educator and the student. Students reported that this communication can improve their self-confidence, perceived performance, and subsequent shift toward independence.

