A Flexible Analysis Tool for the Quantitative Acoustic Assessment of Infant Cry

2013 ◽  
Vol 56 (5) ◽  
pp. 1416-1428 ◽  
Author(s):  
Brian Reggiannini ◽  
Stephen J. Sheinkopf ◽  
Harvey F. Silverman ◽  
Xiaoxue Li ◽  
Barry M. Lester

Purpose In this article, the authors describe and validate the performance of a modern acoustic analyzer specifically designed for infant cry analysis. Method Utilizing known algorithms, the authors developed a method to extract acoustic parameters describing infant cries from standard digital audio files. They used a frame length of 25 ms with a frame advance of 12.5 ms. Cepstral-based acoustic analysis proceeded in 2 phases, computing frame-level data and then organizing and summarizing this information within cry utterances. Using signal detection methods, the authors evaluated the accuracy of the automated system to determine voicing and to detect fundamental frequency (F0) as compared to voiced segments and pitch periods manually coded from spectrogram displays. Results The system detected F0 with 88% to 95% accuracy, depending on tolerances set at 10 to 20 Hz. Receiver operating characteristic analyses demonstrated very high accuracy at detecting voicing characteristics in the cry samples. Conclusions This article describes an automated infant cry analyzer with high accuracy to detect important acoustic features of cry. A unique and important aspect of this work is the rigorous testing of the system's accuracy as compared to ground-truth manual coding. The resulting system has implications for basic and applied research on infant cry development.
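A minimal sketch of the general technique the abstract describes, frame-based cepstral F0 estimation with 25 ms frames and a 12.5 ms advance, is shown below; the search range and other parameters are illustrative assumptions, not the authors' implementation.

```python
# Sketch of frame-based cepstral F0 estimation (25 ms frames, 12.5 ms advance).
# The F0 search range and windowing are assumptions for illustration only.
import numpy as np

def cepstral_f0(signal, sr, frame_ms=25.0, hop_ms=12.5, fmin=150.0, fmax=900.0):
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    window = np.hamming(frame)
    f0_track = []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame] * window
        spectrum = np.abs(np.fft.rfft(x)) + 1e-10
        cepstrum = np.fft.irfft(np.log(spectrum))
        # Search the quefrency range corresponding to plausible cry F0 values.
        qmin, qmax = int(sr / fmax), int(sr / fmin)
        peak = qmin + np.argmax(cepstrum[qmin:qmax])
        f0_track.append(sr / peak)
    return np.array(f0_track)
```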

2011 ◽  
Vol 21 (2) ◽  
pp. 44-54
Author(s):  
Kerry Callahan Mandulak

Spectral moment analysis (SMA) is an acoustic analysis tool that shows promise for enhancing our understanding of normal and disordered speech production. It can augment auditory-perceptual analysis used to investigate differences across speakers and groups and can provide unique information regarding specific aspects of the speech signal. The purpose of this paper is to illustrate the utility of SMA as a clinical measure for both clinical speech production assessment and research applications documenting speech outcome measurements. Although acoustic analysis has become more readily available and accessible, clinicians need training with, and exposure to, acoustic analysis methods in order to integrate them into traditional methods used to assess speech production.
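As a rough illustration of what SMA computes, the sketch below derives the first four spectral moments (centroid, variance, skewness, excess kurtosis) from one windowed frame; pre-emphasis and window choices vary across studies and are assumptions here.

```python
# Illustrative spectral moments from a single frame of audio samples.
import numpy as np

def spectral_moments(frame, sr):
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    p = spectrum / spectrum.sum()                   # power spectrum as a distribution
    m1 = np.sum(freqs * p)                          # first moment: centroid
    m2 = np.sum((freqs - m1) ** 2 * p)              # second moment: variance
    sd = np.sqrt(m2)
    m3 = np.sum((freqs - m1) ** 3 * p) / sd ** 3    # third moment: skewness
    m4 = np.sum((freqs - m1) ** 4 * p) / sd ** 4 - 3  # fourth moment: excess kurtosis
    return m1, m2, m3, m4
```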


2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Xiang Li ◽  
Jianzheng Liu ◽  
Jessica Baron ◽  
Khoa Luu ◽  
Eric Patterson

Abstract Recent attention to facial alignment and landmark detection methods, particularly with the application of deep convolutional neural networks, has yielded notable improvements. Neither these neural-network-based nor more traditional methods, though, have been tested systematically for performance differences due to camera-lens focal length or camera viewing angle of subjects across the viewing hemisphere. This work uses photo-realistic, synthesized facial images with varying parameters and corresponding ground-truth landmarks to enable comparison of alignment and landmark detection techniques with respect to general performance, performance across focal length, and performance across viewing angle. Recently published high-performing methods along with traditional techniques are compared with regard to these aspects.
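For context, benchmarks of this kind typically score predictions with a normalized landmark error; the sketch below shows one common variant (normalized mean error with inter-ocular normalization) and assumes the 68-point landmark scheme, which is not necessarily the convention used in this paper.

```python
# Normalized mean error (NME) between predicted and ground-truth landmarks,
# normalized by inter-ocular distance. The 68-point indices are an assumption.
import numpy as np

def normalized_mean_error(pred, gt, left_eye_idx=36, right_eye_idx=45):
    # pred, gt: arrays of shape (68, 2) holding (x, y) landmark coordinates
    interocular = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    return np.mean(np.linalg.norm(pred - gt, axis=1)) / interocular
```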


AI Magazine ◽  
2014 ◽  
Vol 35 (1) ◽  
pp. 53
Author(s):  
Hadrien Cambazard ◽  
Barry O'Sullivan ◽  
Helmut Simonis

We describe a constraint-based timetabling system that was developed for the dental school based at Cork University Hospital in Ireland. This system has been deployed since 2010. Dental school timetabling differs from other university course scheduling in that certain clinic sessions can be used by multiple courses at the same time, provided a limit on room capacity is satisfied. Starting from a constraint programming solution using a web interface, we have moved to a mixed integer programming-based solver to deal with multiple objective functions, along with a dedicated Java application, which provides a rich user interface. Solutions for the years 2010, 2011 and 2012 have been used in the dental school, replacing a manual timetabling process, which could no longer cope with increasing student numbers and resulting resource bottlenecks. The use of the automated system allowed the dental school to increase the number of students enrolled to the maximum possible given the available resources. It also provides the school with a valuable “what-if” analysis tool.
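The distinguishing constraint described here, multiple courses sharing a clinic session subject to a room-capacity limit, can be written as a small mixed integer program. The toy sketch below (using PuLP, with invented course and session data) only illustrates that constraint; it is not the deployed system.

```python
# Toy MIP sketch: courses may share a clinic session if total enrolment fits
# the room capacity. All data and the objective are placeholders.
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary

courses = {"Restorative": 12, "Ortho": 8, "Perio": 10}   # course -> number of students
sessions = {"Mon_AM": 20, "Mon_PM": 20}                  # clinic session -> chair capacity

prob = LpProblem("dental_timetable", LpMinimize)
x = LpVariable.dicts("assign", [(c, s) for c in courses for s in sessions], cat=LpBinary)

# Each course is scheduled into exactly one clinic session.
for c in courses:
    prob += lpSum(x[(c, s)] for s in sessions) == 1

# Shared sessions: summed enrolment of all courses in a session stays within capacity.
for s, cap in sessions.items():
    prob += lpSum(courses[c] * x[(c, s)] for c in courses) <= cap

# Placeholder objective; the real system balances several objective functions.
prob += lpSum(x[(c, s)] for c in courses for s in sessions)
prob.solve()
```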


2021 ◽  
Vol 4 ◽  
Author(s):  
Rolando Coto-Solano ◽  
James N. Stanford ◽  
Sravana K. Reddy

In recent decades, computational approaches to sociophonetic vowel analysis have been steadily increasing, and sociolinguists now frequently use semi-automated systems for phonetic alignment and vowel formant extraction, including FAVE (Forced Alignment and Vowel Extraction, Rosenfelder et al., 2011; Evanini et al., Proceedings of Interspeech, 2009), Penn Aligner (Yuan and Liberman, J. Acoust. Soc. America, 2008, 123, 3878), and DARLA (Dartmouth Linguistic Automation), (Reddy and Stanford, DARLA Dartmouth Linguistic Automation: Online Tools for Linguistic Research, 2015a). Yet these systems still have a major bottleneck: manual transcription. For most modern sociolinguistic vowel alignment and formant extraction, researchers must first create manual transcriptions. This human step is painstaking, time-consuming, and resource intensive. If this manual step could be replaced with completely automated methods, sociolinguists could potentially tap into vast datasets that have previously been unexplored, including legacy recordings that are underutilized due to lack of transcriptions. Moreover, if sociolinguists could quickly and accurately extract phonetic information from the millions of hours of new audio content posted on the Internet every day, a virtual ocean of speech from newly created podcasts, videos, live-streams, and other audio content would now inform research. How close are the current technological tools to achieving such groundbreaking changes for sociolinguistics? Prior work (Reddy et al., Proceedings of the North American Association for Computational Linguistics 2015 Conference, 2015b, 71–75) showed that an HMM-based Automated Speech Recognition system, trained with CMU Sphinx (Lamere et al., 2003), was accurate enough for DARLA to uncover evidence of the US Southern Vowel Shift without any human transcription. Even so, because that automatic speech recognition (ASR) system relied on a small training set, it produced numerous transcription errors. Six years have passed since that study, and since that time numerous end-to-end automatic speech recognition (ASR) algorithms have shown considerable improvement in transcription quality. One example of such a system is the RNN/CTC-based DeepSpeech from Mozilla (Hannun et al., 2014). (RNN stands for recurrent neural networks, the learning mechanism for DeepSpeech. CTC stands for connectionist temporal classification, the mechanism to merge phones into words). The present paper combines DeepSpeech with DARLA to push the technological envelope and determine how well contemporary ASR systems can perform in completely automated vowel analyses with sociolinguistic goals. Specifically, we used these techniques on audio recordings from 352 North American English speakers in the International Dialects of English Archive (IDEA1), extracting 88,500 tokens of vowels in stressed position from spontaneous, free speech passages. With this large dataset we conducted acoustic sociophonetic analyses of the Southern Vowel Shift and the Northern Cities Chain Shift in the North American IDEA speakers. We compared the results using three different sources of transcriptions: 1) IDEA’s manual transcriptions as the baseline “ground truth”, 2) the ASR built on CMU Sphinx used by Reddy et al. (Proceedings of the North American Association for Computational Linguistics 2015 Conference, 2015b, 71–75), and 3) the latest publicly available Mozilla DeepSpeech system. 
We input these three different transcriptions to DARLA, which automatically aligned and extracted the vowel formants from the 352 IDEA speakers. Our quantitative results show that newer ASR systems like DeepSpeech show considerable promise for sociolinguistic applications like DARLA. We found that DeepSpeech’s automated transcriptions had a significantly lower character error rate than the prior Sphinx system’s (from 46% to 35%). When we performed the sociolinguistic analysis of the extracted vowel formants from DARLA, we found that the automated transcriptions from DeepSpeech matched the results from the ground truth for the Southern Vowel Shift (SVS): five vowels showed a shift in both transcriptions, and two vowels didn’t show a shift in either transcription. The Northern Cities Shift (NCS) was more difficult to detect, but ground truth and DeepSpeech matched for four vowels: one of the vowels showed a clear shift, and three showed no shift in either transcription. Our study therefore shows how technology has made progress toward greater automation in vowel sociophonetics, while also showing what remains to be done. Our statistical modeling provides a quantified view of both the abilities and the limitations of a completely “hands-free” analysis of vowel shifts in a large dataset. Naturally, when comparing a completely automated system against a semi-automated system involving human manual work, there will always be a tradeoff between accuracy on the one hand versus speed and replicability on the other hand [Kendall and Joseph, Towards best practices in sociophonetics (with Marianna DiPaolo), 2014]. The amount of “noise” that can be tolerated for a given study will depend on the particular research goals and researchers’ preferences. Nonetheless, our study shows that, for certain large-scale applications and research goals, a completely automated approach using publicly available ASR can produce meaningful sociolinguistic results across large datasets, and these results can be generated quickly, efficiently, and with full replicability.
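The character error rate used in this comparison is the edit distance between an ASR hypothesis and the reference transcript, normalized by reference length; a minimal sketch follows, with an invented example pair.

```python
# Character error rate (CER): Levenshtein distance over characters divided by
# the reference length. The example strings are invented.
def character_error_rate(reference: str, hypothesis: str) -> float:
    r, h = list(reference), list(hypothesis)
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(r)][len(h)] / max(len(r), 1)

print(character_error_rate("the southern vowel shift", "the suthern vowl shift"))
```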


2021 ◽  
Vol 11 (17) ◽  
pp. 7877
Author(s):  
Daehyeon Lee ◽  
Woosung Shim ◽  
Munyong Lee ◽  
Seunghyun Lee ◽  
Kye-Dong Jung ◽  
...  

Recently, the development of 3D graphics technology has led to various technologies being combined with reality to define or study a new kind of reality; these are typically named by combining the name of the technology with “reality”. Representative examples include Augmented Reality, Virtual Reality, Mixed Reality, and eXtended Reality (XR). In particular, research on XR in the web environment is being actively conducted. The Web eXtended Reality Device Application Programming Interface (WebXR Device API), released in 2018, allows XR services to be deployed instantly to any XR platform, requiring only an active web browser. However, the currently released provisional version still has limited stability. Therefore, in this study, the performance of the WebXR Device API is evaluated through three experiments. In the camera trajectory experiment, we compared the WebXR trajectory against ground truth and computed the standard deviation of the difference along the X, Y, and Z axes. The difference image experiment was conducted for the front, left, and right directions; for each pair of ground-truth and WebXR images it produced a visible difference image, a small mean absolute error, and a high match rate. In the experiment measuring 3D rendering speed, a frame rate close to real time was obtained.
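A hedged sketch of the two offline comparisons described here, per-axis standard deviation between a WebXR camera trajectory and ground truth, and mean absolute error between corresponding frames, is given below; the array shapes are assumptions.

```python
# Offline comparison utilities; trajectory and image shapes are assumptions.
import numpy as np

def per_axis_deviation(webxr_traj, gt_traj):
    # trajectories: arrays of shape (N, 3) holding X, Y, Z positions per sample
    error = webxr_traj - gt_traj
    return error.std(axis=0)          # one standard deviation per axis

def frame_mae(webxr_frame, gt_frame):
    # frames: equally sized grayscale or RGB images as numpy arrays
    return np.abs(webxr_frame.astype(np.float64) - gt_frame.astype(np.float64)).mean()
```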


2017 ◽  
Vol 10 (3) ◽  
pp. 285-289 ◽  
Author(s):  
Katrina L Ruedinger ◽  
David R Rutkowski ◽  
Sebastian Schafer ◽  
Alejandro Roldán-Alzate ◽  
Erick L Oberstar ◽  
...  

Background and purpose Safe and effective use of newly developed devices for aneurysm treatment requires the ability to make accurate measurements in the angiographic suite. Our purpose was to determine the parameters that optimize the geometric accuracy of three-dimensional (3D) vascular reconstructions. Methods An in vitro flow model consisting of a peristaltic pump, plastic tubing, and 3D printed patient-specific aneurysm models was used to simulate blood flow in an intracranial aneurysm. Flow rates were adjusted to match values reported in the literature for the internal carotid artery. 3D digital subtraction angiography acquisitions were obtained using a commercially available biplane angiographic system. Reconstructions were done using Edge Enhancement (EE) or Hounsfield Unit (HU) kernels and a Normal or Smooth image characteristic. Reconstructed images were analyzed using the vendor's aneurysm analysis tool. Ground truth measurements were derived from metrological scans of the models with a microCT. Aneurysm volume, surface area, dome height, and minimum and maximum ostium diameter were determined for the five models. Results In all cases, measurements made with the EE kernel most closely matched ground truth values. Differences in values derived from reconstructions displayed with Smooth or Normal image characteristics were small and had little impact on the geometric parameters considered. Conclusions Reconstruction parameters impact the accuracy of measurements made using the aneurysm analysis tool of a commercially available angiographic system. Absolute differences between measurements made using the reconstruction parameters determined as optimal in this study were, overall, very small. The significance of these differences, if any, will depend on the details of each individual case.
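As a simple illustration of how such reconstruction-derived measurements can be compared with microCT ground truth, the sketch below computes percent differences per parameter; the parameter names and values are invented.

```python
# Percent difference of each measured geometric parameter relative to ground truth.
def percent_differences(measured: dict, ground_truth: dict) -> dict:
    # measured / ground_truth map parameter names (e.g. "dome_height_mm") to values
    return {k: 100.0 * (measured[k] - ground_truth[k]) / ground_truth[k]
            for k in ground_truth}

print(percent_differences({"dome_height_mm": 6.1}, {"dome_height_mm": 6.0}))
```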


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1737 ◽  
Author(s):  
Tae-young Ko ◽  
Seung-ho Lee

This paper proposes a novel method of semantic segmentation, consisting of a modified dilated residual network, an atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images while maintaining spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are converted into one channel using a 1 × 1 convolution. During training, the segmentation obtained through convolution from the final feature map is compared with the ground truth provided by a database, and the resulting loss is reduced by backpropagating it through the modified dilated residual network to update the weights. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases. The proposed method achieved accuracies of 82.8 and 89.8 mean intersection over union (mIoU) and frame rates of 61 and 64.3 frames per second (fps) for the Cityscapes and PASCAL VOC 2012 databases, respectively. These results demonstrate that the proposed method is suitable for implementing natural AR applications at practical speeds, because the frame rate is greater than 60 fps.
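The mIoU figures reported here are computed from per-pixel label maps; a small sketch of that metric follows, where the number of classes and the ignore label are assumptions.

```python
# Mean intersection over union (mIoU) from integer label maps.
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_label=255):
    ious = []
    valid = gt != ignore_label
    for c in range(num_classes):
        pred_c = (pred == c) & valid
        gt_c = (gt == c) & valid
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue                      # class absent from both maps
        ious.append(np.logical_and(pred_c, gt_c).sum() / union)
    return float(np.mean(ious))
```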


Author(s):  
Huaqi Zhang ◽  
Guanglei Wang ◽  
Yan Li ◽  
Feng Lin ◽  
Yechen Han ◽  
...  

Coronary optical coherence tomography (OCT) is a new high-resolution intravascular imaging technology that clearly depicts coronary artery stenosis and plaque information. The study of coronary OCT images is of significance in the diagnosis of coronary atherosclerotic heart disease (CAD). We introduce a new method based on the convolutional neural network (CNN) and an improved random walk (RW) algorithm for the recognition and segmentation of calcified, lipid and fibrotic plaque in coronary OCT images. First, we design CNNs with three different depths (2, 4 or 6 convolutional layers) to perform the automatic recognition and select the optimal CNN model. Then, we devise an improved RW algorithm: according to the gray-level distribution characteristics of coronary OCT images, the weights of the intensity and texture terms in the weight function of the RW algorithm are adjusted by an adaptive weight. Finally, we apply mathematical morphology in combination with two RWs to accurately segment the plaque area. Compared with the ground truth of clinical segmentation results, the Jaccard similarity coefficient (JSC) of the calcified and lipid plaque segmentation results is 0.864, the average symmetric contour distance (ASCD) is 0.375 mm, and the JSC and ASCD reliabilities are 88.33% and 92.50%, respectively. The JSC of fibrotic plaque is 0.876, the ASCD is 0.349 mm, and the JSC and ASCD reliabilities are 90.83% and 95.83%, respectively. In addition, the average segmentation time (AST) does not exceed 5 s. Reliable and significantly improved results have been achieved in this study: compared with the CNN, the traditional RW algorithm and other methods, the proposed method has the advantages of fast segmentation, high accuracy and reliability, and holds promise as an aid to doctors in the diagnosis of CAD.
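The Jaccard similarity coefficient used to score these segmentations is the overlap of predicted and ground-truth plaque masks divided by their union; a minimal sketch follows, with boolean masks as inputs.

```python
# Jaccard similarity coefficient (JSC) between two binary segmentation masks.
import numpy as np

def jaccard(pred_mask, gt_mask):
    pred_mask = pred_mask.astype(bool)
    gt_mask = gt_mask.astype(bool)
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return intersection / union if union else 1.0
```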


2019 ◽  
Vol 136 ◽  
pp. 04076 ◽  
Author(s):  
Shuwei Xu ◽  
Shan Zhang

This paper presents a method of extracting traffic lines from images using a generative adversarial network (GAN). Compared with traditional image detection methods, the adversarial neural network does not need repeated sampling of a Markov chain and is trained by backpropagation. Therefore, when detecting an image, the GAN does not need to be updated with new samples; it can produce better-quality samples and express them more clearly. Experimental results show that the method has strong generalization ability, fast recognition speed and high accuracy.
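To make the adversarial setup concrete, the sketch below shows one generic GAN training step in PyTorch; the tiny MLP generator and discriminator and all hyperparameters are placeholders, not the architecture used in the paper.

```python
# Generic GAN training step: one discriminator update and one generator update,
# both via backpropagation (no Markov-chain sampling). All modules are placeholders.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 256))   # noise -> fake sample
D = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))    # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    n = real_batch.size(0)
    # Discriminator: push real samples toward label 1, generated samples toward 0.
    fake = G(torch.randn(n, 64)).detach()
    d_loss = (bce(D(real_batch), torch.ones(n, 1)) +
              bce(D(fake), torch.zeros(n, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator into labelling its output as real.
    fake = G(torch.randn(n, 64))
    g_loss = bce(D(fake), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```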

