Acoustic model training with detecting transcription errors in the training data

Abstract

Background
Correctly identifying the views acquired in a 2D echocardiographic examination is paramount for the post-processing and quantification steps performed in most clinical workflows. In many exams, particularly in stress echocardiography, microbubble contrast is used, which greatly affects the appearance of the cardiac views. Here we present a bespoke, fully automated convolutional neural network (CNN) that identifies apical 2-, 3- and 4-chamber and short-axis (SAX) views acquired with and without contrast. The CNN was tested on a completely independent, external dataset acquired in a different country from the one whose data were used to train the network.

Methods
Training data comprising 2D echocardiograms from 1014 subjects were taken from a prospective, multisite, multi-vendor UK trial, with more than 17,500 frames in each view. Prior to training the view-classification model, images were processed using standard techniques to ensure homogeneous, normalised inputs to the training pipeline. A bespoke CNN was built using the minimum number of convolutional layers required, with batch normalisation and with dropout to reduce overfitting. Before processing, the data were split into 90% for model training (211,958 frames) and 10% for validation (23,946 frames). All frames from a given subject were assigned to either the training or the validation dataset, never both. In addition, a separate trial dataset of 240 studies acquired in the USA was used as an independent test dataset (39,401 frames).

Results
Figure 1 shows the confusion matrices for the validation data (left) and the independent test data (right), with an overall accuracy of 96% and 95% for the validation and test datasets, respectively. The accuracy of >99% for the non-contrast cardiac views exceeds that reported in other works. The combined datasets included images acquired on ultrasound systems from multiple manufacturers and models across 12 clinical sites.
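The abstract states that frames from different subjects were kept apart between the training and validation sets, but not how the split was drawn. A minimal sketch, assuming a random subject-level split: shuffle subject IDs, reserve a fraction of subjects for validation, then assign every frame by its subject. `split_by_subject`, `val_fraction` and `seed` are hypothetical names, not the authors' code. Because the fraction applies to subjects, the frame-level ratio is only approximately 90/10, which is consistent with the reported 211,958 / 23,946 frames.

```python
import random
from collections import defaultdict


def split_by_subject(frames, val_fraction=0.1, seed=0):
    """Split (subject_id, frame) pairs so that no subject appears in both sets.

    Hypothetical helper: the fraction is applied to subjects, not frames.
    """
    by_subject = defaultdict(list)
    for subject_id, frame in frames:
        by_subject[subject_id].append(frame)

    # Sort first so the shuffle is reproducible for a given seed.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    n_val = max(1, round(len(subjects) * val_fraction))
    val_subjects = set(subjects[:n_val])

    train, val = [], []
    for subject_id in subjects:
        target = val if subject_id in val_subjects else train
        target.extend((subject_id, f) for f in by_subject[subject_id])
    return train, val


# Toy example: 20 subjects with 5 frames each.
frames = [(s, i) for s in range(20) for i in range(5)]
train, val = split_by_subject(frames)
```

With 20 subjects and `val_fraction=0.1`, two subjects (10 frames) go to validation, and their frames never appear in `train`. Splitting by subject rather than by frame avoids near-duplicate frames from one patient leaking into both sets.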
Conclusion
We have developed a CNN capable of automatically and accurately identifying all relevant cardiac views used in “real world” echo exams, including views acquired with contrast. Use of the CNN in a routine clinical workflow could improve the efficiency of quantification steps performed after image acquisition. The model was tested on an independent dataset acquired in a different country from that used to train it and performed similarly, indicating that the model generalises.

Figure 1. Confusion matrices

Funding Acknowledgement
Type of funding source: Private company. Main funding source(s): Ultromics Ltd.
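The overall accuracies quoted for Figure 1 can be read directly off a confusion matrix: correct predictions lie on the diagonal, so accuracy is the trace divided by the total count. A minimal sketch, assuming rows index the true view and columns the predicted view. The 2×2 counts below are made up for illustration and are not the values in Figure 1.

```python
def overall_accuracy(cm):
    """Fraction of all predictions on the diagonal (trace / total)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Per-view accuracy, assuming rows are true classes (row-normalised diagonal)."""
    return [row[i] / sum(row) if sum(row) else float("nan") for i, row in enumerate(cm)]


# Hypothetical counts, rows = true view, columns = predicted view.
cm = [[90, 10],
      [5, 95]]
acc = overall_accuracy(cm)      # 185 / 200 = 0.925
recall = per_class_recall(cm)   # [0.9, 0.95]
```

Per-class recall separates views: a high overall accuracy can coexist with weaker performance on a few views, such as the contrast views compared with the >99% non-contrast views reported above.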

Lithuanian Broadcast Speech Transcription Using Semi-supervised Acoustic Model Training

Procedia Computer Science ◽

10.1016/j.procs.2016.04.037 ◽

2016 ◽

Vol 81 ◽

pp. 107-113 ◽

Cited By ~ 6

Author(s):

Rasa Lileikytė ◽

Arseniy Gorin ◽

Lori Lamel ◽

Jean-Luc Gauvain ◽

Thiago Fraga-Silva

Keyword(s):

Acoustic Model ◽

Model Training ◽

Speech Transcription

Investigating lightly supervised acoustic model training

2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221) ◽

10.1109/icassp.2001.940871 ◽

2002 ◽

Cited By ~ 15

Author(s):

L. Lamel ◽

J.L. Gauvain ◽

G. Adda

Keyword(s):

Acoustic Model ◽

Model Training

On the objectivity, reliability, and validity of deep learning enabled bioimage analyses

eLife ◽

10.7554/elife.59780 ◽

2020 ◽

Vol 9 ◽

Cited By ~ 1

Author(s):

Dennis Segebarth ◽

Matthias Griebel ◽

Nikolai Stein ◽

Cora R von Collenberg ◽

Corinna Martin ◽

...

Keyword(s):

Deep Learning ◽

Signal To Noise Ratio ◽

Biological Effects ◽

Reliability And Validity ◽

Ground Truth ◽

Training Data ◽

Model Organisms ◽

Data Annotation ◽

Bioimage Analysis ◽

Model Training

Bioimage analysis of fluorescent labels is widely used in the life sciences. Recent advances in deep learning (DL) allow time-consuming manual image analysis to be automated on the basis of annotated training data. However, manual annotation of fluorescent features with a low signal-to-noise ratio is somewhat subjective. Training DL models on subjective annotations may be unstable or yield biased models. In turn, these models may be unable to reliably detect biological effects. An analysis pipeline integrating data annotation, ground truth estimation, and model training can mitigate this risk. To evaluate this integrated process, we compared different DL-based analysis approaches. With data from two model organisms (mice, zebrafish) and five laboratories, we show that ground truth estimation from multiple human annotators helps to establish objectivity in fluorescent feature annotations. Furthermore, ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible DL-based bioimage analyses.
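The two ideas in this abstract, estimating a ground truth from several annotators and combining several trained models, can be illustrated on toy binary masks. A minimal sketch, assuming pixel-wise strict majority voting as the ground-truth estimator and probability averaging as the ensemble. Both are simple stand-ins chosen for illustration and are not necessarily the estimators used in the study. `majority_vote` and `ensemble_average` are hypothetical names.

```python
def majority_vote(masks):
    """Pixel-wise strict majority over binary masks (lists of 0/1 rows).

    Ties (possible with an even number of annotators) resolve to 0.
    """
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[1 if 2 * sum(m[r][c] for m in masks) > n else 0
             for c in range(cols)]
            for r in range(rows)]


def ensemble_average(prob_maps, threshold=0.5):
    """Average per-pixel probabilities from several models, then threshold."""
    n = len(prob_maps)
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    return [[1 if sum(p[r][c] for p in prob_maps) / n >= threshold else 0
             for c in range(cols)]
            for r in range(rows)]


# Three hypothetical annotators marking a 2x2 image.
annotations = [[[1, 0], [1, 1]],
               [[1, 0], [0, 1]],
               [[0, 0], [1, 1]]]
ground_truth = majority_vote(annotations)

# Three hypothetical model outputs (foreground probabilities).
predictions = [[[0.9, 0.2], [0.4, 0.8]],
               [[0.7, 0.1], [0.6, 0.9]],
               [[0.8, 0.3], [0.6, 0.7]]]
consensus = ensemble_average(predictions)
```

In this toy case both the annotator consensus and the model ensemble give `[[1, 0], [1, 1]]`, even though no single annotator or model agrees with it on every pixel.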

Language diarization for semi-supervised bilingual acoustic model training

2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) ◽

10.1109/asru.2017.8268921 ◽

2017 ◽

Cited By ~ 1

Author(s):

Emre Yilmaz ◽

Mitchell McLaren ◽

Henk van den Heuvel ◽

David A. van Leeuwen

Keyword(s):

Acoustic Model ◽

Model Training

Broadening volcanic eruption forecasting using transfer machine learning

10.5194/egusphere-egu21-970 ◽

2021 ◽

Author(s):

David Dempsey ◽

Shane Cronin ◽

Andreas Kempa-Liehr ◽

Martin Letourneur

Keyword(s):

Machine Learning ◽

Seismic Station ◽

Feature Space ◽

Forecast Model ◽

Linear Interpolation ◽

Lessons Learned ◽

Training Data ◽

Single Station ◽

Data Driven Approach ◽

Model Training

Sudden steam-driven eruptions at tourist volcanoes caused 63 deaths at Mt Ontake (Japan) in 2014 and 22 deaths at Whakaari (New Zealand) in 2019. Warning systems that can anticipate these eruptions could provide crucial hours for evacuation or sheltering, but they require reliable forecasting. Recently, machine learning has been used to extract eruption precursors from observational data and to train forecasting models. However, a weakness of this data-driven approach is its reliance on long observational records that span multiple eruptions. As many volcano datasets record only one eruption or none, there is a need to extend these techniques to data-poor locales.

Transfer machine learning is one approach for generalising lessons learned at data-rich volcanoes and applying them to data-poor ones. Here, we tackle two problems: (1) generalising time-series features between seismic stations at Whakaari to address recording gaps, and (2) training a forecasting model for Mt Ruapehu augmented with data from Whakaari. This required that we standardise data records at different stations for direct comparison, devise an interpolation scheme to fill in missing eruption data, and combine volcano-specific feature matrices prior to model training.

We trained a forecast model for Whakaari using tremor data from three eruptions recorded at one seismic station (WSRZ), augmented by data from two other eruptions recorded at a second station (WIZ). First, the training data from both stations were standardised to a unit normal distribution in log space. Then, linear interpolation in feature space was used to infer missing eruption features at WSRZ. Under pseudo-prospective testing, the augmented model had similar forecasting skill to one trained on all five eruptions recorded at a single station (WIZ). However, when we extended this approach to Ruapehu, performance was reduced, indicating that more work is needed on standardisation and feature selection.
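The two preprocessing steps named above can be sketched for a single feature. A minimal sketch, assuming (1) "standardised to a unit normal distribution in log space" means taking the logarithm of positive tremor values and then subtracting the mean and dividing by the standard deviation, separately for each station; and (2) missing values are filled by linear interpolation between the nearest known neighbours along the series, with gaps at either end held at the nearest known value. The second step is only one reading of "linear interpolation in feature space". `standardise_log` and `fill_gaps_linear` are hypothetical names.

```python
import math
import statistics


def standardise_log(values):
    """Log-transform positive values, then scale to zero mean and unit variance."""
    logs = [math.log(v) for v in values]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # population standard deviation
    return [(x - mu) / sigma for x in logs]


def fill_gaps_linear(series):
    """Replace None entries by linear interpolation between known neighbours.

    Leading or trailing gaps take the nearest known value (an assumption).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            t = (i - left) / (right - left)
            out[i] = series[left] + t * (series[right] - series[left])
    return out


# Per-station standardisation puts records with different gains on one scale.
z = standardise_log([1.0, 10.0, 100.0])
filled = fill_gaps_linear([1.0, None, 3.0, None])  # [1.0, 2.0, 3.0, 3.0]
```

Standardising each station separately removes station-specific offsets and gains in log amplitude, which is what allows features from WSRZ and WIZ to be compared directly before they are combined.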