Improving Temporal Stability and Accuracy for Endoscopic Video Tissue Classification Using Recurrent Neural Networks

Sensors ◽  
2020 ◽  
Vol 20 (15) ◽  
pp. 4133
Author(s):  
Tim Boers ◽  
Joost van der Putten ◽  
Maarten Struyvenberg ◽  
Kiki Fockens ◽  
Jelmer Jukema ◽  
...  

Early Barrett’s neoplasia is often missed due to its subtle visual features and the inexperience of non-expert endoscopists with such lesions. While promising results have been reported on the automated detection of this type of early cancer in still endoscopic images, video-based detection that exploits the temporal domain remains an open problem. The temporally stable nature of video data in endoscopic examinations enables the development of a framework that can diagnose the imaged tissue class over time, thereby yielding a more robust and improved model for spatial predictions. We show that the introduction of Recurrent Neural Network nodes offers a more stable and accurate model for tissue classification, compared to classification on individual images. We have developed a customized ResNet18 feature extractor with four types of classifiers: Fully Connected (FC), Fully Connected with an averaging filter (FC Avg (n = 5)), Long Short-Term Memory (LSTM), and a Gated Recurrent Unit (GRU). Experimental results are based on 82 pullback videos of the esophagus with 46 high-grade dysplasia patients. Our results demonstrate that the LSTM classifier outperforms the FC, FC Avg (n = 5), and GRU classifiers with an average accuracy of 85.9% compared to 82.2%, 83.0%, and 85.6%, respectively. The benefit of our novel implementation for endoscopic tissue classification is the inclusion of spatio-temporal information for improved and robust decision making, and it is a first step towards full temporal learning of esophageal cancer detection in endoscopic video.
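The FC Avg (n = 5) baseline above amounts to smoothing per-frame class probabilities with a sliding average before taking the argmax. A minimal NumPy sketch of that idea, with a causal window; the window size and class count here are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def smoothed_predictions(frame_probs: np.ndarray, n: int = 5) -> np.ndarray:
    """Average per-frame class probabilities over a causal window of up to n
    frames, then take the argmax. frame_probs has shape (T, num_classes)."""
    T, _ = frame_probs.shape
    smoothed = np.empty_like(frame_probs)
    for t in range(T):
        window = frame_probs[max(0, t - n + 1): t + 1]  # last n frames (fewer at start)
        smoothed[t] = window.mean(axis=0)
    return smoothed.argmax(axis=1)

# Toy sequence: one noisy frame inside an otherwise stable tissue segment
probs = np.array([[0.9, 0.1]] * 4 + [[0.2, 0.8]] + [[0.9, 0.1]] * 4)
print(smoothed_predictions(probs, n=5))  # the isolated outlier frame is smoothed away
```

Per-frame argmax would flip class on the noisy frame; the averaged prediction stays temporally stable, which is the behaviour the FC Avg classifier trades against the learned temporal models (LSTM/GRU).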

2021 ◽  
Vol 2 ◽  
Author(s):  
Yongliang Qiao ◽  
Cameron Clark ◽  
Sabrina Lomax ◽  
He Kong ◽  
Daobilige Su ◽  
...  

Individual cattle identification is a prerequisite and foundation for precision livestock farming. Existing methods for cattle identification require radio-frequency or visual ear tags, all of which are prone to loss or damage. Here, we propose and implement a new unified deep learning approach to cattle identification using video analysis. The proposed deep learning framework is composed of a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network with a self-attention mechanism. More specifically, the Inception-V3 CNN was used to extract features from a rear-view cattle video dataset taken in a feedlot. Extracted features were then fed to a BiLSTM layer to capture spatio-temporal information. Then, self-attention was employed to provide a different focus on the features captured by the BiLSTM for the final step of cattle identification. We used a total of 363 rear-view videos from 50 cattle at three different times with an interval of 1 month between data collection periods. The proposed method achieved 93.3% identification accuracy using a 30-frame video length, which outperformed current state-of-the-art methods (Inception-V3, MLP, SimpleRNN, LSTM, and BiLSTM). Furthermore, two different attention schemes, namely additive and multiplicative attention mechanisms, were compared. Our results show that the additive attention mechanism achieved 93.3% accuracy and 91.0% recall, greater than the multiplicative attention mechanism's 90.7% accuracy and 87.0% recall. Video length also impacted accuracy, with video sequences of up to 30 frames enhancing identification performance. Overall, our approach can capture key spatio-temporal features to improve cattle identification accuracy, enabling automated cattle identification for precision livestock farming.
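The two attention schemes compared above differ only in how the score for each time step is computed before the softmax weighting. A NumPy sketch of both scoring rules applied to a sequence of BiLSTM outputs; the dimensions, weights, and use of the last hidden state as the query are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(H, q, W1, W2, v):
    """Bahdanau-style: score_t = v . tanh(W1 @ h_t + W2 @ q)."""
    scores = np.array([v @ np.tanh(W1 @ h + W2 @ q) for h in H])
    return softmax(scores) @ H  # attention-weighted sum over time steps

def multiplicative_attention(H, q, W):
    """Luong-style: score_t = h_t . (W @ q)."""
    scores = H @ (W @ q)
    return softmax(scores) @ H

rng = np.random.default_rng(0)
T, d = 30, 8                    # e.g. 30 frames of BiLSTM features
H = rng.normal(size=(T, d))     # sequence of hidden states
q = rng.normal(size=d)          # query vector (e.g. the last hidden state)
ctx_add = additive_attention(H, q, rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d))
ctx_mul = multiplicative_attention(H, q, rng.normal(size=(d, d)))
```

Either context vector would then be fed to the final identification layer; the paper's result is that the additive form scored slightly higher on this task.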


Atmosphere ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 238
Author(s):  
Pablo Contreras ◽  
Johanna Orellana-Alvear ◽  
Paul Muñoz ◽  
Jörg Bendix ◽  
Rolando Célleri

The Random Forest (RF) algorithm, a decision-tree-based technique, has become a promising approach for applications addressing runoff forecasting in remote areas. This machine learning approach can overcome the limitations of scarce spatio-temporal data and of the physical parameters needed for process-based hydrological models. However, the influence of RF hyperparameters is still uncertain and needs to be explored. Therefore, the aim of this study is to analyze the sensitivity of RF runoff forecasting models of varying lead time to the hyperparameters of the algorithm. For this, models were trained by using (a) default and (b) extensive hyperparameter combinations through a grid-search approach that allows reaching the optimal set. Model performances were assessed based on the R2, %Bias, and RMSE metrics. We found that: (i) The most influential hyperparameter is the number of trees in the forest; however, the combination of the tree-depth and number-of-features hyperparameters produced the highest variability and instability in the models. (ii) Hyperparameter optimization significantly improved model performance for higher lead times (12- and 24-h). For instance, the performance of the 12-h forecasting model under default RF hyperparameters improved to R2 = 0.41 after optimization (a gain of 0.17). However, for short lead times (4-h) there was no significant model improvement (0.69 < R2 < 0.70). (iii) There is a range of values for each hyperparameter within which the performance of the model is not significantly affected but remains close to the optimal. Thus, a compromise between hyperparameter interactions (i.e., their values) can produce similarly high model performances. Model improvements after optimization can be explained from a hydrological point of view: the generalization ability for lead times larger than the concentration time of the catchment tends to rely more on hyperparameterization than on what the models can learn from the input data.
This insight can help in the development of operational early warning systems.
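The default-versus-grid-search comparison above can be sketched with scikit-learn. The grid values, synthetic data, and hold-out scoring below are illustrative assumptions for brevity; the study's actual grid, discharge data, and validation scheme differ:

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for lagged precipitation/runoff features
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.3, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# (a) default hyperparameters
default_r2 = r2_score(y_te, RandomForestRegressor(random_state=0).fit(X_tr, y_tr).predict(X_te))

# (b) small grid over the three hyperparameters discussed above
grid = {"n_estimators": [50, 200], "max_depth": [4, None], "max_features": [2, 6]}
best_r2, best_params = -np.inf, None
for n, d, f in product(*grid.values()):
    model = RandomForestRegressor(n_estimators=n, max_depth=d, max_features=f, random_state=0)
    r2 = r2_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
    if r2 > best_r2:
        best_r2, best_params = r2, {"n_estimators": n, "max_depth": d, "max_features": f}
```

Scoring the grid on the same hold-out split is a shortcut for illustration only; a proper study scores candidates on a separate validation set or by cross-validation.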


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Brett H. Hokr ◽  
Joel N. Bixler

Abstract Dynamic, in vivo measurement of the optical properties of biological tissues is still an elusive and critically important problem. Here we develop a technique for inverting a Monte Carlo simulation to extract tissue optical properties from the statistical moments of the spatio-temporal response of the tissue by training a 5-layer fully connected neural network. We demonstrate the accuracy of the method across a very wide parameter space on a single homogeneous layer tissue model and demonstrate that the method is insensitive to parameter selection of the neural network model itself. Finally, we propose an experimental setup capable of measuring the required information in real time in an in vivo environment and demonstrate proof-of-concept level experimental results.
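The inversion above maps statistical moments of the measured response to optical properties, so the network never sees the raw spatio-temporal signal. A NumPy sketch of that feature-extraction step, reducing a (space x time) response to low-order temporal moments per pixel; the particular moment set is an illustrative assumption:

```python
import numpy as np

def response_moments(response: np.ndarray, times: np.ndarray) -> np.ndarray:
    """Reduce a spatio-temporal response of shape (pixels, time) to per-pixel
    temporal moments: total intensity, mean arrival time, temporal variance."""
    total = response.sum(axis=1)                                   # 0th moment
    mean_t = (response * times).sum(axis=1) / total                # 1st moment
    var_t = (response * (times - mean_t[:, None]) ** 2).sum(axis=1) / total
    return np.stack([total, mean_t, var_t], axis=1)               # network input features

times = np.linspace(0.0, 10.0, 101)
# Toy response: one pixel with a Gaussian pulse centred at t = 4, width 1
pulse = np.exp(-0.5 * ((times - 4.0) / 1.0) ** 2)
feats = response_moments(pulse[None, :], times)
```

For this toy pulse the recovered mean arrival time is ~4 and the temporal variance ~1; in the paper's setting such moment vectors form the input to the 5-layer fully connected inverse network.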


Author(s):  
Sophia Bano ◽  
Francisco Vasconcelos ◽  
Emmanuel Vander Poorten ◽  
Tom Vercauteren ◽  
Sebastien Ourselin ◽  
...  

Abstract Purpose Fetoscopic laser photocoagulation is a minimally invasive surgery for the treatment of twin-to-twin transfusion syndrome (TTTS). Using a lens/fibre-optic scope inserted into the amniotic cavity, the abnormal placental vascular anastomoses are identified and ablated to regulate blood flow to both fetuses. A limited field-of-view, occlusions due to fetus presence, and low visibility make it difficult to identify all vascular anastomoses. Automatic computer-assisted techniques may provide a better understanding of the anatomical structure during surgery for risk-free laser photocoagulation and may facilitate improving mosaics from fetoscopic videos. Methods We propose FetNet, a combined convolutional neural network (CNN) and long short-term memory (LSTM) recurrent neural network architecture for the spatio-temporal identification of fetoscopic events. We adapt an existing CNN architecture for spatial feature extraction and integrate it with the LSTM network for end-to-end spatio-temporal inference. We introduce differential learning rates during model training to effectively utilise the pre-trained CNN weights. This may support computer-assisted interventions (CAI) during fetoscopic laser photocoagulation. Results We perform a quantitative evaluation of our method using 7 in vivo fetoscopic videos captured from different human TTTS cases. The total duration of these videos was 5551 s (138,780 frames). To test the robustness of the proposed approach, we perform 7-fold cross-validation where each video is treated as a hold-out or test set and training is performed using the remaining videos. Conclusion FetNet achieved superior performance compared to the existing CNN-based methods and provided improved inference because of the spatio-temporal information modelling. Online testing of FetNet, using a Tesla V100-DGXS-32GB GPU, achieved a frame rate of 114 fps.
These results show that our method could potentially provide a real-time solution for CAI and automating occlusion and photocoagulation identification during fetoscopic procedures.
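The differential learning rates mentioned in the Methods assign a smaller step size to the pre-trained CNN layers than to the freshly initialised LSTM head. A framework-agnostic NumPy sketch of that update rule; the layer names and rates below are illustrative assumptions, not FetNet's actual values:

```python
import numpy as np

def group(name: str) -> str:
    """Map a parameter name to its learning-rate group (hypothetical naming)."""
    return "cnn" if name.startswith("cnn") else "lstm"

def sgd_step(params: dict, grads: dict, group_lrs: dict) -> dict:
    """One SGD step with a per-group learning rate: pre-trained layers get a
    small rate so their weights move slowly, new layers a larger one."""
    return {name: params[name] - group_lrs[group(name)] * grads[name]
            for name in params}

group_lrs = {"cnn": 1e-4, "lstm": 1e-2}   # pre-trained vs. newly added layers
params = {"cnn.conv1": np.ones(3), "lstm.w": np.ones(3)}
grads = {"cnn.conv1": np.ones(3), "lstm.w": np.ones(3)}
new_params = sgd_step(params, grads, group_lrs)
```

With identical gradients, the LSTM weights move 100 times further than the CNN weights per step; in deep learning frameworks the same effect is obtained by defining optimizer parameter groups with different learning rates.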


2021 ◽  
Vol 14 (8) ◽  
pp. 1289-1297
Author(s):  
Ziquan Fang ◽  
Lu Pan ◽  
Lu Chen ◽  
Yuntao Du ◽  
Yunjun Gao

Traffic prediction has drawn increasing attention for its ubiquitous real-life applications in traffic management, urban computing, public safety, and so on. Recently, the availability of massive trajectory data and the success of deep learning have motivated a plethora of deep traffic prediction studies. However, the existing neural-network-based approaches tend to ignore the correlations between multiple types of moving objects located in the same spatio-temporal traffic area, which is suboptimal for traffic prediction analytics. In this paper, we propose a multi-source deep traffic prediction framework over spatio-temporal trajectory data, termed MDTP. The framework includes two phases: spatio-temporal feature modeling and multi-source bridging. We present an enhanced graph convolutional network (GCN) model combined with a long short-term memory network (LSTM) to capture the spatial dependencies and temporal dynamics of traffic in the feature modeling phase. In the multi-source bridging phase, we propose two methods, Sum and Concat, to connect the learned features from different trajectory data sources. Extensive experiments on two real-life datasets show that MDTP i) has superior efficiency compared with classical time-series methods, machine learning methods, and state-of-the-art neural-network-based approaches; ii) offers a significant performance improvement over the single-source traffic prediction approach; and iii) performs traffic predictions in seconds even on tens of millions of trajectory records. We also develop MDTP+, a user-friendly interactive system to demonstrate traffic prediction analysis.
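The two bridging operators are simple to state: given per-region feature vectors learned from each trajectory source, Sum adds them element-wise while Concat stacks them, widening the input of the downstream predictor. A NumPy sketch; the region count and feature size are illustrative assumptions:

```python
import numpy as np

def bridge_sum(f_a: np.ndarray, f_b: np.ndarray) -> np.ndarray:
    """Element-wise sum: both sources must share the same feature dimension."""
    return f_a + f_b

def bridge_concat(f_a: np.ndarray, f_b: np.ndarray) -> np.ndarray:
    """Concatenation: doubles the feature dimension fed to the predictor."""
    return np.concatenate([f_a, f_b], axis=-1)

regions, d = 16, 32                              # e.g. 16 road regions, 32-dim features
taxi = np.random.default_rng(1).normal(size=(regions, d))   # features from source A
bike = np.random.default_rng(2).normal(size=(regions, d))   # features from source B
fused_sum = bridge_sum(taxi, bike)               # shape (16, 32)
fused_cat = bridge_concat(taxi, bike)            # shape (16, 64)
```

Sum keeps the model size fixed, while Concat preserves each source's features separately at the cost of a wider prediction layer; this is the design trade-off the two MDTP variants explore.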


2001 ◽  
Vol 10 (04) ◽  
pp. 715-734 ◽  
Author(s):  
SHU-CHING CHEN ◽  
MEI-LING SHYU ◽  
CHENGCUI ZHANG ◽  
R. L. KASHYAP

The identification of overlapped objects is a great challenge in object tracking and video data indexing. For this purpose, a backtrack-chain-updation split algorithm is proposed to assist an unsupervised video segmentation method called the "simultaneous partition and class parameter estimation" (SPCPE) algorithm in identifying the overlapped objects in a video sequence. The backtrack-chain-updation split algorithm can identify the split segment (object) and use the information in the current frame to update the previous frames in a backtrack-chain manner. The split algorithm provides more accurate temporal and spatial information about the semantic objects so that they can be indexed and modeled by multimedia input strings and the multimedia augmented transition network (MATN) model. The MATN model is based on the ATN model that has been used in artificial intelligence (AI) for natural language understanding systems, and its inputs are modeled by the multimedia input strings. In this paper, we show that the SPCPE algorithm together with the backtrack-chain-updation split algorithm can significantly enhance the efficiency of spatio-temporal video indexing by improving the accuracy of multimedia database queries related to semantic objects.
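The backtrack-chain updation can be illustrated with a toy label history: once the current frame reveals that one tracked segment was actually two overlapped objects, the split labels are propagated backwards frame by frame until the segment's first appearance. This is only a hypothetical sketch of the chain idea; the real SPCPE-based algorithm operates on pixel partitions and class parameters, not the simple per-frame label lists used here:

```python
def backtrack_update(frames, split_frame, old_label, new_labels):
    """When segment `old_label` is found at `split_frame` to be a merge of
    `new_labels`, relabel it in all earlier frames in a backtrack chain."""
    for t in range(split_frame, -1, -1):
        if old_label not in frames[t]:
            break                      # chain ends where the segment first appeared
        frames[t].remove(old_label)
        frames[t].extend(new_labels)   # replace the merged segment with its parts
    return frames

# Frames 0-3 tracked one merged segment "A"; frame 3 reveals the split A -> A1, A2
history = [["A"], ["A"], ["A"], ["A"]]
history = backtrack_update(history, split_frame=3, old_label="A", new_labels=["A1", "A2"])
```

After the update every earlier frame carries both object labels, which is what lets the multimedia input strings index the two semantic objects over their full lifetimes.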


Author(s):  
Tobias Ross ◽  
David Zimmerer ◽  
Anant Vemuri ◽  
Fabian Isensee ◽  
Manuel Wiesenfarth ◽  
...  

2021 ◽  
Author(s):  
Yu Rang Park ◽  
Sang Ho Hwang ◽  
Yeonsoo Yu ◽  
Jichul Kim ◽  
Taeyeop Lee ◽  
...  

BACKGROUND Early detection and intervention of developmental disabilities (DDs) are critical for improving the long-term outcomes of affected children. Mobile-based applications are easily accessible and may thus help the early identification of DDs. OBJECTIVE We aimed to identify facial expression and head pose based on face landmark data extracted from face recording videos and to differentiate the characteristics between children with DDs and those without. METHODS Eighty-nine children (DD, n=33; typically developing, n=56) were included in the analysis. Using the mobile-based application, we extracted facial landmarks and head poses from the recorded videos and performed Long Short-Term Memory (LSTM)-based DD classification. RESULTS Stratified k-fold cross-validation showed that the average accuracy, precision, recall, and F1-score of the LSTM-based deep learning model were 88%, 91%, 72%, and 80%, respectively. Through the interpretation of prediction results using SHapley Additive exPlanations (SHAP), we confirmed that the nodding head angle was the most important variable. All of the top 10 most important variables showed significant differences in distribution between children with DDs and those without (p<0.05). CONCLUSIONS Our results provide preliminary evidence that the deep-learning classification model using mobile-based children’s video data could be used for the early detection of children with DDs.
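The classification step above consumes the per-frame landmark and head-pose features as a sequence. A minimal NumPy sketch of a single LSTM cell unrolled over such a sequence, with the final hidden state serving as the classification feature; the dimensions, random weights, and use of the last hidden state are illustrative assumptions, not the study's architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(X, W, U, b, d_h):
    """Run one LSTM layer over a landmark sequence X of shape (T, d_in) and
    return the final hidden state."""
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    for x in X:
        z = W @ x + U @ h + b                 # stacked gate pre-activations
        i, f, o, g = np.split(z, 4)           # input, forget, output, candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                     # cell state update
        h = o * np.tanh(c)                    # hidden state
    return h

rng = np.random.default_rng(0)
T, d_in, d_h = 50, 12, 16        # e.g. 50 frames of landmark/head-pose features
X = rng.normal(size=(T, d_in))
W = rng.normal(scale=0.1, size=(4 * d_h, d_in))
U = rng.normal(scale=0.1, size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
features = lstm_forward(X, W, U, b, d_h)     # fed to a final classification layer
```

In practice the final hidden state is passed through a dense layer with a sigmoid to produce the DD-versus-typically-developing probability.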


Author(s):  
HyunKi Lee ◽  
Tejas G. Puranik ◽  
Dimitri N. Mavris

Abstract The maintenance and improvement of safety are among the most critical concerns in civil aviation operations. Due to the increased availability of data and improvements in computing power, applying artificial intelligence technologies to reduce risk in aviation safety has gained momentum. In this paper, a framework is developed to build a predictive model of the future aircraft trajectory that can be utilized online to assist air crews in their decision-making during approach. Flight data parameters from the approach phase between certain approach altitudes (also called gates) are utilized for training an offline model that predicts the aircraft’s ground speed at future points. This model is developed by combining convolutional neural networks (CNNs) and long short-term memory (LSTM) layers. Due to the myriad of model combinations possible, the Hyperband algorithm is used to automate the hyperparameter tuning process and choose the best possible model. The validated offline model can then be used to predict the aircraft’s future states and provide decision support to air crews. The method is demonstrated using publicly available Flight Operations Quality Assurance (FOQA) data from the National Aeronautics and Space Administration (NASA). The developed model can predict the ground speed with a relative root-mean-square error between 1.27% and 2.69%. A safety score is also evaluated considering the upper and lower bounds of variation observed within the available data set. Thus, the developed model represents an improvement over existing techniques in the literature and shows significant promise for decision support in aviation operations.
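The reported 1.27-2.69% figures are relative root-mean-square errors of the predicted ground speed. A NumPy sketch of that metric; normalising by the mean of the true values is one common convention, and the paper's exact normalisation and the speed values below are assumptions for illustration:

```python
import numpy as np

def relative_rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """RMSE expressed as a percentage of the mean true value."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / np.mean(y_true)

# Hypothetical ground speeds (knots) along an approach segment
speeds = np.array([140.0, 138.0, 135.0, 131.0, 128.0])
preds = np.array([141.0, 137.0, 136.0, 130.0, 129.0])
err = relative_rmse(speeds, preds)
```

For these toy values every prediction is off by 1 knot, giving a relative RMSE under 1%, i.e. in the same ballpark as the accuracy band the paper reports.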

