Automatic Detection of the Pharyngeal Phase in Raw Videos for the Videofluoroscopic Swallowing Study Using Efficient Data Collection and 3D Convolutional Networks †

Sensors
2019
Vol 19 (18)
pp. 3873
Author(s):
Jong Taek Lee
Eunhee Park
Tae-Du Jung

Videofluoroscopic swallowing study (VFSS) is a standard diagnostic tool for dysphagia. To detect the presence of aspiration during a swallow, a manual search is commonly used to mark the time intervals of the pharyngeal phase on the corresponding VFSS image. In this study, we present a novel approach that uses 3D convolutional networks to detect the pharyngeal phase in raw VFSS videos without manual annotations. For efficient collection of training data, we propose a cascade framework that requires neither the time intervals of the swallowing process nor the manual marking of anatomical positions for detection. For video classification, we applied the inflated 3D convolutional network (I3D), one of the state-of-the-art networks for action classification, as a baseline architecture. We also present a modified 3D convolutional network architecture derived from the baseline I3D architecture. The classification and detection performance of the two architectures were evaluated for comparison. The experimental results show that the proposed model outperforms the baseline I3D model when both models are trained from randomly initialized weights. We conclude that the proposed method greatly reduces the examination time of VFSS images with a low miss rate.
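
As an illustration of the kind of architecture involved, the sketch below shows a minimal 3D-CNN clip classifier in the spirit of I3D, written in PyTorch. The layer sizes, frame counts, and binary class head are our own assumptions, not the authors' model.

    # A minimal sketch (not the authors' code) of a 3D-CNN clip classifier:
    # stacked 3D convolutions over (time, height, width), global pooling,
    # and a binary "pharyngeal phase vs. other" head.
    import torch
    import torch.nn as nn

    class Tiny3DConvNet(nn.Module):
        def __init__(self, num_classes: int = 2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(1, 16, kernel_size=3, padding=1),   # grayscale VFSS frames
                nn.BatchNorm3d(16), nn.ReLU(),
                nn.MaxPool3d((1, 2, 2)),                       # pool space, keep time
                nn.Conv3d(16, 32, kernel_size=3, padding=1),
                nn.BatchNorm3d(32), nn.ReLU(),
                nn.MaxPool3d(2),                               # pool time and space
            )
            self.head = nn.Linear(32, num_classes)

        def forward(self, clip):                               # clip: (B, 1, T, H, W)
            x = self.features(clip)
            x = x.mean(dim=(2, 3, 4))                          # global average pooling
            return self.head(x)

    # Sliding a window over a raw video yields per-interval predictions, from
    # which pharyngeal-phase time intervals can be read off without manual marking.
    model = Tiny3DConvNet()
    scores = model(torch.randn(4, 1, 16, 112, 112))            # 4 clips of 16 frames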

2019
Vol 4 (2)
pp. 57-62
Author(s):  
Julisa Bana Abraham

Convolutional neural networks are commonly used for classification; however, they can also perform semantic segmentation using the fully convolutional network approach. U-Net is one example of a fully convolutional network architecture capable of producing accurate segmentation of biomedical images. This paper proposes using U-Net for Plasmodium segmentation in thin blood smear images. The evaluation shows that U-Net can accurately perform Plasmodium segmentation in thin blood smear images. In addition, this study compares three loss functions: mean-squared error, binary cross-entropy, and Huber loss. The results show that Huber loss yields the best testing metrics: 0.9297, 0.9715, 0.8957, and 0.9096 for F1 score, positive predictive value (PPV), sensitivity (SE), and relative segmentation accuracy (RSA), respectively.
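
For concreteness, a minimal sketch of the three losses being compared is given below, assuming a U-Net that outputs per-pixel probabilities; the function and its signature are illustrative, not the paper's code.

    # Sketch of the three compared segmentation losses over per-pixel outputs.
    import torch
    import torch.nn.functional as F

    def segmentation_loss(pred, target, kind: str = "huber"):
        """pred: (B, 1, H, W) probabilities in [0, 1]; target: binary mask, same shape."""
        if kind == "mse":
            return F.mse_loss(pred, target)
        if kind == "bce":
            return F.binary_cross_entropy(pred, target)
        if kind == "huber":                        # the best-performing loss reported
            return F.smooth_l1_loss(pred, target)  # smooth L1 = Huber with delta = 1
        raise ValueError(kind)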


Electronics
2019
Vol 8 (10)
pp. 1131
Author(s):
Li
Zhao
Zhang
Hu

Recently, many researchers have attempted to use convolutional neural networks (CNNs) for wildfire smoke detection. However, the application of CNNs to wildfire smoke detection still faces several issues, e.g., a high false-alarm rate and imbalanced training data. To address these issues, we propose a novel framework integrating conventional methods into a CNN for wildfire smoke detection, which consists of a candidate smoke region segmentation strategy and an advanced network architecture, namely the wildfire smoke dilated DenseNet (WSDD-Net). Candidate smoke region segmentation removes the complex backgrounds of wildfire smoke images. The proposed WSDD-Net achieves multi-scale feature extraction by combining dilated convolutions with dense blocks. To address the dataset imbalance, an improved cross entropy loss function, namely balanced cross entropy (BCE), was used in place of the original cross entropy loss function during training. The proposed WSDD-Net was evaluated on two smoke datasets, i.e., WS and Yuan, and achieved a high AR (99.20%) and a low false-alarm rate (FAR, 0.24%). The experimental results demonstrate that the proposed framework has better detection capabilities under different negative-sample interferences.
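
A balanced cross entropy of this kind can be sketched as below: standard cross entropy reweighted by class frequency so the rarer class contributes more. The inverse-frequency weighting shown is one common choice, not necessarily the paper's exact formulation.

    # Sketch of a class-balanced cross-entropy loss for smoke/non-smoke imbalance.
    import torch
    import torch.nn.functional as F

    def balanced_cross_entropy(logits, labels):
        """logits: (B, 2); labels: (B,) with 1 = smoke, 0 = non-smoke."""
        pos = labels.float().mean().clamp(1e-6, 1 - 1e-6)      # fraction of positives
        weights = torch.stack([1.0 / (1.0 - pos), 1.0 / pos])  # rarer class weighs more
        return F.cross_entropy(logits, labels, weight=weights)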


Author(s):  
Hao Chen
Fuzhen Zhuang
Li Xiao
Ling Ma
Haiyan Liu
...  

Recently, Graph Convolutional Networks (GCNs) have proven to be a powerful means for Computer Aided Diagnosis (CADx). This approach requires building a population graph to aggregate structural information, where the graph adjacency matrix represents the relationships between nodes. Until now, this adjacency matrix has usually been defined manually based on phenotypic information. In this paper, we propose an encoder that automatically selects the appropriate phenotypic measures according to their spatial distribution and uses a text-similarity awareness mechanism to calculate the edge weights between nodes. The encoder can automatically construct the population graph using phenotypic measures that have a positive impact on the final results, and it further realizes the fusion of multimodal information. In addition, a novel graph convolutional network architecture using a multi-layer aggregation mechanism is proposed. The structure can obtain deep structural information while suppressing over-smoothing, and it increases the similarity between nodes of the same type. Experimental results on two databases show that our method can significantly improve the diagnostic accuracy for autism spectrum disorder and breast cancer, indicating its universality in leveraging multimodal data for disease prediction.
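
To make the idea concrete, the sketch below shows a population-graph GCN layer whose edge weights come from a learned similarity over phenotypic embeddings rather than a hand-defined adjacency matrix. The cosine-similarity construction and layer sizes are our assumptions, not the authors' encoder.

    # Sketch: GCN layer with adjacency learned from phenotypic similarity.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SimilarityGCNLayer(nn.Module):
        def __init__(self, in_dim, out_dim, pheno_dim, emb_dim=16):
            super().__init__()
            self.encode = nn.Linear(pheno_dim, emb_dim)   # phenotypic-measure encoder
            self.weight = nn.Linear(in_dim, out_dim)

        def forward(self, x, pheno):                      # x: (N, in_dim), pheno: (N, pheno_dim)
            z = F.normalize(self.encode(pheno), dim=1)    # embed phenotypic measures
            adj = torch.relu(z @ z.t())                   # (N, N) non-negative edge weights
            deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
            return torch.relu((adj / deg) @ self.weight(x))  # row-normalized propagation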


Author(s):  
Xun Liu
Fangyuan Lei
Guoqing Xia

Graph convolutional networks (GCNs) have become the de facto approach, achieving state-of-the-art results on many real-world problems involving graph-structured data. However, these networks are usually shallow, because GCNs with many layers over-smooth, which limits the expressive power of the learned graph representations. Current methods for overcoming this limitation suffer from high complexity and large parameter counts. Although Simple Graph Convolution (SGC) reduces complexity and parameters, it fails to distinguish the feature information of neighboring nodes at different distances. To tackle these limits, we propose MulStepNET, a stronger multi-step graph convolutional network architecture that captures more global information by simultaneously combining information from multi-step neighborhoods. Compared to existing methods such as GCN and MixHop, MulStepNET aggregates neighborhood information at greater distances via a multi-power adjacency matrix while fitting fewer parameters and being more computationally efficient. Experiments on citation networks including Pubmed, Cora, and Citeseer demonstrate that the proposed MulStepNET model improves over SGC by 2.8, 3.3, and 2.1%, respectively, while keeping similar stability, and achieves better performance in terms of accuracy and stability compared to other baselines.
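
The sketch below illustrates the multi-power idea in an SGC-like, parameter-light form: powers of the normalized adjacency matrix are mixed so one layer sees neighborhoods at several distances at once. The learnable per-step mixing weights are our assumption about how the combination could be done.

    # Sketch: multi-step propagation via powers of the normalized adjacency.
    import torch
    import torch.nn as nn

    class MultiStepGraphConv(nn.Module):
        def __init__(self, in_dim, out_dim, steps=3):
            super().__init__()
            self.alpha = nn.Parameter(torch.ones(steps))   # per-step mixing weights
            self.linear = nn.Linear(in_dim, out_dim)       # the only dense weights

        def forward(self, x, adj_norm):                    # adj_norm: (N, N) normalized A
            out, h = 0.0, x
            for a in self.alpha:
                h = adj_norm @ h                           # one more hop: A^k X
                out = out + a * h                          # mix neighborhoods at all hops
            return self.linear(out)                        # single linear map, as in SGC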


2021
Vol 11 (15)
pp. 6975
Author(s):  
Tao Zhang
Lun He
Xudong Li
Guoqing Feng

Lipreading aims to recognize the sentences being spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model that uses an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. This partly eliminates the vanishing-gradient and performance deficiencies of RNNs (LSTM, GRU), yielding notable performance improvements as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
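
A minimal sketch of the decoder side, under our own assumptions about channel sizes and dilations, is shown below: a dilated temporal convolution block over per-frame encoder features, trained with a CTC objective.

    # Sketch: residual dilated TCN block over frame features, with CTC loss.
    import torch
    import torch.nn as nn

    class TCNBlock(nn.Module):
        def __init__(self, channels, dilation):
            super().__init__()
            pad = dilation  # preserves sequence length for kernel_size=3
            self.net = nn.Sequential(
                nn.Conv1d(channels, channels, 3, padding=pad, dilation=dilation),
                nn.ReLU(),
                nn.Conv1d(channels, channels, 3, padding=pad, dilation=dilation),
            )

        def forward(self, x):                      # x: (B, C, T) encoder frame features
            return torch.relu(x + self.net(x))     # residual path eases gradient flow

    tcn = nn.Sequential(TCNBlock(256, 1), TCNBlock(256, 2), TCNBlock(256, 4))
    ctc = nn.CTCLoss(blank=0)                      # alignment-free sentence-level objective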


Author(s):  
Haitham Baomar
Peter J. Bentley

We describe the Intelligent Autopilot System (IAS), a fully autonomous autopilot capable of piloting large jets such as airliners by learning from experienced human pilots using Artificial Neural Networks. The IAS is capable of autonomously executing the required piloting tasks and handling the different flight phases to fly an aircraft from one airport to another, including takeoff, climb, cruise, navigation, descent, approach, and landing, in simulation. In addition, the IAS is capable of autonomously landing large jets in the presence of extreme weather conditions, including severe crosswind, gusts, wind shear, and turbulence. The IAS is a potential solution to the limitations and robustness problems of modern autopilots: the inability to execute complete flights, the inability to handle extreme weather conditions (especially during approach and landing, where the aircraft’s speed is relatively low and the uncertainty factor is high), and the pilot shortage relative to increasing aircraft demand. In this paper, we present the work done in collaboration with the aviation industry to provide training data for the IAS to learn from. The training data are used by Artificial Neural Networks to generate control models automatically. The control models imitate the skills of the human pilot when executing all the piloting tasks required to fly an aircraft between two airports. In addition, we introduce new ANNs trained to control the aircraft’s elevators, elevator trim, throttle, and flaps, and new aileron and rudder ANNs to counter the effects of extreme weather conditions and land safely. Experiments show that small datasets containing single demonstrations are sufficient to train the IAS and achieve excellent performance, by using clearly separable and traceable neural network modules that eliminate the black-box problem of large Artificial Intelligence methods such as Deep Learning. In addition, experiments show that the IAS can handle landing in extreme weather conditions beyond the capabilities of modern autopilots and even experienced human pilots. The proposed IAS is a novel approach towards achieving full control autonomy of large jets using ANN models that match the skills and abilities of experienced human pilots and beyond.
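
Purely as an illustration of such a "clearly separable" control module, the sketch below shows a small feedforward ANN mapping a few flight-state inputs to a single control output, trained by behavioral cloning from a demonstration. The input/output choices and sizes are assumptions, not the IAS specification.

    # Sketch: one per-control ANN module trained from a pilot demonstration.
    import torch
    import torch.nn as nn

    elevator_net = nn.Sequential(           # one traceable module per control surface
        nn.Linear(4, 16), nn.Tanh(),        # e.g., pitch, pitch rate, airspeed, altitude error
        nn.Linear(16, 1), nn.Tanh(),        # normalized elevator command in [-1, 1]
    )

    def train_from_demo(states, commands, epochs=200):
        """Behavioral cloning from a single small demonstration dataset."""
        opt = torch.optim.Adam(elevator_net.parameters(), lr=1e-3)
        for _ in range(epochs):
            loss = nn.functional.mse_loss(elevator_net(states), commands)
            opt.zero_grad(); loss.backward(); opt.step()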


2021
Vol 11 (15)
pp. 7148
Author(s):  
Bedada Endale
Abera Tullu
Hayoung Shi
Beom-Soo Kang

Unmanned aerial vehicles (UAVs) are widely utilized for various missions in both the civilian and military sectors. Many of these missions require UAVs to perceive and understand the environments they navigate. This perception can be realized by training a computing machine to classify objects in the environment. One well-known machine training approach is supervised deep learning, which enables a machine to classify objects. However, supervised deep learning comes at a high cost in time and computational resources. Collecting large input datasets, preprocessing steps such as labeling the training data, and the need for a high-performance computer for training are some of the challenges supervised deep learning poses. To address these setbacks, this study proposes mission-specific input data augmentation techniques and the design of a lightweight deep neural network architecture capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes of 10,000 different images per class were used as input data, with 80% for training the network and the remaining 20% for validation. For optimization of the designed deep neural network, a sequential gradient descent algorithm was implemented, which has the advantage of handling redundancy in the data more efficiently than other algorithms.
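
A sketch of this recipe, under our own assumptions about the specific augmentations and layer sizes, is given below: cheap image augmentations in place of more data, plus a lightweight CNN over the ten classes, optimized with stochastic (sequential, per-sample) gradient descent.

    # Sketch: mission-specific augmentation + lightweight real-time classifier.
    import torch
    import torch.nn as nn
    from torchvision import transforms

    augment = transforms.Compose([          # augmentations substitute for more data
        transforms.RandomRotation(15),
        transforms.ColorJitter(brightness=0.3),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])

    light_cnn = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 10),                  # ten object classes
    )
    # Sequential (per-sample stochastic) gradient descent for optimization.
    optimizer = torch.optim.SGD(light_cnn.parameters(), lr=0.01)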


Sensors
2021
Vol 21 (8)
pp. 2852
Author(s):  
Parvathaneni Naga Srinivasu
Jalluri Gnana SivaSai
Muhammad Fazal Ijaz
Akash Kumar Bhoi
Wonjoon Kim
...  

Deep learning models are efficient at learning the features that assist in understanding complex patterns precisely. This study proposes a computerized process for classifying skin disease using deep learning, based on MobileNet V2 and Long Short-Term Memory (LSTM). The MobileNet V2 model proves efficient, achieving good accuracy while remaining able to run on lightweight computational devices. The proposed model is efficient at maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used for assessing the progression of the diseased growth. The performance has been compared against other state-of-the-art models, such as Fine-Tuned Neural Networks (FTNN), a Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by the Visual Geometry Group (VGG), and a convolutional neural network architecture expanded with a few changes. On the HAM10000 dataset, the proposed method outperforms the other methods with more than 85% accuracy. It recognizes the affected region much faster, with almost 2× fewer computations than the conventional MobileNet model, resulting in minimal computational effort. Furthermore, a mobile application is designed for instant and proper action. It helps patients and dermatologists identify the type of disease from an image of the affected region at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners efficiently and effectively diagnose skin conditions, thereby reducing further complications and morbidity.
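
The sketch below shows one plausible wiring of such a MobileNet V2 + LSTM pipeline: MobileNet V2 extracts per-image features and an LSTM maintains stateful information across a sequence of lesion images. How the sequence is constructed, and the hidden size, are our assumptions; HAM10000 has seven lesion classes.

    # Sketch: MobileNet V2 feature extractor feeding a stateful LSTM classifier.
    import torch
    import torch.nn as nn
    from torchvision import models

    backbone = models.mobilenet_v2(weights=None).features  # (B, 1280, h, w) feature maps
    pool = nn.AdaptiveAvgPool2d(1)
    lstm = nn.LSTM(input_size=1280, hidden_size=128, batch_first=True)
    classifier = nn.Linear(128, 7)           # seven HAM10000 lesion classes

    def predict(seq):                        # seq: (B, T, 3, H, W) image sequence
        b, t = seq.shape[:2]
        feats = pool(backbone(seq.flatten(0, 1))).flatten(1)   # (B*T, 1280)
        out, _ = lstm(feats.view(b, t, -1))                    # stateful across the sequence
        return classifier(out[:, -1])                          # classify from final state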


2021
Vol 4 (1)
pp. 3
Author(s):  
Parag Narkhede
Rahee Walambe
Shruti Mandaokar
Pulkit Chandel
Ketan Kotecha
...  

With rapid industrialization and technological advancement, innovative engineering technologies that are cost-effective, faster, and easier to implement are essential. One such area of concern is the rising number of accidents caused by gas leaks in coal mines, chemical industries, home appliances, etc. In this paper, we propose a novel approach to detect and identify gaseous emissions using multimodal AI fusion techniques. Most gases and their fumes are colorless, odorless, and tasteless, thereby challenging our normal human senses. Sensing based on a single sensor may not be accurate, and sensor fusion is essential for robust and reliable detection in several real-world applications. We manually collected 6400 gas samples (1600 samples per class for four classes) using two specific sensors: a 7-semiconductor gas sensor array and a thermal camera. The early fusion method of multimodal AI is applied: the network architecture consists of a feature extraction module for each modality, fused via a merge layer followed by a dense layer that provides a single output identifying the gas. We obtained a testing accuracy of 96% for the fused model, as opposed to individual model accuracies of 82% (gas sensor data using an LSTM) and 93% (thermal image data using a CNN model). The results demonstrate that the fusion of multiple sensors and modalities outperforms a single sensor.
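
A minimal sketch of such an early-fusion architecture follows: an LSTM branch over the 7-sensor time series and a CNN branch over thermal images, concatenated in a merge layer and classified by a dense layer. Layer sizes here are assumptions, not the paper's configuration.

    # Sketch: early fusion of gas-sensor (LSTM) and thermal-image (CNN) features.
    import torch
    import torch.nn as nn

    class GasFusionNet(nn.Module):
        def __init__(self, num_classes=4):
            super().__init__()
            self.lstm = nn.LSTM(input_size=7, hidden_size=32, batch_first=True)
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.dense = nn.Linear(32 + 16, num_classes)  # merge layer -> single output

        def forward(self, sensor_seq, thermal_img):
            _, (h, _) = self.lstm(sensor_seq)             # final hidden state: (1, B, 32)
            fused = torch.cat([h[-1], self.cnn(thermal_img)], dim=1)
            return self.dense(fused)                      # one prediction per gas class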

