Testing the Ability of Convolutional Neural Networks to Learn Radiomic Features

Author(s):  
Ivan S. Klyuzhin ◽  
Yixi Xu ◽  
Anthony Ortiz ◽  
Juan M. Lavista Ferres ◽  
Ghassan Hamarneh ◽  
...  

Purpose: To test the ability of convolutional neural networks (CNNs) to effectively capture the intensity, shape, and texture properties of tumors as defined by standardized radiomic features. Methods: Standard 2D and 3D CNN architectures with an increasing number of convolutional layers (up to 9) were trained to predict the values of 16 standardized radiomic features from synthetic tumor images, and then tested. In addition, several ImageNet-pretrained state-of-the-art networks were tested. The synthetic images replicated the quality of real PET images. A total of 4000 images were used for training, 500 for validation, and 500 for testing. Results: Radiomic features quantifying tumor size and intensity were predicted with high accuracy, while shape irregularity features had very high prediction errors and generalized poorly between training and test sets. For example, the mean normalized prediction error of tumor diameter (mean intensity) with a 5-layer 2D CNN was 4.23 ± 0.25 (1.88 ± 0.07), while the error for tumor sphericity was 15.64 ± 0.93. Similarly high error values were found for other shape irregularity and heterogeneity features, with both standard and state-of-the-art networks. Conclusions: Standard CNN architectures and ImageNet-pretrained advanced networks have a significantly lower capacity to capture tumor shape and heterogeneity properties than other features. Our findings imply that CNNs trained end-to-end for clinical outcome prediction and other tasks may under-utilize tumor shape and texture information. We hypothesize that, to improve CNN performance, these radiomic features could be computed explicitly and added as auxiliary variables to the dense layers of the networks, or as additional input channels.
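A minimal sketch of this setup, assuming PyTorch, single-channel 2D inputs, and illustrative layer widths (the paper's exact configurations are not reproduced here): a plain convolutional encoder regresses the 16 feature values directly from an image, trained with mean-squared error against explicitly computed radiomic values.

```python
import torch
import torch.nn as nn

class RadiomicRegressor(nn.Module):
    """Plain 2D CNN regressing radiomic feature values from one image."""
    def __init__(self, n_features=16, n_conv=5):
        super().__init__()
        layers, in_ch = [], 1
        for i in range(n_conv):
            out_ch = 16 * 2 ** min(i, 3)          # cap the channel growth
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]
            in_ch = out_ch
        self.encoder = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(in_ch, n_features))

    def forward(self, x):
        return self.head(self.encoder(x))

model = RadiomicRegressor()
images = torch.randn(8, 1, 128, 128)              # synthetic PET-like batch
loss = nn.functional.mse_loss(model(images), torch.randn(8, 16))
```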

Author(s):  
Jorge F. Lazo ◽  
Aldo Marzullo ◽  
Sara Moccia ◽  
Michele Catellani ◽  
Benoit Rosa ◽  
...  

Abstract Purpose Ureteroscopy is an efficient endoscopic minimally invasive technique for the diagnosis and treatment of upper tract urothelial carcinoma. During ureteroscopy, automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. To obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on convolutional neural networks (CNNs). Methods The proposed method is based on an ensemble of 4 parallel CNNs that simultaneously process single- and multi-frame information. Two architectures serve as core models, namely a U-Net based on residual blocks ($m_1$) and Mask-RCNN ($m_2$), which are fed with single still frames $I(t)$. The other two models ($M_1$, $M_2$) are modifications of the former consisting of the addition of a stage that uses 3D convolutions to process temporal information. $M_1$ and $M_2$ are fed with triplets of frames ($I(t-1)$, $I(t)$, $I(t+1)$) to produce the segmentation for $I(t)$. Results The proposed method was evaluated using a custom dataset of 11 videos (2673 frames) collected from 6 patients and manually annotated. We obtain a Dice similarity coefficient of 0.80, outperforming previous state-of-the-art methods. Conclusion The obtained results show that spatio-temporal information can be effectively exploited by the ensemble model to improve hollow-lumen segmentation in ureteroscopic images. The method is also effective in the presence of poor visibility, occasional bleeding, or specular reflections.
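A hedged sketch of the temporal stage, assuming PyTorch and RGB frames: a single 3D convolution whose kernel spans the full triplet collapses ($I(t-1)$, $I(t)$, $I(t+1)$) into 2D feature maps that a single-frame backbone can consume, and the ensemble output averages the per-model masks. The dimensions are stand-ins, not the paper's residual U-Net or Mask-RCNN configuration.

```python
import torch
import torch.nn as nn

class TemporalFrontEnd(nn.Module):
    """Collapses a triplet of RGB frames into 2D features via one Conv3d."""
    def __init__(self, out_ch=16):
        super().__init__()
        # kernel depth 3 spans the whole triplet; no temporal padding,
        # so the time axis shrinks from 3 to 1
        self.conv3d = nn.Conv3d(3, out_ch, kernel_size=(3, 3, 3),
                                padding=(0, 1, 1))

    def forward(self, triplet):                   # (B, 3, 3, H, W)
        return self.conv3d(triplet).squeeze(2)    # -> (B, out_ch, H, W)

def ensemble_mask(masks):
    """Average the per-model sigmoid masks and threshold at 0.5."""
    return torch.stack(masks).mean(dim=0) > 0.5

front = TemporalFrontEnd()
feats = front(torch.randn(2, 3, 3, 256, 256))     # ready for a 2D backbone
```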


Mathematics ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 624
Author(s):  
Stefan Rohrmanstorfer ◽  
Mikhail Komarov ◽  
Felix Mödritscher

With the ever-increasing amount of image data, it has become a necessity to automatically search for and process the information contained in these images. As fashion is captured in images, the fashion sector provides the perfect foundation for a service or application built on an image classification model. In this article, the state of the art in image classification is analyzed and discussed. Based on this knowledge, four different approaches are implemented to extract features from fashion data. For this purpose, a dataset of 2567 images of human-worn fashion was created and then significantly enlarged through the image operations performed. The results show that convolutional neural networks are the undisputed standard for classifying images, and that TensorFlow is the best library to build them. Moreover, through the introduction of dropout layers, data augmentation, and transfer learning, model overfitting was successfully prevented, and the validation accuracy on the created dataset was incrementally improved from an initial 69% to a final 84%. More distinct apparel such as trousers, shoes, and hats was classified better than other upper-body clothes.
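Since the abstract names TensorFlow, a brief tf.keras sketch of the three measures it credits (data augmentation, dropout layers, transfer learning) may help; the backbone (MobileNetV2), class count, and rates are assumptions, not the authors' setup.

```python
import tensorflow as tf

# on-the-fly data augmentation
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
])

# transfer learning: ImageNet backbone with frozen weights
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    augment,
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 range
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),                       # against overfitting
    tf.keras.layers.Dense(10, activation="softmax"),    # e.g. 10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the backbone and training only the head is the usual first stage of transfer learning; the backbone can be unfrozen later at a small learning rate for fine-tuning.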


2020 ◽  
Vol 2 (1) ◽  
pp. 23-36
Author(s):  
Syed Aamir Ali Shah ◽  
Muhammad Asif Manzoor ◽  
Abdul Bais

Forest structure estimation is very important in geological, ecological, and environmental studies. It provides the basis for carbon stock estimation and for effective sequestration of carbon sources and sinks. Multiple parameters are used to estimate forest structure, such as above-ground biomass, leaf area index, and diameter at breast height. Among these parameters, vegetation height has a unique standing: in addition to supporting forest structure estimation, it provides insight into long-term historical changes and estimates of forest stand age. Multiple techniques are available to estimate canopy height. Light detection and ranging (LiDAR) based methods, while among the most accurate and useful, are very expensive to obtain and have no global coverage. There is a need for a mechanism to estimate canopy height from freely available satellite imagery such as Landsat images. Multiple studies contribute to this area; the majority use Landsat images with random forest models. Although random-forest-based models are widely used in remote sensing applications, they lack the ability to exploit the spatial association of neighboring pixels in the modeling process. In this research work, we define a convolutional neural network (CNN) based model and analyze it for three test configurations. We replicate the random-forest-based setup of Grant et al., a similar state-of-the-art study, compare our results, and show that CNN-based models not only capture the spatial association of neighboring pixels but also outperform the state of the art.
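The spatial-association argument can be made concrete with a small patch-based regressor: the CNN sees a window of neighboring pixels, which a per-pixel random forest cannot. Band count, patch size, and widths below are assumptions (PyTorch), not the paper's configuration.

```python
import torch
import torch.nn as nn

class CanopyHeightCNN(nn.Module):
    """Regresses canopy height for a pixel from a Landsat patch around it."""
    def __init__(self, bands=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1),                  # height in metres
        )

    def forward(self, patch):                  # (B, bands, 15, 15)
        return self.net(patch).squeeze(1)

model = CanopyHeightCNN()
heights = model(torch.randn(32, 6, 15, 15))    # one patch per labelled pixel
```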


2017 ◽  
Vol 25 (1) ◽  
pp. 93-98 ◽  
Author(s):  
Yuan Luo ◽  
Yu Cheng ◽  
Özlem Uzuner ◽  
Peter Szolovits ◽  
Justin Starren

Abstract We propose Segment Convolutional Neural Networks (Seg-CNNs) for classifying relations from clinical notes. Seg-CNNs use only word-embedding features without manual feature engineering. Unlike typical CNN models, relations between 2 concepts are identified by simultaneously learning separate representations for text segments in a sentence: preceding, concept1, middle, concept2, and succeeding. We evaluate Seg-CNN on the i2b2/VA relation classification challenge dataset. We show that Seg-CNN achieves a state-of-the-art micro-average F-measure of 0.742 for overall evaluation, 0.686 for classifying medical problem–treatment relations, 0.820 for medical problem–test relations, and 0.702 for medical problem–medical problem relations. We demonstrate the benefits of learning segment-level representations. We show that medical domain word embeddings help improve relation classification. Seg-CNNs can be trained quickly for the i2b2/VA dataset on a graphics processing unit (GPU) platform. These results support the use of CNNs computed over segments of text for classifying medical relations, as they show state-of-the-art performance while requiring no manual feature engineering.
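A hedged sketch of the segment-level idea, assuming PyTorch and illustrative dimensions: each of the five segments (preceding, concept1, middle, concept2, succeeding) passes through its own convolution and max-pooling over word embeddings, and the pooled vectors are concatenated before classification.

```python
import torch
import torch.nn as nn

class SegCNN(nn.Module):
    """One conv + max-pool branch per text segment, concatenated."""
    def __init__(self, vocab=20000, emb=200, filters=100, n_classes=8):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb, filters, kernel_size=3, padding=1)
             for _ in range(5)])               # one branch per segment
        self.out = nn.Linear(5 * filters, n_classes)

    def forward(self, segments):               # list of 5 (B, L_i) id tensors
        pooled = []
        for seg, conv in zip(segments, self.convs):
            h = conv(self.emb(seg).transpose(1, 2))   # (B, filters, L_i)
            pooled.append(h.max(dim=2).values)        # max over positions
        return self.out(torch.cat(pooled, dim=1))

model = SegCNN()
segs = [torch.randint(0, 20000, (2, n)) for n in (12, 3, 7, 3, 12)]
logits = model(segs)                           # (2, n_classes)
```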


Electronics ◽  
2019 ◽  
Vol 8 (6) ◽  
pp. 641 ◽  
Author(s):  
Miguel Rivera-Acosta ◽  
Susana Ortega-Cisneros ◽  
Jorge Rivera

This paper presents a platform that automatically generates custom hardware accelerators for convolutional neural networks (CNNs) implemented in field-programmable gate array (FPGA) devices. It includes a user interface for configuring and managing these accelerators. The platform presented here can perform all the processes necessary to design and test CNN accelerators: describing the CNN architecture at both the layer and internal-parameter levels, training the desired architecture with any dataset, and generating the configuration files required by the platform. With these files, it can synthesize the register-transfer level (RTL) design and program the customized CNN accelerator into the FPGA device for testing, making it possible to generate custom CNN accelerators quickly and easily. All processes except the CNN architecture description are fully automated and carried out by the platform, which manages third-party software to train the CNN and to synthesize and program the generated RTL. The platform has been tested by implementing several state-of-the-art CNN architectures on freely available datasets such as MNIST, CIFAR-10, and STL-10.
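The abstract does not specify the platform's file formats, so the following is a purely hypothetical Python sketch of what a layer-and-parameter-level architecture description for such a toolchain could look like; every field name, and the fixed-point section in particular, is invented for illustration.

```python
# Hypothetical architecture description; the real platform's schema and
# toolchain hooks are not given in the abstract.
cnn_description = {
    "dataset": "MNIST",
    "input": {"shape": [28, 28, 1]},
    "layers": [
        {"type": "conv", "filters": 8, "kernel": 3, "activation": "relu"},
        {"type": "maxpool", "size": 2},
        {"type": "conv", "filters": 16, "kernel": 3, "activation": "relu"},
        {"type": "maxpool", "size": 2},
        {"type": "dense", "units": 10, "activation": "softmax"},
    ],
    # word-length choices like these typically drive RTL generation
    "fixed_point": {"weights_bits": 8, "activations_bits": 8},
}
```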


2019 ◽  
Vol 8 (6) ◽  
pp. 243 ◽  
Author(s):  
Yong Han ◽  
Shukang Wang ◽  
Yibin Ren ◽  
Cheng Wang ◽  
Peng Gao ◽  
...  

Predicting the passenger flow of metro networks is of great importance for traffic management and public safety. However, such predictions are very challenging, as passenger flow is affected by complex spatial dependencies (nearby and distant) and temporal dependencies (recent and periodic). In this paper, we propose a novel deep-learning-based approach, named STGCNNmetro (spatiotemporal graph convolutional neural networks for metro), to collectively predict two types of passenger flow volumes, inflow and outflow, in each metro station of a city. Specifically, instead of representing metro stations as grids and employing conventional convolutional neural networks (CNNs) to capture spatiotemporal dependencies, STGCNNmetro transforms the city metro network into a graph and makes predictions using graph convolutional neural networks (GCNNs). First, we apply stereogram graph convolution operations to seamlessly capture the irregular spatiotemporal dependencies along the metro network. Second, a deep structure composed of GCNNs is constructed to capture the distant spatiotemporal dependencies at the citywide level. Finally, we integrate three temporal patterns (recent, daily, and weekly) and fuse the spatiotemporal dependencies captured from these patterns to form the final prediction values. The STGCNNmetro model is an end-to-end framework that accepts raw passenger-flow data, automatically captures the effective features of the citywide metro network, and outputs predictions. We test this model by predicting short-term passenger flow volume in the citywide metro network of Shanghai, China. Experiments show that the STGCNNmetro model outperforms seven well-known baseline models (LSVR, PCA-kNN, NMF-kNN, Bayesian, MLR, M-CNN, and LSTM). We additionally explore the sensitivity of the model to its parameters and discuss the distribution of prediction errors.
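To make the graph-versus-grid distinction concrete, here is a generic graph-convolution step (Kipf-and-Welling-style propagation over the station adjacency matrix) in PyTorch; it is not the paper's stereogram graph convolution, and the toy graph and dimensions are invented.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One propagation step H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        a_hat = adj + torch.eye(adj.size(0))          # add self-loops
        d = a_hat.sum(1).rsqrt().diag()               # D^{-1/2}
        self.register_buffer("prop", d @ a_hat @ d)   # normalized adjacency
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                             # x: (stations, in_dim)
        return torch.relu(self.prop @ self.lin(x))

# toy metro line with 4 stations; features = recent in/outflow per station
adj = torch.tensor([[0., 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
layer = GraphConv(in_dim=2, out_dim=16, adj=adj)
h = layer(torch.randn(4, 2))
```

Stacking several such layers is what lets information propagate beyond immediate neighbors, which is how a deep GCNN structure reaches the distant, citywide dependencies the abstract mentions.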


2020 ◽  
Vol 12 (7) ◽  
pp. 1070 ◽  
Author(s):  
Somayeh Nezami ◽  
Ehsan Khoramshahi ◽  
Olli Nevalainen ◽  
Ilkka Pölönen ◽  
Eija Honkavaara

Interest in drone solutions for forestry applications is growing. Using drones, datasets can be captured flexibly and at high spatial and temporal resolution when needed. Fundamental tasks in forestry applications include the detection of individual trees, tree species classification, biomass estimation, etc. Deep neural networks (DNNs) have shown superior results compared with conventional machine learning methods, such as the multi-layer perceptron (MLP), when input data are large. The objective of this research is to investigate 3D convolutional neural networks (3D-CNNs) for classifying three major tree species in a boreal forest: pine, spruce, and birch. The proposed 3D-CNN models were employed to classify tree species at a test site in Finland. The classifiers were trained with a dataset of 3039 manually labelled trees, and the accuracies were then assessed on an independent dataset of 803 records. To find the most efficient feature combination, we compared the performance of 3D-CNN models trained with hyperspectral (HS) channels, red-green-blue (RGB) channels, and a canopy height model (CHM), separately and combined. The proposed 3D-CNN model with RGB and HS layers produced the highest classification accuracy. The producer accuracies of the best 3D-CNN classifier on the test dataset were 99.6%, 94.8%, and 97.4% for pine, spruce, and birch, respectively. The best 3D-CNN classifier produced ~5% better classification accuracy than the MLP with all layers. Our results suggest that the proposed method provides excellent classification results with acceptable performance metrics for HS datasets. The pine class was detectable in most layers; spruce was most detectable in the RGB data, while birch was most detectable in the HS layers. Furthermore, the RGB datasets provide acceptable results for many low-accuracy applications.
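A minimal sketch of a 3D-CNN of this kind, assuming PyTorch: the convolution kernels extend along the spectral axis as well as the two spatial axes, so the network learns joint spectral-spatial features. Band count, crop size, and widths are assumptions; RGB and CHM layers could be appended as extra bands.

```python
import torch
import torch.nn as nn

class TreeSpecies3DCNN(nn.Module):
    """3D convolutions over a per-tree hyperspectral cube."""
    def __init__(self, n_classes=3):                  # pine, spruce, birch
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(3, 1, 1)),
            nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=(5, 3, 3), padding=(2, 1, 1)),
            nn.ReLU(), nn.AdaptiveAvgPool3d(1),
        )
        self.classify = nn.Sequential(nn.Flatten(), nn.Linear(16, n_classes))

    def forward(self, cube):                          # (B, 1, bands, H, W)
        return self.classify(self.features(cube))

model = TreeSpecies3DCNN()
logits = model(torch.randn(4, 1, 32, 25, 25))         # 32 HS bands, 25x25 crop
```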


2020 ◽  
Vol 34 (01) ◽  
pp. 303-311 ◽  
Author(s):  
Sicheng Zhao ◽  
Yunsheng Ma ◽  
Yang Gu ◽  
Jufeng Yang ◽  
Tengfei Xing ◽  
...  

Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ a traditional two-stage shallow pipeline, i.e., extracting visual and/or audio features and training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN, and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, the polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint, to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
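One differentiable reading of the polarity-consistency idea, sketched in PyTorch: add to the standard cross-entropy a binary cross-entropy on the probability mass that falls in the correct polarity group. The emotion-to-polarity grouping and the weight below are hypothetical, and the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

# hypothetical polarity grouping for 8 emotion classes (1 = positive)
POLARITY = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0])

def polarity_consistent_ce(logits, target, weight=0.5):
    """Cross-entropy plus a penalty for predicting the wrong polarity."""
    ce = F.cross_entropy(logits, target)
    probs = F.softmax(logits, dim=1)
    # probability mass assigned to the positive-polarity emotions
    p_pos = probs[:, POLARITY.bool()].sum(dim=1).clamp(1e-6, 1 - 1e-6)
    pol_target = POLARITY[target].float()    # 1 if the true label is positive
    return ce + weight * F.binary_cross_entropy(p_pos, pol_target)

logits = torch.randn(4, 8, requires_grad=True)
target = torch.tensor([0, 2, 5, 7])
loss = polarity_consistent_ce(logits, target)
loss.backward()
```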


2019 ◽  
Vol 9 (11) ◽  
pp. 2347 ◽  
Author(s):  
Hannah Kim ◽  
Young-Seob Jeong

As the amount of textual data increases exponentially, it becomes ever more important to develop models that analyze text automatically. Texts may carry various labels, such as gender, age, country, and sentiment, and using such labels can benefit several industrial fields, so many studies of text classification have appeared. Recently, the Convolutional Neural Network (CNN) has been adopted for the task of text classification and has shown quite successful results. In this paper, we propose convolutional neural networks for the task of sentiment classification. Through experiments with three well-known datasets, we show that employing consecutive convolutional layers is effective for relatively longer texts, and that our networks outperform other state-of-the-art deep learning models.
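A sketch of the consecutive-convolutional-layers idea in PyTorch: each additional Conv1d block widens the receptive field over the word embeddings, which is the property the paper links to better performance on longer texts. All sizes are illustrative, not the authors' configuration.

```python
import torch
import torch.nn as nn

class StackedConvSentiment(nn.Module):
    """Consecutive Conv1d blocks over word embeddings, then global max-pool."""
    def __init__(self, vocab=30000, emb=128, n_classes=2, depth=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        blocks, ch = [], emb
        for _ in range(depth):                 # consecutive conv layers
            blocks += [nn.Conv1d(ch, 128, 5, padding=2), nn.ReLU(),
                       nn.MaxPool1d(2)]
            ch = 128
        self.convs = nn.Sequential(*blocks)
        self.out = nn.Linear(128, n_classes)

    def forward(self, ids):                    # (B, L) token ids
        h = self.convs(self.emb(ids).transpose(1, 2))
        return self.out(h.max(dim=2).values)   # logits for cross-entropy

model = StackedConvSentiment()
logits = model(torch.randint(0, 30000, (4, 64)))   # 4 texts of 64 tokens
```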

