Detection and Segmentation of Mature Green Tomatoes Based on Mask R-CNN with Automatic Image Acquisition Approach

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7842
Author(s):  
Linlu Zu ◽  
Yanping Zhao ◽  
Jiuqin Liu ◽  
Fei Su ◽  
Yan Zhang ◽  
...  

Since mature green tomatoes are similar in color to branches and leaves, and some are shaded by branches and leaves or overlapped by other tomatoes, accurately detecting and locating these tomatoes is difficult. This paper proposes using the Mask R-CNN algorithm for the detection and segmentation of mature green tomatoes. A mobile robot is designed to collect images round-the-clock and under different conditions throughout the greenhouse, so that the captured dataset is not limited to objects of interest to users. After the training process, ResNet50-FPN is selected as the backbone network. The feature map is then passed through the region proposal network to generate regions of interest (ROIs), and ROIAlign bilinear interpolation is used to compute the target region, such that the corresponding region in the feature map is pooled to a fixed size based on the position coordinates of the preselection box. Finally, the detection and segmentation of mature green tomatoes are realized by the parallel heads for ROI target category, bounding box regression and mask prediction. The trained model performs best when the Intersection over Union threshold equals 0.5. The experimental results show that the F1-scores of the bounding box and mask region both reach 92.0%. The image acquisition process is fully unsupervised, without any user preselection, yielding a highly heterogeneous mix of images on which the selected Mask R-CNN algorithm can still accurately detect mature green tomatoes. The performance of the proposed model in a real greenhouse harvesting environment is also evaluated, facilitating direct application in a tomato harvesting robot.
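
The ROIAlign step described above can be illustrated with a minimal NumPy sketch (not the authors' implementation): each output bin samples the feature map at its centre via bilinear interpolation, so no coordinate quantisation occurs. Sampling a single point per bin is a simplification made here for brevity.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat (H, W) at continuous coords (y, x)."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    y0, x0 = max(y0, 0), max(x0, 0)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx)
            + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx)
            + feat[y1, x1] * dy * dx)

def roi_align(feat, box, out_size):
    """Pool the region box = (y1, x1, y2, x2) of feat to a fixed
    out_size x out_size grid, sampling each bin centre bilinearly."""
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            cy = y1 + (i + 0.5) * bin_h   # bin centre, no quantisation
            cx = x1 + (j + 0.5) * bin_w
            out[i, j] = bilinear_sample(feat, cy, cx)
    return out
```

Because the bin centres are continuous coordinates, ROIAlign preserves sub-pixel alignment between the preselection box and the pooled feature, which matters most for small objects such as distant tomatoes.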

Sensors ◽  
2019 ◽  
Vol 19 (5) ◽  
pp. 1089 ◽  
Author(s):  
Ye Wang ◽  
Zhenyi Liu ◽  
Weiwen Deng

Region proposal network (RPN)-based object detection, such as Faster Regions with CNN features (Faster R-CNN), has gained considerable attention due to its high accuracy and fast speed. However, it has room for improvement in special application situations such as on-board vehicle detection. The original RPN places multiscale anchors uniformly on each pixel of the last feature map and classifies whether an anchor belongs to the foreground or background using one pixel of that feature map. The receptive field of each pixel in the last feature map is fixed in the original Faster R-CNN and does not coincide with the anchor size. Hence, only part of a large vehicle can be seen, while the feature for a small vehicle contains too much useless information, reducing detection accuracy. Furthermore, perspective projection makes the vehicle bounding box size depend on the bounding box position, which reduces the effectiveness and accuracy of the uniform anchor generation method, lowering both detection accuracy and computing speed. After the region proposal stage, many regions of interest (ROIs) are generated. The ROI pooling layer projects an ROI onto the last feature map and forms a new fixed-size feature map for final classification and box regression. The number of feature-map pixels in the projected region also influences detection performance, but this is not accurately controlled in previous works. In this paper, the original Faster R-CNN is optimized, especially for on-board vehicle detection, to solve the above-mentioned problems. The proposed method is tested on the KITTI dataset and shows a significant improvement without many tricky parameter adjustments or training skills. The proposed method can also be applied to other objects with obvious foreshortening effects, such as on-board pedestrian detection.
The basic idea of the proposed method does not depend on a concrete implementation, so most deep-learning-based object detectors with multiscale feature maps can be optimized with it.
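
The position-dependent anchor idea — box size varying with image row because of perspective projection — can be sketched as below. This is an illustrative simplification, not the paper's exact scheme; the `horizon_row` parameter and the linear growth rule are assumptions made here.

```python
import numpy as np

def perspective_anchors(feat_h, feat_w, horizon_row, base_size, growth):
    """Generate one anchor per feature-map cell whose size grows linearly
    with distance below the horizon row, mimicking perspective
    foreshortening: on-road vehicles lower in the image appear larger.
    Returns an (N, 4) array of (cy, cx, h, w) in feature-map coordinates."""
    anchors = []
    for r in range(feat_h):
        # distance below the horizon controls the anchor scale
        scale = base_size + growth * max(r - horizon_row, 0)
        for c in range(feat_w):
            anchors.append((r + 0.5, c + 0.5, scale, scale))
    return np.array(anchors)
```

Compared with uniform anchors, rows above the horizon stop wasting large anchors on regions where vehicles cannot appear large, which is the source of the speed and accuracy gains the abstract mentions.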


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2547 ◽  
Author(s):  
Wenxin Dai ◽  
Yuqing Mao ◽  
Rongao Yuan ◽  
Yijing Liu ◽  
Xuemei Pu ◽  
...  

Convolutional neural network (CNN)-based detectors have shown great performance on ship detection in synthetic aperture radar (SAR) images. However, current models have not been satisfactory at detecting multiscale and small-size ships against complex backgrounds. To address this problem, we propose a novel CNN-based SAR ship detector, which consists of three subnetworks: the Fusion Feature Extractor Network (FFEN), the Region Proposal Network (RPN), and the Refine Detection Network (RDN). Instead of using a single feature map, we fuse feature maps in bottom-up and top-down ways and generate proposals from each fused feature map in the FFEN. We further merge features generated by the region-of-interest (RoI) pooling layer in the RDN. Based on this feature representation strategy, the constructed CNN framework significantly enhances location and semantic information for multiscale ships, in particular small ones. In addition, a residual block is introduced to increase the network depth, through which detection precision is further improved. The public SAR ship dataset (SSDD) and a China Gaofen-3 satellite SAR image are used to validate the proposed method. Our method shows excellent performance in detecting multiscale and small-size ships compared with several competitive models and exhibits high potential for practical application.


2021 ◽  
Vol 6 (1) ◽  
pp. e000898
Author(s):  
Andrea Peroni ◽  
Anna Paviotti ◽  
Mauro Campigotto ◽  
Luis Abegão Pinto ◽  
Carlo Alberto Cutolo ◽  
...  

Objective: To develop and test a deep learning (DL) model for semantic segmentation of the anatomical layers of the anterior chamber angle (ACA) in digital gonio-photographs. Methods and analysis: We used a pilot dataset of 274 ACA sector images, annotated by expert ophthalmologists to delineate five anatomical layers: iris root, ciliary body band, scleral spur, trabecular meshwork and cornea. Narrow depth of field and peripheral vignetting prevented clinicians from annotating part of each image with sufficient confidence, introducing a degree of subjectivity and feature correlation in the ground truth. To overcome these limitations, we present a DL model designed and trained to perform two tasks simultaneously: (1) maximise segmentation accuracy within the annotated region of each frame and (2) identify a region of interest (ROI) based on local image informativeness. Moreover, our calibrated model provides interpretable results by returning pixel-wise classification uncertainty through Monte Carlo dropout. Results: The model was trained and validated in a 5-fold cross-validation experiment on ~90% of the available data, achieving ~91% average segmentation accuracy within the annotated part of each ground truth image of the hold-out test set. An appropriate ROI was successfully identified in all test frames. The uncertainty estimation module correctly located inaccuracies and errors in the segmentation outputs. Conclusion: The proposed model improves on the only previously published work on gonio-photograph segmentation and may be a valid support for the automatic processing of these images to evaluate local tissue morphology. Uncertainty estimation is expected to facilitate acceptance of this system in clinical settings.
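
Monte Carlo dropout, the uncertainty mechanism named above, can be sketched generically: the network is run several times with dropout left active, and the spread of the resulting softmax outputs gives a per-pixel uncertainty map. The sketch below assumes a caller-supplied stochastic `predict` function; it is not this paper's model.

```python
import numpy as np

def mc_dropout_uncertainty(predict, image, n_passes=20, seed=0):
    """Monte Carlo dropout at test time: call a *stochastic* predictor
    (dropout still active) n_passes times and return the mean class
    probabilities plus the per-pixel predictive entropy as an
    uncertainty map. predict(image, rng) must return (H, W, C)
    softmax probabilities."""
    rng = np.random.default_rng(seed)
    probs = np.stack([predict(image, rng) for _ in range(n_passes)])
    mean = probs.mean(axis=0)                          # (H, W, C)
    entropy = -(mean * np.log(mean + 1e-12)).sum(-1)   # (H, W)
    return mean, entropy
```

Pixels where the stochastic passes disagree get high entropy, which is how the module can flag likely segmentation errors to a clinician.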


2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
Zhuofu Deng ◽  
Binbin Wang ◽  
Zhiliang Zhu

Maxillary sinus segmentation plays an important role in the choice of therapeutic strategies for nasal disease and in treatment monitoring. Traditional approaches struggle with extremely heterogeneous intensity caused by lesions, abnormal anatomical structures, and blurred cavity boundaries. 2D and 3D deep convolutional neural networks have grown popular in medical image segmentation because large labeled datasets can be used to learn discriminative features. However, for 3D segmentation in medical images, 2D networks are not competent at extracting significant spatial features, and 3D ones suffer from an unbearable computational burden, which poses great challenges to maxillary sinus segmentation. In this paper, we propose a deep neural network that performs fully automatic 3D segmentation in an end-to-end manner. First, our model uses a symmetrical encoder-decoder architecture for the multitask of bounding box estimation and in-region 3D segmentation, which not only reduces excessive computation requirements but also remarkably eliminates false positives, making 3D segmentation practical with 3D convolutional neural networks. In addition, an overestimation strategy is presented to avoid the overfitting seen in conventional multitask networks. Meanwhile, we introduce residual dense blocks to increase the depth of the proposed network and an attention excitation mechanism to improve bounding box estimation, both of which add little computational cost. In particular, the multilevel feature fusion structure in the pyramid network strengthens the identification of global and local discriminative features in foreground and background, achieving more advanced segmentation results. Finally, to address the problems of blurred boundaries and class imbalance in medical images, a hybrid loss function is designed for the multiple tasks.
To illustrate the strength of our proposed model, we evaluated it against state-of-the-art methods. Our model performed significantly better, with an average Dice of 0.947±0.031, VOE of 10.23±5.29, and ASD of 2.86±2.11, which denotes a promising technique with strong robustness in practice.
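
The abstract does not give the hybrid loss's exact form, so the sketch below shows one common choice for the same two problems: a weighted sum of soft Dice loss (robust to class imbalance) and cross-entropy (sharp gradients at boundaries). Treat the weighting `alpha` and the formulation itself as assumptions, not the authors' definition.

```python
import numpy as np

def hybrid_loss(probs, target, alpha=0.5, eps=1e-6):
    """Hybrid segmentation loss: alpha * soft-Dice loss
    + (1 - alpha) * binary cross-entropy.
    probs and target are same-shape foreground maps in [0, 1]."""
    inter = (probs * target).sum()
    dice = (2 * inter + eps) / (probs.sum() + target.sum() + eps)
    bce = -(target * np.log(probs + eps)
            + (1 - target) * np.log(1 - probs + eps)).mean()
    return alpha * (1 - dice) + (1 - alpha) * bce
```

The Dice term scores overlap globally, so a tiny foreground class still contributes fully, while the cross-entropy term penalizes every mislabeled boundary pixel.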


Diagnostics ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 2184
Author(s):  
Roopa S. Rao ◽  
Divya B. Shivanna ◽  
Kirti S. Mahadevpur ◽  
Sinchana G. Shivaramegowda ◽  
Spoorthi Prakash ◽  
...  

Background: The goal of the study was to create a histopathology image classification automation system that could identify odontogenic keratocysts in hematoxylin and eosin-stained jaw cyst sections. Methods: From 54 odontogenic keratocysts, 23 dentigerous cysts, and 20 radicular cysts, about 2657 microscopic pictures at 400× magnification were obtained. The images were annotated by a pathologist and categorized into epithelium, cystic lumen, and stroma of keratocysts and non-keratocysts. Preprocessing was performed in two steps: first, data augmentation, as Deep Learning techniques (DLT) improve their performance with increased data size; second, selection of the epithelial region as the region of interest. Results: Four experiments were conducted using the DLT. In the first, a pre-trained VGG16 was employed for classification after image augmentation. In the second, DenseNet-169 was implemented for image classification on the augmented images. In the third, DenseNet-169 was trained on the two-step preprocessed images. In the last experiment, the results of experiments two and three were averaged to obtain an accuracy of 93% on OKC and non-OKC images. Conclusions: The proposed algorithm may fit into an automation system for OKC and non-OKC diagnosis. Utmost care was taken in the manual process of image acquisition (minimum 28–30 images/slide at 40× magnification covering the entire stretch of the epithelium and stromal component). Further, there is scope to improve the accuracy rate and remove human bias by using a whole-slide imaging scanner for image acquisition.
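
The augmentation step can be sketched as below. The specific transforms used in the study are not listed in the abstract, so flips and 90-degree rotations — which preserve histology content while multiplying the effective dataset size — are an assumption for illustration.

```python
import numpy as np

def augment(image, rng):
    """Apply a random content-preserving transform: horizontal /
    vertical flips and a random 90-degree rotation."""
    if rng.random() < 0.5:
        image = np.fliplr(image)
    if rng.random() < 0.5:
        image = np.flipud(image)
    return np.rot90(image, k=int(rng.integers(0, 4)))

def augment_dataset(images, copies, seed=0):
    """Return `copies` augmented variants of each input image."""
    rng = np.random.default_rng(seed)
    return [augment(img, rng) for img in images for _ in range(copies)]
```

Because every transform is a permutation of pixels, labels carry over unchanged, which is what makes this kind of augmentation safe for classification tasks.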


Author(s):  
Zhenzhong Chen ◽  
Wanjie Sun

Predicting the scanpath when a certain stimulus is presented plays an important role in modeling visual attention and search. This paper presents a model that integrates a convolutional neural network and long short-term memory (LSTM) to generate realistic scanpaths. The core of the proposed model is a dual LSTM unit, i.e., an inhibition-of-return LSTM (IOR-LSTM) and a region-of-interest LSTM (ROI-LSTM), capturing IOR dynamics and gaze shift behavior simultaneously. The IOR-LSTM simulates visual working memory to adaptively integrate and forget scene information. The ROI-LSTM is responsible for predicting the next ROI given the inhibited image features. Experimental results indicate that the proposed architecture achieves superior performance in predicting scanpaths.
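
The interplay of the two LSTMs — pick the next ROI, then inhibit it — can be illustrated by a far simpler greedy stand-in: fixate the most salient location, suppress it, and repeat. This sketch only conveys the inhibition-of-return mechanism; it is not the paper's learned model.

```python
import numpy as np

def greedy_scanpath(saliency, n_fixations, ior_decay=0.0):
    """Generate a scanpath by repeatedly fixating the most salient
    location and then suppressing it (inhibition of return).
    ior_decay=0 fully inhibits a visited location; values in (0, 1)
    let inhibition fade, loosely mimicking the IOR-LSTM's forgetting."""
    sal = saliency.astype(float).copy()
    path = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        path.append((int(y), int(x)))
        sal[y, x] *= ior_decay   # inhibit the visited location
    return path
```

The learned model replaces both the fixed saliency map and the hard suppression with recurrent state, but the control flow is the same.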


2020 ◽  
Vol 64 (2) ◽  
pp. 20507-1-20507-10 ◽  
Author(s):  
Hee-Jin Yu ◽  
Chang-Hwan Son ◽  
Dong Hyuk Lee

Abstract: Traditional approaches for the identification of leaf diseases involve the use of handcrafted features such as colors and textures for feature extraction. Therefore, these approaches may have limitations in extracting abundant and discriminative features. Although deep learning approaches have recently been introduced to overcome the shortcomings of traditional approaches, existing deep learning models such as VGG and ResNet have been used in these approaches. This indicates that the approach can be further improved to increase the discriminative power, because a spatial attention mechanism to predict the background and spot areas (i.e., local areas with leaf diseases) has not been considered. Therefore, a new deep learning architecture, hereafter referred to as region-of-interest-aware deep convolutional neural network (ROI-aware DCNN), is proposed to make deep features more discriminative and increase classification performance. The primary idea is that leaf disease symptoms appear in the leaf area, whereas the background region does not contain useful information regarding leaf diseases. To realize this, two subnetworks are designed. One is the ROI subnetwork, which provides more discriminative features by distinguishing the background, leaf areas, and spot areas in the feature map. The other is the classification subnetwork, which increases the classification accuracy. To train the ROI-aware DCNN, the ROI subnetwork is first learned with a new image set containing ground truth images where the background, leaf area, and spot area are delineated. Subsequently, the entire network is trained in an end-to-end manner, connecting the ROI subnetwork to the classification subnetwork through a concatenation layer. The experimental results confirm that the proposed ROI-aware DCNN can increase the discriminative power by predicting the areas in the feature map that are more important for leaf disease identification. 
The results show that the proposed method surpasses conventional state-of-the-art methods such as VGG, ResNet, SqueezeNet, the bilinear model, and multiscale-based deep feature extraction and pooling.
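
The ROI-then-concatenate idea can be sketched as gating a feature map with the ROI subnetwork's soft prediction and stacking the gated copy alongside the original. This is a minimal NumPy sketch under assumed shapes, not the paper's layer definitions.

```python
import numpy as np

def roi_gate(features, roi_probs):
    """Weight each spatial position of a (H, W, C) feature map by the
    ROI subnetwork's soft probability (H, W) that the position is leaf
    or spot area rather than background, then concatenate the gated map
    with the original for the classification subnetwork."""
    gated = features * roi_probs[..., None]   # broadcast over channels
    return np.concatenate([features, gated], axis=-1)
```

The concatenation keeps the ungated features available, so the classifier can still recover from ROI-prediction mistakes rather than being hard-masked by them.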


Author(s):  
Warinthorn Kiadtikornthaweeyot ◽  
Adrian R. L. Tatnall

High-resolution satellite imaging is considered an outstanding candidate for extracting Earth-surface information. Extracting features from an image is difficult because appropriate image segmentation techniques must be found and different methods combined to detect the Region of Interest (ROI) most effectively. This paper proposes techniques to classify objects in satellite images by applying image processing methods to high-resolution satellite images. The system identifies ROIs focusing on forest, urban and agricultural areas. The proposed system classifies objects by thresholding image histograms; the thresholds are chosen by considering how the histogram maps to particular regions of the satellite image. The proposed model is based on histogram segmentation and morphology techniques. There are five main steps supporting each other: histogram classification, histogram segmentation, morphological dilation, morphological filling of image areas and holes, and ROI management. The methods to detect ROIs in satellite images based on histogram classification have been studied, implemented and tested. The algorithm is able to detect forest, urban and agricultural areas separately. The image segmentation methods can detect the ROI and reduce the size of the original image by discarding unnecessary parts.
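
The first three steps — band thresholding on the histogram followed by morphological dilation — can be sketched in a few lines of NumPy. The band limits and the 3x3 structuring element are illustrative choices, not values from the paper.

```python
import numpy as np

def threshold_mask(image, lo, hi):
    """Classify pixels whose grey level falls in the histogram band
    [lo, hi) as belonging to the region of interest."""
    return (image >= lo) & (image < hi)

def dilate(mask):
    """3x3 binary dilation: a pixel is set if any of its 8 neighbours
    (or itself) is set, closing small gaps in the thresholded region."""
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out
```

Hole filling and ROI management then operate on the dilated mask, so the quality of the chosen histogram band largely determines the final segmentation.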


2020 ◽  
Vol 34 (01) ◽  
pp. 1161-1168
Author(s):  
Zeng Rui ◽  
Ge Zongyuan ◽  
Denman Simon ◽  
Sridharan Sridha ◽  
Fookes Clinton

We present a novel learning framework for vehicle recognition from a single RGB image. Unlike existing methods, which only use attention mechanisms to locate 2D discriminative information, our work learns a novel 3D perspective feature representation of a vehicle, which is then fused with 2D appearance features to predict the category. The framework is composed of a global network (GN), a 3D perspective network (3DPN), and a fusion network. The GN is used to locate the region of interest (RoI) and generate the 2D global feature. With the assistance of the RoI, the 3DPN estimates the 3D bounding box under the guidance of the proposed vanishing point loss, which provides a perspective geometry constraint. The proposed 3D representation is then generated by eliminating the viewpoint variance of the 3D bounding box using a perspective transformation. Finally, the 3D and 2D features are fused to predict the category of the vehicle. We present qualitative and quantitative results on the vehicle classification and verification tasks of the BoxCars dataset. The results demonstrate that, by learning such a concise 3D representation, we can achieve superior performance to methods that only use 2D information, while retaining meaningful 3D information without requiring a 3D CAD model.
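
Eliminating viewpoint variance amounts to warping each face of the estimated 3D box to a canonical frontal grid. The sketch below uses a bilinear warp between the four corner points as a lightweight stand-in for the paper's full perspective transformation; the corner convention is an assumption made here.

```python
import numpy as np

def rectify_face(image, corners, out_size):
    """Warp one quadrilateral face of the estimated 3D box to a
    canonical out_size x out_size view (out_size >= 2) by blending the
    four corner coordinates and sampling the nearest image pixel.
    corners = [top-left, top-right, bottom-left, bottom-right], (y, x)."""
    tl, tr, bl, br = [np.asarray(c, float) for c in corners]
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            u = i / (out_size - 1)   # vertical blend factor
            v = j / (out_size - 1)   # horizontal blend factor
            y, x = ((1 - u) * ((1 - v) * tl + v * tr)
                    + u * ((1 - v) * bl + v * br))
            out[i, j] = image[int(round(y)), int(round(x))]
    return out
```

After rectification, the same vehicle face looks alike regardless of camera viewpoint, which is what lets the fused 3D features generalize across poses.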


Author(s):  
Nayana R. Shenoy ◽  
Anand Jatti

<p>Thyroid nodules are fluid-filled or solid lumps that form within the thyroid gland. Most thyroid nodules show no symptoms or signs; however, a certain percentage are cancerous and can lead to critical, even fatal, outcomes. Hence, thyroid cancer is an important cancer type, and its detection is important. Ultrasound imaging is a widely popular and frequently used tool for diagnosing thyroid cancer, with broad clinical application such as estimating the size, shape and position of a nodule. It is therefore important to design automatic and complete segmentation for better detection and efficient diagnosis based on ultrasound images. Segmenting the thyroid gland from an ultrasound image is a challenging task due to its inhomogeneous structure and the similar appearance of adjacent structures. Thyroid nodules can appear anywhere and have any contrast, shape and size, so the segmentation process must be designed carefully. Several researchers have worked on segmentation mechanisms, but most were either semi-automatic or lacked performance metrics; it has been suggested that U-Net possesses great accuracy. Hence, in this paper we propose an improved U-Net that addresses U-Net's shortcomings; the main aim of this work is to find the probable region of interest and segment it further. Furthermore, we develop high-level and low-level feature maps to avoid loss of resolution and information, and later add a dropout layer for further optimization. The proposed model is evaluated on important metrics such as accuracy, Dice coefficient, AUC, F1-measure and true positive rate, and it performs better than existing models.</p>
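
The Dice coefficient reported above is the standard overlap score between a predicted binary mask and the ground truth; a minimal NumPy sketch of the metric:

```python
import numpy as np

def dice_coefficient(pred, truth, eps=1e-6):
    """Dice similarity between binary masks: 2|A∩B| / (|A| + |B|).
    Returns 1.0 for a perfect match and 0.0 for no overlap; eps avoids
    division by zero when both masks are empty."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2 * inter + eps) / (pred.sum() + truth.sum() + eps)
```

Unlike plain pixel accuracy, Dice is insensitive to the large background class, which is why it is the headline metric for small-structure segmentation tasks such as thyroid nodules.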

