scholarly journals Geometric property-based convolutional neural network for indoor object detection

2021 ◽  
Vol 18 (1) ◽  
pp. 172988142199332
Author(s):  
Xintao Ding ◽  
Boquan Li ◽  
Jinbao Wang

Indoor object detection is a very demanding and important task for robot applications. Object knowledge, such as two-dimensional (2D) shape and depth information, may be helpful for detection. In this article, we focus on region-based convolutional neural network (CNN) detector and propose a geometric property-based Faster R-CNN method (GP-Faster) for indoor object detection. GP-Faster incorporates geometric property in Faster R-CNN to improve the detection performance. In detail, we first use mesh grids that are the intersections of direct and inverse proportion functions to generate appropriate anchors for indoor objects. After the anchors are regressed to the regions of interest produced by a region proposal network (RPN-RoIs), we then use 2D geometric constraints to refine the RPN-RoIs, in which the 2D constraint of every classification is a convex hull region enclosing the width and height coordinates of the ground-truth boxes on the training set. Comparison experiments are implemented on two indoor datasets SUN2012 and NYUv2. Since the depth information is available in NYUv2, we involve depth constraints in GP-Faster and propose 3D geometric property-based Faster R-CNN (DGP-Faster) on NYUv2. The experimental results show that both GP-Faster and DGP-Faster increase the performance of the mean average precision.

2020 ◽  
Vol 10 (6) ◽  
pp. 2104
Author(s):  
Michał Tomaszewski ◽  
Paweł Michalski ◽  
Jakub Osuchowski

This article presents an analysis of the effectiveness of object detection in digital images with the application of a limited quantity of input. The possibility of using a limited set of learning data was achieved by developing a detailed scenario of the task, which strictly defined the conditions of detector operation in the considered case of a convolutional neural network. The described solution utilizes known architectures of deep neural networks in the process of learning and object detection. The article presents comparisons of results from detecting the most popular deep neural networks while maintaining a limited training set composed of a specific number of selected images from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines. The object detector was built for a power insulator. The main contribution of the presented papier is the evidence that a limited training set (in our case, just 60 training frames) could be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. The decision of which network will generate the best result for such a limited training set is not a trivial task. Conducted research suggests that the deep neural networks will achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-convolutional neural network (faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision) at a level of 0.8 for 60 frames. The R-FCN model gained a worse AP result; however, it can be noted that the relationship between the number of input samples and the obtained results has a significantly lower influence than in the case of other CNN models, which, in the authors’ assessment, is a desired feature in the case of a limited training set.


2019 ◽  
Vol 11 (3) ◽  
pp. 286 ◽  
Author(s):  
Jiangqiao Yan ◽  
Hongqi Wang ◽  
Menglong Yan ◽  
Wenhui Diao ◽  
Xian Sun ◽  
...  

Recently, methods based on Faster region-based convolutional neural network (R-CNN)have been popular in multi-class object detection in remote sensing images due to their outstandingdetection performance. The methods generally propose candidate region of interests (ROIs) througha region propose network (RPN), and the regions with high enough intersection-over-union (IoU)values against ground truth are treated as positive samples for training. In this paper, we find thatthe detection result of such methods is sensitive to the adaption of different IoU thresholds. Specially,detection performance of small objects is poor when choosing a normal higher threshold, while alower threshold will result in poor location accuracy caused by a large quantity of false positives.To address the above issues, we propose a novel IoU-Adaptive Deformable R-CNN framework formulti-class object detection. Specially, by analyzing the different roles that IoU can play in differentparts of the network, we propose an IoU-guided detection framework to reduce the loss of small objectinformation during training. Besides, the IoU-based weighted loss is designed, which can learn theIoU information of positive ROIs to improve the detection accuracy effectively. Finally, the class aspectratio constrained non-maximum suppression (CARC-NMS) is proposed, which further improves theprecision of the results. Extensive experiments validate the effectiveness of our approach and weachieve state-of-the-art detection performance on the DOTA dataset.


2020 ◽  
Vol 21 (S1) ◽  
Author(s):  
Dina Abdelhafiz ◽  
Jinbo Bi ◽  
Reda Ammar ◽  
Clifford Yang ◽  
Sheida Nabavi

Abstract Background Automatic segmentation and localization of lesions in mammogram (MG) images are challenging even with employing advanced methods such as deep learning (DL) methods. We developed a new model based on the architecture of the semantic segmentation U-Net model to precisely segment mass lesions in MG images. The proposed end-to-end convolutional neural network (CNN) based model extracts contextual information by combining low-level and high-level features. We trained the proposed model using huge publicly available databases, (CBIS-DDSM, BCDR-01, and INbreast), and a private database from the University of Connecticut Health Center (UCHC). Results We compared the performance of the proposed model with those of the state-of-the-art DL models including the fully convolutional network (FCN), SegNet, Dilated-Net, original U-Net, and Faster R-CNN models and the conventional region growing (RG) method. The proposed Vanilla U-Net model outperforms the Faster R-CNN model significantly in terms of the runtime and the Intersection over Union metric (IOU). Training with digitized film-based and fully digitized MG images, the proposed Vanilla U-Net model achieves a mean test accuracy of 92.6%. The proposed model achieves a mean Dice coefficient index (DI) of 0.951 and a mean IOU of 0.909 that show how close the output segments are to the corresponding lesions in the ground truth maps. Data augmentation has been very effective in our experiments resulting in an increase in the mean DI and the mean IOU from 0.922 to 0.951 and 0.856 to 0.909, respectively. Conclusions The proposed Vanilla U-Net based model can be used for precise segmentation of masses in MG images. This is because the segmentation process incorporates more multi-scale spatial context, and captures more local and global context to predict a precise pixel-wise segmentation map of an input full MG image. These detected maps can help radiologists in differentiating benign and malignant lesions depend on the lesion shapes. We show that using transfer learning, introducing augmentation, and modifying the architecture of the original model results in better performance in terms of the mean accuracy, the mean DI, and the mean IOU in detecting mass lesion compared to the other DL and the conventional models.


2020 ◽  
Author(s):  
Dina Abdelhafiz ◽  
Jinbo Bi ◽  
Reda Ammar ◽  
Clifford Yang ◽  
Sheida Nabavi

AbstractBackgroundAutomatic segmentation and localization of lesions in mammogram (MG) images are challenging even with employing advanced methods such as deep learning (DL) methods. We developed a new model based on the architecture of the semantic segmentation U-Net model to precisely segment mass lesions in MG images. The proposed end-to-end convolutional neural network (CNN) based model extracts contextual information by combining low-level and high-level features. We trained the proposed model using huge publicly available databases, (CBIS-DDSM, BCDR-01, and INbreast), and a private database from the University of Connecticut Health Center (UCHC).ResultsWe compared the performance of the proposed model with those of the state-of-the-art DL models including the fully convolutional network (FCN), SegNet, Dilated-Net, original U-Net, and Faster R-CNN models and the conventional region growing (RG) method. The proposed Vanilla U-Net model outperforms the Faster R-CNN model significantly in terms of the runtime and the Intersection over Union metric (IOU). Training with digitized film-based and fully digitized MG images, the proposed Vanilla U-Net model achieves a mean test accuracy of 92.6%. The proposed model achieves a mean Dice coefficient index (DI) of 0.951 and a mean IOU of 0.909 that show how close the output segments are to the corresponding lesions in the ground truth maps. Data augmentation has been very effective in our experiments resulting in an increase in the mean DI and the mean IOU from 0.922 to 0.951 and 0.856 to 0.909, respectively.ConclusionsThe proposed Vanilla U-Net based model can be used for precise segmentation of masses in MG images. This is because the segmentation process incorporates more multi-scale spatial context, and captures more local and global context to predict a precise pixel-wise segmentation map of an input full MG image. These detected maps can help radiologists in differentiating benign and malignant lesions depend on the lesion shapes. We show that using transfer learning, introducing augmentation, and modifying the architecture of the original model results in better performance in terms of the mean accuracy, the mean DI, and the mean IOU in detecting mass lesion compared to the other DL and the conventional models.


2018 ◽  
Vol 10 (1) ◽  
pp. 57-64 ◽  
Author(s):  
Rizqa Raaiqa Bintana ◽  
Chastine Fatichah ◽  
Diana Purwitasari

Community-based question answering (CQA) is formed to help people who search information that they need through a community. One condition that may occurs in CQA is when people cannot obtain the information that they need, thus they will post a new question. This condition can cause CQA archive increased because of duplicated questions. Therefore, it becomes important problems to find semantically similar questions from CQA archive towards a new question. In this study, we use convolutional neural network methods for semantic modeling of sentence to obtain words that they represent the content of documents and new question. The result for the process of finding the same question semantically to a new question (query) from the question-answer documents archive using the convolutional neural network method, obtained the mean average precision value is 0,422. Whereas by using vector space model, as a comparison, obtained mean average precision value is 0,282. Index Terms—community-based question answering, convolutional neural network, question retrieval


Author(s):  
Liang Kim Meng ◽  
Azira Khalil ◽  
Muhamad Hanif Ahmad Nizar ◽  
Maryam Kamarun Nisham ◽  
Belinda Pingguan-Murphy ◽  
...  

Background: Bone Age Assessment (BAA) refers to a clinical procedure that aims to identify a discrepancy between biological and chronological age of an individual by assessing the bone age growth. Currently, there are two main methods of executing BAA which are known as Greulich-Pyle and Tanner-Whitehouse techniques. Both techniques involve a manual and qualitative assessment of hand and wrist radiographs, resulting in intra and inter-operator variability accuracy and time-consuming. An automatic segmentation can be applied to the radiographs, providing the physician with more accurate delineation of the carpal bone and accurate quantitative analysis. Methods: In this study, we proposed an image feature extraction technique based on image segmentation with the fully convolutional neural network with eight stride pixel (FCN-8). A total of 290 radiographic images including both female and the male subject of age ranging from 0 to 18 were manually segmented and trained using FCN-8. Results and Conclusion: The results exhibit a high training accuracy value of 99.68% and a loss rate of 0.008619 for 50 epochs of training. The experiments compared 58 images against the gold standard ground truth images. The accuracy of our fully automated segmentation technique is 0.78 ± 0.06, 1.56 ±0.30 mm and 98.02% in terms of Dice Coefficient, Hausdorff Distance, and overall qualitative carpal recognition accuracy, respectively.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Peter M. Maloca ◽  
Philipp L. Müller ◽  
Aaron Y. Lee ◽  
Adnan Tufail ◽  
Konstantinos Balaskas ◽  
...  

AbstractMachine learning has greatly facilitated the analysis of medical data, while the internal operations usually remain intransparent. To better comprehend these opaque procedures, a convolutional neural network for optical coherence tomography image segmentation was enhanced with a Traceable Relevance Explainability (T-REX) technique. The proposed application was based on three components: ground truth generation by multiple graders, calculation of Hamming distances among graders and the machine learning algorithm, as well as a smart data visualization (‘neural recording’). An overall average variability of 1.75% between the human graders and the algorithm was found, slightly minor to 2.02% among human graders. The ambiguity in ground truth had noteworthy impact on machine learning results, which could be visualized. The convolutional neural network balanced between graders and allowed for modifiable predictions dependent on the compartment. Using the proposed T-REX setup, machine learning processes could be rendered more transparent and understandable, possibly leading to optimized applications.


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1737
Author(s):  
Wooseop Lee ◽  
Min-Hee Kang ◽  
Jaein Song ◽  
Keeyeon Hwang

As automated vehicles have been considered one of the important trends in intelligent transportation systems, various research is being conducted to enhance their safety. In particular, the importance of technologies for the design of preventive automated driving systems, such as detection of surrounding objects and estimation of distance between vehicles. Object detection is mainly performed through cameras and LiDAR, but due to the cost and limits of LiDAR’s recognition distance, the need to improve Camera recognition technique, which is relatively convenient for commercialization, is increasing. This study learned convolutional neural network (CNN)-based faster regions with CNN (Faster R-CNN) and You Only Look Once (YOLO) V2 to improve the recognition techniques of vehicle-mounted monocular cameras for the design of preventive automated driving systems, recognizing surrounding vehicles in black box highway driving videos and estimating distances from surrounding vehicles through more suitable models for automated driving systems. Moreover, we learned the PASCAL visual object classes (VOC) dataset for model comparison. Faster R-CNN showed similar accuracy, with a mean average precision (mAP) of 76.4 to YOLO with a mAP of 78.6, but with a Frame Per Second (FPS) of 5, showing slower processing speed than YOLO V2 with an FPS of 40, and a Faster R-CNN, which we had difficulty detecting. As a result, YOLO V2, which shows better performance in accuracy and processing speed, was determined to be a more suitable model for automated driving systems, further progressing in estimating the distance between vehicles. For distance estimation, we conducted coordinate value conversion through camera calibration and perspective transform, set the threshold to 0.7, and performed object detection and distance estimation, showing more than 80% accuracy for near-distance vehicles. Through this study, it is believed that it will be able to help prevent accidents in automated vehicles, and it is expected that additional research will provide various accident prevention alternatives such as calculating and securing appropriate safety distances, depending on the vehicle types.


Sign in / Sign up

Export Citation Format

Share Document