SILIC: A cross database framework for automatically extracting robust biodiversity information from soundscape recordings based on object detection and a tiny training dataset

2021 ◽  
pp. 101534
Author(s):  
Shih-Hung Wu ◽  
Hsueh-Wen Chang ◽  
Ruey-Shing Lin ◽  
Mao-Ning Tuanmu
2020 ◽  
Vol 10 (6) ◽  
pp. 2104
Author(s):  
Michał Tomaszewski ◽  
Paweł Michalski ◽  
Jakub Osuchowski

This article presents an analysis of the effectiveness of object detection in digital images when only a limited quantity of input data is available. The use of a limited learning set was made possible by developing a detailed task scenario that strictly defined the operating conditions of the convolutional-neural-network detector in the considered case. The described solution utilizes known deep neural network architectures for learning and object detection. The article compares detection results from the most popular deep neural networks while maintaining a limited training set composed of a specific number of images selected from diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines, and the object detector was built for power insulators. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) can be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not a trivial task. The research conducted suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-based convolutional neural network (Faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model achieved a lower AP; however, the number of input samples influenced its results significantly less than it did for the other CNN models, which, in the authors' assessment, is a desirable property when the training set is limited.
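To make the setup concrete, here is a minimal sketch, assuming PyTorch and torchvision, of fine-tuning a pre-trained Faster R-CNN detector on a tiny single-class dataset such as the roughly 60 insulator frames described above; the tensors shown are placeholders for real images and annotations.

```python
# A minimal sketch of fine-tuning a pre-trained Faster R-CNN on a tiny,
# single-class dataset (e.g., ~60 frames of power insulators), assuming
# torchvision >= 0.13; images/targets below are placeholder data.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_detector(num_classes=2):  # 1 object class + background
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

model = build_detector()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()
# Each batch item: an image tensor [3, H, W] and a target dict with boxes/labels.
images = [torch.rand(3, 720, 1280)]
targets = [{"boxes": torch.tensor([[100., 150., 200., 300.]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)  # train mode returns a dict of losses
loss = sum(loss_dict.values())
loss.backward()
optimizer.step()
```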


2021 ◽  
Author(s):  
Sung Hyun Noh ◽  
Chansik An ◽  
Dain Kim ◽  
Seung Hyun Lee ◽  
Min-Yung Chang ◽  
...  

Background: A computer algorithm that automatically detects sacroiliac joint abnormalities on plain radiographs would help radiologists avoid missing sacroiliitis. This study aimed to develop and validate a deep learning model to detect and diagnose sacroiliitis on plain radiographs in young patients with low back pain. Methods: This Institutional Review Board-approved retrospective study included 478 and 468 plain radiographs from 241 and 433 young (< 40 years) patients with low back pain, with and without ankylosing spondylitis, respectively. The radiographs were randomly split into training and test datasets at a ratio of 8:2. Radiologists reviewed the images, labeled the coordinates of a bounding box, and determined the presence or absence of sacroiliitis for each sacroiliac joint. We fine-tuned and optimized the EfficientDet-D4 object detection model, pre-trained on the COCO 2017 dataset, on the training dataset and validated the final model on the test dataset. Results: The mean average precision, an evaluation metric for object detection accuracy, was 0.918 at an intersection over union of 0.5. In the diagnosis of sacroiliitis, the area under the curve, sensitivity, specificity, accuracy, and F1-score were 0.932 (95% confidence interval, 0.903–0.961), 96.9% (92.9–99.0), 86.8% (81.5–90.9), 91.1% (87.7–93.7), and 90.2% (85.0–93.9), respectively. Conclusions: EfficientDet, a deep learning-based object detection algorithm, can be used to automatically diagnose sacroiliitis on plain radiographs.
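As a side note, the diagnostic statistics reported above can be reproduced from per-joint labels and model scores with a few lines of scikit-learn; this sketch uses made-up arrays purely for illustration.

```python
# A small sketch, assuming scikit-learn, of the per-joint diagnostic metrics
# reported above (AUC, sensitivity, specificity, accuracy, F1) computed from
# binary sacroiliitis labels and model scores; the arrays are illustrative.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix, f1_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])            # reference labels per joint
y_score = np.array([.9, .2, .8, .7, .4, .1, .6, .3])   # model probabilities
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("AUC        :", roc_auc_score(y_true, y_score))
print("Sensitivity:", tp / (tp + fn))
print("Specificity:", tn / (tn + fp))
print("Accuracy   :", (tp + tn) / len(y_true))
print("F1-score   :", f1_score(y_true, y_pred))
```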


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3594
Author(s):  
Hwiwon Lee ◽  
Sekyoung Youm

As many as 40% to 50% of patients do not adhere to long-term medications for managing chronic conditions, such as diabetes or hypertension. Limited opportunity for medication monitoring is a major problem from the perspective of health professionals. The availability of prompt medication error reports can enable health professionals to provide immediate interventions for patients. Furthermore, it can enable clinical researchers to modify experiments easily and predict health levels based on medication compliance. This study proposes a method in which videos of patients taking medications are recorded using a camera image sensor integrated into a wearable device. The collected data are used as a training dataset for the latest convolutional neural network (CNN) techniques. As the artificial intelligence (AI) algorithm for analyzing medication behavior, we constructed an object detection model (Model 1) using the Faster R-CNN technique and a second model (Model 2) that uses the combined feature values to perform action recognition. In total, 50,000 images were collected from 89 participants, and labeling was performed on different data categories to train the algorithm. The newly developed combination of the object detection model (Model 1) and the action recognition model (Model 2) achieved an accuracy of 92.7%, which is notably high for medication behavior recognition. This approach is expected to enable rapid intervention by care providers through prompt reporting of medication errors.
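The abstract does not give implementation details, but the two-model idea can be sketched as follows: a hypothetical Model 2 that runs an LSTM over per-frame feature vectors derived from Model 1's detections. Names and sizes are illustrative, not the authors' code.

```python
# An illustrative sketch (not the authors' implementation) of the two-stage
# idea: Model 1 detects medication-related objects per frame, and Model 2
# classifies the action from the sequence of per-frame detection features.
import torch
import torch.nn as nn

class ActionRecognizer(nn.Module):
    """LSTM over per-frame detection feature vectors (hypothetical sizes)."""
    def __init__(self, feat_dim=32, hidden=64, num_actions=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, frame_feats):          # [batch, frames, feat_dim]
        out, _ = self.lstm(frame_feats)
        return self.head(out[:, -1])         # logits: taking medication vs. not

# frame_feats would be assembled from Model 1's outputs, e.g., box
# coordinates, class scores, and hand/pill distances for each frame.
model2 = ActionRecognizer()
logits = model2(torch.rand(4, 30, 32))       # 4 clips, 30 frames each
print(logits.shape)                          # torch.Size([4, 2])
```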


2021 ◽  
Vol 13 (5) ◽  
pp. 943
Author(s):  
Patrick J. Hennessy ◽  
Travis J. Esau ◽  
Aitazaz A. Farooque ◽  
Arnold W. Schumann ◽  
Qamar U. Zaman ◽  
...  

Deep learning convolutional neural networks (CNNs) are an emerging technology that provides an opportunity to increase agricultural efficiency through remote sensing and automatic inference of field conditions. This paper examined the novel use of CNNs to identify two weeds, hair fescue and sheep sorrel, in images of wild blueberry fields. Commercial herbicide sprayers provide a uniform application of agrochemicals to manage patches of these weeds. Three object-detection and three image-classification CNNs were trained to identify hair fescue and sheep sorrel using images from 58 wild blueberry fields. The CNNs were trained on 1280 × 720 images and tested at four different internal resolutions. The CNNs were then retrained with progressively smaller training datasets, ranging from 3780 to 472 images, to determine the effect of dataset size on accuracy. YOLOv3-Tiny was the best object-detection CNN, detecting at least one target weed per image with F1-scores of 0.97 for hair fescue and 0.90 for sheep sorrel at 1280 × 736 resolution. Darknet Reference was the most accurate image-classification CNN, classifying images containing hair fescue and sheep sorrel with F1-scores of 0.96 and 0.95, respectively, at 1280 × 736. MobileNetV2 achieved comparable results at the lowest resolution, 864 × 480, with F1-scores of 0.95 for both weeds. Training dataset size had minimal effect on accuracy for all CNNs except Darknet Reference. This technology can be used in a smart sprayer to control target-specific spray applications, reducing herbicide use. Future work will involve testing the CNNs on a smart sprayer and developing an application to provide growers with field-specific information. Using CNNs to improve agricultural efficiency could create major cost savings for wild blueberry producers.
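For readers who want to try a trained YOLOv3-Tiny model on field imagery, a hedged sketch using OpenCV's DNN module follows; the cfg/weights file names and the class list are placeholders for whatever a trained weed detector would provide.

```python
# A hedged sketch of running a trained YOLOv3-Tiny detector on a field image
# with OpenCV's DNN module; cfg/weights paths and class names are placeholders.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3-tiny.cfg", "yolov3-tiny_weeds.weights")
model = cv2.dnn_DetectionModel(net)
# Match the 1280 x 736 internal resolution discussed above.
model.setInputParams(size=(1280, 736), scale=1 / 255.0, swapRB=True)

image = cv2.imread("blueberry_field.jpg")
class_ids, scores, boxes = model.detect(image, confThreshold=0.25, nmsThreshold=0.45)
for cid, score, box in zip(class_ids, scores, boxes):
    label = ["hair_fescue", "sheep_sorrel"][int(cid)]   # hypothetical class order
    print(label, float(score), box)                     # box = [x, y, w, h]
```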


2019 ◽  
Author(s):  
Safoora Yousefi ◽  
Yao Nie

Despite significant recent success, modern computer vision techniques such as Convolutional Neural Networks (CNNs) are expensive to apply to cell-level prediction problems in histopathology images due to the difficulty of providing cell-level supervision. This work explores the transferability of features learned by an object detection CNN (Faster R-CNN) to nucleus classification in histopathology images. We detect nuclei in these images using class-agnostic models trained on small annotated patches, and use the CNN representations of detected nuclei to cluster and classify them. We show that with a small training dataset, the proposed pipeline can achieve superior nucleus detection and classification performance, and that it generalizes well to unseen stain types.
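A rough sketch of this transfer pipeline, assuming torchvision and scikit-learn, might look as follows: detect nuclei with a (here, generic pre-trained) Faster R-CNN, pool a backbone feature embedding for each detected box with RoI-Align, and cluster the embeddings. The input tensor is a stand-in for a real histopathology patch.

```python
# A rough sketch, assuming torchvision and scikit-learn, of the transfer idea:
# detect nuclei, pool backbone features per detected box, then cluster them.
import torch
import torchvision
from torchvision.ops import roi_align
from sklearn.cluster import KMeans

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = torch.rand(3, 512, 512)                         # stand-in for a patch

with torch.no_grad():
    boxes = detector([image])[0]["boxes"]               # detected boxes [N, 4]
    feats = detector.backbone(image.unsqueeze(0))["0"]  # highest-res FPN level
    # Pool a fixed-size embedding per detected box from the feature map.
    pooled = roi_align(feats, [boxes], output_size=(7, 7),
                       spatial_scale=feats.shape[-1] / 512)
    embeddings = pooled.flatten(1)                      # [N, C * 7 * 7]

if len(embeddings) >= 2:                                # random input may yield few boxes
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings.numpy())
    print(labels)
```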


Author(s):  
K. Kamiya ◽  
T. Fuse ◽  
M. Takahashi

Since satellite and aerial imagery have recently become widely available and frequently acquired, combining them is expected to let their spatial and temporal resolutions complement each other. One prospective application is traffic monitoring, in which the objects of interest, vehicles, need to be recognized automatically. Techniques that employ object detection before object recognition can save computational time and cost, and thus play a significant role. However, it is not well understood whether object detection methods perform well on satellite and aerial imagery, nor how the characteristics of such imagery affect detection performance. This study employs the binarized normed gradients (BING) method, which runs significantly fast and is robust to rotation and noise. For our experiments, 11-bit BGR-IR satellite images from WorldView-3 and BGR-color aerial images were used, and thousands of ground truth samples were created. We conducted several experiments to compare performance across different images, to verify whether combining images of different resolutions improved performance, and to analyze the applicability of mixing satellite and aerial imagery. The results showed that the infrared band had little effect on the detection rate, that 11-bit images performed worse than 8-bit images, and that better spatial resolution brought better performance. A further result suggests that mixing higher- and lower-resolution images in the training dataset can help detection performance. Furthermore, we found that adding aerial images improved detection performance on satellite images.
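For reference, an off-the-shelf BING implementation is available in opencv-contrib's saliency module; the following tentative sketch shows its use, with the trained-model directory as a placeholder that must point to the BING model files distributed with OpenCV.

```python
# A tentative sketch using the BING objectness implementation from
# opencv-contrib (cv2.saliency); the model directory and image path are
# placeholders, and the trained BING files must be supplied separately.
import cv2

bing = cv2.saliency.ObjectnessBING_create()
bing.setTrainingPath("ObjectnessTrainedModel")   # placeholder model directory

image = cv2.imread("aerial_scene.jpg")
ok, boxes = bing.computeSaliency(image)          # boxes: [N, 1, 4] = (x1, y1, x2, y2)
if ok:
    for (x1, y1, x2, y2) in boxes[:10, 0]:       # top-10 object proposals
        print(x1, y1, x2, y2)
```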


Electronics ◽  
2019 ◽  
Vol 8 (3) ◽  
pp. 329 ◽  
Author(s):  
Yong Li ◽  
Guofeng Tong ◽  
Huashuai Gao ◽  
Yuebin Wang ◽  
Liqiang Zhang ◽  
...  

Panoramic images have a wide range of applications in many fields owing to their ability to capture all-round information. Object detection based on panoramic images has certain advantages for environment perception due to the characteristics of panoramic images, e.g., a larger field of view. In recent years, deep learning methods have achieved remarkable results in image classification and object detection, but their performance depends on large amounts of training data, so a good training dataset is a prerequisite for achieving strong recognition results. We therefore construct a benchmark named Pano-RSOD for panoramic road scene object detection. Pano-RSOD contains vehicles, pedestrians, traffic signs, and guiding arrows, labelled with bounding boxes in the images. Unlike traditional object detection datasets, Pano-RSOD contains more objects per panoramic image, and its high-resolution images provide 360-degree environmental perception, more annotations, more small objects, and diverse road scenes. State-of-the-art deep learning algorithms were trained on Pano-RSOD for object detection, demonstrating that Pano-RSOD is a useful benchmark and that it provides a better panoramic image training dataset for object detection tasks, especially for small and deformed objects.
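Pano-RSOD's exact annotation format is not specified here; the sketch below is a hypothetical loader that assumes Pascal-VOC-style XML files with one bounding box per object, simply to illustrate consuming such a benchmark.

```python
# Hypothetical annotation loader: Pano-RSOD's real format may differ, so this
# assumes Pascal-VOC-style XML purely for illustration.
import xml.etree.ElementTree as ET

CLASSES = ("vehicle", "pedestrian", "traffic_sign", "guiding_arrow")

def load_annotation(xml_path):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return boxes

# Usage (placeholder file name):
# for name, x1, y1, x2, y2 in load_annotation("pano_000001.xml"):
#     print(name, x1, y1, x2, y2)
```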


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Lifei He ◽  
Ming Jiang ◽  
Ryutarou Ohbuchi ◽  
Takahiko Furuya ◽  
Min Zhang ◽  
...  

Object detection is one of the core tasks in computer vision. Object detection algorithms often have difficulty detecting objects with diverse scales, especially those with smaller scales. To cope with this issue, Lin et al. proposed feature pyramid networks (FPNs), which aim for a feature pyramid with higher semantic content at every scale level. The FPN consists of a bottom-up pyramid and a top-down pyramid. The bottom-up pyramid is induced by a convolutional neural network as its layers of feature maps. The top-down pyramid is formed by progressively up-sampling a highly semantic yet low-resolution feature map at the top of the bottom-up pyramid. At each up-sampling step, feature maps of the bottom-up pyramid are fused with the top-down pyramid to produce highly semantic yet high-resolution feature maps in the top-down pyramid. Despite significant improvement, the FPN still misses small-scale objects. To further improve the detection of small-scale objects, this paper proposes scale adaptive feature pyramid networks (SAFPNs). The SAFPN employs weights chosen adaptively for each input image when fusing the feature maps of the bottom-up and top-down pyramids. The scale-adaptive weights are computed by a scale attention module built into the feature-map fusion computation and trained end-to-end to adapt to the scale of objects in the training dataset's images. Experimental evaluation using both the two-stage detector Faster R-CNN and the one-stage detector RetinaNet demonstrated the proposed approach's effectiveness.
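A compact sketch of the fusion idea in PyTorch: instead of FPN's fixed element-wise sum, a small attention module predicts per-image weights for the bottom-up and top-down maps before fusing them. The layer sizes are illustrative, not the paper's exact architecture.

```python
# Sketch of scale-adaptive fusion: a small attention module predicts
# per-image weights for the two pyramids instead of FPN's fixed sum.
import torch
import torch.nn as nn

class ScaleAttentionFusion(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # Predict two fusion weights from concatenated global context vectors.
        self.fc = nn.Linear(2 * channels, 2)

    def forward(self, bottom_up, top_down):                   # both [B, C, H, W]
        ctx = torch.cat([bottom_up.mean(dim=(2, 3)),
                         top_down.mean(dim=(2, 3))], dim=1)   # [B, 2C]
        w = torch.softmax(self.fc(ctx), dim=1)                # per-image weights
        return (w[:, 0, None, None, None] * bottom_up +
                w[:, 1, None, None, None] * top_down)

fuse = ScaleAttentionFusion()
out = fuse(torch.rand(2, 256, 64, 64), torch.rand(2, 256, 64, 64))
print(out.shape)   # torch.Size([2, 256, 64, 64])
```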


2020 ◽  
Vol 2020 (16) ◽  
pp. 203-1-203-6
Author(s):  
Astrid Unger ◽  
Margrit Gelautz ◽  
Florian Seitner

With the growing demand for robust object detection algorithms in self-driving systems, it is important to consider the varying lighting and weather conditions in which cars operate all year round. The goal of our work is to gain a deeper understanding of meaningful strategies for selecting and merging training data from currently available databases and self-annotated videos in the context of automotive night scenes. We retrain an existing Convolutional Neural Network (YOLOv3) to study the influence of different training dataset combinations on the final object detection results in nighttime and low-visibility traffic scenes. Our evaluation shows that a suitable selection of training data from the GTSRD, VIPER, and BDD databases, in conjunction with self-recorded night scenes, can achieve an mAP of 63.5% for ten object classes, an improvement of 16.7% over the performance of the original YOLOv3 network on the same test set.
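Because darknet-style YOLOv3 training consumes a plain text file of image paths, the dataset-merging strategies studied above reduce, mechanically, to sampling and concatenating per-source path lists; the sketch below illustrates this with placeholder file names and sampling ratios.

```python
# A small sketch of the dataset-merging step for darknet-style YOLOv3
# training; source list files and sampling fractions are placeholders.
import random

def merge_train_lists(sources, out_path="train_mixed.txt", seed=0):
    """sources: mapping of {path_list_file: fraction_of_paths_to_keep}."""
    rng = random.Random(seed)
    merged = []
    for list_file, frac in sources.items():
        with open(list_file) as f:
            paths = [line.strip() for line in f if line.strip()]
        merged.extend(rng.sample(paths, int(len(paths) * frac)))
    rng.shuffle(merged)                      # mix sources within the list
    with open(out_path, "w") as f:
        f.write("\n".join(merged))

# Hypothetical per-source lists, e.g., night-scene subsets of each database.
merge_train_lists({"bdd_night.txt": 1.0,
                   "viper_night.txt": 0.5,
                   "self_recorded.txt": 1.0})
```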

