Seeing the Trees from the Forest: Using Modern Methods to Identify Individual Objects in a Cluttered Environment for Robots

2021 ◽  
Author(s):  
Josh Prow

Robotics and computer vision are areas of high growth across both industrial and personal environments. Robots in industrial settings have been used to work in environments that are hazardous to humans, or to perform tasks requiring finer detail than human operators can reliably achieve. These robotic solutions require a variety of sensors and cameras to navigate and identify objects within their working environment, as well as software and intelligent detection systems. Such solutions generally rely on high-definition depth cameras, laser range finders and computer vision algorithms, which are expensive in themselves and require expensive graphics processors to run practically.

This thesis explores the option of a low-cost, computer-vision-enabled robotic solution that can operate within a forestry environment. It begins with the accuracy of camera technologies, testing two of the main cameras available for robotic vision and demonstrating the advantages of the Intel RealSense D435 over the Kinect for Xbox One. It then tests common object detection and recognition algorithms on different devices, weighing the strengths and weaknesses of the selected models for the intended forestry application.

These tests support other research in finding that the MobileNet Single Shot Detector has the fastest recognition speed with acceptable precision; however, it struggles when multiple objects are present or the background is complex. In comparison, the Mask R-CNN achieved high accuracy and identified objects consistently, even with large numbers of objects overlaid within a single frame.

Based on these findings, a combined method built on the Faster R-CNN architecture with a MobileNet backbone and masking layers is proposed, developed and tested. This method uses the feature extraction and object detection abilities of the faster MobileNet in place of the traditionally ResNet-based feature proposal networks, while still capitalizing on the benefits of region-of-interest (ROI) align and masking from the Mask R-CNN architecture.

The results from this model did not meet the criteria required to recommend it as an operational solution for the forestry environment. However, they do show that the model achieves higher performance and average precision than other models with similar frame rates on the non-CUDA-enabled testing device, demonstrating that the technology and methodology have the potential to form the basis of a future solution to the problem of balancing accuracy and performance on a low-performance or non-GPU-enabled robotic unit.
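The proposed hybrid can be pictured with torchvision's detection building blocks: a MobileNet feature extractor dropped into the Mask R-CNN scaffolding in place of the usual ResNet backbone, keeping ROI Align and the mask head. The sketch below is a minimal illustration of that idea under assumed hyperparameters (MobileNetV2, single feature map, two classes), not the thesis's actual implementation.

```python
# Sketch: Mask R-CNN with a MobileNet backbone, assembled from torchvision
# building blocks. All hyperparameters here are illustrative assumptions.
import torch
import torchvision
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# MobileNetV2 feature extractor as the backbone; torchvision's detection
# heads only require the module to expose an `out_channels` attribute.
backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
backbone.out_channels = 1280

anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),   # five scales on a single feature map
    aspect_ratios=((0.5, 1.0, 2.0),),
)

# ROI Align for both the box head and the mask head, as in Mask R-CNN.
box_roi_pool = MultiScaleRoIAlign(featmap_names=["0"], output_size=7, sampling_ratio=2)
mask_roi_pool = MultiScaleRoIAlign(featmap_names=["0"], output_size=14, sampling_ratio=2)

model = MaskRCNN(
    backbone,
    num_classes=2,                      # e.g. background + "tree" (assumed)
    rpn_anchor_generator=anchor_generator,
    box_roi_pool=box_roi_pool,
    mask_roi_pool=mask_roi_pool,
)
model.eval()

with torch.no_grad():
    predictions = model([torch.rand(3, 480, 640)])  # boxes, labels, scores, masks
```

Swapping the backbone this way trades the ResNet-FPN's representational power for MobileNet's speed while leaving the ROI Align and masking stages intact, which is the balance the thesis is after.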


Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 866 ◽  
Author(s):  
Tanguy Ophoff ◽  
Kristof Van Beeck ◽  
Toon Goedemé

In this paper, we investigate whether fusing depth information on top of normal RGB data for camera-based object detection can help to increase the performance of current state-of-the-art single-shot detection networks. Indeed, depth information is easily acquired using depth cameras such as the Kinect or stereo setups. We investigate the optimal manner to perform this sensor fusion with a special focus on lightweight single-pass convolutional neural network (CNN) architectures, enabling real-time processing on limited hardware. For this, we implement a network architecture that allows us to parameterize at which network layer the two information sources are fused together. We performed exhaustive experiments to determine the optimal fusion point in the network, from which we can conclude that fusion in the mid-to-late layers provides the best results. Our best fusion models significantly outperform the baseline RGB network in both accuracy and localization of the detections.
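The parameterized fusion point can be sketched as a small two-stream network in which a constructor argument chooses where the depth features are concatenated into the RGB stream. The layer counts, channel widths, and the use of concatenation below are assumptions for illustration, not the authors' exact architecture.

```python
# Sketch: RGB + depth streams fused by concatenation at a configurable layer.
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class RGBDFusionNet(nn.Module):
    """fuse_at=0 is early fusion; larger values move the fusion point
    toward the late layers of the network."""
    def __init__(self, fuse_at: int, widths=(16, 32, 64, 128)):
        super().__init__()
        assert 0 <= fuse_at < len(widths)
        self.rgb_stream = nn.Sequential(
            *[block(3 if i == 0 else widths[i - 1], widths[i])
              for i in range(fuse_at + 1)])
        self.depth_stream = nn.Sequential(
            *[block(1 if i == 0 else widths[i - 1], widths[i])
              for i in range(fuse_at + 1)])
        trunk_in = 2 * widths[fuse_at]       # channels after concatenation
        self.trunk = nn.Sequential(
            *[block(trunk_in if i == fuse_at + 1 else widths[i - 1], widths[i])
              for i in range(fuse_at + 1, len(widths))])

    def forward(self, rgb, depth):
        fused = torch.cat([self.rgb_stream(rgb), self.depth_stream(depth)], dim=1)
        return self.trunk(fused)

# Sweep the fusion point, mirroring the paper's experimental setup:
for fuse_at in range(4):
    net = RGBDFusionNet(fuse_at)
    out = net(torch.randn(1, 3, 128, 128), torch.randn(1, 1, 128, 128))
```

In a real detector the trunk would feed the single-shot detection head; sweeping `fuse_at` is what lets the fusion point be treated as just another hyperparameter.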


2021 ◽  
Vol 38 (6) ◽  
pp. 1647-1655
Author(s):  
Qilin Bi ◽  
Minling Lai ◽  
Huiling Tang ◽  
Yanyao Guo ◽  
Jinyuan Li ◽  
...  

The precise inspection of geometric parameters is crucial for quality control in the context of Industry 4.0. The current technique of precise inspection depends on the operation of professional personnel, and the measuring accuracy is restricted by the proficiency of operators. To overcome these limitations, this paper proposes a precise inspection framework for the geometric parameters of polyvinyl chloride (PVC) pipe sections (G-PVC), using low-cost visual sensors and high-precision computer vision algorithms. Firstly, a robust imaging system was built to acquire images of a PVC pipe section under irregular illumination changes. Next, an engineering semantic model was established to calculate G-PVC such as inner diameter, outer diameter, wall thickness, and roundness. After that, a region-of-interest (ROI) extraction algorithm was combined with an improved edge operator to obtain the coordinates of measured points on the PVC end-face image in a stable and precise manner. Finally, experiments proved our framework to be highly precise and robust.
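To give a flavor of such end-face measurements, the sketch below fits circles to the inner and outer edges of a pipe cross-section with OpenCV. The Otsu thresholding, simple contour fitting, and pixel-to-millimeter scale are assumptions for illustration; the paper's pipeline uses its own ROI extraction and an improved edge operator instead.

```python
# Sketch: estimate inner/outer diameter and wall thickness from an end-face
# image of a pipe. Assumes a calibrated scale (mm per pixel) and a
# well-contrasted, roughly circular cross-section.
import cv2

MM_PER_PX = 0.05  # assumed calibration from a reference target

img = cv2.imread("pvc_end_face.png", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(img, (5, 5), 0)
_, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# The pipe wall appears as an annulus: the outer edge is the largest
# contour, the inner edge the largest contour inside it.
contours, _ = cv2.findContours(mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)
(_, outer_r) = cv2.minEnclosingCircle(contours[0])
(_, inner_r) = cv2.minEnclosingCircle(contours[1])

outer_d = 2 * outer_r * MM_PER_PX
inner_d = 2 * inner_r * MM_PER_PX
print(f"outer diameter: {outer_d:.2f} mm")
print(f"inner diameter: {inner_d:.2f} mm")
print(f"wall thickness: {(outer_d - inner_d) / 2:.2f} mm")
```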


Cryptography ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. 9
Author(s):  
Mukhil Azhagan Mallaiyan Sathiaseelan ◽  
Olivia P. Paradis ◽  
Shayan Taheri ◽  
Navid Asadizanjani

In this paper, we present the need for specialized artificial intelligence (AI) for counterfeit and defect detection of PCB components. Popular computer vision object detection techniques are not sufficient for such dense, low inter-class/high intra-class variation, and limited-data hardware assurance scenarios in which accuracy is paramount. Hence, we explored the limitations of existing object detection methodologies, such as region-based convolutional neural networks (RCNNs) and single shot detectors (SSDs), and compared them with our proposed method, the electronic component localization and detection network (ECLAD-Net). The results indicate that, of the compared methods, ECLAD-Net demonstrated the highest performance, with a precision of 87.2% and a recall of 98.9%. Though ECLAD-Net demonstrated decent performance, there is still much progress and collaboration needed from the hardware assurance, computer vision, and deep learning communities for automated, accurate, and scalable PCB assurance.


Author(s):  
R.J. Mount ◽  
R.V. Harrison

The sensory end organ of the ear, the organ of Corti, rests on a thin basilar membrane which lies between the bone of the central modiolus and the bony wall of the cochlea. In vivo, the organ of Corti is protected by the bony wall which totally surrounds it. In order to examine the sensory epithelium by scanning electron microscopy it is necessary to dissect away the protective bone and expose the region of interest (Fig. 1). This leaves the fragile organ of Corti susceptible to physical damage during subsequent handling. In our laboratory, cochlear specimens are routinely prepared after dissection by the O-T-O-T-O technique, critical point dried and then lightly sputter coated with gold. This processing involves considerable specimen handling, including several hours on a rotator, during which the organ of Corti is at risk of being physically damaged. The following procedure uses low-cost, readily available materials to hold the specimen during processing, preventing physical damage while allowing an unhindered exchange of fluids. Following fixation, the cochlea is dehydrated to 70% ethanol, then dissected under ethanol to prevent air drying. The holder is prepared by punching a hole in the flexible snap cap of a Wheaton vial with a paper hole punch. A small amount of two-component epoxy putty is well mixed, then pushed through the hole in the cap. The putty on the inner cap is formed into a "cup" to hold the specimen (Fig. 2); the putty on the outside is smoothed into a "button" to give good attachment even when the cap is flexed during handling (Fig. 3). The cap is submerged in the 70% ethanol, the bone at the base of the cochlea is seated into the cup, and the sides of the cup are squeezed with forceps to grip it (Fig. 4). Several types of epoxy putty have been tried; most are either soluble in ethanol to some degree or do not set in ethanol. The only putty we have found successful is "DURO™ MASTERMEND™ Epoxy Extra Strength Ribbon" (Loctite Corp., Cleveland, Ohio), a blue and yellow ribbon which is kneaded to form a green putty; it is available at many hardware stores.


2017 ◽  
Vol 2 (1) ◽  
pp. 80-87
Author(s):  
Puyda V. ◽  
Stoian A.

Detecting objects in a video stream is a typical problem in modern computer vision systems that are used in multiple areas. Object detection can be done on both static images and on frames of a video stream. Essentially, object detection means finding color and intensity non-uniformities which can be treated as physical objects. Besides that, the operations of finding coordinates, size and other characteristics of these non-uniformities can be executed, and the results can be used to solve other computer vision related problems such as object identification. In this paper, we study three algorithms which can be used to detect objects of different nature and are based on different approaches: detection of color non-uniformities, frame difference and feature detection. As the input data, we use a video stream obtained from a video camera or from an mp4 video file. Simulations and testing of the algorithms were done on a universal computer based on open-source hardware, built on the Broadcom BCM2711 quad-core Cortex-A72 (ARM v8) 64-bit SoC running at 1.5 GHz. The software was created in Visual Studio 2019 using OpenCV 4 on Windows 10 and on a universal computer operated under Linux (Raspbian Buster OS) for open-source hardware. In the paper, the methods under consideration are compared. The results of the paper can be used in research and development of modern computer vision systems used for different purposes. Keywords: object detection, feature points, keypoints, ORB detector, computer vision, motion detection, HSV color model
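Of the three approaches compared, frame differencing is the simplest to illustrate. Below is a minimal OpenCV sketch of motion detection by differencing consecutive grayscale frames; the threshold and minimum-area values are assumptions for the example, not the paper's tuned parameters.

```python
# Sketch: object/motion detection via frame differencing with OpenCV.
import cv2

cap = cv2.VideoCapture("input.mp4")  # or 0 for a live camera
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Difference against the previous frame, threshold, and clean up noise.
    diff = cv2.absdiff(gray, prev_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)

    # Each sufficiently large blob of change is treated as a moving object.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:          # assumed minimum object size
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imshow("motion", frame)
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```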


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1718
Author(s):  
Chien-Hsing Chou ◽  
Yu-Sheng Su ◽  
Che-Ju Hsu ◽  
Kong-Chang Lee ◽  
Ping-Hsuan Han

In this study, we designed a four-dimensional (4D) audiovisual entertainment system called Sense. This system comprises a scene recognition system and hardware modules that provide haptic sensations for users when they watch movies and animations at home. In the scene recognition system, we used Google Cloud Vision to detect common scene elements in a video, such as fire, explosions, wind, and rain, and to further determine whether the scene depicts hot weather, rain, or snow. Additionally, for animated videos, we applied deep learning with a single shot multibox detector to detect whether the animated video contained scenes of fire-related objects. The hardware module was designed to provide six types of haptic sensations, arranged with line symmetry to provide a better user experience. Based on the object detection results from the scene recognition system, the system generates corresponding haptic sensations. The system integrates deep learning, auditory signals, and haptic sensations to provide an enhanced viewing experience.
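As a sketch of the scene recognition step, the Google Cloud Vision API can return labels for a video frame, which can then be mapped to haptic effects. The label-to-effect mapping and trigger threshold below are invented for illustration; the client call itself follows the google-cloud-vision Python library.

```python
# Sketch: label a video frame with Google Cloud Vision and map the labels
# to haptic effects. HAPTIC_MAP and min_score are hypothetical.
from google.cloud import vision

HAPTIC_MAP = {              # hypothetical label -> actuator mapping
    "fire": "heat",
    "explosion": "vibration",
    "rain": "water_mist",
    "snow": "cold_airflow",
    "wind": "airflow",
}

def haptics_for_frame(jpeg_bytes: bytes, min_score: float = 0.7):
    client = vision.ImageAnnotatorClient()
    response = client.label_detection(image=vision.Image(content=jpeg_bytes))
    effects = set()
    for label in response.label_annotations:
        name = label.description.lower()
        if name in HAPTIC_MAP and label.score >= min_score:
            effects.add(HAPTIC_MAP[name])
    return effects

with open("frame.jpg", "rb") as f:
    print(haptics_for_frame(f.read()))   # e.g. {"heat", "vibration"}
```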


2021 ◽  
Vol 11 (6) ◽  
pp. 522
Author(s):  
Feng-Yu Liu ◽  
Chih-Chi Chen ◽  
Chi-Tung Cheng ◽  
Cheng-Ta Wu ◽  
Chih-Po Hsu ◽  
...  

Automated detection of the region of interest (ROI) is a critical step in the two-step classification system in several medical image applications. However, key information such as model parameter selection, image annotation rules, and ROI confidence score is essential but usually not reported. In this study, we proposed a practical framework of ROI detection by analyzing hip joints seen on 7399 anteroposterior pelvic radiographs (PXR) from three diverse sources. We presented a deep learning-based ROI detection framework utilizing a single-shot multi-box detector with a customized head structure based on the characteristics of the obtained datasets. Our method achieved average intersection over union (IoU) = 0.8115, average confidence = 0.9812, and average precision with threshold IoU = 0.5 (AP50) = 0.9901 in the independent testing set, suggesting that the detected hip regions appropriately covered the main features of the hip joints. The proposed approach featured flexible loose-fitting labeling, customized model design, and heterogeneous data testing. We demonstrated the feasibility of training a robust hip region detector for PXRs. This practical framework has a promising potential for a wide range of medical image applications.
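The reported IoU measures how well a detected box covers the annotated hip region; below is the conventional definition as a short sketch. The example boxes are hypothetical, not data from the study.

```python
# Sketch: intersection-over-union (IoU) for axis-aligned boxes (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection counts toward AP50 when its IoU with the ground truth >= 0.5.
pred = (40, 50, 200, 210)   # hypothetical detected hip box
gt   = (50, 60, 190, 200)   # hypothetical loose-fitting annotation
print(f"IoU = {iou(pred, gt):.4f}")
```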


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 343
Author(s):  
Kim Bjerge ◽  
Jakob Bonde Nielsen ◽  
Martin Videbæk Sepstrup ◽  
Flemming Helsing-Nielsen ◽  
Toke Thomas Høye

Insect monitoring methods are typically very time-consuming and involve substantial investment in species identification following manual trapping in the field. Insect traps are often only serviced weekly, resulting in low temporal resolution of the monitoring data, which hampers the ecological interpretation. This paper presents a portable computer vision system capable of attracting and detecting live insects. More specifically, the paper proposes detection and classification of species by recording images of live individuals attracted to a light trap. An Automated Moth Trap (AMT) with multiple light sources and a camera was designed to attract and monitor live insects during twilight and night hours. A computer vision algorithm referred to as Moth Classification and Counting (MCC), based on deep learning analysis of the captured images, tracked and counted the number of insects and identified moth species. Observations over 48 nights resulted in the capture of more than 250,000 images with an average of 5675 images per night. A customized convolutional neural network was trained on 2000 labeled images of live moths represented by eight different classes, achieving a high validation F1-score of 0.93. The algorithm achieved an average classification and tracking F1-score of 0.71 and a tracking detection rate of 0.79. Overall, the proposed computer vision system and algorithm showed promising results as a low-cost solution for non-destructive and automatic monitoring of moths.
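The species classification stage can be pictured as a small CNN over cropped insect images. The sketch below is a generic eight-class classifier with assumed layer sizes and input resolution; the paper's customized network differs.

```python
# Sketch: a small 8-class moth classifier in PyTorch. Architecture and
# input size are assumptions for illustration, not the paper's network.
import torch
import torch.nn as nn

class MothCNN(nn.Module):
    def __init__(self, num_classes: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),          # global pooling to 128 features
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = MothCNN()
logits = model(torch.randn(4, 3, 224, 224))   # batch of 4 cropped moth images
print(logits.shape)                            # torch.Size([4, 8])
```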

