appearance feature
Recently Published Documents

TOTAL DOCUMENTS: 27 (FIVE YEARS: 13)
H-INDEX: 3 (FIVE YEARS: 2)

Sensors ◽ 2021 ◽ Vol 21 (19) ◽ pp. 6657
Author(s): Youssef Osman ◽ Reed Dennis ◽ Khalid Elgazzar

We present an end-to-end smart harvesting solution for precision agriculture. Our proposed pipeline begins with yield estimation, performed through object detection and tracking to count fruit within a video. We train the You Only Look Once (YOLO) model on video clips of apples, oranges, and pumpkins. The bounding boxes obtained through object detection are used as input to our selected tracking model, DeepSORT. The original version of DeepSORT is unusable with fruit data, as its appearance feature extractor only works with people. We therefore implement ResNet as DeepSORT's new feature extractor, which is lightweight, accurate, and works generically across different fruits. Our yield estimation module achieves accuracy between 91% and 95% on real footage of apple trees. Our modification also works for counting oranges and pumpkins, with accuracies of 79% and 93.9%, respectively, without additional training. Our framework additionally includes a visualization of the yield, achieved by incorporating geospatial data. We also propose a mechanism to annotate each set of frames with its respective GPS coordinate. During counting, the count within the set of frames and the matching GPS coordinate are recorded, which we then visualize on a map. We leverage this information to propose an optimal container placement solution that minimizes the number of containers to place across the field before harvest, subject to a set of constraints. This acts as a decision support system that helps the farmer make efficient plans for logistics, such as labor, equipment, and gathering paths before harvest. Our work serves as a blueprint for future agricultural decision support systems that can aid in many other aspects of farming.
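A minimal sketch of the yield-mapping step described above, assuming the detector and tracker have already produced per-frame track IDs and that each frame carries a GPS annotation. The function names, data layout, and coordinates are illustrative, not the authors' implementation.

```python
# Sketch only: aggregate per-frame fruit counts into GPS-annotated bins.
# Track IDs and GPS values below are hypothetical placeholders.
from collections import defaultdict

def count_yield_by_gps(frame_tracks, frame_gps):
    """frame_tracks: {frame_idx: set of track IDs visible in that frame}.
    frame_gps: {frame_idx: (lat, lon)} annotation for each frame.
    Returns a per-coordinate fruit count, counting each track ID once
    at the first coordinate where it appears."""
    seen_ids = set()
    counts = defaultdict(int)
    for frame_idx in sorted(frame_tracks):
        coord = frame_gps[frame_idx]
        for track_id in frame_tracks[frame_idx]:
            if track_id not in seen_ids:      # count each fruit only once
                seen_ids.add(track_id)
                counts[coord] += 1
    return dict(counts)

# Hypothetical example: three frames spanning two GPS coordinates.
tracks = {0: {1, 2}, 1: {1, 2, 3}, 2: {4}}
gps = {0: (43.90, -78.85), 1: (43.90, -78.85), 2: (43.91, -78.85)}
print(count_yield_by_gps(tracks, gps))  # {(43.9, -78.85): 3, (43.91, -78.85): 1}
```

The per-coordinate totals can then be drawn on a map and fed to the container placement step as demand estimates.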


Sensors ◽ 2021 ◽ Vol 21 (10) ◽ pp. 3476
Author(s): Zhitao Wang ◽ Chunlei Xia ◽ Jangmyung Lee

A parallel fish school tracking method based on multiple-feature fish detection is proposed in this paper to obtain accurate movement trajectories of a large number of zebrafish. Zebrafish are widely adopted in many fields as an excellent model organism. Due to their non-rigid bodies, similar appearance, rapid motion, and frequent occlusions, vision-based behavioral monitoring remains a challenge. A multiple-appearance-feature-based fish detection scheme was developed by examining the fish head and the center of the fish body using shape index features. The proposed fish detection has the advantage of locating individual fish under occlusion and estimating their motion states, which ensures the stability of tracking multiple fish. Moreover, a parallel tracking scheme was developed based on the SORT framework by fusing the multiple appearance features of individual fish with their motion states. The proposed method was evaluated on seven video clips taken under different conditions. These videos contained fish at various scales, different arena sizes, different frame rates, and various image resolutions. The maximal number of tracking targets reached 100 individuals. The correct tracking ratio was 98.60% to 99.86%, and the correct identification ratio ranged from 97.73% to 100%. The experimental results demonstrate that the proposed method outperforms advanced deep learning-based methods. Moreover, the method has real-time tracking ability and can acquire online trajectory data without a high-cost hardware configuration.
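A minimal sketch of SORT-style data association that fuses an appearance-feature distance with a motion (position) distance before Hungarian assignment, in the spirit of the scheme described above. The feature dimensionality, weights, and threshold are illustrative assumptions, not the authors' settings.

```python
# Sketch only: fused appearance + motion cost for SORT-style assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats, track_pos, det_feats, det_pos,
              w_appearance=0.6, w_motion=0.4, max_cost=0.7):
    """Return (track_idx, det_idx) matches whose fused cost is below max_cost."""
    # Cosine distance between appearance feature vectors (one row per individual).
    tf = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    df = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    appearance_cost = 1.0 - tf @ df.T
    # Normalized Euclidean distance between predicted and detected positions.
    diff = track_pos[:, None, :] - det_pos[None, :, :]
    motion_cost = np.linalg.norm(diff, axis=2)
    motion_cost /= motion_cost.max() + 1e-9
    cost = w_appearance * appearance_cost + w_motion * motion_cost
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]

# Hypothetical example: 3 existing tracks matched against 3 new detections.
rng = np.random.default_rng(0)
matches = associate(rng.random((3, 16)), rng.random((3, 2)) * 100,
                    rng.random((3, 16)), rng.random((3, 2)) * 100)
print(matches)
```

Fusing both cues keeps identities stable when fish cross or briefly occlude one another, since position alone becomes ambiguous in those moments.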


IEEE Access ◽ 2021 ◽ Vol 9 ◽ pp. 1116-1123
Author(s): Zhenguo Ding ◽ Sitong Liu ◽ Min Li ◽ Zhichao Lian ◽ Hui Xu

2021 ◽ Vol 163 ◽ pp. 113771
Author(s): Wenfeng Zhang ◽ Lei Huang ◽ Zhiqiang Wei ◽ Jie Nie

Sensors ◽ 2020 ◽ Vol 20 (11) ◽ pp. 3126
Author(s): Jianyu Chen ◽ Jun Kong ◽ Hui Sun ◽ Hui Xu ◽ Xiaoli Liu ◽ ...

Action recognition is a significant and challenging topic in the fields of sensors and computer vision. Two-stream convolutional neural networks (CNNs) and 3D CNNs are two mainstream deep learning architectures for video action recognition. To combine them into one framework and further improve performance, we propose a novel deep network named the spatiotemporal interaction residual network with pseudo-3D (STINP). The STINP has three advantages. First, it consists of two branches constructed from residual networks (ResNets) that simultaneously learn the spatial and temporal information of the video. Second, it integrates the pseudo-3D block into the residual units of the spatial branch, which ensures that this branch not only learns the appearance features of the objects and scene in the video but also captures potential interaction information among consecutive frames. Finally, the STINP adopts a simple but effective multiplication operation to fuse the spatial and temporal branches, which guarantees that the learned spatial and temporal representations interact with each other throughout training. Experiments were conducted on two classic action recognition datasets, UCF101 and HMDB51. The results show that the proposed STINP provides better performance for video recognition than other state-of-the-art algorithms.
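A minimal sketch of the two ideas named in the abstract: a pseudo-3D residual block that factorizes a 3D convolution into a spatial (1x3x3) and a temporal (3x1x1) convolution, and element-wise multiplicative fusion of a spatial and a temporal branch. This is an assumed reconstruction, not the released STINP code; channel counts and the toy input are illustrative.

```python
# Sketch only: pseudo-3D residual block + multiplicative two-branch fusion.
import torch
import torch.nn as nn

class Pseudo3DBlock(nn.Module):
    """Residual block: 1x3x3 spatial conv followed by 3x1x1 temporal conv."""
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)
        self.bn1 = nn.BatchNorm3d(channels)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                      # x: (N, C, T, H, W)
        out = self.relu(self.bn1(self.spatial(x)))
        out = self.bn2(self.temporal(out))
        return self.relu(out + x)              # residual connection

class TwoBranchFusion(nn.Module):
    """Spatial branch (pseudo-3D) and temporal branch fused by multiplication."""
    def __init__(self, channels=16):
        super().__init__()
        self.spatial_branch = Pseudo3DBlock(channels)
        self.temporal_branch = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True))

    def forward(self, rgb_clip, motion_clip):
        s = self.spatial_branch(rgb_clip)
        t = self.temporal_branch(motion_clip)
        return s * t                           # element-wise multiplicative fusion

# Toy clip: batch 2, 16 channels, 8 frames, 28x28 spatial resolution.
model = TwoBranchFusion()
rgb = torch.randn(2, 16, 8, 28, 28)
motion = torch.randn(2, 16, 8, 28, 28)
print(model(rgb, motion).shape)                # torch.Size([2, 16, 8, 28, 28])
```

The multiplication acts as a gating interaction: activations survive only where both branches respond, which is one way the spatial and temporal representations can influence each other during training.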


2020 ◽ Vol 34 (07) ◽ pp. 12015-12022
Author(s): Guanglu Song ◽ Yu Liu ◽ Yuhang Zang ◽ Xiaogang Wang ◽ Biao Leng ◽ ...

The small receptive field and capacity of minimal neural networks limit their performance when they are used as the backbone of a detector. In this work, we find that the appearance feature of a generic face is discriminative enough for a tiny and shallow neural network to distinguish it from the background. The essential barriers are (1) the vague definition of the face bounding box and (2) the tricky design of anchor boxes or receptive fields. Unlike most top-down methods for joint face detection and alignment, the proposed KPNet detects small facial keypoints instead of the whole face in a bottom-up manner. It first predicts the facial landmarks from a low-resolution image via a well-designed fine-grained scale approximation and a scale-adaptive soft-argmax operator. Finally, precise face bounding boxes, however they are defined, can be inferred from the keypoints. Without any complex head architecture or meticulous network design, KPNet achieves state-of-the-art accuracy on generic face detection and alignment benchmarks with only ∼1M parameters, runs at 1000 fps on a GPU, and easily achieves real-time performance on most modern front-end chips.
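A minimal sketch of a plain (non-scale-adaptive) soft-argmax that turns a keypoint heatmap into (x, y) coordinates, followed by a bounding box inferred from the predicted keypoints. This is an assumption-level illustration, not the KPNet implementation; the temperature, margin, and 5-keypoint layout are made up for the example.

```python
# Sketch only: differentiable soft-argmax and box-from-keypoints inference.
import torch

def soft_argmax(heatmap, temperature=10.0):
    """heatmap: (K, H, W) -> (K, 2) expected (x, y) keypoint coordinates."""
    k, h, w = heatmap.shape
    probs = torch.softmax(heatmap.view(k, -1) * temperature, dim=1).view(k, h, w)
    ys = torch.arange(h, dtype=torch.float32).view(1, h, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, 1, w)
    x = (probs * xs).sum(dim=(1, 2))           # expected column per keypoint
    y = (probs * ys).sum(dim=(1, 2))           # expected row per keypoint
    return torch.stack([x, y], dim=1)

def box_from_keypoints(keypoints, margin=0.25):
    """Infer a face box that encloses the keypoints plus a relative margin."""
    x_min, y_min = keypoints.min(dim=0).values
    x_max, y_max = keypoints.max(dim=0).values
    pad_x = (x_max - x_min) * margin
    pad_y = (y_max - y_min) * margin
    return (x_min - pad_x, y_min - pad_y, x_max + pad_x, y_max + pad_y)

# Toy heatmap for 5 facial keypoints on a 64x64 grid.
heatmap = torch.randn(5, 64, 64)
kps = soft_argmax(heatmap)
print(kps.shape, box_from_keypoints(kps))
```

Because the box is derived from the keypoints rather than regressed directly, the margin can be chosen per application, which matches the abstract's point that the box definition itself can vary.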

