Real-time bi-directional people counting using an RGB-D camera

Sensor Review ◽  
2021 ◽  
Vol 41 (4) ◽  
pp. 341-349
Author(s):  
Wahyu Rahmaniar ◽  
W.J. Wang ◽  
Chi-Wei Ethan Chiu ◽  
Noorkholis Luthfil Hakim

Purpose: The purpose of this paper is to propose a new framework that improves a bi-directional people-counting technique using an RGB-D camera, obtaining accurate results with a fast computation time so that it can be used in real-time applications.
Design/methodology/approach: First, image calibration is proposed to obtain the ratio and shift values between the depth and the RGB image. In the depth image, a person is detected as foreground by removing the background. Then, the region of interest (ROI) of each detected person is registered based on their location and mapped to the RGB image. Registered people are tracked in RGB images based on channel and spatial reliability. Finally, people are counted when they cross the line of interest (LOI) and their displacement exceeds 2 m.
Findings: The proposed people-counting method achieves high accuracy with a computation time fast enough for use on PCs and embedded systems: a precision rate of 99% at 35 frames per second (fps) on a PC and 18 fps on the NVIDIA Jetson TX2.
Practical implications: This accuracy and speed make the method suitable for real-time deployment on both desktop and embedded platforms.
Originality/value: The proposed method can count people entering and exiting a room at the same time. Where previous systems were limited to one or two people in a frame, this system can count many people in a frame. It also handles common difficulties in people counting, such as people occluded by others, people suddenly moving in another direction, and people standing still.
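The LOI-plus-displacement counting rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the horizontal LOI, and the use of total track displacement against the 2 m threshold are assumptions for the sketch.

```python
import math

def count_crossings(tracks, loi_y=0.0, min_disp=2.0):
    """Count people entering/exiting when their trajectory crosses a
    horizontal line of interest (LOI) and their total displacement
    exceeds min_disp metres. `tracks` maps a person id to a list of
    (x, y) positions in metres. Names and thresholds are illustrative."""
    entering = exiting = 0
    for positions in tracks.values():
        if len(positions) < 2:
            continue
        start, end = positions[0], positions[-1]
        if math.dist(start, end) < min_disp:
            continue  # standing still or tracking jitter: ignore
        if start[1] < loi_y <= end[1]:
            entering += 1  # crossed the LOI in the entering direction
        elif start[1] >= loi_y > end[1]:
            exiting += 1   # crossed the LOI in the exiting direction
    return entering, exiting
```

Because each track is scored independently, people entering and exiting in the same frame are counted simultaneously, matching the bi-directional claim above.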

Electronics ◽  
2019 ◽  
Vol 8 (12) ◽  
pp. 1373 ◽  
Author(s):  
Wahyu Rahmaniar ◽  
Wen-June Wang ◽  
Hsiang-Chieh Chen

Detection of moving objects by unmanned aerial vehicles (UAVs) is an important application in the aerial transportation system. However, many problems must be handled, such as high-frequency jitter from UAVs, small object sizes, low-quality images, computation-time reduction, and detection correctness. This paper considers the detection and recognition of moving objects in a sequence of images captured from a UAV. A new and efficient technique is proposed to achieve this objective in real time and in a real environment. First, the feature points between two successive frames are found to estimate the camera movement and stabilize the image sequence. Then, regions of interest (ROIs) of the objects are detected as moving-object candidates (foreground). Furthermore, static and dynamic objects are classified based on the dominant motion vectors in the foreground and background. Based on the experimental results, the proposed method achieves a precision rate of 94% and a computation time of 47.08 frames per second (fps). The performance of the proposed method surpasses that of existing methods.
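The static/dynamic classification step can be illustrated with a small sketch. The majority-vote rule, names, and tolerance below are assumptions for illustration, not the paper's exact procedure.

```python
from collections import Counter

def classify_objects(background_vectors, object_vectors, tol=1):
    """Label each candidate object 'dynamic' or 'static' by comparing its
    dominant motion vector with the dominant background vector. Vectors
    are (dx, dy) integer pixel displacements after camera-motion
    compensation; the tolerance value is illustrative."""
    # Dominant background motion = the most frequent background vector.
    cam_motion, _ = Counter(background_vectors).most_common(1)[0]
    labels = []
    for vectors in object_vectors:
        obj_motion, _ = Counter(vectors).most_common(1)[0]
        # An object moving with the background is static in the scene.
        if (abs(obj_motion[0] - cam_motion[0]) <= tol and
                abs(obj_motion[1] - cam_motion[1]) <= tol):
            labels.append("static")
        else:
            labels.append("dynamic")
    return labels
```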


Author(s):  
Dinh-Son Tran ◽  
Ngoc-Huynh Ho ◽  
Hyung-Jeong Yang ◽  
Soo-Hyung Kim ◽  
Guee Sang Lee

A real-time fingertip-gesture-based interface is still challenging for human–computer interaction, due to sensor noise, changing light levels, and the complexity of tracking a fingertip across a variety of subjects. Using fingertip tracking as a virtual mouse is a popular method of interacting with computers without a mouse device. In this work, we propose a novel virtual-mouse method using RGB-D images and fingertip detection. The hand region of interest and the center of the palm are first extracted using in-depth skeleton-joint information images from a Microsoft Kinect Sensor version 2 and then converted into a binary image. The contours of the hand are then extracted and described by a border-tracing algorithm. The K-cosine algorithm is used to detect the fingertip location based on the hand-contour coordinates. Finally, the fingertip location is mapped to RGB images to control the mouse cursor on a virtual screen. The system tracks fingertips in real time at 30 fps on a desktop computer using a single CPU and a Kinect V2. The experimental results showed a high accuracy level; the system works well in real-world environments with a single CPU. This fingertip-gesture-based interface allows humans to interact easily with computers by hand.
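The K-cosine step on the hand contour can be sketched as follows: for each contour point, take the vectors to its k-th neighbours on either side; a sharp fingertip gives a small angle between them, i.e. a cosine near 1. The values of k and the threshold are illustrative, not those of the paper.

```python
import math

def k_cosine(contour, k=2, cos_thresh=0.7):
    """Flag high-curvature contour points (fingertip candidates) using
    the K-cosine measure on a closed contour given as (x, y) tuples."""
    n = len(contour)
    tips = []
    for i in range(n):
        px, py = contour[i]
        # Vectors from p[i] to its k-th neighbours along the contour.
        ax, ay = contour[(i - k) % n][0] - px, contour[(i - k) % n][1] - py
        bx, by = contour[(i + k) % n][0] - px, contour[(i + k) % n][1] - py
        norm = math.hypot(ax, ay) * math.hypot(bx, by)
        if norm == 0:
            continue  # degenerate (repeated) contour points
        cos_angle = (ax * bx + ay * by) / norm
        if cos_angle > cos_thresh:
            tips.append(i)  # sharp angle: fingertip candidate
    return tips
```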


2020 ◽  
Vol 17 (6) ◽  
pp. 811-821
Author(s):  
Janak D. Trivedi ◽  
Sarada Devi Mandalapu ◽  
Dhara H. Dave

Purpose: The purpose of this paper is to find real-time parking availability for four-wheelers.
Design/methodology/approach: Real-time parking detection using dedicated infrastructure entails high installation and maintenance costs, which not all urban cities can afford. The authors present a statistical block-matching algorithm (SBMA) for real-time parking management in small cities such as Bhavnagar using the existing surveillance CCTV system, which was not installed for parking applications. In particular, data from a camera situated in a mall were used to detect the occupancy of specific parking places using regions of interest (ROIs). The proposed method computes the mean value of the pixels inside each ROI using blocks of different sizes (8 × 10 and 20 × 35) and compares the values across frames. When the difference between frames exceeds a threshold, the method reports "no parking space for that place"; otherwise, it reports "parking place available." This information is then used to draw a bounding box on each parking place, colored green or red to show its availability.
Findings: The real-time feedback loop (car-parking positions) helps the presented model dynamically refine the parking strategy and the parking positions offered to users. A whole-day experiment/validation is shown in this paper, where the method is evaluated using standard classification metrics: precision, recall, and F1 score.
Originality/value: The authors found real-time parking availability for Himalaya Mall in Bhavnagar, Gujarat, on video from 18th June 2018 using the SBMA method with an acceptable computation time for finding parking slots. The limitations of the presented method and future work are discussed at the end of this paper.
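The block-mean comparison at the core of the SBMA can be sketched like this. The threshold value and the use of an empty-slot reference block are assumptions for illustration.

```python
def parking_status(ref_block, cur_block, threshold=20.0):
    """Decide occupancy of one parking-place ROI by comparing the mean
    pixel intensity of the current frame's block against an empty-slot
    reference block. Blocks are nested lists of grayscale values."""
    ref_mean = sum(sum(r) for r in ref_block) / (len(ref_block) * len(ref_block[0]))
    cur_mean = sum(sum(r) for r in cur_block) / (len(cur_block) * len(cur_block[0]))
    # A large mean difference means something now occupies the slot.
    return "occupied" if abs(cur_mean - ref_mean) > threshold else "free"
```

Reducing each block to a single mean is what keeps the per-slot cost low enough for commodity CCTV hardware.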


2020 ◽  
Author(s):  
Guoliang Liu

Full-resolution depth is required in many real-world engineering applications. However, existing depth sensors, e.g., LiDARs, only offer sparse depth sample points with limited resolution and noise. We propose a deep-learning-based full-resolution depth recovery method from monocular images and corresponding sparse depth measurements of the target environment. The novelty of our idea is that the similar structural information shared by the RGB image and the depth image is used to refine the dense depth estimation result. This similar structural information can be found using a correlation layer in the regression neural network. We show that the proposed method achieves higher estimation accuracy than state-of-the-art methods. The experiments conducted on NYU Depth V2 support the novelty of our idea.
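The correlation-layer idea can be illustrated outside a neural network with a toy sketch: for every position in one feature map, compute inner products with the other map's feature vectors over a small displacement window. Feature maps are nested lists here; a real model would compute this inside the regression CNN on learned features.

```python
def correlation_layer(feat_a, feat_b, max_disp=1):
    """Toy correlation layer: feat_a and feat_b are H x W grids of
    feature vectors (lists of floats). For each position in feat_a,
    return inner products with feat_b at all displacements within
    max_disp, zero-padded outside the map."""
    h, w = len(feat_a), len(feat_a[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            scores = []
            for dy in range(-max_disp, max_disp + 1):
                for dx in range(-max_disp, max_disp + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        score = sum(a * b for a, b in
                                    zip(feat_a[y][x], feat_b[yy][xx]))
                    else:
                        score = 0.0  # zero padding outside the map
                    scores.append(score)
            row.append(scores)  # (2*max_disp+1)**2 scores per position
        out.append(row)
    return out
```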


2020 ◽  
Vol 12 (7) ◽  
pp. 1142
Author(s):  
Jeonghoon Kwak ◽  
Yunsick Sung

To provide a realistic environment for remote sensing applications, point clouds are used to realize a three-dimensional (3D) digital world for the user. Motion recognition of objects, e.g., humans, is required to provide realistic experiences in this 3D digital world. To recognize a user's motions, 3D landmarks are obtained by analyzing a 3D point cloud collected through a light detection and ranging (LiDAR) system or a red green blue (RGB) image collected visually. However, manual supervision is required to extract 3D landmarks, whether they originate from the RGB image or from the 3D point cloud, so a method for extracting 3D landmarks without manual supervision is needed. Herein, an RGB image and a 3D point cloud are used together to extract 3D landmarks. The 3D point cloud provides the relative distance between the LiDAR and the user; however, because of its sparsity it cannot capture the user's entire body, and thus cannot directly yield a dense depth image delineating the boundary of the user's body. Therefore, up-sampling is performed to increase the density of the depth image generated from the 3D point cloud. This paper proposes a system for extracting 3D landmarks using 3D point clouds and RGB images without manual supervision. A depth image providing the boundary of the user's motion is generated from the 3D point cloud and the RGB image, collected by a LiDAR and an RGB camera, respectively. To extract 3D landmarks automatically, an encoder–decoder model is trained on the generated depth images and RGB images, and 3D landmarks are extracted with the trained encoder model. The method of extracting 3D landmarks using RGB-depth (RGBD) images was verified experimentally, and 3D landmarks were extracted to evaluate the user's motions. In this manner, landmarks could be extracted according to the user's motions, rather than from the RGB images alone.
The depth images generated by the proposed method were 1.832 times denser than the up-sampling-based depth images generated with bilateral filtering.
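The density comparison reported above can be illustrated with a toy sketch. The nearest-valid-neighbour row filling below is a hedged stand-in for up-sampling, not the paper's RGB-guided method or the bilateral-filtering baseline; both function names are assumptions.

```python
def density(depth):
    """Fraction of valid (non-zero) pixels in a depth image given as
    nested lists, i.e. the quantity the 1.832x figure compares."""
    total = sum(len(row) for row in depth)
    valid = sum(1 for row in depth for d in row if d > 0)
    return valid / total

def fill_rows(depth):
    """Toy densification: fill holes (zeros) in each row with the
    nearest valid depth to the left (already-filled values allowed)
    or, failing that, to the right (original values only)."""
    out = []
    for row in depth:
        filled = row[:]
        for i, d in enumerate(filled):
            if d == 0:
                left = next((filled[j] for j in range(i - 1, -1, -1)
                             if filled[j] > 0), 0)
                right = next((row[j] for j in range(i + 1, len(row))
                              if row[j] > 0), 0)
                filled[i] = left or right
        out.append(filled)
    return out
```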


2021 ◽  
Vol 11 (16) ◽  
pp. 7741
Author(s):  
Wooryong Park ◽  
Donghee Lee ◽  
Junhak Yi ◽  
Woochul Nam

Tracking a micro aerial vehicle (MAV) is challenging because of its small size and swift motion. A new model was developed by combining a compact and an adaptive search region (SR); it can accurately and robustly track MAVs with a fast computation speed. A compact SR, which is slightly larger than the target MAV, is less likely to include a distracting background than a large SR; thus, it can track the MAV accurately. Moreover, the compact SR reduces the computation time because tracking can be conducted with a relatively shallow network. An optimal SR-to-MAV size ratio was obtained in this study. However, this optimal compact SR causes frequent tracking failures under dynamic MAV motion. An adaptive SR is proposed to address this problem; it adaptively changes the location and size of the SR based on the size, location, and velocity of the MAV in the SR. The compact SR without the adaptive strategy tracks the MAV with an accuracy of 0.613 and a robustness of 0.086, whereas the compact and adaptive SR achieves an accuracy of 0.811 and a robustness of 1.0. Moreover, online tracking runs at approximately 400 frames per second, which is significantly faster than real-time speed.
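The adaptive SR update can be sketched as follows. The constant-velocity prediction and the ratio value are assumptions for the sketch, not the optimal ratio found in the paper.

```python
def update_search_region(mav_box, velocity, ratio=1.5):
    """Sketch of an adaptive search-region (SR) update: centre the next
    SR on the MAV's predicted position and scale it by an SR-to-MAV
    size ratio. mav_box is (cx, cy, w, h) in pixels; velocity is
    (vx, vy) in pixels per frame."""
    cx, cy, w, h = mav_box
    vx, vy = velocity
    # Constant-velocity prediction of the next centre; grow the box
    # so the SR stays slightly larger than the MAV.
    return (cx + vx, cy + vy, w * ratio, h * ratio)
```

Keeping the SR only slightly larger than the MAV is what allows the shallow, fast network mentioned above; the velocity term compensates for the failures a fixed compact SR suffers under swift motion.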


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2813
Author(s):  
Yun Peng ◽  
Shengyi Zhao ◽  
Jizhan Liu

Accurately extracting the grape cluster at the front of overlapping grape clusters is the primary problem for a grape-harvesting robot. To solve the difficulty of identifying and segmenting overlapping grape clusters in a trellis cultivation environment, a simple method based on a deep-learning network and the idea of region growing is proposed. Firstly, the grape region in an RGB image was obtained by a finely trained DeepLabV3+ model; the idea of transfer learning was adopted when training the network with a limited number of training samples. Then, the corresponding grape region in the depth image captured by a RealSense D435 was processed by the proposed depth region growing (DRG) algorithm to extract the front cluster. The depth region growing method uses the depth value instead of the gray value to achieve clustering. Finally, it fills the holes in the clustered region of interest, extracts the contours, and maps the obtained contours to the RGB image. Images captured by the RealSense D435 in a natural trellis environment were adopted to evaluate the performance of the proposed method. The experimental results showed that the recall and precision of the proposed method were 89.2% and 87.5%, respectively. This performance indicates that the proposed method can satisfy the requirements of practical application in robotic grape harvesting.
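The DRG step can be sketched as a breadth-first search over the depth image: neighbours join the region while their depth stays close to the current pixel's, so only the front cluster (at a similar depth) is kept. The 4-connectivity and tolerance below are illustrative choices.

```python
from collections import deque

def depth_region_grow(depth, seed, tol=5):
    """Grow a region from a seed pixel (row, col) inside the segmented
    grape mask: add 4-connected neighbours whose depth differs from
    the current pixel's by less than tol. `depth` is a nested list."""
    h, w = len(depth), len(depth[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and abs(depth[ny][nx] - depth[y][x]) < tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region
```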


2018 ◽  
Vol 8 (11) ◽  
pp. 2017 ◽  
Author(s):  
Gyu-cheol Lee ◽  
Sang-ha Lee ◽  
Jisang Yoo

People counting with surveillance cameras is a key technology for understanding the flow of people and generating heat maps. In recent years, people-detection performance has been greatly improved with the development of deep-learning object-detection algorithms. However, in crowded places the detection rate is low, as people are often occluded by others. We propose a people-counting method using a stereo camera to resolve the non-detection problem caused by occlusion. We applied stereo matching to extract the depth image and converted the camera view to a top view using the depth information. People were detected using a height map and an occupancy map, and were tracked and counted using a Kalman filter-based tracker. We ran the proposed method on the NVIDIA Jetson TX2 to check the possibility of real-time operation on an embedded board. Experimental results showed that the proposed method is more accurate than existing methods and that real-time processing is possible.
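The Kalman filter-based tracking step can be illustrated with a minimal 1-D constant-velocity sketch; the paper tracks 2-D top-view positions, and the noise values and state layout here are assumptions.

```python
def kalman_step(state, cov, meas, q=0.01, r=1.0, dt=1.0):
    """One predict/update cycle of a 1-D constant-velocity Kalman
    filter for a tracked position in the top-view map.
    state = [pos, vel]; cov is a 2x2 covariance as nested lists;
    meas is the detected position. Process noise q is added to the
    covariance diagonal; r is the measurement noise."""
    pos, vel = state
    # --- predict with F = [[1, dt], [0, 1]] ---
    pos_p = pos + vel * dt
    p00 = cov[0][0] + dt * (cov[1][0] + cov[0][1]) + dt * dt * cov[1][1] + q
    p01 = cov[0][1] + dt * cov[1][1]
    p10 = cov[1][0] + dt * cov[1][1]
    p11 = cov[1][1] + q
    # --- update with the position measurement (H = [1, 0]) ---
    k0 = p00 / (p00 + r)
    k1 = p10 / (p00 + r)
    resid = meas - pos_p
    new_state = [pos_p + k0 * resid, vel + k1 * resid]
    new_cov = [[(1 - k0) * p00, (1 - k0) * p01],
               [p10 - k1 * p00, p11 - k1 * p01]]
    return new_state, new_cov
```

Between detections, the predict step alone carries an occluded person forward, which is how the tracker bridges the occlusions discussed above.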


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Hanmin Cho ◽  
Seungwha Han ◽  
Sun-Young Hwang

We propose a real-time algorithm for recognizing speed-limit signs from a moving vehicle. Linear discriminant analysis (LDA), required for classification, is performed on discrete cosine transform (DCT) coefficients. To reduce the feature dimension in LDA, DCT coefficients are selected by a devised discriminant function derived from information obtained during training. Binarization and thinning are performed on a region of interest (ROI) obtained by preprocessing a detected ROI prior to the DCT, further reducing the computation time of the DCT. This process is performed on a sequence of image frames to increase the recognition hit rate. Experimental results show that arithmetic operations are reduced by about 60% compared with previous works, while hit rates reach about 100%.
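The coefficient-selection idea can be illustrated with a Fisher-style discriminant score (between-class over within-class variance per coefficient). This is a generic stand-in, since the paper's devised discriminant function is not given here; names are assumptions.

```python
def select_coefficients(class_samples, top_k=2):
    """Rank DCT coefficient indices by a Fisher-style discriminant
    score and keep the top_k most discriminative ones.
    class_samples maps a class label (e.g. a speed limit) to a list
    of coefficient vectors (lists of floats) from training."""
    dim = len(next(iter(class_samples.values()))[0])
    all_vecs = [v for vecs in class_samples.values() for v in vecs]
    scores = []
    for i in range(dim):
        overall = sum(v[i] for v in all_vecs) / len(all_vecs)
        between = within = 0.0
        for vecs in class_samples.values():
            mean = sum(v[i] for v in vecs) / len(vecs)
            between += len(vecs) * (mean - overall) ** 2
            within += sum((v[i] - mean) ** 2 for v in vecs)
        # Small epsilon guards against zero within-class variance.
        scores.append(between / (within + 1e-9))
    ranked = sorted(range(dim), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]
```

Keeping only the highest-scoring coefficients is what cuts the LDA feature dimension, and with it the arithmetic cost, in line with the reduction reported above.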


