People Detection Using Color and Depth Images

Author(s):  
Joaquín Salas ◽  
Carlo Tomasi

2021 ◽  
Vol 106 ◽  
pp. 104484
Author(s):  
David Fuentes-Jimenez ◽  
Cristina Losada-Gutierrez ◽  
David Casillas-Perez ◽  
Javier Macias-Guarasa ◽  
Daniel Pizarro ◽  
...  

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 28349-28359 ◽  
Author(s):  
Johannes Wetzel ◽  
Astrid Laubenheimer ◽  
Michael Heizmann

2004 ◽  
Vol 61 (7-12) ◽  
pp. 875-893 ◽  
Author(s):  
I. A. Vyazmitinov ◽  
Ye. I. Myroshnychenko ◽  
O. V. Sytnik

Author(s):  
Sukhendra Singh ◽  
G. N. Rathna ◽  
Vivek Singhal

Introduction: Sign language is the only way for speech-impaired people to communicate. Because most hearing people do not know sign language, a communication barrier exists; this is the problem faced by speech-impaired people. In this paper, we present a solution that captures hand gestures with a Kinect camera and classifies each gesture into its correct symbol. Method: We used a Kinect camera rather than an ordinary web camera because an ordinary camera does not capture the 3D orientation or depth of the scene, whereas the Kinect captures 3D images, which makes classification more accurate. Result: The Kinect camera produces distinct images for the hand gestures ‘2’ and ‘V’, and similarly for ‘1’ and ‘I’, whereas a normal web camera cannot distinguish between these pairs. We used hand gestures from Indian Sign Language; our dataset contained 46,339 RGB images and 46,339 depth images. 80% of the images were used for training and the remaining 20% for testing. In total, 36 hand gestures were considered: 26 for the alphabets A-Z and 10 for the digits 0-9. Conclusion: Along with a real-time implementation, we also compare the performance of various machine learning models and find that a CNN on depth images gives the most accurate performance. All results were obtained on a PYNQ Z2 board.
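The paper's implementation is not reproduced in the abstract; as a rough illustration only, the PyTorch sketch below builds a small CNN over single-channel depth images with 36 output classes (26 letters plus 10 digits) and performs the 80/20 train/test split mentioned above. The architecture, 64x64 input resolution, and synthetic data are assumptions, not the authors' model.

    # Minimal sketch of a depth-image gesture classifier; assumed 64x64
    # single-channel input and 36 classes, not the authors' exact network.
    import torch
    import torch.nn as nn
    from torch.utils.data import TensorDataset, random_split

    NUM_CLASSES = 36  # 26 letters (A-Z) + 10 digits (0-9)

    class DepthGestureCNN(nn.Module):
        def __init__(self, num_classes: int = NUM_CLASSES):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
                nn.Flatten(),
                nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
                nn.Linear(128, num_classes),
            )

        def forward(self, x):
            return self.net(x)

    # 80/20 split of a (here synthetic) dataset of depth images.
    images = torch.randn(1000, 1, 64, 64)
    labels = torch.randint(0, NUM_CLASSES, (1000,))
    train_set, test_set = random_split(TensorDataset(images, labels), [800, 200])

    logits = DepthGestureCNN()(images[:8])
    print(logits.shape)  # torch.Size([8, 36])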


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2958
Author(s):  
Antonio Carlos Cob-Parro ◽  
Cristina Losada-Gutiérrez ◽  
Marta Marrón-Romera ◽  
Alfredo Gardel-Vicente ◽  
Ignacio Bravo-Muñoz

New processing methods based on artificial intelligence (AI) and deep learning are replacing traditional computer vision algorithms. The most advanced systems can process huge amounts of data in large computing facilities. In contrast, this paper presents a smart video surveillance system that executes AI algorithms on low-power embedded devices. The computer vision algorithm, typical of surveillance applications, aims to detect, count, and track people’s movements in the area, an application that requires a distributed smart camera system. The proposed AI application detects people in the surveillance area using a MobileNet-SSD architecture. In addition, using a robust Kalman filter bank, the algorithm keeps track of people in the video and also provides people-counting information. The detection results are excellent considering the constraints imposed on the process. The selected architecture for the edge node is based on an UpSquared2 device that includes a vision processing unit (VPU) capable of accelerating AI CNN inference. The results section provides information about the image processing time when multiple video cameras are connected to the same edge node, people detection precision and recall curves, and the energy consumption of the system. The discussion of the results shows the usefulness of deploying this smart camera node throughout a distributed surveillance system.
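The paper's code is not shown; as a generic sketch of a single track in a Kalman filter bank of the kind described, the snippet below uses OpenCV's cv2.KalmanFilter with a constant-velocity state [x, y, vx, vy], corrected each frame with the centroid of a matched MobileNet-SSD person detection. The noise covariances are placeholder values, and the data-association step between detections and tracks is omitted.

    # One track of a people-tracking Kalman filter bank (constant-velocity
    # model); illustrative values, not the paper's implementation.
    import numpy as np
    import cv2

    def make_track(cx: float, cy: float) -> cv2.KalmanFilter:
        """Kalman filter with state [x, y, vx, vy] and measurement [x, y]."""
        kf = cv2.KalmanFilter(4, 2)
        dt = 1.0  # one frame between updates
        kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                        [0, 1, 0, dt],
                                        [0, 0, 1,  0],
                                        [0, 0, 0,  1]], dtype=np.float32)
        kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                         [0, 1, 0, 0]], dtype=np.float32)
        kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2      # assumed
        kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # assumed
        kf.statePost = np.array([[cx], [cy], [0], [0]], dtype=np.float32)
        return kf

    # Per frame: predict every track, then correct each one with the centroid
    # of its matched detection (association between boxes and tracks omitted).
    track = make_track(320.0, 240.0)
    track.predict()                                    # a-priori state
    detection_centroid = np.array([[325.0], [238.0]], dtype=np.float32)
    state = track.correct(detection_centroid)          # a-posteriori state
    print(state[:2].ravel())                           # smoothed (x, y)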


2021 ◽  
Vol 13 (5) ◽  
pp. 935
Author(s):  
Matthew Varnam ◽  
Mike Burton ◽  
Ben Esse ◽  
Giuseppe Salerno ◽  
Ryunosuke Kazahaya ◽  
...  

SO2 cameras can measure rapid changes in volcanic emission rate but require accurate calibrations and corrections to convert optical depth images into slant column densities. We conducted a test at Masaya volcano of two SO2 camera calibration approaches, calibration cells and a co-located spectrometer, and corrected both calibrations for light dilution, a process caused by light scattering between the plume and the camera. We demonstrate an advancement on the image-based correction that allows retrieval of the scattering efficiency across a 2D area of an SO2 camera image. When appropriately corrected for dilution, our two calibration approaches produce final calculated emission rates that agree with simultaneously measured traverse flux data and with each other, although the observed distribution of gas within the image differs. We demonstrate that traverses and SO2 camera techniques, when used together, generate better plume speed estimates for the traverses and improved knowledge of wind direction for the camera, producing more reliable emission rates. We suggest that combining traverses with the SO2 camera should be adopted where possible.
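The retrieval itself is not given in the abstract; as a generic illustration of the two steps it names, the sketch below applies a simple light-dilution correction to intensity images and then a linear calibration from optical depth to slant column density. The scattering efficiency, plume distance, and calibration slope are placeholder values, not results from this study.

    # Generic light-dilution correction and linear SO2 calibration sketch;
    # all numerical values are illustrative placeholders.
    import numpy as np

    def correct_dilution(i_meas, i_sky, epsilon, distance):
        """Remove light scattered into the line of sight between plume and camera.

        i_meas   : measured intensity image containing the plume
        i_sky    : intensity of the scattered (diluting) light
        epsilon  : atmospheric scattering efficiency per metre (assumed known)
        distance : plume-to-camera distance in metres
        """
        t = np.exp(-epsilon * distance)          # transmission of plume light
        return (i_meas - i_sky * (1.0 - t)) / t  # undiluted plume intensity

    # Optical depth from corrected plume/background images, then a linear
    # calibration (slope from cells or a co-located spectrometer) to SCD.
    i_plume = correct_dilution(np.array([0.82]), 1.0, 1.2e-4, 3000.0)
    i_bg = correct_dilution(np.array([0.95]), 1.0, 1.2e-4, 3000.0)
    tau = -np.log(i_plume / i_bg)   # apparent SO2 optical depth
    slope = 2.1e18                  # molecules/cm^2 per unit tau (assumed)
    print(slope * tau)              # slant column density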


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1299
Author(s):  
Honglin Yuan ◽  
Tim Hoogenkamp ◽  
Remco C. Veltkamp

Deep learning has achieved great success on robotic vision tasks. However, compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset, consisting of commonly used objects, for benchmarking 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups through the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested on the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and that photo-realistic images are helpful in increasing the performance of pose estimation algorithms.
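As a small illustration of the automatic annotation step described above, the sketch below derives a segmentation mask and a tight 2D bounding box from a rendered depth image of a single object; it is a generic reconstruction of the idea, not the RobotP pipeline's code.

    # Mask and 2D bounding box from a single-object depth render (sketch).
    import numpy as np

    def mask_and_bbox(object_depth: np.ndarray):
        """object_depth: HxW depth render of one object, 0 where no object."""
        mask = object_depth > 0                    # segmentation mask
        ys, xs = np.nonzero(mask)
        if xs.size == 0:
            return mask, None                      # object not visible
        bbox = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
        return mask, bbox                          # tight axis-aligned box

    depth = np.zeros((480, 640), dtype=np.float32)
    depth[100:200, 300:420] = 0.8                  # synthetic object at 0.8 m
    mask, bbox = mask_and_bbox(depth)
    print(bbox)                                    # (300, 100, 419, 199)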

