DETECTION OF A HUMAN HEAD ON A LOW-QUALITY IMAGE AND ITS SOFTWARE IMPLEMENTATION

Author(s):  
D. Yudin ◽  
A. Ivanov ◽  
M. Shchendrygin

Abstract. The paper addresses the detection in two-dimensional images of not only the face but the whole head of a person, regardless of its orientation towards the observer. The task is further complicated by the fact that the image arriving at the input of the recognition algorithm may be noisy or captured in low-light conditions. The minimum size of a person's head to be detected in an image is 10 × 10 pixels. In the course of development, a dataset was prepared containing over 1000 labelled images of classrooms at BSTU n.a. V.G. Shukhov. The markup was carried out using a segmentation software tool developed by the authors. Three convolutional neural network architectures were trained for the human head detection task: a fully convolutional network (FCN) with clustering, Faster R-CNN and Mask R-CNN. The third architecture works more than ten times slower than the first, but it produces almost no false positives and achieves head-detection precision and recall above 90% on both the test and training samples. Faster R-CNN is less accurate than Mask R-CNN but gives fewer false positives than the FCN with clustering. Based on Mask R-CNN, the authors have developed software for human head detection in low-quality images. It is a two-level web service with client and server modules, used to detect and count people in rooms. The software works with IP cameras, which ensures its scalability across different practical computer vision applications.
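The post-processing step such a counting service might apply can be sketched as filtering raw detector boxes by the stated 10 × 10 px minimum head size. This is an illustrative sketch only; the function name, box format, and the 0.5 confidence threshold are assumptions, not the paper's implementation:

```python
def count_heads(detections, min_size=10, conf_threshold=0.5):
    """Count detector boxes that plausibly correspond to heads.

    detections: list of (x1, y1, x2, y2, score) tuples, e.g. from a
    Mask R-CNN head. min_size reflects the paper's 10x10 px minimum;
    conf_threshold is an illustrative assumption.
    """
    kept = 0
    for x1, y1, x2, y2, score in detections:
        w, h = x2 - x1, y2 - y1
        if w >= min_size and h >= min_size and score >= conf_threshold:
            kept += 1
    return kept

detections = [
    (10, 10, 40, 45, 0.95),    # plausible head
    (100, 50, 107, 58, 0.90),  # too small (7x8 px)
    (200, 80, 230, 110, 0.30), # low confidence
]
print(count_heads(detections))  # 1
```

In a client/server deployment like the one described, the client would post a camera frame and the server would run the detector and return this count.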

Author(s):  
Hussein Mohammed ◽  
Volker Märgner ◽  
Giovanni Ciotti

Abstract. Automatic pattern detection has become increasingly important for scholars in the humanities as the number of manuscripts that have been digitised has grown. Most of the state-of-the-art methods used for pattern detection depend on the availability of a large number of training samples, which are typically not available in the humanities as they involve tedious manual annotation by researchers (e.g. marking the location and size of words, drawings, seals and so on). This makes the applicability of such methods very limited within the field of manuscript research. We propose a learning-free approach based on a state-of-the-art Naïve Bayes Nearest-Neighbour classifier for the task of pattern detection in manuscript images. The method has already been successfully applied to an actual research question from South Asian studies about palm-leaf manuscripts. Furthermore, state-of-the-art results have been achieved on two extremely challenging datasets, namely the AMADI_LontarSet dataset of handwriting on palm leaves for word-spotting and the DocExplore dataset of medieval manuscripts for pattern detection. A performance analysis is provided as well in order to facilitate later comparisons by other researchers. Finally, an easy-to-use implementation of the proposed method is developed as a software tool and made freely available.
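The core of a Naïve Bayes Nearest-Neighbour (NBNN) classifier, the learning-free technique the abstract builds on, can be sketched in a few lines: a query image's local descriptors are compared against each class's descriptor pool, and the class with the smallest summed descriptor-to-nearest-neighbour distance wins. The data and class names below are synthetic and illustrative:

```python
import numpy as np

def nbnn_classify(query_descriptors, class_descriptors):
    """NBNN decision rule: pick the class minimising the sum of squared
    distances from each query descriptor to its nearest neighbour in
    that class's descriptor pool.

    query_descriptors: (n, d) array of local descriptors for one image.
    class_descriptors: dict mapping class name -> (m, d) descriptor array.
    """
    best_class, best_cost = None, np.inf
    for name, descs in class_descriptors.items():
        # pairwise distances: every query descriptor vs. every class descriptor
        dists = np.linalg.norm(
            query_descriptors[:, None, :] - descs[None, :, :], axis=2)
        cost = np.sum(np.min(dists, axis=1) ** 2)
        if cost < best_cost:
            best_class, best_cost = name, cost
    return best_class

# Illustrative two-class toy example (not manuscript data):
classes = {
    "seal": np.array([[0.0, 0.0], [0.2, 0.1]]),
    "word": np.array([[5.0, 5.0], [5.1, 4.9]]),
}
query = np.array([[4.8, 5.2], [5.0, 5.0]])
print(nbnn_classify(query, classes))  # word
```

No training phase is needed, which is exactly why NBNN suits the annotation-scarce humanities setting described above.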


2007 ◽  
Vol 6 (s2) ◽  
pp. S427-S444 ◽  
Author(s):  
Sergiu Dascalu ◽  
Sermsak Buntha ◽  
Daniela Saru ◽  
Narayan Debnath

Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1565
Author(s):  
Junwen Liu ◽  
Yongjun Zhang ◽  
Jianbin Xie ◽  
Yan Wei ◽  
Zewei Wang ◽  
...  

Pedestrian detection in complex scenes suffers from occlusion issues, such as occlusions between pedestrians. Compared with the highly variable human body, the shape of the head and shoulders changes little and is highly stable, which makes head detection an important research area within pedestrian detection. The translation invariance of convolutional neural networks means that a target can still be recognized effectively even when its appearance and location change. However, the problems of scale invariance and high miss rates for small targets remain. In this paper, a feature extraction network, DR-Net, based on Darknet-53 is proposed to improve the information transmission rate between convolutional layers and to extract richer semantic information. In addition, an MDC (mixed dilated convolution) module combining dilated convolutions with different sampling rates is embedded to improve the detection rate of small targets. We evaluated our method on three publicly available datasets and achieved excellent results: the AP (Average Precision) reached 92.1% on the Brainwash dataset, 84.8% on the HollywoodHeads dataset, and 90% on the SCUT-HEAD dataset.
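The idea behind mixing dilated convolutions with different sampling rates can be sketched with a plain NumPy implementation: each branch applies the same-size kernel over an enlarged receptive field, and the branches are combined. This is a single-channel, correlation-style (no kernel flip, as in deep-learning frameworks) sketch of the concept, not the paper's MDC module:

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """'Same'-padded single-channel 2-D convolution with a dilated kernel.

    A k x k kernel with dilation `rate` covers an effective receptive
    field of rate*(k-1)+1 pixels while keeping k*k weights.
    """
    k = kernel.shape[0]
    eff = rate * (k - 1) + 1   # effective receptive field size
    pad = eff // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # sample the input with stride `rate` inside the receptive field
            patch = xp[i:i + eff:rate, j:j + eff:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def mixed_dilated_conv(x, kernels, rates):
    """MDC-style idea: parallel dilated convolutions, summed together,
    so small and large contexts contribute at once."""
    return sum(dilated_conv2d(x, k, r) for k, r in zip(kernels, rates))
```

In a real network these branches would be learned `Conv2d` layers with different `dilation` values; the sketch only shows how the sampling rate widens context without adding weights.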


Author(s):  
Tong Wu ◽  
Nikolas Martelaro ◽  
Simon Stent ◽  
Jorge Ortiz ◽  
Wendy Ju

This paper examines sensor fusion techniques for modeling opportunities for proactive speech-based in-car interfaces. We leverage the Is Now a Good Time (INAGT) dataset, which consists of automotive, physiological, and visual data collected from drivers who self-annotated responses to the question "Is now a good time?," indicating the opportunity to receive non-driving information during a 50-minute drive. We augment this original driver-annotated data with third-party annotations of perceived safety, in order to explore potential driver overconfidence. We show that fusing automotive, physiological, and visual data allows us to predict driver labels of availability, achieving a 0.874 F1-score by extracting statistically relevant features and training with our proposed deep neural network, PazNet. Using the same data and network, we achieve a 0.891 F1-score for predicting third-party labeled safe moments. We train these models to avoid false positives (determinations that it is a good time to interrupt when it is not), since false positives may cause driver distraction or service deactivation by the driver. Our analyses show that conservative models still leave many moments for interaction, and that most inopportune moments are short. This work lays a foundation for using sensor fusion models to predict when proactive speech systems should engage with drivers.
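The conservative operating point described above, trading missed opportunities for few false positives, can be sketched as threshold selection on held-out predictions. This is an illustrative sketch of the general idea, not the paper's PazNet or its training procedure; the function name and precision target are assumptions:

```python
import numpy as np

def pick_conservative_threshold(y_true, y_prob, min_precision=0.95):
    """Return the lowest decision threshold whose precision on held-out
    data meets min_precision, i.e. the most permissive operating point
    that still keeps false positives rare.

    Returns 1.0 if no candidate threshold satisfies the constraint.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    best = 1.0
    for t in np.unique(y_prob):
        pred = y_prob >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        if tp + fp > 0 and tp / (tp + fp) >= min_precision:
            best = min(best, t)
    return best
```

A system tuned this way interrupts the driver less often, but when it does, it is rarely wrong, which matches the paper's stated design goal.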


2018 ◽  
Vol 10 (8) ◽  
pp. 1277 ◽  
Author(s):  
Mikhail Urbazaev ◽  
Felix Cremer ◽  
Mirco Migliavacca ◽  
Markus Reichstein ◽  
Christiane Schmullius ◽  
...  

Information on the spatial distribution of forest structure parameters (e.g., aboveground biomass, vegetation height) is crucial for assessing terrestrial carbon stocks and emissions. In this study, we sought to assess the potential and merit of multi-temporal dual-polarised L-band observations for vegetation height estimation in tropical deciduous and evergreen forests of Mexico. We estimated vegetation height using dual-polarised L-band observations and a machine learning approach. We used airborne LiDAR-based vegetation height for model training and for result validation. We split LiDAR-based vegetation height into training and test data using two different approaches, i.e., considering and ignoring spatial autocorrelation between training and test data. Our results indicate that ignoring spatial autocorrelation leads to an overly optimistic assessment of the model's predictive performance. Accordingly, spatial splitting of the reference data should be preferred in order to provide realistic retrieval accuracies. Moreover, the model's predictive performance increases with the number of spatial predictors and training samples, but saturates at a specific level (i.e., at 12 dual-polarised L-band backscatter measurements and at around 20% of all training samples). In consideration of spatial autocorrelation between training and test data, we determined an optimal number of L-band observations and training samples as a trade-off between retrieval accuracy and data collection effort. In summary, our study demonstrates the merit of multi-temporal ScanSAR L-band observations for estimation of vegetation height at a larger scale and provides a workflow for robust predictions of this parameter.
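The spatial split the authors recommend can be sketched as block-wise partitioning: whole spatial blocks are assigned to either the training or the test set, so nearby (autocorrelated) pixels never end up on both sides. The function name and block logic below are illustrative, not the study's exact procedure:

```python
import numpy as np

def spatial_block_split(coords, block_size, test_fraction=0.2, seed=0):
    """Split samples into train/test masks by whole spatial blocks.

    coords: (n, 2) array of x/y sample coordinates.
    Entire blocks go to the test set, so training and test samples
    never share a block; a purely random split would intermix them.
    """
    blocks = (coords // block_size).astype(int)
    # one integer id per block
    block_ids = blocks[:, 0] * (blocks[:, 1].max() + 1) + blocks[:, 1]
    unique = np.unique(block_ids)
    rng = np.random.default_rng(seed)
    n_test = max(1, int(len(unique) * test_fraction))
    test_blocks = rng.choice(unique, size=n_test, replace=False)
    test_mask = np.isin(block_ids, test_blocks)
    return ~test_mask, test_mask
```

Accuracy measured on such a held-out set is lower but more honest than on a random split, which is exactly the effect the abstract reports.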


2014 ◽  
Vol 701-702 ◽  
pp. 433-436
Author(s):  
Pei Pei Duan ◽  
Hui Li ◽  
Qi Li

High range resolution profile (HRRP) samples are numerous and sparse, yet few radar target recognition algorithms based on HRRPs exploit this sparseness. A new radar target recognition algorithm using a fast sparse decomposition method is presented here. The algorithm proceeds in three major steps. First, the Gabor redundant dictionary is partitioned according to its atom characteristics to reduce atom storage. Then, the matching pursuit algorithm is accelerated with a genetic algorithm and fast cross-correlation calculation, speeding up the decomposition of training samples and generating the taxonomic dictionaries. Finally, the reconstruction errors of the testing samples are used to recognize different radar targets. Simulations show that the method is robust to noise and achieves a high recognition rate.
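The baseline that the second step accelerates is plain matching pursuit: greedily pick the dictionary atom most correlated with the residual, subtract its contribution, and repeat. A minimal sketch, assuming unit-norm atoms as dictionary columns (the genetic-algorithm and fast cross-correlation speed-ups are not shown):

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter=10):
    """Greedy sparse decomposition of `signal` over `dictionary`.

    dictionary: (d, m) array whose columns are unit-norm atoms.
    Returns the sparse coefficient vector and the final residual.
    """
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        corr = dictionary.T @ residual          # correlation with every atom
        k = np.argmax(np.abs(corr))             # best-matching atom
        coeffs[k] += corr[k]                    # accumulate its coefficient
        residual -= corr[k] * dictionary[:, k]  # remove its contribution
    return coeffs, residual
```

Recognition then compares the reconstruction error `||signal - dictionary @ coeffs||` under each class-specific (taxonomic) dictionary and picks the class with the smallest error.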


2013 ◽  
Vol 631-632 ◽  
pp. 1303-1308
Author(s):  
He Jin Yuan

A novel human action recognition algorithm based on key postures is proposed in this paper. In the method, the mesh features of each image in a human action sequence are first calculated; the key postures are then generated from these mesh features through the k-medoids clustering algorithm, and each motion sequence is represented as a vector of key postures, where each component is the number of occurrences of the corresponding posture in the action. For recognition, an observed action is first converted into a key-posture vector; the correlation coefficients with the training samples are then calculated, and the action that best matches the observed sequence is chosen as the final category. Experiments on the Weizmann dataset demonstrate that our method is effective for human action recognition, with an average recognition accuracy exceeding 90%.
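The representation and matching steps can be sketched as follows: each frame is assigned to its nearest key posture, the occurrence counts form the sequence's vector, and classification picks the training action with the highest correlation coefficient. The feature values below are synthetic and the function names illustrative:

```python
import numpy as np

def posture_histogram(frames, key_postures):
    """Map each frame's mesh feature to its nearest key posture and
    count occurrences, giving the sequence's key-posture vector."""
    dists = np.linalg.norm(
        frames[:, None, :] - key_postures[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    return np.bincount(labels, minlength=len(key_postures))

def classify_by_correlation(query_vec, training_vecs, labels):
    """Choose the training action whose key-posture vector has the
    highest correlation coefficient with the query vector."""
    corrs = [np.corrcoef(query_vec, v)[0, 1] for v in training_vecs]
    return labels[int(np.argmax(corrs))]
```

In the paper the key postures themselves come from k-medoids clustering of the training frames; here they are simply given as inputs.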

