Towards Single 2D Image-Level Self-Supervision for 3D Human Pose and Shape Estimation

2021 ◽  
Vol 11 (20) ◽  
pp. 9724
Author(s):  
Junuk Cha ◽  
Muhammad Saqlain ◽  
Changhwa Lee ◽  
Seongyeong Lee ◽  
Seungeun Lee ◽  
...  

Three-dimensional human pose and shape estimation is an important problem in the computer vision community, with numerous applications such as augmented reality, virtual reality, and human-computer interaction. However, training accurate 3D human pose and shape estimators based on deep learning approaches requires a large number of images paired with 3D ground-truth poses, which are costly to collect. To relieve this constraint, various weakly or self-supervised pose estimation approaches have been proposed. Nevertheless, these methods still rely on supervision signals that require effort to collect, such as unpaired large-scale 3D ground-truth data, a small subset of 3D-labeled data, or video priors, and they often require equipment such as a calibrated multi-camera system to acquire strong multi-view priors. In this paper, we propose a self-supervised learning framework for 3D human pose and shape estimation that does not require any other form of supervision signal and uses only single 2D images. Our framework takes single 2D images as input, estimates human 3D meshes in its intermediate layers, and is trained to solve four types of self-supervision tasks (i.e., three image manipulation tasks and one neural rendering task) whose ground truths are all derived from the single 2D images themselves. Through experiments, we demonstrate the effectiveness of our approach on 3D human pose benchmark datasets (i.e., Human3.6M, 3DPW, and LSP), where we achieve a new state of the art among weakly/self-supervised methods.
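
For illustration only, here is a minimal PyTorch sketch of the general idea, assuming a toy backbone, a rotation-prediction image-manipulation task, and a thresholded-silhouette stand-in for the rendering task; none of these are the paper's actual tasks, architecture, or renderer, only a hypothetical example of supervision targets derived from the input image itself.

```python
# Hypothetical sketch: every supervision target below comes from the input 2D image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySelfSupPose(nn.Module):
    def __init__(self, feat=64):
        super().__init__()
        self.backbone = nn.Sequential(              # image -> feature (stand-in for a mesh code)
            nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat))
        self.rot_head = nn.Linear(feat, 4)          # image-manipulation pretext head
        self.sil_head = nn.Linear(feat, 32 * 32)    # rendering-style head

    def forward(self, img):
        return self.backbone(img)

def self_supervised_step(model, img):
    # Pretext 1: predict by which multiple of 90 degrees the input was rotated.
    k = torch.randint(0, 4, (img.size(0),))
    rotated = torch.stack([torch.rot90(im, int(r), dims=(1, 2)) for im, r in zip(img, k)])
    rot_loss = F.cross_entropy(model.rot_head(model(rotated)), k)

    # Pretext 2: a decoded "silhouette" should match a proxy computed from the image.
    target = F.interpolate((img.mean(1, keepdim=True) > 0.5).float(), size=(32, 32)).flatten(1)
    sil_loss = F.binary_cross_entropy_with_logits(model.sil_head(model(img)), target)
    return rot_loss + sil_loss

model = ToySelfSupPose()
loss = self_supervised_step(model, torch.rand(2, 3, 64, 64))
loss.backward()
```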

Robotics ◽  
2018 ◽  
Vol 7 (4) ◽  
pp. 69 ◽  
Author(s):  
Evgeny Nuger ◽  
Beno Benhabib

A novel methodology is proposed herein to estimate the three-dimensional (3D) surface shape of unknown, markerless deforming objects through a modular multi-camera vision system. The methodology is a generalized, formal approach to shape estimation for a priori unknown objects. Accurate shape estimation is accomplished through a robust, adaptive particle filtering process. The estimation process yields a set of surface meshes representing the expected deformation of the target object. The methodology is based on a multi-camera system with a variable number of cameras and a range of object motions. The numerous simulations and experiments presented herein demonstrate the proposed methodology's ability to accurately estimate the surface deformation of unknown objects, as well as its robustness to object loss under self-occlusion and to varying motion dynamics.
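
As a rough orientation, the following NumPy sketch shows a generic bootstrap particle-filter update (predict, weight, resample); the authors' adaptive, mesh-valued formulation is considerably more elaborate, and all state dimensions and noise parameters below are illustrative assumptions.

```python
import numpy as np

def particle_filter_step(particles, weights, observation, motion_std=0.05, obs_std=0.1, rng=None):
    """particles: (N, D) state hypotheses (e.g. a flattened stand-in for mesh-vertex offsets)."""
    rng = rng or np.random.default_rng(0)

    # Predict: propagate each hypothesis with a simple random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)

    # Weight: score each hypothesis against the observation with a Gaussian likelihood.
    err = np.linalg.norm(particles - observation, axis=1)
    weights = weights * np.exp(-0.5 * (err / obs_std) ** 2)
    weights /= weights.sum() + 1e-12

    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles, weights = particles[idx], np.full(len(particles), 1.0 / len(particles))

    estimate = np.average(particles, axis=0, weights=weights)  # expected state
    return particles, weights, estimate

# Toy usage: track a 3D point acting as a stand-in for a single mesh vertex.
rng = np.random.default_rng(1)
particles = rng.normal(0, 1, (500, 3))
weights = np.full(500, 1 / 500)
particles, weights, est = particle_filter_step(particles, weights, np.array([0.2, -0.1, 0.4]), rng=rng)
```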


Author(s):  
Suppawong Tuarob ◽  
Conrad S. Tucker

The acquisition and mining of product feature data from online sources such as customer review websites and large-scale social media networks is an emerging area of research. In many existing design methodologies that acquire product feature preferences from online sources, the underlying assumption is that the product features expressed by customers are explicitly stated and readily observable, so that they can be mined using product feature extraction tools. In many scenarios, however, the product feature preferences expressed by customers are implicit in nature and do not directly map to engineering design targets. For example, a customer may implicitly state “wow I have to squint to read this on the screen”, when the explicit product feature may be a larger screen. The authors of this work propose an inference model that automatically assigns the most probable explicit product feature desired by a customer, given an implicitly expressed preference. The algorithm iteratively refines its inference model by presenting a hypothesis and determining its statistical validity against ground truth data. A case study involving smartphone product features expressed through Twitter networks is presented to demonstrate the effectiveness of the proposed methodology.
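
A toy illustration of the inference step, not the authors' model: a smoothed naive-Bayes-style scorer that assigns the most probable explicit feature to an implicit statement, trained on a handful of invented (implicit, explicit) pairs.

```python
import math
from collections import Counter, defaultdict

training = [  # (implicit statement, explicit feature) -- hypothetical data
    ("i have to squint to read this on the screen", "larger screen"),
    ("text is tiny and hard to read", "larger screen"),
    ("battery dies before lunch", "longer battery life"),
    ("always hunting for a charger", "longer battery life"),
]

word_counts = defaultdict(Counter)
feature_counts = Counter()
for text, feature in training:
    feature_counts[feature] += 1
    word_counts[feature].update(text.split())

def most_probable_feature(implicit_text, alpha=1.0):
    """Score each explicit feature with a Laplace-smoothed naive-Bayes likelihood."""
    vocab = {w for counts in word_counts.values() for w in counts}
    best, best_score = None, float("-inf")
    for feature, prior in feature_counts.items():
        total = sum(word_counts[feature].values())
        score = math.log(prior / sum(feature_counts.values()))
        for w in implicit_text.split():
            score += math.log((word_counts[feature][w] + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best, best_score = feature, score
    return best

print(most_probable_feature("wow i have to squint to see the text"))  # -> "larger screen"
```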


Author(s):  
K. Moe ◽  
I. Toschi ◽  
D. Poli ◽  
F. Lago ◽  
C. Schreiner ◽  
...  

This paper discusses the potential of current photogrammetric multi-head oblique cameras, such as the UltraCam Osprey, to improve the efficiency of standard photogrammetric methods for surveying applications like inventory surveys and topographic mapping for public administrations or private customers.

In 2015, Terra Messflug (TM), a subsidiary of Vermessung AVT ZT GmbH (Imst, Austria), flew a number of urban areas in Austria, the Czech Republic, and Hungary with an UltraCam Osprey Prime multi-head camera system from Vexcel Imaging. In collaboration with FBK Trento (Italy), the data acquired at Imst (a small town in Tyrol, Austria) were analysed and processed to extract precise 3D topographic information. The Imst block comprises 780 images and covers an area of approx. 4.5 km by 1.5 km. Ground truth data are provided in the form of 6 GCPs and several check points surveyed with RTK GNSS. In addition, 3D building data obtained by photogrammetric stereo plotting from a 5 cm nadir flight and a LiDAR point cloud with 10 to 20 measurements per m² are available as reference data or for comparison. The photogrammetric workflow, from flight planning to Dense Image Matching (DIM) and 3D building extraction, is described together with the achieved accuracy. For each step, the differences and innovations with respect to standard photogrammetric procedures based on nadir images are shown, including high overlaps, improved vertical accuracy, and visibility of areas masked in the standard vertical views. Finally, the advantages of using oblique images for inventory surveys are demonstrated.
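
As a small illustration of the accuracy check implied above, the sketch below compares (invented) adjusted check-point coordinates against their RTK-GNSS reference values and reports per-axis RMSE; it is not the project's actual evaluation code.

```python
import numpy as np

estimated = np.array([[1001.23, 2005.48, 701.12],    # X, Y, Z from the block adjustment (m), invented
                      [1103.91, 2104.02, 699.87]])
reference = np.array([[1001.20, 2005.51, 701.05],    # RTK-GNSS check-point coordinates (m), invented
                      [1103.95, 2104.00, 699.92]])

rmse = np.sqrt(np.mean((estimated - reference) ** 2, axis=0))
print(f"RMSE X/Y/Z [m]: {rmse[0]:.3f} / {rmse[1]:.3f} / {rmse[2]:.3f}")
```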


10.29007/3lks ◽  
2019 ◽  
Author(s):  
Axel Tanner ◽  
Martin Strohmeier

Anomalies in the airspace can provide an indicator of critical events and changes that go beyond aviation. Devising techniques that can detect abnormal patterns can provide intelligence and information ranging from weather to political events. This work presents our latest findings in detecting such anomalies in air traffic patterns using ADS-B data provided by the OpenSky network [8]. After a discussion of specific problems in anomaly detection in air traffic data, we show an experiment in a regional setting, evaluating air traffic densities with the Gini index, and a second experiment investigating runway use at Zurich airport. In the latter case, strong available ground truth data allows us to better understand and confirm the findings of different learning approaches.
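
For reference, a compact sketch of the density-concentration measure named above: the Gini index of flight counts aggregated over grid cells of a region. The counts are invented and the binning scheme is an assumption.

```python
import numpy as np

def gini(values):
    """Gini index of non-negative values: 0 = perfectly even, close to 1 = highly concentrated."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    cum = np.cumsum(x)
    # Equivalent to the mean-absolute-difference formulation of the Gini index.
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

hourly_cell_counts = [0, 2, 3, 5, 8, 21, 40]   # ADS-B position counts per grid cell (made up)
print(f"Gini index: {gini(hourly_cell_counts):.3f}")
```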


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7893 ◽  
Author(s):  
Simone Macrì ◽  
Romain J.G. Clément ◽  
Chiara Spinello ◽  
Maurizio Porfiri

Zebrafish (Danio rerio) have recently emerged as a valuable laboratory species in the field of behavioral pharmacology, where they afford rapid and precise high-throughput drug screening. Although the behavioral repertoire of this species manifests in three dimensions (3D), most of the efforts in behavioral pharmacology rely on two-dimensional (2D) projections acquired from a single overhead or front camera. We recently showed that, compared to a 3D scoring approach, 2D analyses could lead to inaccurate claims regarding the individual and social behavior of drug-free experimental subjects. Here, we examined whether this conclusion extends to the field of behavioral pharmacology by phenotyping adult zebrafish, acutely exposed to citalopram (30, 50, and 100 mg/L) or ethanol (0.25%, 0.50%, and 1.00%), in the novel tank diving test over a 6-min experimental session. We observed that both compounds modulated the time course of general locomotion and anxiety-related profiles, the latter being represented by specific behaviors (erratic movements and freezing) and avoidance of anxiety-eliciting areas of the test tank (top half and distance from the side walls). We also observed that 2D projections of 3D trajectories (ground truth data) may introduce a source of unwanted variation in zebrafish behavioral phenotyping. Predictably, both 2D views underestimate absolute levels of general locomotion. Additionally, while data obtained from a camera positioned on top of the experimental tank are similar to those obtained from a 3D reconstruction, 2D front-view data yield false negative findings.
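
The following short simulation illustrates the projection effect described above: the total path length of a synthetic 3D trajectory versus its top-view (x, y) and front-view (x, z) projections. It uses a random walk, not fish trajectories.

```python
import numpy as np

rng = np.random.default_rng(42)
traj_3d = np.cumsum(rng.normal(0, 1.0, size=(500, 3)), axis=0)   # synthetic x, y, z positions

def path_length(points):
    """Sum of Euclidean distances between consecutive samples."""
    return float(np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1)))

print("3D path length:          ", round(path_length(traj_3d), 1))
print("Top-view (x, y) length:  ", round(path_length(traj_3d[:, [0, 1]]), 1))
print("Front-view (x, z) length:", round(path_length(traj_3d[:, [0, 2]]), 1))
```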


Author(s):  
Marian Muste ◽  
Ton Hoitink

With a continuous global increase in flood frequency and intensity, there is an immediate need for new science-based solutions for flood mitigation, resilience, and adaptation that can be quickly deployed in any flood-prone area. An integral part of these solutions is the availability of river discharge measurements delivered in real time with high spatiotemporal density and over large-scale areas. Stream stages and the associated discharges are the most perceivable variables of the water cycle and the ones that eventually determine the levels of hazard during floods. Consequently, the availability of discharge records (a.k.a. streamflows) is paramount for flood-risk management because they provide actionable information for organizing the activities before, during, and after floods, and they supply the data for planning and designing floodplain infrastructure. Moreover, the discharge records represent the ground-truth data for developing and continuously improving the accuracy of the hydrologic models used for forecasting streamflows.

Acquiring discharge data for streams is critically important not only for flood forecasting and monitoring but also for many other practical uses, such as monitoring water abstractions for supporting decisions in various socioeconomic activities (from agriculture to industry, transportation, and recreation) and for ensuring healthy ecological flows. All these activities require knowledge of past, current, and future flows in rivers and streams. Given its importance, the ability to measure the flow in channels has preoccupied water users for millennia. Starting with the simplest volumetric methods to estimate flows, the measurement of discharge has evolved through continued innovation to sophisticated methods, so that today we can continuously acquire and communicate the data in real time.

There is no essential difference between the instruments and methods used to acquire streamflow data during normal conditions versus during floods. The measurements during floods are, however, complex, hazardous, and of limited accuracy compared with those acquired during normal flows. The essential differences in the configuration and operation of the instruments and methods for discharge estimation stem from the type of measurements they acquire—that is, discrete and autonomous measurements (i.e., measurements that can be taken at any time and place) and those acquired continuously (i.e., estimates based on indirect methods developed for fixed locations). Regardless of the measurement situation and approach, the main concern of the data providers for flooding (as well as for other areas of water resource management) is the timely delivery of accurate discharge data at flood-prone locations across river basins.
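
As a hedged illustration of an "indirect method developed for a fixed location", the sketch below fits a conventional power-law stage-discharge rating curve, Q = a(h − h0)^b, to a few invented paired measurements; the rating-curve form is common hydrometric practice and is not a method taken from this article.

```python
import numpy as np

stage = np.array([0.8, 1.1, 1.6, 2.3, 3.0])           # measured stage h (m), invented
discharge = np.array([4.2, 9.5, 25.0, 70.0, 140.0])   # measured discharge Q (m^3/s), invented
h0 = 0.5                                               # assumed cease-to-flow stage (m)

# Fit log Q = log a + b * log(h - h0) by least squares.
b, log_a = np.polyfit(np.log(stage - h0), np.log(discharge), 1)
a = np.exp(log_a)

def rated_discharge(h):
    """Convert a continuously recorded stage into an estimated discharge."""
    return a * (h - h0) ** b

print(f"Q(2.0 m) is approximately {rated_discharge(2.0):.1f} m^3/s")
```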


2020 ◽  
Vol 47 (8) ◽  
pp. 982-997
Author(s):  
Mohamed H. Zaki ◽  
Tarek Sayed ◽  
Moataz Billeh

Video-based traffic analysis is a leading technology for streamlining transportation data collection. With traffic records from video cameras, unsupervised automated video analysis can detect various vehicle measures, such as vehicle spatial coordinates and subsequently lane positions, speed, and other dynamic measures, without the need for any physical connection to the road infrastructure. This paper contributes to unsupervised automated video analysis by addressing two main shortcomings of the approach. The first objective is to alleviate the tracking problems of over-segmentation and over-grouping by integrating region-based detection with feature-based tracking. This information, when combined with the spatiotemporal constraints of grouping, can reduce the effects of these problems. The fusion approach offers a superior decision procedure for grouping objects and discriminating between trajectories of objects. The second objective is to model three-dimensional bounding boxes for the vehicles, leading to a better estimate of their geometry and consequently to accurate measures of their position and travel information. This improvement leads to more precise measurement of traffic parameters such as average speed, gap time, and headway. The paper describes the various steps of the proposed improvements. It evaluates the effectiveness of the refinement process on data collected from traffic cameras at three different locations in Canada and validates the results against ground truth data. It illustrates the effectiveness of the improved unsupervised automated video analysis with a case study on 10 h of traffic data, covering measures such as volume and headway.
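
The sketch below illustrates, in simplified form, how traffic measures such as speed and time headway can be derived from per-vehicle trajectories; the trajectories, the reference line, and the units are invented for illustration and do not reproduce the paper's pipeline.

```python
import numpy as np

# Each trajectory: (t [s], x [m]) samples for one vehicle travelling along one lane (invented).
veh_a = np.array([[0.0, 0.0], [1.0, 14.0], [2.0, 28.0], [3.0, 42.0]])
veh_b = np.array([[2.5, 0.0], [3.5, 13.0], [4.5, 26.0], [5.5, 39.0]])

def mean_speed(traj):
    """Average speed from consecutive position differences (m/s)."""
    dt = np.diff(traj[:, 0])
    dx = np.diff(traj[:, 1])
    return float(np.mean(dx / dt))

def crossing_time(traj, x_ref=20.0):
    """Time the vehicle crosses a fixed reference line, by linear interpolation (s)."""
    return float(np.interp(x_ref, traj[:, 1], traj[:, 0]))

headway = crossing_time(veh_b) - crossing_time(veh_a)    # front-to-front time headway
print(f"speeds: {mean_speed(veh_a):.1f}, {mean_speed(veh_b):.1f} m/s; headway: {headway:.2f} s")
```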


2020 ◽  
Vol 12 (7) ◽  
pp. 1099 ◽  
Author(s):  
Ahram Song ◽  
Yongil Kim

Change detection (CD) networks based on supervised learning have been used in diverse CD tasks. However, such supervised CD networks require a large amount of data and only use information from current images. In addition, it is time consuming to manually acquire ground truth data for newly obtained images. Here, we propose a novel method for CD when training data are lacking in an area adjacent to another area with available ground truth data. The proposed method automatically generates training data and fine-tunes the CD network. To detect changes in target images without ground truth data, difference images were generated using a spectral similarity measure, and the training data were selected via fuzzy c-means clustering. Recurrent fully convolutional networks with multiscale three-dimensional filters were used to extract objects of various sizes from unmanned aerial vehicle (UAV) images. The CD network was pre-trained on labeled source-domain data; then, the network was fine-tuned on target images using the generated training data. Two further CD networks were trained with a combined weighted loss function. The training data in the target domain were iteratively updated using the prediction map of the CD network. Experiments on two hyperspectral UAV datasets confirmed that the proposed method is capable of transferring change rules and improving CD results based on training data extracted in an unsupervised way.
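
A simplified sketch of the difference-image step described above, using the spectral angle as the similarity measure on synthetic cubes; a plain percentile threshold stands in for the paper's fuzzy c-means selection of training pixels.

```python
import numpy as np

def spectral_angle_map(img_t1, img_t2, eps=1e-8):
    """img_t1, img_t2: (H, W, B) reflectance cubes; returns the per-pixel spectral angle in radians."""
    dot = np.sum(img_t1 * img_t2, axis=-1)
    norms = np.linalg.norm(img_t1, axis=-1) * np.linalg.norm(img_t2, axis=-1)
    return np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))

rng = np.random.default_rng(0)
t1 = rng.uniform(0.1, 0.9, (64, 64, 50))          # synthetic "before" hyperspectral cube
t2 = t1.copy()
t2[20:40, 20:40] += 0.3                            # simulate a changed patch in the "after" cube

angle = spectral_angle_map(t1, t2)
pseudo_change = angle > np.percentile(angle, 95)   # crude stand-in for clustering-based selection
print("flagged change pixels:", int(pseudo_change.sum()))
```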


2020 ◽  
Vol 34 (07) ◽  
pp. 10631-10638
Author(s):  
Yu Cheng ◽  
Bo Yang ◽  
Bo Wang ◽  
Robby T. Tan

Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress made in recent years. Generally, the performance of existing methods drops when the target person is too small or too large, or when the motion is too fast or too slow relative to the scale and speed of the training data. Moreover, to our knowledge, many of these methods are not explicitly designed or trained to handle severe occlusion, which compromises their performance when occlusion occurs. Addressing these problems, we introduce a spatio-temporal network for robust 3D human pose estimation. As humans in videos may appear at different scales and move at various speeds, we apply multi-scale spatial features for 2D joint or keypoint prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints. Furthermore, we design a spatio-temporal discriminator based on body structures as well as limb motions to assess whether the predicted pose forms a valid pose and a valid movement. During training, we explicitly mask out some keypoints to simulate various occlusion cases, from minor to severe, so that our network can learn better and become robust to various degrees of occlusion. As 3D ground truth data are limited, we further utilize 2D video data to inject a semi-supervised learning capability into our network. Experiments on public datasets validate the effectiveness of our method, and our ablation studies show the strengths of our network's individual submodules.
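
A minimal sketch of the occlusion augmentation described above: randomly masking out a subset of 2D keypoints before they enter the temporal network. The keypoint layout, clip length, and masking rate are assumptions, not the authors' settings.

```python
import torch

def mask_keypoints(keypoints, max_masked=5):
    """keypoints: (T, J, 3) tensor of (x, y, confidence) per frame and joint."""
    out = keypoints.clone()
    T, J, _ = out.shape
    for t in range(T):
        n = int(torch.randint(0, max_masked + 1, (1,)))   # mask 0..max_masked joints in this frame
        joints = torch.randperm(J)[:n]
        out[t, joints, :] = 0.0                            # zeroed coordinates/confidence mimic occlusion
    return out

seq = torch.rand(243, 17, 3)        # e.g. a 243-frame clip with 17 COCO-style joints (assumed layout)
masked = mask_keypoints(seq)
print(float((masked[..., 2] == 0).float().mean()), "fraction of joints masked")
```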


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ranjit Mahato ◽  
Gibji Nimasow ◽  
Oyi Dai Nimasow ◽  
Dhoni Bushi

The Sonitpur and Udalguri districts of Assam possess rich tropical forests with equally important faunal species. The Nameri National Park, the Sonai-Rupai Wildlife Sanctuary, and other Reserved Forests are areas of attraction for tourists and wildlife lovers. However, these protected areas are reportedly facing the problems of encroachment and large-scale deforestation. Therefore, this study attempts to estimate forest cover change in the area by integrating remotely sensed data from 1990, 2000, 2010, and 2020 with a Geographic Information System. The Maximum Likelihood algorithm-based supervised classification shows acceptable agreement between the classified images and the ground truth data, with an overall accuracy of about 96% and a Kappa coefficient of 0.95. The results reveal a forest cover loss of 7.47% from 1990 to 2000 and 7.11% from 2000 to 2010. However, there was a slight gain of 2.34% in forest cover from 2010 to 2020. The net change from forest to non-forest was 195.17 km2 over the last forty years. The forest transition map shows a declining trend in forest remaining forest until 2010 and a slight increase after that. There was a considerable decline in forest-to-non-forest conversion (from 11.94% to 3.50%) between the 2000–2010 and 2010–2020 periods. Further, a perceptible gain was also observed in non-forest-to-forest conversion during the last four decades. The overlay analysis of the forest cover maps shows an area of 460.76 km2 (28.89%) as forest (unchanged), 764.21 km2 (47.91%) as non-forest (unchanged), 282.67 km2 (17.72%) as deforestation, and 87.50 km2 (5.48%) as afforestation. The study found hotspots of deforestation in the areas closest to the National Park, the Wildlife Sanctuary, and the Reserved Forests, due to encroachment for human habitation, agriculture, and timber/fuelwood extraction. Therefore, the study suggests the early declaration of these protected areas as an Eco-Sensitive Zone to control the increasing trend of deforestation.
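
For reference, the sketch below shows how the accuracy figures quoted above (overall accuracy and Kappa coefficient) are computed from a classification confusion matrix; the matrix itself is invented, not the study's.

```python
import numpy as np

# Rows: classified as (forest, non-forest); columns: ground truth (invented counts).
cm = np.array([[480,  12],
               [ 15, 493]], dtype=float)

total = cm.sum()
overall_accuracy = np.trace(cm) / total
expected = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total**2   # chance agreement
kappa = (overall_accuracy - expected) / (1 - expected)

print(f"overall accuracy = {overall_accuracy:.3f}, kappa = {kappa:.3f}")
```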

