I2DNet - Design and real-time evaluation of an appearance-based gaze estimation system

2021 ◽  
Vol 14 (4) ◽  
Author(s):  
L R D Murthy ◽  
Siddhi Brahmbhatt ◽  
Somnath Arjun ◽  
Pradipta Biswas

The gaze estimation problem can be addressed using either model-based or appearance-based approaches. Model-based approaches rely on features extracted from eye images to fit a 3D eyeball model and obtain a gaze point estimate, while appearance-based methods attempt to map captured eye images directly to a gaze point without any handcrafted features. Recently, the availability of large datasets and novel deep learning techniques has allowed appearance-based methods to achieve higher accuracy than model-based approaches. However, many appearance-based gaze estimation systems perform well in within-dataset validation but fail to provide the same degree of accuracy in cross-dataset evaluation. Hence, it is still unclear how well the current state-of-the-art approaches perform in real time in an interactive setting on unseen users. This paper proposes I2DNet, a novel architecture aimed at improving subject-independent gaze estimation accuracy, which achieved state-of-the-art mean angle errors of 4.3 and 8.4 degrees on the MPIIGaze and RT-Gene datasets, respectively. We evaluated the proposed system as a gaze-controlled interface in real time on a 9-block pointing and selection task and compared it with Webgazer.js and OpenFace 2.0. In a user study with 16 participants, the proposed system reduced selection time and the number of missed selections by a statistically significant margin compared to the other two systems.
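
The mean angle error quoted above is commonly computed as the average angle between predicted and ground-truth 3D gaze direction vectors. The following is a minimal sketch of that metric under this assumption; the function names are illustrative and are not taken from the I2DNet paper or its code.

```python
import math


def angular_error_deg(pred, true):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, true))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in true))
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of range.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds, trues):
    """Average angular error over paired predictions and ground truths."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, trues)]
    return sum(errors) / len(errors)
```

The vectors do not need to be unit length, because the function normalizes them before taking the arccosine.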

2018 ◽  
Vol 9 (1) ◽  
pp. 6-18 ◽  
Author(s):  
Dario Cazzato ◽  
Fabio Dominio ◽  
Roberto Manduchi ◽  
Silvia M. Castro

Automatic gaze estimation that does not rely on expensive commercial eye-tracking hardware can enable several applications in the fields of human-computer interaction (HCI) and human behavior analysis. It is therefore not surprising that several related techniques and methods have been investigated in recent years. However, very few camera-based systems proposed in the literature are both real-time and robust. In this work, we propose a real-time gaze estimation system that does not need person-dependent calibration, can deal with illumination changes and head pose variations, and can work over a wide range of distances from the camera. Our solution is based on a 3-D appearance-based method that processes the images from a built-in laptop camera. Real-time performance is obtained by combining head pose information with geometrical eye features to train a machine learning algorithm. Our method has been validated on a data set of images of users in natural environments and shows promising results. The possibility of a real-time implementation, combined with the good quality of gaze tracking, makes this system suitable for various HCI applications.


Author(s):  
William Prescott

This paper investigates the use of large-scale multibody dynamics (MBD) models for real-time vehicle simulation. The current state of the art in real-time vehicle simulation uses 15-degree-of-freedom models, but there is a need for higher-fidelity systems. To increase the fidelity of the models used, this paper proposes the following techniques: implicit integration, parallel processing, and co-simulation in a real-time environment.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Juan J. Lastra-Díaz ◽  
Alicia Lara-Clares ◽  
Ana Garcia-Serrano

Background: Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are extensively used in many applications in biomedical text mining and genomics, which has encouraged the development of semantic measures libraries based on these ontologies. However, current state-of-the-art semantic measures libraries have performance and scalability drawbacks that derive from ontology representations based on relational databases or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure, which integrates a shortest-path computation, sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for real-time computation prevents both the practical use of this measure in any application and the use of any other path-based semantic similarity measure. Results: To bridge these two gaps, this work introduces for the first time an updated version of the HESML Java software library designed especially for the biomedical domain. It implements the most efficient and scalable ontology representation reported in the literature, together with a new method for approximating Dijkstra's algorithm on taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. Conclusions: We introduce a set of reproducible benchmarks showing that HESML outperforms the current state-of-the-art libraries by several orders of magnitude on the three aforementioned biomedical ontologies, and we report the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL scales linearly with the dimension of the common ancestor subgraph regardless of the ontology size. Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
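
The idea of restricting a shortest-path search to the ancestors of the two concepts can be illustrated with a small sketch. This is not the HESML implementation of AncSPL: it assumes unweighted edges (so a breadth-first search stands in for Dijkstra's algorithm), it takes the union of the two ancestor sets as the search subgraph, and the `parents` mapping and function names are hypothetical.

```python
from collections import deque


def ancestors(node, parents):
    """Return the node together with all of its ancestors in the taxonomy."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ancestor_path_length(a, b, parents):
    """Shortest path length between a and b inside their ancestor subgraph.

    Returns None when the two concepts share no ancestor.
    """
    allowed = ancestors(a, parents) | ancestors(b, parents)
    # Undirected adjacency restricted to the allowed nodes. The set is closed
    # under the parent relation, so every parent found here is also allowed.
    adjacency = {node: set() for node in allowed}
    for child in allowed:
        for parent in parents.get(child, ()):
            adjacency[child].add(parent)
            adjacency[parent].add(child)
    queue = deque([(a, 0)])
    visited = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adjacency[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

The search cost depends on the size of the ancestor subgraph and not on the whole ontology, which is consistent with the scaling behavior described above. The result is an approximation in general, because a true shortest path in the full graph could pass through nodes outside the ancestor sets.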


2019 ◽  
Vol 11 (9) ◽  
pp. 1128 ◽  
Author(s):  
Maryam Rahnemoonfar ◽  
Dugan Dobbs ◽  
Masoud Yari ◽  
Michael J. Starek

Recent deep-learning counting techniques revolve around two distinct features of data—sparse data, which favors detection networks, or dense data where density map networks are used. Both techniques fail to address a third scenario, where dense objects are sparsely located. Raw aerial images represent sparse distributions of data in most situations. To address this issue, we propose a novel and exceedingly portable end-to-end model, DisCountNet, and an example dataset to test it on. DisCountNet is a two-stage network that uses theories from both detection and heat-map networks to provide a simple yet powerful design. The first stage, DiscNet, operates on the theory of coarse detection, but does so by converting a rich and high-resolution image into a sparse representation where only important information is encoded. Following this, CountNet operates on the dense regions of the sparse matrix to generate a density map, which provides fine locations and count predictions on densities of objects. Comparing the proposed network to current state-of-the-art networks, we find that we can maintain competitive performance while using a fraction of the computational complexity, resulting in a real-time solution.
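
The two-stage flow described above can be sketched as follows: a coarse stage discards regions judged empty, and a dense stage produces a density map only for the remaining regions, whose sum gives the count. In DisCountNet both stages are learned networks; here they are replaced by hypothetical callables (`coarse_score`, `density_map`), and the tiling and threshold are assumptions made for illustration.

```python
def count_objects(tiles, coarse_score, density_map, threshold=0.5):
    """Two-stage count over image tiles.

    tiles:        list of 2D lists (one per image region)
    coarse_score: callable returning a score for a tile (stage one stand-in)
    density_map:  callable returning a 2D density map for a tile (stage two stand-in)
    Returns the estimated count and the indices of tiles sent to stage two.
    """
    total = 0.0
    kept = []
    for index, tile in enumerate(tiles):
        if coarse_score(tile) < threshold:
            continue  # stage one: tile judged empty, skip the dense stage
        kept.append(index)
        # Stage two: the integral (sum) of a density map is the object count.
        total += sum(sum(row) for row in density_map(tile))
    return total, kept
```

Because the dense stage only runs on the tiles that survive the coarse stage, the cost of the second stage grows with the number of occupied regions and not with the full image size.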


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Junfeng Yang ◽  
Yuwen Huang ◽  
Fuxian Huang ◽  
Gongping Yang

Photoplethysmography (PPG) biometric recognition has recently received considerable attention, and PPG is considered a promising biometric trait. Although some promising results on PPG biometric recognition have been reported, challenges in noise sensitivity and poor robustness remain. To address these issues, this article presents a PPG biometric recognition framework based on a sparse softmax vector and k-nearest neighbor. First, raw PPG data are re-represented by sliding window scanning. Second, three-layer features are extracted, and the features of each layer are represented by a sparse softmax vector. In the first layer, the features are extracted from the PPG data as a whole. In the second layer, the PPG data are divided into four subregions, four subfeatures are generated by extracting features from these subregions, and the four subfeatures are averaged to form the second-layer features. In the third layer, the PPG data are divided into 16 subregions, 16 subfeatures are generated in the same way, and the 16 subfeatures are averaged to form the third-layer features. Finally, the features of the first, second, and third layers are combined into three-layer features. Extensive experiments were conducted on three PPG datasets, on which the proposed method achieved recognition rates of 99.95%, 97.21%, and 99.92%, respectively. The results demonstrate that the proposed method can outperform current state-of-the-art methods in terms of accuracy.
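
The three-layer pooling scheme (whole signal, four subregions, 16 subregions, with subfeatures averaged per layer and the layers concatenated) can be sketched as below. The abstract does not specify the sparse softmax vector, so a hypothetical descriptor (mean and standard deviation) stands in for it, and the classifier is reduced to a 1-nearest-neighbor lookup. The sketch assumes each signal has at least 16 samples.

```python
import math


def segment_features(segment):
    """Hypothetical per-segment descriptor: mean and standard deviation."""
    mean = sum(segment) / len(segment)
    var = sum((x - mean) ** 2 for x in segment) / len(segment)
    return [mean, math.sqrt(var)]


def split(signal, parts):
    """Split a signal into `parts` contiguous, near-equal subregions."""
    size = len(signal) / parts
    return [signal[int(i * size): int((i + 1) * size)] for i in range(parts)]


def averaged_features(signal, parts):
    """Extract features per subregion, then average them element-wise."""
    feats = [segment_features(s) for s in split(signal, parts)]
    return [sum(col) / len(feats) for col in zip(*feats)]


def three_layer_features(signal):
    """Concatenate features from 1, 4, and 16 subregions."""
    return (averaged_features(signal, 1)
            + averaged_features(signal, 4)
            + averaged_features(signal, 16))


def nearest_neighbor_label(query, gallery):
    """1-nearest-neighbor over a gallery of (feature_vector, label) pairs."""
    return min(gallery, key=lambda item: math.dist(query, item[0]))[1]
```

With a two-value descriptor, each layer contributes two values and the combined vector has six; a richer descriptor would change only `segment_features`.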


2021 ◽  
Vol 15 (02) ◽  
pp. 161-187
Author(s):  
Olav A. Nergård Rongved ◽  
Steven A. Hicks ◽  
Vajira Thambawita ◽  
Håkon K. Stensland ◽  
Evi Zouganeli ◽  
...  

Developing systems for the automatic detection of events in video is a task which has gained attention in many areas including sports. More specifically, event detection for soccer videos has been studied widely in the literature. However, there are still a number of shortcomings in the state-of-the-art such as high latency, making it challenging to operate at the live edge. In this paper, we present an algorithm to detect events in soccer videos in real time, using 3D convolutional neural networks. We test our algorithm on three different datasets from SoccerNet, the Swedish Allsvenskan, and the Norwegian Eliteserien. Overall, the results show that we can detect events with high recall, low latency, and accurate time estimation. The trade-off is a slightly lower precision compared to the current state-of-the-art, which has higher latency and performs better when a less accurate time estimation can be accepted. In addition to the presented algorithm, we perform an extensive ablation study on how the different parts of the training pipeline affect the final results.


2021 ◽  
Vol 178 (1-2) ◽  
pp. 31-57
Author(s):  
Franck Cassez ◽  
Peter Gjøl Jensen ◽  
Kim Guldstrand Larsen

We address the safety verification and synthesis problems for real-time systems. We introduce real-time programs that are made of instructions that can perform assignments to discrete and real-valued variables. They are general enough to capture interesting classes of timed systems such as timed automata, stopwatch automata, time(d) Petri nets and hybrid automata. We propose a semi-algorithm using refinement of trace abstractions to solve both the reachability verification problem and the parameter synthesis problem for real-time programs. All of the algorithms proposed have been implemented and we have conducted a series of experiments, comparing the performance of our new approach to state-of-the-art tools in classical reachability, robustness analysis and parameter synthesis for timed systems. We show that our new method provides solutions to problems which are unsolvable by the current state-of-the-art tools.


2021 ◽  
Vol 13 (19) ◽  
pp. 3836
Author(s):  
Clément Dechesne ◽  
Pierre Lassalle ◽  
Sébastien Lefèvre

In recent years, numerous deep learning techniques have been proposed to tackle the semantic segmentation of aerial and satellite images, increase trust in the leaderboards of the main scientific contests, and represent the current state of the art. Nevertheless, despite their promising results, these state-of-the-art techniques are still unable to provide results with the level of accuracy sought in real applications, i.e., in operational settings. Thus, it is mandatory to qualify these segmentation results and estimate the uncertainty brought about by a deep network. In this work, we address uncertainty estimation in semantic segmentation. To do this, we rely on a Bayesian deep learning method, based on Monte Carlo Dropout, which allows us to derive uncertainty metrics along with the semantic segmentation. Built on the widespread U-Net architecture, our model achieves semantic segmentation with high accuracy on several state-of-the-art datasets. More importantly, uncertainty maps are also derived from our model. While they allow for a sounder qualitative evaluation of the segmentation results, they also include valuable information to improve the reference databases.
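
Monte Carlo Dropout keeps dropout active at inference time, runs several stochastic forward passes, and summarizes their spread as an uncertainty estimate. The sketch below shows this for a single prediction with a toy linear classifier; in a segmentation network the same procedure would apply per pixel. The toy model, the choice of predictive entropy as the uncertainty metric, and all names are assumptions and do not come from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_rate, rng):
    """One forward pass with dropout left active at inference time."""
    keep = 1.0 - drop_rate
    # Inverted dropout: zero each feature with probability drop_rate and
    # rescale the survivors so the expected input is unchanged.
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, drop_rate=0.5, passes=50, seed=0):
    """Average class probabilities over several passes and report entropy."""
    rng = random.Random(seed)
    runs = [stochastic_forward(features, weights, drop_rate, rng)
            for _ in range(passes)]
    mean_probs = [sum(col) / passes for col in zip(*runs)]
    # Predictive entropy of the averaged distribution: higher means less certain.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy
```

Computing the entropy for every pixel of an image would yield an uncertainty map of the kind mentioned above.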


2019 ◽  
Vol 11 (12) ◽  
pp. 1499 ◽  
Author(s):  
David Griffiths ◽  
Jan Boehm

Over the past decade, deep learning has driven progress in 2D image understanding. Despite these advancements, techniques for automatically understanding 3D sensed data, such as point clouds, are comparatively immature. However, with a range of important applications from indoor robotics navigation to national-scale remote sensing, there is a high demand for algorithms that can learn to automatically understand and classify 3D sensed data. In this paper, we review the current state-of-the-art deep learning architectures for processing unstructured Euclidean data. We begin by addressing the background concepts and traditional methodologies. We review the current main approaches, including RGB-D, multi-view, volumetric, and fully end-to-end architecture designs. Datasets for each category are documented and explained. Finally, we give a detailed discussion of the future of deep learning for 3D sensed data, using the literature to justify the areas where future research would be most valuable.

