Zero-Shot Learning for IMU-Based Activity Recognition Using Video Embeddings

Author(s):  
Catherine Tong ◽  
Jinchen Ge ◽  
Nicholas D. Lane

The Activity Recognition Chain generally precludes the challenging scenario of recognizing new activities that were unseen during training, despite this scenario being a practical and common one as users perform diverse activities at test time. A few prior works have adopted zero-shot learning methods for IMU-based activity recognition, which work by relating seen and unseen classes through an auxiliary semantic space. However, these methods usually rely heavily on a hand-crafted attribute space which is costly to define, or a learnt semantic space based on word embedding, which lacks motion-related information crucial for distinguishing IMU features. Instead, we propose a strategy to exploit videos of human activities to construct an informative semantic space. With our approach, knowledge from state-of-the-art video action recognition models is encoded into video embeddings to relate seen and unseen activity classes. Experiments on three public datasets find that our approach outperforms other learnt semantic spaces, with an additional desirable feature of scalability, as recognition performance is seen to scale with the amount of data used. More generally, our results indicate that exploiting information from the video domain for IMU-based tasks is a promising direction, with tangible returns in a zero-shot learning scenario.

Author(s):  
Tatsunori B. Hashimoto ◽  
David Alvarez-Melis ◽  
Tommi S. Jaakkola

Continuous word representations have been remarkably useful across NLP tasks but remain poorly understood. We ground word embeddings in semantic spaces studied in the cognitive-psychometric literature, taking these spaces as the primary objects to recover. To this end, we relate log co-occurrences of words in large corpora to semantic similarity assessments and show that co-occurrences are indeed consistent with an Euclidean semantic space hypothesis. Framing word embedding as metric recovery of a semantic space unifies existing word embedding algorithms, ties them to manifold learning, and demonstrates that existing algorithms are consistent metric recovery methods given co-occurrence counts from random walks. Furthermore, we propose a simple, principled, direct metric recovery algorithm that performs on par with the state-of-the-art word embedding and manifold learning methods. Finally, we complement recent focus on analogies by constructing two new inductive reasoning datasets—series completion and classification—and demonstrate that word embeddings can be used to solve them as well.


2022 ◽  
Vol 12 (2) ◽  
pp. 715
Author(s):  
Luodi Xie ◽  
Huimin Huang ◽  
Qing Du

Knowledge graph (KG) embedding has been widely studied to obtain low-dimensional representations for entities and relations. It serves as the basis for downstream tasks, such as KG completion and relation extraction. Traditional KG embedding techniques usually represent entities/relations as vectors or tensors, mapping them in different semantic spaces and ignoring the uncertainties. The affinities between entities and relations are ambiguous when they are not embedded in the same latent spaces. In this paper, we incorporate a co-embedding model for KG embedding, which learns low-dimensional representations of both entities and relations in the same semantic space. To address the issue of neglecting uncertainty for KG components, we propose a variational auto-encoder that represents KG components as Gaussian distributions. In addition, compared with previous methods, our method has the advantages of high quality and interpretability. Our experimental results on several benchmark datasets demonstrate our model’s superiority over the state-of-the-art baselines.


Psihologija ◽  
2017 ◽  
Vol 50 (4) ◽  
pp. 503-520 ◽  
Author(s):  
Marco Marelli

Distributional semantics has been for long a source of successful models in psycholinguistics, permitting to obtain semantic estimates for a large number of words in an automatic and fast way. However, resources in this respect remain scarce or limitedly accessible for languages different from English. The present paper describes WEISS (Word-Embeddings Italian Semantic Space), a distributional semantic model based on Italian. WEISS includes models of semantic representations that are trained adopting state-of-the-art word-embeddings methods, applying neural networks to induce distributed representations for lexical meanings. The resource is evaluated against two test sets, demonstrating that WEISS obtains a better performance with respect to a baseline encoding word associations. Moreover, an extensive qualitative analysis of the WEISS output provides examples of the model potentialities in capturing several semantic phenomena. Two variants of WEISS are released and made easily accessible via web through the SNAUT graphic interface.


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1685
Author(s):  
Sakorn Mekruksavanich ◽  
Anuchit Jitpattanakul

Sensor-based human activity recognition (S-HAR) has become an important and high-impact topic of research within human-centered computing. In the last decade, successful applications of S-HAR have been presented through fruitful academic research and industrial applications, including for healthcare monitoring, smart home controlling, and daily sport tracking. However, the growing requirements of many current applications for recognizing complex human activities (CHA) have begun to attract the attention of the HAR research field when compared with simple human activities (SHA). S-HAR has shown that deep learning (DL), a type of machine learning based on complicated artificial neural networks, has a significant degree of recognition efficiency. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are two different types of DL methods that have been successfully applied to the S-HAR challenge in recent years. In this paper, we focused on four RNN-based DL models (LSTMs, BiLSTMs, GRUs, and BiGRUs) that performed complex activity recognition tasks. The efficiency of four hybrid DL models that combine convolutional layers with the efficient RNN-based models was also studied. Experimental studies on the UTwente dataset demonstrated that the suggested hybrid RNN-based models achieved a high level of recognition performance along with a variety of performance indicators, including accuracy, F1-score, and confusion matrix. The experimental results show that the hybrid DL model called CNN-BiGRU outperformed the other DL models with a high accuracy of 98.89% when using only complex activity data. Moreover, the CNN-BiGRU model also achieved the highest recognition performance in other scenarios (99.44% by using only simple activity data and 98.78% with a combination of simple and complex activities).


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3243
Author(s):  
Robert Jackermeier ◽  
Bernd Ludwig

In smartphone-based pedestrian navigation systems, detailed knowledge about user activity and device placement is a key information. Landmarks such as staircases or elevators can help the system in determining the user position when located inside buildings, and navigation instructions can be adapted to the current context in order to provide more meaningful assistance. Typically, most human activity recognition (HAR) approaches distinguish between general activities such as walking, standing or sitting. In this work, we investigate more specific activities that are tailored towards the use-case of pedestrian navigation, including different kinds of stationary and locomotion behavior. We first collect a dataset of 28 combinations of device placements and activities, in total consisting of over 6 h of data from three sensors. We then use LSTM-based machine learning (ML) methods to successfully train hierarchical classifiers that can distinguish between these placements and activities. Test results show that the accuracy of device placement classification (97.2%) is on par with a state-of-the-art benchmark in this dataset while being less resource-intensive on mobile devices. Activity recognition performance highly depends on the classification task and ranges from 62.6% to 98.7%, once again performing close to the benchmark. Finally, we demonstrate in a case study how to apply the hierarchical classifiers to experimental and naturalistic datasets in order to analyze activity patterns during the course of a typical navigation session and to investigate the correlation between user activity and device placement, thereby gaining insights into real-world navigation behavior.


Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 998
Author(s):  
Peng Zhang ◽  
Yi Bu ◽  
Peng Jiang ◽  
Xiaowen Shi ◽  
Bing Lun ◽  
...  

This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related to COVID-19. We combined both chemo genomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating KG with those entities, we perform entity disambiguation on CORD-19 collections using Wikidata. Our newly built KG contains at least 21,700 genes, 2500 diseases, 94,000 phenotypes, and other biological entities (e.g., compound, species, and cell lines). We define 27 relationship types and use them to label each edge in our KG. This research presents two cases to evaluate the KG’s usability: analyzing a subgraph (ego-centered network) from the angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and IL-6 receptor; chloroquine and STAT1). The ego-centered network captured information related to COVID-19. We also found significant COVID-19-related information in top-ranked paths with a depth of three based on our path evaluation.


2021 ◽  
Vol 15 (6) ◽  
pp. 1-17
Author(s):  
Chenglin Li ◽  
Carrie Lu Tong ◽  
Di Niu ◽  
Bei Jiang ◽  
Xiao Zuo ◽  
...  

Deep learning models for human activity recognition (HAR) based on sensor data have been heavily studied recently. However, the generalization ability of deep models on complex real-world HAR data is limited by the availability of high-quality labeled activity data, which are hard to obtain. In this article, we design a similarity embedding neural network that maps input sensor signals onto real vectors through carefully designed convolutional and Long Short-Term Memory (LSTM) layers. The embedding network is trained with a pairwise similarity loss, encouraging the clustering of samples from the same class in the embedded real space, and can be effectively trained on a small dataset and even on a noisy dataset with mislabeled samples. Based on the learned embeddings, we further propose both nonparametric and parametric approaches for activity recognition. Extensive evaluation based on two public datasets has shown that the proposed similarity embedding network significantly outperforms state-of-the-art deep models on HAR classification tasks, is robust to mislabeled samples in the training set, and can also be used to effectively denoise a noisy dataset.


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 692
Author(s):  
Jingcheng Chen ◽  
Yining Sun ◽  
Shaoming Sun

Human activity recognition (HAR) is essential in many health-related fields. A variety of technologies based on different sensors have been developed for HAR. Among them, fusion from heterogeneous wearable sensors has been developed as it is portable, non-interventional and accurate for HAR. To be applied in real-time use with limited resources, the activity recognition system must be compact and reliable. This requirement can be achieved by feature selection (FS). By eliminating irrelevant and redundant features, the system burden is reduced with good classification performance (CP). This manuscript proposes a two-stage genetic algorithm-based feature selection algorithm with a fixed activation number (GFSFAN), which is implemented on the datasets with a variety of time, frequency and time-frequency domain features extracted from the collected raw time series of nine activities of daily living (ADL). Six classifiers are used to evaluate the effects of selected feature subsets from different FS algorithms on HAR performance. The results indicate that GFSFAN can achieve good CP with a small size. A sensor-to-segment coordinate calibration algorithm and lower-limb joint angle estimation algorithm are introduced. Experiments on the effect of the calibration and the introduction of joint angle on HAR shows that both of them can improve the CP.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ying Li ◽  
Hang Sun ◽  
Shiyao Feng ◽  
Qi Zhang ◽  
Siyu Han ◽  
...  

Abstract Background Long noncoding RNAs (lncRNAs) play important roles in multiple biological processes. Identifying LncRNA–protein interactions (LPIs) is key to understanding lncRNA functions. Although some LPIs computational methods have been developed, the LPIs prediction problem remains challenging. How to integrate multimodal features from more perspectives and build deep learning architectures with better recognition performance have always been the focus of research on LPIs. Results We present a novel multichannel capsule network framework to integrate multimodal features for LPI prediction, Capsule-LPI. Capsule-LPI integrates four groups of multimodal features, including sequence features, motif information, physicochemical properties and secondary structure features. Capsule-LPI is composed of four feature-learning subnetworks and one capsule subnetwork. Through comprehensive experimental comparisons and evaluations, we demonstrate that both multimodal features and the architecture of the multichannel capsule network can significantly improve the performance of LPI prediction. The experimental results show that Capsule-LPI performs better than the existing state-of-the-art tools. The precision of Capsule-LPI is 87.3%, which represents a 1.7% improvement. The F-value of Capsule-LPI is 92.2%, which represents a 1.4% improvement. Conclusions This study provides a novel and feasible LPI prediction tool based on the integration of multimodal features and a capsule network. A webserver (http://csbg-jlu.site/lpc/predict) is developed to be convenient for users.


Sign in / Sign up

Export Citation Format

Share Document