Human activity recognition is a key to a lot of applications such as healthcare and smart home. In this study, we provide a comprehensive survey on recent advances and challenges in human activity recognition (HAR) with deep learning. Although there are many surveys on HAR, they focused mainly on the taxonomy of HAR and reviewed the state-of-the-art HAR systems implemented with conventional machine learning methods. Recently, several works have also been done on reviewing studies that use deep models for HAR, whereas these works cover few deep models and their variants. There is still a need for a comprehensive and in-depth survey on HAR with recently developed deep learning methods.
This work aims to develop a novel fuzzy associator rule-based fuzzified deep convolutional neural network (FDCNN) architecture for the classification of smartphone sensor-based human activity recognition. This work mainly focuses on fusing the λmax method for weight initialization, as a data normalization technique, to achieve high accuracy of classification.
The major contributions of this work are modeled as FDCNN architecture, which is initially fused with a fuzzy logic based data aggregator. This work significantly focuses on normalizing the University of California, Irvine data set’s statistical parameters before feeding that to convolutional neural network layers. This FDCNN model with λmax method is instrumental in ensuring the faster convergence with improved performance accuracy in sensor based human activity recognition. Impact analysis is carried out to validate the appropriateness of the results with hyper-parameter tuning on the proposed FDCNN model with λmax method.
The effectiveness of the proposed FDCNN model with λmax method was outperformed than state-of-the-art models and attained with overall accuracy of 97.89% with overall F1 score as 0.9795.
The proposed fuzzy associate rule layer (FAL) layer is responsible for feature association based on fuzzy rules and regulates the uncertainty in the sensor data because of signal inferences and noises. Also, the normalized data is subjectively grouped based on the FAL kernel structure weights assigned with the λmax method.
Contributed a novel FDCNN architecture that can support those who are keen in advancing human activity recognition (HAR) recognition.
A novel FDCNN architecture is implemented with appropriate FAL kernel structures.
Due to the wide application of human activity recognition (HAR) in sports and health, a large number of HAR models based on deep learning have been proposed. However, many existing models ignore the effective extraction of spatial and temporal features of human activity data. This paper proposes a deep learning model based on residual block and bi-directional LSTM (BiLSTM). The model first extracts spatial features of multidimensional signals of MEMS inertial sensors automatically using the residual block, and then obtains the forward and backward dependencies of feature sequence using BiLSTM. Finally, the obtained features are fed into the Softmax layer to complete the human activity recognition. The optimal parameters of the model are obtained by experiments. A homemade dataset containing six common human activities of sitting, standing, walking, running, going upstairs and going downstairs is developed. The proposed model is evaluated on our dataset and two public datasets, WISDM and PAMAP2. The experimental results show that the proposed model achieves the accuracy of 96.95%, 97.32% and 97.15% on our dataset, WISDM and PAMAP2, respectively. Compared with some existing models, the proposed model has better performance and fewer parameters.
Advancement in smart sensing and computing technologies has provided a dynamic opportunity to develop intelligent systems for human activity monitoring and thus assisted living. Consequently, many researchers have put their efforts into implementing sensor-based activity recognition systems. However, recognizing people’s natural behavior and physical activities with diverse contexts is still a challenging problem because human physical activities are often distracted by changes in their surroundings/environments. Therefore, in addition to physical activity recognition, it is also vital to model and infer the user’s context information to realize human-environment interactions in a better way. Therefore, this research paper proposes a new idea for activity recognition in-the-wild, which entails modeling and identifying detailed human contexts (such as human activities, behavioral environments, and phone states) using portable accelerometer sensors. The proposed scheme offers a detailed/fine-grained representation of natural human activities with contexts, which is crucial for modeling human-environment interactions in context-aware applications/systems effectively. The proposed idea is validated using a series of experiments, and it achieved an average balanced accuracy of 89.43%, which proves its effectiveness.
AbstractHuman activity recognition (HAR) is a line of research whose goal is to design and develop automatic techniques for recognizing activities of daily living (ADLs) using signals from sensors. HAR is an active research filed in response to the ever-increasing need to collect information remotely related to ADLs for diagnostic and therapeutic purposes. Traditionally, HAR used environmental or wearable sensors to acquire signals and relied on traditional machine-learning techniques to classify ADLs. In recent years, HAR is moving towards the use of both wearable devices (such as smartphones or fitness trackers, since they are daily used by people and they include reliable inertial sensors), and deep learning techniques (given the encouraging results obtained in the area of computer vision). One of the major challenges related to HAR is population diversity, which makes difficult traditional machine-learning algorithms to generalize. Recently, researchers successfully attempted to address the problem by proposing techniques based on personalization combined with traditional machine learning. To date, no effort has been directed at investigating the benefits that personalization can bring in deep learning techniques in the HAR domain. The goal of our research is to verify if personalization applied to both traditional and deep learning techniques can lead to better performance than classical approaches (i.e., without personalization). The experiments were conducted on three datasets that are extensively used in the literature and that contain metadata related to the subjects. AdaBoost is the technique chosen for traditional machine learning, while convolutional neural network is the one chosen for deep learning. These techniques have shown to offer good performance. Personalization considers both the physical characteristics of the subjects and the inertial signals generated by the subjects. Results suggest that personalization is most effective when applied to traditional machine-learning techniques rather than to deep learning ones. Moreover, results show that deep learning without personalization performs better than any other methods experimented in the paper in those cases where the number of training samples is high and samples are heterogeneous (i.e., they represent a wider spectrum of the population). This suggests that traditional deep learning can be more effective, provided you have a large and heterogeneous dataset, intrinsically modeling the population diversity in the training process.
Multiple cameras are used to resolve occlusion problem that often occur in single-view human activity recognition. Based on the success of learning representation with deep neural networks (DNNs), recent works have proposed DNNs models to estimate human activity from multi-view inputs. However, currently available datasets are inadequate in training DNNs model to obtain high accuracy rate. Against such an issue, this study presents a DNNs model, trained by employing transfer learning and shared-weight techniques, to classify human activity from multiple cameras. The model comprised pre-trained convolutional neural networks (CNNs), attention layers, long short-term memory networks with residual learning (LSTMRes), and Softmax layers. The experimental results suggested that the proposed model could achieve a promising performance on challenging MVHAR datasets: IXMAS (97.27%) and i3DPost (96.87%). A competitive recognition rate was also observed in online classification.