A Dynamic Part-Attention Model for Person Re-Identification

Sensors ◽  
2019 ◽  
Vol 19 (9) ◽  
pp. 2080 ◽  
Author(s):  
Ziying Yao ◽  
Xinkai Wu ◽  
Zhongxia Xiong ◽  
Yalong Ma

Person re-identification (ReID) is gaining more attention due to its important applications in pedestrian tracking and security prevention. Recently developed part-based methods have proven beneficial for stronger and more explicit feature descriptions, but finding the truly significant parts and reducing miscorrelation between images to improve ReID accuracy still leaves much room for improvement. In this paper, we propose a dynamic part-attention (DPA) method based on masks, which aims to improve the use of variable attention parts. In particular, a two-branch network with a dynamic loss function is designed to extract features of the global image and of the body parts separately. With this comprehensive but targeted learning strategy, the proposed method can capture discriminative features that are based on, but not dependent on, masks; this guides the whole network to focus on body features more deliberately and yields more robust performance. Our method achieves a rank-1 accuracy of 91.68% on the public dataset Market1501, and experimental results on three public datasets indicate that the proposed method is effective and achieves favorable accuracy compared with state-of-the-art methods.

2020 ◽  
Vol 34 (07) ◽  
pp. 11077-11084
Author(s):  
Yung-Han Huang ◽  
Kuang-Jui Hsu ◽  
Shyh-Kang Jeng ◽  
Yen-Yu Lin

Video re-localization aims to localize a sub-sequence, called the target segment, in an untrimmed reference video that is similar to a given query video. In this work, we propose an attention-based model to accomplish this task in a weakly supervised setting. Namely, we derive our CNN-based model without using the annotated locations of the target segments in reference videos. Our model contains three modules. First, it employs a pre-trained C3D network for feature extraction. Second, we design an attention mechanism to extract multiscale temporal features, which are then used to estimate the similarity between the query video and a reference video. Third, a localization layer detects where the target segment is in the reference video by determining whether each frame in the reference video is consistent with the query video. The resultant CNN model is derived based on the proposed co-attention loss, which discriminatively separates the target segment from the rest of the reference video. This loss maximizes the similarity between the query video and the target segment while minimizing the similarity between the target segment and the rest of the reference video. Our model can also be modified for fully supervised re-localization. Our method is evaluated on a public dataset and achieves state-of-the-art performance under both weakly supervised and fully supervised settings.
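
The loss described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the use of cosine similarity, the soft per-frame attention weights, and the hypothetical names `cosine`, `weighted_mean`, and `co_attention_loss`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_mean(frames, weights):
    """Attention-weighted average of per-frame feature vectors."""
    total = sum(weights)
    dim = len(frames[0])
    if total == 0:
        return [0.0] * dim
    return [sum(w * f[d] for f, w in zip(frames, weights)) / total
            for d in range(dim)]


def co_attention_loss(query_feat, ref_frames, attention):
    """Hypothetical loss: reward query/target similarity, penalize target/rest similarity.

    attention[t] in [0, 1] is an assumed soft score that frame t belongs to the
    target segment; no location annotations are used.
    """
    target = weighted_mean(ref_frames, attention)
    rest = weighted_mean(ref_frames, [1.0 - a for a in attention])
    return -cosine(query_feat, target) + cosine(target, rest)


# Toy features: frames 1 and 2 of the reference resemble the query.
query = [1.0, 0.0]
reference = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
good = co_attention_loss(query, reference, [0.0, 1.0, 1.0, 0.0])
bad = co_attention_loss(query, reference, [1.0, 0.0, 0.0, 1.0])
```

In this toy setup, attention placed on the matching frames gives a lower loss than attention placed on the other frames, which is the behavior the abstract attributes to the co-attention loss.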


2017 ◽  
Vol 1 (1) ◽  
pp. 115
Author(s):  
Sudar Kajin

The growth and development of a child are comprehensive in nature and depend on intertwined relationships between components (health, nutrition, and environment). In general, child development can be grouped into three areas, namely cognitive, affective, and psychomotor, whereas biological growth involves changes in body structure. Changes in body structure concern changes in bone structure, especially in the long bones, which affect body size, whereas changes in bodily functions result from hormonal changes that affect physiological function. The purposes of this development are: 1) to describe the feasibility of the Physical Education study product for grade XI IPA at MAN I Mojokerto; and 2) to describe how developing learning tools that use process skills can improve learning outcomes in Physical Education for class XI IPA at MAN I Mojokerto. From the results of this development it can be concluded that: 1) based on expert validation and testing, the process-skills model approach is fit for use in the subject of Physical Education, Sport and Health, because the experts requested no revisions to the products developed; the student questionnaire, however, indicated that two revisions are required: (a) improve the look of the model or change the learning strategy, and (b) improve the use of resources in implementing the model; and 2) the learning tools developed using process skills can improve learning outcomes in Physical Education for class XI IPA at MAN I Mojokerto. In the test class, learning completeness increased from 77.78% on the pre-test to 91.67% on the post-test.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Shreeya Sriram ◽  
Shitij Avlani ◽  
Matthew P. Ward ◽  
Shreyas Sen

Continuous multi-channel monitoring of biopotential signals is vital in understanding the body as a whole, facilitating accurate models and predictions in neural research. The current state of the art in wireless technologies for untethered biopotential recordings relies on radiative electromagnetic (EM) fields. In such transmissions, only a small fraction of the energy is received, since the EM fields are widely radiated, resulting in lossy, inefficient systems. Using the body as a communication medium (similar to a 'wire') allows the energy to be contained within the body, yielding order(s) of magnitude lower energy than radiative EM communication. In this work, we introduce Animal Body Communication (ABC), which brings the concept of using the body as a medium into the domain of untethered animal biopotential recording. This work, for the first time, develops the theory and models for animal body communication circuitry and channel loss. Using this theoretical model, a sub-inch³ [1″ × 1″ × 0.4″], custom-designed sensor node is built from off-the-shelf components; it is capable of sensing and transmitting biopotential signals through the body of the rat at significantly lower power than traditional wireless transmissions. In-vivo experimental analysis proves that ABC successfully transmits acquired electrocardiogram (EKG) signals through the body with correlation >99% when compared to traditional wireless communication modalities, with a 50× reduction in power consumption.


2021 ◽  
Vol 29 ◽  
pp. 475-486
Author(s):  
Bohdan Petryshak ◽  
Illia Kachko ◽  
Mykola Maksymenko ◽  
Oles Dobosevych

BACKGROUND: Premature ventricular contraction (PVC) is among the most frequently occurring types of arrhythmias. Existing approaches for automated PVC identification suffer from a range of disadvantages related to hand-crafted features and benchmarking on datasets with a tiny sample of PVC beats. OBJECTIVE: The main objective is to address the drawbacks described above in the proposed framework, which takes a raw ECG signal as an input and localizes R peaks of the PVC beats. METHODS: Our method consists of two neural networks. First, an encoder-decoder architecture trained on a PVC-rich dataset localizes the R peaks of both Normal and anomalous heartbeats. Given the R-peak positions, our CardioIncNet model then distinguishes healthy beats from PVC beats. RESULTS: We have performed an extensive evaluation of our pipeline with both single- and cross-dataset paradigms on three public datasets. Our approach achieves F1-measures of over 0.99 and 0.979 on the single- and cross-dataset paradigms, respectively, for the R-peak localization task, and F1 scores above 0.96 and 0.85 for the PVC beat classification task. CONCLUSIONS: We have shown a method that provides robust performance beyond beats of Normal nature and clearly outperforms classical algorithms in both single- and cross-dataset evaluation. We provide a GitHub repository for the reproduction of the results.
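
As a rough illustration of how an F1-measure for R-peak localization can be computed, the sketch below matches predicted peak positions to annotated ones within a tolerance window. The abstract does not state the matching rule, so the tolerance value, the greedy one-to-one matching, and the hypothetical names `match_peaks` and `f1_score` are assumptions.

```python
def match_peaks(true_peaks, pred_peaks, tolerance):
    """Greedily match sorted peak positions one-to-one within +/- tolerance samples.

    Returns (true positives, false positives, false negatives).
    """
    true_sorted = sorted(true_peaks)
    pred_sorted = sorted(pred_peaks)
    i = j = tp = 0
    while i < len(true_sorted) and j < len(pred_sorted):
        diff = pred_sorted[j] - true_sorted[i]
        if abs(diff) <= tolerance:
            tp += 1          # prediction close enough to an annotated peak
            i += 1
            j += 1
        elif diff < 0:
            j += 1           # prediction too early to match: false positive
        else:
            i += 1           # annotated peak has no prediction: false negative
    fp = len(pred_sorted) - tp
    fn = len(true_sorted) - tp
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Illustrative sample indices only; these are not data from the study.
annotated = [100, 300, 500, 700]
predicted = [102, 298, 650, 705]
tp, fp, fn = match_peaks(annotated, predicted, tolerance=10)
score = f1_score(tp, fp, fn)  # 3 matches, 1 spurious, 1 missed -> 0.75
```

A tolerance window is needed because a predicted R peak rarely falls on exactly the annotated sample; the width of that window directly affects the reported score.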


2021 ◽  
Vol 16 (1) ◽  
pp. 1-23
Author(s):  
Bo Liu ◽  
Haowen Zhong ◽  
Yanshan Xiao

Multi-view classification aims at designing a multi-view learning strategy to train a classifier from multi-view data, which are easily collected in practice. Most existing works on multi-view classification assume that the multi-view data are collected with precise information. However, in real-life applications we always collect uncertain multi-view data because the collection process is corrupted by noise. To address this, this article proposes a novel approach, called uncertain multi-view learning with support vector machine (UMV-SVM), to cope with the problem of multi-view learning with uncertain data. The method first enforces agreement among all the views to seek complementary information in the multi-view data, and takes the uncertainty of the multi-view data into consideration by modeling the reachability area of the noise. It then uses an iterative framework to solve the proposed UMV-SVM model and obtain the multi-view classifier for prediction. Extensive experiments on real-life datasets show that the proposed UMV-SVM achieves better performance for uncertain multi-view classification than state-of-the-art multi-view classification methods.


2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Yikui Zhai ◽  
He Cao ◽  
Wenbo Deng ◽  
Junying Gan ◽  
Vincenzo Piuri ◽  
...  

Because of the lack of discriminative face representations and the scarcity of labeled training data, facial beauty prediction (FBP), which aims at assessing facial attractiveness automatically, has become a challenging pattern recognition problem. Inspired by recent promising work on fine-grained image classification that uses a multiscale architecture to extend the diversity of deep features, BeautyNet for unconstrained facial beauty prediction is proposed in this paper. Firstly, a multiscale network is adopted to improve the discriminability of face features. Secondly, to alleviate the computational burden of the multiscale architecture, MFM (max-feature-map) is utilized as an activation function, which not only lightens the network and speeds up network convergence but also benefits performance. Finally, a transfer learning strategy is introduced to mitigate the overfitting caused by the scarcity of labeled facial beauty samples and to improve the proposed BeautyNet's performance. Extensive experiments performed on LSFBD demonstrate that the proposed scheme outperforms state-of-the-art methods, achieving 67.48% classification accuracy.
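
The max-feature-map activation mentioned in this abstract is commonly defined as splitting the channels into two halves and keeping the element-wise maximum, which halves the channel count. The sketch below shows that common definition on plain Python lists; whether BeautyNet uses exactly this variant is an assumption, and the name `max_feature_map` is hypothetical.

```python
def max_feature_map(channels):
    """Apply MFM to a list of feature maps (each a flat list of activations).

    Channel k is paired with channel k + C/2 and the element-wise maximum is
    kept, so C input channels become C/2 output channels.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    return [
        [max(a, b) for a, b in zip(channels[k], channels[k + half])]
        for k in range(half)
    ]


# Four toy channels with three activations each.
features = [
    [1, -2, 3],
    [0, 5, -1],
    [2, -3, 0],
    [-1, 6, 4],
]
reduced = max_feature_map(features)  # -> [[2, -2, 3], [0, 6, 4]]
```

Because the output has half as many channels as the input, the layers that follow need fewer parameters, which is consistent with the abstract's claim that MFM lightens the network.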


2022 ◽  
Vol 29 (2) ◽  
pp. 1-33
Author(s):  
Nigel Bosch ◽  
Sidney K. D'Mello

The ability to identify whether a user is “zoning out” (mind wandering) from video has many HCI applications (e.g., distance learning, high-stakes vigilance tasks). However, it remains unknown how well humans can perform this task, how they compare to automatic computerized approaches, and how a fusion of the two might improve accuracy. We analyzed videos of users’ faces and upper bodies recorded 10 s prior to self-reported mind wandering (i.e., ground truth) while they engaged in a computerized reading task. We found that a state-of-the-art machine learning model had comparable accuracy to aggregated judgments of nine untrained human observers (area under receiver operating characteristic curve [AUC] = .598 versus .589). A fusion of the two (AUC = .644) outperformed each, presumably because each focused on complementary cues. Furthermore, adding more humans beyond 3–4 observers yielded diminishing returns. We discuss implications of human–computer fusion as a means to improve accuracy in complex tasks.
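
To make the AUC comparison concrete, here is a minimal sketch that computes AUC from its pairwise-ranking definition and fuses two score sources by simple averaging. The abstract does not say how the fusion was performed, so averaging is an assumption, the names `auc` and `fuse` are hypothetical, and the toy scores are illustrative rather than data from the study.

```python
def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fuse(scores_a, scores_b):
    """Assumed fusion rule: average the two sources' scores per example."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# 1 = mind wandering reported, 0 = not reported (toy labels).
labels = [1, 1, 0, 0]
model_scores = [0.9, 0.4, 0.5, 0.1]   # model misses the second positive
human_scores = [0.3, 0.8, 0.2, 0.6]   # observers miss the first positive

model_auc = auc(labels, model_scores)
human_auc = auc(labels, human_scores)
fused_auc = auc(labels, fuse(model_scores, human_scores))
```

In this toy case each source makes a different mistake, so the averaged scores rank every positive above every negative. That mirrors the abstract's explanation that the two sources rely on complementary cues.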


Author(s):  
Junyi Wu ◽  
Yan Huang ◽  
Qiang Wu ◽  
Zhipeng Gao ◽  
Jianqiang Zhao ◽  
...  

The task of person re-identification (re-ID) is to find the same pedestrian across non-overlapping camera views. Generally, the performance of person re-ID can be affected by background clutter. However, existing segmentation algorithms cannot obtain perfect foreground masks that cleanly cover the background information. In addition, if the background is completely removed, some discriminative ID-related cues (e.g., a backpack or a companion) may be lost. In this article, we design a dual-stream network consisting of a Provider Stream (P-Stream) and a Receiver Stream (R-Stream). The R-Stream performs an a priori optimization operation on foreground information. The P-Stream acts as a pusher to guide the R-Stream to concentrate on foreground information and some useful ID-related cues in the background. The proposed dual-stream network can make full use of the a priori optimization and guided-learning strategy to learn encouraging foreground information and some useful ID-related information in the background. Our method achieves Rank-1 accuracy of 95.4% on Market-1501, 89.0% on DukeMTMC-reID, 78.9% on CUHK03 (labeled), and 75.4% on CUHK03 (detected), outperforming state-of-the-art methods.


2020 ◽  
Vol 34 (05) ◽  
pp. 9057-9064
Author(s):  
Bayu Trisedya ◽  
Jianzhong Qi ◽  
Rui Zhang

We study neural data-to-text generation. Specifically, we consider a target entity that is associated with a set of attributes. We aim to generate a sentence to describe the target entity. Previous studies use encoder-decoder frameworks where the encoder treats the input as a linear sequence and uses an LSTM to encode the sequence. However, linearizing a set of attributes may not yield the proper order of the attributes, and hence leads the encoder to produce an improper context for generating a description. To handle disordered input, recent studies propose two-stage neural models that use pointer networks to generate a content-plan (i.e., content-planner) and use the content-plan as input for an encoder-decoder model (i.e., text generator). However, in two-stage models, the content-planner may yield an incomplete content-plan that misses one or more salient attributes. This will in turn cause the text generator to generate an incomplete description. To address these problems, we propose a novel attention model that exploits a content-plan to highlight salient attributes in a proper order. The challenge of integrating a content-plan in the attention model of an encoder-decoder framework is to align the content-plan and the generated description. We handle this problem by devising a coverage mechanism that tracks the extent to which the content-plan is exposed in the previous decoding time-step, which helps our proposed attention model select the attributes to be mentioned in the description in a proper order. Experimental results show that our model outperforms state-of-the-art baselines by up to 3% and 5% in terms of BLEU score on two real-world datasets, respectively.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Nikta Pournoori ◽  
Lauri Sydänheimo ◽  
Yahya Rahmat-Samii ◽  
Leena Ukkonen ◽  
Toni Björninen

We present a meandered triple-band planar-inverted-F antenna (PIFA) for integration into brain-implantable biotelemetric systems. The target applications are wireless data communication, far-field wireless power transfer, and switching control between sleep and wake-up modes at the Medical Device Radiocommunication Service (MedRadio) band (401–406 MHz) and Industrial, Scientific and Medical (ISM) bands (902–928 MHz and 2400–2483.5 MHz), respectively. By embedding meandered slots into the radiator and shorting it to the ground, we downsized the antenna to a volume of 11 × 20.5 × 1.8 mm³. We optimized the antenna using a 7-layer numerical human head model and full-wave electromagnetic field simulation. In the simulation, we placed the implant in the cerebrospinal fluid (CSF) at a depth of 13.25 mm from the body surface, which is deeper than in most works on implantable antennas. We manufactured and tested the antenna in a liquid phantom, which we replicated in the simulator for further comparison. The measured gain of the antenna reached the state-of-the-art values of −43.6 dBi, −25.8 dBi, and −20.1 dBi at 402 MHz, 902 MHz, and 2400 MHz, respectively.
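
The figures quoted in this abstract can be put in more familiar terms with a short calculation. The sketch below multiplies out the stated dimensions and converts the reported gains from dBi to linear ratios using the standard relation G = 10^(dBi/10). The function name `dbi_to_linear` is hypothetical, and the code only restates numbers given in the abstract.

```python
def dbi_to_linear(gain_dbi):
    """Convert antenna gain in dBi to a linear power ratio relative to an isotropic radiator."""
    return 10 ** (gain_dbi / 10)


# Antenna dimensions from the abstract, in millimetres.
length_mm, width_mm, height_mm = 11, 20.5, 1.8
volume_mm3 = length_mm * width_mm * height_mm  # about 405.9 cubic millimetres

# Measured gains from the abstract, keyed by frequency in MHz.
measured_gain_dbi = {402: -43.6, 902: -25.8, 2400: -20.1}
linear_gain = {freq: dbi_to_linear(g) for freq, g in measured_gain_dbi.items()}

for freq in sorted(linear_gain):
    print(f"{freq} MHz: {measured_gain_dbi[freq]} dBi = {linear_gain[freq]:.3e} (linear)")
```

All three gains are far below 0 dBi, which is expected for a small antenna radiating through lossy tissue; the highest band has the largest linear gain of the three.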

