Learning Neural Bag-of-Matrix-Summarization with Riemannian Network

Author(s):  
Hong Liu ◽  
Jie Li ◽  
Yongjian Wu ◽  
Rongrong Ji

Symmetric positive definite (SPD) matrices have attracted increasing research attention in image/video analysis, as they capture Riemannian geometry in a structured 2D feature representation. However, computation in the vector space on SPD matrices cannot capture their geometric properties, which degrades classification performance. To this end, Riemannian deep networks have become a promising solution for SPD matrix classification because of their excellence in performing non-linear learning over SPD matrices. However, Riemannian metric learning typically adopts a kNN classifier that cannot be extended to large-scale datasets, which limits its application in many time-sensitive scenarios. In this paper, we propose a Bag-of-Matrix-Summarization (BoMS) method to be combined with a Riemannian network, which handles the above issues towards a highly efficient and scalable SPD feature representation. Our key innovation lies in summarizing data in a Riemannian geometric space instead of the vector space. First, the whole training set is compressed into a small number of matrix features to ensure high scalability. Second, given such a compressed set, a constant-length vector representation is extracted by efficiently measuring the distribution variations between the summarized data and the latent features of the Riemannian network. Finally, the proposed BoMS descriptor is integrated into the Riemannian network, upon which the whole framework is trained end-to-end via matrix back-propagation. Experiments on four different classification tasks demonstrate the superior performance of the proposed method over state-of-the-art methods.
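The log-Euclidean metric below is one standard way to compute distances that respect SPD geometry rather than plain vector-space arithmetic. This is a minimal NumPy sketch of the geometric machinery the abstract alludes to, not the paper's BoMS method; the toy matrices are hypothetical.

```python
import numpy as np

def logm_spd(S):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(A, B):
    """Log-Euclidean distance: Frobenius norm of log(A) - log(B)."""
    return np.linalg.norm(logm_spd(A) - logm_spd(B), "fro")

# Build toy SPD matrices as X @ X.T + eps * I (illustrative, not paper data)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
Y = rng.standard_normal((4, 4))
A = X @ X.T + 1e-3 * np.eye(4)
B = Y @ Y.T + 1e-3 * np.eye(4)
print(log_euclidean_distance(A, B) > 0)  # distinct SPD matrices: positive distance
```

Unlike the plain Frobenius distance on raw matrices, this distance is computed after mapping to the tangent space, which is what makes "vector-space" computation on SPD matrices inadequate.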

Author(s):  
Pingyang Dai ◽  
Rongrong Ji ◽  
Haibin Wang ◽  
Qiong Wu ◽  
Yuyu Huang

Person re-identification (Re-ID) is an important task in video surveillance that automatically searches for and identifies people across different cameras. Despite the extensive Re-ID progress on RGB cameras, few works have studied Re-ID between infrared and RGB images, which is essentially a cross-modality problem widely encountered in real-world scenarios. The key challenge is twofold: the lack of discriminative information to re-identify the same person between the RGB and infrared modalities, and the difficulty of learning a robust metric for such large-scale cross-modality retrieval. In this paper, we tackle these two challenges by proposing a novel cross-modality generative adversarial network (termed cmGAN). To handle the issue of insufficient discriminative information, we leverage cutting-edge generative adversarial training to design our own discriminator to learn discriminative feature representations from the different modalities. To handle the issue of large-scale cross-modality metric learning, we integrate both an identification loss and a cross-modality triplet loss, which minimize inter-class ambiguity while maximizing cross-modality similarity among instances. The entire cmGAN can be trained end-to-end using a standard deep neural network framework. We quantified the performance of our work on the newly released SYSU RGB-IR Re-ID benchmark and report superior performance, in terms of the Cumulative Match Characteristic (CMC) curve and mean Average Precision (mAP), over the state-of-the-art work [Wu et al., 2017].
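The combined objective described above, an identification loss plus a cross-modality triplet loss, can be sketched as follows. The margin and the weighting `lam` are illustrative assumptions; in the real pipeline the feature vectors would come from the cmGAN discriminator.

```python
import numpy as np

def cross_entropy(logits, label):
    """Identification (softmax cross-entropy) loss for one sample."""
    z = logits - logits.max()                    # numerical stability
    return -(z[label] - np.log(np.exp(z).sum()))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Cross-modality triplet loss: pull the cross-modality positive closer
    than the negative by at least `margin` (margin value assumed)."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

def cmgan_style_loss(logits, label, anchor, positive, negative, lam=1.0):
    """Combined objective: identification loss + weighted triplet loss
    (the weight lam is an assumption, not the paper's setting)."""
    return cross_entropy(logits, label) + lam * triplet_loss(anchor, positive, negative)
```

The identification term sharpens inter-class boundaries while the triplet term directly pulls cross-modality embeddings of the same identity together, matching the two failure modes the abstract names.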


Video analytics applications such as security and surveillance face the critical problem of person re-identification, abbreviated as re-ID. The last decade witnessed the emergence of large-scale datasets and of deep learning methods that exploit these huge data volumes. Most current re-ID methods are classified as either image-based or video-based. Matching persons across multiple camera views has attracted much recent research attention. Feature representation and metric learning are the major issues in person re-identification. The focus of re-ID work is now shifting towards developing end-to-end re-ID and tracking systems for practical use with dynamic datasets. Most previous works contributed to the significant progress of person re-identification on still images using image retrieval models. This survey considers the more informative and challenging video-based person re-ID problem, pedestrian re-ID in particular. Publicly available datasets and codes are listed as part of this work. Current trends, which include open re-identification systems and the use of discriminative features and deep learning, are moving towards new applications in security and surveillance, typically for tracking.


Author(s):  
Xiawu Zheng ◽  
Rongrong Ji ◽  
Xiaoshuai Sun ◽  
Baochang Zhang ◽  
Yongjian Wu ◽  
...  

Recent advances in fine-grained image retrieval prefer learning a convolutional neural network (CNN) with a loss function designed over the fully-connected layer for discriminative feature representation. Essentially, such a loss should establish a robust metric to efficiently distinguish high-dimensional features within and outside fine-grained categories. To this end, the existing loss functions are deficient in two aspects: (a) the feature relationship is encoded only inside the training batch, and such a local scope leads to low accuracy; (b) the error is established by the mean square, which needs pairwise distance computation over the training set and results in low efficiency. In this paper, we propose a novel metric learning scheme, termed Normalize-Scale Layer and Decorrelated Global Centralized Ranking Loss, which achieves extremely efficient and discriminative learning, i.e., a 5× speedup over the triplet loss and a 12% recall boost on CARS196. Our method originates from the classic softmax loss, which has a global structure but does not directly optimize the distance metric or the inter/intra-class distances. We tackle this issue through a hypersphere layer and a global centralized ranking loss with pairwise decorrelated learning. In particular, we first propose a Normalize-Scale Layer to eliminate the gap between the metric distance (for measuring distance in retrieval) and the dot product (for dimension reduction in classification). Second, the relationship between features is encoded under a global centralized ranking loss, which targets optimizing the metric distance globally and accelerating the learning procedure. Finally, the centers are further decorrelated by a Gram-Schmidt process, leading to extreme efficiency (20 training epochs) and discriminability in feature learning. We have conducted quantitative evaluations on two fine-grained retrieval benchmarks. The superior performance demonstrates the merits of the proposed approach over the state-of-the-art methods.
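A minimal sketch of a Normalize-Scale layer, assuming a single global scale value s (the value 16.0 below is an assumption, not the paper's setting). After projection onto a hypersphere of radius s, squared Euclidean distance and dot product are monotonically related via ||u - v||² = 2s² - 2 u·v, which is exactly the gap between retrieval distance and classification dot product that the layer closes.

```python
import numpy as np

def normalize_scale(features, scale=16.0):
    """L2-normalize each row, then scale: all features land on a
    hypersphere of radius `scale` (scale value assumed for illustration)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return scale * features / np.maximum(norms, 1e-12)

F = np.array([[3.0, 4.0], [1.0, 0.0]])
out = normalize_scale(F)
print(np.linalg.norm(out, axis=1))  # both rows now have norm 16.0
```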


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2654
Author(s):  
Xue Ding ◽  
Ting Jiang ◽  
Yi Zhong ◽  
Yan Huang ◽  
Zhiwei Li

Wi-Fi-based device-free human activity recognition has recently become a vital underpinning for various emerging applications, ranging from the Internet of Things (IoT) to Human-Computer Interaction (HCI). Although this technology has been successfully demonstrated for location-dependent sensing, it relies on sufficient data samples for large-scale sensing, which is enormously labor-intensive and time-consuming. In real-world applications, however, location-independent sensing is crucial and indispensable. Therefore, how to alleviate the adverse effects of location variations on recognition accuracy with a limited dataset is still an open question. To address this concern, we present a location-independent human activity recognition system based on Wi-Fi, named WiLiMetaSensing. Specifically, we first leverage a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) feature representation method to focus on location-independent characteristics. Then, to transfer the model well across different positions with limited data samples, a metric learning-based activity recognition method is proposed. Consequently, not only the generalization ability but also the transferability of the model is significantly promoted. To fully validate the feasibility of the presented approach, extensive experiments have been conducted in an office with 24 testing locations. The evaluation results demonstrate that our method can achieve more than 90% location-independent human activity recognition accuracy. More importantly, it adapts well to data samples with a small number of subcarriers and a low sampling rate.
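The abstract does not detail the metric-learning classifier, so the sketch below uses a generic nearest-prototype rule in a learned embedding space: one mean embedding per activity class, with queries assigned to the nearest prototype. This illustrates how a few labeled samples per new location could be matched; the concrete features would come from the CNN-LSTM encoder.

```python
import numpy as np

def class_prototypes(features, labels):
    """Mean embedding per activity class: one prototype per class."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(prototypes, query):
    """Assign a query embedding to the nearest prototype (Euclidean metric)."""
    return min(prototypes, key=lambda c: np.linalg.norm(query - prototypes[c]))

# Toy embeddings for two activity classes (hypothetical values)
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(feats, labels)
print(predict(protos, np.array([0.2, 0.1])))  # nearest to class-0 prototype
```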


2021 ◽  
Author(s):  
Zhen HUANG ◽  
Minxing Liao ◽  
Haoliang Zhang ◽  
Jiabing Zhang ◽  
Shaokun Ma ◽  
...  

Abstract Rock squeezing has a large influence on tunnel construction safety; thus, when designing and constructing tunnels, it is highly important to use a reliable method for predicting tunnel squeezing from incomplete data. In this study, a combined SVM-BP (support vector machine-back-propagation) model is proposed to classify the deformation caused by surrounding-rock squeezing. We designed different characteristic parameters and three types of classifiers (an SVM model, a BP model, and the proposed SVM-BP model) for the tunnel-squeezing prediction experiments, and analysed the prediction accuracy of the different models and the influence of the characteristic parameters on the prediction results. In contrast to other prediction methods, the proposed SVM-BP model is verified to be reliable. The results show that four characteristic parameters, tunnel diameter (D), tunnel burial depth (H), rock quality index (Q), and support stiffness (K), sufficiently reflect the effect of rock squeezing for classification. The SVM-BP model combines the advantages of both an SVM and a BP neural network: it possesses flexible nonlinear modelling ability and the ability to perform parallel processing of large-scale information. Therefore, the SVM-BP model achieves better classification performance than the SVM or BP models do separately. Moreover, coupling D, H, and K has a significant impact on the predicted tunnel-squeezing results.
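The abstract does not specify how the SVM and BP outputs are combined. One plausible reading is a soft vote over the two classifiers' class probabilities for a sample described by the four parameters [D, H, Q, K]; the sketch below uses stand-in probabilities and an assumed weight, not the paper's trained models or combination rule.

```python
import numpy as np

# A sample would be described by the four parameters from the abstract:
# [tunnel diameter D, burial depth H, rock quality index Q, support stiffness K].

def soft_vote(prob_svm, prob_bp, w=0.5):
    """Weighted soft vote over SVM and BP class probabilities
    (the weight w is an assumption, not the paper's combination rule)."""
    return int(np.argmax(w * prob_svm + (1.0 - w) * prob_bp))

# Hypothetical squeezing-class probabilities from the two base models
p_svm = np.array([0.6, 0.3, 0.1])
p_bp = np.array([0.2, 0.7, 0.1])
print(soft_vote(p_svm, p_bp))  # averaged: [0.4, 0.5, 0.1] -> class 1
```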


2020 ◽  
Vol 39 (6) ◽  
pp. 8823-8830
Author(s):  
Jiafeng Li ◽  
Hui Hu ◽  
Xiang Li ◽  
Qian Jin ◽  
Tianhao Huang

Under the influence of COVID-19, the economic benefits of shale gas development have been greatly affected. With the large-scale development and utilization of shale gas in China, it is increasingly important to assess the economic impact of shale gas development. Therefore, this paper proposes a method for predicting the production of shale gas reservoirs, using a back-propagation (BP) neural network to nonlinearly fit reservoir reconstruction data and obtain shale gas well production forecasting models. Experiments show that, compared with the traditional BP neural network, the proposed method can effectively improve the accuracy and stability of the prediction. There is a nonlinear correlation between reservoir reconstruction data and gas well production, which traditional linear prediction methods cannot capture.
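A BP network of the kind used here can be sketched as a one-hidden-layer regressor trained by hand-derived gradients (back-propagation). The architecture, activation, and learning rate below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def bp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer BP network: tanh hidden units, linear output."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2, h

def bp_step(x, y, W1, b1, W2, b2, lr=0.1):
    """One gradient-descent step on squared error, with gradients derived by hand."""
    yhat, h = bp_forward(x, W1, b1, W2, b2)
    err = yhat - y                      # dL/dyhat for L = 0.5 * ||err||^2
    dW2 = np.outer(h, err)
    db2 = err
    dh = (W2 @ err) * (1.0 - h ** 2)    # back-propagate through tanh
    dW1 = np.outer(x, dh)
    db1 = dh
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2
```

Fitting such a network to (reservoir-reconstruction feature, production) pairs is what lets it capture the nonlinear correlation that defeats linear predictors.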


2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Kun Zeng ◽  
Yibin Xu ◽  
Ge Lin ◽  
Likeng Liang ◽  
Tianyong Hao

Abstract Background Eligibility criteria are the primary strategy for screening the target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and reduces the cost of clinical research. However, existing methods suffer from poor classification performance due to the complexity and imbalance of eligibility criteria text data. Methods An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA), and Enhanced Representation through Knowledge Integration (ERNIE). Focal loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model for feature discrimination. Soft voting is applied to obtain the final classification of the ensemble model. The dataset is from standard evaluation task 3 of the 5th China Health Information Processing Conference, containing 38,341 eligibility criteria texts in 44 categories. Results Our ensemble method achieved an accuracy of 0.8497, a precision of 0.8229, and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 under a standard t-test, indicating that our model achieved a significant improvement. Conclusions A model for classifying the eligibility criteria text of clinical trials based on multi-model ensemble learning and metric learning was proposed. The experiments demonstrated that the classification performance was significantly improved by our ensemble model. In addition, metric learning improved the word embedding representation, and the focal loss reduced the impact of data imbalance on model performance.
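The focal loss mentioned in the Methods can be sketched for a single sample as follows; gamma = 2.0 is the common default from the focal-loss literature, not necessarily the value used in the paper.

```python
import numpy as np

def focal_loss(probs, label, gamma=2.0):
    """Focal loss for one sample: the (1 - p)^gamma factor down-weights easy,
    well-classified examples so training focuses on hard or rare classes.
    (gamma=2.0 is the common default, assumed here.)"""
    p = float(probs[label])
    return -((1.0 - p) ** gamma) * np.log(p)

easy = np.array([0.95, 0.03, 0.02])   # confidently correct prediction
hard = np.array([0.40, 0.35, 0.25])   # uncertain prediction
print(focal_loss(easy, 0) < focal_loss(hard, 0))  # easy sample contributes far less
```

With gamma = 0 the expression reduces to plain cross-entropy, which is why focal loss is described as addressing class imbalance rather than changing the underlying objective.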


Author(s):  
Kristian Miok ◽  
Blaž Škrlj ◽  
Daniela Zaharie ◽  
Marko Robnik-Šikonja

Abstract Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify their output in terms of reliability. We propose a Bayesian method using Monte Carlo dropout within the attention layers of the transformer models to provide well-calibrated reliability estimates. We evaluate and visualize the results of the proposed approach on hate speech detection problems in several languages. Additionally, we test whether affective dimensions can enhance the information extracted by the BERT model in hate speech classification. Our experiments show that Monte Carlo dropout provides a viable mechanism for reliability estimation in transformer networks. Used within the BERT model, it offers state-of-the-art classification performance and can detect less trusted predictions.
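Monte Carlo dropout keeps dropout active at inference time and treats the spread of repeated stochastic forward passes as an uncertainty estimate. The toy linear layer below stands in for the transformer attention layers the paper instruments; the dropout rate and sample count are illustrative assumptions.

```python
import numpy as np

def mc_dropout_predict(x, W, n_samples=50, p_drop=0.1, rng=None):
    """Monte Carlo dropout: run the stochastic forward pass many times with
    dropout still active, then report the mean prediction and its standard
    deviation as a reliability estimate."""
    rng = rng or np.random.default_rng(0)
    preds = []
    for _ in range(n_samples):
        mask = rng.random(x.shape) >= p_drop             # random dropout mask
        preds.append(((x * mask) / (1.0 - p_drop)) @ W)  # inverted dropout scaling
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)         # prediction + uncertainty
```

A large standard deviation flags a "less trusted" prediction, which is the mechanism the abstract uses to separate reliable from unreliable hate-speech classifications.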


2021 ◽  
Vol 14 (16) ◽  
Author(s):  
Adnan A. Ismael ◽  
Saleh J. Suleiman ◽  
Raid Rafi Omar Al-Nima ◽  
Nadhir Al-Ansari

Abstract Cylindrical weirs offer a steady-state overflow pattern; this type of weir offers a simple design and allows floating debris to pass easily. This study considers the prediction of the coefficient of discharge (Cd) for an oblique cylindrical weir using three diameters (D1 = 0.11 m, D2 = 0.09 m, and D3 = 0.065 m) and three inclination angles with respect to the channel axis (θ1 = 90°, θ2 = 45°, and θ3 = 30°). The Cd values for a total of 56 experiments are estimated using the radial basis function network (RBFN) and compared against the back-propagation neural network (BPNN) and the cascade-forward neural network (CFNN). Root mean square error (RMSE), mean square error (MSE), and correlation coefficient (CC) statistics are used as measurement metrics. The RBFN attained superior performance compared to the other neural networks, BPNN and CFNN. For the training stage, the RBFN achieved very small RMSE and MSE values of 1.35E-12 and 1.83E-24, respectively; for the testing stage, it likewise achieved very small RMSE and MSE values of 0.0082 and 6.80E-05, respectively.
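The RMSE, MSE, and CC statistics reported above are standard; a minimal NumPy sketch of how they are computed from measured and predicted Cd values:

```python
import numpy as np

def mse(y, yhat):
    """Mean square error between measured and predicted values."""
    return float(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2))

def rmse(y, yhat):
    """Root mean square error: square root of the MSE."""
    return float(np.sqrt(mse(y, yhat)))

def cc(y, yhat):
    """Pearson correlation coefficient between measured and predicted values."""
    return float(np.corrcoef(y, yhat)[0, 1])
```

Note that RMSE is simply the square root of MSE, which is why the paired training-stage values (1.35E-12 and 1.83E-24) are consistent with each other.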


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 598
Author(s):  
Jean-François Pratte ◽  
Frédéric Nolet ◽  
Samuel Parent ◽  
Frédéric Vachon ◽  
Nicolas Roy ◽  
...  

Analog and digital SiPMs have revolutionized the field of radiation instrumentation by replacing both avalanche photodiodes and photomultiplier tubes in many applications. However, multiple applications require greater performance than current SiPMs can deliver, for example, the timing resolution needed for time-of-flight positron emission tomography and time-of-flight computed tomography, and mitigation of the large output capacitance of SiPM arrays for large-scale time projection chambers in liquid argon and liquid xenon experiments. In this contribution, the case is made that 3D photon-to-digital converters, also known as 3D digital SiPMs, offer potentially superior performance over analog and 2D digital SiPMs. A review of 3D photon-to-digital converters is presented, along with various applications where they can make a difference, such as time-of-flight medical imaging systems and low-background experiments in noble liquids. Finally, the key design choices that must be made to obtain an optimized 3D photon-to-digital converter for radiation instrumentation, more specifically the single-photon avalanche diode array, the CMOS technology, the quenching circuit, the time-to-digital converter, the digital signal processing, and the system-level integration, are discussed in detail.

