Improving novelty detection by self-supervised learning and channel attention mechanism

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Miao Tian ◽  
Ying Cui ◽  
Haixia Long ◽  
Junxia Li

Purpose: In novelty detection, autoencoder-based image reconstruction is one of the mainstream strategies. The basic idea is that an autoencoder trained only on normal data reconstructs normal data with low error, so abnormal inputs stand out by their high reconstruction error. However, on complex natural images, conventional pixel-level reconstruction degrades and no longer yields promising results. This paper aims to provide a new method for improving the performance of autoencoder-based novelty detection.

Design/methodology/approach: To overcome the inability of conventional pixel-level reconstruction to capture the global semantic information of an image, a novel model combining an attention mechanism with self-supervised learning is proposed. First, an auxiliary task, reconstructing rotated images, is set so that the network learns global semantic features. Then, a channel attention mechanism is introduced to perform adaptive refinement on the intermediate feature maps, optimizing the features passed forward.

Findings: Experimental results on three public data sets show that the proposed method performs promisingly for novelty detection.

Originality/value: This study explores the ability of self-supervised learning and attention mechanisms to extract features from a single class of images, thereby improving novelty detection performance.
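The channel attention step can be pictured with a minimal squeeze-and-excitation-style sketch: pool each channel to a global descriptor, pass it through a small bottleneck, and rescale the channels by the resulting sigmoid gates. The abstract does not specify the paper's exact module, so the projection matrices `w1`/`w2` and the reduction ratio below are assumptions.

```python
import numpy as np

def channel_attention(fmap, w1, w2):
    """SE-style channel attention: squeeze (global average pool),
    excite (two small projections), then rescale each channel.
    fmap: (C, H, W); w1: (C//r, C); w2: (C, C//r) -- shapes assumed."""
    c, h, w = fmap.shape
    squeeze = fmap.reshape(c, -1).mean(axis=1)        # (C,) global descriptor
    hidden = np.maximum(w1 @ squeeze, 0.0)            # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid gates in (0, 1)
    return fmap * scale[:, None, None]                # channel-wise reweighting

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))                      # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
out = channel_attention(fmap, w1, w2)
```

Because every gate lies in (0, 1), the module can only attenuate channels, never amplify them; the network learns which channels to keep near full strength.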

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose: Popular image-processing technologies based on convolutional neural networks entail heavy computation and storage costs and achieve low accuracy on tiny defects, which conflicts with the high real-time performance and accuracy, limited computing resources and limited storage required by industrial applications. Therefore, an improved YOLOv4, named YOLOv4-Defect, is proposed to solve these problems.

Design/methodology/approach: On the one hand, this study performs multi-dimensional compression on the feature-extraction network of YOLOv4 to simplify the model and improves the model's feature-extraction ability through knowledge distillation. On the other hand, a prediction scale with a finer receptive field is added to optimize the model structure and improve detection performance on tiny defects.

Findings: The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007 and on a steel-ingot data set collected in an actual industrial setting. The experimental results demonstrate that YOLOv4-Defect greatly improves recognition efficiency and accuracy while reducing the model's size and computation cost.

Originality/value: This paper proposes an improved YOLOv4, YOLOv4-Defect, for surface-defect detection, which is conducive to application in industrial scenarios with limited storage and computing resources and meets requirements for high real-time performance and precision.
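The knowledge-distillation step mentioned above is most commonly implemented with temperature-softened soft targets; the sketch below assumes Hinton-style KL distillation between teacher and student logits, which may differ from the paper's exact formulation.

```python
import numpy as np

def softmax(z, t=1.0):
    """Numerically stable softmax with temperature t."""
    z = z / t
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, t=4.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by t^2 as is conventional for soft targets."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float(np.sum(p * (np.log(p) - np.log(q))) * t * t)

teacher = np.array([3.0, 1.0, 0.2])
loss_same = distillation_loss(teacher, teacher)           # perfect mimicry
loss_diff = distillation_loss(np.array([0.1, 2.5, 0.3]), teacher)
```

The loss is zero only when the student reproduces the teacher's softened distribution; a higher temperature exposes the teacher's "dark knowledge" about relative class similarities.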


Author(s):  
Eugene Yujun Fu ◽  
Hong Va Leong ◽  
Grace Ngai ◽  
Stephen C.F. Chan

Purpose: Social signal processing under affective computing aims at recognizing and extracting useful human social interaction patterns. Fighting is a common social interaction in real life, and a fight-detection system has wide applications. This paper aims to detect fights in a natural and low-cost manner.

Design/methodology/approach: Research on fight detection is often based on visual features, demanding substantial computation and good video quality. In this paper, the authors propose an approach that detects fight events through motion analysis. Most existing works evaluate their algorithms on public data sets of simulated fights acted out by actors. To evaluate real fights, the authors collected videos involving real fights to form a data set. On the two types of data sets, the authors evaluated the performance of their motion-signal analysis algorithm and compared it with the state-of-the-art approach based on MoSIFT descriptors with a Bag-of-Words mechanism, and with basic motion-signal analysis with Bag-of-Words.

Findings: The experimental results indicate that the proposed approach accurately detects fights in real scenarios and outperforms the MoSIFT approach.

Originality/value: By collecting and annotating real surveillance videos containing real fight events and augmenting them with well-known data sets, the authors proposed, implemented and evaluated a low-computation approach and compared it with the state of the art. They uncovered fundamental differences between real and simulated fights and initiated a new study on discriminating real from simulated fight events, with very good performance.
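The Bag-of-Words mechanism used by the MoSIFT baseline reduces, in outline, to assigning each local motion descriptor to its nearest codeword and summarizing a clip as a normalized histogram of assignments. The codebook and descriptors below are toy stand-ins, not the paper's data.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each local motion descriptor to its nearest codeword and
    return the clip's normalized occurrence histogram (its BoW vector)."""
    # (N, K) matrix of squared distances: descriptor -> codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                       # nearest-codeword index
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# toy 2-D codebook (real MoSIFT descriptors are much higher-dimensional)
codebook = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
descs = np.array([[0.1, -0.2], [4.8, 5.1], [5.2, 4.9], [9.5, 0.3]])
vec = bow_histogram(descs, codebook)                # clip-level feature vector
```

A classifier (e.g. an SVM) is then trained on these fixed-length clip vectors, which is what makes variable-length videos comparable.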


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ziming Zeng ◽  
Yu Shi ◽  
Lavinia Florentina Pieptea ◽  
Junhua Ding

Purpose: Aspects extracted from a user's historical records are widely used to define the user's fine-grained preferences when building interpretable recommendation systems. Because the aspects are extracted from the historical records, aspects representing the user's negative preferences cannot be identified, as they are absent from those records. Yet these latent aspects are as important for building a recommendation system as the aspects representing positive preferences. This paper aims to identify both the user's positive and negative preferences for building an interpretable recommendation system.

Design/methodology/approach: First, high-frequency tags are selected as aspects to describe user preferences at the aspect level. Second, positive and negative user preferences are calculated according to the positive and negative preference model, and the interaction between similar aspects is exploited to address the aspect-sparsity problem. Finally, an experiment is designed to evaluate the effectiveness of the model. The code and experiment data are available at: https://github.com/shiyu108/Recommendation-system

Findings: Experimental results show the proposed approach outperforms state-of-the-art methods on widely used public data sets.

Originality/value: This paper provides a new approach that identifies and uses not only users' positive preferences but also their negative preferences, capturing user preferences precisely. Besides, the proposed model provides good interpretability.
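The abstract does not spell out the positive/negative preference model, so the following is a simplified, hypothetical reading: observed tag counts give positive preference scores, and a similarity matrix propagates preference to aspects the user never touched (the "interaction between similar aspects"), so that aspects with low propagated scores can be read as latent negative preferences. All names and numbers are illustrative.

```python
import numpy as np

def aspect_preferences(user_counts, aspect_sim):
    """Hypothetical sketch: normalized tag counts are the positive
    preferences; similarity-weighted propagation fills in scores for
    unobserved aspects, addressing aspect sparsity."""
    pos = user_counts / user_counts.sum()      # direct (positive) evidence
    smoothed = aspect_sim @ pos                # propagate via similar aspects
    smoothed = smoothed / smoothed.sum()
    return pos, smoothed

counts = np.array([4.0, 0.0, 1.0])             # e.g. tags "sci-fi", "horror", "drama"
sim = np.array([[1.0, 0.1, 0.5],               # symmetric aspect similarity
                [0.1, 1.0, 0.2],
                [0.5, 0.2, 1.0]])
pos, smoothed = aspect_preferences(counts, sim)
# "horror" (index 1) keeps a low propagated score: a latent negative preference
```

Even with zero direct evidence for an aspect, its propagated score separates "probably disliked" from "merely unseen", which is the kind of signal the negative-preference model needs.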


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yan Xu ◽  
Hong Qin ◽  
Jiani Huang ◽  
Yanyun Wang

Purpose: Conventional learning-based visual odometry (VO) systems usually use convolutional neural networks (CNNs) to extract features, where important context-related, attention-holding global features may be ignored. Without these essential global features, a VO system is sensitive to environmental perturbations. The purpose of this paper is to design a novel learning-based framework that improves the accuracy of learning-based VO without decreasing its generalization ability.

Design/methodology/approach: Instead of plain CNNs, a context-gated convolution is adopted to build an end-to-end learning framework, enabling the convolutional layers to dynamically capture representative local patterns and compose local features of interest under the guidance of global context. In addition, an attention-mechanism module is introduced to further improve learning ability and enhance the robustness and generalization ability of the VO system.

Findings: The proposed system is evaluated on the public KITTI data set and on self-collected data sets of the authors' college building, where it shows competitive performance compared with classical and state-of-the-art learning-based methods. Quantitative results on KITTI show that, compared with CNN-based VO methods, the average translational and rotational errors over all test sequences are reduced by 45.63% and 37.22%, respectively.

Originality/value: The main contribution of this paper is an end-to-end deep context-gated convolutional VO system based on a lightweight attention mechanism, which effectively improves accuracy compared with other learning-based methods.
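The idea of guiding local filtering with global context can be illustrated in a few lines. The real context-gated convolution modulates the kernel itself with a learned context projection; the simplified sketch below instead gates the response of a standard 1-D convolution with a sigmoid of the input's global summary, which keeps the local-plus-global structure visible. The gating weight `wg` is an assumed free parameter.

```python
import numpy as np

def context_gated_conv(x, kernel, wg):
    """Simplified sketch: a 1-D valid convolution whose output is gated
    by a sigmoid projection of the input's global context (its mean).
    The published module modulates the kernel, not the response."""
    n, k = len(x), len(kernel)
    out = np.array([x[i:i + k] @ kernel for i in range(n - k + 1)])
    context = x.mean()                          # global summary of the signal
    gate = 1.0 / (1.0 + np.exp(-wg * context))  # gate in (0, 1)
    return gate * out

x = np.array([1.0, 2.0, 3.0, 4.0])
out = context_gated_conv(x, np.array([1.0, -1.0]), wg=0.5)
```

The same local pattern thus produces a weaker or stronger response depending on the global scene statistics, which is the property a perturbation-sensitive VO front end lacks.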


2019 ◽  
Vol 8 (3) ◽  
pp. 177-186
Author(s):  
Rokas Jurevičius ◽  
Virginijus Marcinkevičius

Purpose: The purpose of this paper is to present a new data set of aerial imagery from a robotics simulator (AIR). The AIR data set aims to provide a starting point for localization-system development and to become a standard benchmark for accuracy comparison of map-based localization algorithms, visual odometry and SLAM for high-altitude flights.

Design/methodology/approach: The presented data set contains over 100,000 aerial images captured from the Gazebo robotics simulator using orthophoto maps as the ground plane. Flights with three different trajectories are performed on maps of urban and forest environments at different altitudes, totaling over 33 kilometers of flight distance.

Findings: A review of previous research shows that the presented data set is the largest currently available public data set with downward-facing camera imagery.

Originality/value: This paper addresses the lack of publicly available data sets for high-altitude (100‒3,000 meters) UAV flights; current state-of-the-art research on map-based localization for UAVs depends on real-life test flights and custom simulated data sets for accuracy evaluation. The presented data set fills this gap and aims to help researchers improve and benchmark new algorithms for high-altitude flights.
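Benchmarks of the kind this data set targets are commonly scored with an RMSE-style absolute trajectory error between estimated and ground-truth positions. A minimal sketch (the usual rigid alignment of the two trajectories is omitted for brevity, and the metric choice is an assumption, not something the abstract specifies):

```python
import numpy as np

def absolute_trajectory_error(est, gt):
    """Root-mean-square error between corresponding estimated and
    ground-truth positions (alignment step omitted)."""
    return float(np.sqrt(((est - gt) ** 2).sum(axis=1).mean()))

gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])       # simulator ground truth
est = np.array([[0.0, 0.1], [1.0, -0.1], [2.0, 0.1]])     # localization output
ate = absolute_trajectory_error(est, gt)
```

A simulator-generated data set makes `gt` exact by construction, which is precisely what real-life test flights struggle to provide.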


Sensor Review ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Vishakha Pareek ◽  
Santanu Chaudhury ◽  
Sanjay Singh

Purpose: The electronic nose (e-nose) is an array of chemical or gas sensors coupled with a pattern-recognition framework capable of identifying and classifying odorant or non-odorant and simple or complex gases. Despite more than 30 years of research, robust e-nose devices remain elusive. Most of the challenges to reliable e-nose devices arise from the non-stationary environment and non-stationary sensor behaviour: the data distribution of the sensor-array response evolves over time, which is referred to as non-stationarity. The purpose of this paper is to provide a comprehensive introduction to the challenges related to non-stationarity in e-nose design and to review the existing literature from application, system and algorithm perspectives to give an integrated and practical view.

Design/methodology/approach: The authors discuss non-stationary data in general and the challenges that a non-stationary environment or non-stationary sensor behaviour poses for e-nose design. The challenges are categorized and discussed from the perspective of learning with data obtained from sensor systems. E-nose technology is then reviewed from the system, application and algorithmic points of view to assess its current status.

Findings: The discussion of challenges in e-nose design will benefit researchers as well as practitioners, as it presents a comprehensive view of multiple aspects of non-stationary learning, systems, algorithms and applications for e-noses. The paper reviews pattern-recognition techniques and the public data sets commonly used in olfactory research, presents generic techniques for learning in non-stationary environments, and discusses future research directions and major open problems in handling non-stationarity in e-nose design.

Originality/value: The authors review, for the first time, the literature on learning with e-noses in non-stationary environments together with generic pattern-recognition algorithms for non-stationary learning, bridging the gap between the two. They also detail publicly available sensor-array data sets, which will benefit upcoming researchers in this field, and emphasize several open problems and future directions that should be considered to provide efficient solutions that handle non-stationarity and make the e-nose the next everyday device.
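What non-stationarity means for a sensor array can be made concrete with a toy drift check: compare the mean response of an early window against a recent one, standardized by the pooled spread. This is an illustration of the problem, not a technique drawn from the surveyed literature.

```python
import numpy as np

def drift_score(window_old, window_new):
    """Toy non-stationarity check: standardized shift between the mean
    responses of an early and a recent window of one sensor's readings.
    A large score flags a drifting baseline that a classifier trained
    on the early data would silently mishandle."""
    pooled_sd = np.sqrt((window_old.var() + window_new.var()) / 2) + 1e-12
    return abs(window_new.mean() - window_old.mean()) / pooled_sd

rng = np.random.default_rng(1)
stable = drift_score(rng.normal(0.0, 1.0, 200), rng.normal(0.05, 1.0, 200))
drifted = drift_score(rng.normal(0.0, 1.0, 200), rng.normal(2.0, 1.0, 200))
```

A static pattern-recognition pipeline sees only the feature vectors, so without a monitor of this kind the distribution shift goes unnoticed until accuracy collapses.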


2004 ◽  
Vol 101 (Supplement3) ◽  
pp. 326-333 ◽  
Author(s):  
Klaus D. Hamm ◽  
Gunnar Surber ◽  
Michael Schmücking ◽  
Reinhard E. Wurm ◽  
Rene Aschenbach ◽  
...  

Object. Innovative new software solutions may enable image fusion to produce the desired data superposition for precise target definition and follow-up studies in radiosurgery/stereotactic radiotherapy of patients with intracranial lesions. The aim is to integrate anatomical and functional information completely into radiation treatment planning and to achieve exact comparisons in follow-up examinations. The special conditions and advantages of BrainLAB's fully automatic image fusion system are evaluated and described for this purpose.

Methods. In 458 patients, radiation treatment planning and some follow-up studies were performed using an automatic image fusion technique involving different imaging modalities. Each fusion was visually checked and corrected as necessary. The computerized tomography (CT) scans for radiation treatment planning (slice thickness 1.25 mm), as well as stereotactic angiography for arteriovenous malformations, were acquired with head fixation using a stereotactic arc or, in the case of stereotactic radiotherapy, a relocatable stereotactic mask. Different magnetic resonance (MR) imaging sequences (T1, T2, and fluid-attenuated inversion-recovery images) and positron emission tomography (PET) scans were obtained without head fixation. Fusion results and their effects on radiation treatment planning and follow-up studies were analyzed. The precision of the automatic fusion depended primarily on image quality, especially the slice thickness and, for MR images, the field homogeneity, as well as on patient movement during data acquisition. Fully automated image fusion of the different MR, CT, and PET studies was performed for each patient. Only in a few cases was it necessary to correct the fusion manually after visual evaluation; these corrections were minor and did not materially affect treatment planning. High-quality fusion of thin slices of a region of interest with a complete head data set could be performed easily. The target volume for radiation treatment planning could be accurately delineated using the multimodal information provided by CT, MR, angiography, and PET studies. Fusion of follow-up image data sets yielded results that could be successfully compared and quantitatively evaluated.

Conclusions. Depending on the quality of the originally acquired images, automated image fusion can be a very valuable tool, allowing fast (∼1–2 minutes) and precise fusion of all relevant data sets. Fused multimodality imaging improves target volume definition for radiation treatment planning. High-quality follow-up image data sets should be acquired for image fusion to provide exactly comparable slices and volumetric results that will contribute to quality control.
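The abstract does not name the similarity measure behind the automatic fusion; mutual information computed from the joint grey-level histogram is the standard criterion for multimodal (CT/MR/PET) rigid registration, so it is assumed here. A registration engine would search over rigid transforms for the pose that maximizes this score.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Mutual information of two aligned images from their joint
    grey-level histogram; higher means better multimodal alignment."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()                    # joint intensity distribution
    px = p.sum(axis=1, keepdims=True)          # marginal of image a
    py = p.sum(axis=0, keepdims=True)          # marginal of image b
    nz = p > 0                                 # avoid log(0)
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
img = rng.random((64, 64))
mi_aligned = mutual_information(img, img)              # identical images
mi_random = mutual_information(img, rng.random((64, 64)))  # unrelated images
```

Unlike a squared-intensity difference, this score rewards any consistent statistical relationship between the two modalities' grey levels, which is why it works across CT, MR and PET.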


2019 ◽  
Vol 45 (9) ◽  
pp. 1183-1198
Author(s):  
Gaurav S. Chauhan ◽  
Pradip Banerjee

Purpose: Recent papers on target capital structure show that debt ratios seem to vary widely across firms and over time, implying that functional specifications of target debt ratios are of little empirical use. Further, target behavior cannot be adjudged correctly using debt ratios alone, as they could revert for purely mechanical reasons. The purpose of this paper is to develop an alternative strategy for testing target capital structure.

Design/methodology/approach: The authors treat a major "shock" to the debt ratio as an event and interpret the subsequent reversion as movement toward a mean or target debt ratio. By doing this, they no longer need to specify target debt ratios as a function of firm-specific variables or any other rigid functional form.

Findings: In line with the broad empirical evidence from developed economies, there is no perceptible and systematic mean reversion by Indian firms. However, unlike in developed countries, debt is used extensively to finance firms' marginal financing deficits, while equity is used rather sparingly.

Research limitations/implications: The trade-off theory can be convincingly refuted, at least for the emerging market of India. The paper should stimulate further research into the reasons for the specific financing behavior of emerging-market firms.

Practical implications: The results show that firms' financing choices depend not only on their own firm-specific variables but also on the financial markets in which they operate.

Originality/value: This study assesses mean reversion in debt ratios in a unique but reassuring manner. The results are confirmed by extensive calibration of the testing strategy using simulated data sets.
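Reversion toward a target after a shock is usually formalized as a partial-adjustment process, in which each period closes a fixed fraction of the remaining gap. The toy simulation below shows the pattern the testing strategy looks for; the parameter values are illustrative, not estimates from the paper.

```python
import numpy as np

def simulate_reversion(d0, target, speed, shock, n=10):
    """Partial-adjustment sketch: a one-time 'shock' pushes the debt
    ratio away from 'target'; each period closes a fraction 'speed'
    of the remaining gap. No functional form for the target is needed
    beyond its level."""
    d = [d0 + shock]
    for _ in range(n):
        d.append(d[-1] + speed * (target - d[-1]))
    return np.array(d)

path = simulate_reversion(d0=0.30, target=0.30, speed=0.4, shock=0.20)
```

Targeting behavior predicts a monotone decay of the post-shock gap; the paper's finding of no systematic reversion means observed paths do not look like this one.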


2021 ◽  
Vol 13 (14) ◽  
pp. 2686
Author(s):  
Di Wei ◽  
Yuang Du ◽  
Lan Du ◽  
Lu Li

Existing Synthetic Aperture Radar (SAR) image target detection methods based on convolutional neural networks (CNNs) have achieved remarkable performance, but they require a large number of target-level labeled training samples. Moreover, in SAR images with complex scenes, some clutter closely resembles targets, making detection very difficult. Therefore, a SAR target detection network based on semi-supervised learning and an attention mechanism is proposed in this paper. Since an image-level label simply marks whether the image contains the target of interest, it is easier to obtain than a target-level label; the proposed method therefore trains the network with a small number of target-level labeled samples and a large number of image-level labeled samples using a semi-supervised learning algorithm. The network consists of a detection branch and a scene recognition branch that share a feature extraction module and an attention module. The feature extraction module extracts deep features from the input SAR images, and the attention module guides the network to focus on the target of interest while suppressing clutter. During semi-supervised learning, the target-level labeled samples pass through the detection branch, while the image-level labeled samples pass through the scene recognition branch. At test time, exploiting the global scene information in SAR images, a novel coarse-to-fine detection procedure is applied: after coarse scene recognition determines whether the input SAR image contains the target of interest, fine target detection is performed only on images that may contain a target. Experimental results on a measured SAR dataset demonstrate that the proposed method achieves better performance than existing methods.
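The coarse-to-fine procedure reduces, in outline, to gating the detector with the scene-recognition score: only images the cheap coarse stage flags as target-containing reach the expensive fine stage. The branch implementations below are toy stand-ins for the two network branches.

```python
def coarse_to_fine(image, scene_score, detector, threshold=0.5):
    """Coarse-to-fine sketch: run scene recognition first and invoke
    the detector only when the scene plausibly contains a target.
    'scene_score' and 'detector' stand in for the two branches."""
    if scene_score(image) < threshold:
        return []                      # coarse stage: no target in this scene
    return detector(image)             # fine stage: localize the targets

# toy stand-ins: "score" a scene by its mean pixel, "detect" one fixed box
score = lambda im: sum(im) / len(im)
detect = lambda im: [(0, 0, 4, 4)]
hits = coarse_to_fine([0.9, 0.8], score, detect)    # bright scene -> detect
empty = coarse_to_fine([0.1, 0.2], score, detect)   # dark scene -> skip
```

Besides saving computation, the gate suppresses false alarms from target-like clutter in scenes the recognition branch confidently labels as target-free.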


2021 ◽  
Vol 16 (1) ◽  
pp. 1-24
Author(s):  
Yaojin Lin ◽  
Qinghua Hu ◽  
Jinghua Liu ◽  
Xingquan Zhu ◽  
Xindong Wu

In multi-label learning, label correlations commonly exist in the data. Such correlations not only provide useful information but also impose significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data and to use features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the embedding space is created from a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), combines label-specific features, label correlation, and a weighted-ensemble principle into one learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. It then utilizes label correlation to optimize the margin distribution of the base classifiers induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves a maximum-margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
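The label-specific feature construction (clustering one label's positive and negative instances and re-representing every instance by its distances to the resulting centers) follows the spirit of LIFT-style methods. The sketch below takes precomputed centers to stay short; the clustering step itself and the 2-D data are illustrative assumptions.

```python
import numpy as np

def label_specific_features(X, centers_pos, centers_neg):
    """LIFT-style sketch: represent each instance by its Euclidean
    distances to cluster centers drawn from one label's positive and
    negative instances, giving a feature space customized to that label."""
    centers = np.vstack([centers_pos, centers_neg])
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2)                 # (n_instances, n_centers) new features

X = np.array([[0.0, 0.0], [4.0, 4.0]])
feats = label_specific_features(X,
                                np.array([[0.0, 0.0]]),   # center of positives
                                np.array([[4.0, 4.0]]))   # center of negatives
```

Each label gets its own such space; MULFE's contribution is to ensemble the base classifiers trained in these spaces with label-correlation-based weights rather than treating each label in isolation.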

