Spatial and temporal learning representation for end-to-end recording device identification

Author(s):  
Chunyan Zeng ◽  
Dongliang Zhu ◽  
Zhifeng Wang ◽  
Minghu Wu ◽  
Wei Xiong ◽  
...  

Abstract Deep learning techniques have achieved promising results in recording device source identification. Recording device source features contain both spatial information and certain temporal information. However, most deep learning-based recording device source identification methods learn only spatial representations from these features and therefore cannot make full use of the available recording device source information. In this paper, to fully exploit both the spatial and temporal information of the recording device source, we propose a new identification method that fuses spatial and temporal feature information within an end-to-end framework. From a feature perspective, we design two networks to extract the spatial and temporal information of the recording device source. We then use an attention mechanism to adaptively assign weights to the spatial and temporal information and obtain fused features. From a model perspective, our end-to-end framework learns deep representations from the spatial and temporal features and is trained with deep and shallow losses to jointly optimize the network. The method is compared with our previous work and a baseline system. The results show that the proposed method outperforms both under general conditions.
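The abstract does not give the exact form of the attention-based fusion; a minimal sketch of one common formulation, in which each branch embedding is scored against a learned weight vector and the softmax of the scores weights the combination, might look like this (NumPy; the weight vector `w` stands in for learned parameters and is an assumption, not the authors' architecture):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(spatial, temporal, w):
    # stack the two branch embeddings, score each with a shared weight
    # vector, and combine them with the resulting attention weights
    feats = np.stack([spatial, temporal])   # shape (2, d)
    alpha = softmax(feats @ w)              # one weight per branch
    return alpha[0] * spatial + alpha[1] * temporal
```

With a zero score vector the two branches receive equal weight, so the fused feature is simply their mean; training would move `w` so that the more informative branch dominates.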

2021 ◽  
Vol 13 (9) ◽  
pp. 1732
Author(s):  
Hadis Madani ◽  
Kenneth McIsaac

Pixel-wise classification of hyperspectral images (HSIs) from remote sensing data is a common approach for extracting information about scenes. In recent years, approaches based on deep learning techniques have gained wide applicability. An HSI dataset can be viewed either as a collection of images, each captured at a different wavelength, or as a collection of spectra, each associated with a specific point (pixel). Enhanced classification accuracy is enabled when spectral and spatial information are combined in the input vector, allowing classification not only according to spectral type but also according to geometric relationships. In this study, we propose a novel spatial feature vector that improves accuracy in pixel-wise classification. Our proposed feature vector is based on the distance transform of the pixels with respect to the dominant edges in the input HSI. In other words, we allow the location of pixels within geometric subdivisions of the dataset to modify the contribution of each pixel to the spatial feature vector. Moreover, we use the extended multi-attribute profile (EMAP) features to add more geometric features to the proposed spatial feature vector. We performed experiments on three hyperspectral datasets. In addition to the Salinas and University of Pavia datasets, which are commonly used in HSI research, we include samples from our Surrey BC dataset. The results of our proposed method compare favorably to traditional algorithms as well as to some recently published deep learning-based algorithms.
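The core of the proposed spatial feature, the distance transform of pixels with respect to dominant edges, can be sketched with SciPy's Euclidean distance transform (the edge-detection step that produces `edge_mask` is assumed to have run already; this is an illustration, not the paper's pipeline):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def edge_distance_feature(edge_mask):
    # Distance of every pixel to the nearest dominant edge.
    # distance_transform_edt measures distance to the nearest zero
    # element, so the boolean edge mask is inverted first.
    return distance_transform_edt(~edge_mask)
```

The resulting per-pixel distance can then be appended to (or used to weight) the spectral and EMAP features in the input vector.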


2020 ◽  
Vol 10 (3) ◽  
pp. 966
Author(s):  
Zeyu Jiao ◽  
Guozhu Jia ◽  
Yingjie Cai

In this study, we consider fully automated action recognition based on deep learning in an industrial environment. In contrast to most existing methods, which rely on professional knowledge to construct complex hand-crafted features, or use only basic deep learning methods such as convolutional neural networks (CNNs) to extract information from images of the production process, we exploit a novel and effective method that integrates multiple deep learning networks, including CNNs, spatial transformer networks (STNs), and graph convolutional networks (GCNs), to process video data in industrial workflows. The proposed method extracts both spatial and temporal information from video data. The spatial information is extracted by estimating the human pose in each frame, yielding a skeleton image of the human body per frame. Multi-frame skeleton images are then processed by the GCN to obtain temporal information, so that action recognition results are predicted automatically. By training on a large human action dataset, Kinetics, we apply the proposed method to a real-world industrial environment and achieve superior performance compared with existing methods.
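The GCN stage that aggregates skeleton-joint features is not detailed in the abstract; a single symmetric-normalized graph-convolution step of the kind typically used on skeleton data can be sketched as follows (the adjacency `A` encodes which joints are connected, and `W` stands in for learned weights; both are assumptions for illustration):

```python
import numpy as np

def gcn_layer(X, A, W):
    # One graph-convolution step: aggregate joint features over the
    # skeleton adjacency (with self-loops added), normalize symmetrically,
    # then apply a linear map followed by ReLU.
    A_hat = A + np.eye(A.shape[0])                       # self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)
```

Stacking such layers over per-frame skeleton graphs (or a spatio-temporal graph linking the same joint across frames) is how temporal information is typically mined from skeleton sequences.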


2020 ◽  
Vol 16 (4) ◽  
pp. 413-425
Author(s):  
Chunyan Zeng ◽  
Dongliang Zhu ◽  
Zhifeng Wang ◽  
Zhenghui Wang ◽  
Nan Zhao ◽  
...  

Purpose Most source recording device identification models for Web media forensics rely on a single feature to complete the identification task and often suffer from long identification times and poor accuracy. The purpose of this paper is to propose a new end-to-end network method for multi-feature fusion device source identification. Design/methodology/approach This paper proposes an efficient multi-feature fusion source recording device identification method based on an end-to-end framework and an attention mechanism, so as to achieve efficient and convenient identification of recording devices for Web media forensics. Findings The authors conducted extensive experiments to demonstrate the effectiveness of the proposed models. The experiments show that the accuracy of the end-to-end system is improved by 7.1% over the baseline i-vector system and by 0.4% over the authors' previous system, while the training time is reduced by 50%. Research limitations/implications With the development of Web media forensics and internet technology, the use of Web media as evidence is increasing. In particular, it is important to study the authenticity and accuracy of Web media audio. Originality/value This paper aims to promote the development of source recording device identification and to provide effective technology for Web media forensics and judicial record evidence that require device source identification technology.


2020 ◽  
Vol 12 (22) ◽  
pp. 9490
Author(s):  
Hao Zhen ◽  
Dongxiao Niu ◽  
Min Yu ◽  
Keke Wang ◽  
Yi Liang ◽  
...  

The inherent intermittency and uncertainty of wind power bring challenges to accurate wind power output forecasting and cause tricky problems in integrating wind power into the grid. In this paper, a hybrid deep learning model, the bidirectional long short-term memory-convolutional neural network (BiLSTM-CNN), is proposed for short-term wind power forecasting. First, grey correlation analysis is utilized to select the inputs for the forecasting model; then, the proposed hybrid model extracts multi-dimensional features of the inputs to predict wind power from a temporal-spatial perspective, where the Bi-LSTM model mines the bidirectional temporal characteristics while the convolution and pooling operations of the CNN extract the spatial characteristics from multiple input time series. Lastly, a case study is conducted to verify the superiority of the proposed model. Other deep learning models (Bi-LSTM, LSTM, CNN, LSTM-CNN, CNN-BiLSTM, CNN-LSTM) are also simulated for comparison from three aspects. The results show that the BiLSTM-CNN model has the best accuracy, with the lowest RMSE of 2.5492, MSE of 6.4984 and MAE of 1.7344, and the highest R2 of 0.9929. CNN is the fastest, with an average computational time of 0.0741 s. The hybrid model that mines the spatial features based on the extracted temporal features performs better than the model that mines the temporal features based on the extracted spatial features.
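The four error measures reported above (RMSE, MSE, MAE, R2) are standard point-forecast metrics; for reference, they can be computed as follows (a sketch, not the authors' evaluation code):

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    # standard point-forecast error measures: MSE, RMSE, MAE and R^2
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        "R2": 1.0 - float(np.sum(err ** 2)) / ss_tot,
    }
```

A perfect forecast yields RMSE = MSE = MAE = 0 and R2 = 1, which is why the paper reports the lowest error values together with the highest R2.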


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Chao Che ◽  
Peiliang Zhang ◽  
Min Zhu ◽  
Yue Qu ◽  
Bo Jin

Abstract Background Heart disease diagnosis is a challenging task, and it is important to extract useful information from the massive number of electrocardiogram (ECG) records of patients. High-precision diagnostic identification of ECG can save clinicians and cardiologists considerable time while helping to reduce the possibility of misdiagnosis. Currently, some deep learning-based methods can effectively perform feature selection and classification prediction, reducing the consumption of manpower. Methods In this work, an end-to-end deep learning framework based on a convolutional neural network (CNN) is proposed for ECG signal processing and arrhythmia classification. In the framework, a transformer network is embedded in the CNN to capture the temporal information of ECG signals, and a new link constraint is introduced into the loss function to enhance the classification ability of the embedding vector. Results To evaluate the proposed method, extensive experiments on real-world data were conducted. Experimental results show that the proposed model achieves better performance than most baselines. The results also show that the transformer network pays more attention to the temporal continuity of the data and captures its hidden deep features well. The link constraint strengthens the constraint on the embedded features and effectively suppresses the effect of data imbalance on the results. Conclusions In this paper, an end-to-end model is used to process ECG signals and classify arrhythmia. The model combines CNN and transformer networks to extract temporal information from the ECG signal and is capable of performing arrhythmia classification with acceptable accuracy. The model can help cardiologists perform assisted diagnosis of heart disease and improve the efficiency of healthcare delivery.
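The exact form of the link constraint is not given in the abstract; one generic pairwise formulation, pulling same-class embeddings together and pushing different-class embeddings at least a margin apart, can be sketched as follows (the contrastive form and the `margin` value are assumptions, not the paper's definition):

```python
import numpy as np

def link_constraint_loss(embeddings, labels, margin=1.0):
    # Average over all embedding pairs: same-class pairs are penalized
    # by their squared distance, different-class pairs by how far they
    # fall inside the margin.
    embeddings = np.asarray(embeddings, dtype=float)
    loss, n_pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = np.linalg.norm(embeddings[i] - embeddings[j])
            if labels[i] == labels[j]:
                loss += d ** 2
            else:
                loss += max(0.0, margin - d) ** 2
            n_pairs += 1
    return loss / n_pairs
```

A term of this shape, added to the classification loss, is one way such a constraint can counteract class imbalance: minority-class embeddings are explicitly kept apart from majority-class ones rather than being absorbed into them.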


2021 ◽  
Vol 13 (2) ◽  
pp. 274
Author(s):  
Guobiao Yao ◽  
Alper Yilmaz ◽  
Li Zhang ◽  
Fei Meng ◽  
Haibin Ai ◽  
...  

Available stereo matching algorithms produce a large number of false-positive matches, or only a few true positives, across oblique stereo images with large baselines. This undesirable result is due to the complex perspective deformation and radiometric distortion across the images. To address this problem, we propose a novel affine-invariant feature matching algorithm with subpixel accuracy based on an end-to-end convolutional neural network (CNN). In our method, we adopt and modify a Hessian affine network, which we refer to as IHesAffNet, to obtain affine-invariant Hessian regions within a deep learning framework. To improve the correlation between corresponding features, we introduce an empirical weighted loss function (EWLF) based on negative samples selected by K nearest neighbors, and then generate highly discriminative deep learning-based descriptors with our multiple hard network structure (MTHardNets). Following this step, conjugate features are produced by using the Euclidean distance ratio as the matching metric, and the accuracy of the matches is optimized through deep learning transform-based least square matching (DLT-LSM). Finally, experiments on large-baseline oblique stereo images acquired from the ground at close range and by unmanned aerial vehicle (UAV) verify the effectiveness of the proposed approach, and comprehensive comparisons demonstrate that our matching algorithm outperforms state-of-the-art methods in terms of accuracy, distribution and correct ratio. The main contributions of this article are: (i) the proposed MTHardNets can generate high-quality descriptors; and (ii) IHesAffNet can produce substantial affine-invariant corresponding features with reliable transform parameters.
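The Euclidean distance ratio mentioned above is the classic ratio test: a candidate match is accepted only when the nearest descriptor is clearly closer than the second nearest. A sketch (the 0.8 threshold is an assumption; the paper's value is not stated in the abstract):

```python
import numpy as np

def ratio_match(desc_a, desc_b, ratio=0.8):
    # Keep a match only when the nearest neighbour in desc_b is clearly
    # closer than the second nearest (Euclidean distance-ratio test).
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]            # nearest, second nearest
        if dists[j] < ratio * dists[k]:
            matches.append((i, int(j)))
    return matches
```

Matches surviving this filter would then be refined to subpixel accuracy by the least-squares matching step described above.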


2021 ◽  
Vol 13 (8) ◽  
pp. 1602
Author(s):  
Qiaoqiao Sun ◽  
Xuefeng Liu ◽  
Salah Bourennane

Deep learning models have strong feature-learning abilities and have been successfully applied to hyperspectral images (HSIs). However, training most deep learning models requires labeled samples, and collecting labeled samples for HSI is labor-intensive. In addition, only single-level features from a single layer are usually considered, which may result in the loss of important information. Using multiple networks to obtain multi-level features is one solution, but at the cost of longer training time and higher computational complexity. To solve these problems, a novel unsupervised multi-level feature extraction framework based on a three-dimensional convolutional autoencoder (3D-CAE) is proposed in this paper. The designed 3D-CAE is stacked from fully 3D convolutional layers and 3D deconvolutional layers, which allows the spectral-spatial information of targets to be mined simultaneously. Moreover, the 3D-CAE can be trained in an unsupervised way without labeled samples, and the multi-level features are obtained directly from the encoded layers at different scales and resolutions, which is more efficient than using multiple networks. The effectiveness of the proposed multi-level features is verified on two hyperspectral data sets. The results demonstrate that the proposed method holds great promise for unsupervised feature learning and can further improve hyperspectral classification compared with single-level features.


2021 ◽  
pp. 0309524X2199826
Author(s):  
Guowei Cai ◽  
Yuqing Yang ◽  
Chao Pan ◽  
Dian Wang ◽  
Fengjiao Yu ◽  
...  

Multi-step real-time prediction based on the spatial correlation of wind speed is a research hotspot for large-scale wind power grid integration, and this paper proposes a multi-location, multi-step wind speed combination prediction method based on the spatial correlation of wind speed. Correlation coefficients were determined by gray relational analysis for each turbine in the wind farm. On this basis, timing-controlled spatial association optimization is used for optimization and scheduling, obtaining spatial information on the typical turbine and its neighborhood. This spatial information is reconstructed to improve the efficiency of spatial feature extraction. The reconstructed spatio-temporal information is input into a convolutional neural network with memory cells, and spatial feature extraction and multi-step real-time prediction are carried out, avoiding the problem of missing information affecting prediction accuracy. The method is innovative in terms of both efficiency and accuracy, and its prediction accuracy and generalization ability are verified by predicting wind speed and wind power for different wind farms.
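Gray relational analysis, used above to score each turbine against a reference, assigns each candidate series a grade in (0, 1] based on pointwise deviations from the reference. A compact sketch, with the conventional distinguishing coefficient ρ = 0.5 and assuming the series are already normalized to a common scale:

```python
import numpy as np

def gray_relational_grade(reference, series, rho=0.5):
    # Gray relational grade of each candidate series vs. the reference.
    # delta holds pointwise absolute deviations; the classic coefficient
    # formula maps small deviations to grades near 1.
    reference = np.asarray(reference, dtype=float)
    series = np.asarray(series, dtype=float)
    delta = np.abs(series - reference)            # (n_series, n_steps)
    d_min, d_max = delta.min(), delta.max()
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)                     # one grade per series
```

Turbines whose grade exceeds a chosen threshold would be treated as the correlated neighborhood of the typical turbine.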


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Kwang-Hyun Uhm ◽  
Seung-Won Jung ◽  
Moon Hyung Choi ◽  
Hong-Kyu Shin ◽  
Jae-Ik Yoo ◽  
...  

Abstract In 2020, an estimated 73,750 kidney cancer cases were diagnosed, and 14,830 people died from the disease in the United States. Preoperative multi-phase abdominal computed tomography (CT) is often used to detect lesions and classify histologic subtypes of renal tumor to avoid unnecessary biopsy or surgery. However, there is inter-observer variability due to subtle differences in the imaging features of tumor subtypes, which makes treatment decisions challenging. While deep learning has recently been applied to the automated diagnosis of renal tumors, classification across a wide range of subtype classes has not been sufficiently studied. In this paper, we propose an end-to-end deep learning model for the differential diagnosis of five major histologic subtypes of renal tumors, including both benign and malignant tumors, on multi-phase CT. Our model is a unified framework that simultaneously identifies lesions and classifies subtypes for diagnosis without manual intervention. We trained and tested the model using CT data from 308 patients who underwent nephrectomy for renal tumors. The model achieved an area under the curve (AUC) of 0.889 and outperformed radiologists for most subtypes. We further validated the model on an independent dataset of 184 patients from The Cancer Imaging Archive (TCIA). The AUC for this dataset was 0.855, and the model performed comparably to the radiologists. These results indicate that our model can achieve similar or better diagnostic performance than radiologists in differentiating a wide range of renal tumors on multi-phase CT.
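The AUC figures quoted above are the standard area under the ROC curve, which can be computed directly from score ranks via the Mann-Whitney statistic (a sketch assuming no tied scores, not the authors' evaluation code):

```python
import numpy as np

def auc_score(labels, scores):
    # Rank-based AUC (Mann-Whitney): the probability that a randomly
    # chosen positive case scores higher than a randomly chosen negative.
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # 1-based ranks
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))
```

An AUC of 0.889 therefore means that in roughly 89% of positive/negative case pairs the model ranks the positive case higher.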

