An empirical comparison of recurrent neural network models on authorship analysis tasks

Author(s):  
Nils Schaetti

In the last few years, a machine learning field named Deep Learning (DL) has improved the results of several challenging tasks, mainly in the field of computer vision. Deep architectures such as Convolutional Neural Networks (CNNs) have been shown to be very powerful for computer vision tasks. For tasks related to language and time series, state-of-the-art models such as Long Short-Term Memory (LSTM) have a recurrent component that takes into account the order of inputs and is able to memorise them. Among the tasks related to Natural Language Processing (NLP), an important problem in computational linguistics is authorship attribution, where the goal is to find the true author of a text or, from an author profiling perspective, to extract information such as gender, origin and socio-economic background. However, few works have tackled the issue of authorship analysis with recurrent neural networks (RNNs). Consequently, we decided to explore in this study the performance of several recurrent neural models, such as Echo State Networks (ESNs), LSTMs and Gated Recurrent Units (GRUs), on three authorship analysis tasks. The first is the classical authorship attribution task using the Reuters C50 dataset, where models have to predict the true author of a document from a set of candidate authors. The second task is referred to as author profiling, as the model must determine the gender (male/female) of the author of a set of tweets using the PAN 2017 dataset from the CLEF conference. The third task is referred to as author verification and uses an in-house dataset named SFGram, composed of dozens of science-fiction magazines from the 50s to the 70s. This task is separated into two problems. In the first, the goal is to extract passages written by a particular author inside a magazine co-written by several dozen authors. The second is to find out whether a magazine contains passages written by a particular author.
In order for our research to be applicable in authorship studies, we limited the evaluated models to those with a so-called many-to-many architecture. This fulfills a fundamental constraint of the field of stylometry, which is the ability to provide evidence for each prediction made. To evaluate these three models, we defined a set of experiments, performance measures and hyperparameters that could impact the output. We carried out these experiments with each model and its corresponding hyperparameters. Then we used statistical tests to detect significant differences between these models, and with state-of-the-art baseline methods in authorship analysis. Our results show that shallow and simple RNNs such as ESNs can be competitive with traditional methods in authorship studies while keeping a learning time that can be used in practice and a reasonable number of parameters. These properties allow them to outperform much more complex neural models such as LSTMs and GRUs, considered as state of the art in NLP. We also show that pretraining word and character features can be useful on stylometry problems if these are trained on a similar dataset. Consequently, interesting results are achievable on such tasks, where the quantity of data is limited and which are therefore difficult to solve for deep learning methods. We also show that representations based on words and combinations of three characters (trigrams) are the most effective for this kind of method. Finally, we draw a landscape of possible research paths for the future of neural networks and deep learning methods in the field of authorship analysis.
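The character-trigram representation mentioned in this abstract can be illustrated with a minimal sketch. The function name `char_trigrams` and the sample string are hypothetical. The abstract does not describe the tokenisation or how trigrams are fed to the recurrent models, so the code only shows how overlapping three-character windows might be extracted and counted.

```python
from collections import Counter


def char_trigrams(text):
    """Return every overlapping window of three consecutive characters."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


# Hypothetical sample; a real pipeline would read documents from a corpus.
sample = "the theory"
counts = Counter(char_trigrams(sample))

# Show the two most frequent trigrams in the sample.
print(counts.most_common(2))
```

Counting trigrams this way gives a simple bag-of-trigrams profile. A sequence model would more likely consume the trigrams in order, one per time step.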

2016 ◽  
Vol 21 (9) ◽  
pp. 998-1003 ◽  
Author(s):  
Oliver Dürr ◽  
Beate Sick

Deep learning methods are currently outperforming traditional state-of-the-art computer vision algorithms in diverse applications and recently even surpassed human performance in object recognition. Here we demonstrate the potential of deep learning methods for high-content screening–based phenotype classification. We trained a deep learning classifier in the form of convolutional neural networks with approximately 40,000 publicly available single-cell images from samples treated with compounds from four classes known to lead to different phenotypes. The input data consisted of multichannel images. The construction of appropriate feature definitions was part of the training and carried out by the convolutional network, without the need for expert knowledge or handcrafted features. We compare our results against the recent state-of-the-art pipeline in which predefined features are extracted from each cell using specialized software and then fed into various machine learning algorithms (support vector machine, Fisher linear discriminant, random forest) for classification. The performance of all classification approaches is evaluated on an untouched test image set with known phenotype classes. Compared to the best reference machine learning algorithm, the misclassification rate is reduced from 8.9% to 6.6%.


Author(s):  
Dong-Dong Chen ◽  
Wei Wang ◽  
Wei Gao ◽  
Zhi-Hua Zhou

Deep neural networks have achieved great success in various real applications, but they require a large amount of labeled data for training. In this paper, we propose tri-net, a deep neural network which is able to use massive unlabeled data to help learning with limited labeled data. We consider model initialization, diversity augmentation and pseudo-label editing simultaneously. In our work, we utilize output smearing to initialize modules, use fine-tuning on labeled data to augment diversity, and eliminate unstable pseudo-labels to alleviate the influence of suspicious pseudo-labeled data. Experiments show that our method achieves the best performance in comparison with state-of-the-art semi-supervised deep learning methods. In particular, it achieves an 8.30% error rate on CIFAR-10 using only 4000 labeled examples.


Author(s):  
Marco Star ◽  
Kristoffer McKee

Data-driven machinery prognostics has seen increasing popularity recently, especially as the effectiveness of deep learning methods has grown. However, deep learning methods lack useful properties such as uncertainty quantification of their outputs, and they have a black-box nature. Neural ordinary differential equations (NODEs) use neural networks to define differential equations that propagate data from the inputs to the outputs. They can be seen as a continuous generalization of a popular network architecture used for image recognition known as the Residual Network (ResNet). This paper compares the performance of each network on machinery prognostics tasks to show the validity of NODEs in machinery prognostics. The comparison is done using NASA’s Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset, which simulates the sensor information of degrading turbofan engines. To compare both architectures, they are set up as convolutional neural networks and the sensor signals are transformed to the time-frequency domain through the short-time Fourier transform (STFT). The spectrograms from the STFT are the input images to the networks and the output is the estimated remaining useful life (RUL); hence, the task is turned into an image recognition task. The results show that NODEs can compete with state-of-the-art machinery prognostics methods. While they do not beat the state-of-the-art method, they come close enough to warrant further research into using NODEs. The potential benefits of using NODEs instead of other network architectures are also discussed in this work.
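The step of turning a sensor signal into a spectrogram through the STFT can be sketched as follows. This is a minimal pure-Python illustration, not the authors' pipeline: the function name `stft_magnitude`, the frame length, the hop size, the Hann window and the synthetic sinusoid are all assumptions chosen for readability, and a practical implementation would use an FFT library.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=16, hop=8):
    """Return a spectrogram as a list of frames, each a list of magnitudes."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT over the non-negative frequency bins 0 .. frame_len // 2.
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical sensor trace: a sinusoid completing 4 cycles per 16 samples.
trace = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
spectrogram = stft_magnitude(trace)

# Number of time frames and number of frequency bins per frame.
print(len(spectrogram), len(spectrogram[0]))
```

Each row of `spectrogram` is one time frame and each column one frequency bin, so the nested list can be treated as a single-channel image for a convolutional network.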


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1078
Author(s):  
Ibon Merino ◽  
Jon Azpiazu ◽  
Anthony Remazeilles ◽  
Basilio Sierra

Deep learning methods have been successfully applied to image processing, mainly using 2D vision sensors. Recently, the rise of depth cameras and other similar 3D sensors has opened the field for new perception techniques. Nevertheless, 3D convolutional neural networks perform slightly worse than other 3D deep learning methods, and even worse than their 2D versions. In this paper, we propose to improve 3D deep learning results by transferring the pretrained weights learned in 2D networks to their corresponding 3D versions. Using an industrial object recognition context, we have analyzed different combinations of 3D convolutional networks (VGG16, ResNet, Inception ResNet, and EfficientNet), comparing the recognition accuracy. The highest accuracy, 0.9217, is obtained with EfficientNetB0 using extrusion, which is comparable to state-of-the-art methods. We also observed that the transfer approach improved the accuracy of the Inception ResNet 3D version by up to 18% with respect to the score of the 3D approach alone.


Author(s):  
Harold Erbin ◽  
Riccardo Finotello ◽  
Robin Schneider ◽  
Mohamed Tamaazousti

We continue earlier efforts in computing the dimensions of tangent space cohomologies of Calabi-Yau manifolds using deep learning. In this paper, we consider the dataset of all Calabi-Yau four-folds constructed as complete intersections in products of projective spaces. Employing neural networks inspired by state-of-the-art computer vision architectures, we improve earlier benchmarks and demonstrate that all four non-trivial Hodge numbers can be learned at the same time using a multi-task architecture. With 30 % (80 %) training ratio, we reach an accuracy of 100 % for h(1,1) and 97 % for h(2,1) (100 % for both), 81 % (96 %) for h(3,1), and 49 % (83 %) for h(2,2). Assuming that the Euler number is known, as it is easy to compute, and taking into account the linear constraint arising from index computations, we get 100 % total accuracy.
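For orientation, the linear constraint and the Euler number referred to in this abstract are usually written, for a Calabi-Yau four-fold, as below. The abstract does not spell them out, so the notation is an assumption; the relations themselves are the standard ones for four-folds.

```latex
\begin{align}
h^{2,2} &= 2\left(22 + 2h^{1,1} + 2h^{3,1} - h^{2,1}\right), \\
\chi &= 6\left(8 + h^{1,1} + h^{3,1} - h^{2,1}\right).
\end{align}
```

As a check, the sextic hypersurface in $\mathbb{P}^5$ has $h^{1,1}=1$, $h^{2,1}=0$ and $h^{3,1}=426$, which gives $h^{2,2}=1752$ and $\chi=2610$. With $\chi$ known, the two relations leave only two of the four Hodge numbers independent.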


Nowadays diabetes affects many people and causes an eye disease called diabetic retinopathy, but many are not aware of it, so it can lead to blindness. Diabetes sustained over a prolonged time harms the blood vessels of the retina, thereby affecting an individual's vision and leading to diabetic retinopathy. Diabetic retinopathy is classified into two classes, non-proliferative diabetic retinopathy (NPDR) and proliferative diabetic retinopathy (PDR). Detection of diabetic retinopathy in fundus imagery is done by computer vision and deep learning methods using artificial neural networks. The neural networks are trained on images from diabetic retinopathy datasets. Based on the training datasets, we can detect whether a person has (i) no diabetic retinopathy, (ii) mild non-proliferative diabetic retinopathy, (iii) severe non-proliferative diabetic retinopathy or (iv) proliferative diabetic retinopathy.


2020 ◽  
Author(s):  
Aikaterini Symeonidi ◽  
Anguelos Nicolaou ◽  
Frank Johannes ◽  
Vincent Christlein

Deep learning methods have proved to be powerful classification tools in the fields of structural and functional genomics. In this paper, we introduce a Recursive Convolutional Neural Network (RCNN) for the analysis of epigenomic data. We focus on the task of predicting gene expression from the intensity of histone modifications. The proposed RCNN architecture can be applied to data of an arbitrary size, and has a single meta-parameter that quantifies the model's capacity, thus making it flexible for experimenting. The proposed architecture outperforms state-of-the-art systems, while having several orders of magnitude fewer parameters.


Author(s):  
Jianwen Jiang ◽  
Yuxuan Wei ◽  
Yifan Feng ◽  
Jingxuan Cao ◽  
Yue Gao

In recent years, graph/hypergraph-based deep learning methods have attracted much attention from researchers. These deep learning methods take graph/hypergraph structure as prior knowledge in the model. However, hidden and important relations are not directly represented in the inherent structure. To tackle this issue, we propose a dynamic hypergraph neural network framework (DHGNN), which is composed of stacked layers of two modules: dynamic hypergraph construction (DHG) and hypergraph convolution (HGC). Considering that the initially constructed hypergraph is probably not a suitable representation for the data, the DHG module dynamically updates the hypergraph structure on each layer. Then hypergraph convolution is introduced to encode high-order data relations in a hypergraph structure. The HGC module includes two phases: vertex convolution and hyperedge convolution, which are designed to aggregate features among vertices and hyperedges, respectively. We have evaluated our method on standard datasets, the Cora citation network and a Microblog dataset. Our method outperforms state-of-the-art methods. More experiments are conducted to demonstrate the effectiveness and robustness of our method to diverse data distributions.


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation demonstrated increased performance over both non-augmented data and data augmented with conventional SMILES randomization when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain, an enhancement in the pattern recognition capabilities of the underlying network to molecular motifs.
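The idea of pairing a product with the reactant SMILES variant most similar to it can be sketched with a plain Levenshtein distance. This is a simplified illustration under stated assumptions: the abstract speaks of local sub-sequence similarity, whereas the sketch compares whole strings. The helper name `closest_variant` is hypothetical, and the SMILES variants are written by hand instead of being generated by a cheminformatics toolkit.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_variant(variants, target):
    """Pick the variant with the smallest edit distance to the target."""
    return min(variants, key=lambda s: levenshtein(s, target))


# Hand-written equivalent SMILES for ethanol (assumed reactant) and a
# SMILES for acetaldehyde (assumed product), for illustration only.
reactant_variants = ["OCC", "C(C)O", "CCO"]
product = "CC=O"
print(closest_variant(reactant_variants, product))
```

Selecting the variant with the smallest distance keeps the reactant and product strings aligned as closely as possible, which is the intuition behind building training pairs this way.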


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 223
Author(s):  
Yen-Ling Tai ◽  
Shin-Jhe Huang ◽  
Chien-Chang Chen ◽  
Henry Horng-Shing Lu

Nowadays, deep learning methods with high structural complexity and flexibility inevitably lean on the computational capability of the hardware. A platform with high-performance GPUs and large amounts of memory can support neural networks with large numbers of layers and kernels. However, naively pursuing high-cost hardware would probably hold back the technical development of deep learning methods. In this article, we thus establish a new preprocessing method to reduce the computational complexity of neural networks. Inspired by the band theory of solids in physics, we map the image space isomorphically onto a non-interacting physical system and then treat image voxels as particle-like clusters. We then reconstruct the Fermi–Dirac distribution to serve as a correction function for the normalization of the voxel intensity and as a filter of insignificant cluster components. The filtered clusters can then delineate the morphological heterogeneity of the image voxels. We used the BraTS 2019 datasets and the dimensional fusion U-net for the algorithmic validation, and the proposed Fermi–Dirac correction function exhibited performance comparable to the other preprocessing methods employed. Compared with the conventional z-score normalization function and the Gamma correction function, the proposed algorithm can save at least 38% of the computational time cost under a low-cost hardware architecture. Even though the correction function of global histogram equalization has the lowest computational time among the correction functions employed, the proposed Fermi–Dirac correction function exhibits better capabilities of image augmentation and segmentation.
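The Fermi–Dirac distribution that the correction function is built on has the standard form 1 / (exp((E − μ) / kT) + 1). The sketch below only evaluates that form on a few normalized intensities. How the article maps voxel intensity to energy, and how it chooses μ and kT, is not stated in the abstract, so treating intensity directly as the energy and the parameter values used here are assumptions.

```python
import math


def fermi_dirac(energy, mu, kT):
    """Standard Fermi-Dirac occupation for a given energy."""
    return 1.0 / (math.exp((energy - mu) / kT) + 1.0)


# Hypothetical normalized voxel intensities in [0, 1], used as "energies".
intensities = [0.1, 0.5, 0.9]
mu, kT = 0.5, 0.1  # assumed threshold and smoothness parameters

corrected = [fermi_dirac(v, mu, kT) for v in intensities]
print(corrected)
```

The function acts as a smooth threshold around `mu`: values well below it map close to 1, values well above it map close to 0, and `kT` controls how sharp the transition is.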

