Learning network embeddings using small graphlets

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Luce le Gorrec ◽  
Philip A. Knight ◽  
Auguste Caen

Abstract Techniques for learning vectorial representations of graphs (graph embeddings) have recently emerged as an effective approach to facilitate machine learning on graphs. Some of the most popular methods involve sophisticated features such as graph kernels or convolutional networks. In this work, we introduce two straightforward supervised learning algorithms based on small-size graphlet counts, combined with a dimension-reduction step. The first relies on a classic feature extraction method powered by principal component analysis (PCA). The second is a feature selection procedure also based on PCA. Despite their conceptual simplicity, these embeddings are arguably more meaningful than some popular alternatives and at the same time are competitive with state-of-the-art methods. We illustrate this second point on a downstream classification task. We then use our algorithms in a novel setting, namely to conduct an analysis of author relationships in Wikipedia articles, for which we present an original dataset. Finally, we provide empirical evidence suggesting that our methods could also be adapted to unsupervised learning algorithms.
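As a minimal sketch of the feature-extraction step described in this abstract (illustrative only, not the authors' implementation), small graphlets such as edges, open wedges, and triangles can be counted directly from adjacency sets; the PCA reduction would then be applied to the resulting count vectors:

```python
from itertools import combinations

def graphlet_counts(edges):
    """Count small graphlets (edges, open wedges, triangles) in an
    undirected graph given as an iterable of (u, v) pairs. A sketch of
    the feature-extraction step; the PCA reduction step is omitted."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    wedges = triangle_paths = 0
    for node, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                triangle_paths += 1  # each triangle is seen once per centre node
            else:
                wedges += 1          # open wedge: endpoints are not adjacent
    n_edges = sum(len(s) for s in adj.values()) // 2
    return {"edges": n_edges, "wedges": wedges, "triangles": triangle_paths // 3}
```

For a supervised embedding, one such count vector per graph would be assembled into a feature matrix before the dimension-reduction step.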

2019 ◽  
Vol 12 (2) ◽  
pp. 103
Author(s):  
Kuntoro Adi Nugroho ◽  
Yudi Eko Windarto

Various methods are available to perform feature extraction on satellite images. Among the available alternatives, the deep convolutional neural network (ConvNet) is the state-of-the-art method. Although previous studies have reported successful attempts at developing and implementing ConvNets for remote sensing applications, several issues remain under-explored, such as the use of depthwise convolution, the size of the final pooling layer, and the comparison between grayscale and RGB settings. The objective of this study is to perform an analysis addressing these issues. Two feature learning algorithms were compared, namely ConvNet, as the current state of the art for satellite image classification, and the Gray Level Co-occurrence Matrix (GLCM), which represents a classic unsupervised feature extraction method. The experiment demonstrated, consistent with previous studies, that ConvNet is superior to GLCM in most cases, especially with 3x3xn final pooling. The performance of the learning algorithms is much higher on features from RGB channels, except for ConvNet with a relatively small number of features.
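As an illustrative sketch (not the study's code), a GLCM for a single pixel offset can be accumulated directly from a quantized grayscale image, and texture features such as contrast derived from it:

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Gray Level Co-occurrence Matrix for one offset (dx, dy).
    `image` is a 2-D list of integer grey levels in [0, levels)."""
    h, w = len(image), len(image[0])
    m = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                m[image[y][x]][image[y2][x2]] += 1  # co-occurring level pair
    return m

def contrast(m):
    """Contrast feature: (i - j)^2 weighted by co-occurrence frequency."""
    total = sum(sum(row) for row in m)
    return sum(m[i][j] * (i - j) ** 2
               for i in range(len(m)) for j in range(len(m))) / total
```

In practice several offsets and features (contrast, energy, homogeneity, ...) are concatenated into the feature vector fed to the classifier.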


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It can partly eliminate the defects of RNNs (LSTM, GRU), namely gradient vanishing and insufficient performance, and this yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
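The core operation of a TCN, a causal dilated 1-D convolution, can be sketched as follows (an illustration of the mechanism only, not the proposed model):

```python
def causal_dilated_conv(x, kernel, dilation=1):
    """1-D causal convolution with dilation: output[t] depends only on
    x[t], x[t-d], x[t-2d], ...; positions before the sequence start are
    treated as zero (left padding)."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        s = 0.0
        for i, w in enumerate(kernel):
            idx = t - (k - 1 - i) * dilation  # look strictly backwards in time
            s += w * (x[idx] if idx >= 0 else 0.0)
        out.append(s)
    return out
```

Stacking such layers with exponentially increasing dilation grows the receptive field without recurrence, which is why TCNs avoid the vanishing-gradient behaviour of RNNs mentioned above.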


2021 ◽  
Vol 19 (1) ◽  
pp. 205-213
Author(s):  
Hany W. Darwish ◽  
Abdulrahman A. Al Majed ◽  
Ibrahim A. Al-Suwaidan ◽  
Ibrahim A. Darwish ◽  
Ahmed H. Bakheit ◽  
...  

Abstract Five chemometric methods were established for the simultaneous determination of azilsartan medoxomil (AZM) and chlorthalidone in the presence of azilsartan, the core impurity of AZM. The full-spectrum chemometric techniques, namely partial least squares (PLS), principal component regression, and artificial neural networks (ANN), were among the applied methods. In addition, PLS and ANN were extended with a genetic algorithm (GA) wavelength selection procedure, giving the GA-PLS and GA-ANN methods. The models were developed by applying a multilevel multifactor experimental design. The predictive power of the suggested models was evaluated on a validation set containing nine mixtures with different ratios of the three analytes. All the proposed procedures were applied to the analysis of Edarbyclor® tablets, and the best results were achieved with the ANN, GA-ANN, and GA-PLS methods. These three methods served as quantitative tools for the analysis of the three components without any interference from the co-formulated excipient and without prior separation procedures. Moreover, the impact of the GA step in strengthening the predictive power of the ANN- and PLS-based models was also highlighted.
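A minimal sketch of the GA wavelength-selection idea (the fitness function and operators here are hypothetical simplifications, not the published procedure): binary masks over the wavelengths are evolved, scoring each mask by how well the selected channels predict the analyte concentration.

```python
import random

def r_squared(mask, X, y):
    """Toy fitness: R^2 of a univariate fit of y on the mean of the
    wavelengths selected by the binary mask (a deliberately simple proxy
    for the PLS/ANN model that would be fitted in practice)."""
    if not any(mask):
        return -1.0
    z = [sum(v for v, m in zip(row, mask) if m) / sum(mask) for row in X]
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    sxy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    sxx = sum((a - mz) ** 2 for a in z)
    syy = sum((b - my) ** 2 for b in y)
    return -1.0 if sxx == 0 or syy == 0 else (sxy * sxy) / (sxx * syy)

def ga_select(X, y, pop_size=20, generations=30, seed=0):
    """Evolve binary wavelength masks: keep the fitter half, refill with
    one-point crossover plus occasional bit-flip mutation."""
    rng = random.Random(seed)
    n_wl = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_wl)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: r_squared(m, X, y), reverse=True)
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_wl)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:           # occasional bit-flip mutation
                child[rng.randrange(n_wl)] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda m: r_squared(m, X, y))
```

The published GA-PLS/GA-ANN variants evaluate fitness with the full regression model; the structure of the selection loop is the same.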


Algorithms ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 226
Author(s):  
Wenzel Pilar von Pilchau ◽  
Anthony Stein ◽  
Jörg Hähner

State-of-the-art deep reinforcement learning algorithms such as DQN and DDPG use a replay buffer, called Experience Replay, that by default contains only the experiences gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We demonstrate a significantly improved overall mean reward in comparison to a DQN with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.
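The interpolation described above can be sketched as follows (an assumed interface, not the authors' implementation): real transitions are stored per (state, action) pair, and a synthetic transition combines the equally weighted average reward with one of the observed follow-up states.

```python
import random
from collections import defaultdict

class InterpolatedReplay:
    """Sketch of an interpolated experience replay for discrete,
    non-deterministic environments."""

    def __init__(self, seed=0):
        self.store = defaultdict(list)   # (state, action) -> [(reward, next_state)]
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        """Record a real transition observed from the environment."""
        self.store[(state, action)].append((reward, next_state))

    def synthetic(self, state, action):
        """Create a synthetic transition: equally weighted average reward
        plus one observed follow-up state for this (state, action) pair."""
        seen = self.store[(state, action)]
        avg_reward = sum(r for r, _ in seen) / len(seen)
        _, next_state = self.rng.choice(seen)
        return (state, action, avg_reward, next_state)
```

The synthetic transitions would be mixed into the minibatches sampled by the DQN learner alongside the real ones.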


2021 ◽  
Vol 11 (14) ◽  
pp. 6370
Author(s):  
Elena Quatrini ◽  
Francesco Costantino ◽  
David Mba ◽  
Xiaochuan Li ◽  
Tat-Hean Gan

The water purification process is becoming increasingly important to ensure the continuity and quality of subsequent production processes, and it is particularly relevant in pharmaceutical contexts. However, the difficulties arising during its monitoring are manifold. On the one hand, the process exhibits various discontinuities due to the differing characteristics of the input water. On the other hand, the monitoring itself is discontinuous and random, so continuity of the parameters is not guaranteed, which hinders a straightforward analysis. Consequently, further research on water purification processes is paramount to identify the techniques best able to guarantee good performance. Against this background, this paper proposes an application of kernel principal component analysis for fault detection in a process with the above-mentioned characteristics. Based on the temporal variability of the process, the paper suggests using past and future matrices, rather than the original dataset, as input for fault detection. In this manner, the temporal correlation between process parameters and machine health is accounted for. The proposed approach confirms that very good monitoring results can be obtained in the analyzed context.
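One plausible reading of the past/future construction (an assumption on my part, not necessarily the paper's exact definition) is a time-lagged embedding: each row stacks a window of preceding samples ("past") or subsequent samples ("future") of the multivariate process series before the kernel PCA step.

```python
def lagged_matrices(series, lag):
    """Build 'past' and 'future' matrices from a multivariate time series
    (a list of samples, each a list of process parameters). Row t of the
    past matrix stacks samples t-lag+1..t; the future matrix stacks
    samples t+1..t+lag."""
    past, future = [], []
    for t in range(lag - 1, len(series) - lag):
        past.append([v for row in series[t - lag + 1 : t + 1] for v in row])
        future.append([v for row in series[t + 1 : t + lag + 1] for v in row])
    return past, future
```

Kernel PCA would then be fitted on these lagged matrices instead of the raw dataset, so that temporal correlation enters the monitoring statistics.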


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 1993
Author(s):  
Fernando Pérez-Sanz ◽  
Miriam Riquelme-Pérez ◽  
Enrique Martínez-Barba ◽  
Jesús de la Peña-Moral ◽  
Alejandro Salazar Nicolás ◽  
...  

Liver transplantation is the only curative treatment option for patients diagnosed with end-stage liver disease. The low availability of organs demands an accurate selection procedure based on histological analysis in order to evaluate the allograft. This assessment, traditionally carried out by a pathologist, is not exempt from subjectivity. In this sense, new tools based on machine learning and artificial vision are continuously being developed for the analysis of medical images of different typologies. Accordingly, in this work, we develop a computer-vision-based application for the fast, automatic, and objective quantification of macrovesicular steatosis in histopathological liver section slides stained with Sudan stain. For this purpose, digital microscopy images were used to obtain thousands of feature vectors based on the RGB and CIE L*a*b* pixel values. These vectors were labelled, under a supervised process, as fat vacuole or non-fat vacuole, and a set of classifiers based on different algorithms was trained accordingly. The results showed an overall high accuracy for all classifiers (>0.99), with a sensitivity between 0.844 and 1 and a specificity >0.99. In terms of image classification speed, KNN and Naïve Bayes were substantially faster than the other classification algorithms. Sudan stain is a convenient technique for evaluating macrovesicular steatosis in pre-transplant liver biopsies, providing reliable contrast and facilitating fast and accurate quantification with the machine learning algorithms tested.
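The per-pixel KNN classification step can be sketched as a plain nearest-neighbour vote (illustrative only; the feature values below are made up, and in the study the vectors concatenate RGB and CIE L*a*b* components):

```python
from collections import Counter
from math import dist

def knn_classify(train, labels, query, k=3):
    """Plain k-nearest-neighbour majority vote on per-pixel feature
    vectors: rank training pixels by Euclidean distance to the query
    pixel and return the most common label among the k closest."""
    ranked = sorted(range(len(train)), key=lambda i: dist(train[i], query))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

Applying this to every pixel yields a fat / non-fat mask from which the steatosis percentage can be computed.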


Cybersecurity ◽  
2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Shushan Arakelyan ◽  
Sima Arasteh ◽  
Christophe Hauser ◽  
Erik Kline ◽  
Aram Galstyan

Abstract Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics, a tedious and time-consuming task for human analysts. In order to improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs, with applicability to a number of downstream tasks. We introduce Bin2vec, a new approach leveraging Graph Convolutional Networks (GCN) along with computational program graphs in order to learn a high-dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks: functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results, and demonstrate improvement over state-of-the-art methods for both tasks. We evaluated Bin2vec on 49,191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs, each with at least 100 CVE entries, for the vulnerability discovery task. We set a new state-of-the-art result by reducing the classification error by 40% compared to the source-code-based inst2vec approach, while working on binary code. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% for multiple classes).
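A toy sketch of the propagation rule a GCN layer applies over a program graph (illustrative of the mechanism, not Bin2vec itself): each node averages its neighbours' and its own features under symmetric degree normalization, then applies a learned linear map and a nonlinearity.

```python
import math

def matmul(A, B):
    """Naive matrix product, adequate for small illustrative examples."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gcn_layer(adj, H, W):
    """One GCN step, H' = ReLU(D^-1/2 (A + I) D^-1/2 · H · W), where `adj`
    is a dense adjacency matrix, H the node features, W the weight matrix."""
    n = len(adj)
    A = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A]
    norm = [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, H), W)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU
```

Stacking a few such layers and pooling the node features yields the program-level embedding used for the downstream classifiers.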


Author(s):  
Leandro Skowronski ◽  
Paula Martin de Moraes ◽  
Mario Luiz Teixeira de Moraes ◽  
Wesley Nunes Gonçalves ◽  
Michel Constantino ◽  
...  
