Learning network embeddings using small graphlets

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Luce le Gorrec ◽  
Philip A. Knight ◽  
Auguste Caen

Abstract Techniques for learning vectorial representations of graphs (graph embeddings) have recently emerged as an effective approach to facilitate machine learning on graphs. Some of the most popular methods involve sophisticated features such as graph kernels or convolutional networks. In this work, we introduce two straightforward supervised learning algorithms based on small-size graphlet counts, combined with a dimension-reduction step. The first relies on a classic feature extraction method powered by principal component analysis (PCA). The second is a feature selection procedure also based on PCA. Despite their conceptual simplicity, these embeddings are arguably more meaningful than some popular alternatives and at the same time are competitive with state-of-the-art methods. We illustrate this second point on a downstream classification task. We then use our algorithms in a novel setting, namely to conduct an analysis of author relationships in Wikipedia articles, for which we present an original dataset. Finally, we provide empirical evidence suggesting that our methods could also be adapted to unsupervised learning algorithms.
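As a minimal sketch of the feature-extraction step described in this abstract (illustrative only, not the authors' implementation), small graphlets such as edges, open wedges, and triangles can be counted directly from adjacency sets; the PCA reduction would then be applied to the resulting count vectors:

```python
from itertools import combinations

def graphlet_counts(edges):
    """Count small graphlets (edges, open wedges, triangles) in an
    undirected graph given as an iterable of (u, v) pairs. A sketch of
    the feature-extraction step; the PCA reduction step is omitted."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    wedges = triangle_paths = 0
    for node, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                triangle_paths += 1  # each triangle is seen once per centre node
            else:
                wedges += 1          # open wedge: endpoints are not adjacent
    n_edges = sum(len(s) for s in adj.values()) // 2
    return {"edges": n_edges, "wedges": wedges, "triangles": triangle_paths // 3}
```

For a supervised embedding, one such count vector per graph would be assembled into a feature matrix before the dimension-reduction step.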

2019 ◽  
Vol 12 (2) ◽  
pp. 103
Author(s):  
Kuntoro Adi Nugroho ◽  
Yudi Eko Windarto

Various methods are available to perform feature extraction on satellite images. Among the available alternatives, the deep convolutional neural network (ConvNet) is the state-of-the-art method. Although previous studies have reported successful attempts at developing and implementing ConvNets for remote sensing applications, several issues remain under-explored, such as the use of depthwise convolution, the size of the final pooling layer, and the comparison between grayscale and RGB settings. The objective of this study is to perform an analysis addressing these issues. Two feature learning algorithms were compared, namely ConvNet, as the current state of the art for satellite image classification, and the Gray Level Co-occurrence Matrix (GLCM), which represents a classic unsupervised feature extraction method. The experiment demonstrated, consistent with previous studies, that ConvNet is superior to GLCM in most cases, especially with 3x3xn final pooling. The performance of the learning algorithms is much higher on features from RGB channels, except for ConvNet with a relatively small number of features.
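As an illustrative sketch (not the study's code), a GLCM for a single pixel offset can be accumulated directly from a quantized grayscale image, and texture features such as contrast derived from it:

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Gray Level Co-occurrence Matrix for one offset (dx, dy).
    `image` is a 2-D list of integer grey levels in [0, levels)."""
    h, w = len(image), len(image[0])
    m = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                m[image[y][x]][image[y2][x2]] += 1  # co-occurring level pair
    return m

def contrast(m):
    """Contrast feature: (i - j)^2 weighted by co-occurrence frequency."""
    total = sum(sum(row) for row in m)
    return sum(m[i][j] * (i - j) ** 2
               for i in range(len(m)) for j in range(len(m))) / total
```

In practice several offsets and features (contrast, energy, homogeneity, ...) are concatenated into the feature vector fed to the classifier.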


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It can partly eliminate the defects of RNNs (LSTM, GRU), namely gradient vanishing and insufficient performance, and this yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
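The core operation of a TCN, a causal dilated 1-D convolution, can be sketched as follows (an illustration of the mechanism only, not the proposed model):

```python
def causal_dilated_conv(x, kernel, dilation=1):
    """1-D causal convolution with dilation: output[t] depends only on
    x[t], x[t-d], x[t-2d], ...; positions before the sequence start are
    treated as zero (left padding)."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        s = 0.0
        for i, w in enumerate(kernel):
            idx = t - (k - 1 - i) * dilation  # look strictly backwards in time
            s += w * (x[idx] if idx >= 0 else 0.0)
        out.append(s)
    return out
```

Stacking such layers with exponentially increasing dilation grows the receptive field without recurrence, which is why TCNs avoid the vanishing-gradient behaviour of RNNs mentioned above.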


2021 ◽  
Vol 19 (1) ◽  
pp. 205-213
Author(s):  
Hany W. Darwish ◽  
Abdulrahman A. Al Majed ◽  
Ibrahim A. Al-Suwaidan ◽  
Ibrahim A. Darwish ◽  
Ahmed H. Bakheit ◽  
...  

Abstract Five chemometric methods were established for the simultaneous determination of azilsartan medoxomil (AZM) and chlorthalidone in the presence of azilsartan, the core impurity of AZM. The full-spectrum chemometric techniques, namely partial least squares (PLS), principal component regression, and artificial neural networks (ANN), were among the applied methods. In addition, PLS and ANN were extended with a genetic algorithm (GA) wavelength selection procedure, giving the GA-PLS and GA-ANN methods. The models were developed by applying a multilevel multifactor experimental design. The predictive power of the suggested models was evaluated on a validation set containing nine mixtures with different ratios of the three analytes. All the proposed procedures were applied to the analysis of Edarbyclor® tablets, and the best results were achieved with the ANN, GA-ANN, and GA-PLS methods. These three methods served as quantitative tools for the analysis of the three components without any interference from the co-formulated excipient and without prior separation procedures. Moreover, the impact of the GA step in strengthening the predictive power of the ANN- and PLS-based models was also highlighted.
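A minimal sketch of the GA wavelength-selection idea (the fitness function and operators here are hypothetical simplifications, not the published procedure): binary masks over the wavelengths are evolved, scoring each mask by how well the selected channels predict the analyte concentration.

```python
import random

def r_squared(mask, X, y):
    """Toy fitness: R^2 of a univariate fit of y on the mean of the
    wavelengths selected by the binary mask (a deliberately simple proxy
    for the PLS/ANN model that would be fitted in practice)."""
    if not any(mask):
        return -1.0
    z = [sum(v for v, m in zip(row, mask) if m) / sum(mask) for row in X]
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    sxy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    sxx = sum((a - mz) ** 2 for a in z)
    syy = sum((b - my) ** 2 for b in y)
    return -1.0 if sxx == 0 or syy == 0 else (sxy * sxy) / (sxx * syy)

def ga_select(X, y, pop_size=20, generations=30, seed=0):
    """Evolve binary wavelength masks: keep the fitter half, refill with
    one-point crossover plus occasional bit-flip mutation."""
    rng = random.Random(seed)
    n_wl = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_wl)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: r_squared(m, X, y), reverse=True)
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_wl)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:           # occasional bit-flip mutation
                child[rng.randrange(n_wl)] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda m: r_squared(m, X, y))
```

The published GA-PLS/GA-ANN variants evaluate fitness with the full regression model; the structure of the selection loop is the same.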


Algorithms ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 226
Author(s):  
Wenzel Pilar von Pilchau ◽  
Anthony Stein ◽  
Jörg Hähner

State-of-the-art deep reinforcement learning algorithms such as DQN and DDPG use a replay buffer, called Experience Replay, that by default contains only the experiences gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We demonstrate a significantly improved overall mean reward in comparison to a DQN with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.
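The interpolation described above can be sketched as follows (an assumed interface, not the authors' implementation): real transitions are stored per (state, action) pair, and a synthetic transition combines the equally weighted average reward with one of the observed follow-up states.

```python
import random
from collections import defaultdict

class InterpolatedReplay:
    """Sketch of an interpolated experience replay for discrete,
    non-deterministic environments."""

    def __init__(self, seed=0):
        self.store = defaultdict(list)   # (state, action) -> [(reward, next_state)]
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        """Record a real transition observed from the environment."""
        self.store[(state, action)].append((reward, next_state))

    def synthetic(self, state, action):
        """Create a synthetic transition: equally weighted average reward
        plus one observed follow-up state for this (state, action) pair."""
        seen = self.store[(state, action)]
        avg_reward = sum(r for r, _ in seen) / len(seen)
        _, next_state = self.rng.choice(seen)
        return (state, action, avg_reward, next_state)
```

The synthetic transitions would be mixed into the minibatches sampled by the DQN learner alongside the real ones.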


2021 ◽  
Vol 11 (14) ◽  
pp. 6370
Author(s):  
Elena Quatrini ◽  
Francesco Costantino ◽  
David Mba ◽  
Xiaochuan Li ◽  
Tat-Hean Gan

The water purification process is becoming increasingly important to ensure the continuity and quality of subsequent production processes, and it is particularly relevant in pharmaceutical contexts. However, the difficulties arising during its monitoring are manifold. On the one hand, the process exhibits various discontinuities due to the differing characteristics of the input water. On the other hand, the monitoring itself is discontinuous and random, so continuity of the parameters is not guaranteed, which hinders a straightforward analysis. Consequently, further research on water purification processes is paramount to identify the techniques best able to guarantee good performance. Against this background, this paper proposes an application of kernel principal component analysis for fault detection in a process with the above-mentioned characteristics. Based on the temporal variability of the process, the paper suggests using past and future matrices, rather than the original dataset, as input for fault detection. In this manner, the temporal correlation between process parameters and machine health is accounted for. The proposed approach confirms that very good monitoring results can be obtained in the analyzed context.
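One plausible reading of the past/future construction (an assumption on my part, not necessarily the paper's exact definition) is a time-lagged embedding: each row stacks a window of preceding samples ("past") or subsequent samples ("future") of the multivariate process series before the kernel PCA step.

```python
def lagged_matrices(series, lag):
    """Build 'past' and 'future' matrices from a multivariate time series
    (a list of samples, each a list of process parameters). Row t of the
    past matrix stacks samples t-lag+1..t; the future matrix stacks
    samples t+1..t+lag."""
    past, future = [], []
    for t in range(lag - 1, len(series) - lag):
        past.append([v for row in series[t - lag + 1 : t + 1] for v in row])
        future.append([v for row in series[t + 1 : t + lag + 1] for v in row])
    return past, future
```

Kernel PCA would then be fitted on these lagged matrices instead of the raw dataset, so that temporal correlation enters the monitoring statistics.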


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 1993
Author(s):  
Fernando Pérez-Sanz ◽  
Miriam Riquelme-Pérez ◽  
Enrique Martínez-Barba ◽  
Jesús de la Peña-Moral ◽  
Alejandro Salazar Nicolás ◽  
...  

Liver transplantation is the only curative treatment option for patients diagnosed with end-stage liver disease. The low availability of organs demands an accurate selection procedure based on histological analysis in order to evaluate the allograft. This assessment, traditionally carried out by a pathologist, is not exempt from subjectivity. In this sense, new tools based on machine learning and artificial vision are continuously being developed for the analysis of medical images of different typologies. Accordingly, in this work, we develop a computer-vision-based application for the fast, automatic, and objective quantification of macrovesicular steatosis in histopathological liver section slides stained with Sudan stain. For this purpose, digital microscopy images were used to obtain thousands of feature vectors based on the RGB and CIE L*a*b* pixel values. These vectors were labelled, under a supervised process, as fat vacuole or non-fat vacuole, and a set of classifiers based on different algorithms was trained accordingly. The results showed an overall high accuracy for all classifiers (>0.99), with a sensitivity between 0.844 and 1 and a specificity >0.99. In terms of image classification speed, KNN and Naïve Bayes were substantially faster than the other classification algorithms. Sudan stain is a convenient technique for evaluating macrovesicular steatosis in pre-transplant liver biopsies, providing reliable contrast and facilitating fast and accurate quantification with the machine learning algorithms tested.
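The per-pixel KNN classification step can be sketched as a plain nearest-neighbour vote (illustrative only; the feature values below are made up, and in the study the vectors concatenate RGB and CIE L*a*b* components):

```python
from collections import Counter
from math import dist

def knn_classify(train, labels, query, k=3):
    """Plain k-nearest-neighbour majority vote on per-pixel feature
    vectors: rank training pixels by Euclidean distance to the query
    pixel and return the most common label among the k closest."""
    ranked = sorted(range(len(train)), key=lambda i: dist(train[i], query))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

Applying this to every pixel yields a fat / non-fat mask from which the steatosis percentage can be computed.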


Cybersecurity ◽  
2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Shushan Arakelyan ◽  
Sima Arasteh ◽  
Christophe Hauser ◽  
Erik Kline ◽  
Aram Galstyan

Abstract Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics, a tedious and time-consuming task for human analysts. In order to improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs, with applicability to a number of downstream tasks. We introduce Bin2vec, a new approach leveraging Graph Convolutional Networks (GCN) along with computational program graphs in order to learn a high-dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks: functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results, and demonstrate improvement over state-of-the-art methods for both tasks. We evaluated Bin2vec on 49,191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs, each with at least 100 CVE entries, for the vulnerability discovery task. We set a new state-of-the-art result by reducing the classification error by 40% compared to the source-code-based inst2vec approach, while working on binary code. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% for multiple classes).
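A toy sketch of the propagation rule a GCN layer applies over a program graph (illustrative of the mechanism, not Bin2vec itself): each node averages its neighbours' and its own features under symmetric degree normalization, then applies a learned linear map and a nonlinearity.

```python
import math

def matmul(A, B):
    """Naive matrix product, adequate for small illustrative examples."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gcn_layer(adj, H, W):
    """One GCN step, H' = ReLU(D^-1/2 (A + I) D^-1/2 · H · W), where `adj`
    is a dense adjacency matrix, H the node features, W the weight matrix."""
    n = len(adj)
    A = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A]
    norm = [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, H), W)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU
```

Stacking a few such layers and pooling the node features yields the program-level embedding used for the downstream classifiers.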


Author(s):  
Leandro Skowronski ◽  
Paula Martin de Moraes ◽  
Mario Luiz Teixeira de Moraes ◽  
Wesley Nunes Gonçalves ◽  
Michel Constantino ◽  
...  
