Bin2vec: learning representations of binary executable programs for security tasks

Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics, a tedious and time-consuming task for human analysts. In order to improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs with applicability to a number of downstream tasks. We introduce Bin2vec, a new approach leveraging Graph Convolutional Networks (GCN) along with computational program graphs in order to learn a high-dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks: functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results, and demonstrate improvement over state-of-the-art methods for both tasks. We evaluated Bin2vec on 49191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs including at least 100 CVE entries each for the vulnerability discovery task. We set a new state-of-the-art result by reducing the classification error by 40% compared to the source-code based inst2vec approach, while working on binary code. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% in multiple classes).


Pruning High-Similarity Clusters to Optimize Data Diversity when Building Ensemble Classifiers

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026819500275 ◽

2019 ◽

Vol 18 (04) ◽

pp. 1950027

Author(s):

Sam Fletcher ◽

Brijesh Verma

Keyword(s):

State Of The Art ◽

Computation Time ◽

Ensemble Classifier ◽

Classification Error ◽

Ensemble Classifiers ◽

High Similarity ◽

New Approach ◽

The Past ◽

Multiple Clusterings ◽

Benchmark Datasets

Diversity is a key component for building a successful ensemble classifier. One approach to diversifying the base classifiers in an ensemble classifier is to diversify the data they are trained on. While sampling approaches such as bagging have been used for this task in the past, we argue that since they maintain the global distribution, they do not create diversity. Instead, we make a principled argument for the use of k-means clustering to create diversity. Expanding on previous work, we observe that when creating multiple clusterings with multiple k values, there is a risk of different clusterings discovering the same clusters, which would in turn train the same base classifiers. This would bias the ensemble voting process. We propose a new approach that uses the Jaccard Index to detect and remove similar clusters before training the base classifiers, not only saving computation time, but also reducing classification error by removing repeated votes. We empirically demonstrate the effectiveness of the proposed approach compared to the state of the art on 19 UCI benchmark datasets.
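
The pruning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the greedy keep-first order, and the 0.9 threshold are assumptions made for illustration, and the cluster contents are made-up sample indices.

```python
def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def prune_similar_clusters(clusters, threshold=0.9):
    """Keep a cluster only if it is not too similar to one already kept.

    `clusters` is a list of collections of sample indices pooled from
    several clusterings. `threshold` is a hypothetical cut-off.
    """
    kept = []
    for cluster in clusters:
        members = set(cluster)
        if all(jaccard(members, other) < threshold for other in kept):
            kept.append(members)
    return kept


# Clusters pooled from several clusterings of samples 0..5 (made-up data).
pooled = [{0, 1, 2}, {3, 4, 5}, {0, 1, 2}, {0, 1}, {2, 3, 4, 5}]
unique = prune_similar_clusters(pooled, threshold=0.9)
```

Here the repeated cluster `{0, 1, 2}` is dropped, so only one base classifier would be trained on it and its vote would not be counted twice.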


Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks

Applied Sciences ◽

10.3390/app11156975 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6975

Author(s):

Tao Zhang ◽

Lun He ◽

Xudong Li ◽

Guoqing Feng

Keyword(s):

Performance Improvement ◽

State Of The Art ◽

Error Rates ◽

Convolutional Network ◽

Convolutional Networks ◽

Sentence Level ◽

End To End ◽

High Level ◽

Improved Accuracy ◽

Talking Face

Lipreading aims to recognize sentences spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing gradients during training and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model that uses an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It partly eliminates the vanishing-gradient and performance limitations of RNNs (LSTM, GRU), which yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than with the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
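
The abstract names a CTC objective without detailing it. As a rough illustration only, the standard CTC collapse rule used in greedy decoding (merge consecutive repeats, then drop blanks) can be sketched as follows. The blank symbol `-` and the per-frame labels are assumptions, and nothing here reproduces the authors' model.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol for this sketch


def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then remove blank labels."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# Per-frame best labels (made-up); the blank between the two "l" runs
# is what lets the double letter survive the merge step.
decoded = ctc_collapse("hh-e-ll-ll-oo")
```

The blank label is what allows a sequence model to emit one label per video frame while the target sentence is much shorter than the frame sequence.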


A new approach based on graph matching and evolutionary approach for sport scheduling problem

Intelligent Decision Technologies ◽

10.3233/idt-190114 ◽

2020 ◽

pp. 1-16

Author(s):

Meriem Khelifa ◽

Dalila Boughaci ◽

Esma Aïmeur

Keyword(s):

Graph Matching ◽

State Of The Art ◽

Travel Cost ◽

Round Robin ◽

New Approach ◽

Traveling Tournament Problem ◽

Significant Interest ◽

National League ◽

Better Than

The Traveling Tournament Problem (TTP) is concerned with finding a double round-robin tournament schedule that minimizes the total distance traveled by the teams. It has attracted significant interest recently since a favorable TTP schedule can result in significant savings for the league. This paper proposes an original evolutionary algorithm for TTP. We first propose a quick and effective constructive algorithm to construct a Double Round Robin Tournament (DRRT) schedule with low travel cost. We then describe an enhanced genetic algorithm with a new crossover operator to improve the travel cost of the generated schedules. A new heuristic for efficiently ordering the scheduled rounds is also proposed. The latter leads to significant enhancement in the quality of the schedules. The overall method is evaluated on publicly available standard benchmarks and compared with other techniques for TTP and UTTP (Unconstrained Traveling Tournament Problem). The computational experiment shows that the proposed approach can build very good solutions that are comparable to other state-of-the-art approaches or better than the current best solutions on UTTP. Further, our method provides new valuable solutions to some unsolved UTTP instances and outperforms prior methods for all US National League (NL) instances.
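
To make the objective concrete, the travel cost of a given schedule can be computed as in the sketch below. It assumes the usual TTP convention that every team starts at home and returns home after its last game. The data layout (`distance`, `venues`) and the two-team toy instance are hypothetical, and the sketch evaluates a schedule rather than constructing or optimizing one.

```python
def total_travel(distance, venues):
    """Total distance traveled by all teams over the season.

    distance[i][j] is the distance between the home cities of teams i and j.
    venues[t] lists, round by round, the team whose home city hosts team t's
    game (t itself for a home game). Every team starts and ends at home.
    """
    total = 0
    for team, rounds in enumerate(venues):
        location = team
        for venue in rounds:
            total += distance[location][venue]
            location = venue
        total += distance[location][team]  # trip home after the last round
    return total


# Toy instance: two teams, 10 distance units apart, playing once in each city.
distance = [[0, 10], [10, 0]]
venues = [[0, 1], [0, 1]]  # round 1 is hosted by team 0, round 2 by team 1
cost = total_travel(distance, venues)
```

An evolutionary search such as the one described would call a cost function of this kind to compare candidate schedules.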


Target Localization via Integrated and Segregated Ranging Based on RSS and TOA Measurements

Sensors ◽

10.3390/s19020230 ◽

2019 ◽

Vol 19 (2) ◽

pp. 230 ◽

Cited By ~ 5

Author(s):

Slavisa Tomic ◽

Marko Beko

Keyword(s):

State Of The Art ◽

Hybrid Approach ◽

Critical Distance ◽

Heuristic Approach ◽

Target Localization ◽

Line Of Sight ◽

Time Of Arrival ◽

Individual Measurement ◽

New Approach ◽

Non Line Of Sight

This work addresses the problem of target localization in adverse non-line-of-sight (NLOS) environments by using received signal strength (RSS) and time of arrival (TOA) measurements. It is inspired by a recently published work in which the authors discuss a critical distance below and above which employing combined RSS-TOA measurements is inferior to employing RSS-only and TOA-only measurements, respectively. Here, we revise state-of-the-art estimators for the considered target localization problem and study their performance against their counterparts that employ each individual measurement exclusively. It is shown that the hybrid approach is not the best one by default. Thus, we propose a simple heuristic approach to choose the best measurement for each link, and we show that it can enhance the performance of an estimator. The new approach implicitly relies on the concept of the critical distance, but does not assume certain link parameters as given. Our simulations corroborate findings available in the literature for line-of-sight (LOS) to a certain extent, but they indicate that more work is required for NLOS environments. Moreover, they show that the heuristic approach works well, matching or even improving the performance of the best fixed choice in all considered scenarios.


New Approach in Transform-Based Speaker Adaptation Using Minimum Classification Error

2010 12th International Conference on Computer Modelling and Simulation ◽

10.1109/uksim.2010.62 ◽

2010 ◽

Author(s):

Reza Sahraian ◽

Behzad Zamani ◽

Ahmad Akbari ◽

Ahmad Ayatollahi ◽

Babak Nasersharif

Keyword(s):

Classification Error ◽

New Approach ◽

Minimum Classification Error


High-Fidelity Simulated Players for Interactive Narrative Planning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/540 ◽

2018 ◽

Author(s):

Pengcheng Wang ◽

Jonathan Rowe ◽

Wookhee Min ◽

Bradford Mott ◽

James Lester

Keyword(s):

State Of The Art ◽

Data Driven ◽

High Fidelity ◽

Interactive Narrative ◽

Interaction Data ◽

Convolutional Networks ◽

Novel Approach ◽

Adaptation Policies ◽

Narrative Planning ◽

Prior State

Interactive narrative planning offers significant potential for creating adaptive gameplay experiences. While data-driven techniques have been devised that utilize player interaction data to induce policies for interactive narrative planners, they require enormously large gameplay datasets. A promising approach to addressing this challenge is creating simulated players whose behaviors closely approximate those of human players. In this paper, we propose a novel approach to generating high-fidelity simulated players based on deep recurrent highway networks and deep convolutional networks. Empirical results demonstrate that the proposed models significantly outperform the prior state-of-the-art in generating high-fidelity simulated player models that accurately imitate human players’ narrative interactions. Using the high-fidelity simulated player models, we show the advantage of more exploratory reinforcement learning methods for deriving generalizable narrative adaptation policies.


TripletProt: Deep Representation Learning of Proteins based on Siamese Networks

10.1101/2020.05.11.088237 ◽

2020 ◽

Author(s):

Esmaeil Nourani ◽

Ehsaneddin Asgari ◽

Alice C. McHardy ◽

Mohammad R.K. Mofrad

Keyword(s):

Functional Annotation ◽

Cellular Localization ◽

State Of The Art ◽

Language Model ◽

Representation Learning ◽

Learning Problems ◽

Ppi Network ◽

New Approach ◽

Protein Protein Interaction ◽

Siamese Networks

We introduce TripletProt, a new approach for protein representation learning based on Siamese neural networks. We evaluate TripletProt comprehensively in protein functional annotation tasks including sub-cellular localization (14 categories) and gene ontology prediction (more than 2000 classes), which are both challenging multi-class multi-label classification machine learning problems. We compare the performance of TripletProt with state-of-the-art approaches including a recurrent language model-based approach (i.e., UniRep), as well as a protein-protein interaction (PPI) network and sequence-based method (i.e., DeepGO). TripletProt showed an overall improvement in F1 score in the above-mentioned comprehensive functional annotation tasks, relying solely on the PPI network. TripletProt, and Siamese networks in general, offer great potential for protein informatics tasks and can be widely applied to similar tasks.


Graph Algorithms for Word Sense Disambiguation in Biomedicine

10.5753/sbcas.2015.10365 ◽

2015 ◽

Author(s):

Rodrigo Goulart ◽

Juliano De Carvalho ◽

Vera De Lima

Keyword(s):

Text Mining ◽

Graph Algorithms ◽

State Of The Art ◽

Word Sense Disambiguation ◽

The State ◽

Word Sense ◽

New Approach ◽

Similar Performance ◽

Sense Disambiguation ◽

Different Levels

Word Sense Disambiguation (WSD) is an important task in biomedical text mining. Supervised WSD methods achieve the best results, but they are complex and their testing cost is too high. This work presents an experiment on WSD using graph-based approaches (unsupervised methods). Three algorithms were tested and compared to the state of the art. Results indicate that similar performance can be reached with different levels of complexity, which may point to a new approach to this problem.


MR-GCN: Multi-Relational Graph Convolutional Networks based on Generalized Tensor Product

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/175 ◽

2020 ◽

Author(s):

Zhichao Huang ◽

Xutao Li ◽

Yunming Ye ◽

Michael K. Ng

Keyword(s):

Tensor Product ◽

Convolution Operator ◽

State Of The Art ◽

Single Type ◽

Convolutional Network ◽

Convolutional Networks ◽

Node Classification ◽

Relational Graphs ◽

Eigen Decomposition ◽

Single Relation

Graph Convolutional Networks (GCNs) have been extensively studied in recent years. Most existing GCN approaches are designed for homogeneous graphs with a single type of relation. However, heterogeneous graphs with multiple types of relations are also ubiquitous, and there is a lack of methodologies to tackle such graphs. Some previous studies address the issue by performing a conventional GCN on each single relation and then blending the results. However, because the convolutional kernels neglect the correlations across relations, this strategy is sub-optimal. In this paper, we propose the Multi-Relational Graph Convolutional Network (MR-GCN) framework by developing a novel convolution operator on multi-relational graphs. In particular, our multi-dimension convolution operator extends graph spectral analysis to the eigen-decomposition of a Laplacian tensor. The eigen-decomposition is formulated with a generalized tensor product, which can correspond to any unitary transform rather than being limited to the Fourier transform. We conduct comprehensive experiments on four real-world multi-relational graphs to solve the semi-supervised node classification task, and the results show the superiority of MR-GCN over state-of-the-art competitors.


Multipath Lightweight Deep Network Using Randomly Selected Dilated Convolution

Sensors ◽

10.3390/s21237862 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7862

Author(s):

Sangun Park ◽

Dong Eui Chang

Keyword(s):

State Of The Art ◽

Robot Vision ◽

Research Field ◽

Machine Learning Algorithms ◽

Classification Error ◽

Feature Maps ◽

Deep Network ◽

Dilated Convolution ◽

Input Feature ◽

Multipath Networks

Robot vision is an essential research field that enables machines to perform various tasks by classifying/detecting/segmenting objects as humans do. The classification accuracy of machine learning algorithms already exceeds that of a well-trained human, and the results are rather saturated. Hence, in recent years, many studies have been conducted in the direction of reducing the weight of the model and applying it to mobile devices. For this purpose, we propose a multipath lightweight deep network using randomly selected dilated convolutions. The proposed network consists of two sets of multipath networks (minimum 2, maximum 8), where the output feature maps of one path are concatenated with the input feature maps of the other path so that the features are reusable and abundant. We also replace the 3×3 standard convolution of each path with a randomly selected dilated convolution, which has the effect of increasing the receptive field. The proposed network lowers the number of floating point operations (FLOPs) and parameters by more than 50% and the classification error by 0.8% as compared to the state-of-the-art. We show that the proposed network is efficient.
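
The claim that dilation increases the receptive field follows from simple arithmetic: a kernel of size k with dilation d spans d(k - 1) + 1 input positions. The sketch below applies that standard formula to stride-1 layers. The function names and the example layer stacks are illustrative assumptions, not the network proposed in the paper.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions along one axis.

    `layers` is a list of (kernel_size, dilation) pairs, first layer first.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


standard = receptive_field([(3, 1), (3, 1)])  # two plain 3x3 layers
dilated = receptive_field([(3, 1), (3, 2)])   # second layer uses dilation 2
```

Both stacks use the same number of weights per layer, which is why dilation is attractive when the goal is a lighter model.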
