A study of continuous vector representations for theorem proving

Journal of Logic and Computation ◽

10.1093/logcom/exab006 ◽

2021 ◽

Author(s):

StanisŁaw PurgaŁ ◽

Julian Parsert ◽

Cezary Kaliszyk

Keyword(s):

Machine Learning ◽

Theorem Proving ◽

Natural Deduction ◽

Structural Formula ◽

Vector Representation ◽

Continuous Vector ◽

Tree Shape ◽

Vector Representations ◽

Semantic Properties

Abstract Applying machine learning to mathematical terms and formulas requires a suitable representation of formulas that is adequate for AI methods. In this paper, we develop an encoding that allows for logical properties to be preserved and is additionally reversible. This means that the tree shape of a formula including all symbols can be reconstructed from the dense vector representation. We do that by training two decoders: one that extracts the top symbol of the tree and one that extracts embedding vectors of subtrees. The syntactic and semantic logical properties that we aim to preserve include both structural formula properties, applicability of natural deduction steps and even more complex operations like unifiability. We propose datasets that can be used to train these syntactic and semantic properties. We evaluate the viability of the developed encoding across the proposed datasets as well as for the practical theorem proving problem of premise selection in the Mizar corpus.

Download Full-text

Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

10.26434/chemrxiv.5513581.v1 ◽

2017 ◽

Author(s):

Sabrina Jaeger ◽

Simone Fulle ◽

Samo Turk

Keyword(s):

Machine Learning ◽

Language Processing ◽

Supervised Machine Learning ◽

Learning Approach ◽

Learning Approaches ◽

Unsupervised Machine Learning ◽

Feature Representations ◽

Machine Learning Approach ◽

The Individual ◽

Vector Representations

Inspired by natural language processing techniques we here introduce Mol2vec which is an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly, to the Word2vec models where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that are pointing in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up vectors of the individual substructures and, for instance, feed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can be thus also easily used for proteins with low sequence similarities.

Download Full-text

Scholar2vec: Vector Representation of Scholars for Lifetime Collaborator Prediction

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3442199 ◽

2021 ◽

Vol 15 (3) ◽

pp. 1-19

Author(s):

Wei Wang ◽

Feng Xia ◽

Jian Wu ◽

Zhiguo Gong ◽

Hanghang Tong ◽

...

Keyword(s):

Scientific Collaboration ◽

Early Stage ◽

Collaboration Network ◽

Vector Representation ◽

Network Embedding ◽

Machine Learning Methods ◽

Academic Networks ◽

Special Relationships ◽

Real World Datasets ◽

Vector Representations

While scientific collaboration is critical for a scholar, some collaborators can be more significant than others, e.g., lifetime collaborators. It has been shown that lifetime collaborators are more influential on a scholar’s academic performance. However, little research has been done on investigating predicting such special relationships in academic networks. To this end, we propose Scholar2vec, a novel neural network embedding for representing scholar profiles. First, our approach creates scholars’ research interest vector from textual information, such as demographics, research, and influence. After bridging research interests with a collaboration network, vector representations of scholars can be gained with graph learning. Meanwhile, since scholars are occupied with various attributes, we propose to incorporate four types of scholar attributes for learning scholar vectors. Finally, the early-stage similarity sequence based on Scholar2vec is used to predict lifetime collaborators with machine learning methods. Extensive experiments on two real-world datasets show that Scholar2vec outperforms state-of-the-art methods in lifetime collaborator prediction. Our work presents a new way to measure the similarity between two scholars by vector representation, which tackles the knowledge between network embedding and academic relationship mining.

Download Full-text

A content spectral-based text representation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219248 ◽

2021 ◽

pp. 1-12

Author(s):

Melesio Crespo-Sanchez ◽

Ivan Lopez-Arevalo ◽

Edwin Aldana-Bobadilla ◽

Alejandro Molina-Villegas

Keyword(s):

Machine Learning ◽

Text Analysis ◽

Question Answering ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Text Representation ◽

Feature Vectors ◽

Learning Tasks ◽

Semantic Component ◽

Vector Representations

In the last few years, text analysis has grown as a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most of recent vector representations focus on the semantic component of text, however, we consider that also taking into account the lexical and syntactic components the abstraction of content could be beneficial for learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra from the lexical, syntactic, and semantic components of text producing an abstract image, which can also be treated by both, text and image learning algorithms. These components came from feature vectors of text. For demonstrating the goodness of our proposal, this was tested on text classification and complexity reading score prediction tasks obtaining promising results.

Download Full-text

A natural deduction automated theorem proving system

Automated Deduction—CADE-11 - Lecture Notes in Computer Science ◽

10.1007/3-540-55602-8_200 ◽

1992 ◽

pp. 668-672 ◽

Cited By ~ 2

Author(s):

Li Dafa

Keyword(s):

Theorem Proving ◽

Automated Theorem Proving ◽

Natural Deduction

Download Full-text

Automated methods of coherence evaluation of Ukrainian texts using machine learning techniques

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2020.02-03.295 ◽

2020 ◽

pp. 295-303

Author(s):

A.A. Kramov ◽

◽

S.D. Pogorilyy ◽

Keyword(s):

Machine Learning ◽

Semantic Similarity ◽

Machine Learning Techniques ◽

Vector Representation ◽

Early Stopping ◽

Experimental Examination ◽

Convolutional Networks ◽

Learning Techniques ◽

Similarity Graph ◽

Automated Methods

The main methods of coherence evaluation of texts with the usage of different machine learning techniques have been analyzed. The principles of methods with the usage of recurrent and convolutional neural networks have been described in details. The advantages of a semantic similarity graph method have been considered. Other approaches to perform the vector representation of sentences for the estimation of semantic similarity between the elements of a text have been suggested to use. The experimental examination of methods has been performed on the set of Ukrainian scientific articles. The training of recurrent and convolutional networks with the usage of early stopping has been performed. The accuracy of the solving of document discrimination and insertion tasks has been calculated. The comparative analysis of the results obtained has been performed.

Download Full-text

ProLIF: a library to encode molecular interactions as fingerprints

Journal of Cheminformatics ◽

10.1186/s13321-021-00548-6 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Cédric Bouysset ◽

Sébastien Fiorucci

Keyword(s):

Machine Learning ◽

Molecular Dynamics ◽

Data Analysis ◽

Three Dimensional ◽

Molecular Complexes ◽

Use Case ◽

Docking Simulations ◽

Rna Molecules ◽

Vector Representations ◽

Interaction Fingerprints

AbstractInteraction fingerprints are vector representations that summarize the three-dimensional nature of interactions in molecular complexes, typically formed between a protein and a ligand. This kind of encoding has found many applications in drug-discovery projects, from structure-based virtual-screening to machine-learning. Here, we present ProLIF, a Python library designed to generate interaction fingerprints for molecular complexes extracted from molecular dynamics trajectories, experimental structures, and docking simulations. It can handle complexes formed of any combination of ligand, protein, DNA, or RNA molecules. The available interaction types can be fully reparametrized or extended by user-defined ones. Several tutorials that cover typical use-case scenarios are available, and the documentation is accompanied with code snippets showcasing the integration with other data-analysis libraries for a more seamless user-experience. The library can be freely installed from our GitHub repository (https://github.com/chemosim-lab/ProLIF).

Download Full-text

Domain Heuristic Fusion of Multi-Word Embeddings for Nutrient Value Prediction

Mathematics ◽

10.3390/math9161941 ◽

2021 ◽

Vol 9 (16) ◽

pp. 1941

Author(s):

Gordana Ispirova ◽

Tome Eftimov ◽

Barbara Koroušić Seljak

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Nutrient Content ◽

Relevant Information ◽

Word Embeddings ◽

Short Text ◽

Domain Specific ◽

Nutrient Value ◽

Protein Prediction ◽

Vector Representations

Being both a poison and a cure for many lifestyle and non-communicable diseases, food is inscribing itself into the prime focus of precise medicine. The monitoring of few groups of nutrients is crucial for some patients, and methods for easing their calculations are emerging. Our proposed machine learning pipeline deals with nutrient prediction based on learned vector representations on short text–recipe names. In this study, we explored how the prediction results change when, instead of using the vector representations of the recipe description, we use the embeddings of the list of ingredients. The nutrient content of one food depends on its ingredients; therefore, the text of the ingredients contains more relevant information. We define a domain-specific heuristic for merging the embeddings of the ingredients, which combines the quantities of each ingredient in order to use them as features in machine learning models for nutrient prediction. The results from the experiments indicate that the prediction results improve when using the domain-specific heuristic. The prediction models for protein prediction were highly effective, with accuracies up to 97.98%. Implementing a domain-specific heuristic for combining multi-word embeddings yields better results than using conventional merging heuristics, with up to 60% more accuracy in some cases.

Download Full-text

Machine Learning for Automated Theorem Proving: Learning to Solve SAT and QSAT

Foundations and Trends® in Machine Learning ◽

10.1561/2200000081 ◽

2021 ◽

Vol 14 (6) ◽

pp. 807-989

Author(s):

Sean B. Holden

Keyword(s):

Machine Learning ◽

Theorem Proving ◽

Automated Theorem Proving

Download Full-text

Representing Objects, Relations, and Sequences

Neural Computation ◽

10.1162/neco_a_00467 ◽

2013 ◽

Vol 25 (8) ◽

pp. 2038-2078 ◽

Cited By ~ 37

Author(s):

Stephen I. Gallant ◽

T. Wendy Okaywe

Keyword(s):

Machine Learning ◽

Language Processing ◽

Matrix Multiplication ◽

Machine Learning Algorithms ◽

Complex Structures ◽

Neural Learning ◽

Large Numbers ◽

Real World Applications ◽

Three Stages ◽

Vector Representations

Vector symbolic architectures (VSAs) are high-dimensional vector representations of objects (e.g., words, image parts), relations (e.g., sentence structures), and sequences for use with machine learning algorithms. They consist of a vector addition operator for representing a collection of unordered objects, a binding operator for associating groups of objects, and a methodology for encoding complex structures. We first develop constraints that machine learning imposes on VSAs; for example, similar structures must be represented by similar vectors. The constraints suggest that current VSAs should represent phrases (“The smart Brazilian girl”) by binding sums of terms, in addition to simply binding the terms directly. We show that matrix multiplication can be used as the binding operator for a VSA, and that matrix elements can be chosen at random. A consequence for living systems is that binding is mathematically possible without the need to specify, in advance, precise neuron-to-neuron connection properties for large numbers of synapses. A VSA that incorporates these ideas, Matrix Binding of Additive Terms (MBAT), is described that satisfies all constraints. With respect to machine learning, for some types of problems appropriate VSA representations permit us to prove learnability rather than relying on simulations. We also propose dividing machine (and neural) learning and representation into three stages, with differing roles for learning in each stage. For neural modeling, we give representational reasons for nervous systems to have many recurrent connections, as well as for the importance of phrases in language processing. Sizing simulations and analyses suggest that VSAs in general, and MBAT in particular, are ready for real-world applications.

Download Full-text

SEMANTIC DERIVATION VERIFICATION: TECHNIQUES AND IMPLEMENTATION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213006003119 ◽

2006 ◽

Vol 15 (06) ◽

pp. 1053-1070 ◽

Cited By ~ 13

Author(s):

GEOFF SUTCLIFFE

Keyword(s):

Theorem Proving ◽

Automated Theorem Proving ◽

Verification Techniques ◽

Semantic Properties

Automated Theorem Proving (ATP) systems are complex pieces of software, and thus may have bugs that make them unsound. In order to guard against unsoundness, the derivations output by an ATP system may be semantically verified by trusted ATP systems that check the required semantic properties of each inference step. Such verification needs to be augmented by structural verification that checks that inferences have been used correctly in the context of the overall derivation. This paper describes techniques for semantic verification of derivations, and reports on their implementation and testing in the GDV verifier.

Download Full-text