Shallow Parsing Using Probabilistic Grammatical Inference

To address the problem of information support for the synthesis of new technical solutions, a method is presented for extracting structured data from a collection of Russian-language patents. The key features of the invention, such as the structural elements of the technical object and the relationships between them, serve as this information support. The data source is the main claim of the invention in a device patent. The unit of extraction is the Subject-Action-Object (SAO) semantic structure, which describes the constructive elements semantically. The extraction method is based on shallow parsing and claim segmentation, taking into account the specifics of how patent texts are written. The excessive length of claim sentences and the specificity of patent language often make it difficult to use off-the-shelf data extraction tools efficiently. The processing consists of the following steps: segmentation of the claim sentences; extraction of primary SAO structures; construction of a graph of the constructive elements of the invention; and integration of the data into the domain ontology. This article deals with the first two stages. Segmentation is carried out according to a number of heuristic rules, and several natural language processing tools are used to reduce analysis errors. The primary SAO elements are extracted taking into account the valences of a predefined semantic group of verbs, as well as information about the type of segment being processed. The result of the work is a domain ontology that can be used to find alternative designs for nodes of a technical object. The second part of the article considers an algorithm for constructing a graph of the structural elements of an individual technical object, an assessment of the system's effectiveness, and the organization of the ontology together with the results.
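The first two stages (claim segmentation and extraction of primary SAO structures) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' method: it works on English rather than Russian text, and the delimiter pattern, the verb group `LINK_VERBS`, and the helpers `segment_claim` and `extract_sao` are all hypothetical. The NLP tools and segment-type information used in the real pipeline are not modelled.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical segmentation heuristics: split a long claim sentence at
# semicolons and before the connectives "wherein" / "characterized in that".
SEGMENT_PATTERN = re.compile(
    r";|,?\s*\bwherein\b|,?\s*\bcharacterized in that\b", re.IGNORECASE
)

# Hypothetical predefined semantic group of "structural" verbs; each is assumed
# to take a subject on its left and an object on its right.
LINK_VERBS = ["is connected to", "is attached to", "is mounted on", "comprises", "includes"]


def segment_claim(claim: str) -> List[str]:
    """Split a claim into trimmed, non-empty segments."""
    parts = SEGMENT_PATTERN.split(claim)
    return [p.strip(" ,.") for p in parts if p.strip(" ,.")]


def extract_sao(segment: str) -> Optional[Tuple[str, str, str]]:
    """Return one (subject, action, object) triple, or None if no link verb fits."""
    # Longer verb phrases are tried first so multi-word verbs win.
    for verb in sorted(LINK_VERBS, key=len, reverse=True):
        m = re.search(rf"\b{re.escape(verb)}\b", segment)
        if m:
            subj, obj = segment[:m.start()].strip(), segment[m.end():].strip()
            if subj and obj:
                return subj, verb, obj
    return None


if __name__ == "__main__":
    claim = ("A pump comprises a housing and an impeller; the impeller is mounted "
             "on a shaft, wherein the shaft is connected to a motor.")
    for seg in segment_claim(claim):
        # Prints ('A pump', 'comprises', 'a housing and an impeller'), then
        # ('the impeller', 'is mounted on', 'a shaft'), then
        # ('the shaft', 'is connected to', 'a motor').
        print(extract_sao(seg))
```

Splitting at patent connectives before looking for verbs keeps each segment short enough for a simple subject-verb-object heuristic; a realistic version for Russian claims would likely need morphological analysis to resolve case endings and verb valences.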

Using Grammatical Inference to Automate Information Extraction from the Web

Principles of Data Mining and Knowledge Discovery - Lecture Notes in Computer Science

10.1007/3-540-44794-6_18

2001

pp. 216-227

Cited By ~ 13

Author(s):

Theodore W. Hong

Keith L. Clark

Keyword(s):

Information Extraction

Grammatical Inference

Automate Information

The Web

Grammatical Inference

Encyclopedia of Machine Learning and Data Mining

10.1007/978-1-4899-7687-1_115

2017

pp. 569-570

Author(s):

Lorenza Saitta

Michele Sebag

Keyword(s):

Grammatical Inference

Grammatical Inference

Encyclopedia of Machine Learning

10.1007/978-0-387-30164-8_346

2011

pp. 458-458

Author(s):

Xinhua Zhang

Novi Quadrianto

Kristian Kersting

Zhao Xu

Yaakov Engel

...

Keyword(s):

Grammatical Inference

Grammatical Inference

Encyclopedia of the Sciences of Learning

10.1007/978-1-4419-1428-6_4189

2012

pp. 1387-1387

Keyword(s):

Grammatical Inference

Grammatical inference in document recognition

Grammatical Inference - Lecture Notes in Computer Science

10.1007/bfb0054074

1998

pp. 175-186

Cited By ~ 1

Author(s):

Alexander S. Saidi

Souad Tayeb-bey

Keyword(s):

Grammatical Inference

Document Recognition

Parsing Expression Grammars and Their Induction Algorithm

Applied Sciences

10.3390/app10238747

2020

Vol 10 (23)

pp. 8747

Author(s):

Wojciech Wieczorek

Olgierd Unold

Łukasz Strąk

Keyword(s):

Genetic Programming

ROC Curve

Classification Performance

Grammatical Inference

Inference Algorithm

Finite Sample

Induction Method

Grammar Learning

Area Under ROC Curve

Inference Methods

Grammatical inference (GI), i.e., the task of finding a rule that lies behind given words, can be used in the analysis of amyloidogenic sequence fragments, which are essential in studies of neurodegenerative diseases. In this paper, we developed a new method that generates non-circular parsing expression grammars (PEGs) and compared it with other GI algorithms on sequences from a real dataset. The main contribution of this paper is a genetic-programming-based algorithm for the induction of parsing expression grammars from a finite sample. The induction method has been tested on a real bioinformatics dataset, and its classification performance has been compared with that of existing grammatical inference methods. The evaluation of the generated PEG on an amyloidogenic dataset showed its accuracy in predicting amyloid segments. We show that the new grammatical inference algorithm achieves the best ACC (accuracy), AUC (area under the ROC curve), and MCC (Matthews correlation coefficient) scores in comparison with five other automata or grammar learning methods.
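A key difference between a PEG and a context-free grammar is ordered choice combined with greedy, non-backtracking repetition. The sketch below is a minimal PEG interpreter plus the ACC and MCC metrics named in the abstract. It does not implement the genetic-programming induction. The tuple encoding of expressions, the rule that a sequence is classified as positive when the start rule consumes all of it, and the toy grammar `HYDROPHOBIC_RUN` (three or more letters from V, I, L, F) are hypothetical assumptions; the grammar is hand-written, not induced from the amyloid data.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical tuple encoding of PEG expressions:
#   ("lit", "AB")         literal string
#   ("cls", "VILF")       any single character from the set
#   ("seq", e1, e2, ...)  sequence
#   ("alt", e1, e2, ...)  ordered choice e1 / e2 / ...
#   ("star", e)           greedy repetition e*
#   ("ref", "Name")       nonterminal reference
Grammar = Dict[str, tuple]


def match(g: Grammar, expr: tuple, s: str, pos: int) -> Optional[int]:
    """Return the position after matching expr at pos, or None on failure."""
    kind = expr[0]
    if kind == "lit":
        return pos + len(expr[1]) if s.startswith(expr[1], pos) else None
    if kind == "cls":
        return pos + 1 if pos < len(s) and s[pos] in expr[1] else None
    if kind == "seq":
        for sub in expr[1:]:
            pos = match(g, sub, s, pos)
            if pos is None:
                return None
        return pos
    if kind == "alt":
        # Ordered choice: commit to the first alternative that succeeds.
        for sub in expr[1:]:
            end = match(g, sub, s, pos)
            if end is not None:
                return end
        return None
    if kind == "star":
        # Greedy repetition with no backtracking; stop on failure or empty match.
        while True:
            end = match(g, expr[1], s, pos)
            if end is None or end == pos:
                return pos
            pos = end
    if kind == "ref":
        return match(g, g[expr[1]], s, pos)
    raise ValueError(f"unknown expression kind: {kind}")


def accepts(g: Grammar, start: str, s: str) -> bool:
    """Classify s as positive iff the start rule consumes the whole string."""
    return match(g, ("ref", start), s, 0) == len(s)


def evaluate(g: Grammar, start: str, samples: List[Tuple[str, int]]) -> Tuple[float, float]:
    """Return (ACC, MCC) of the grammar used as a binary classifier."""
    tp = tn = fp = fn = 0
    for seq, label in samples:
        pred = accepts(g, start, seq)
        if pred and label == 1:
            tp += 1
        elif pred:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    acc = (tp + tn) / len(samples)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return acc, mcc


HYDROPHOBIC_RUN: Grammar = {
    "S": ("seq", ("ref", "H"), ("ref", "H"), ("ref", "H"), ("star", ("ref", "H"))),
    "H": ("cls", "VILF"),
}
SAMPLES = [("VIL", 1), ("VILFV", 1), ("VI", 0), ("VKL", 0), ("KKK", 1)]

if __name__ == "__main__":
    print(evaluate(HYDROPHOBIC_RUN, "S", SAMPLES))  # ACC 0.8, MCC about 0.667
```

Ordered choice is visible with the rule `("seq", ("alt", ("lit", "A"), ("lit", "AB")), ("lit", "C"))`: the input `"ABC"` is rejected because the choice commits to `"A"` and never retries `"AB"`, while swapping the two alternatives makes it accepted.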

K-best max-margin approaches for sequence labeling

Computer Science and Information Systems

10.2298/csis140713014m

2015

Vol 12 (2)

pp. 465-486

Author(s):

Dejan Mancev

Branimir Todorovic

Keyword(s):

Named Entity Recognition

Entity Recognition

Computational Time

Structured Learning

Training Procedure

Named Entity

Sequence Labeling

Small Collection

Shallow Parsing

Output Space

Structured learning algorithms usually require inference during the training procedure. Due to the exponential size of the output space, the parameter update is performed only on a relatively small collection built from the "best" structures. k-best MIRA is an example of an online algorithm that seeks optimal parameters by making updates on the k highest-scoring structures at a time. Following the idea of using k-best structures during learning, in this paper we introduce four new k-best extensions of max-margin structured algorithms. We discuss their properties and connections, and evaluate all algorithms on two sequence labeling problems: shallow parsing and named entity recognition. The experiments show how the proposed algorithms are affected by changes of k in terms of F-measure and computational time, and that the proposed algorithms can improve results compared with the single-best case. Moreover, restricting the algorithms to the single-best case yields a comparison of the existing algorithms.
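The role of k-best structures in the parameter update can be illustrated with a toy tagger. This is a hedged sketch, not one of the paper's four algorithms: it scores tag sequences with emission and transition features, finds the k best by brute-force enumeration (feasible only for very short inputs; practical implementations typically use dynamic programming such as k-best Viterbi), and then applies a single-constraint passive-aggressive update to each violating structure in turn. Exact k-best MIRA instead solves one quadratic program over all k constraints jointly, so the sequential update here is a simplification. The names `phi`, `k_best`, `train`, and `predict`, the Hamming loss, and the defaults `k=3`, `C=1.0`, `epochs=5` are assumptions.

```python
import heapq
import itertools
from collections import Counter
from typing import Dict, List, Sequence

Weights = Dict[tuple, float]


def phi(words: Sequence[str], tags: Sequence[str]) -> Counter:
    """Hypothetical feature map: emission (word, tag) and transition (prev, tag) counts."""
    feats: Counter = Counter()
    prev = "<s>"
    for word, tag in zip(words, tags):
        feats[("emit", word, tag)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats


def score(weights: Weights, feats: Counter) -> float:
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def k_best(weights: Weights, words: Sequence[str], tagset: Sequence[str], k: int) -> List[tuple]:
    """Top-k tag sequences by brute force: enumerates all |tagset| ** len(words) outputs."""
    candidates = itertools.product(tagset, repeat=len(words))
    return heapq.nlargest(k, candidates, key=lambda y: score(weights, phi(words, y)))


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    return sum(x != y for x, y in zip(a, b))


def train(data, tagset, k: int = 3, C: float = 1.0, epochs: int = 5) -> Weights:
    weights: Weights = {}
    for _ in range(epochs):
        for words, gold in data:
            gold = tuple(gold)
            gold_feats = phi(words, gold)
            # The k-best list is computed once per example, before its updates.
            for y in k_best(weights, words, tagset, k):
                if y == gold:
                    continue
                delta = gold_feats.copy()
                delta.subtract(phi(words, y))  # phi(gold) - phi(y)
                sq_norm = sum(v * v for v in delta.values())
                if sq_norm == 0:
                    continue
                margin = score(weights, delta)  # score(gold) - score(y)
                # Smallest L2 step that lifts the margin to the loss, capped at C.
                tau = min(C, max(0.0, hamming(gold, y) - margin) / sq_norm)
                for f, v in delta.items():
                    weights[f] = weights.get(f, 0.0) + tau * v
    return weights


def predict(weights: Weights, words: Sequence[str], tagset: Sequence[str]) -> tuple:
    return k_best(weights, words, tagset, 1)[0]


if __name__ == "__main__":
    tagset = ["B-NP", "I-NP", "O"]
    words = ["the", "dog", "runs"]
    w = train([(words, ["B-NP", "I-NP", "O"])], tagset)
    print(predict(w, words, tagset))  # ('B-NP', 'I-NP', 'O')
```

Setting `k=1` reduces the loop to an ordinary 1-best passive-aggressive update, which loosely mirrors the single-best comparison mentioned in the abstract; larger `k` adds constraints per example at the cost of more scoring work.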
