Shallow Parsing Using Probabilistic Grammatical Inference

Author(s):  
Franck Thollard ◽  
Alexander Clark
Author(s):  
S. S. Vasiliev ◽  
D. M. Korobkin ◽  
S. A. Fomenkov

To provide information support for the synthesis of new technical solutions, a method of extracting structured data from an array of Russian-language patents is presented. The information support consists of the key features of the invention, such as the structural elements of the technical object and the relationships between them. The data source is the main claim of the invention in a device patent. The unit of extraction is the semantic structure Subject-Action-Object (SAO), which semantically describes the structural elements. The extraction method is based on shallow parsing and claim segmentation, taking into account the specifics of how patent texts are written. The excessive length of the claim sentence and the specificity of patent language often make it difficult to use off-the-shelf data extraction tools efficiently. The processing steps are: segmentation of the claim sentences; extraction of primary SAO structures; construction of the graph of the structural elements of the invention; integration of the data into the domain ontology. This article deals with the first two stages. Segmentation is carried out according to a number of heuristic rules, and several natural language processing tools are used to reduce analysis errors. The primary SAO elements are extracted considering the valences of a predefined semantic group of verbs, as well as information about the type of the processed segment. The result of the work is the organization of the domain ontology, which can be used to find alternative designs for nodes in a technical object. The second part of the article considers an algorithm for constructing a graph of structural elements of a separate technical object, an assessment of the effectiveness of the system, and the ontology organization and results.
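
The first two stages described above (rule-based segmentation, then SAO extraction driven by a predefined verb group) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the heuristic rules or the verb group, the real system works on Russian text with NLP tools, and the names `CONNECTION_VERBS`, `segment_claim` and `extract_sao` are hypothetical. English toy input is used only to keep the example readable.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical verb group; the article's actual semantic group of verbs is not listed.
CONNECTION_VERBS = {"contains", "comprises", "includes"}


def segment_claim(claim: str) -> List[str]:
    """Split a claim sentence at semicolons and at commas that precede 'wherein'.

    These two rules are stand-ins for the article's heuristic rules.
    """
    parts = re.split(r";|,\s*(?=wherein\b)", claim)
    return [p.strip() for p in parts if p.strip()]


def extract_sao(segment: str) -> Optional[Tuple[str, str, str]]:
    """Return (subject, action, object) around the first known verb, or None.

    Naive assumption: the words before the verb form the subject and the
    words after it form the object. No real parsing is done.
    """
    words = segment.split()
    for i, word in enumerate(words):
        if word.lower() in CONNECTION_VERBS and 0 < i < len(words) - 1:
            return (" ".join(words[:i]), word, " ".join(words[i + 1:]))
    return None


if __name__ == "__main__":
    claim = "The pump contains a housing; the housing comprises a valve"
    for seg in segment_claim(claim):
        print(extract_sao(seg))
```

A real pipeline would replace the word-position heuristic with part-of-speech or dependency information, which is where shallow parsing comes in.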


Author(s):  
Lorenza Saitta ◽  
Michele Sebag

2011 ◽  
pp. 458-458
Author(s):  
Xinhua Zhang ◽  
Novi Quadrianto ◽  
Kristian Kersting ◽  
Zhao Xu ◽  
Yaakov Engel ◽  
...  

2020 ◽  
Vol 10 (23) ◽  
pp. 8747
Author(s):  
Wojciech Wieczorek ◽  
Olgierd Unold ◽  
Łukasz Strąk

Grammatical inference (GI), i.e., the task of finding a rule that lies behind given words, can be used in the analysis of amyloidogenic sequence fragments, which are essential in studies of neurodegenerative diseases. In this paper, we develop a new method that generates non-circular parsing expression grammars (PEGs) and compare it with other GI algorithms on sequences from a real dataset. The main contribution of this paper is a genetic programming-based algorithm for the induction of parsing expression grammars from a finite sample. The induction method has been tested on a real bioinformatics dataset, and its classification performance has been compared with that of existing grammatical inference methods. The evaluation of the generated PEG on an amyloidogenic dataset revealed its accuracy when predicting amyloid segments. We show that the new grammatical inference algorithm achieves the best ACC (accuracy), AUC (area under the ROC curve), and MCC (Matthews correlation coefficient) scores in comparison to five other automata or grammar learning methods.
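
For readers unfamiliar with two of the reported scores, the sketch below computes ACC and MCC from the four cells of a binary confusion matrix using their standard definitions. The function names and the sample counts are illustrative assumptions, not values from the paper, and returning 0.0 when the MCC denominator is zero is a common convention rather than something the abstract specifies.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: an undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(accuracy(tp=40, tn=45, fp=5, fn=10))
    print(mcc(tp=40, tn=45, fp=5, fn=10))
```

Unlike accuracy, MCC uses all four cells symmetrically, which makes it more informative when the amyloid and non-amyloid classes are unbalanced.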


2015 ◽  
Vol 12 (2) ◽  
pp. 465-486
Author(s):  
Dejan Mancev ◽  
Branimir Todorovic

Structured learning algorithms usually require inference during the training procedure. Because the output space is exponentially large, the parameter update is performed only on a relatively small collection built from the "best" structures. The k-best MIRA is an example of an online algorithm that seeks optimal parameters by making updates on the k highest-scoring structures at a time. Following the idea of using k-best structures during the learning process, in this paper we introduce four new k-best extensions of max-margin structured algorithms. We discuss their properties and connections, and evaluate all algorithms on two sequence labeling problems: shallow parsing and named entity recognition. The experiments show how the proposed algorithms are affected by changes of k in terms of F-measure and computational time, and that the proposed algorithms can improve results in comparison to the single-best case. Moreover, the restriction to the single-best case produces a comparison of the existing algorithms.
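
The general idea of updating on the k highest-scoring structures can be sketched on a toy sequence labeling task. This is not k-best MIRA and not one of the paper's four extensions: a plain perceptron-style update is substituted for MIRA's margin-constrained optimization, and brute-force enumeration with `heapq.nlargest` replaces a real k-best decoder. The tag set, feature templates and all function names are assumptions made for illustration.

```python
import heapq
from collections import defaultdict
from itertools import product

LABELS = ("B", "I", "O")  # toy chunk tags (assumed)


def features(words, tags):
    """Count emission and transition features of one tagged sequence."""
    feats = defaultdict(float)
    prev = "<s>"
    for word, tag in zip(words, tags):
        feats[("emit", word, tag)] += 1.0
        feats[("trans", prev, tag)] += 1.0
        prev = tag
    return feats


def score(weights, feats):
    """Linear model score of a feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def k_best(weights, words, k):
    """Return the k highest-scoring tag sequences by exhaustive enumeration."""
    candidates = product(LABELS, repeat=len(words))
    return heapq.nlargest(
        k, candidates, key=lambda tags: score(weights, features(words, tags))
    )


def update(weights, words, gold, k, lr=1.0):
    """Perceptron-style step against each non-gold sequence in the k-best list."""
    gold_feats = features(words, gold)
    for pred in k_best(weights, words, k):
        if pred == tuple(gold):
            continue
        for f, v in gold_feats.items():
            weights[f] = weights.get(f, 0.0) + lr * v / k
        for f, v in features(words, pred).items():
            weights[f] = weights.get(f, 0.0) - lr * v / k


if __name__ == "__main__":
    weights = {}
    words, gold = ["the", "cat"], ("B", "I")
    for _ in range(3):
        update(weights, words, gold, k=2)
    print(k_best(weights, words, 1)[0])
```

Exhaustive enumeration is only feasible for very short inputs; on real data a k-best Viterbi decoder would supply the candidate list, which is where the computational cost of larger k comes from.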


2013 ◽  
Vol 12 (11) ◽  
pp. 2119-2131 ◽  
Author(s):  
Sahin Cem Geyik ◽  
Eyuphan Bulut ◽  
Boleslaw K. Szymanski

2006 ◽  
Vol 66 (1) ◽  
pp. 3-5 ◽  
Author(s):  
Georgios Paliouras ◽  
Yasubumi Sakakibara
