shallow parsing
Recently Published Documents

TOTAL DOCUMENTS: 84 (five years: 8)
H-INDEX: 9 (five years: 0)

2021 ◽  
Author(s):  
Luis-Gil Moreno-Jiménez ◽  
Juan-Manuel Torres-Moreno ◽  
Roseli S. Wedemann

In recent years, researchers in the area of Computational Creativity have studied the human creative process, proposing different approaches to reproduce it with a formal procedure. In this paper, we introduce a model for the generation of literary rhymes in Spanish, combining structures of language with neural network models. The results obtained with a manual evaluation of the texts generated by our algorithm are encouraging.
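The abstract does not publish the algorithm itself; purely as an illustration of the kind of language structure a rhyme generator must model, a crude consonant-rhyme check for Spanish might compare accent-stripped word endings (real systems would need syllabification and stress placement, which this sketch ignores):

```python
# Crude consonant-rhyme check for Spanish words (illustrative sketch only).
# Proper rhyme detection locates the last stressed vowel; here we simply
# compare accent-stripped word endings of fixed length.
STRIP_ACCENTS = str.maketrans("áéíóú", "aeiou")

def ending(word, k=2):
    """Return the last k characters of the word with accents removed."""
    return word.lower().translate(STRIP_ACCENTS)[-k:]

def rhymes(w1, w2, k=2):
    """Two words 'rhyme' under this approximation if their endings match."""
    return ending(w1, k) == ending(w2, k)

print(rhymes("corazón", "canción"))  # 'on' == 'on' → True
print(rhymes("luna", "cuna"))        # 'na' == 'na' → True
print(rhymes("luna", "sol"))         # 'na' != 'ol' → False
```

A fixed-length suffix comparison is only a stand-in: it over-matches unstressed endings and misses assonant rhyme, which is why production systems model stress explicitly.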


Author(s):  
S. S. Vasiliev ◽  
D. M. Korobkin ◽  
S. A. Fomenkov

To solve the problem of information support for the synthesis of new technical solutions, a method of extracting structured data from an array of Russian-language patents is presented. The key features of the invention, such as the structural elements of the technical object and the relationships between them, serve as the information support. The data source is the main claim of the invention in a device patent. The unit of extraction is the semantic structure Subject-Action-Object (SAO), which semantically describes the structural elements. The extraction method is based on shallow parsing and claim segmentation, taking into account the specifics of how patent texts are written. The often excessive length of the claim sentence and the specificity of patent language make it difficult to use off-the-shelf tools for data extraction efficiently. The processing steps are: segmentation of the claim sentences; extraction of primary SAO structures; construction of the graph of the structural elements of the invention; and integration of the data into the domain ontology. This article deals with the first two stages. Segmentation is carried out according to a number of heuristic rules, and several natural language processing tools are used to reduce analysis errors. The primary SAO elements are extracted by considering the valences of a predefined semantic group of verbs, as well as information about the type of segment being processed. The result of the work is the organization of a domain ontology, which can be used to find alternative designs for nodes in a technical object. The second part of the article considers an algorithm for constructing a graph of the structural elements of a separate technical object, an assessment of the effectiveness of the system, and the resulting ontology organization.
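The authors' pipeline targets Russian patent claims with language-specific tooling and verb-valence information; a toy English analogue of the SAO-extraction step, over pre-tagged tokens and a hypothetical predefined verb group, might look like:

```python
# Toy Subject-Action-Object extraction over POS-tagged tokens.
# Illustrative only: the paper's method handles Russian patent claims
# with shallow parsing, segmentation, and verb valences.
ACTION_VERBS = {"comprises", "contains", "connects"}  # hypothetical semantic group

def extract_sao(tagged):
    """tagged: list of (token, pos) pairs; returns (subject, action, object) triples."""
    triples = []
    for i, (tok, pos) in enumerate(tagged):
        if pos == "VERB" and tok in ACTION_VERBS:
            # nearest noun to the left = subject, nearest to the right = object
            subj = next((t for t, p in reversed(tagged[:i]) if p == "NOUN"), None)
            obj = next((t for t, p in tagged[i + 1:] if p == "NOUN"), None)
            if subj and obj:
                triples.append((subj, tok, obj))
    return triples

claim = [("the", "DET"), ("housing", "NOUN"), ("comprises", "VERB"),
         ("a", "DET"), ("valve", "NOUN")]
print(extract_sao(claim))  # [('housing', 'comprises', 'valve')]
```

The nearest-noun heuristic stands in for the valence-driven extraction the paper describes; on real claim sentences the segmentation stage would first cut the claim into manageable fragments.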


2021 ◽  
Author(s):  
Jonas Sjöbergh ◽  
Viggo Kann

We present an online API for accessing a number of Natural Language Processing services developed at KTH. The services work on Swedish text. They include tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more. The services can be accessed in several ways, including a RESTful interface, direct socket communication, and premade Web forms. The services are open to anyone. The source code is also freely available, making it possible to set up another server or run the tools locally. We have also evaluated the performance of several of the services and compared them to other available systems. Both the precision and the recall for the Granska grammar checker are higher than for Microsoft Word and Google Docs. The evaluation also shows that recall is greatly improved when all the grammar checking services in the API are combined, compared to any single method, and combining services is made easy by the API.
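The abstract names a RESTful interface but not its endpoints; purely as a shape illustration (the endpoint URL and parameter names below are invented placeholders, not the actual KTH service API), a client might build a request like this with the standard library:

```python
# Sketch of constructing a request to a text-processing REST API.
# The base URL and query parameter names are hypothetical placeholders.
from urllib.parse import urlencode
from urllib.request import Request

def build_request(text, task="tag"):
    """Build (but do not send) a GET request for the given task and text."""
    base = "https://example.org/api"  # placeholder endpoint
    query = urlencode({"task": task, "text": text})
    return Request(f"{base}?{query}")

req = build_request("Detta är ett exempel.", task="spellcheck")
print(req.full_url)  # the request is only constructed here, not sent
```

Constructing the request separately from sending it keeps the sketch runnable offline; a real client would pass the request to `urllib.request.urlopen` and parse the response.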


2021 ◽  
Author(s):  
Ruben Kruiper ◽  
Ioannis Konstas ◽  
Alasdair J.G. Gray ◽  
Farhad Sadeghineko ◽  
Richard Watson ◽  
...  
Keyword(s):  

2020 ◽  
Vol 27 (1) ◽  
Author(s):  
MK Aregbesola ◽  
RA Ganiyu ◽  
SO Olabiyisi ◽  
EO Omidiora

The concept of automated grammar evaluation of natural language texts has attracted significant interest in the natural language processing community. It is the examination of natural language text for grammatical accuracy using computer software. The current work is a comparative study of deep and shallow parsing techniques that have been applied to lexical analysis and grammaticality evaluation of natural language texts. The comparative analysis was based on data gathered from numerous related works. Shallow parsing using induced grammars was first examined along with its two main sub-categories, probabilistic statistical parsers and the connectionist approach using neural networks. Deep parsing using handcrafted grammar was subsequently examined along with several of its subcategories, including Transformational Grammars, Feature Based Grammars, Lexical Functional Grammar (LFG), Definite Clause Grammar (DCG), Property Grammar (PG), Categorial Grammar (CG), Generalized Phrase Structure Grammar (GPSG), and Head-driven Phrase Structure Grammar (HPSG). Based on facts gathered from the literature on these formalisms, a comparative analysis of the deep and shallow parsing techniques was performed. It showed, among other things, that while the shallow parsing approach is usually domain dependent, is influenced by sentence length and lexical frequency, and employs machine learning to induce grammar rules, the deep parsing approach is not domain dependent, is influenced by neither sentence length nor lexical frequency, and makes use of a precisely specified set of linguistic rules. The deep parsing techniques proved to be the more labour-intensive approach, while induced grammar rules were usually faster to obtain, with reliability increasing with the size, accuracy, and coverage of the training data.
The shallow parsing approach has gained immense popularity owing to the availability of large corpora for different languages, and has therefore become the most widely adopted approach in recent times.

Keywords: Grammaticality, Natural language processing, Deep parsing, Shallow parsing, Handcrafted grammar, Precision grammar, Induced grammar, Automated scoring, Computational linguistics, Comparative study.
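The contrast the study draws can be made concrete: shallow parsing typically means chunking, i.e. grouping POS-tagged tokens into flat, non-recursive phrases rather than building a full tree. A minimal pattern-based noun-phrase chunker (toy grammar and tag set, not drawn from the study) shows the idea:

```python
# Minimal noun-phrase chunker over POS-tagged input (illustrative).
# Shallow parsing emits flat chunks like [DET ADJ* NOUN] instead of the
# full constituency structure a deep parser would produce.
def chunk_nps(tagged):
    """tagged: list of (token, pos) pairs; returns NP chunks as strings."""
    chunks, current = [], []
    for tok, pos in tagged:
        if pos in ("DET", "ADJ", "NOUN"):
            current.append(tok)
            if pos == "NOUN":          # a noun closes the current chunk
                chunks.append(" ".join(current))
                current = []
        else:
            current = []               # any other tag breaks the chunk
    return chunks

sent = [("the", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
        ("jumps", "VERB"), ("over", "ADP"), ("the", "DET"), ("dog", "NOUN")]
print(chunk_nps(sent))  # ['the quick fox', 'the dog']
```

Because the chunker never recurses, it runs in a single pass and never commits to attachment decisions, which is exactly the robustness-for-depth trade-off the comparison above describes.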


2020 ◽  
Vol 34 (10) ◽  
pp. 13921-13922
Author(s):  
Chan Hee Song ◽  
Arijit Sehanobish

Most Named Entity Recognition (NER) systems use additional features like part-of-speech (POS) tags, shallow parsing, gazetteers, etc. Adding these external features to NER systems has been shown to have a positive impact. However, creating gazetteers or taggers can take a lot of time and may require extensive data cleaning. In this work, instead of using these traditional features, we use lexicographic features of Chinese characters. Chinese characters are composed of graphical components called radicals, and these components often carry semantic indicators. We propose CNN-based models that incorporate this semantic information and use them for NER. Our models show an improvement over the baseline BERT-BiLSTM-CRF model. We present one of the first studies on Chinese OntoNotes v5.0 and show an improvement of +0.64 F1 score over the baseline. We present a state-of-the-art (SOTA) F1 score of 71.81 on the Weibo dataset, show a competitive improvement of +0.72 over the baseline on the ResumeNER dataset, and a SOTA F1 score of 96.49 on the MSRA dataset.
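The paper's models are CNN-based and learned from data, but the underlying feature idea, attaching a character's radical as a lexicographic signal, can be sketched with a tiny hand-built lookup (the table below covers only a few characters and is purely illustrative, not the paper's resource):

```python
# Illustrative radical lookup for Chinese characters.
# Characters sharing a radical often share a semantic field: the water
# radical 氵 appears in 河 (river), 湖 (lake), and 海 (sea).
RADICALS = {
    "河": "氵", "湖": "氵", "海": "氵",   # water radical
    "打": "扌", "抱": "扌",              # hand radical
}

def radical_features(chars):
    """Map each character to its radical, falling back to the character itself."""
    return [RADICALS.get(c, c) for c in chars]

print(radical_features("河湖打x"))  # ['氵', '氵', '扌', 'x']
```

In a neural tagger these radical symbols would be embedded alongside the character embeddings, letting the model generalize across rare characters that share a radical with frequent ones.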


2018 ◽  
Vol 10 (2) ◽  
pp. 269-305 ◽  
Author(s):  
Dirk Pijpops ◽  
Isabeau De Smet ◽  
Freek Van de Velde

Abstract In everyday language use, two or more structurally unrelated constructions may occasionally give rise to strings that look very similar on the surface. As a result of this superficial resemblance, a subset of instances of one of these constructions may deviate in its probabilistic preference for one of several possible formal variants. This effect is called ‘constructional contamination’, and was introduced in Pijpops & Van de Velde (2016). Constructional contamination bears testimony to the hypothesis that language users do not always execute a full parse of the utterances they interpret, but instead often rely on ‘shallow parsing’ and the storage of large, unanalyzed chunks of language in memory, as proposed in Ferreira, Bailey, & Ferraro (2002), Ferreira & Patson (2007), and Dąbrowska (2014). Pijpops & Van de Velde (2016) investigated a single case study in depth, namely the Dutch partitive genitive. This case study is reviewed, and three new case studies are added, namely the competition between long and bare infinitives, word order variation in verbal clusters, and preterite formation. We find evidence of constructional contamination in all case studies, albeit in varying degrees. This indicates that constructional contamination is not a particularity of the Dutch partitive genitive but appears to be more widespread, affecting both morphology and syntax. Furthermore, we distinguish between two forms of constructional contamination, viz. first degree and second degree contamination, with first degree contamination producing greater effects than second degree contamination.

