shallow parsing
Recently Published Documents

TOTAL DOCUMENTS: 84 (five years: 8)
H-INDEX: 9 (five years: 0)

2021 ◽  
Author(s):  
Luis-Gil Moreno-Jiménez ◽  
Juan-Manuel Torres-Moreno ◽  
Roseli S. Wedemann

In recent years, researchers in the area of Computational Creativity have studied the human creative process, proposing different approaches to reproduce it with a formal procedure. In this paper, we introduce a model for the generation of literary rhymes in Spanish, combining structures of language with neural network models. The results obtained with a manual evaluation of the texts generated by our algorithm are encouraging.
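The abstract does not publish the algorithm itself; purely as an illustration of the kind of language structure a rhyme generator must model, a crude consonant-rhyme check for Spanish might compare accent-stripped word endings (real systems would need syllabification and stress placement, which this sketch ignores):

```python
# Crude consonant-rhyme check for Spanish words (illustrative sketch only).
# Proper rhyme detection locates the last stressed vowel; here we simply
# compare accent-stripped word endings of fixed length.
STRIP_ACCENTS = str.maketrans("áéíóú", "aeiou")

def ending(word, k=2):
    """Return the last k characters of the word with accents removed."""
    return word.lower().translate(STRIP_ACCENTS)[-k:]

def rhymes(w1, w2, k=2):
    """Two words 'rhyme' under this approximation if their endings match."""
    return ending(w1, k) == ending(w2, k)

print(rhymes("corazón", "canción"))  # 'on' == 'on' → True
print(rhymes("luna", "cuna"))        # 'na' == 'na' → True
print(rhymes("luna", "sol"))         # 'na' != 'ol' → False
```

A fixed-length suffix comparison is only a stand-in: it over-matches unstressed endings and misses assonant rhyme, which is why production systems model stress explicitly.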


Author(s):  
S. S. Vasiliev ◽  
D. M. Korobkin ◽  
S. A. Fomenkov

To solve the problem of information support for the synthesis of new technical solutions, a method of extracting structured data from an array of Russian-language patents is presented. The key features of the invention, such as the structural elements of the technical object and the relationships between them, serve as the information support. The data source is the main claim of the invention in a device patent. The unit of extraction is the semantic structure Subject-Action-Object (SAO), which semantically describes the structural elements. The extraction method is based on shallow parsing and claim segmentation, taking into account the specifics of how patent texts are written. The often excessive length of the claim sentence and the specificity of patent language make it difficult to use off-the-shelf tools for data extraction efficiently. The processing steps are: segmentation of the claim sentences; extraction of primary SAO structures; construction of the graph of the structural elements of the invention; and integration of the data into the domain ontology. This article deals with the first two stages. Segmentation is carried out according to a number of heuristic rules, and several natural language processing tools are used to reduce analysis errors. The primary SAO elements are extracted by considering the valences of a predefined semantic group of verbs, as well as information about the type of segment being processed. The result of the work is the organization of a domain ontology, which can be used to find alternative designs for nodes in a technical object. The second part of the article considers an algorithm for constructing a graph of the structural elements of a separate technical object, an assessment of the effectiveness of the system, and the resulting ontology organization.
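The authors' pipeline targets Russian patent claims with language-specific tooling and verb-valence information; a toy English analogue of the SAO-extraction step, over pre-tagged tokens and a hypothetical predefined verb group, might look like:

```python
# Toy Subject-Action-Object extraction over POS-tagged tokens.
# Illustrative only: the paper's method handles Russian patent claims
# with shallow parsing, segmentation, and verb valences.
ACTION_VERBS = {"comprises", "contains", "connects"}  # hypothetical semantic group

def extract_sao(tagged):
    """tagged: list of (token, pos) pairs; returns (subject, action, object) triples."""
    triples = []
    for i, (tok, pos) in enumerate(tagged):
        if pos == "VERB" and tok in ACTION_VERBS:
            # nearest noun to the left = subject, nearest to the right = object
            subj = next((t for t, p in reversed(tagged[:i]) if p == "NOUN"), None)
            obj = next((t for t, p in tagged[i + 1:] if p == "NOUN"), None)
            if subj and obj:
                triples.append((subj, tok, obj))
    return triples

claim = [("the", "DET"), ("housing", "NOUN"), ("comprises", "VERB"),
         ("a", "DET"), ("valve", "NOUN")]
print(extract_sao(claim))  # [('housing', 'comprises', 'valve')]
```

The nearest-noun heuristic stands in for the valence-driven extraction the paper describes; on real claim sentences the segmentation stage would first cut the claim into manageable fragments.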


2021 ◽  
Author(s):  
Jonas Sjöbergh ◽  
Viggo Kann

We present an online API for accessing a number of Natural Language Processing services developed at KTH. The services work on Swedish text. They include tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more. The services can be accessed in several ways, including a RESTful interface, direct socket communication, and premade Web forms. The services are open to anyone. The source code is also freely available, making it possible to set up another server or run the tools locally. We have also evaluated the performance of several of the services and compared them to other available systems. Both the precision and the recall for the Granska grammar checker are higher than for Microsoft Word and Google Docs. The evaluation also shows that recall is greatly improved when all the grammar checking services in the API are combined, compared to any single method, and combining services is made easy by the API.
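The abstract names a RESTful interface but not its endpoints; purely as a shape illustration (the endpoint URL and parameter names below are invented placeholders, not the actual KTH service API), a client might build a request like this with the standard library:

```python
# Sketch of constructing a request to a text-processing REST API.
# The base URL and query parameter names are hypothetical placeholders.
from urllib.parse import urlencode
from urllib.request import Request

def build_request(text, task="tag"):
    """Build (but do not send) a GET request for the given task and text."""
    base = "https://example.org/api"  # placeholder endpoint
    query = urlencode({"task": task, "text": text})
    return Request(f"{base}?{query}")

req = build_request("Detta är ett exempel.", task="spellcheck")
print(req.full_url)  # the request is only constructed here, not sent
```

Constructing the request separately from sending it keeps the sketch runnable offline; a real client would pass the request to `urllib.request.urlopen` and parse the response.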


2021 ◽  
Author(s):  
Ruben Kruiper ◽  
Ioannis Konstas ◽  
Alasdair J.G. Gray ◽  
Farhad Sadeghineko ◽  
Richard Watson ◽  
...  
Keyword(s):  

2020 ◽  
Vol 27 (1) ◽  
Author(s):  
MK Aregbesola ◽  
RA Ganiyu ◽  
SO Olabiyisi ◽  
EO Omidiora

The concept of automated grammar evaluation of natural language texts has attracted significant interest in the natural language processing community. It is the examination of natural language text for grammatical accuracy using computer software. The current work is a comparative study of deep and shallow parsing techniques that have been applied to lexical analysis and grammaticality evaluation of natural language texts. The comparative analysis was based on data gathered from numerous related works. Shallow parsing using induced grammars was first examined along with its two main sub-categories, probabilistic statistical parsers and the connectionist approach using neural networks. Deep parsing using handcrafted grammar was subsequently examined along with several of its subcategories, including Transformational Grammars, Feature Based Grammars, Lexical Functional Grammar (LFG), Definite Clause Grammar (DCG), Property Grammar (PG), Categorial Grammar (CG), Generalized Phrase Structure Grammar (GPSG), and Head-driven Phrase Structure Grammar (HPSG). Based on facts gathered from the literature on these formalisms, a comparative analysis of the deep and shallow parsing techniques was performed. It showed, among other things, that while the shallow parsing approach is usually domain dependent, is influenced by sentence length and lexical frequency, and employs machine learning to induce grammar rules, the deep parsing approach is not domain dependent, is influenced by neither sentence length nor lexical frequency, and makes use of a precisely specified set of linguistic rules. The deep parsing techniques proved to be the more labour-intensive approach, while induced grammar rules were usually faster to obtain, with reliability increasing with the size, accuracy, and coverage of the training data.
The shallow parsing approach has gained immense popularity owing to the availability of large corpora for different languages, and has therefore become the most widely adopted approach in recent times.

Keywords: Grammaticality, Natural language processing, Deep parsing, Shallow parsing, Handcrafted grammar, Precision grammar, Induced grammar, Automated scoring, Computational linguistics, Comparative study.
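The contrast the study draws can be made concrete: shallow parsing typically means chunking, i.e. grouping POS-tagged tokens into flat, non-recursive phrases rather than building a full tree. A minimal pattern-based noun-phrase chunker (toy grammar and tag set, not drawn from the study) shows the idea:

```python
# Minimal noun-phrase chunker over POS-tagged input (illustrative).
# Shallow parsing emits flat chunks like [DET ADJ* NOUN] instead of the
# full constituency structure a deep parser would produce.
def chunk_nps(tagged):
    """tagged: list of (token, pos) pairs; returns NP chunks as strings."""
    chunks, current = [], []
    for tok, pos in tagged:
        if pos in ("DET", "ADJ", "NOUN"):
            current.append(tok)
            if pos == "NOUN":          # a noun closes the current chunk
                chunks.append(" ".join(current))
                current = []
        else:
            current = []               # any other tag breaks the chunk
    return chunks

sent = [("the", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
        ("jumps", "VERB"), ("over", "ADP"), ("the", "DET"), ("dog", "NOUN")]
print(chunk_nps(sent))  # ['the quick fox', 'the dog']
```

Because the chunker never recurses, it runs in a single pass and never commits to attachment decisions, which is exactly the robustness-for-depth trade-off the comparison above describes.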


2020 ◽  
Vol 34 (10) ◽  
pp. 13921-13922
Author(s):  
Chan Hee Song ◽  
Arijit Sehanobish

Most Named Entity Recognition (NER) systems use additional features like part-of-speech (POS) tags, shallow parsing, gazetteers, etc. Adding these external features to NER systems has been shown to have a positive impact. However, creating gazetteers or taggers can take a lot of time and may require extensive data cleaning. In this work, instead of using these traditional features, we use lexicographic features of Chinese characters. Chinese characters are composed of graphical components called radicals, and these components often carry semantic indicators. We propose CNN-based models that incorporate this semantic information and use them for NER. Our models show an improvement over the baseline BERT-BiLSTM-CRF model. We present one of the first studies on Chinese OntoNotes v5.0 and show an improvement of +0.64 F1 score over the baseline. We present a state-of-the-art (SOTA) F1 score of 71.81 on the Weibo dataset, show a competitive improvement of +0.72 over the baseline on the ResumeNER dataset, and a SOTA F1 score of 96.49 on the MSRA dataset.
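The paper's models are CNN-based and learned from data, but the underlying feature idea, attaching a character's radical as a lexicographic signal, can be sketched with a tiny hand-built lookup (the table below covers only a few characters and is purely illustrative, not the paper's resource):

```python
# Illustrative radical lookup for Chinese characters.
# Characters sharing a radical often share a semantic field: the water
# radical 氵 appears in 河 (river), 湖 (lake), and 海 (sea).
RADICALS = {
    "河": "氵", "湖": "氵", "海": "氵",   # water radical
    "打": "扌", "抱": "扌",              # hand radical
}

def radical_features(chars):
    """Map each character to its radical, falling back to the character itself."""
    return [RADICALS.get(c, c) for c in chars]

print(radical_features("河湖打x"))  # ['氵', '氵', '扌', 'x']
```

In a neural tagger these radical symbols would be embedded alongside the character embeddings, letting the model generalize across rare characters that share a radical with frequent ones.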


2018 ◽  
Vol 10 (2) ◽  
pp. 269-305 ◽  
Author(s):  
Dirk Pijpops ◽  
Isabeau De Smet ◽  
Freek Van de Velde

Abstract In everyday language use, two or more structurally unrelated constructions may occasionally give rise to strings that look very similar on the surface. As a result of this superficial resemblance, a subset of instances of one of these constructions may deviate in its probabilistic preference for one of several possible formal variants. This effect is called ‘constructional contamination’, and was introduced in Pijpops & Van de Velde (2016). Constructional contamination bears testimony to the hypothesis that language users do not always execute a full parse of the utterances they interpret, but instead often rely on ‘shallow parsing’ and the storage of large, unanalyzed chunks of language in memory, as proposed in Ferreira, Bailey, & Ferraro (2002), Ferreira & Patson (2007), and Dąbrowska (2014). Pijpops & Van de Velde (2016) investigated a single case study in depth, namely the Dutch partitive genitive. This case study is reviewed, and three new case studies are added, namely the competition between long and bare infinitives, word order variation in verbal clusters, and preterite formation. We find evidence of constructional contamination in all case studies, albeit in varying degrees. This indicates that constructional contamination is not a particularity of the Dutch partitive genitive but appears to be more widespread, affecting both morphology and syntax. Furthermore, we distinguish between two forms of constructional contamination, viz. first degree and second degree contamination, with first degree contamination producing greater effects than second degree contamination.

