Categorial Grammar

Linguistics ◽  
2019 ◽  
Author(s):  
Glyn Morrill

The term “categorial grammar” refers to a variety of approaches to syntax and semantics in which expressions are categorized by recursively defined types and in which grammatical structure is the projection of the properties of the lexical types of words. In the earliest forms of categorial grammar, types are functional/implicational and interact by the logical rule of Modus Ponens. In categorial grammar there are two traditions: the logical tradition that grew out of the work of Joachim Lambek, and the combinatory tradition associated with the work of Mark Steedman. The logical approach employs methods from mathematical logic and situates categorial grammars in the context of substructural logic. The combinatory approach emphasizes practical applicability to natural language processing and situates categorial grammars within extended rewriting systems. The logical tradition interprets the history of categorial grammar as the evolution and generalization of basic functional/implicational types into a rich categorial logic suited to the characterization of the syntax and semantics of natural language, one that is at once logical, formal, computational, and mathematical, reaching a level of formal explicitness not achieved in other grammar formalisms. This article adopts that interpretation of the field. This research has been partially supported by MINECO project TIN2017–89244-R. Thanks to Stepan Kuznetsov, Oriol Valentín and Sylvain Salvati for comments and suggestions. All errors and shortcomings are the author’s own.
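
To make the Modus Ponens interaction concrete, here is a minimal sketch of type combination in a basic AB categorial grammar. The class definitions, category names, and lexicon are invented for illustration; they are not drawn from Morrill's systems.

```python
# Minimal AB categorial grammar: types combine by Modus Ponens
# (forward application A/B, B => A; backward application B, B\A => A).
# Lexicon and category names are illustrative only.

from dataclasses import dataclass

@dataclass(frozen=True)
class Slash:            # A/B: seeks a B to the right, yields A
    result: object
    arg: object

@dataclass(frozen=True)
class Backslash:        # B\A: seeks a B to the left, yields A
    arg: object
    result: object

def apply(left, right):
    """One Modus Ponens step; returns the combined type or None."""
    if isinstance(left, Slash) and left.arg == right:
        return left.result             # forward: A/B, B => A
    if isinstance(right, Backslash) and right.arg == left:
        return right.result            # backward: B, B\A => A
    return None

LEX = {
    "Max":    "NP",
    "sleeps": Backslash("NP", "S"),                # NP\S
    "loves":  Slash(Backslash("NP", "S"), "NP"),   # (NP\S)/NP
}

# "Max loves Lexi": loves + NP => NP\S, then Max + NP\S => S
vp = apply(LEX["loves"], "NP")
print(apply(LEX["Max"], vp))   # S
```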

2018 ◽  
pp. 35-38
Author(s):  
O. Hyryn

The article deals with natural language processing, namely the processing of English sentences. It describes the problems that might arise during the process, which are connected with graphic, semantic, and syntactic ambiguity. The article describes how these problems were solved before automatic syntactic analysis was applied, and how such analysis methods could be helpful in developing new analysis algorithms. The analysis focuses on the issues underlying natural language parsing: the process of analysing sentences according to their structure, content, and meaning, which aims to determine the grammatical structure of the sentence, divide it into constituent components, and define the links between them.
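
As a concrete illustration of the syntactic ambiguity the article discusses, the following sketch uses NLTK (assumed installed) with an invented toy grammar to produce two parses of one sentence.

```python
# Illustrative only: a toy CFG (not from the article) showing syntactic
# ambiguity, one of the parsing problems discussed above.
# Assumes the nltk package is installed.
import nltk

grammar = nltk.CFG.fromstring("""
  S  -> NP VP
  NP -> 'I' | Det N | Det N PP
  VP -> V NP | V NP PP
  PP -> P NP
  Det -> 'the'
  N  -> 'man' | 'telescope'
  V  -> 'saw'
  P  -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "I saw the man with the telescope".split()

# Two trees: the PP attaches to the verb (instrument reading) or
# to 'man' (modifier reading).
for tree in parser.parse(sentence):
    print(tree)
```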


Author(s):  
John Carroll

This article introduces the concepts and techniques of natural language (NL) parsing, that is, using a grammar to assign a syntactic analysis to a string of words or to a lattice of word hypotheses output by a speech recognizer. The level of detail required depends on the language processing task being performed and the particular approach to the task that is being pursued. The article further describes approaches that produce ‘shallow’ analyses. It also outlines approaches to parsing that analyse the input in terms of labelled dependencies between words. Producing hierarchical phrase structure requires grammars that have at least context-free (CF) power. CF parsing algorithms that are widely used for NL are described in this article. To support detailed semantic interpretation, more powerful grammar formalisms are required, but these are usually parsed using extensions of CF parsing algorithms. Furthermore, this article describes unification-based parsing. Finally, it discusses three important issues that have to be tackled in real-world applications of parsing: evaluation of parser accuracy, parser efficiency, and measurement of grammar/parser coverage.
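
As a concrete instance of the CF algorithms mentioned above, here is a minimal sketch of a CKY recognizer; the grammar is an invented toy in Chomsky normal form, not one taken from the article.

```python
# A minimal CKY recognizer, one of the context-free parsing algorithms
# surveyed above. Toy grammar in Chomsky normal form; illustrative only.

# Binary rules A -> B C and lexical rules A -> 'w'
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}

def cky_recognize(words, start="S"):
    n = len(words)
    # chart[i][j]: set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICAL.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        if (b, c) in BINARY:
                            chart[i][j].add(BINARY[(b, c)])
    return start in chart[0][n]

print(cky_recognize("the dog chased the cat".split()))  # True
```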


Author(s):  
T. Venkat Narayana Rao et al.

Chatbots enable businesses to reach their target customers using popular messenger apps like Facebook Messenger, WhatsApp, etc. Chatbots are not handled by humans directly. Nowadays, chatbots are becoming very popular, especially in the business sector, by reducing human effort and automating customer service. A chatbot is software which interacts with users using natural language processing, machine learning, and artificial intelligence. Chatbots allow users to simply ask questions, simulating interaction with humans. Popular and well-known chatbots include Alexa and Siri. This paper focuses on a review of chatbots, the history of chatbots, and their implementation along with applications.
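
To make the question-answer loop concrete, here is a toy keyword-matching chatbot, far simpler than the systems the paper reviews; all patterns and replies are invented for illustration.

```python
# A toy keyword-matching chatbot, far simpler than Alexa or Siri,
# just to make the ask-a-question, get-an-answer loop concrete.
# All patterns and replies are invented for illustration.

RULES = [
    (("hello", "hi"),   "Hello! How can I help you today?"),
    (("hours", "open"), "We are open 9am-5pm, Monday to Friday."),
    (("price", "cost"), "Pricing depends on the plan; see our site."),
]
FALLBACK = "Sorry, I didn't understand. Could you rephrase?"

def reply(message: str) -> str:
    words = message.lower()
    for keywords, answer in RULES:
        if any(k in words for k in keywords):
            return answer
    return FALLBACK

print(reply("Hi there"))            # greeting rule fires
print(reply("When are you open?"))  # hours rule fires
```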


2020 ◽  
pp. 32-51
Author(s):  
Włodzimierz Gruszczyński ◽  
Dorota Adamiec ◽  
Renata Bronikowska ◽  
Aleksandra Wieczorek

Electronic Corpus of 17th- and 18th-century Polish Texts – theoretical and workshop problems

Summary: This paper presents the Electronic Corpus of 17th- and 18th-century Polish Texts (KorBa) – a large (13.5-million), annotated historical corpus available online. Its creation was modelled on the assumptions of the National Corpus of Polish (NKJP), yet the specific nature of the historical material enforced certain modifications of the solutions applied in NKJP, e.g. two forms of text representation (transliteration and transcription) were introduced, the principle of designating foreign-language fragments was adopted, and the tagset was adapted to the description of the grammatical structure of the Middle Polish language. The texts collected in KorBa are diversified in chronological, geographical, stylistic, and thematic terms although, due to e.g. limited access to the material, the postulate of representativeness and sustainability of the corpus was not fully implemented. The work on the corpus was to a large extent automated as a result of using natural language processing tools.

Keywords: electronic text corpus – historical corpus – 17th-18th-century Polish – natural language processing


Reading Comprehension (RC) plays an important role in Natural Language Processing (NLP) as it reads and understands text written in natural language. Reading Comprehension systems comprehend a given document and answer questions in the context of that document. This paper proposes a Reading Comprehension System for Kannada documents. The RC system analyses text in the Kannada script and allows users to pose questions to it in Kannada. This system is aimed at users whose primary language is Kannada, who would otherwise have difficulty parsing through vast Kannada documents for the information they require. This paper discusses the proposed model built using Term Frequency - Inverse Document Frequency (TF-IDF) and its performance in extracting answers from the context document. The proposed model captures the grammatical structure of Kannada to provide the most accurate answers to the user.
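
A minimal sketch of TF-IDF answer-sentence selection, the technique the paper builds on, is given below. It assumes scikit-learn is available and uses English placeholder sentences, since the actual system operates on Kannada script.

```python
# Minimal sketch of TF-IDF answer-sentence selection. Assumes scikit-learn
# is installed; English placeholder sentences stand in for the Kannada
# documents the actual system handles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

document_sentences = [
    "Bengaluru is the capital of Karnataka.",
    "Kannada is a Dravidian language.",
    "The state flower of Karnataka is the lotus.",
]
question = "What is the capital of Karnataka?"

# Fit TF-IDF on the document sentences, then score the question against each.
vectorizer = TfidfVectorizer()
sentence_vectors = vectorizer.fit_transform(document_sentences)
question_vector = vectorizer.transform([question])

scores = cosine_similarity(question_vector, sentence_vectors)[0]
best = scores.argmax()
print(document_sentences[best])  # highest-overlap sentence as the answer
```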


2020 ◽  
pp. 41-45
Author(s):  
O. Hyryn

The article proceeds from the intended use of parsing for the purposes of automatic information search, question answering, logical inference, authorship verification, text authenticity verification, grammar checking, natural language synthesis, and other related tasks such as ungrammatical speech analysis, morphological class definition, anaphora resolution, etc. The study covers the challenges of natural language processing, namely of English sentences. The article describes formal and linguistic problems that might arise during the process, connected with graphic, semantic, and syntactic ambiguity. It describes how these problems were solved before automatic syntactic analysis was applied, and how such analysis methods could be helpful in developing new analysis algorithms today. The analysis focuses on the issues underlying natural language parsing: the process of analysing sentences according to their structure, content, and meaning, which aims to examine the grammatical structure of the sentence, divide it into constituent components, and define the links between them. The analysis identifies a number of linguistic issues whose treatment will contribute to the development of an improved model of automatic syntactic analysis: lexical and grammatical synonymy and homonymy, hypo- and hyperonymy, lexical and semantic fields, anaphora resolution, ellipsis, inversion, etc. The scope of natural language processing reveals obvious directions for the improvement of parsing models. Such improvement will consequently expand the scope and improve the results in areas that already employ automatic parsing. Existing achievements in vocabulary and morphology processing should not be neglected while improving automatic syntactic analysis mechanisms for natural languages.
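
One of the issues listed, lexical and grammatical homonymy, can be made concrete with a short sketch; it assumes spaCy and its small English model are installed, and the example sentences are invented.

```python
# Lexical-grammatical homonymy made concrete: the same form 'flies' is a
# verb in one context and a noun in another. Assumes spaCy and its small
# English model are installed.
import spacy

nlp = spacy.load("en_core_web_sm")

for text in ("Time flies quickly.", "The flies buzz loudly."):
    doc = nlp(text)
    print([(token.text, token.pos_) for token in doc])
# The tagger typically labels 'flies' VERB in the first sentence and NOUN
# in the second; a parser must resolve such ambiguity before it can
# assign sentence structure.
```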


Author(s):  
Ben Scott ◽  
Laurence Livermore

The Natural History Museum holds over 80 million specimens and 300 million pages of scientific text. This information is a vital research tool to help solve the most important challenge humans face over the coming years – mapping a sustainable future for ourselves and the ecosystems on which we depend. Digitising these collections and providing the data in a structured, computable form is a mammoth challenge. As of 2020, less than 15% of available specimen information currently residing on specimen labels or physical registers is digitised and publicly available (Walton et al. 2020). Machine learning applications can deliver a step-change in the scope, scale, and speed of our activities (Borsch et al. 2020). As part of SYNTHESYS+, the Natural History Museum is leading the development of a cloud-based workflow platform for natural science specimens, the Specimen Data Refinery (SDR) (Smith et al. 2019). The SDR will provide a series of Machine Learning (ML) models, ranging from semantic segmentation to identify regions of interest on labels, to natural language processing to extract locality and taxonomic text entities from the labels, to image analysis to identify specimen traits and collection quality metrics. Each ML task is atomic, with users of the SDR selecting which model would best extract data from their digitised specimen images, allowing the workflows to be used in different institutions worldwide. This design also solves one of the key problems in developing ML-based applications: the rapidity with which models become obsolete. New ML models can be introduced into the workflow, with incremental changes to improve processing, without interruption or refactoring of the pipeline. Alongside specimens, digitised images of pages of scientific literature provide another vital source of data. Functional traits mediate the interactions between plant species and their environment and play roles in determining species’ range size and threatened status. Such information is contained within the taxonomic descriptions of species, and a natural language processing library has been developed to locate and extract plant functional traits from these texts (Hoehndorf et al. 2016). The ML models allow complex interrelationships between taxa and trait entities to be inferred from the grammatical structure of sentences, improving the accuracy and extent of data point extraction. These two projects, like many other applications of ML in natural history collections, are focused on the extraction of visible information, for example, a piece of text or a measurable trait: given the image of the specimen or page, a person would be able to extract the self-same information. However, ML excels at pattern matching and at inferring unknown characters from an entire corpus. At the museum, we have started exploring this space with our voyagerAI project for identifying specimens collected on historical expeditions of scientific discovery (e.g., the voyages of the Beagle and the Challenger). This process fills in the gaps in specimen provenance and identifies 'lost' specimens collected by some of the most famous names in biodiversity history. Developing new applications of ML to uncover scientific meaning and tell the narratives of our collections will be at the forefront of our scientific innovation in the coming years. This presentation will give an overview of these projects and our future plans for using ML to extract data at scale within the Natural History Museum.
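
A sketch of the atomic, swappable task design described for the SDR workflow appears below; all names, interfaces, and outputs are invented for illustration and are not the actual SDR API.

```python
# Sketch of the atomic, swappable task design described for the SDR
# workflow: each step implements one small interface, so a newer model
# can replace an older one without refactoring the pipeline. All names
# and outputs are invented for illustration; this is not the SDR API.
from typing import Callable, Dict, List

# A task takes the accumulated record for one specimen image and extends it.
Task = Callable[[Dict], Dict]

def segment_labels(record: Dict) -> Dict:
    record["label_regions"] = ["region_1", "region_2"]  # stand-in output
    return record

def extract_locality(record: Dict) -> Dict:
    record["locality"] = "placeholder locality"         # stand-in output
    return record

def run_pipeline(record: Dict, tasks: List[Task]) -> Dict:
    for task in tasks:  # each task is independent and individually replaceable
        record = task(record)
    return record

# Introducing an improved segmentation model is just a list edit.
pipeline = [segment_labels, extract_locality]
print(run_pipeline({"image": "specimen_001.jpg"}, pipeline))
```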


2018 ◽  
Vol 28 (8) ◽  
pp. 1451-1484
Author(s):  
GUILLAUME BONFANTE ◽  
BRUNO GUILLAUME

A very large amount of work in Natural Language Processing (NLP) uses tree structures as the first-class mathematical structures to represent linguistic structures, such as parsed sentences or feature structures. However, some linguistic phenomena are not properly captured by trees; for instance, in the sentence ‘Max decides to leave,’ ‘Max’ is the subject of both predicates ‘to_decide’ and ‘to_leave’. Tree-based linguistic formalisms generally use some encoding to manage sentences like the previous example. In former papers (Bonfante et al. 2011; Guillaume and Perrier 2012), we discussed the advantages of using graphs rather than trees to deal with linguistic structures, and we showed how Graph Rewriting could be used for their processing, for instance in the transformation of sentence syntax into semantics. Our experiments have shown that Graph Rewriting applications to NLP do not require the full computational power of the general Graph Rewriting setting. The most important observation is that all graph vertices in the final structures are in some sense ‘predictable’ from the input data, and so we can consider the framework of non-size-increasing Graph Rewriting. In our previous papers, we formally described the Graph Rewriting calculus we used, and our purpose here is to study the theoretical aspect of termination with respect to this calculus. Given that termination is undecidable in general, we define termination criteria based on weights, prove the termination of weighted rewriting systems, and give complexity bounds on derivation lengths for these rewriting systems.
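
The weight-based termination idea can be sketched briefly: if every rule application strictly decreases a nonnegative weight, rewriting must halt. The graph encoding, rule, and weights below are invented and do not reproduce the authors' calculus; the example reuses their ‘Max decides to leave’ sentence, whose final structure is a graph in which ‘Max’ is the subject of both predicates.

```python
# Illustrative sketch of weight-based termination: each rule application
# strictly decreases a nonnegative integer weight, so rewriting must halt.
# The encoding, rule, and weights are invented; not the authors' calculus.

# A graph as a set of labelled edges (source, label, target).
graph = {("max", "subj_of", "decide"),
         ("decide", "comp", "leave"),
         ("leave", "pending_subj", "max")}

WEIGHT = {"pending_subj": 2, "subj_of": 1, "comp": 1}

def weight(g):
    return sum(WEIGHT[label] for (_, label, _) in g)

def rewrite_step(g):
    """Resolve one 'pending_subj' edge into 'subj_of' (weight 2 -> 1)."""
    for (s, label, t) in g:
        if label == "pending_subj":
            return (g - {(s, label, t)}) | {(t, "subj_of", s)}
    return None

# Since weight() strictly decreases at every step, the loop terminates.
while (nxt := rewrite_step(graph)) is not None:
    assert weight(nxt) < weight(graph)
    graph = nxt
print(sorted(graph))  # 'max' ends up subj_of both 'decide' and 'leave'
```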


2012 ◽  
Vol 7 ◽  
Author(s):  
Sandra Kübler ◽  
Eric Baucom ◽  
Matthias Scheutz

In this paper, we introduce the syntactic annotation of the CReST corpus, a corpus of natural language dialogues obtained from humans performing a cooperative, remote search task. The corpus contains the speech signals as well as transcriptions of the dialogues, which are additionally annotated for dialogue structure, disfluencies, and syntax. The syntactic annotation comprises POS annotation, Penn Treebank style constituent annotations, dependency annotations, and combinatory categorial grammar annotations. The corpus is the first of its kind, providing parallel syntactic annotation based on three different grammar formalisms for a dialogue corpus. All three annotations are manually corrected, thus providing a high-quality resource not only for linguistic comparisons but also for parser evaluation across frameworks.
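
As a small illustration of the Penn Treebank style constituent layer mentioned above, the following sketch reads a bracketed tree with NLTK (assumed installed); the example tree is invented, not taken from CReST.

```python
# Constituent layers of such corpora are typically stored as bracketed
# Penn Treebank style trees. Assumes nltk is installed; the tree below
# is invented, not taken from CReST.
import nltk

bracketed = "(S (NP (PRP I)) (VP (VBP see) (NP (DT a) (NN box))))"
tree = nltk.Tree.fromstring(bracketed)

tree.pretty_print()  # draw the constituent structure
print(tree.pos())    # the flat POS layer:
# [('I', 'PRP'), ('see', 'VBP'), ('a', 'DT'), ('box', 'NN')]
```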

