Dependency parsing of Polish

The predicate-argument structure transparently encoded in dependency-based syntactic representations supports machine translation, question answering, information extraction, etc. The quality of dependency parsing is therefore a crucial issue in natural language processing. In this paper we discuss the fundamental ideas of dependency theory and provide an overview of selected dependency-based resources for Polish. Furthermore, we present some state-of-the-art dependency parsing systems whose models can be estimated on correctly annotated data. In the experimental part, we provide an in-depth evaluation of these systems on Polish data. Our results show that graph-based parsers, even those without any neural component, are better suited for Polish than transition-based parsing systems.

Multi-Graph Cooperative Learning Towards Distant Supervised Relation Extraction

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3466560 ◽

2021 ◽

Vol 12 (5) ◽

pp. 1-21

Author(s):

Changsen Yuan ◽

Heyan Huang ◽

Chong Feng

Keyword(s):

Cooperative Learning ◽

State Of The Art ◽

Relation Extraction ◽

Sentence Length ◽

Universal Relation ◽

Dependency Parsing ◽

Convolutional Network ◽

Syntactic Features ◽

Use Dependency

The Graph Convolutional Network (GCN) is a universal relation extraction method that can predict relations of entity pairs by capturing sentences’ syntactic features. However, existing GCN methods often use dependency parsing to generate graph matrices and learn syntactic features. The quality of the dependency parsing will directly affect the accuracy of the graph matrix and change the whole GCN’s performance. Because of the influence of noisy words and sentence length in the distant supervised dataset, using dependency parsing on sentences causes errors and leads to unreliable information. Therefore, it is difficult to obtain credible graph matrices and relational features for some special sentences. In this article, we present a Multi-Graph Cooperative Learning model (MGCL), which focuses on extracting the reliable syntactic features of relations by different graphs and harnessing them to improve the representations of sentences. We conduct experiments on a widely used real-world dataset, and the experimental results show that our model achieves the state-of-the-art performance of relation extraction.
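
As an illustration of how a dependency parse can be turned into a graph matrix for a GCN, the following minimal sketch builds a symmetric adjacency matrix with self-loops from a list of head indices and applies one degree-normalised propagation step. It is not the MGCL model from the article: the function names, the toy sentence, and the feature vectors are all hypothetical, and the layer has no learned weights.

```python
def dependency_adjacency(heads):
    """Build a symmetric adjacency matrix with self-loops.

    heads[i] is the 0-based index of token i's head, or -1 for the root.
    (Hypothetical input convention chosen for this sketch.)
    """
    n = len(heads)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so each token keeps its own features
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1.0  # dependency arc, treated as undirected
            adj[head][child] = 1.0
    return adj


def gcn_layer(adj, features):
    """One propagation step: average over neighbours, then ReLU.

    A real GCN layer would also multiply by a learned weight matrix;
    that is omitted here to keep the sketch self-contained.
    """
    dim = len(features[0])
    out = []
    for row in adj:
        degree = sum(row)  # at least 1 because of the self-loop
        agg = [
            sum(row[j] * features[j][k] for j in range(len(row))) / degree
            for k in range(dim)
        ]
        out.append([max(0.0, v) for v in agg])
    return out


# Toy sentence "She reads books": "reads" is the root, the other tokens attach to it.
heads = [1, -1, 1]
adj = dependency_adjacency(heads)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up token features
hidden = gcn_layer(adj, features)
```

Because the matrix is derived directly from the predicted heads, a single wrong attachment changes which tokens exchange information, which is the sensitivity to parser quality that the abstract describes.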

Deep learning for brain disorders: from data processing to disease treatment

Briefings in Bioinformatics ◽

10.1093/bib/bbaa310 ◽

2020 ◽

Author(s):

Ninon Burgos ◽

Simona Bottani ◽

Johann Faouzi ◽

Elina Thibeau-Sutre ◽

Olivier Colliot

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Language Processing ◽

State Of The Art ◽

Imaging Genetics ◽

Environmental Data ◽

Brain Disorders ◽

Disease Treatment ◽

Clinical Routine

Machine learning is increasingly used in medicine to advance precision medicine and improve patients’ quality of life. Brain disorders are often complex and heterogeneous, and several modalities such as demographic, clinical, imaging, genetics and environmental data have been studied to better understand them. Deep learning, a subfield of machine learning, provides complex algorithms that can learn from such varied data. It has become state of the art in numerous fields, including computer vision and natural language processing, and is also increasingly applied in medicine. In this article, we review the use of deep learning for brain disorders. More specifically, we identify the main applications, the disorders concerned and the types of architectures and data used. Finally, we provide guidelines to bridge the gap between research studies and clinical routine.

Multilingual Open Information Extraction: Challenges and Opportunities

Information ◽

10.3390/info10070228 ◽

2019 ◽

Vol 10 (7) ◽

pp. 228 ◽

Cited By ~ 4

Author(s):

Daniela Barreiro Claro ◽

Marlo Souza ◽

Clarissa Castellã Xavier ◽

Leandro Oliveira

Keyword(s):

Information Extraction ◽

Language Processing ◽

State Of The Art ◽

Transfer Of Knowledge ◽

Linguistic Resources ◽

Open Information Extraction ◽

General Rules ◽

Challenges And Opportunities ◽

Multilingual Approach

The number of documents published on the Web in languages other than English grows every year. As a consequence, the need to extract useful information from different languages increases, highlighting the importance of research into Open Information Extraction (OIE) techniques. Different OIE methods have dealt with features from a single language; however, few approaches tackle multilingual aspects. In those approaches, multilingualism is restricted to processing text in different languages, rather than exploring cross-linguistic resources, which results in low precision due to the use of general rules. Multilingual methods have been applied to numerous problems in Natural Language Processing, achieving satisfactory results and demonstrating that knowledge acquisition for a language can be transferred to other languages to improve the quality of the facts extracted. We argue that a multilingual approach can enhance OIE methods, as it is ideal for evaluating and comparing OIE systems, and can therefore be applied to the collected facts. In this work, we discuss how the transfer of knowledge between languages can increase acquisition from multilingual approaches. We provide a roadmap of the Multilingual Open IE area concerning state-of-the-art studies. Additionally, we evaluate the transfer of knowledge to improve the quality of the facts extracted in each language. Moreover, we discuss the importance of a parallel corpus to evaluate and compare multilingual systems.

Towards relation extraction from Arabic text: a review

International Robotics & Automation Journal ◽

10.15406/iratj.2019.05.00195 ◽

2019 ◽

Vol 5 (5) ◽

pp. 212-215

Author(s):

Abeer AlArfaj

Keyword(s):

Language Processing ◽

Question Answering ◽

State Of The Art ◽

Relation Extraction ◽

Arabic Language ◽

Extraction Methods ◽

The State ◽

Semantic Relations ◽

Arabic Text ◽

Taxonomic Relation

Semantic relation extraction is an important component of ontologies that can support many applications, e.g., text mining, question answering, and information extraction. However, extracting semantic relations between concepts is not trivial and is one of the main challenges in the field of Natural Language Processing (NLP). The Arabic language has complex morphological, grammatical, and semantic aspects since it is a highly inflectional and derivational language, which makes the task even more challenging. In this paper, we present a review of the state of the art for relation extraction from texts, addressing the progress and difficulties in this field. We discuss several aspects related to this task, considering the taxonomic and non-taxonomic relation extraction methods. The majority of relation extraction approaches implement a combination of statistical and linguistic techniques to extract semantic relations from text. We also give special attention to the state of the work on relation extraction from Arabic texts, which needs further progress.

Combined Distributional and Logical Semantics

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00219 ◽

2013 ◽

Vol 1 ◽

pp. 179-192 ◽

Cited By ~ 26

Author(s):

Mike Lewis ◽

Mark Steedman

Keyword(s):

Argument Structure ◽

Clustering Algorithm ◽

Question Answering ◽

Formal Semantics ◽

Function Words ◽

New Approach ◽

Predicate Argument Structure ◽

Logical Semantics ◽

Mapping Language

We introduce a new approach to semantics which combines the benefits of distributional and formal logical semantics. Distributional models have been successful in modelling the meanings of content words, but logical semantics is necessary to adequately represent many function words. We follow formal semantics in mapping language to logical representations, but differ in that the relational constants used are induced by offline distributional clustering at the level of predicate-argument structure. Our clustering algorithm is highly scalable, allowing us to run on corpora the size of Gigaword. Different senses of a word are disambiguated based on their induced types. We outperform a variety of existing approaches on a wide-coverage question answering task, and demonstrate the ability to make complex multi-sentence inferences involving quantifiers on the FraCaS suite.

Labeling Chinese Predicates with Semantic Roles

Computational Linguistics ◽

10.1162/coli.2008.34.2.225 ◽

2008 ◽

Vol 34 (2) ◽

pp. 225-255 ◽

Cited By ~ 48

Author(s):

Nianwen Xue

Keyword(s):

Gold Standard ◽

Argument Structure ◽

High Performance ◽

State Of The Art ◽

Semantic Role ◽

Semantic Role Labeling ◽

Pos Tagging ◽

Fully Automatic ◽

Syntactic Annotation ◽

Predicate Argument Structure

In this article we report work on Chinese semantic role labeling, taking advantage of two recently completed corpora, the Chinese PropBank, a semantically annotated corpus of Chinese verbs, and the Chinese Nombank, a companion corpus that annotates the predicate-argument structure of nominalized predicates. Because the semantic role labels are assigned to the constituents in a parse tree, we first report experiments in which semantic role labels are automatically assigned to hand-crafted parses in the Chinese Treebank. This gives us a measure of the extent to which semantic role labels can be bootstrapped from the syntactic annotation provided in the treebank. We then report experiments using automatic parses with decreasing levels of human annotation in the input to the syntactic parser: parses that use gold-standard segmentation and POS-tagging, parses that use only gold-standard segmentation, and fully automatic parses. These experiments gauge how successful semantic role labeling for Chinese can be in more realistic situations. Our results show that when hand-crafted parses are used, semantic role labeling accuracy for Chinese is comparable to what has been reported for the state-of-the-art English semantic role labeling systems trained and tested on the English PropBank, even though the Chinese PropBank is significantly smaller in size. When an automatic parser is used, however, the accuracy of our system is significantly lower than the English state of the art. This indicates that an improvement in Chinese parsing is critical to high-performance semantic role labeling for Chinese.

Instructor-aided asynchronous question answering system for online education and distance learning

The International Review of Research in Open and Distributed Learning ◽

10.19173/irrodl.v13i5.1269 ◽

2012 ◽

Vol 13 (5) ◽

pp. 102 ◽

Cited By ~ 2

Author(s):

Dunwei Wen ◽

John Cuzzola ◽

Lorna Brown ◽

Dr. Kinshuk

Keyword(s):

Natural Language Processing ◽

Distance Learning ◽

Online Education ◽

Language Processing ◽

Question Answering ◽

Prototype System ◽

Learning Situation ◽

Question Answering System ◽

Question Answering Systems

Question answering systems have frequently been explored for educational use. However, their value has been somewhat limited by the quality of the answers returned to the student. Recent question answering (QA) research has started to incorporate deep natural language processing (NLP) in order to improve these answers. However, current NLP technology involves intensive computing, and thus it is hard to meet the real-time demand of traditional search. This paper introduces a QA system particularly suited for delayed-answer questions that are typical in certain asynchronous online and distance learning settings. We exploit the communication delay between student and instructor and propose a solution that integrates into an organization’s existing learning management system. We present how our system fits into an online and distance learning situation and how it can better assist in supporting students. The prototype system and its running results show the promise and potential of this research.

Multilingual Open Information Extraction: Challenges and Opportunities

10.20944/preprints201905.0029.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Daniela Barreiro Claro ◽

Marlo Souza ◽

Clarissa Castellã Xavier ◽

Leandro Oliveira

Keyword(s):

Information Extraction ◽

Language Processing ◽

Extraction Method ◽

State Of The Art ◽

Transfer Of Knowledge ◽

Open Information Extraction ◽

General Rules ◽

Challenges And Opportunities ◽

Multilingual Approach

The number of documents published on the Web in languages other than English grows every year. As a consequence, the need to extract useful information from different languages increases, which points to the importance of research on Open Information Extraction (OIE) techniques. Different OIE methods have dealt with features from a single language. On the other hand, few approaches tackle multilingual aspects. In such approaches, multilingualism is treated only as an extraction method, which results in low precision due to the use of general rules. Multilingual methods have been applied to a large number of problems in Natural Language Processing, achieving satisfactory results and demonstrating that knowledge acquisition for a language can be transferred to other languages to improve the quality of the facts extracted. We argue that a multilingual approach can enhance OIE methods, as it is ideal for evaluating and comparing OIE systems and, as a consequence, can be applied to the collected facts. In this work, we discuss how the transfer of knowledge between languages can increase the acquisition from multilingual approaches. We provide a roadmap of the Multilingual Open IE area concerning state-of-the-art studies. Additionally, we evaluate the transfer of knowledge to improve the quality of the facts extracted in each language. Moreover, we discuss the importance of a parallel corpus to evaluate and compare multilingual systems.

A Survey of Citation Recommendation Tasks and Methods

Journal of Computing and Information Technology ◽

10.20532/cit.2020.1005160 ◽

2021 ◽

Vol 28 (3) ◽

pp. 183-205

Keyword(s):

Machine Learning ◽

Language Processing ◽

State Of The Art ◽

Scientific Production ◽

Machine Learning Methods ◽

Citation Function ◽

Key Aspects ◽

Global And Local ◽

Machine Learning Models

Scientific articles store vast amounts of knowledge amassed through many decades of research. They serve to communicate research results among scientists but also for learning and tracking progress in the field. However, scientific production has risen to levels that make it difficult even for experts to keep up with work in their field. As a remedy, specialized search engines are being deployed, incorporating novel natural language processing and machine learning methods. The task of citation recommendation, in particular, has attracted much interest as it holds promise for improving the quality of scientific production. In this paper, we present the state-of-the-art in citation recommendation: we survey the methods for global and local approaches to the task, the evaluation setups and datasets, and the most successful machine learning models. In addition, we overview two tasks complementary to citation recommendation: extraction of key aspects and entities from articles and citation function classification. With this survey, we hope to provide the ground for understanding current efforts and stimulate further research in this exciting and promising field.

Relation Extraction With Clause-Based Open Information Extraction

10.32920/17303840.v1 ◽

2021 ◽

Author(s):

Duc Thuan Vo

Keyword(s):

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Question Answering ◽

Relation Extraction ◽

Linguistic Knowledge ◽

Dependency Parsing ◽

Grammatical Structure ◽

Open Information Extraction ◽

Wide Range

Information Extraction (IE) is one of the challenging tasks in natural language processing. The goal of relation extraction is to discover the relevant segments of information in large numbers of textual documents such that they can be used for structuring data. IE aims at discovering various semantic relations in natural language text and has a wide range of applications such as question answering, information retrieval, and knowledge presentation, among others. This thesis proposes approaches for relation extraction with clause-based Open Information Extraction that use linguistic knowledge to capture a variety of information including semantic concepts, words, POS tags, shallow and full syntax, and dependency parsing in rich syntactic and semantic structures.

Within the plethora of Open Information Extraction approaches that focus on the use of syntactic and dependency parsing for the purposes of detecting relations, incoherent and uninformative relation extractions can still be found. The extracted relations can be erroneous at times and fail to have a meaningful interpretation. As such, we first propose refinements to the grammatical structure of syntactic and dependency parsing with clause structures and clause types in an effort to generate propositions that can be deemed meaningful extractable relations. Second, considering that choosing the most efficient seeds is pivotal to the success of the bootstrapping process when extracting relations, we propose an extended clause-based pattern extraction method with self-training for unsupervised relation extraction. The proposed self-training algorithm relies on the clause-based approach to extract a small set of seed instances in order to identify and derive new patterns. Third, we employ matrix factorization and collaborative filtering for relation extraction. To avoid the need for manually predefined schemas, we employ the notion of universal schemas, formed as a collection of patterns derived from Open Information Extraction tools as well as from relation schemas of pre-existing datasets. While previous systems have trained relations only for entities, we exploit advanced features from relation characteristics such as clause types and semantic topics for predicting new relation instances. Finally, we present an event network representation for temporal and causal event relation extraction that benefits from existing Open IE systems to generate a set of triple relations that are then used to build an event network. The event network is bootstrapped by labeling the temporal and causal disposition of events that are directly linked to each other. The event network can be systematically traversed to identify temporal and causal relations between indirectly connected events.
