Event Extraction via Rules and Machine Learning

The real-time availability of the Internet has engaged millions of users around the world. Users increasingly prefer regional languages for effective and easy communication, which produces multilingual data on social networks and news channels. People share ideas, opinions, and globally occurring events (e.g., sports, inflation, protests, explosions, and sexual assault) in regional (local) languages on social media. Extracting and classifying events from multilingual data has become a bottleneck because of a lack of resources. In this research paper, we present an event classification task for Urdu-language text from social media and news channels using machine learning classifiers. The dataset contains more than 0.1 million (102,962) labeled instances of twelve (12) different types of events. The title, its length, and the last four words of a sentence are used as features to classify the events. Term Frequency-Inverse Document Frequency (tf-idf) showed the best results as a feature vector for evaluating the performance of six popular machine learning classifiers. Random Forest (RF) and K-Nearest Neighbor (KNN) outperformed the other classifiers, achieving 98.00% and 99.00% accuracy, respectively. The novelty lies in the fact that, to the best of our knowledge, the aforementioned features have not previously been applied to event extraction from text written in Urdu.
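As a rough illustration of the feature pipeline described in this abstract, the sketch below takes the last four words of each title and weights them with tf-idf. It is not the authors' implementation: the helper names `last_k_words` and `tfidf_vectors`, the toy English titles (stand-ins for Urdu headlines), and the particular tf-idf variant (relative term frequency times ln(N/df) + 1) are all assumptions made for the example.

```python
import math
from collections import Counter


def last_k_words(sentence, k=4):
    """Return the last k whitespace-separated tokens of a sentence."""
    return sentence.split()[-k:]


def tfidf_vectors(token_lists):
    """Compute a sparse tf-idf vector (dict) for each token list.

    Assumed variant: tf = count / len(doc), idf = ln(N / df) + 1.
    The abstract does not say which variant the authors used.
    """
    n_docs = len(token_lists)
    # Document frequency: number of token lists containing each term.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy titles; the real dataset is Urdu text.
titles = [
    "city team wins the final cricket match",
    "prices rise again as inflation hits market",
    "home team loses the final cricket match",
]
features = [last_k_words(t) for t in titles]
vectors = tfidf_vectors(features)
```

Terms that occur in only one title (such as "inflation") receive a higher weight than terms shared across titles (such as "match"). The resulting vectors could then be passed to a classifier such as KNN or Random Forest.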


A deep-learning model for semantic role labelling in medical documents

Science and Technology Development Journal - Natural Sciences ◽

10.32508/stdjns.v5i2.928 ◽

2021 ◽

Vol 5 (2) ◽

pp. first

Author(s):

Tuấn Nguyên Hoài Đức ◽

Trần Tiện Lợi Long Tứ ◽

Lê Đình Việt Huy

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Argument Structure ◽

Learning Model ◽

Event Extraction ◽

Training Data ◽

Main Task ◽

Learning Method ◽

Task Learning ◽

Predicate Argument Structure

We built a model that labels the Predicate Argument Structure (PAS) in biomedical documents. PAS is important semantic information in any document because it reveals the main event mentioned in each sentence. Extracting the PAS of a sentence is an important prerequisite for solving a series of other semantic text-processing problems, such as event extraction, named entity extraction, and question answering. The predicate argument structure is domain dependent. Therefore, the biomedical field requires a completely new predicate argument frame compared with the general domain. For a machine learning model to work well with a new argument frame, a new feature set must be identified. This is difficult, manual work that requires a lot of expert labor. To address this challenge, we chose to train our model with a deep learning method based on Bi-directional Long Short-Term Memory. Deep learning is a machine learning method that does not require feature sets to be defined manually. In addition, we integrate highway connections between hidden neuron layers to minimize the loss of gradient (derivative) information. Furthermore, to overcome the problem of a small training corpus, we combine deep learning with a multi-task learning technique. Multi-task learning allows the main task (PAS tagging) to be complemented with knowledge learnt from a closely related task, named entity recognition (NER). Our model achieved F1 = 75.13% without any manually designed features, thereby showing the promise of deep learning in this domain. The experimental results also show that multi-task learning is an appropriate technique for overcoming the problem of limited training data in biomedical fields, as it improves the F1 score.
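To make the highway connection mentioned in this abstract concrete, the sketch below implements one highway layer in its common textbook form, y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid "transform gate". This is an assumption about the general technique, not the authors' architecture: the abstract does not specify how the connections are wired into the Bi-LSTM, and the function names and toy weights here are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    H is a tanh transform and T a sigmoid gate. When T is near 0 the
    input is carried through unchanged, which gives gradients a short
    path through deep stacks of layers.
    """
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Hypothetical toy parameters for a 2-dimensional layer.
x = [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# Strongly negative gate bias: the layer passes x through almost unchanged.
y_carry = highway_layer(x, identity, [0.0, 0.0], zeros, [-20.0, -20.0])
# Strongly positive gate bias: the layer behaves like a plain tanh layer.
y_transform = highway_layer(x, identity, [0.0, 0.0], zeros, [20.0, 20.0])
```

In practice the gate parameters are learnt, so each unit decides how much of its input to carry forward and how much to transform.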


Embedding assisted prediction architecture for event trigger identification

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720015410012 ◽

2015 ◽

Vol 13 (03) ◽

pp. 1541001 ◽

Cited By ~ 19

Author(s):

Yifan Nie ◽

Wenge Rong ◽

Yiyuan Zhang ◽

Yuanxin Ouyang ◽

Zhang Xiong

Keyword(s):

Machine Learning ◽

Prediction Model ◽

Word Embedding ◽

Event Extraction ◽

Biological Interactions ◽

Rule Based ◽

Molecular Events ◽

Syntactic Information ◽

Event Trigger ◽

Rule Based Approach

Molecular events normally carry significant meaning since they describe important biological interactions or alterations, such as the binding of a protein. As a crucial step in biological event extraction, event trigger identification has attracted much attention, and many methods have been proposed. Traditionally, those methods can be categorised into rule-based approaches and machine learning-based approaches; the latter have demonstrated their potential and outperformed rule-based approaches in many situations. However, machine learning-based approaches still face several challenges, a notable one being how to model the semantic and syntactic information of different words and incorporate it into the prediction model. There are many ways to model semantic and syntactic information, among which word embedding is an effective one. Therefore, to address this challenge, this study proposes a word embedding assisted neural network prediction model for event trigger identification. An experimental study on a commonly used dataset has shown its potential. It is believed that this study could offer researchers insights into semantic-aware solutions for event trigger identification.


Exploiting Multilingual Grammars and Machine Learning Techniques to Build an Event Extraction System for Portuguese

Lecture Notes in Computer Science - Computational Processing of the Portuguese Language ◽

10.1007/978-3-642-12320-7_3 ◽

2010 ◽

pp. 21-24 ◽

Cited By ~ 1

Author(s):

Vanni Zavarella ◽

Hristo Tanev ◽

Jens Linge ◽

Jakub Piskorski ◽

Martin Atkinson ◽

...

Keyword(s):

Machine Learning ◽

Event Extraction ◽

Machine Learning Techniques ◽

Extraction System ◽

Learning Techniques


SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news

Language Resources and Evaluation ◽

10.1007/s10579-021-09562-4 ◽

2021 ◽

Author(s):

Gilles Jacobs ◽

Véronique Hoste

Keyword(s):

Machine Learning ◽

Event Extraction ◽

Training Data ◽

Supervised Machine Learning ◽

Annotation Scheme ◽

Fine Grained ◽

Business News ◽

Financial News ◽

Gold Standard Dataset ◽

Benchmark Datasets

We present SENTiVENT, a corpus of fine-grained company-specific events in English economic news articles. The domain of event processing is highly productive, and various general-domain, fine-grained event extraction corpora are freely available, but economically focused resources are lacking. This work fills a large need for a manually annotated dataset for economic and financial text mining applications. A representative corpus of business news is crawled and an annotation scheme developed with an iteratively refined economic event typology. The annotations are compatible with benchmark datasets (ACE/ERE), so state-of-the-art event extraction systems can be readily applied. This results in a gold-standard dataset annotated with event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality. An adjudicated reference test set is created for use in annotator and system evaluation. Agreement scores are substantial and annotator performance adequate, indicating that the annotation scheme produces consistent event annotations of high quality. In an event detection pilot study, satisfactory results were obtained with a macro-averaged F1-score of 59%, validating the dataset for machine learning purposes. This dataset thus provides a rich resource on events as training data for supervised machine learning for economic and financial applications. The dataset and related source code are made available at https://osf.io/8jec2/.
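For readers unfamiliar with the macro-averaged F1-score reported in this abstract, the sketch below computes it for a toy set of labels: F1 is calculated per class and the per-class scores are averaged with equal weight, so rare event types count as much as frequent ones. The function name `macro_f1` and the event-type labels are hypothetical and are not taken from the SENTiVENT code or data.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical event-type labels for six sentences.
gold = ["Profit", "Profit", "Merger", "Dividend", "Merger", "Profit"]
predicted = ["Profit", "Merger", "Merger", "Dividend", "Profit", "Profit"]
score = macro_f1(gold, predicted)
```

Here the per-class F1 values are 1.0 (Dividend), 0.5 (Merger), and 2/3 (Profit), so the macro average is 13/18, roughly 0.72.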


Mind wandering as data augmentation: How mental travel supports abstraction

Behavioral and Brain Sciences ◽

10.1017/s0140525x1900311x ◽

2020 ◽

Vol 43 ◽

Author(s):

Myrthe Faber

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Mental Content ◽

Mind Wandering ◽

Theoretical Framework ◽

Important Addition

Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.
