Knowledge Derived From Wikipedia For Computing Semantic Relatedness

Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet on some datasets. We also address the question of whether and how Wikipedia can be integrated into NLP applications as a knowledge base. Including Wikipedia improves the performance of a machine-learning-based coreference resolution system, indicating that it represents a valuable resource for NLP applications. Finally, we show that our method can easily be applied to languages other than English by computing semantic relatedness for a German dataset.
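As a rough illustration of how a relatedness score can be read off a semantic network, the sketch below applies a simple path-length measure to a tiny hand-made graph. The graph contents, the function names, and the 1 / (1 + distance) scoring are all assumptions made for this example; the abstract does not say which measures the authors used or how the Wikipedia graph was built.

```python
from collections import deque

# Hypothetical toy network: undirected links between article/category nodes.
# A real Wikipedia-derived graph would be far larger; this is illustration only.
TOY_GRAPH = {
    "car": ["vehicles"],
    "bicycle": ["vehicles"],
    "vehicles": ["car", "bicycle", "transport"],
    "transport": ["vehicles", "logistics"],
    "logistics": ["transport"],
}


def shortest_path_length(graph, start, goal):
    """Return the number of edges on the shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_relatedness(graph, a, b):
    """Assumed scoring: 1 / (1 + shortest path length); 0.0 if no path exists."""
    distance = shortest_path_length(graph, a, b)
    return 0.0 if distance is None else 1.0 / (1.0 + distance)
```

Under this assumed scoring, concepts that share a direct parent score higher than concepts linked only through a longer chain, and unconnected concepts score zero.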

Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias throughout the Machine Learning Lifecycle

Computational Linguistics ◽

10.1162/coli_a_00413 ◽

2021 ◽

pp. 1-47

Author(s):

Yang Trista Cao ◽

Hal Daumé

Keyword(s):

Machine Learning ◽

Quality Of Service ◽

English Text ◽

Coreference Resolution ◽

Resolution System ◽

Building Systems ◽

Systematic Biases ◽

Build Systems

Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systematic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and investigate where in the machine learning pipeline such biases can enter a coreference resolution system. We inspect many existing datasets for trans-exclusionary biases, and develop two new datasets for interrogating bias both in crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we will build systems that fail for: quality of service, stereotyping, and over- or under-representation, especially for binary and non-binary trans users.

Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v1i2.372 ◽

2015 ◽

Vol 1 (2) ◽

Cited By ~ 2

Author(s):

Oscar Karnalim

Keyword(s):

Vector Space ◽

Search Engine ◽

Vector Space Model ◽

Semantic Relatedness ◽

Space Model

Coreference resolution of Korean anaphoric zero objects: Towards a supervised machine learning approach

International Journal of Computer Science and Information Technology for Education ◽

10.21742/ijcsite.2016.1.01 ◽

2016 ◽

Vol 1 (1) ◽

pp. 1-6

Author(s):

Euhee Kim ◽

Myung-Kwan Park

Keyword(s):

Machine Learning ◽

Supervised Machine Learning ◽

Learning Approach ◽

Coreference Resolution ◽

Machine Learning Approach

A Machine Learning Based Modeling of the Cytokine Storm as it Relates to COVID-19 Using a Virtual Clinical Semantic Network (vCSN)

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378284 ◽

2020 ◽

Author(s):

Abrar Rahman ◽

John Kriak ◽

Rick Meyer ◽

Sidney Goldblatt ◽

Fuad Rahman

Keyword(s):

Machine Learning ◽

Semantic Network ◽

Cytokine Storm

Pattern discovery and disentanglement on relational datasets

Scientific Reports ◽

10.1038/s41598-021-84869-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Andrew K. C. Wong ◽

Pei-Yuan Zhou ◽

Zahid A. Butt

Keyword(s):

Machine Learning ◽

Knowledge Base ◽

Case Studies ◽

Prediction Accuracy ◽

Pattern Discovery ◽

Explicit Representation ◽

Human Cognition ◽

Source Level ◽

Low Volume

Machine learning has made impressive advances in many applications akin to human cognition for discernment. However, success has been limited in the areas of relational datasets, particularly for data with low volume, imbalanced groups, and mislabeled cases, with outputs that typically lack transparency and interpretability. The difficulties arise from the subtle overlapping and entanglement of functional and statistical relations at the source level. Hence, we have developed the Pattern Discovery and Disentanglement System (PDD), which is able to discover explicit patterns from data of various sizes and with imbalanced groups, and to screen out anomalies. We present herein four case studies on biomedical datasets to substantiate the efficacy of PDD. It improves prediction accuracy and facilitates transparent interpretation of discovered knowledge in an explicit representation framework, the PDD Knowledge Base, that links the sources, the patterns, and individual patients. Hence, PDD promises broad and ground-breaking applications in genomic and biomedical machine learning.

Integrating performance of web search engine with Machine Learning approach

2016 2nd International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB) ◽

10.1109/aeeicb.2016.7538344 ◽

2016 ◽

Cited By ~ 1

Author(s):

Payal A. Jadhav ◽

Prashant N. Chatur ◽

Kishor P. Wagh

Keyword(s):

Machine Learning ◽

Search Engine ◽

Web Search ◽

Learning Approach ◽

Web Search Engine ◽

Machine Learning Approach

Evaluation-driven design of a robust coreference resolution system

Natural Language Engineering ◽

10.1017/s135132490300319x ◽

2003 ◽

Vol 9 (3) ◽

pp. 281-306 ◽

Cited By ~ 1

Author(s):

Andrei Popescu-Belis

Keyword(s):

Working Memory ◽

Theoretical Analysis ◽

Coreference Resolution ◽

Knowledge Sources ◽

Narrative Texts ◽

Data Annotation ◽

Evaluation Measures ◽

Optimal Behavior ◽

Resolution System

In this paper, we describe a system for coreference resolution and emphasize the role of evaluation for its design. The goal of the system is to group referring expressions (identified beforehand in narrative texts) into sets of coreferring expressions that correspond to discourse entities. Several knowledge sources are distinguished, such as referential compatibility between a referring expression and a discourse entity, activation factors for discourse entities, size of working memory, or meta-rules for the creation of discourse entities. For each of them, the theoretical analysis of its relevance is compared to scores obtained through evaluation. After looping through all knowledge sources, an optimal behavior is chosen, then evaluated on test data. The paper also discusses evaluation measures as well as data annotation, and compares the present approach to others in the field.
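The grouping task described above can be pictured with a minimal sketch: referring expressions are processed in order and attached to a compatible discourse entity, with only the most recently active entities considered (a crude stand-in for working memory). The function names, the recency-based activation, and the `compatible` callback are assumptions for illustration and are not taken from the paper's actual system.

```python
def group_mentions(mentions, compatible, memory_size=3):
    """Group referring expressions into discourse entities (lists of mentions).

    Assumptions for this sketch: entities are kept in order of last activation,
    and only the `memory_size` most recently active ones are candidates.
    """
    entities = []  # least recently active first, most recently active last
    for mention in mentions:
        # Slice computed explicitly so that memory_size == 0 means "no candidates".
        active = entities[max(0, len(entities) - memory_size):]
        for entity in reversed(active):
            if compatible(mention, entity):
                entity.append(mention)
                # Re-activate the entity: move it to the end, matching by identity.
                index = next(i for i, e in enumerate(entities) if e is entity)
                entities.append(entities.pop(index))
                break
        else:
            # No compatible entity in working memory: create a new one.
            entities.append([mention])
    return entities


def same_surface_form(mention, entity):
    """Hypothetical compatibility test: case-insensitive match with the first mention."""
    return mention.lower() == entity[0].lower()
```

With a larger `memory_size`, older entities remain available for attachment; with a very small one, a re-mention of an earlier entity may start a new entity instead, which is the kind of trade-off an evaluation-driven design would measure.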

Entity Type Recognition for Heterogeneous Semantic Graphs

AI Magazine ◽

10.1609/aimag.v36i1.2569 ◽

2015 ◽

Vol 36 (1) ◽

pp. 75-86 ◽

Cited By ~ 4

Author(s):

Jennifer Sleeman ◽

Tim Finin ◽

Anupam Joshi

Keyword(s):

Machine Learning ◽

Background Knowledge ◽

Knowledge Bases ◽

Heterogeneous Data ◽

Unstructured Data ◽

Supervised Machine Learning ◽

Coreference Resolution ◽

Multiple Sources ◽

Fine Grained ◽

High Level

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the data's ontologies are unknown, inaccessible, or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer, using DBpedia as the background knowledge base.

A Machine Learning Approach to Coreference Resolution of Noun Phrases

Computational Linguistics ◽

10.1162/089120101753342653 ◽

2001 ◽

Vol 27 (4) ◽

pp. 521-544 ◽

Cited By ~ 287

Author(s):

Wee Meng Soon ◽

Hwee Tou Ng ◽

Daniel Chung Yong Lim

Keyword(s):

Machine Learning ◽

Noun Phrase ◽

State Of The Art ◽

Noun Phrases ◽

Learning Approach ◽

Data Sets ◽

Coreference Resolution ◽

Machine Learning Approach

In this paper, we present a learning approach to coreference resolution of noun phrases in unrestricted text. The approach learns from a small, annotated corpus and the task includes resolving not just a certain type of noun phrase (e.g., pronouns) but rather general noun phrases. It also does not restrict the entity types of the noun phrases; that is, coreference is assigned whether they are of “organization,” “person,” or other types. We evaluate our approach on common data sets (namely, the MUC-6 and MUC-7 coreference corpora) and obtain encouraging results, indicating that on the general noun phrase coreference task, the learning approach holds promise and achieves accuracy comparable to that of nonlearning approaches. Our system is the first learning-based system that offers performance comparable to that of state-of-the-art nonlearning systems on these data sets.

Sosiaalisen median hyödyntäminen nuorten palvelujen yhteiskehittämisessä

Hallinnon Tutkimus ◽

10.37450/ht.107611 ◽

2021 ◽

Vol 40 (1) ◽

pp. 5-22

Author(s):

Harri Jalonen ◽

Jussi Kokkola ◽

Valtteri Kaartemo ◽

Miika Vähämaa

Keyword(s):

Machine Learning ◽

Social Media ◽

Discourse Analysis ◽

Knowledge Base ◽

Empirical Data ◽

Discussion Forum ◽

Discussion Board ◽

Dynamic Relationship ◽

The Everyday ◽

Using Data

Co-creation assumes an interactive and dynamic relationship where value is created at the nexus of interaction. Co-creating value is challenging with marginalized youths. In this article, social media is seen as an underutilized resource for developing services. This article approaches social media as a context from which it is possible to derive information that would otherwise be unattainable. Using data from a Finnish discussion board, this article answers the following question: How can the experiences of socially withdrawn youth shared on social media be used to enrich the knowledge base on service co-creation processes? The empirical data consist of messages on the Hikikomero discussion forum, which were analysed using a combination of unsupervised machine learning and discourse analysis. The results show that social media provides a window into the everyday lives of socially withdrawn youths, offering information that could be used to develop public services.
