scholarly journals Knowledge Derived From Wikipedia For Computing Semantic Relatedness

2007 ◽  
Vol 30 ◽  
pp. 181-212 ◽  
Author(s):  
S. P. Ponzetto ◽  
M. Strube

Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet on some datasets. We also address the question whether and how Wikipedia can be integrated into NLP applications as a knowledge base. Including Wikipedia improves the performance of a machine learning based coreference resolution system, indicating that it represents a valuable resource for NLP applications. Finally, we show that our method can be easily used for languages other than English by computing semantic relatedness for a German dataset.

2021 ◽  
pp. 1-47
Author(s):  
Yang Trista Cao ◽  
Hal Daumé

Abstract Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systematic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and investigate where in the machine learning pipeline such biases can enter a coreference resolution system. We inspect many existing datasets for trans-exclusionary biases, and develop two new datasets for interrogating bias in both crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we will build systems that fail for: quality of service, stereotyping, and over- or under-representation, especially for binary and non-binary trans users.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Andrew K. C. Wong ◽  
Pei-Yuan Zhou ◽  
Zahid A. Butt

AbstractMachine Learning has made impressive advances in many applications akin to human cognition for discernment. However, success has been limited in the areas of relational datasets, particularly for data with low volume, imbalanced groups, and mislabeled cases, with outputs that typically lack transparency and interpretability. The difficulties arise from the subtle overlapping and entanglement of functional and statistical relations at the source level. Hence, we have developed Pattern Discovery and Disentanglement System (PDD), which is able to discover explicit patterns from the data with various sizes, imbalanced groups, and screen out anomalies. We present herein four case studies on biomedical datasets to substantiate the efficacy of PDD. It improves prediction accuracy and facilitates transparent interpretation of discovered knowledge in an explicit representation framework PDD Knowledge Base that links the sources, the patterns, and individual patients. Hence, PDD promises broad and ground-breaking applications in genomic and biomedical machine learning.


2003 ◽  
Vol 9 (3) ◽  
pp. 281-306 ◽  
Author(s):  
ANDREI POPESCU-BELIS

In this paper, we describe a system for coreference resolution and emphasize the role of evaluation for its design. The goal of the system is to group referring expressions (identified beforehand in narrative texts) into sets of coreferring expressions that correspond to discourse entities. Several knowledge sources are distinguished, such as referential compatibility between a referring expression and a discourse entity, activation factors for discourse entities, size of working memory, or meta-rules for the creation of discourse entities. For each of them, the theoretical analysis of its relevance is compared to scores obtained through evaluation. After looping through all knowledge sources, an optimal behavior is chosen, then evaluated on test data. The paper also discusses evaluation measures as well as data annotation, and compares the present approach to others in the field.


AI Magazine ◽  
2015 ◽  
Vol 36 (1) ◽  
pp. 75-86 ◽  
Author(s):  
Jennifer Sleeman ◽  
Tim Finin ◽  
Anupam Joshi

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the datas ontologies are unknown, inaccessible or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer using DBpedia as the background knowledge base.


2001 ◽  
Vol 27 (4) ◽  
pp. 521-544 ◽  
Author(s):  
Wee Meng Soon ◽  
Hwee Tou Ng ◽  
Daniel Chung Yong Lim

In this paper, we present a learning approach to coreference resolution of noun phrases in unrestricted text. The approach learns from a small, annotated corpus and the task includes resolving not just a certain type of noun phrase (e.g., pronouns) but rather general noun phrases. It also does not restrict the entity types of the noun phrases; that is, coreference is assigned whether they are of “organization,” “person,” or other types. We evaluate our approach on common data sets (namely, the MUC-6 and MUC-7 coreference corpora) and obtain encouraging results, indicating that on the general noun phrase coreference task, the learning approach holds promise and achieves accuracy comparable to that of nonlearning approaches. Our system is the first learning-based system that offers performance comparable to that of state-of-the-art nonlearning systems on these data sets.


2021 ◽  
Vol 40 (1) ◽  
pp. 5-22
Author(s):  
Harri Jalonen ◽  
Jussi Kokkola ◽  
Valtteri Kaartemo ◽  
Miika Vähämaa

Co-creation assumes an interactive and dynamic relationship where value is created at the nexus of interaction. Co-creating value is challenging with marginalized youths. In this article, social media is seen as an underutilized resource for developing services. This article approaches social media as a context from which it is possible to derive information that would otherwise be unattainable. Using data from a Finnish discussion board, this article answers the following question: How can the experiences of socially withdrawn youth shared on social media be used to enrich the knowledge base on service co-creation processes? The empirical data consist of messages on the Hikikomero discussion forum, which were analysed using a combination of unsupervised machine learning and discourse analysis. The results show that social media provides a window into the everyday lives of socially withdrawn youths, offering information that could be used to develop public services


Sign in / Sign up

Export Citation Format

Share Document