Ontology Learning from Text

2015 ◽  
Vol 4 (2) ◽  
pp. 1-14 ◽  
Author(s):  
Abel Browarnik ◽  
Oded Maimon

The goal of Ontology Learning from Text is to learn ontologies that represent domains or applications that change often. Manually learning and updating such ontologies is too expensive. This is the reason for the Ontology Learning discipline's emergence. The leading approach to Ontology Learning from Text is the Ontology Learning Layer Cake. This approach splits the task into four or five sequential tasks. Each of the tasks may use diverse methods, ranging from uses of Linguistic knowledge to Machine Learning. The authors review the shortcomings of the Ontology Learning Layer Cake approach and conclude that the approach is not viable for Ontology Learning from Text. They suggest alternative approaches that may help learning ontologies in an efficient, effective way.

Author(s):  
Abel Browarnik ◽  
Oded Maimon

In this chapter we analyze Ontology Learning and its goals, as well as the input expected when learning ontologies - peer-reviewed scientific papers in English. After reviewing the Ontology Learning Layer Cake model's shortcomings we suggest an alternative model based on linguistic knowledge. The suggested model would find the meaning of simple components of text – statements. From them it is easy to derive cases and roles that map the reality as a set of entities and relationships or RDF triples, somehow equivalent to Entity-relationship diagrams. Time complexity for the suggested ontology learning framework is constant (O(1)) for a sentence, and O(n) for an ontology with n sentences. We conclude that the Ontology Learning Layer Cake is not adequate for Ontology Learning from text.


2021 ◽  
Author(s):  
Vu-Linh Nguyen ◽  
Mohammad Hossein Shaker ◽  
Eyke Hüllermeier

AbstractVarious strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the label of those instances for which its current prediction is maximally uncertain. The predictions as well as the measures used to quantify the degree of uncertainty, such as entropy, are traditionally of a probabilistic nature. Yet, alternative approaches to capturing uncertainty in machine learning, alongside with corresponding uncertainty measures, have been proposed in recent years. In particular, some of these measures seek to distinguish different sources and to separate different types of uncertainty, such as the reducible (epistemic) and the irreducible (aleatoric) part of the total uncertainty in a prediction. The goal of this paper is to elaborate on the usefulness of such measures for uncertainty sampling, and to compare their performance in active learning. To this end, we instantiate uncertainty sampling with different measures, analyze the properties of the sampling strategies thus obtained, and compare them in an experimental study.


Author(s):  
Giovanni Semeraro ◽  
Pierpaolo Basile ◽  
Marco de Gemmis ◽  
Pasquale Lops

Exploring digital collections to find information relevant to a user’s interests is a challenging task. Information preferences vary greatly across users; therefore, filtering systems must be highly personalized to serve the individual interests of the user. Algorithms designed to solve this problem base their relevance computations on user profiles in which representations of the users’ interests are maintained. The main focus of this chapter is the adoption of machine learning to build user profiles that capture user interests from documents. Profiles are used for intelligent document filtering in digital libraries. This work suggests the exploiting of knowledge stored in machine-readable dictionaries to obtain accurate user profiles that describe user interests by referring to concepts in those dictionaries. The main aim of the proposed approach is to show a real-world scenario in which the combination of machine learning techniques and linguistic knowledge is helpful to achieve intelligent document filtering.


Author(s):  
Katrin Erk

Computational semantics performs automatic meaning analysis of natural language. Research in computational semantics designs meaning representations and develops mechanisms for automatically assigning those representations and reasoning over them. Computational semantics is not a single monolithic task but consists of many subtasks, including word sense disambiguation, multi-word expression analysis, semantic role labeling, the construction of sentence semantic structure, coreference resolution, and the automatic induction of semantic information from data. The development of manually constructed resources has been vastly important in driving the field forward. Examples include WordNet, PropBank, FrameNet, VerbNet, and TimeBank. These resources specify the linguistic structures to be targeted in automatic analysis, and they provide high-quality human-generated data that can be used to train machine learning systems. Supervised machine learning based on manually constructed resources is a widely used technique. A second core strand has been the induction of lexical knowledge from text data. For example, words can be represented through the contexts in which they appear (called distributional vectors or embeddings), such that semantically similar words have similar representations. Or semantic relations between words can be inferred from patterns of words that link them. Wide-coverage semantic analysis always needs more data, both lexical knowledge and world knowledge, and automatic induction at least alleviates the problem. Compositionality is a third core theme: the systematic construction of structural meaning representations of larger expressions from the meaning representations of their parts. The representations typically use logics of varying expressivity, which makes them well suited to performing automatic inferences with theorem provers. Manual specification and automatic acquisition of knowledge are closely intertwined. Manually created resources are automatically extended or merged. The automatic induction of semantic information is guided and constrained by manually specified information, which is much more reliable. And for restricted domains, the construction of logical representations is learned from data. It is at the intersection of manual specification and machine learning that some of the current larger questions of computational semantics are located. For instance, should we build general-purpose semantic representations, or is lexical knowledge simply too domain-specific, and would we be better off learning task-specific representations every time? When performing inference, is it more beneficial to have the solid ground of a human-generated ontology, or is it better to reason directly with text snippets for more fine-grained and gradual inference? Do we obtain a better and deeper semantic analysis as we use better and deeper manually specified linguistic knowledge, or is the future in powerful learning paradigms that learn to carry out an entire task from natural language input and output alone, without pre-specified linguistic knowledge?


Kokborok named entity recognition using the rules based approach is being studied in this paper. Named entity recognition is one of the applications of natural language processing. It is considered a subtask for information extraction. Named entity recognition is the means of identifying the named entity for some specific task. We have studied the named entity recognition system for the Kokborok language. Kokborok is the official language of the state of Tripura situated in the north eastern part of India. It is also widely spoken in other part of the north eastern state of India and adjoining areas of Bangladesh. The named entities are like the name of person, organization, location etc. Named entity recognitions are studied using the machine learning approach, rule based approach or the hybrid approach combining the machine learning and rule based approaches. Rule based named entity recognitions are influence by the linguistic knowledge of the language. Machine learning approach requires a large number of training data. Kokborok being a low resource language has very limited number of training data. The rule based approach requires linguistic rules and the results are not depended on the size of data available. We have framed a heuristic rules for identifying the named entity based on linguistic knowledge of the language. An encouraging result is obtained after we test our data with the rule based approach. We also tried to study and frame the rules for the counting system in Kokborok in this paper. The rule based approach to named entity recognition is found suitable for low resource language with limited digital work and absence of named entity tagged data. We have framed a suitable algorithm using the rules for solving the named entity recognition task for obtaining a desirable result.


Sign in / Sign up

Export Citation Format

Share Document