Information Extraction (IE) is one of the challenging tasks in natural language
processing. The goal of relation extraction is to discover the relevant segments of
information in large collections of textual documents so that they can be used for
structuring data. IE aims to discover various semantic relations in natural language
text and has a wide range of applications, such as question answering, information
retrieval, and knowledge presentation. This thesis proposes approaches for
relation extraction with clause-based Open Information Extraction that use linguistic
knowledge to capture a variety of information, including semantic concepts, words, POS
tags, shallow and full syntax, and dependency parses, in rich syntactic and semantic
structures.<div>Even among the many Open Information Extraction systems that use
syntactic and dependency parsing to detect relations, incoherent
and uninformative relation extractions can still be found. The extracted relations can be
erroneous at times and fail to have a meaningful interpretation. As such, we first
propose refinements to the grammatical structure of syntactic and dependency parsing
with clause structures and clause types in an effort to generate propositions that can be
deemed meaningful extractable relations. Second, considering that choosing the most
efficient seeds is pivotal to the success of the bootstrapping process when extracting
relations, we propose an extended clause-based pattern extraction method with
self-training for unsupervised relation extraction. The proposed self-training algorithm
relies on the clause-based approach to extract a small set of seed instances in order to
identify and derive new patterns. Third, we employ matrix factorization and
collaborative filtering for relation extraction. To avoid the need for manually
predefined schemas, we employ the notion of universal schemas, formed as a collection
of patterns derived from Open Information Extraction tools as well as from
relation schemas of pre-existing datasets. While previous systems have trained relations
only for entities, we exploit advanced features from relation characteristics such as
clause types and semantic topics for predicting new relation instances. Finally, we
present an event network representation for temporal and causal event relation
extraction that uses existing Open IE systems to generate a set of relation
triples, which are then used to build an event network. The event network is
bootstrapped by labeling the temporal and causal disposition of events that are directly
linked to each other. The event network can be systematically traversed to identify
temporal and causal relations between indirectly connected events. <br></div>