A corpus-based approach for Korean nominal compound analysis based on linguistic and statistical information

2001 ◽  
Vol 7 (3) ◽  
pp. 251-270 ◽  
Author(s):  
JUNTAE YOON ◽  
KEY-SUN CHOI ◽  
MANSUK SONG

The syntactic structure of a nominal compound must be analyzed before its semantic interpretation. In addition, the syntactic analysis of nominal compounds is very useful for NLP applications such as information extraction, since a nominal compound often has a linguistic structure similar to that of a simple sentence, while representing the concrete, compound meaning of an object through several combined nouns. In this paper, we present a novel model for the structural analysis of nominal compounds that couples linguistic and statistical knowledge on the basis of lexical information. That is, the syntactic relations defined between nouns (the complement-predicate and modifier-head relations) are obtained from large corpora and then used to analyze the structures of nominal compounds and to identify the underlying relations between the nouns. Experiments show that the model gives good results and can be used effectively in application systems that do not require deep semantic information.
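The general idea of using corpus statistics to bracket a nominal compound can be sketched as follows. This is only an illustration of the adjacency-style approach, assuming pointwise mutual information as the association measure; the counts, the noun triple, and the smoothing constant are all hypothetical and not taken from the paper.

```python
# Sketch: choosing the structure of a three-noun compound with corpus
# statistics. All counts below are made-up illustrative values.

from math import log2

# Hypothetical counts of syntactic relations (modifier-head or
# complement-predicate) observed between noun pairs in a corpus.
pair_count = {
    ("information", "extraction"): 120,
    ("information", "system"): 45,
    ("extraction", "system"): 80,
}
noun_count = {"information": 900, "extraction": 300, "system": 1500}
total_pairs = 50_000

def assoc(n1: str, n2: str) -> float:
    """Pointwise mutual information between two nouns in a relation."""
    p_pair = pair_count.get((n1, n2), 0.5) / total_pairs  # 0.5 = smoothing
    p1 = noun_count[n1] / total_pairs
    p2 = noun_count[n2] / total_pairs
    return log2(p_pair / (p1 * p2))

def bracket(n1: str, n2: str, n3: str) -> str:
    """Adjacency model: compare the two possible binary structures."""
    left = assoc(n1, n2)    # [[n1 n2] n3]
    right = assoc(n2, n3)   # [n1 [n2 n3]]
    return f"[[{n1} {n2}] {n3}]" if left >= right else f"[{n1} [{n2} {n3}]]"

print(bracket("information", "extraction", "system"))
```

With the toy counts above, the stronger "information"-"extraction" association wins, yielding the left-branching structure.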

Author(s):  
Agustín Vera Luján

ABSTRACT: Our work aims to provide a syntactic analysis of reformulation constructions from a strictly functional point of view. Drawing on two concepts inspired by tagmemics, those of abstract unit/relation vs. concrete unit/relation, and conceiving of syntactic relations as equivalents of the Hjelmslevian functions, we propose that reformulation constructions be analyzed as abstract coordinated constructions, grounding in this syntactic structure the capacity of their constituents to function discursively as reformulated elements.


2021 ◽  
Vol 9 (14) ◽  
pp. 1-32
Author(s):  
Im Hong-Pin

This paper aims to make clear that syntactic analysis should be based on the lexical information given in the lexicon. For this purpose, the lexical information of the syntactic argument takes a form like [VP NKP, _, DKP, AKP] for the ditransitive verb give in English. The argument structure projects onto syntactic structure. The NKP in this structure becomes the VP-subject, but there is another subject, called the S-subject (Sentence-Subject), below the S node. This amounts to a Two-Subject Hypothesis for English. Between these two subjects intervene Conjugation-Like Elements, enriched by close examination of English verbal conjugation. The Two-Subject Hypothesis perfectly accounts for the peculiarities of the Expletive There (ET) construction. Restructuring can also explain the so-called long-distance wh-interrogative without introducing wh-movement, and it can explain why imperative verbs take base forms. It can likewise explain the characteristics of adjective imperatives by the same principles applied to verbal imperatives. We also address other, subtler problems, with fruitful results. The restructuring approach, we think, provides more convincing explanations than the movement approach.


2019 ◽  
pp. 117-129
Author(s):  
Natalia Darchuk

The purpose of this study is to construct an automatic syntactic analysis (ASA) and, as a result, to compile a dictionary of models of multicomponent complex sentences for studying the features of the linear structure of Ukrainian text. The process includes two stages: the first is an automatic syntactic analysis of the hierarchical type, which results in the building of a dependency tree (DT); in the second, information about the sentence structure is automatically extracted from the obtained graph. The ASA is a package of operations performed on a string of morphological information (the result of the AMA stage) representing the input text, in order to determine the syntactic relations between text units. The input to the ASA is a string of information reduced, after the AMA, to wordforms. We have studied the features of the linear structure of 2,000 Ukrainian sentences of the journalistic genre (a selection of 52,000 running words). Based on the results, we constructed real models of the syntactic structure of sentences, in which the relations between simple clauses are presented. All grammatical situations of the linear context were possible manifestations of models in the text. On the basis of these data, an algorithm for the automatic generation of a complex sentence model was created. These models constitute a linear syntax grammar. All types of syntactic connection between the main and subordinate clauses are recorded algorithmically. Thus, it is possible to build interpretations of the linear structure of Ukrainian sentences while making almost no use of lexical-semantic information. The theoretical value of the paper lies in extending our knowledge about the structure of the syntactic level of the language and the variety of mechanisms functioning at that level.
The applied value lies, first of all, in the creation of a dictionary of the compatibility of compound (coordinated) and complex (subordinated) sentences, and in the possibility of constructing queries to the Ukrainian language Corpus in order to mine from texts sentences of particular models, creating custom dictionaries of authors and styles.
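The second stage, extracting a sentence model from the clause-linkage graph, can be sketched in miniature. The link types, the M/S clause labels, and the connector symbols below are assumptions chosen for illustration, not the actual notation of the dictionary described above.

```python
# Sketch: deriving a linear model of a multicomponent complex sentence
# from its clause-linkage graph. Labels and link types are illustrative.

# Each link: (head clause index, dependent clause index, link type),
# where "coord" = coordinate link and "subord" = subordinate link.
links = [(0, 1, "subord"), (0, 2, "coord"), (2, 3, "subord")]

def sentence_model(n_clauses: int, links) -> str:
    """Render clauses as M (main) / S (subordinate) joined by link symbols."""
    dep_types = {dep: kind for _, dep, kind in links}
    parts = ["S" if dep_types.get(i) == "subord" else "M"
             for i in range(n_clauses)]
    sym = {"coord": " + ", "subord": " <- "}
    # Walk the links in linear order and join the clause labels.
    out = parts[0]
    for _head, dep, kind in sorted(links, key=lambda l: l[1]):
        out += sym[kind] + parts[dep]
    return out

print(sentence_model(4, links))  # "M <- S + M <- S"
```

A dictionary of such model strings, keyed by frequency, would then summarize which linkage patterns actually occur in the corpus.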


Author(s):  
Liana Mumrikoh ◽  
Eka Agustina ◽  
Hastuti Retno Kuspiyah

This study is an analysis of sentence structure using a syntactic approach, portrayed in tree diagrams. It focuses only on the identification of sentence types and sentence structure. It was found that there are 191 sentences in the six selected texts in the textbook: 53 simple sentences (27.75%), 79 compound sentences (41.36%), 33 complex sentences (17.27%), and 26 compound-complex sentences (13.61%) of the total. The analysis shows that the English textbook contains all types of sentences, classified by both the number of clauses and their syntactic properties. Drawing tree diagrams is a fundamental skill in the study of syntactic structure; it is common practice to provide a visual representation of the internal structure of phrases and clauses. Tree diagrams are a clear way of representing syntactic structure graphically.
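The four-way classification by clause counts, and the percentages cited above, can be sketched as follows. The classifier is a generic sketch of the standard definitions; only the per-type counts are the study's reported figures.

```python
# Sketch: classifying sentences by clause counts and reporting the share
# of each type. Counts are the study's reported figures.

def sentence_type(independent: int, dependent: int) -> str:
    """Classify by number of independent and dependent clauses."""
    if independent == 1:
        return "simple" if dependent == 0 else "complex"
    return "compound" if dependent == 0 else "compound-complex"

counts = {"simple": 53, "compound": 79, "complex": 33, "compound-complex": 26}
total = sum(counts.values())  # 191

for t, n in counts.items():
    print(f"{t}: {n} ({100 * n / total:.2f}%)")
```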


Author(s):  
John Carroll

This article introduces the concepts and techniques of natural language (NL) parsing, that is, using a grammar to assign a syntactic analysis to a string of words or to a lattice of word hypotheses output by a speech recognizer. The level of detail required depends on the language processing task being performed and the particular approach to the task being pursued. The article first describes approaches that produce ‘shallow’ analyses. It then outlines approaches to parsing that analyse the input in terms of labelled dependencies between words. Producing hierarchical phrase structure requires grammars with at least context-free (CF) power, and the CF algorithms widely used in NL parsing are described. To support detailed semantic interpretation, more powerful grammar formalisms are required, but these are usually parsed using extensions of CF parsing algorithms; the article accordingly describes unification-based parsing. Finally, it discusses three important issues that must be tackled in real-world applications of parsing: evaluation of parser accuracy, parser efficiency, and measurement of grammar/parser coverage.
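One of the classic CF parsing algorithms of the kind such surveys cover is CYK, which fills a triangular table of nonterminals over increasingly long spans. The toy grammar below (in Chomsky normal form) is illustrative only.

```python
# Sketch: the CYK algorithm for context-free recognition.
# The grammar must be in Chomsky normal form: A -> B C or A -> terminal.

from itertools import product

binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

def cyk(words):
    """Return the set of nonterminals deriving the whole word string."""
    n = len(words)
    # table[i][j]: nonterminals spanning words[i:j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((b, c), set())
    return table[0][n - 1]

print(cyk("the dog saw the cat".split()))  # {'S'}
```

The cubic-time table fill is what makes CYK a standard baseline; chart parsers generalize the same dynamic-programming idea to arbitrary CF grammars.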


Author(s):  
Veneeta Dayal ◽  
Deepak Alok

Natural language allows questioning into embedded clauses. One strategy for doing so involves structures like the following: [CP-1 wh_i [TP DP V [CP-2 … t_i …]]], where a wh-phrase that thematically belongs to the embedded clause appears in the matrix scope position. A possible answer to such a question must specify values for the fronted wh-phrase. This is the extraction strategy seen in languages like English. An alternative strategy involves a structure in which there is a distinct wh-phrase in the matrix clause. It is manifested in two types of structures. One is a close analog of extraction, but for the extra wh-phrase: [CP-1 wh_i [TP DP V [CP-2 wh_j [TP … t_j …]]]]. The other simply juxtaposes two questions, rather than syntactically subordinating the second one: [CP-3 [CP-1 wh_i [TP …]] [CP-2 wh_j [TP …]]]. In both versions of the second strategy, the wh-phrase in CP-1 is invariant, typically corresponding to the wh-phrase used to question propositional arguments. There is no restriction on the type or number of wh-phrases in CP-2. Possible answers must specify values for all the wh-phrases in CP-2. This strategy is variously known as scope marking, partial wh-movement, or expletive wh questions. Both strategies can occur in the same language. German, for example, instantiates all three possibilities: extraction, subordinated scope marking, and sequential scope marking. The scope marking strategy is also manifested in in-situ languages. Scope marking has been the subject of 30 years of research, and much is now known about its syntactic and semantic properties. Its pragmatic properties, however, are relatively under-studied. The acquisition of scope marking, in relation to extraction, is another area of ongoing research. One of the reasons scope marking has intrigued linguists is that it seems to defy central tenets about the nature of wh scope taking.
For example, it presents an apparent mismatch between the number of wh-expressions in the question and the number of expressions whose values are specified in the answer. It poses a challenge for our understanding of how syntactic structure feeds semantic interpretation and how alternative strategies with similar functions relate to each other.


2012 ◽  
Vol 38 (3) ◽  
pp. 631-671 ◽  
Author(s):  
Ming Tan ◽  
Wenli Zhou ◽  
Lei Zheng ◽  
Shaojun Wang

This paper presents an attempt at building a large-scale distributed composite language model that is formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm and a follow-up EM algorithm to improve word prediction power on corpora with up to a billion tokens, stored on a supercomputer. The large-scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality, measured by the BLEU score and "readability" of translations, when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
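The paper's directed Markov random field composition is considerably more involved, but the basic idea of combining word-prediction distributions from lexical, syntactic, and semantic components can be sketched with simple linear interpolation. The weights and probabilities below are made-up illustrative values, not the paper's method or results.

```python
# Sketch: mixing word probabilities from three component language models
# by linear interpolation. All numbers are illustrative assumptions.

def interpolate(p_ngram: float, p_syntax: float, p_semantic: float,
                weights=(0.5, 0.3, 0.2)) -> float:
    """Mix next-word probabilities from three component models."""
    w1, w2, w3 = weights
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9  # weights must sum to 1
    return w1 * p_ngram + w2 * p_syntax + w3 * p_semantic

# Hypothetical probabilities of the next word under each component.
p = interpolate(p_ngram=0.04, p_syntax=0.10, p_semantic=0.02)
print(p)  # ≈ 0.054
```

Because each component captures a different range of context (local n-grams, mid-range syntax, long-span semantics), the mixture can assign high probability where any one component alone would not.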


2007 ◽  
Vol 19 (3) ◽  
pp. 386-400 ◽  
Author(s):  
Anna S. Hasting ◽  
Sonja A. Kotz ◽  
Angela D. Friederici

The present study investigated the automaticity of morphosyntactic processes and processes of syntactic structure building using event-related brain potentials. Two experiments were conducted, which contrasted the impact of local subject-verb agreement violations (Experiment 1) and word category violations (Experiment 2) on the mismatch negativity, an early event-related brain potential component reflecting automatic auditory change detection. The two violation types were realized in two-word utterances comparable with regard to acoustic parameters and structural complexity. The grammaticality of the utterances modulated the mismatch negativity response in both experiments, suggesting that both types of syntactic violations were detected automatically within 200 msec after the violation point. However, the topographical distribution of the grammaticality effect varied as a function of violation type, which indicates that the brain mechanisms underlying the processing of subject-verb agreement and word category information may be functionally distinct even at this earliest stage of syntactic analysis. The findings are discussed against the background of studies investigating syntax processing beyond the level of two-word utterances.


Author(s):  
Agnes Jäger

Abstract: The aim of this paper is to give a syntactic analysis of sentential negation in the history of German, with special emphasis on Old High German. This analysis attributes the main changes in the syntax of negation not to a change in syntactic structure but to changes in the lexical filling of the head and specifier of NegP. In addition, the more specific question of negative indefinites and negative concord (NC) in Old High German is discussed. It is argued that negative indefinites should be analysed as semantically non-negative but simply formally neg-marked. It is assumed that there is no obligatory movement of n-indefinites to SpecNegP, neither overtly nor covertly.


2020 ◽  
pp. 82-102
Author(s):  
Nataliia Darchuk

Abstract: The article describes the functional features of the syntactic module of the computer-based Ukrainian grammar AGAT. This is a linguistic type of computer-aided syntactic analysis, which provides full information about syntactic units and categories, in particular predicativity, coordinate and subordinate clauses, the categories of subject and predicate, etc. The developed linguistic software performs syntactic analysis of a whole sentence in the form of a dependency tree and indicates the types of syntactic relations and links. The task of AGAT-syntax is to identify all varieties of compatibility (predicative, subordinate, and coordinate) of each word in the text. The grammatical characteristics of a phrase depend directly on the part of speech of its keyword. The lexical and grammatical nature of the word determines its compatibility with other words. Accordingly, phrases can be divided into substantive, adjectival, pronominal, numeral, verbal, and adverbial. Computer sub-grammars of the valencies of these parts of speech are built on a single principle: a lexeme is indicated, together with the preposition that participates in government and the case of the substantive word form, in the shape of a two-letter code. In theory, word combinations (phrases) are divided by composition into simple, complex, and combined. A dependency tree is built from two elements: nodes and connections. Nodes are wordforms, and connections are relationships between the main element ("master") and the dependent element ("slave"). This makes it possible to describe the configuration, form, and external parameters of a sentence, but it is not sufficient to describe the sentence structure.
Thus, the syntactic analysis has two levels: the first attributes to each binary pair a type of syntactic relationship at the level of the morphological means of expression of the "master"; the second attributes to the connection a type of syntactic relationship, including subjective, objective, attributive, adverbial, completive, and appositive modifying. In this way, the cycle of automated syntactic analysis of Ukrainian texts is completed by determining the syntactic word combination and identifying the type of syntactic link and the type of relationship. This provides a full range of characteristics that can be used for the systemic study of semantic and syntactic problems. Keywords: automated syntactic analysis, dependency tree, syntactic relations, syntactic links.
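The node/connection scheme described above can be sketched as a small data structure: wordform nodes linked by labelled master-slave connections. The example sentence, the relation names, and the output format are illustrative assumptions, not AGAT's actual representation.

```python
# Sketch: a dependency tree as nodes (wordforms) and labelled
# master -> slave connections. Sentence and labels are illustrative.

from dataclasses import dataclass, field

@dataclass
class Node:
    form: str                 # wordform
    pos: str                  # part of speech
    deps: list = field(default_factory=list)  # (relation, Node) pairs

def attach(master: Node, slave: Node, relation: str) -> None:
    """Record a master -> slave connection with its relation type."""
    master.deps.append((relation, slave))

def triples(node: Node) -> list:
    """Flatten the tree into (master, relation, slave) triples."""
    out = []
    for rel, child in node.deps:
        out.append((node.form, rel, child.form))
        out.extend(triples(child))
    return out

# "The student reads a book" as a toy dependency tree (articles omitted).
reads = Node("reads", "verb")
attach(reads, Node("student", "noun"), "subjective")
attach(reads, Node("book", "noun"), "objective")

print(triples(reads))
# [('reads', 'subjective', 'student'), ('reads', 'objective', 'book')]
```

Flattening the tree into triples is one simple way to expose both levels of labelling (the connection and its relation type) for downstream queries.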

