Semi-supervised Thai Sentence Segmentation Using Local and Distant Word Representations

Author(s):  
Chanatip Saetia ◽  
Tawunrat Chalothorn ◽  
Ekapol Chuangsuwanich ◽  
Peerapon Vateekul



1978 ◽  
Vol 21 (4) ◽  
pp. 793-808 ◽  
Author(s):  
John M. Carroll ◽  
Michael K. Tanenhaus

In two experiments, subjects listened to a sentence containing a brief tone, then wrote out the sentence and marked the location of the tone. The experimental sentences were biclausal with the tone placed before or after the clause break. The initial clause was either functionally complete or functionally incomplete. Functionally complete clauses contain a complete set of fully specified grammatical relations, while functionally incomplete clauses do not. In Experiment 1, tones were mislocated toward the clause break and the final word of the first clause significantly more often for functionally complete clauses. Experiment 2 replicated this finding while holding deep- and surface-structure variables constant. The results indicate that functionally complete clauses are better segmentation units during sentence perception than functionally incomplete clauses. Purely structural theories of the units of sentence perception cannot account for this finding.



Author(s):  
Shengqin Xu ◽  
Fang Kong ◽  
Peifeng Li ◽  
Qiaoming Zhu


Author(s):  
Jáchym Kolář ◽  
Elizabeth Shriberg ◽  
Yang Liu


2017 ◽  
Vol 43 (1) ◽  
pp. 1-30 ◽  
Author(s):  
Claire Gardent ◽  
Laura Perez-Beltrachini

Although there has been much work in recent years on data-driven natural language generation, little attention has been paid to the fine-grained interactions that arise during microplanning between aggregation, surface realization, and sentence segmentation. In this article, we propose a hybrid symbolic/statistical approach to jointly model the constraints regulating these interactions. Our approach integrates a small handwritten grammar, a statistical hypertagger, and a surface realization algorithm. It is applied to the verbalization of knowledge base queries and tested on 13 knowledge bases to demonstrate domain independence. We evaluate our approach in several ways. A quantitative analysis shows that the hybrid approach outperforms a purely symbolic approach in terms of both speed and coverage. Results from a human study indicate that users find the output of this hybrid statistical/symbolic system more fluent than that of both a template-based and a purely symbolic grammar-based approach. Finally, we illustrate by means of examples that our approach can account for various factors impacting aggregation, sentence segmentation, and surface realization.



2014 ◽  
Vol 513-517 ◽  
pp. 4605-4609
Author(s):  
Li Fang Xu ◽  
Yun Zhu ◽  
Li Jiao Yang ◽  
Yao Hong Jin

The processing of long sentences is a difficult problem in machine translation. Previous researchers used punctuation to deal with it. In this paper, we present a rule-based method for sentence segmentation with conjunctions to improve the performance of long-sentence machine translation in patent text. We divide conjunctions into different LEVELs according to the semantic features of the verbs that precede and follow them. Then, we formulate a number of rules based on the LEVELs of conjunctions to segment long Chinese sentences into separate, shorter ones. We conducted experiments on 10 intact patent documents, which contain 901 conjunctions. Our method achieves an overall accuracy of over 89%. The results indicate that our method can efficiently improve the performance of long patent sentence translation.
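
As a rough illustration of the general idea in the abstract above (splitting a long sentence at conjunctions according to an assigned level), the following is a minimal sketch only. The `CONJ_LEVEL` table, the `segment` function, the `min_level` threshold, and the English tokens are all hypothetical; the paper works on Chinese patent text and derives its LEVELs from the semantic features of neighboring verbs, which this sketch does not attempt to reproduce.

```python
from typing import Dict, List

# Hypothetical conjunction levels for illustration only. The paper assigns
# LEVELs from verb semantics around each conjunction; here they are hard-coded.
CONJ_LEVEL: Dict[str, int] = {"and": 1, "or": 1, "but": 2, "while": 2}


def segment(tokens: List[str], min_level: int = 2) -> List[List[str]]:
    """Split a token list before every conjunction whose level >= min_level."""
    segments: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        level = CONJ_LEVEL.get(tok.lower(), 0)  # 0 means "not a conjunction"
        if level >= min_level and current:
            # Close the running segment and start a new one at the conjunction.
            segments.append(current)
            current = [tok]
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    sentence = "the valve opens and closes but the pump keeps running".split()
    for part in segment(sentence):
        print(" ".join(part))
```

Lowering `min_level` to 1 in this sketch would also split at "and", which shows how a level threshold trades shorter segments against the risk of breaking tightly bound phrases.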



2015 ◽  
Author(s):  
Kathleen C. Fraser ◽  
Naama Ben-David ◽  
Graeme Hirst ◽  
Naida Graham ◽  
Elizabeth Rochon

