Automatically Annotate TV Series Subtitles for Dialogue Corpus Construction

Author(s):  
Leilan Zhang ◽  
Qiang Zhou
2020 ◽  
Vol 54 (3) ◽  
pp. 581-613
Author(s):  
Abbie Hantgan

Abstract: The purpose of this study is to re-evaluate the interpretation of a particle that has hitherto been analyzed as a marker of either the addressee or the subject of a quoted clause in Ben Tey (Dogon, Mali). As both of these interpretations are typologically rare, if not unique, a broader conceptualization of the particle as a quotative topic marker is proposed here. Data are from a newly compiled cross-linguistic annotated corpus of discourse reports within textual contexts. Along with data presentation and analysis, a methodology is illustrated for multilingual comparative corpus construction for the analysis of discourse reporting strategies.


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Ali Hamid Meftah ◽  
Mustafa Qamhan ◽  
Yasser Seddiq ◽  
Yousef A. Alotaibi ◽  
Sid-Ahmed Selouani

2015 ◽  
Vol 76 (3) ◽  
pp. 4123-4139
Author(s):  
Yuping Lin ◽  
Yonghong Song ◽  
Yingyu Li ◽  
Fang Wang ◽  
Kai He

2021 ◽  
Vol 336 ◽  
pp. 06013
Author(s):  
Jizhaxi Dao ◽  
Zhijie Cai ◽  
Rangzhuoma Cai ◽  
Maocuo San ◽  
Mabao Ban

A corpus is an indispensable ingredient for statistical NLP research and real-world applications, so the corpus construction method has a direct impact on various downstream tasks. This paper proposes a method for constructing a Tibetan text classification corpus based on a syllable-level processing technique, which we refer to as TC_TCCNL. Empirical evidence indicates that the algorithm achieves promising performance, which may provide a starting point for future research on Tibetan text classification.


2021 ◽  
Vol 251 ◽  
pp. 01030
Author(s):  
Qinqi Kang ◽  
Zhao Kang

With the rapid development of artificial intelligence in the current era of big data, the construction of translation corpora has become a key factor in achieving highly intelligent translation. The data sources and data types of translation corpora are becoming increasingly diversified, which will inevitably bring about a new revolution in their construction. In the era of big data, translation corpus construction can draw on multiple modes, including third-party open-source data, crowd-sourced translation, machine closed-loop, and human-machine collaboration, to comprehensively improve the quality of corpus construction and better serve translation practice.

