Mobile Interface for Domain Specific Machine Translation Using Short Messaging Service

Author(s):  
Avinash J. Agrawal
Manoj B. Chandak

2016
Author(s):
Lilin Zhang
Zhen Weng
Wenyan Xiao
Jianyi Wan
Zhiming Chen
...  

Author(s):  
Josef Steinberger
Ralf Steinberger
Hristo Tanev
Vanni Zavarella
Marco Turchi

In this chapter, the authors discuss several pertinent aspects of an automatic system that generates summaries in multiple languages for sets of topic-related news articles (multilingual multi-document summarisation) gathered by news aggregation systems. The discussion follows a framework based on Latent Semantic Analysis (LSA), because LSA has been shown to be a high-performing method across many different languages. Starting from a sentence-extractive approach, the authors show how domain-specific aspects can be exploited and how a compression and paraphrasing method can be plugged in. They also discuss the challenging problem of evaluating summarisation in different languages; in particular, they describe two approaches: the first uses a parallel corpus and the second uses statistical machine translation.
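
The sentence-extractive core of such an LSA framework can be illustrated with a minimal sketch (not the authors' system; it assumes TF-IDF weighting and NumPy/scikit-learn as tooling): build a term-by-sentence matrix, decompose it with SVD, and take the highest-weighted, not-yet-selected sentence from each of the strongest latent topics.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def lsa_extractive_summary(sentences, num_sentences=2):
    # Term-by-sentence matrix: TF-IDF yields (n_sentences, n_terms), so transpose it
    tfidf = TfidfVectorizer().fit_transform(sentences)
    term_sentence = tfidf.T.toarray()

    # SVD: each row of vt is a latent topic, each column a sentence
    _, _, vt = np.linalg.svd(term_sentence, full_matrices=False)

    # For each of the strongest topics, keep its highest-weighted unseen sentence
    chosen = []
    for topic_row in vt:
        if len(chosen) >= num_sentences:
            break
        for idx in np.argsort(-np.abs(topic_row)):
            if int(idx) not in chosen:
                chosen.append(int(idx))
                break
    return [sentences[i] for i in sorted(chosen)]

print(lsa_extractive_summary([
    "Heavy rain caused severe flooding along the river overnight.",
    "Emergency services evacuated residents from low-lying districts.",
    "Officials estimate the damage at several million euros.",
    "A local festival scheduled for the weekend was postponed.",
], num_sentences=2))
```

Domain-specific aspects and compression/paraphrasing would be layered on top of this selection step; the sketch only shows the language-independent extraction stage.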


2017
Vol 56 (05)
pp. 370-376
Author(s):
Roberto Pérez-Rodríguez
Luis E. Anido-Rifón
Marcos A. Mouriño-García

Summary
Objectives: The ability to efficiently review the existing literature is essential for the rapid progress of research. This paper describes a classifier of text documents represented as vectors in spaces of Wikipedia concepts, and analyses its suitability for classifying Spanish biomedical documents when only English documents are available for training. We propose the cross-language concept matching (CLCM) technique, which relies on Wikipedia interlanguage links to convert concept vectors from the Spanish to the English space.
Methods: The performance of the classifier is compared to several baselines: a classifier based on machine translation, a classifier that represents documents after performing Explicit Semantic Analysis (ESA), and a classifier that uses a domain-specific semantic annotator (MetaMap). The corpus used for the experiments (Cross-Language UVigoMED) was purpose-built for this study and is composed of 12,832 English and 2,184 Spanish MEDLINE abstracts.
Results: The performance of our approach is superior to every other state-of-the-art classifier in the benchmark, with performance increases of up to 124% over classical machine translation, 332% over MetaMap, and 60 times over the classifier based on ESA. The results are statistically significant, with p-values < 0.0001.
Conclusion: By using knowledge mined from Wikipedia to represent documents as vectors in a space of Wikipedia concepts and translating vectors between language-specific concept spaces, a cross-language classifier can be built, and it performs better than several state-of-the-art classifiers.
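
The CLCM step can be sketched in a few lines (a toy illustration under assumed inputs, not the paper's implementation or resources): documents are bag-of-concepts vectors over Wikipedia article identifiers, and a table of interlanguage links maps each Spanish concept to its English counterpart, so a classifier trained on English concept vectors can score translated Spanish vectors directly.

```python
from collections import Counter

# Toy interlanguage-link table (hypothetical entries, not the paper's data):
# Spanish Wikipedia concept -> English Wikipedia concept
INTERLANGUAGE_LINKS = {
    "es:Cáncer_de_pulmón": "en:Lung_cancer",
    "es:Quimioterapia": "en:Chemotherapy",
    "es:Tabaquismo": "en:Tobacco_smoking",
}

def to_english_space(spanish_vector):
    """Map a Spanish bag-of-concepts vector into the English concept space."""
    english_vector = Counter()
    for concept, weight in spanish_vector.items():
        english_concept = INTERLANGUAGE_LINKS.get(concept)
        if english_concept is not None:  # concepts without a link are dropped
            english_vector[english_concept] += weight
    return english_vector

# A Spanish abstract annotated with Wikipedia concepts and their frequencies
spanish_doc = Counter({"es:Cáncer_de_pulmón": 3, "es:Tabaquismo": 1})
print(to_english_space(spanish_doc))
# Counter({'en:Lung_cancer': 3, 'en:Tobacco_smoking': 1})
```

The resulting English-space vector can then be fed to any off-the-shelf classifier trained on English abstracts represented in the same concept space.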


2021
Author(s):
Hema Ala
Vandan Mujadia
Dipti Misra Sharma
...
