corpus generation

Author(s):  
Ghazeefa Fatima ◽  
Rao Muhammad Adeel Nawab ◽  
Muhammad Salman Khan ◽  
Ali Saeed

Semantic word similarity is a quantitative measure of how contextually similar two words are. Evaluating semantic word similarity models requires a benchmark corpus. However, despite the millions of Urdu speakers and the large volume of Urdu digital text on the Internet, no benchmark corpus exists for the Cross-lingual Semantic Word Similarity task for the Urdu language. This article reports our efforts in developing such a corpus. The newly developed corpus is based on the SemEval-2017 Task 2 English dataset and contains 1,945 cross-lingual English–Urdu word pairs. For each of these word pairs, semantic similarity scores were assigned by 11 native Urdu speakers. In addition to corpus generation, this article reports the evaluation results of a baseline approach, namely “Translation Plus Monolingual Analysis”, for automated identification of semantic similarity between English–Urdu word pairs. The results showed that the path length similarity measure performs better for the Google and Bing translated words. The newly created corpus and evaluation results are freely available online for further research and development.
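
The "Translation Plus Monolingual Analysis" idea in the abstract above can be illustrated with a minimal sketch: translate the Urdu word into English, then score the resulting English–English pair with a path length similarity measure. Everything in it is an assumption for illustration, not the authors' implementation. The toy taxonomy `HYPERNYMS`, the lookup table `TOY_TRANSLATIONS` (standing in for a translation service such as Google or Bing), and the function names are hypothetical. The formula 1 / (1 + shortest path length) is the common WordNet-style definition, which the abstract does not spell out.

```python
# Hypothetical toy is-a taxonomy (child -> parent). A real system would
# more likely use a lexical database such as WordNet.
HYPERNYMS = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "animal": "entity",
    "car": "vehicle",
    "vehicle": "artifact",
    "artifact": "entity",
}

# Hypothetical stand-in for a machine translation service (Urdu -> English).
TOY_TRANSLATIONS = {"کتا": "dog", "بلی": "cat", "گاڑی": "car"}


def ancestors(word):
    """Map the word and each of its hypernyms to its distance from the word."""
    distances = {}
    steps = 0
    while word is not None:
        distances[word] = steps
        word = HYPERNYMS.get(word)
        steps += 1
    return distances


def path_similarity(word_a, word_b):
    """Return 1 / (1 + shortest path length) between two taxonomy nodes."""
    dist_a, dist_b = ancestors(word_a), ancestors(word_b)
    common = set(dist_a) & set(dist_b)
    if not common:
        return 0.0  # no shared ancestor in the toy taxonomy
    shortest = min(dist_a[node] + dist_b[node] for node in common)
    return 1.0 / (1.0 + shortest)


def cross_lingual_similarity(english_word, urdu_word):
    """Translate the Urdu word, then compare the two English words."""
    translated = TOY_TRANSLATIONS.get(urdu_word)
    if translated is None:
        return None  # translation unavailable in the toy table
    return path_similarity(english_word, translated)
```

In an evaluation, scores like these would be compared against the human-assigned similarity scores in the corpus.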


2021 ◽  
pp. 86-99
Author(s):  
Jan Christian Blaise Cruz ◽  
Jose Kristian Resabal ◽  
James Lin ◽  
Dan John Velasco ◽  
Charibeth Cheng

Author(s):  
Boris Velichkov ◽  
Kristina Ivanova ◽  
Valeri Hristov ◽  
Ivan Borisov ◽  
Alexander Peychev ◽  
...  

2020 ◽  
pp. 1496-1512
Author(s):  
Usha B. Biradar ◽  
Harsha Gurulingappa ◽  
Lokanath Khamari ◽  
Shashikala Giriyan

Identification of chemical named entities in text and subsequent linkage of information to biological events is of immense value in fulfilling the knowledge needs of pharmaceutical and chemical R&D. A significant amount of investigation has been carried out over the past decade on identifying chemical named entities at the morphological level. However, a barrier still remains in terms of the value proposition to scientists at the chemistry level. Therefore, the work described here aims to circumvent the information barrier by adapting a Conditional Random Fields-based approach to identify chemical named entities at various levels, namely the generic chemical level, the morphological level, and the chemistry level. Substantial effort has been invested in the generation of suitable multi-level annotated corpora. Recommended machine learning practices such as active learning-based training corpus generation and feature optimization have been systematically performed. Evaluation of system performance and benchmarking against other state-of-the-art approaches showed improved results.


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 116247-116255
Author(s):  
Yuan Sun ◽  
Chaofan Chen ◽  
Tianci Xia ◽  
Xiaobing Zhao

Author(s):  
Niladri Sekhar Dash ◽  
L. Ramamoorthy

Author(s):  
Niladri Sekhar Dash ◽  
L. Ramamoorthy
