POS Tagging and Structural Annotation of Handwritten Text Image Corpus of Devnagari Script

Author(s):  
Maninder Singh Nehra ◽  
Neeta Nain ◽  
Mushtaq Ahmed ◽  
Deepa Modi
Author(s):  
Nindian Puspa Dewi ◽  
Ubaidi Ubaidi

POS tagging is the foundation for developing text processing for a language. In this study, we examine how the use of a lexicon and morphological changes in words affect the choice of the correct tag for a word. Rules based on word morphology, such as prefixes, suffixes, and infixes, are commonly called lexical rules. This study applies lexical rules produced by the learner of the Brill Tagger algorithm. Madurese is a regional language spoken on Madura Island and several other islands in East Java. The object of this study is Madurese, which has far more affixation variants than Indonesian. Here the lexicon is used not only to look up Madurese base words but also as one of the stages of POS tagging. Experiments with the lexicon reached an accuracy of 86.61%, whereas without the lexicon accuracy reached only 28.95%. From this it can be concluded that the lexicon has a strong influence on POS tagging.
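The two-stage idea in this abstract (lexicon lookup first, affix-based lexical rules as a fallback) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the lexicon entries, the affix rules, the tag names, and the function names are all hypothetical toy values (Indonesian-like words, not the paper's Madurese data), and the contextual transformation rules of a full Brill tagger are omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; not taken from the paper's Madurese lexicon.
LEXICON: Dict[str, str] = {"buku": "NN", "makan": "VB", "besar": "JJ"}

# Hypothetical lexical rules as (affix kind, affix, tag).
# A Brill learner would induce rules like these from annotated data.
LEXICAL_RULES: List[Tuple[str, str, str]] = [
    ("prefix", "me", "VB"),
    ("suffix", "an", "NN"),
]

DEFAULT_TAG = "NN"  # assumed fallback tag when nothing else matches


def tag_word(word: str, use_lexicon: bool = True) -> str:
    """Tag one word: lexicon lookup first, then affix rules, then the default."""
    w = word.lower()
    if use_lexicon and w in LEXICON:
        return LEXICON[w]
    for kind, affix, tag in LEXICAL_RULES:
        if kind == "prefix" and w.startswith(affix):
            return tag
        if kind == "suffix" and w.endswith(affix):
            return tag
    return DEFAULT_TAG


def tag_sentence(tokens: List[str], use_lexicon: bool = True) -> List[Tuple[str, str]]:
    """Tag every token in a tokenized sentence."""
    return [(t, tag_word(t, use_lexicon)) for t in tokens]
```

With the lexicon enabled, `tag_word("makan")` returns the lexicon's `VB`; with `use_lexicon=False` the same word falls through to the suffix rule and is tagged `NN`. This is the kind of error a lexicon prevents, in line with the accuracy gap the abstract reports.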


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1372
Author(s):  
Sanjanasri JP ◽  
Vijay Krishna Menon ◽  
Soman KP ◽  
Rajendran S ◽  
Agnieszka Wolk

Linguists have been focused on a qualitative comparison of the semantics from different languages. Evaluation of the semantic interpretation among disparate language pairs like English and Tamil is an even more formidable task than for Slavic languages. The concept of word embedding in Natural Language Processing (NLP) has enabled a felicitous opportunity to quantify linguistic semantics. Multi-lingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of the other. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, deploying three popular embedding algorithms: Word2Vec, GloVe and FastText. A novel evaluation paradigm was devised for the generation of embeddings to assess their effectiveness, using the original embeddings as ground truths. Transferability across other target languages of the proposed model was assessed via pre-trained Word2Vec embeddings from Hindi and Chinese languages. We empirically prove that with a bilingual dictionary of a thousand words and a corresponding small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of generated target embeddings in a few NLP use-case tasks, such as text summarization, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), bearing in mind that those are not the only possible applications.
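Projecting one language's embeddings onto another's semantic space can be illustrated with a deliberately simplified stand-in. The sketch below fits a plain linear map by gradient descent on a tiny seed dictionary. It is not the deep learning approach the abstract describes, and the 2-dimensional vectors, function names, learning rate, and step count are assumptions chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a matrix."""
    return [list(row) for row in zip(*a)]


def fit_linear_map(src: Matrix, tgt: Matrix, lr: float = 0.1, steps: int = 500) -> Matrix:
    """Fit W minimizing 0.5 * ||src @ W - tgt||^2 with plain gradient descent."""
    d_in, d_out = len(src[0]), len(tgt[0])
    w = [[0.0] * d_out for _ in range(d_in)]
    src_t = transpose(src)
    for _ in range(steps):
        pred = matmul(src, w)
        resid = [[p - t for p, t in zip(pr, tr)] for pr, tr in zip(pred, tgt)]
        grad = matmul(src_t, resid)  # gradient is src^T (src W - tgt)
        w = [[w[i][j] - lr * grad[i][j] for j in range(d_out)] for i in range(d_in)]
    return w


def project(vec: List[float], w: Matrix) -> List[float]:
    """Map a single source-language vector into the target space."""
    return matmul([vec], w)[0]


# Hypothetical seed dictionary: row i of SRC and row i of TGT are a translation pair.
SRC: Matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
TGT: Matrix = [[0.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]

W = fit_linear_map(SRC, TGT)
```

In this toy setup the target vectors are an exact rotation of the source vectors, so the fitted `W` recovers that rotation and `project` can place source words that were not in the seed dictionary. Real embedding spaces are not exactly linearly related, which is one motivation for nonlinear transfer functions.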


Author(s):  
S. Nagesh Bhattu ◽  
Satya Krishna Nunna ◽  
D. V. L. N. Somayajulu ◽  
Binay Pradhan

2020 ◽  
Vol 22 (1) ◽  
pp. 51-55
Author(s):  
Dawn Behrend

Poverty, Philanthropy and Social Conditions in Victorian Britain, published by Adam Matthew Digital, comprises primary digital materials culled from three major UK archives and focuses on the experience of poverty in Victorian Britain and on efforts at economic, government, and social reform such as the Poor Law, workhouses, settlement houses, and philanthropic initiatives. Content is derived from the National Archives at Kew, the British Library, and Senate House Library and includes pamphlets, correspondence, newspaper clippings, books, and other resources. A small portion of the collection uses Adam Matthew Digital’s Handwritten Text Recognition (HTR) to enable keyword searching of handwritten documents. The digitized images and documents are clear and searchable, and they are easy to access, save, and share. Contract provisions are standard to the product, with authenticated access across institutional locations and guidelines for Interlibrary Loan sharing. Pricing is determined by institutional size and enrollment. While the product is a one-time purchase, annual hosting fees apply for ongoing access. Content is currently heavily derived from one archive, the Senate House Library, with pamphlets from this source making up nearly half of the total holdings. Users seeking a more extensive collection of similar material may prefer subscribing to JSTOR, which includes JSTOR 19th Century British Pamphlets with over 26,000 pamphlets along with secondary scholarly journals and eBooks on the Victorian era. While it does not provide the primary sources found in Poverty, Philanthropy and Social Conditions in Victorian Britain or JSTOR, Historical Abstracts may be an alternative resource for access to notable scholarly resources on the period.

