Technical Challenges and Design Issues in Bangla Language Processing


Total documents: 16 (five years: 0)
H-index: 3 (five years: 0)

Published By IGI Global

9781466639706, 9781466639713

Author(s): Syed Akhter Hossain, M. Lutfar Rahman, Faruk Ahmed, M. Abdus Sobhan

This chapter aims to characterize the salient features of Bangla vowels, identify the sources of acoustic variability among them, and suggest a classification of vowels based on normalized acoustic parameters. Possible applications in automatic speech recognition and speech enhancement have made vowel classification an important problem. However, Bangla vowels spoken by different native speakers show great variation in their formant values, and the speakers' differing dialect and language backgrounds further complicate acoustic comparison. This variation necessitates normalization procedures that remove the effect of non-linguistic factors. Although researchers have found a number of acoustic and perceptual correlates of vowels, acoustic parameters that work well in a speaker-independent manner are yet to be found. A related problem is the study of the acoustic features of Bangla dental consonants, to identify the spectral differences between consonants and to parameterize them for segment synthesis. The features extracted for both Bangla vowels and dental consonants were tested and found to yield good synthetic representations, which demonstrates the quality of the acoustic features.
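The abstract does not name the normalization procedure it uses. As an illustration only, one widely used speaker normalization is to z-score each formant within a speaker (often called Lobanov normalization). The sketch below assumes each vowel token is an (F1, F2) pair in Hz; the function name and the input layout are hypothetical.

```python
from statistics import mean, pstdev


def lobanov_normalize(formants):
    """Z-score each formant across one speaker's vowel tokens.

    formants: list of (F1, F2) tuples in Hz for a single speaker
    (assumed layout, not taken from the chapter).
    Returns a list of (z1, z2) tuples.
    """
    # Regroup the tokens into one column per formant.
    columns = list(zip(*formants))
    # Per-formant mean and population standard deviation for this speaker.
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        tuple((value - mu) / sigma for value, (mu, sigma) in zip(token, stats))
        for token in formants
    ]
```

Because each speaker is rescaled by their own mean and spread, two speakers whose formants differ only by a constant scale factor map to the same normalized values. A formant that is constant across all tokens would give a zero standard deviation, which this sketch does not guard against.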


Author(s): Utpal Garain, Sankar De

Grammar-driven dependency parsing has been attempted for Bangla (Bengali). The free word order of the language makes developing an accurate parser very difficult, so the Paninian grammatical model is used to tackle the problem. The approach first simplifies complex and compound sentences, then parses the resulting simple sentences by satisfying the Karaka demands of the Demand Groups (Verb Groups). Finally, the parsed structures are rejoined with appropriate links and Karaka labels. The parser was trained on a treebank of 1000 annotated sentences and evaluated on 150 unannotated test sentences. The evaluation shows that the proposed approach achieves accuracies of 90.32% and 79.81% for unlabeled and labeled attachments, respectively.
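The two reported figures are the standard unlabeled and labeled attachment scores. A minimal sketch of how such scores are usually computed follows, assuming each token is represented as a (head index, label) pair; the label strings in the example are placeholders, not the chapter's tag set.

```python
def attachment_scores(gold, predicted):
    """Return (unlabeled, labeled) attachment scores as percentages.

    gold, predicted: lists of (head_index, label) pairs, one per token,
    aligned token by token (assumed representation).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("no tokens to score")
    # Unlabeled: only the head has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # Labeled: head and dependency label both have to match.
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * full_hits / total


# Placeholder four-token sentence: three heads correct, two labels also correct.
gold = [(2, "k1"), (0, "root"), (2, "k2"), (2, "k7")]
pred = [(2, "k1"), (0, "root"), (2, "k7"), (3, "k7")]
uas, las = attachment_scores(gold, pred)  # 75.0 and 50.0
```

The labeled score can never exceed the unlabeled one, which matches the ordering of the two figures in the abstract.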


Author(s): Mohammed Nazrul Islam, Mohammad Ataul Karim

Automatic Bangla character recognition has been a great challenge for research and development because of the large number of characters, the changes of shape within a word and in conjunctive characters, and other similar reasons. An optical joint transform correlation-based technique is developed for Bangla character recognition; it involves a simple architecture, operates at very high speed because it is optical, and offers a very high level of accuracy with negligible false alarms. The proposed correlation technique can successfully identify a target character in a given input scene by producing a single correlation peak per target at the target location. The discrimination between target and non-target correlation peaks is found to be very high even in noisy conditions. The recognition performance of the proposed technique is observed to be insensitive to the type and number of targets. The technique is further improved by incorporating a synthetic discriminant function, which is created from distorted images of the target character and hence lets the system efficiently recognize Bangla characters in different practical scenarios.


Author(s): Maxim Roy

Machine Translation (MT) from Bangla to English has recently become a priority task for the Bangla Natural Language Processing (NLP) community. Statistical Machine Translation (SMT) systems require a large amount of bilingual data for a language pair to achieve good translation accuracy. However, because Bangla is a low-density language, such resources are not available for it. In this chapter, the authors discuss how machine learning approaches can help to improve translation quality within an SMT system without requiring a huge increase in resources. They provide a novel semi-supervised learning and active learning framework for SMT, which utilizes both labeled and unlabeled data. The authors discuss sentence selection strategies in detail and evaluate the sentence selection methods experimentally. In the semi-supervised setting, the reversed model approach outperformed all other approaches for Bangla-English SMT, and in the active learning setting, the geometric 4-gram and geometric phrase sentence selection strategies proved most useful, based on BLEU score results over baseline approaches. Overall, the authors demonstrate that for a low-density language like Bangla, these machine learning approaches can improve translation quality.


Author(s): Mohammed Rokibul Alam Kotwal, Foyzul Hassan, Mohammad Nurul Huda

This chapter presents Bangla (widely known as Bengali) Automatic Speech Recognition (ASR) techniques by evaluating different speech features, such as Mel Frequency Cepstral Coefficients (MFCCs), Local Features (LFs), and phoneme probabilities extracted by time delay artificial neural networks of different architectures. Moreover, canonicalization of speech features is performed for Gender-Independent (GI) ASR. In the canonicalization process, the authors designed three classifiers, trained on male, female, and GI speakers, and extracted the output probabilities from these classifiers to find the maximum. Maximizing the output probabilities for each speech file provides higher correctness and accuracy for GI speech recognition. In addition, dynamic parameters (velocity and acceleration coefficients) are used in the experiments to obtain higher accuracy in phoneme recognition. The experiments also show that dynamic parameters combined with hybrid features increase phoneme recognition performance to a certain extent. These parameters not only increase the accuracy of the ASR system but also reduce the computational complexity of Hidden Markov Model (HMM)-based classifiers, which then need fewer mixture components.
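Velocity and acceleration coefficients are commonly derived from the static features with a regression formula over neighbouring frames. The sketch below assumes that standard formulation, a window of two frames on each side, and edge handling by repeating the first and last frame; none of these settings is stated in the abstract, and the function names are hypothetical.

```python
def delta(frames, window=2):
    """Regression-based delta of a feature sequence.

    frames: list of equal-length feature vectors, one per time step.
    Edge frames are handled by repeating the first/last frame (assumption).
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        vec = []
        for d in range(len(frames[t])):
            # Weighted difference between frames n steps ahead and n steps back.
            num = sum(
                n * (frames[min(t + n, last)][d] - frames[max(t - n, 0)][d])
                for n in range(1, window + 1)
            )
            vec.append(num / denom)
        out.append(vec)
    return out


def add_dynamic_features(frames, window=2):
    """Append velocity and acceleration coefficients to each static frame."""
    velocity = delta(frames, window)
    acceleration = delta(velocity, window)  # delta of the delta
    return [s + v + a for s, v, a in zip(frames, velocity, acceleration)]
```

For a feature that rises by one unit per frame, the velocity away from the edges is 1 and the acceleration is 0, so each output vector is three times as long as the static input.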


Author(s): Al-Mahmud, Bishnu Sarker, K. M. Azharul Hasan

Parsing plays a prominent role in computational linguistics, and parsing a Bangla sentence is a primary need in Bangla language processing. This chapter describes a Context Free Grammar (CFG) for the Bangla language and proposes a Bangla parser based on that grammar. The approach is simple to apply to Bangla sentences, and the method is well accepted for parsing grammar. The parser is, by nature, a predictive parser, and a parse table is constructed for recognizing Bangla grammar. The parse table is an important tool for recognizing syntactic mistakes in Bangla sentences: a mistake is flagged when the table has no entry for a terminal. If a natural language can be parsed successfully, grammar checking of that language becomes possible. The parsing scheme in this chapter is based on a top-down parsing method. A CFG can suffer from a major problem called left recursion; the technique of left factoring is applied to avoid the problem.
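A table-driven predictive parser of the kind described can be sketched in a few lines. The grammar below is a toy with three rules over part-of-speech tags (S → NP VP, NP → noun | pronoun, VP → NP verb | verb); it is a hypothetical stand-in, not the chapter's Bangla grammar, and it is chosen to have no left recursion, as top-down parsing requires.

```python
# Hypothetical parse table: (nonterminal, lookahead tag) -> production.
PARSE_TABLE = {
    ("S", "noun"): ["NP", "VP"],
    ("S", "pronoun"): ["NP", "VP"],
    ("NP", "noun"): ["noun"],
    ("NP", "pronoun"): ["pronoun"],
    ("VP", "noun"): ["NP", "verb"],
    ("VP", "pronoun"): ["NP", "verb"],
    ("VP", "verb"): ["verb"],
}
NONTERMINALS = {"S", "NP", "VP"}
END = "$"  # end-of-input marker


def predictive_parse(tags):
    """Return True if the tag sequence can be derived from S."""
    tokens = list(tags) + [END]
    stack = [END, "S"]
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            production = PARSE_TABLE.get((top, lookahead))
            if production is None:
                return False  # empty table cell: syntax error
            # Push the right-hand side so its first symbol ends up on top.
            stack.extend(reversed(production))
        elif top == lookahead:
            pos += 1  # terminal matched, consume one input tag
        else:
            return False
    return pos == len(tokens)
```

With this table, `predictive_parse(["pronoun", "noun", "verb"])` is accepted, while `predictive_parse(["verb", "noun"])` is rejected at the first step because the table has no entry for S with lookahead `verb`.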


Author(s): K. M. Azharul Hasan, Sajidul Islam, G. M. Mashrur-E-Elahi, Mohammad Navid Izhar

Sentiment analysis is a very important area of natural language processing. In general, sentiment classification means determining whether a speaker holds a positive or negative opinion on a specific subject. With the rapid growth of e-commerce, sentiment analysis can greatly influence people's everyday lives. For example, product reviews on the Web have become an important source of information for customers deciding whether to buy a product. As there are often too many reviews for customers to go through, automatically classifying and detecting their sentiment has become an important research problem. In this chapter, the authors present a Sentiment Analyzer that recognizes the sentiment or opinion about a subject expressed in Bangla text. They construct phrase patterns and calculate their sentiment orientation. They add tags to words in the Bangla text to construct the phrase patterns for positive and negative sentiment. The authors then match the phrase patterns in the Bangla text against their predefined phrase patterns and accumulate the sentiment orientation of each sentence.
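The tag, match, and accumulate pipeline can be illustrated with a very small sketch. The lexicon, the tag names (POS, NEG, NOT), the single negation pattern, and the English stand-in words are all assumptions made for illustration; the chapter's actual Bangla tags and phrase patterns are not given in the abstract.

```python
# Hypothetical polarity lexicon: word -> (tag, orientation value).
# English stand-ins are used; a real system would hold Bangla entries.
LEXICON = {
    "good": ("POS", 1),
    "excellent": ("POS", 2),
    "bad": ("NEG", -1),
    "not": ("NOT", 0),
}


def sentence_orientation(tokens):
    """Tag the tokens, match phrase patterns, and sum their orientation."""
    tagged = [LEXICON.get(tok, ("O", 0)) for tok in tokens]
    score = 0
    i = 0
    while i < len(tagged):
        tag, value = tagged[i]
        next_is_polar = i + 1 < len(tagged) and tagged[i + 1][0] in ("POS", "NEG")
        if tag == "NOT" and next_is_polar:
            # Pattern "NOT + polar word": flip the polarity of the next word.
            score -= tagged[i + 1][1]
            i += 2
        else:
            # Single-word pattern (untagged words contribute 0).
            score += value
            i += 1
    return score


def document_orientation(sentences):
    """Accumulate the orientation of every sentence in a text."""
    return sum(sentence_orientation(s) for s in sentences)
```

A positive total is read as positive sentiment and a negative total as negative sentiment; for example, the sentence `["not", "good"]` scores -1 under this toy lexicon.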


Author(s): Suprabhat Das, Anupam Basu, Pabitra Mitra

Rabindranath Tagore is one of the most prolific authors of Bengali literature. He added a vast amount of richness in style and language to Bengali text. The present study aims at a quantitative study of vocabulary size and lexical richness in his works, as well as an effective search engine for them. Several statistical measures of term distribution have been used to measure lexical richness. An initial attempt has been made to build a search engine, Anwesan, for the Rabindra Rachanabali collection. The first complete digital Rabindra Rachanabali, released by the Society for Natural Language Technology Research, Kolkata, in 2010, has been used in the study. A high lexical richness value was observed to be characteristic of most of Rabindranath Tagore's work.
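The abstract does not list which statistical measures were used. As a sketch under that caveat, the following computes three common lexical richness measures from a token list: the type-token ratio, Guiraud's index (types divided by the square root of tokens), and the share of types that occur only once. The function name and output keys are hypothetical.

```python
from collections import Counter
from math import sqrt


def lexical_richness(tokens):
    """Compute a few common lexical richness measures for a token list."""
    if not tokens:
        raise ValueError("token list is empty")
    counts = Counter(tokens)
    n_tokens = len(tokens)  # running words
    n_types = len(counts)   # distinct words (vocabulary size)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "type_token_ratio": n_types / n_tokens,
        "guiraud_index": n_types / sqrt(n_tokens),
        "hapax_ratio": hapaxes / n_types,
    }
```

The plain type-token ratio falls as a text gets longer, so comparisons across works of different length usually rely on length-adjusted measures such as Guiraud's index.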


Author(s): Debasis Ganguly, Johannes Leveling, Gareth J.F. Jones

This chapter introduces Bengali Information Retrieval (IR) to students by explaining the fundamental concepts of IR such as indexing, retrieval, and evaluation metrics. It also surveys and compares various Bengali language-specific methodologies, and hence can serve researchers particularly interested in state-of-the-art developments in Bengali IR. It can also act as a guideline for application developers on how to set up an information retrieval system for the Bengali language. All steps for creating and evaluating an information retrieval system are introduced, including content processing, indexing, retrieval models, and evaluation. Special attention is given to language-specific aspects of Bengali information retrieval. In addition, the chapter discusses cross-lingual information retrieval, where queries are entered in English with the objective of retrieving Bengali documents.
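The three stages named here (indexing, retrieval, evaluation) can be sketched compactly. The example assumes documents are already tokenized, uses a plain TF-IDF score as the retrieval model, and uses average precision as the evaluation metric; these are generic textbook choices, not necessarily the ones the chapter recommends, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def build_index(docs):
    """Indexing. docs: dict doc_id -> token list. Returns term -> {doc_id: tf}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query_tokens):
    """Retrieval. Rank documents by summed TF-IDF over the query terms."""
    scores = defaultdict(float)
    for term in query_tokens:
        postings = index.get(term, {})
        if not postings:
            continue  # term not in the collection
        idf = log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranking, relevant):
    """Evaluation. Mean of the precision values at each relevant hit."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

For Bengali text, the content processing step (tokenization, stopword removal, stemming) would run before `build_index`, and that step is where most of the language-specific work sits.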


Author(s): Shahina Haque

The chapter provides an overview of the theory of speech production, analysis, and synthesis, and of the status of Bangla speech processing. As nasality is a distinctive feature of Bangla and every vowel has a nasal counterpart, both Bangla vowels and nasality are also considered. The chapter reviews the state of the art in nasal vowel research, cross-language perception of vowel nasality, and vowel nasality transformation for use in a speech synthesizer.

