Machine and web translator for English to Bangla using natural language processes

Author(s):  
Mohammad Hasibul Haque ◽  
Md Fokhray Hossain ◽  
ANM Fauzul Hossain

Most modern web content is written in English, so a system that translates web pages from English to Bangla could aid a massive number of people in Bangladesh. Natural Language Processing (NLP), the discipline concerned with understanding and generating natural language, is central to developing such a web translator. Building a web translator with 100% efficiency is a challenging task; our proposed web translator is built around a machine translator at its core. This paper presents an optimal approach to English-to-Bangla machine and web translation and the translation methods used by the translator. Machine translation is conventionally described in three stages, but here we propose a translation system with four: POS tagging, generating the parse tree, transferring the English parse tree to a Bengali parse tree, and translating English to Bangla with the application of AI. This initiative has scope for future upgrades, and we hope this work will assist in developing a more refined English-to-Bangla web translator.

Keywords: Machine Translator, Web Translator, POS Tagging, Parsing, HTML Parsing, Verb Mapping

DOI: 10.3329/diujst.v5i1.4382
Daffodil International University Journal of Science and Technology Vol. 5(1), 2010, pp. 53-61
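
As a rough illustration of the four-stage pipeline named in the abstract, the toy sketch below wires POS tagging, parsing, English-to-Bangla tree transfer (SVO to SOV reordering) and lexical translation together. All tag names, the parse shape and the dictionary entries are invented for the example; this is not the paper's implementation.

```python
# Toy four-stage English-to-Bangla pipeline (hypothetical, minimal data).

ENG_BN_DICT = {"i": "আমি", "eat": "খাই", "rice": "ভাত"}  # toy bilingual lexicon

def pos_tag(tokens):
    # Stage 1: POS tagging (toy lookup; a real system uses a trained tagger)
    tags = {"i": "PRP", "eat": "VB", "rice": "NN"}
    return [(t, tags.get(t, "NN")) for t in tokens]

def parse(tagged):
    # Stage 2: build a flat English parse (S -> NP V OBJ) from the tags
    return {"NP":  [w for w, t in tagged if t == "PRP"],
            "V":   [w for w, t in tagged if t.startswith("VB")],
            "OBJ": [w for w, t in tagged if t == "NN"]}

def transfer(tree):
    # Stage 3: reorder SVO (English) to SOV (Bangla) at the tree level
    return tree["NP"] + tree["OBJ"] + tree["V"]

def translate(words):
    # Stage 4: lexical substitution via the bilingual dictionary
    return " ".join(ENG_BN_DICT.get(w, w) for w in words)

print(translate(transfer(parse(pos_tag("i eat rice".split())))))  # আমি ভাত খাই
```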

2013 ◽  
Vol 8 (3) ◽  
pp. 908-912 ◽  
Author(s):  
Sumita Rani ◽  
Dr. Vijay Luxmi

Machine translation is an important area of Natural Language Processing. A direct MT system exploits the syntactic and vocabulary similarities between two or more closely related natural languages. Such relatedness typically stems from a common parent language; the similarity between Punjabi and Hindi derives from their shared parent language, Sanskrit. Punjabi and Hindi are closely related languages with many similarities in syntax and vocabulary. In the present paper, a direct machine translation system from Punjabi to Hindi is developed, and its output is evaluated to assess the suitability of the system.
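
The core of a direct system is word-for-word transfer through a bilingual lexicon, relying on the shared word order of the language pair. The sketch below shows that idea with a handful of illustrative entries; it is not the authors' lexicon or system.

```python
# Minimal sketch of direct word-for-word Punjabi-to-Hindi transfer
# (illustrative entries only).

PA_HI_LEXICON = {
    "ਪਾਣੀ": "पानी",   # water
    "ਘਰ": "घर",       # house
    "ਹੈ": "है",        # is
}

def direct_translate(sentence: str) -> str:
    # Punjabi and Hindi share SOV word order, so a direct system can keep
    # the source order and substitute words one for one; unknown words
    # pass through unchanged for post-editing.
    return " ".join(PA_HI_LEXICON.get(tok, tok) for tok in sentence.split())
```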


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.
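
For readers unfamiliar with the metric, the snippet below shows the kind of automatic BLEU evaluation reported above, using NLTK's corpus-level implementation on toy sentences; the data, scores and toolkit choice are illustrative, not the paper's.

```python
# Corpus-level BLEU on toy data (not the paper's data or results).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [[["the", "rain", "has", "started"]]]   # gold reference(s) per sentence
hypotheses = [["the", "rain", "started"]]            # system output per sentence
smooth = SmoothingFunction().method1                 # needed for very short toy data
print(corpus_bleu(references, hypotheses, smoothing_function=smooth))
```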


Traditional encryption systems and techniques have always been vulnerable to brute-force attacks. This is due to the byte encoding of characters (e.g. UTF-8/ASCII): an opponent who intercepts a ciphertext and attempts to decrypt it by brute force with a wrong key can recognise failed decryptions, because they yield mixtures of symbols that are not uniformly distributed and carry no meaning. Honey encryption was proposed to curb this classical weakness by producing ciphertexts that, when decrypted with a false key, yield plausible and evenly distributed but untrue plaintexts. The technique, however, is only suitable for passkeys and PINs; adapting it to encode natural-language texts such as e-mails and other human-generated records has remained an open problem. Existing schemes proposed to extend honey encryption to natural-language messages expose fragments of the plaintext embedded with coded data, and are therefore more prone to ciphertext attacks. In this paper, an amended honey encryption scheme is proposed to support natural-language message encryption. The main aim is a framework that encrypts a message entirely in binary form so that, as a result, most binary strings decode to semantically well-formed texts that can trick an opponent who tries a wrong key on the ciphertext. The security of the proposed scheme is assessed.
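
The sketch below illustrates the general honey-encryption idea described above: a distribution-transforming encoder (DTE) maps every message to a seed and every seed back to a plausible message, so decryption under a wrong key still produces a well-formed decoy. The message set, the one-byte keystream and the modular "cipher" are toy stand-ins, not the paper's scheme.

```python
# Toy honey-encryption sketch (illustration only; a real system needs a
# proper KDF and cipher, and a DTE fitted to the message distribution).
import hashlib

MESSAGES = ["transfer approved", "transfer denied",
            "resend credentials", "meeting at noon"]

def dte_encode(msg: str) -> int:
    return MESSAGES.index(msg)               # message -> seed

def dte_decode(seed: int) -> str:
    return MESSAGES[seed % len(MESSAGES)]    # any seed -> plausible message

def keystream(key: str) -> int:
    return hashlib.sha256(key.encode()).digest()[0] % len(MESSAGES)

def encrypt(msg: str, key: str) -> int:
    return (dte_encode(msg) + keystream(key)) % len(MESSAGES)

def decrypt(ct: int, key: str) -> str:
    # A wrong key still yields a well-formed, plausible plaintext,
    # which is what defeats brute-force key guessing.
    return dte_decode((ct - keystream(key)) % len(MESSAGES))

ct = encrypt("transfer approved", "correct-key")
print(decrypt(ct, "correct-key"))  # transfer approved
print(decrypt(ct, "wrong-key"))    # a plausible decoy message
```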


2010 ◽  
Vol 1 (3) ◽  
pp. 1-19 ◽  
Author(s):  
Weisen Guo ◽  
Steven B. Kraines

To promote global knowledge sharing, one must address the problem that representing knowledge in diverse natural languages restricts effective knowledge sharing. Traditional knowledge sharing models are based on natural language processing (NLP) technologies, and the ambiguity of natural language is a problem for NLP; semantic web technologies, however, can circumvent the problem by enabling human authors to specify meaning in a computer-interpretable form. In this paper, the authors propose a cross-language semantic model (SEMCL) for knowledge sharing, which uses semantic web technologies to provide a potential solution to the problem of ambiguity and can match knowledge descriptions in diverse languages. First, the methods used to support searches at the semantic predicate level are given, and the authors present the cross-language approach. Finally, an implementation of the model for the general engineering domain is discussed, and a scenario describes how the model implementation handles semantic cross-language knowledge sharing.
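
The sketch below shows the general idea of predicate-level, cross-language matching in the spirit of such a model: surface terms in each language are mapped to shared concept and predicate identifiers, and matching is done on concept-level triples rather than on words. All identifiers and lexicon entries are invented for the example.

```python
# Toy cross-language semantic matching (invented concept IDs, not SEMCL code).

LEXICON = {
    "en": {"steel": "C001", "hardness": "C002", "has property": "P001"},
    "zh": {"钢": "C001", "硬度": "C002", "具有属性": "P001"},
}

def to_triple(lang, subj, pred, obj):
    # Map language-specific terms to language-independent identifiers
    lex = LEXICON[lang]
    return (lex[subj], lex[pred], lex[obj])

en = to_triple("en", "steel", "has property", "hardness")
zh = to_triple("zh", "钢", "具有属性", "硬度")
# Matching on concept IDs sidesteps the ambiguity of each surface language.
print(en == zh)  # True: the two descriptions match at the semantic level
```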


Author(s):  
Yuejun He ◽  
Bradley Camburn ◽  
Jianxi Luo ◽  
Maria C. Yang ◽  
Kristin L. Wood

Textual idea data from online crowdsourcing contain rich information about the concepts that underlie the original ideas and can be recombined to generate new ideas. Representing such information in a way that can stimulate new ideas is not a trivial task, however, because crowdsourced data are often vast and written in unstructured natural language. This paper introduces a method that uses natural language processing to summarize a massive number of idea descriptions and represents the underlying concept space as word clouds with a core-periphery structure to inspire recombination of those concepts into new ideas. We report the use of this method in a real public-sector-sponsored project exploring ideas for future transportation system design. Word clouds representing the concept space underlying the original crowdsourced ideas were used as ideation aids and stimulated many new ideas of varied novelty, usefulness and feasibility. The new ideas suggest that the proposed method helps expand the idea space. Our analysis of these ideas, together with a survey of the designers who generated them, sheds light on how people perceive and use the word clouds as ideation aids and suggests future research directions.
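
A very rough sketch of the summarization step is shown below: extract term frequencies from the idea texts and split them by rank into a frequent "core" and a rarer "periphery". The stopword list, threshold and example texts are illustrative choices, not the paper's method details.

```python
# Toy core-periphery concept extraction from idea descriptions.
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "for", "in"}

def concept_cloud(ideas, core_size=10):
    tokens = [w for text in ideas for w in text.lower().split()
              if w not in STOPWORDS]
    ranked = Counter(tokens).most_common()
    core = ranked[:core_size]        # frequent, central concepts
    periphery = ranked[core_size:]   # rarer concepts, rich in stimuli
    return core, periphery

core, periphery = concept_cloud([
    "autonomous shuttle for last mile transport",
    "shared electric shuttle network",
])
print(core)
```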


2004 ◽  
Vol 10 (1) ◽  
pp. 57-89 ◽  
Author(s):  
MARJORIE MCSHANE ◽  
SERGEI NIRENBURG ◽  
RON ZACHARSKI

The topic of mood and modality (MOD) is a difficult aspect of language description because, among other reasons, the inventory of modal meanings is not stable across languages, moods do not map neatly from one language to another, modality may be realised morphologically or by free-standing words, and modality interacts in complex ways with other modules of the grammar, like tense and aspect. Describing MOD is especially difficult if one attempts to develop a unified approach that not only provides cross-linguistic coverage, but is also useful in practical natural language processing systems. This article discusses an approach to MOD that was developed for and implemented in the Boas Knowledge-Elicitation (KE) system. Boas elicits knowledge about any language, L, from an informant who need not be a trained linguist. That knowledge then serves as the static resources for an L-to-English translation system. The KE methodology used throughout Boas is driven by a resident inventory of parameters, value sets, and means of their realisation for a wide range of language phenomena. MOD is one of those parameters, whose values are the inventory of attested and not yet attested moods (e.g. indicative, conditional, imperative), and whose realisations include flective morphology, agglutinating morphology, isolating morphology, words, phrases and constructions. Developing the MOD elicitation procedures for Boas amounted to wedding the extensive theoretical and descriptive research on MOD with practical approaches to guiding an untrained informant through this non-trivial task. We believe that our experience in building the MOD module of Boas offers insights not only into cross-linguistic aspects of MOD that have not previously been detailed in the natural language processing literature, but also into KE methodologies that could be applied more broadly.
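
As a guess at how such a parameter inventory might be represented in code, the sketch below models a knowledge-elicitation parameter with its value set and realisation means; the field names and structure are invented for illustration and are not taken from Boas.

```python
# Hypothetical representation of a KE parameter (not Boas code).
from dataclasses import dataclass, field

@dataclass
class Parameter:
    name: str
    values: list[str]                             # attested values elicited so far
    realisations: list[str] = field(default_factory=list)

MOD = Parameter(
    name="mood-and-modality",
    values=["indicative", "conditional", "imperative"],
    realisations=["flective morphology", "agglutinating morphology",
                  "isolating morphology", "words", "phrases", "constructions"],
)
```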


1992 ◽  
Vol 01 (02) ◽  
pp. 229-277 ◽  
Author(s):  
MICHAEL MCCORD ◽  
ARENDSE BERNTH ◽  
SHALOM LAPPIN ◽  
WLODEK ZADROZNY

This paper contains brief descriptions of the latest form of Slot Grammar and four natural language processing systems developed in this framework. Slot Grammar is a lexicalist, dependency-oriented grammatical system, based on the systematic expression of linguistic rules and data in terms of slots (essentially grammatical relations) and slot frames. The exposition focuses on the kinds of analysis structures produced by the Slot Grammar parser. These structures offer convenient input to post-syntactic processing (in particular to the applications dealt with in the paper); they contain in a single structure a useful combination of surface structure and logical form. The four applications discussed are: (1) An anaphora resolution system dealing with both NP anaphora and VP anaphora (and combinations of the two). (2) A meaning postulate based inference system for natural language, in which inference is done directly with Slot Grammar analysis structures. (3) A new transfer system for the machine translation system LMT, based on a new representation for Slot Grammar analyses which allows more convenient tree exploration. (4) A parser of "constructions", viewed as an extension of the core grammar allowing one to handle some linguistic phenomena that are often labeled "extragrammatical", and to assign a semantics to them.
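
To make the notion of a slot-based analysis structure concrete, the sketch below models a node that carries both a surface grammatical relation (the slot it fills) and a pointer into logical form, so one tree serves post-syntactic processing. The field names and example are invented, not the Slot Grammar implementation.

```python
# Hypothetical slot-filler analysis node (illustrative only).
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    slot: str                      # grammatical relation filled in the parent
    sense: str                     # word sense, pointing into logical form
    children: list["Node"] = field(default_factory=list)

# "John gave Mary a book": surface relations and argument structure together.
parse = Node("gave", "top", "give1", [
    Node("John", "subj", "john1"),
    Node("Mary", "iobj", "mary1"),
    Node("book", "obj", "book1", [Node("a", "ndet", "a1")]),
])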


2015 ◽  
Author(s):  
Abraham G Ayana

Natural Language Processing (NLP) refers to human-like language processing and is a discipline within the field of Artificial Intelligence (AI). The ultimate goal of NLP research is to parse and understand language fully, which has not yet been achieved. For this reason, much research in NLP has focused on intermediate tasks that make sense of some of the structure inherent in language without requiring complete understanding. One such task is part-of-speech tagging, or simply tagging. The lack of a standard part-of-speech tagger for Afaan Oromo is a major obstacle for researchers in machine translation, spell checking, dictionary compilation, and automatic sentence parsing and construction. Even though several works on POS tagging for Afaan Oromo exist, tagger performance has not yet been sufficiently improved. Hence, the aim of this thesis is to improve the lexical and transformation rules of Brill's tagger for Afaan Oromo POS tagging using a sufficiently large training corpus. Accordingly, Afaan Oromo literature on grammar and morphology was reviewed to understand the nature of the language and to identify possible tagsets. As a result, 26 broad tagsets were identified, and 17,473 words from around 1,100 sentences containing 6,750 distinct words were tagged for training and testing, of which 258 sentences were taken from previous work. Since only a few ready-made standard corpora exist, the manual tagging needed to prepare a corpus for this work was challenging; it is therefore recommended that a standard corpus be prepared. Transformation-based error-driven learning was adapted for Afaan Oromo part-of-speech tagging, and different experiments were conducted for the rule-based approach, taking 20% of the whole data for testing. A comparison with the previously adapted Brill's tagger was made: the previous tagger shows an accuracy of 80.08%, whereas the improved tagger achieves 95.6%, an improvement of 15.52 percentage points. Hence, it is found that the size of the training corpus, the rule-generating system in the lexical rule learner, and the use of an Afaan Oromo HMM tagger as the initial-state tagger have a significant effect on the improvement of the tagger.
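
The sketch below shows the shape of the transformation-based (Brill-style) tagging loop the thesis builds on: an initial-state tagger assigns default tags, then contextual transformation rules patch its errors. The toy lexicon and the single English-like rule are illustrative; they are not the Afaan Oromo rule set learned in the thesis.

```python
# Minimal Brill-style tagging loop (toy data, illustrative rule).

def initial_tag(tokens, lexicon):
    # Initial-state tagger: lexicon lookup, defaulting to noun (NN)
    return [(w, lexicon.get(w, "NN")) for w in tokens]

# Transformation rules: (from_tag, to_tag, required previous tag)
RULES = [("NN", "VB", "TO")]   # e.g. retag a noun as a verb after "to"

def apply_rules(tagged):
    out = list(tagged)
    for i in range(1, len(out)):
        for frm, to, prev in RULES:
            if out[i][1] == frm and out[i - 1][1] == prev:
                out[i] = (out[i][0], to)   # apply the error-correcting patch
    return out

lexicon = {"to": "TO", "run": "NN"}
print(apply_rules(initial_tag("to run".split(), lexicon)))  # run retagged VB
```

In training, such rules are not hand-written but learned greedily: the rule that repairs the most errors against the tagged corpus is selected, applied, and the process repeats.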


Author(s):  
L.A. Zadeh

I feel honored by the dedication of the Special Issue of IJCCC to me. I should like to express my deep appreciation to the distinguished Co-Editors and my good friends, Professors Balas, Dzitac and Teodorescu, and to the distinguished contributors, for honoring me. The subjects which are addressed in the Special Issue are on the frontiers of fuzzy logic.

The Foreword gives me an opportunity to share with the readers of the Journal my recent thoughts regarding a subject which I have been pondering about for many years - fuzzy logic and natural languages. The first step toward linking fuzzy logic and natural languages was my 1973 paper, "Outline of a New Approach to the Analysis of Complex Systems and Decision Processes." Two key concepts were introduced in that paper: first, the concept of a linguistic variable - a variable which takes words as values; and second, the concept of a fuzzy if-then rule - a rule in which the antecedent and consequent involve linguistic variables. Today, close to forty years later, these concepts are widely used in most applications of fuzzy logic.

The second step was my 1978 paper, "PRUF - a Meaning Representation Language for Natural Languages." This paper laid the foundation for a series of papers in the eighties in which a fairly complete theory of fuzzy-logic-based semantics of natural languages was developed. My theory did not attract many followers either within the fuzzy logic community or within the linguistics and philosophy of languages communities. There is a reason. The fuzzy logic community is largely a community of engineers, computer scientists and mathematicians - a community which has always shied away from semantics of natural languages. Symmetrically, the linguistics and philosophy of languages communities have shied away from fuzzy logic.

In the early nineties, a thought that began to crystallize in my mind was that in most of the applications of fuzzy logic, linguistic concepts play an important, if not very visible, role. It is this thought that motivated the concept of Computing with Words (CW or CWW), introduced in my 1996 paper "Fuzzy Logic = Computing with Words." In essence, Computing with Words is a system of computation in which the objects of computation are words, phrases and propositions drawn from a natural language. The same can be said about Natural Language Processing (NLP). In fact, CW and NLP have little in common and have altogether different agendas.

In large measure, CW is concerned with the solution of computational problems which are stated in a natural language. A simple example. Given: Probably John is tall. What is the probability that John is short? What is the probability that John is very short? What is the probability that John is not very tall? A less simple example. Given: Usually Robert leaves office at about 5 pm. Typically it takes Robert about an hour to get home from work. What is the probability that Robert is home at 6:15 pm? What should be noted is that CW is the only system of computation which has the capability to deal with problems of this kind. The problem-solving capability of CW rests on two key ideas: first, employment of so-called restriction-based semantics (RS) for translation of a natural language into a mathematical language in which the concept of a restriction plays a pivotal role; and second, employment of a calculus of restrictions - a calculus which is centered on the Extension Principle of fuzzy logic.

What is thought-provoking is that neither traditional mathematics nor standard probability theory has the capability to deal with computational problems which are stated in a natural language. Not having this capability, it is traditional to dismiss such problems as ill-posed. In this perspective, perhaps the most remarkable contribution of CW is that it opens the door to empowering mathematics with a fascinating capability - the capability to construct mathematical solutions of computational problems which are stated in a natural language. The basic importance of this capability derives from the fact that much of human knowledge, and especially world knowledge, is described in natural language.

In conclusion, only recently did I begin to realize that the formalism of CW suggests a new and challenging direction in mathematics - mathematical solution of computational problems which are stated in a natural language. For mathematics, this is an unexplored territory.
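
As a small illustration of the linguistic-variable idea discussed above, the sketch below treats "tall" as a fuzzy restriction on height and "very" as the standard concentration hedge. The trapezoidal membership shape and the 160-185 cm range are arbitrary choices for the example, not Zadeh's specification.

```python
# "tall" as a fuzzy restriction on height (illustrative membership function).

def mu_tall(height_cm: float) -> float:
    # Not tall at all below 160 cm, fully tall above 185 cm, graded between
    if height_cm <= 160:
        return 0.0
    if height_cm >= 185:
        return 1.0
    return (height_cm - 160) / 25

def mu_very_tall(height_cm: float) -> float:
    # "very" as concentration (squaring), a standard linguistic hedge
    return mu_tall(height_cm) ** 2

print(mu_tall(175), mu_very_tall(175))  # 0.6 0.36
```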

