A New Concept of Electronic Text Based on Semantic Coding System for Machine Translation

Author(s):  
Meftah Mohammed Charaf Eddine

In machine translation, ambiguity at both the lexical (dictionary) and structural levels remains one of the hardest problems. Researchers in the field use different approaches, the most important of which is machine learning in its various forms. The goal of the approach proposed in this article is to define a new concept of electronic text that is free from any lexical or structural ambiguity. We use a semantic coding system that attaches to the original electronic text (via the text editor interface) the meanings intended by the author: the author specifies the intended sense of each word that could be a source of ambiguity. The proposed approach can be used with any type of electronic text (word-processing documents, web pages, email, etc.). In the experiments we conducted, the approach achieved a very high accuracy rate, and we argue that the problem of lexical and structural ambiguity can be completely solved. Under this new concept of electronic text, the text file contains not only the text itself but also, in the form of symbols, the exact meanings the writer intended. These semantic symbols are used during machine translation to produce a translated text entirely free of lexical and structural ambiguity.
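A minimal sketch of the semantic-coding idea the abstract describes: the author attaches a sense code to each potentially ambiguous word, and the translation engine reads the code instead of guessing. The inline tag syntax (`word#sense`) and the sense labels here are hypothetical, not the paper's actual coding scheme.

```python
def encode(text, senses):
    """Append an inline sense code to every word the author disambiguated.

    `senses` maps a lower-cased word to the sense label the author chose.
    """
    out = []
    for word in text.split():
        code = senses.get(word.lower())
        out.append(f"{word}#{code}" if code else word)
    return " ".join(out)

def decode(encoded):
    """Recover (word, sense) pairs for the translation engine to consume."""
    pairs = []
    for token in encoded.split():
        word, _, code = token.partition("#")
        pairs.append((word, code or None))
    return pairs

# The author marks "bank" as the river sense; the annotation travels with the text.
annotated = encode("He sat by the bank", {"bank": "riverbank"})
```

At translation time, `decode` hands the engine the author-chosen sense alongside each word, so no disambiguation step is needed.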

2017, Vol. 2(3), pp. 170
Author(s):  
Faranak Salman Mohajer

In this paper, our purpose is to build a large collection of vocabulary and concepts related to the user's field of interest (articles, people, conferences, books, etc.) from the information available on the vast Web, expressed in the form of a social network. In other words, we introduce a way to help researchers specify a topic of interest in a particular field and then observe and extract the social network of concepts related to that topic. To extract the nodes of this network, we sample web pages through the Google search engine and apply text processing and information retrieval techniques. The topic of the social network extracted in this research is scientific conferences in the field of computer science. To evaluate the effectiveness of the method, the network extracted from the search engine results is compared with the scientific conferences listed in the DBLP (Digital Bibliography and Library Project) database. The results of the social network analysis show that the extracted network is of very high accuracy.
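The network-extraction step can be sketched as co-occurrence counting: two conference names that appear together on the same retrieved page become linked nodes, and the link weight is the number of such pages. The page texts and conference acronyms below are stand-ins for the search-engine samples the paper uses; matching is plain case-sensitive substring search, not the paper's actual retrieval pipeline.

```python
from collections import Counter
from itertools import combinations

def build_network(pages, entities):
    """Edge weights = number of pages where both entities appear.

    `pages` is an iterable of page texts; `entities` the node names to look for.
    Presence is tested by simple substring matching (a deliberate simplification).
    """
    edges = Counter()
    for text in pages:
        present = sorted(e for e in entities if e in text)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1  # keys are sorted pairs, so (a, b) == (b, a)
    return edges

pages = [
    "ICSE and FSE accept software papers",
    "FSE co-located with ASE this year",
    "ICSE and FSE again",
]
net = build_network(pages, {"ICSE", "FSE", "ASE"})
```

Nodes with heavier shared-page counts end up more strongly connected, which is the structure the subsequent social-network analysis operates on.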


Author(s):  
Afrizal Zein

Machine translation is an automatic tool that converts a text from one language into another. A machine translation system is software whose output is produced from a linear regression model whose parameters are derived from the statistical analysis of bilingual texts. We now introduce the next step in building a better machine translator using Neural Machine Translation. Neural Machine Translation translates whole sentences at a time, rather than fragment by fragment. It uses this broader context to work out the most relevant translation, which it then rearranges and adjusts to read more like natural human speech with correct grammar. The application was written in the C# programming language, using the Google AJAX API library to translate text and to retrieve translations by parsing the JSON content of the response. The experiments yielded translations that are far smoother and easier to read, which is possible because of the end-to-end learning system built on Neural Machine Translation: in essence, the system learns over time to produce better, more natural translations.
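The retrieval step the abstract describes, fetching a translation and parsing the JSON content, can be sketched with the standard library. The response structure below is a hypothetical illustration, not the exact payload of Google's (now retired) AJAX translation API, and the sketch is in Python rather than the paper's C#.

```python
import json

def extract_translation(raw):
    """Pull the translated string out of a JSON translation response.

    Assumes a hypothetical payload shape: data -> translations[0] -> translatedText.
    """
    payload = json.loads(raw)
    return payload["data"]["translations"][0]["translatedText"]

# Stand-in for the raw JSON body returned by a translation endpoint.
raw = '{"data": {"translations": [{"translatedText": "Good morning"}]}}'
translated = extract_translation(raw)
```

In the application this parsed string would be displayed to the user as the translation result.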


2008, Vol. 3(4), pp. 18
Author(s):  
Jeffrey Beall
Karen Kafadar

Objective – This article measures the extent of the synonym problem in full-text searching. The synonym problem occurs when a search misses documents because it was based on a synonym rather than on a more familiar term. Methods – We took a sample of 90 single-word synonym pairs and searched for each word in the pair, both singly and jointly, in the Yahoo! database. We determined the number of web sites that were missed when only one term, but not the other, was included in the search field. Results – Depending on how common the synonym is, the percentage of missed web sites varies from almost 0% to almost 100%. When the search uses a very uncommon synonym ("diaconate"), a very high percentage of web pages can be missed (95%), whereas only 9% are missed when searching for the more common term ("deacons"). If both terms in a pair are nearly equal in usage ("cooks" and "chefs"), then a search on one term but not the other misses almost half the relevant web pages. Conclusion – Our results indicate that search engines would benefit greatly from incorporating automatic synonym searching, not only for user-specified terms but also for high-usage synonyms. Moreover, the results demonstrate the value of information retrieval systems that use controlled vocabularies and cross references to generate search results.
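The study's core measure can be reproduced from raw hit counts: the miss rate for a term is the share of relevant pages that mention only its synonym. The counts below are illustrative, not the study's actual Yahoo! figures.

```python
def pct_missed(hits_term_only_or_both, hits_synonym_only):
    """Percentage of relevant pages missed when searching one term only.

    `hits_term_only_or_both`: pages found by the searched term.
    `hits_synonym_only`: pages using only the synonym, hence missed.
    """
    total = hits_term_only_or_both + hits_synonym_only
    return 100.0 * hits_synonym_only / total

# Illustrative counts mirroring the uncommon-synonym case: if 950 of 1000
# relevant pages use only the other word, the search misses 95% of them.
miss_rate = pct_missed(50, 950)
```

With near-equal usage (e.g. "cooks" vs. "chefs"), the two hit counts are roughly equal and the formula gives a miss rate near 50%, matching the reported result.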


1998, Vol. 59(1), pp. 29-37
Author(s):  
John V. Richardson

Designed for librarians, Question Master (QM) (at http://purl.org/net/Question_Master) is a decision-support system automating some of the more routine, fact-type reference questions encountered in libraries. A series of Web pages guides librarians through a set of clarifying questions before recommending an appropriate electronic or relevant print resource from WorldCat, the OCLC Online Union Catalog. The goal is to improve the accuracy of reference transactions, which in turn should lead to increased end-user satisfaction. Based on usability studies of QM's biographical module, this study found that although the system was already easy to use, its usability could be improved in several ways. Its ability to answer questions was 100 percent, with an accuracy rate of 66 percent, compared with Weil's 64 percent accuracy. In addition, QM's accuracy was substantially better than most reported studies of real reference environments and certainly better than the Internet results of 20 percent for HotBot and 30 percent for AltaVista.


2007, Vol. 30(1), pp. 49-68
Author(s):  
Pawel Mazur
Robert Dale

Although the literature contains reports of very high accuracy figures for the recognition of named entities in text, there are still some named entity phenomena that remain problematic for existing text processing systems. One of these is the ambiguity of conjunctions in candidate named entity strings, an all-too-prevalent problem in corporate and legal documents. In this paper, we distinguish four uses of the conjunction in these strings, and explore the use of a supervised machine learning approach to conjunction disambiguation trained on a very limited set of ‘name internal’ features that avoids the need for expensive lexical or semantic resources. We achieve 84% correctly classified examples using k-fold evaluation on a data set of 600 instances. We argue that further improvements are likely to require the use of wider domain knowledge and name external features.
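The 'name internal' restriction can be illustrated by a feature extractor that looks only inside the candidate string, such as "Smith and Jones Ltd", with no external lexicons or semantic resources. The specific features below are illustrative, not the paper's actual feature set, and a real system would feed them to a trained classifier rather than inspect them directly.

```python
def name_internal_features(candidate):
    """Features computed purely from the candidate string itself.

    These cues help decide whether 'and' joins two names or sits inside one.
    The feature inventory here is a hypothetical example.
    """
    tokens = candidate.split()
    return {
        "num_tokens": len(tokens),
        # A corporate suffix after the conjunction suggests a single company name.
        "has_corp_suffix": tokens[-1] in {"Ltd", "Inc", "LLC", "GmbH"},
        # Consistent capitalisation of the non-conjunction tokens.
        "tokens_capitalised": all(t[0].isupper() for t in tokens if t != "and"),
        # Position of the conjunction, or -1 if absent.
        "and_position": tokens.index("and") if "and" in tokens else -1,
    }

feats = name_internal_features("Smith and Jones Ltd")
```

A supervised learner trained on vectors like these, over labelled examples of the four conjunction uses, is the setup the paper evaluates with k-fold cross-validation.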


Author(s):  
Dewi Agushinta R.
Sugiharti Binastuti
Anita
Prasastia Aryani Saliha

The SFS Law Office is a legal advisory office located in Depok, West Java. As an advisory service, data processing and management in the SFS Law Office were handled manually, so errors and data loss occurred often. The purpose of this research is to build a website for this legal advisory office that serves as a tool for clients to schedule meetings with lawyers. It also helps lawyers manage their files and schedules so that they can assist clients personally. The website was built with Hypertext Preprocessor (PHP), JavaScript, and Cascading Style Sheets (CSS) using the Laravel framework, Sublime Text as the text editor, and a MySQL database. The method used in the research is Rapid Application Development (RAD). The website has three types of users: admin, member, and visitor. Black-box testing showed that all options and web pages function as expected. The SFS Law Office website is available in Indonesian at http://www.kantorkonsultanhukumsfs.com. Assessment scores from 37 respondents gave a result of 86.76%, which places the website in the "very good" category.


Author(s):  
Michael Keane
Markus Hofmann

The focus of this paper is the extraction of knowledge from the content of web pages containing module descriptors, as published on http://courses.itb.ie for programmes delivered within the School of Business at the Institute of Technology Blanchardstown. We present an automated similarity analysis and highlight options for visual exploration. Three issues of note result from this analysis. First, modules coded as different and unique to their particular programme of study showed substantial similarity. Second, we identified substantial content overlap and a lack of clear differentiation between sequential modules. Third, the document similarity statistics point to modules with very high similarity scores being delivered in different years, at different National Framework of Qualifications (NFQ) levels, and on different programmes. These issues can be raised within the management structure of the School of Business and disseminated to the relevant programme boards for further consideration and action. In a climate of constrained resources, with limited numbers of academic staff and lecture theatres, the potential savings, beyond the obvious quality assurance benefits, illustrate a practical application of how text mining can elicit new knowledge and provide business intelligence to support quality assurance and decision making in a higher education environment.
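The document similarity statistics the paper reports can be approximated with plain bag-of-words cosine similarity between two descriptor texts. This is a standard-library sketch of the general technique, not the paper's actual text-mining toolchain, and it skips preprocessing such as stemming and stop-word removal.

```python
from collections import Counter
from math import sqrt

def cosine(doc_a, doc_b):
    """Cosine similarity of two texts under a bag-of-words term-count model."""
    va, vb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Two hypothetical module descriptors that share most of their wording.
score = cosine("introduction to databases and sql",
               "introduction to advanced databases")
```

Computing this score for every pair of module descriptors and flagging pairs above a threshold is how near-duplicate modules across programmes would surface.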


2013, Vol. 1, pp. 327-340
Author(s):  
Arianna Bisazza
Marcello Federico

Defining the reordering search space is a crucial issue in phrase-based SMT between distant languages. Indeed, the optimal trade-off between accuracy and decoding complexity is currently reached by harshly limiting the input permutation space. We propose a method to dynamically shape that space and thus capture long-range word movements without hurting translation quality or decoding time. The space defined by loose reordering constraints is dynamically pruned by a binary classifier that predicts whether a given input word should be translated right after another. Integrating this model into a phrase-based decoder improves a strong Arabic-English baseline that already includes a state-of-the-art early distortion cost (Moore and Quirk, 2007) and hierarchical phrase orientation models (Galley and Manning, 2008). Significant improvements in the reordering of verbs are achieved by a system that is notably faster than the baseline, while BLEU and METEOR remain stable, or even increase, at a very high distortion limit.
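The pruning idea can be sketched as follows: start from loose constraints (any jump within a large distortion limit) and keep only the jumps the binary classifier accepts. The classifier below is a hand-written stand-in rule, not the paper's trained model, and the indices are source word positions in a hypothetical six-word sentence.

```python
def allowed_jumps(sent_len, distortion_limit, classifier):
    """Pairs (i, j) meaning: word j may be translated right after word i.

    The loose constraint admits any jump within `distortion_limit`; the
    classifier then prunes that space down to the jumps it predicts useful.
    """
    pairs = []
    for i in range(sent_len):
        for j in range(sent_len):
            if i != j and abs(j - i) <= distortion_limit and classifier(i, j):
                pairs.append((i, j))
    return pairs

# Stand-in classifier: allow short local jumps, plus one specific
# long-range movement (e.g. a verb translated early), as an illustration.
ok = lambda i, j: abs(j - i) <= 2 or (i, j) == (0, 5)
jumps = allowed_jumps(6, 5, ok)
```

The decoder then explores only `jumps`, so the long verb movement survives while most of the loose permutation space is discarded, which is why decoding stays fast at a very high distortion limit.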

