CLOUD BASED MULTI-LANGUAGE INDEXING USING CROSS LINGUAL INFORMATION RETRIEVAL APPROACHES

2021 ◽  
Vol 9 (1) ◽  
pp. 1283-1293
Author(s):  
Chayapathi A R Et al.

The exponential growth of data produced by digital media (video, audio, images), physical simulations, scientific instruments and web authoring coincides with growing interest in cloud computing. The options for distributing and parallelizing information in clouds make retrieval and storage complicated, especially under real-time data management. The number of Web users accessing data over the Internet is growing steadily. A vast amount of data on the Internet is available in many languages and can be accessed by anyone at any time. Information Retrieval (IR) deals with finding useful data in a large collection of unstructured, structured and semi-structured information. At present, the variety of data and language barriers are the main challenges for communication and social exchange across the world. To overcome such barriers, cross-language information retrieval (CLIR) systems are now in strong demand. Query Expansion (QE) is the process of adding related and significant terms to the original query to improve its retrieval effectiveness and the relevance of the retrieved documents in CLIR. In this research work, QE is investigated for Hindi-English and Kannada-English CLIR, in which Hindi and Kannada queries are used to search English documents. After the query is translated, the retrieved results are ranked using Okapi BM25 so that the most relevant document appears at the top, increasing the relevance of the documents retrieved using QE. We propose an architecture for Hindi-English and Kannada-English CLIR that uses QE to improve the relevance of retrieved documents. In the first experiment, QE is performed with and without Okapi BM25 ranking. The results show that the relevance of retrieved documents is higher with Okapi BM25 than without ranking.
The work clearly demonstrates that the performance of the Hindi-English and Kannada-English CLIR system can be improved significantly by query expansion using appropriate terms placed at suitable positions, and that the retrieved snippets can serve well as a continuous test collection.
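The ranking step described in this abstract can be illustrated with a minimal Okapi BM25 sketch. This is not the authors' implementation: the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, the function names `bm25_scores` and `rank`, and the toy documents are all assumptions chosen for illustration. The query is assumed to be already translated into English and expanded.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF (assumed variant); the "1 +" keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def rank(query_terms, docs):
    """Return document indices ordered from highest to lowest BM25 score."""
    scores = bm25_scores(query_terms, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Hypothetical toy collection of English documents.
docs = [
    "cloud storage for multilingual documents".split(),
    "cross language information retrieval with query expansion".split(),
    "ranking retrieved documents with bm25".split(),
]
translated_query = ["information", "retrieval"]
expanded_query = translated_query + ["query", "expansion"]  # terms added by QE
print(rank(expanded_query, docs))
```

In this sketch the expansion terms simply join the query with equal weight; a real system might weight original and added terms differently.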

2016 ◽  
Vol 68 (4) ◽  
pp. 448-477 ◽  
Author(s):  
Dong Zhou ◽  
Séamus Lawless ◽  
Xuan Wu ◽  
Wenyu Zhao ◽  
Jianxun Liu

Purpose – With an increase in the amount of multilingual content on the World Wide Web, users are often striving to access information provided in a language of which they are non-native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through the means of personalized query expansion. Design/methodology/approach – The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods. Findings – Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained on a cross-lingual level gained better performance than the models trained on a monolingual level. Originality/value – Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills the gap by a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology to ensure repeatable and controlled experiments can be conducted.
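As a rough illustration of the frequency-based side of this study, the sketch below builds a tf-idf weighted user profile from a user's historical documents and uses its top-weighted terms to expand a query. It is a simplified, monolingual sketch under stated assumptions: the smoothed IDF formula, the function names `tfidf_profile` and `expand_query`, and the toy data are hypothetical, and the translation step of a CLIR system is omitted.

```python
import math
from collections import Counter

def tfidf_profile(user_docs, background_docs):
    """Weight each term in the user's history by tf-idf against a background collection."""
    n = len(background_docs)
    # Document frequency over the background collection.
    df = Counter(t for d in background_docs for t in set(d))
    # Term frequency over the user's whole history.
    tf = Counter(t for d in user_docs for t in d)
    # Smoothed IDF (assumed variant) so unseen terms do not divide by zero.
    return {t: c * math.log((n + 1) / (df[t] + 1)) for t, c in tf.items()}

def expand_query(query_terms, profile, top_k=2):
    """Append the top_k highest-weighted profile terms not already in the query."""
    ordered = sorted(profile, key=profile.get, reverse=True)
    candidates = [t for t in ordered if t not in query_terms]
    return list(query_terms) + candidates[:top_k]

# Hypothetical user history and background collection.
user_docs = [["jaguar", "car", "engine"], ["car", "dealer"]]
background_docs = [
    ["jaguar", "animal"],
    ["car", "engine"],
    ["animal", "habitat"],
    ["dealer", "market"],
]
profile = tfidf_profile(user_docs, background_docs)
print(expand_query(["jaguar"], profile, top_k=1))
```

The latent semantic profiles that the study found superior would replace `tfidf_profile` with a model-based weighting; that part is not sketched here.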


Author(s):  
V. P. Konovalov ◽  
P. A. Gulyaev ◽  
A. A. Sorokin ◽  
Y. M. Kuratov ◽  
...  

Multilingual BERT has been shown to generalize well in a zero-shot cross-lingual setting. This generalization was measured on POS and NER tasks. We explore the cross-language transferability of multilingual BERT on the reading comprehension task. We compare different modes of training a question-answering model for a non-English language using both English and language-specific data. We demonstrate that the model based on multilingual BERT is slightly behind the monolingual BERT-based model on Russian data; however, it achieves results comparable to the language-specific variant on Chinese. We also show that training jointly on English data and an additional 10,000 monolingual samples allows it to reach performance comparable to that of the model trained on monolingual data only.


Author(s):  
Vasudeva Varma ◽  
Aditya Mogadala

In this chapter, the authors start their discussion by highlighting the importance of the Cross Lingual and Multilingual Information Retrieval and access research areas. They then discuss the distinction between the Cross Language Information Retrieval (CLIR), Multilingual Information Retrieval (MLIR), Cross Language Information Access (CLIA), and Multilingual Information Access (MLIA) research areas. In further sections, issues and challenges in these areas are outlined, and various approaches to multilingual information access, including machine learning-based and knowledge-based approaches, are discussed. The authors describe the subsystems of an MLIA system, ranging from query processing to output generation, by sharing their experience of building an MLIA system, and discuss its architecture. Finally, evaluation aspects of MLIA and CLIA systems are discussed.


2021 ◽  
Vol 11 (5) ◽  
pp. 7598-7604
Author(s):  
H. V. T. Chi ◽  
D. L. Anh ◽  
N. L. Thanh ◽  
D. Dinh

Paraphrase identification is a crucial task in natural language understanding, especially in cross-language information retrieval. The Multi-Task Deep Neural Network (MT-DNN) has become a state-of-the-art method that brings outstanding results in paraphrase identification [1]. In this paper, we propose a method based on MT-DNN [2] to detect similarities between English and Vietnamese sentences. We changed the shared layers of the original MT-DNN from the original BERT [3] to other pre-trained multi-language models such as M-BERT [3] or XLM-R [4] so that our model could work on cross-language (in our case, English and Vietnamese) information retrieval. We also added some tasks as improvements to gain better results. As a result, we gained 2.3% and 2.5% increases in evaluated accuracy and F1, respectively. The proposed method was also implemented on other language pairs such as English – German and English – French. With those implementations, we got a 1.0%/0.7% improvement for English – German and a 0.7%/0.5% increase for English – French.


2019 ◽  
Vol 25 (3) ◽  
pp. 363-384
Author(s):  
Hosein Azarbonyad ◽  
Azadeh Shakery ◽  
Heshaam Faili

Cross-language information retrieval (CLIR), finding information in one language in response to queries expressed in another language, has attracted much attention due to the explosive growth of multilingual information on the World Wide Web. One important issue in CLIR is how to apply monolingual information retrieval (IR) methods in cross-lingual environments. Recently, the learning to rank (LTR) approach has been successfully employed in different IR tasks. In this paper, we use LTR for CLIR. In order to adapt monolingual LTR techniques to CLIR and pass the barrier of language difference, we map monolingual IR features to CLIR ones using translation information extracted from different translation resources. The performance of CLIR is highly dependent on the size and quality of available bilingual resources. Effective use of available resources is especially important in low-resource language pairs. In this paper, we further propose an LTR-based method for combining translation resources in CLIR. We have studied the effectiveness of the proposed approach using different translation resources. Our results also show that LTR can be used to successfully combine different translation resources to improve CLIR performance. In the best scenario, the LTR-based combination method improves the performance of the single-resource-based CLIR method by 6% in terms of Mean Average Precision.
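For readers unfamiliar with the metric quoted above, Mean Average Precision can be computed as in the following minimal sketch. The function names and the toy rankings are hypothetical and are not taken from the paper; the sketch assumes binary relevance judgments and that unretrieved relevant documents count against the score.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / position  # precision at this cut-off
    # Dividing by all relevant documents penalizes those never retrieved.
    return total / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two hypothetical queries with their ranked results and relevance judgments.
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
print(mean_average_precision(runs))
```

For the first query, the relevant documents appear at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2; for the second, the single relevant document at rank 2 gives 1/2.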


2018 ◽  
Vol 2018 (1) ◽  
pp. 73-82
Author(s):  
Julia Genz

Digital media transform social options of access with regard to producers, recipients, and literary works of art themselves. New labels for new roles such as »prosumers« and »wreaders« attest to this. The »blogger« provides another interesting new social figure of literary authorship. Here, some old desiderata of Dadaism appear to find a belated realization. On the one hand, many web 2.0 formats of authorship amplify and widen the freedom of literary productivity while at the same time subjecting such production to a periodic schedule. In comparison to the received practices of authors and recipients many digital-cultural forms of narrating engender innovative metalepses (and also their sublation). Writing in the net for internet-publics enables the deliberate dissolution of the received autobiographical pact with the reader according to which the author’s genuine name authenticates the author’s writing. On the other hand, the digital-cultural potential of dissolving the autobiographical pact stimulates scandals of debunking and unmasking and makes questions of author-identity an issue of permanent contestation. Digital-cultural conditions of communication amplify both: the hide-and-seek of authorship as well as the thwarting of this game by recipients who delight in playing detective. In effect, pace Foucault’s and Barthes’ postulates of the death of the author, the personality and biography of the author once again tend to become objects of high intrinsic value.


2021 ◽  
Vol 5 (3) ◽  
pp. 63
Author(s):  
Emilia Bazhlekova

An initial-boundary-value problem is considered for the one-dimensional diffusion equation with a general convolutional derivative in time and nonclassical boundary conditions. We are concerned with the inverse source problem of recovery of a space-dependent source term from given final time data. Generalized eigenfunction expansions are used with respect to a biorthogonal pair of bases. Existence, uniqueness and stability estimates in Sobolev spaces are established.

