EMU – A European Multilingual Text Prediction Software

The Web makes it possible to retrieve concise information about specific entities, including people, organizations, movies and their features. In addition, a large amount of Web resources is generally unstructured, which makes it difficult to find critical information about specific entities. Text analysis approaches such as Named Entity Recognition and Entity Linking aim to identify entities and link them to the relevant entries in a given knowledge base. Many general-purpose benchmark datasets exist for evaluating these approaches. However, domain-specific approaches are difficult to evaluate because evaluation datasets for specific domains are lacking. This study presents WeDGeM, a multilingual evaluation set generator for specific domains that exploits Wikipedia category pages and the DBpedia hierarchy. Wikipedia disambiguation pages are also used to adjust the ambiguity level of the generated texts. Based on the generated test data, well-known Entity Linking systems that support Turkish texts are evaluated in a use case in the movie domain.

Effective Term Weighting in ALT Text Prediction for Web Image Retrieval

Web Technologies and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-642-20291-9_24 ◽

2011 ◽

pp. 237-244 ◽

Cited By ~ 2

Author(s):

Vundavalli Srinivasarao ◽

Prasad Pingali ◽

Vasudeva Varma

Keyword(s):

Image Retrieval ◽

Term Weighting ◽

Text Prediction ◽

Web Image Retrieval

Multilingual Web Content Mining

Intelligent Agents for Data Mining and Information Retrieval ◽

10.4018/978-1-59140-194-0.ch006 ◽

2004 ◽

pp. 88-100

Author(s):

Rowena Chau ◽

Chung-Hsing Yeh

Keyword(s):

Information Filtering ◽

User Profile ◽

Linguistic Knowledge ◽

Web Content ◽

Self Organizing Maps ◽

Web Documents ◽

Web Content Mining ◽

Concept Space ◽

Content Mining ◽

Multilingual Text

This chapter presents a novel user-oriented, concept-based approach to multilingual web content mining using self-organizing maps. The multilingual linguistic knowledge required for multilingual web content mining is made available by encoding all multilingual concept-term relationships in a multilingual concept space. With this linguistic knowledge base, a concept-based multilingual text classifier is developed. It reveals the conceptual content of multilingual web documents and forms concept categories of multilingual web documents on a concept-based browsing interface. To personalize multilingual web content mining, a concept-based user profile is generated from a user’s bookmark file to highlight the user’s topics of information interest on the browsing interface. As such, both explorative browsing and user-oriented, concept-focused information filtering on the multilingual web are facilitated.

Multilingual Text Categorization of Indo-Aryan Languages

2019 International Conference on Electrical, Computer and Communication Engineering (ECCE) ◽

10.1109/ecace.2019.8679445 ◽

2019 ◽

Author(s):

Nitesh Khadka ◽

Mir Ragib Ishraq ◽

Asif Mohammed Samir ◽

Mohammad Shahidur Rahman

Keyword(s):

Text Categorization ◽

Multilingual Text

First Level Text Prediction using Data Mining and Letter Matching in IEEE 802.11 Mobile Devices

Innovations and Advances in Computer Sciences and Engineering ◽

10.1007/978-90-481-3658-2_55 ◽

2010 ◽

pp. 319-324

Author(s):

B. Issac

Keyword(s):

Data Mining ◽

Mobile Devices ◽

IEEE 802.11 ◽

Text Prediction ◽

Using Data

Multilingual Text Classification Using Ontologies

Lecture Notes in Computer Science - Advances in Information Retrieval ◽

10.1007/978-3-540-71496-5_49 ◽

2007 ◽

pp. 541-548 ◽

Cited By ~ 11

Author(s):

Gerard de Melo ◽

Stefan Siersdorfer

Keyword(s):

Text Classification ◽

Multilingual Text

Reply Using Past Replies—A Deep Learning-Based E-Mail Client

Electronics ◽

10.3390/electronics9091353 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1353

Author(s):

Yiwei Feng ◽

M. Asif Naeem ◽

Farhaan Mirza ◽

Ali Tahir

Keyword(s):

Deep Learning ◽

Hybrid Model ◽

Learning Algorithm ◽

Inverse Document Frequency ◽

Help Desk ◽

Deep Learning Algorithm ◽

Document Frequency ◽

Text Prediction ◽

E-Mail ◽

Gated Recurrent Unit

Email is the most common and effective means of communication for most enterprises and individuals. In the corporate sector, the volume of email received daily is significant, and a timely reply to each email is important. This generates a huge amount of work for the organisation, in particular for help-desk staff. In this paper we present a novel Smart E-mail Management System (SEMS) for handling the issue of e-mail overload. Previous research used the Term Frequency-Inverse Document Frequency (TF-IDF) model to design a Smart Email Client. Since TF-IDF does not consider the semantic relationships between words, the replies suggested by that model are not very accurate. In this paper we apply Document to Vector (Doc2Vec) and introduce a novel Gated Recurrent Unit Sentence to Vector (GRU-Sent2Vec) model, a hybrid that combines GRU and Sent2Vec. Both models are more intelligent than TF-IDF. We compare the results of both models with those of TF-IDF. The Doc2Vec model performs best at predicting a response to a similar new incoming email. In our case, since the dataset is too small to warrant a deep learning model, the GRU-Sent2Vec hybrid model cannot produce ideal results, although in our understanding it is a robust method for long-text prediction.
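
The TF-IDF baseline mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `suggest_reply`), the smoothed IDF formula, and the sample emails are all assumptions made for illustration. The idea shown is to retrieve the past email most similar to an incoming one and reuse its reply.

```python
import math
from collections import Counter

PUNCT = ".,!?;:"


def tokenize(text):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    tokens = (word.strip(PUNCT).lower() for word in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs):
    """Return (sparse TF-IDF vectors, IDF table) for tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant) keeps every weight positive.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_reply(incoming, past_emails, past_replies):
    """Return the reply paired with the most similar past email, plus its score."""
    vectors, idf = tfidf_vectors([tokenize(e) for e in past_emails])
    # Terms never seen in past emails carry no weight in the query.
    counts = Counter(tokenize(incoming))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return past_replies[best], scores[best]


# Hypothetical help-desk history, invented for this example.
PAST_EMAILS = [
    "I forgot my password and cannot log in",
    "My invoice shows the wrong amount",
]
PAST_REPLIES = [
    "Please use the password reset link.",
    "We will correct the invoice.",
]

if __name__ == "__main__":
    print(suggest_reply("Cannot log in, password forgotten", PAST_EMAILS, PAST_REPLIES))
```

A paraphrase that shares no words with any past email, such as "credentials lost", scores zero against every stored message in this sketch, which reflects the limitation the abstract attributes to TF-IDF.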

Overcoming Language Barriers: Assessing the Potential of Machine Translation and Topic Modeling for the Comparative Analysis of Multilingual Text Corpora

Communication Methods and Measures ◽

10.1080/19312458.2018.1555798 ◽

2018 ◽

Vol 13 (2) ◽

pp. 102-125 ◽

Cited By ~ 4

Author(s):

Ueli Reber

Keyword(s):

Comparative Analysis ◽

Machine Translation ◽

Topic Modeling ◽

Language Barriers ◽

Text Corpora ◽

Multilingual Text
