text ranking Latest Research Papers

Buku Pedoman Akademik FILKOM Universitas Brawijaya merupakan suatu kebutuhan informasi akademik yang cukup penting, dan juga buku penunjang pembelajaran seperti Free e-Book bagi para mahasiswa. Untuk memperoleh informasi yang relevan terhadap query yang diberikan seringkali belum sesuai dengan kebutuhan pencarian pengguna. Pengguna harus menguasai secara keseluruhan untuk mengetahui dokumen mana yang paling sesuai, dan proses ini akan memakan waktu yang banyak. Sistem ini mampu memberikan rekomendasi dokumen sesuai dengan hasil perhitungan pemeringkatan teks. Proses pemeringkatan teks dapat diselesaikan dengan algoritma PageRank, di mana dokumen yang memiliki bobot pemeringkatan terkecil, memiliki kata terbanyak pada dokumen tersebut. Algoritma ini telah dibuktikan mampu memeberikan feedback dokumen yang relevan melalui dua tahap pengujian. Evaluasi yang dilakukan terhadap dua buah pengujian menghasilkan rata-rata nilai recall tertinggi yaitu 80.6% pada data ke-1, dan data ke-2 didapatkan korelasi terbaik antara precision, recall dan f-measure sebesar 0,98, 0,99, 0,99. AbstractThe Brawijaya University FILKOM Academic Handbook is an important academic information need, as well as learning support books such as Free e-Books for students. To obtain information that is relevant to the query given is often not in accordance with the wishes of the user. Users must master the whole to find out which documents are most suitable, which is where the process will take a lot of time. This system is able to provide document recommendations in accordance with the results of the text ranking calculation. The process of ranking the text can be solved by the PageRank algorithm, where documents that have the smallest ranking weight, have the most words in the document. This algorithm has been proven to be able to provide feedback on relevant documents through two stages of testing. he evaluation conducted on the two tests resulted in the highest average recall value of 80.6% on the 1st dataset, and 2nd dataset the best correlation was obtained between precision, recall and f-measure of 0.98, 0.99, 0.99.

Download Full-text

Word-embedding Based Text Vectorization Using Clustering

Modeling and Analysis of Information Systems ◽

10.18255/1818-1015-2021-3-292-311 ◽

2021 ◽

Vol 28 (3) ◽

pp. 292-311

Author(s):

Vitaly I. Yuferev ◽

Nikolai A. Razin

Keyword(s):

Language Processing ◽

Word Embedding ◽

Vector Representation ◽

Optimal Parameters ◽

Ranking Problem ◽

Series Of Experiments ◽

Text Ranking ◽

Vector Representations ◽

Similar Elements ◽

Entire Text

It is known that in the tasks of natural language processing, the representation of texts by vectors of fixed length using word-embedding models makes sense in cases where the vectorized texts are short.The longer the texts being compared, the worse the approach works. This situation is due to the fact that when using word-embedding models, information is lost when converting the vector representations of the words that make up the text into a vector representation of the entire text, which usually has the same dimension as the vector of a single word.This paper proposes an alternative way for using pre-trained word-embedding models for text vectorization. The essence of the proposed method consists in combining semantically similar elements of the dictionary of the existing text corpus by clustering their (dictionary elements) embeddings, as a result of which a new dictionary is formed with a size smaller than the original one, each element of which corresponds to one cluster. The original corpus of texts is reformulated in terms of this new dictionary, after which vectorization is performed on the reformulated texts using one of the dictionary approaches (TF-IDF was used in the work). The resulting vector representation of the text can be additionally enriched using the vectors of words of the original dictionary obtained by decreasing the dimension of their embeddings for each cluster.A series of experiments to determine the optimal parameters of the method is described in the paper, the proposed approach is compared with other methods of text vectorization for the text ranking problem – averaging word embeddings with TF-IDF weighting and without weighting, as well as vectorization based on TF-IDF coefficients.

Download Full-text

Pretrained Transformers for Text Ranking: BERT and Beyond

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ◽

10.1145/3404835.3462812 ◽

2021 ◽

Author(s):

Andrew Yates ◽

Rodrigo Nogueira ◽

Jimmy Lin

Keyword(s):

Text Ranking

Download Full-text

A novel scheme of domain transfer in document-level cross-domain sentiment classification

Journal of Information Science ◽

10.1177/01655515211012329 ◽

2021 ◽

pp. 016555152110123

Author(s):

Yueting Lei ◽

Yanting Li

Keyword(s):

Language Model ◽

Sentiment Classification ◽

Language Models ◽

Product Reviews ◽

Data Set ◽

Emotional Words ◽

Cross Domain ◽

Text Ranking ◽

Domain Transfer

The sentiment classification aims to learn sentiment features from the annotated corpus and automatically predict the sentiment polarity of new sentiment text. However, people have different ways of expressing feelings in different domains. Thus, there are important differences in the characteristics of sentimental distribution across different domains. At the same time, in certain specific domains, due to the high cost of corpus collection, there is no annotated corpus available for the classification of sentiment. Therefore, it is necessary to leverage or reuse existing annotated corpus for training. In this article, we proposed a new algorithm for extracting central sentiment sentences in product reviews, and improved the pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) to achieve the domain transfer for cross-domain sentiment classification. We used various pre-training language models to prove the effectiveness of the newly proposed joint algorithm for text-ranking and emotional words extraction, and utilised Amazon product reviews data set to demonstrate the effectiveness of our proposed domain-transfer framework. The experimental results of 12 different cross-domain pairs showed that the new cross-domain classification method was significantly better than several popular cross-domain sentiment classification methods.

Download Full-text

Domain Model Discovery from Textbooks for Computer Programming Intelligent Tutors

The International FLAIRS Conference Proceedings ◽

10.32473/flairs.v34i1.128561 ◽

2021 ◽

Vol 34 (1) ◽

Author(s):

Rabin Banjade ◽

Priti Oli ◽

Lasang Jimba Tamang ◽

Jeevan Chapagain ◽

Vasile Rus

Keyword(s):

Intelligent Tutoring ◽

Intelligent Tutoring System ◽

Domain Model ◽

Tutoring Systems ◽

Novel Approach ◽

Ranking Strategy ◽

Domain Models ◽

Text Ranking ◽

Automated Discovery ◽

Key Phrases

We present a novel approach to intro-to-programming domain model discovery from textbooks using an over-generation and ranking strategy. We first extract candidate key phrases from each chapter in a Computer Science textbook focusing on intro-to-programming and then rank those concepts according to a number of metrics such as the standard tf-idf weight used in information retrieval and metrics produced by other text ranking algorithms. Specifically, we conduct our work in the context of developing an intelligent tutoring system for source code comprehension for which a specification of the key programming concepts is needed - the system monitors students' performance on those concepts and scaffolds their learning process until they show mastery of the concepts. Our experiments with programming concept instruction from Java textbooks indicate that the statistical methods such as KP Miner method are quite competitive compared to other more sophisticated methods. Automated discovery of domain models will lead to more scalable Intelligent Tutoring Systems (ITSs) across topics and domains, which is a major challenge that needs to be addressed if ITSs are to be widely used by millions of learners across many domains.

Download Full-text

Bug Report Summarization using Believability Score and Text Ranking

2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC) ◽

10.1109/icaiic51459.2021.9415267 ◽

2021 ◽

Author(s):

Youngji Koh ◽

Sungwon Kang ◽

Seonah Lee

Keyword(s):

Bug Report ◽

Text Ranking

Download Full-text

Random Walks in Hypergraph

International Journal of Education and Information Technologies ◽

10.46300/9109.2021.15.2 ◽

2021 ◽

Vol 15 ◽

pp. 13-20

Author(s):

Abdelghani Bellaachia ◽

Mohammed Al-Dhelaan

Keyword(s):

Random Walk ◽

Random Walks ◽

Transition Probabilities ◽

Graph Representation ◽

Real World Data ◽

Data Set ◽

Random Walks On Graphs ◽

Text Ranking ◽

Relationship Structure ◽

The Relationship

Random walks on graphs have been extensively used for a variety of graph-based problems such as ranking vertices, predicting links, recommendations, and clustering. However, many complex problems mandate a high-order graph representation to accurately capture the relationship structure inherent in them. Hypergraphs are particularly useful for such models due to the density of information stored in their structure. In this paper, we propose a novel extension to defining random walks on hypergraphs. Our proposed approach combines the weights of destination vertices and hyperedges in a probabilistic manner to accurately capture transition probabilities. We study and analyze our generalized form of random walks suitable for the structure of hypergraphs. We show the effectiveness of our model by conducting a text ranking experiment on a real world data set with a 9% to 33% improvement in precision and a range of 7% to 50% improvement in Bpref over other random walk approaches.

Download Full-text