Personalized scientific and technological literature resources recommendation based on deep learning

To enable quick and accurate access to targeted scientific and technological literature within massive collections, this paper proposes a deep content-based collaborative filtering method, DeepCCF, for personalized recommendation of scientific and technological literature resources. By combining content-based filtering (CBF) with neural network-based collaborative filtering (NCF), the approach transforms literature recommendation into a binary classification task. First, word2vec is used to train word embeddings on the papers' titles and abstracts. Second, an academic literature topic model is built using term frequency–inverse document frequency (TF-IDF) and the word embeddings. Third, researchers' search and view histories and their published papers are used to construct a model of their interests. Deep neural networks (DNNs) then learn the nonlinear, high-order interaction features between users and papers, and the top-k recommendation list is generated from the model's predicted outputs. The experimental results show that the proposed method quickly and accurately captures the latent relations between researchers' interests and paper topics, and effectively acquires researchers' preferences. The proposed method has substantial implications for personalized academic paper recommendation and, in turn, for advancing technological progress.
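As a rough illustration of the content side of this pipeline, the sketch below builds a TF-IDF-weighted average of word embeddings for each paper, averages the vectors of papers a researcher has viewed into an interest profile, and ranks the remaining papers. It is not the DeepCCF implementation: the two-dimensional embeddings, the paper texts and all function names are hypothetical, and plain cosine similarity stands in for the DNN scoring step described in the abstract.

```python
import math
from collections import Counter

# Toy 2-dimensional "embeddings" (hypothetical values; a real system
# would train word2vec on titles and abstracts).
EMBEDDINGS = {
    "neural": [1.0, 0.0],
    "network": [0.9, 0.1],
    "protein": [0.0, 1.0],
    "folding": [0.1, 0.9],
    "model": [0.5, 0.5],
}

# Hypothetical corpus: paper id -> title text.
PAPERS = {
    "p1": "neural network model",
    "p2": "protein folding model",
    "p3": "neural model",
}


def idf_table(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for text in docs.values() for tok in set(text.split()))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def paper_vector(text, idf):
    """TF-IDF-weighted average of the word embeddings of one paper."""
    tf = Counter(tok for tok in text.split() if tok in EMBEDDINGS)
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        weight = count * idf[tok]
        weight_sum += weight
        for i, x in enumerate(EMBEDDINGS[tok]):
            total[i] += weight * x
    return [x / weight_sum for x in total] if weight_sum else total


def cosine(a, b):
    """Cosine similarity; used here in place of the DNN scorer (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(history, k=2):
    """Return the top-k unseen paper ids for a non-empty viewing history."""
    idf = idf_table(PAPERS)
    vectors = {pid: paper_vector(text, idf) for pid, text in PAPERS.items()}
    dim = len(next(iter(vectors.values())))
    # Interest profile: mean of the vectors of papers already viewed.
    profile = [sum(vectors[pid][i] for pid in history) / len(history) for i in range(dim)]
    candidates = [pid for pid in PAPERS if pid not in history]
    candidates.sort(key=lambda pid: cosine(profile, vectors[pid]), reverse=True)
    return candidates[:k]


print(recommend(["p1"], k=2))
```

In a system closer to the one described, the profile and paper vectors would be fed to a trained network that outputs a click or no-click probability, and the top-k list would be taken from those probabilities rather than from cosine scores.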


Exclusive Pen Product Recommendation System Using the Content-Based Filtering Method and TF-IDF (Sistem Rekomendasi Produk Pena Eksklusif Menggunakan Metode Content-Based Filtering dan TF-IDF)

JOINTECS (Journal of Information Technology and Computer Science) ◽

10.31328/jointecs.v5i3.1563 ◽

2020 ◽

Vol 5 (3) ◽

pp. 229

Author(s):

Mariani Widia Putri ◽

Achmad Muchayan ◽

Made Kamisutara

Keyword(s):

Information Retrieval ◽

Customer Relationship Management ◽

Relationship Management ◽

Customer Relationship ◽

Brand Awareness ◽

Product Knowledge ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Content Based Filtering

Recommendation systems are currently a trend, as people increasingly rely on online transactions for various personal reasons. A recommendation system offers an easier and faster way to shop, so users do not need to spend much time finding the items they want. Competition among businesses has also changed, forcing them to change their approach in order to reach potential customers, and a system that supports this is therefore needed. In this study, the authors build a product recommendation system using the Content-Based Filtering method and Term Frequency Inverse Document Frequency (TF-IDF) from the Information Retrieval (IR) model, with the aim of obtaining results that are efficient and that meet the needs of a solution for improving Customer Relationship Management (CRM). The recommendation system is built and deployed to increase customers' brand awareness and to minimize failed transactions caused by a lack of information that can be conveyed directly or offline. The data consist of 258 product codes, each with eight categories and 33 constituent keywords drawn from the company's product knowledge. The TF-IDF calculation yields a weight of 13.854 for the top-ranked product recommendation, and the system achieves an accuracy of 96.5% in recommending pens.
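The general idea of ranking products by the TF-IDF weight of matching keywords can be sketched as follows. This is only an illustration under stated assumptions: the three-item catalogue, the product codes and the `recommend` helper are hypothetical, and the scoring rule (summing the TF-IDF weights of the query keywords found in each product) is one common choice, not necessarily the one used in the study.

```python
import math
from collections import Counter

# Hypothetical catalogue: product code -> descriptive keywords.
CATALOGUE = {
    "PEN-001": ["fountain", "gold", "luxury"],
    "PEN-002": ["ballpoint", "steel", "office"],
    "PEN-003": ["fountain", "steel", "office"],
}


def tfidf_weights(catalogue):
    """TF-IDF weight of every keyword in every product description."""
    n = len(catalogue)
    # Document frequency: number of products that mention each keyword.
    df = Counter(kw for kws in catalogue.values() for kw in set(kws))
    return {
        code: {
            kw: (count / len(kws)) * math.log(n / df[kw])
            for kw, count in Counter(kws).items()
        }
        for code, kws in catalogue.items()
    }


def recommend(query_keywords, catalogue=CATALOGUE):
    """Rank products by the summed TF-IDF weight of the query keywords."""
    weights = tfidf_weights(catalogue)
    scores = {
        code: sum(w.get(kw, 0.0) for kw in query_keywords)
        for code, w in weights.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Products that share no keyword with the query are not recommended.
    return [code for code in ranked if scores[code] > 0]


print(recommend(["fountain", "gold"]))
```

Rare keywords such as "gold" contribute more to the score than keywords shared by many products, which is the property that makes TF-IDF useful for content-based matching.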


Prospecting the Effect of Topic Modeling in Information Retrieval

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2021070102 ◽

2021 ◽

Vol 17 (3) ◽

pp. 18-34

Author(s):

Aakanksha Sharaff ◽

Jitesh Kumar Dewangan ◽

Dilip Singh Sisodia

Keyword(s):

Information Retrieval ◽

Topic Modeling ◽

Topic Model ◽

Language Model ◽

High Dimensionality ◽

Retrieval Process ◽

Coherence Measure ◽

Retrieval Task ◽

Inverse Document Frequency ◽

Document Frequency

Enormous volumes of records and data are gathered every day, and organizing them is a challenging task. Topic modeling provides a way to categorize these documents, but the high dimensionality of the corpus affects the resulting topic model, making it important to apply feature selection or an information retrieval process for dimensionality reduction. Efficient topic modeling requires removing unrelated words that might otherwise lead to spurious co-occurrence of unrelated words. This paper proposes an efficient framework for achieving better topic coherence, in which term frequency-inverse document frequency (TF-IDF) and a parsimonious language model (PLM) are used for the information retrieval task. PLM extracts the important information and removes general words from the corpus, whereas TF-IDF re-estimates the weight of each word in the corpus. The work improves the topic coherence measure, providing a better correlation between the actual topics and the topics generated from PLM.


The classification of rumour standpoints in online social network based on combinatorial classifiers

Journal of Information Science ◽

10.1177/0165551519828619 ◽

2019 ◽

Vol 46 (2) ◽

pp. 191-204 ◽

Cited By ~ 4

Author(s):

Jing Ma ◽

Yongcong Luo

Keyword(s):

Social Network ◽

Online Social Networks ◽

Positive Impact ◽

Binary Classification ◽

Online Social Network ◽

Adaptive Boosting ◽

Inverse Document Frequency ◽

Term Frequency ◽

Model Combining ◽

Document Frequency

Most rumours related to hot events or emergencies can propagate rapidly through online social networks. To track the standpoints of participants in rumour topics and thereby regulate the development of rumours, we propose a multi-feature model that combines classifiers to classify rumour standpoints; that is, to classify each online social network conversation as ‘agree’, ‘disagree’, ‘comment’ or ‘query’ with respect to a previous comment about the rumour. We test the performance of the combinatorial model (a decision tree with an adaptive boosting classifier and extremely randomised trees with an adaptive boosting classifier) on different features: a weight matrix built from combinations of term frequency (TF), inverse document frequency (IDF) and term frequency-inverse document frequency (TFIDF), and feature vectors constructed with Word2vec. The experiments show that the combinatorial classifiers, which exploit different combinations of features in online social network conversations, outperform binary classification; in particular, the topology of the social network has a highly positive impact on the classification results. Furthermore, the ‘comment’ and ‘query’ standpoints are classified better when based on the features of different categories.


A Big Data Text Coverless Information Hiding Based on Topic Distribution and TF-IDF

International Journal of Digital Crime and Forensics ◽

10.4018/ijdcf.20210701.oa4 ◽

2021 ◽

Vol 13 (4) ◽

pp. 40-56

Author(s):

Jiaohua Qin ◽

Zhuo Zhou ◽

Yun Tan ◽

Xuyu Xiang ◽

Zhibin He

Keyword(s):

Big Data ◽

Information Hiding ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Hiding Capacity ◽

Inverse Document Frequency ◽

Secret Information ◽

Document Frequency ◽

Topic Distribution ◽

Coverless Information Hiding

Coverless information hiding has become a hot topic in recent years. Because coverless steganography makes no modification to the carrier, existing steganalysis tools are ineffective against it. However, text-based coverless hiding has relatively low hiding capacity, so this paper proposes a big data text coverless information hiding method based on LDA (latent Dirichlet allocation) topic distribution and keyword TF-IDF (term frequency-inverse document frequency). First, the sender and receiver build a codebook, which involves word segmentation, word frequency and TF-IDF features, and LDA topic model clustering. The sender then splits the secret information into fragments, converts them into keyword IDs through the keywords-index table, and searches for texts containing the secret information keywords. Second, the retrieved text is used as the index tag, according to the topic distribution and TF-IDF features. At the same time, random numbers are introduced to control the keyword order of the secret information.


User opinions driven social recommendation system

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-210050 ◽

2021 ◽

Vol 25 (1) ◽

pp. 21-31

Author(s):

Lakshmikanth Paleti ◽

P. Radha Krishna ◽

J.V.R. Murthy

Keyword(s):

Social Networks ◽

Collaborative Filtering ◽

Recommendation System ◽

Recommendation Systems ◽

Social Recommendation ◽

Novel Approach ◽

Tripartite Graphs ◽

Content Based Filtering

Recommendation systems provide reliable and relevant recommendations to users and build users’ trust in the website. They do so using opinions derived from the reviews, feedback and preferences that users provide when a product is purchased or viewed through social networks. Integrating social network interactions with recommendation systems in this way captures the behavior of users and their friends. The techniques used so far for recommendation systems are traditional ones based on collaborative filtering and content-based filtering. This paper presents a novel approach called User-Opinion-Rating (UOR) for building recommendation systems that takes user-generated opinions over social networks as a dimension. Two tripartite graphs, User-Item-Rating and User-Item-Opinion, are constructed from users’ opinions on items along with their ratings. The proposed approach quantifies users’ opinions, and the results obtained demonstrate its feasibility.


Large expert-curated database for benchmarking document similarity detection in biomedical literature search

Database ◽

10.1093/database/baz085 ◽

2019 ◽

Vol 2019 ◽

Author(s):

Peter Brown ◽

Aik-Choon Tan ◽

Mohamed A El-Esawi ◽

Thomas Liehr ◽

Oliver Blanck ◽

...

Keyword(s):

Literature Search ◽

Relevant Literature ◽

Biomedical Literature ◽

Medical Subject Headings ◽

Document Similarity ◽

Inverse Document Frequency ◽

Research Fields ◽

Experience Levels ◽

Document Frequency ◽

Systematic Biases

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.


Poisson mixtures

Natural Language Engineering ◽

10.1017/s1351324900000139 ◽

1995 ◽

Vol 1 (2) ◽

pp. 163-190 ◽

Cited By ~ 146

Author(s):

Kenneth W. Church ◽

William A. Gale

Keyword(s):

Negative Binomial ◽

Probability Distributions ◽

Hidden Variables ◽

Heterogeneous Structure ◽

Text Compression ◽

Inverse Document Frequency ◽

Poisson Mixtures ◽

Document Frequency ◽

Wide Range ◽

Better Than

Shannon (1948) showed that a wide range of practical problems can be reduced to the problem of estimating probability distributions of words and ngrams in text. It has become standard practice in text compression, speech recognition, information retrieval and many other applications of Shannon's theory to introduce a “bag-of-words” assumption. But obviously, word rates vary from genre to genre, author to author, topic to topic, document to document, section to section, and paragraph to paragraph. The proposed Poisson mixture captures much of this heterogeneous structure by allowing the Poisson parameter θ to vary over documents subject to a density function φ. φ is intended to capture dependencies on hidden variables such as genre, author, topic, etc. (The Negative Binomial is a well-known special case where φ is a Γ distribution.) Poisson mixtures fit the data better than standard Poissons, producing more accurate estimates of the variance over documents (σ²), entropy (H), inverse document frequency (IDF), and adaptation (Pr(x ≥ 2 | x ≥ 1)).
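The effect of mixing can be illustrated numerically by comparing a single Poisson with a Negative Binomial (the Γ-mixed special case mentioned above) that has the same mean. The sketch below is not taken from the paper: the parameter values are arbitrary, the function names are hypothetical, and IDF is taken to be -log2 Pr(x ≥ 1), i.e. the negative log of the predicted fraction of documents containing the word, which is an assumption about the definition in use.

```python
import math


def poisson_pmf(k, theta):
    """Pr(x = k) for a single Poisson with rate theta."""
    return math.exp(-theta) * theta ** k / math.factorial(k)


def negbin_pmf(k, mean, shape):
    """Pr(x = k) for a Poisson whose rate follows Gamma(shape, scale=mean/shape)."""
    p = shape / (shape + mean)
    log_coef = math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
    return math.exp(log_coef + shape * math.log(p) + k * math.log(1.0 - p))


def idf_bits(p0):
    """Predicted IDF in bits, given Pr(x = 0): -log2 Pr(x >= 1)."""
    return -math.log2(1.0 - p0)


def adaptation(p0, p1):
    """Pr(x >= 2 | x >= 1): chance of a second occurrence given a first."""
    return (1.0 - p0 - p1) / (1.0 - p0)


MEAN, SHAPE = 0.1, 0.1  # arbitrary illustrative values

pois0, pois1 = poisson_pmf(0, MEAN), poisson_pmf(1, MEAN)
nb0, nb1 = negbin_pmf(0, MEAN, SHAPE), negbin_pmf(1, MEAN, SHAPE)

print("IDF (Poisson):", idf_bits(pois0))
print("IDF (Negative Binomial):", idf_bits(nb0))
print("Adaptation (Poisson):", adaptation(pois0, pois1))
print("Adaptation (Negative Binomial):", adaptation(nb0, nb1))
```

With the same mean, the mixture places more mass on zero occurrences and on repeated occurrences, so it predicts both a higher IDF and a higher adaptation than the single Poisson. This is the kind of heterogeneity across documents that the abstract describes.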


Inverse document frequency-based sensitivity scoring for privacy analysis

Signal Image and Video Processing ◽

10.1007/s11760-021-02013-1 ◽

2021 ◽

Author(s):

Onder Coban ◽

Ali Inan ◽

Selma Ayse Ozel

Keyword(s):

Inverse Document Frequency ◽

Document Frequency ◽

Privacy Analysis


Efficient natural language classification algorithm for detecting duplicate unsupervised features

Informatics and Automation - Информатика и автоматизация ◽

10.15622/ia.2021.3.5 ◽

2021 ◽

Vol 20 (3) ◽

pp. 623-653

Author(s):

Saud Altaf ◽

Sofia Iqbal ◽

Muhammad Waseem Soomro

Keyword(s):

Natural Language ◽

Short Term Memory ◽

Short Term ◽

Vocabulary Size ◽

Language Understanding ◽

Inverse Document Frequency ◽

Classification Technique ◽

Document Frequency ◽

Text Features ◽

Long Short Term Memory

This paper focuses on capturing the meaning of Natural Language Understanding (NLU) text features in order to detect duplicate unsupervised features. The NLU features are compared with lexical approaches to identify a suitable classification technique. A transfer-learning approach is used to train the feature extraction on the Semantic Textual Similarity (STS) task. All features are evaluated on two types of datasets, Bosch bug reports and Wikipedia articles. This study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to IR. The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results demonstrate the Term Frequency–Inverse Document Frequency (TF-IDF) feature results on both datasets with reasonable vocabulary size. They indicate that the Bidirectional Long Short Term Memory (BiLSTM) can learn the structure of a sentence to improve the classification.


A Study on the Pivoted Inverse Document Frequency Weighting Method

Journal of the Korean Society for information Management ◽

10.3743/kosim.2003.20.4.233 ◽

2003 ◽

Vol 20 (4) ◽

pp. 233-248 ◽

Cited By ~ 4

Keyword(s):

Weighting Method ◽

Inverse Document Frequency ◽

Document Frequency ◽

Frequency Weighting
