Pemodelan Topik dengan LDA untuk Temu Kembali Informasi dalam Rekomendasi Tugas Akhir (Topic Modeling with LDA for Information Retrieval in Final Project Recommendation)

An undergraduate thesis, the final project known in Indonesian as Tugas Akhir, is a prerequisite for graduation, and completing it successfully is one of the expected learning outcomes. Choosing a final project topic that matches the student's ability is therefore important. One strategy is to read the literature, but this is time-consuming. A recommendation system is needed to help students choose a topic that fits their abilities or subject understanding, as reflected in their academic transcripts. This study focuses on a system that recommends final project topics by evaluating the competencies shown in the academic transcripts of graduated students. The titles and abstracts of previous final projects were collected, weighted using TF-IDF (term frequency–inverse document frequency), and grouped using K-Means clustering. For each resulting cluster, we prepared candidate topics for recommendation using Latent Dirichlet Allocation (LDA) with Gibbs sampling, focusing on the word distribution of each topic in the cluster. We ran evaluations to determine the optimal number of clusters and number of topics, and then explored the recommendation results more thoroughly. Our experiments showed that the proposed system could recommend final project topic ideas based on student competence as represented in academic transcripts.
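
The LDA-with-Gibbs-sampling step mentioned in this abstract can be illustrated with a minimal collapsed Gibbs sampler. This is only a sketch under stated assumptions, not the authors' implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all hypothetical, and the documents are assumed to be already tokenized.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word distribution per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]          # n(d, k)
    topic_word = [Counter() for _ in range(n_topics)]   # n(k, w)
    topic_total = [0] * n_topics                        # n(k)
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed estimate of each topic's word distribution.
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in vocab}
        for k in range(n_topics)
    ]


# Toy usage: hypothetical tokenized titles from one cluster.
toy_docs = [
    ["tfidf", "retrieval", "query", "ranking"],
    ["image", "cnn", "classification"],
    ["query", "ranking", "retrieval", "search"],
    ["cnn", "image", "detection"],
]
for k, dist in enumerate(lda_gibbs(toy_docs, n_topics=2)):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

The highest-probability words of each topic would then serve as the raw material for a candidate topic label; how the study turns them into recommendations is not specified in the abstract.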


Network Course Recommendation System Based on Double-Layer Attention Mechanism

Scientific Programming ◽

10.1155/2021/7613511 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Qianyao Zhu

Keyword(s):

Double Layer ◽

Recommendation System ◽

Attention Mechanism ◽

Inverse Document Frequency ◽

Network Teaching ◽

Teaching Platform ◽

Document Frequency ◽

Different Types ◽

Recommendation Accuracy ◽

Selection Of

To address the lack of accurate course recommendation and selection on network teaching platforms in the new form of higher education, a network course recommendation system based on a double-layer attention mechanism is proposed. First, the collected data are preprocessed, and the student and course information data are normalized and classified. Then, the dual attention mechanism is introduced into the parallel neural network recommendation model to improve the model’s ability to mine important features. TF-IDF (term frequency-inverse document frequency) is improved based on the student score and course category. The recommendation results are classified according to the weight of course categories, so as to construct different types of course groups and complete the recommendation. The experimental results show that the proposed algorithm effectively improves recommendation accuracy compared with other algorithms.


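Sistem Perekomendasi Dosen Pembimbing berdasarkan Relevansi Topik Tugas Akhir menggunakan Metode Okapi BM25 (Supervisor Recommendation System Based on Final Project Topic Relevance Using the Okapi BM25 Method)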

Repositor ◽

10.22219/repositor.v2i9.672 ◽

2020 ◽

Vol 2 (9) ◽

Author(s):

Meilina Agustina ◽

Yufiz Azhar ◽

Nur Hayatin

Keyword(s):

Recommendation System ◽

School Teacher ◽

Primary School Teacher ◽

Education Department ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Teacher Education Department ◽

Primary School Teacher Education ◽

The University

A recommendation system is software that provides recommendations to users about products they can use. Administration in the office of the Primary School Teacher Education department at the University of Muhammadiyah Malang is a recurring problem for the administrative staff and part-timers. The manual system currently in use is considered inefficient in terms of time, place, and effort, so support in the form of an information system is needed. The information system is designed using the Okapi BM25 method, a ranking function used by search engines to rank matching documents according to their relevance to a search query, which here is a final project topic. BM25 satisfies three principles of good term weighting: it has an inverse document frequency (idf) component, a term frequency (tf) component, and a document length normalization function.
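
The three BM25 components named in this abstract (idf, tf, and document length normalization) can be seen together in a short scoring sketch. It is an illustration under assumptions, not the paper's code: the function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, and the toy supervisor profiles are hypothetical, and the idf uses the non-negative `log(1 + ...)` variant rather than the classic Robertson form.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every tokenized document in `docs` against a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Inverse document frequency (non-negative variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency, saturated by k1 and normalized by document length via b.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Toy usage: each document stands for a supervisor's past project topics.
supervisors = [
    ["topic", "modeling", "lda", "text", "mining"],
    ["image", "classification", "deep", "learning"],
    ["text", "retrieval", "bm25", "ranking", "search"],
]
print(bm25_scores(["text", "retrieval"], supervisors))
```

Sorting supervisors by descending score gives a ranked recommendation list for the queried topic.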


Design and Implementation of a Big Data Evaluator Recommendation System Using Deep Learning Methodology

Applied Sciences ◽

10.3390/app10228000 ◽

2020 ◽

Vol 10 (22) ◽

pp. 8000

Author(s):

Sukil Cha ◽

Mun Y. Yi ◽

Sekyoung Youm

Keyword(s):

Big Data ◽

Deep Learning ◽

Full Text ◽

Recommendation System ◽

Selection Process ◽

Korean Literature ◽

Inverse Document Frequency ◽

Design And Implementation ◽

Research Fields ◽

Document Frequency

As the number of researchers in South Korea has grown, there is increasing dissatisfaction with the selection process for national research and development (R&D) projects among unsuccessful applicants. In this study, we designed a system that can recommend the best possible R&D evaluators using big data that are collected from related systems, refined, and analyzed. Our big data recommendation system compares keywords extracted from applications with keywords extracted from the full text of the evaluator candidates’ achievements. Keyword weights are scored using the term frequency–inverse document frequency algorithm. Using the keywords extracted from the evaluator candidates’ achievements, a project comparison module searches, scores, and ranks these achievements by similarity to the project applications. The similarity scoring module calculates the overall similarity scores for different candidates based on the project comparison module scores. To assess the performance of the evaluator candidate recommendation system, the system recommended evaluator candidates for 61 applications in three Review Board (RB) research fields (system fusion, organic biochemistry, and Korean literature) in the same manner as the RB’s recommendation. Our tests reveal that the evaluator candidates recommended by the Korean Review Board and those recommended by our system for the 61 applications in different areas were the same. However, our system performed the recommendation in less time, with no bias and fewer personnel. The system requires revisions to reflect qualitative indicators, such as journal reputation, before it can entirely replace the current evaluator recommendation process.
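
The keyword comparison described in this abstract can be approximated by ranking candidates on the cosine similarity of TF-IDF keyword vectors. The abstract does not say which similarity measure the system uses, so cosine similarity is an assumption here, as are the helper names `tfidf_vectors`, `cosine`, and `rank_candidates` and the toy keyword lists. Documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per tokenized document."""
    n_docs = len(docs)
    df = Counter(term for d in docs for term in set(d))
    return [
        {t: (c / len(d)) * math.log(n_docs / df[t]) for t, c in Counter(d).items()}
        for d in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(application, candidates):
    """Rank candidate names by similarity of their keywords to the application."""
    names = list(candidates)
    vectors = tfidf_vectors([application] + [candidates[n] for n in names])
    app_vec, cand_vecs = vectors[0], vectors[1:]
    scored = [(cosine(app_vec, v), n) for n, v in zip(names, cand_vecs)]
    return [n for _, n in sorted(scored, reverse=True)]


# Toy usage with hypothetical keyword lists.
application = ["organic", "biochemistry", "enzyme"]
candidates = {
    "evaluator_a": ["enzyme", "kinetics", "biochemistry"],
    "evaluator_b": ["korean", "literature", "poetry"],
}
print(rank_candidates(application, candidates))
```

Because IDF is computed over the application and candidate documents together, a keyword shared by every document gets zero weight, which is the intended effect of down-weighting uninformative terms.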


Hybrid News Recommendation System using TF-IDF and Similarity Weight Index

International Journal of Soft Computing and Engineering - Regular Issue ◽

10.35940/ijsce.c3471.1110320 ◽

2020 ◽

Vol 10 (3) ◽

pp. 5-9

Keyword(s):

Recommendation System ◽

Comparison Study ◽

Important Place ◽

Inverse Document Frequency ◽

Fast Running ◽

Java Technology ◽

Document Frequency ◽

Current Affairs ◽

Digital News ◽

News Recommendation

As internet usage increases, we are becoming more dependent on it in our daily lives. The Internet plays an essential role in simplifying our tight schedules. In such busy lives, it is very important to stay aware of current affairs. People from different backgrounds and professions also have different preferences. This is where data mining techniques come in, giving us the “recommender system”, which is capable of delivering more relevant and worthwhile results. Newspapers are a basic need for almost everyone who wants to stay updated and aware of the world. Nowadays, however, various solutions are being developed to convert the paper news system to digital news and raise the bar for quick news. This is how news recommender systems have earned an important place in our fast-running lives. This research paper investigates the news recommendation solution from its core, including its importance, performance, and suggestions for improvement. It discusses enhancing the performance of the stated solution by using modified Term Frequency-Inverse Document Frequency (TF-IDF) algorithms. The proposed solution advocates the use of Java technology, which shows fruitful results in the final graphs of accuracy, precision, and F-score. The BBC dataset is used for the comparison study.


IDENTIFYING DISCIPLINE-SPECIFIC VOCABULARY ON ENGINEERING EXAMS

Proceedings of the Canadian Engineering Education Association (CEEA) ◽

10.24908/pceea.v0i0.4684 ◽

2012 ◽

Author(s):

Chirag Variawa ◽

Susan Mccahan

Keyword(s):

Academic Performance ◽

Learning Outcomes ◽

Assessment Tools ◽

Test Cases ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

The goal of engineering examinations is to measure the academic performance of our students with respect to learning outcomes. On exams, the students are often asked contextualized questions using vocabulary that might be unfamiliar. A study is being conducted that investigates the accessibility of language on engineering exams with the goal of making language clearer for all students. Specifically, a Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is used to characterize words on a given engineering exam. By comparing data across different test cases, the TF-IDF algorithm appears to accurately distinguish discipline-specific vocabulary from non-discipline-specific vocabulary. These results inform an approach that could maintain the integrity of engineering exams and create more accessible assessment tools.


Recommendation System Using Weighted TF-IDF and Naive Bayes Classifiers on RSS Contents

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2010.p0631 ◽

2010 ◽

Vol 14 (6) ◽

pp. 631-637

Author(s):

Incheon Paik ◽

Hiroshi Mizugai

Keyword(s):

Machine Learning ◽

Recommendation System ◽

Naive Bayes ◽

Naïve Bayes ◽

Bayes Classifier ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Rss Feeds ◽

Enormous Quantity

The widespread use of blogs has caused a recent increase in RDF Site Summary (RSS) feeds, which are used for news updates and blogs. Because of this enormous quantity of material, much effort is now needed to search the contents of RSS feeds. To solve this problem, recommendation systems enable users to obtain relevant RSS contents easily and quickly. In previous research, an RSS recommendation system was proposed that used the similarity between the Term Frequency (TF) of the RSS contents and the TF derived from the contents of the user’s browsing history for RSS feeds. In this paper, we use Term Frequency-Inverse Document Frequency (TF-IDF) calculations to propose a Weighted TF-IDF method, which treats the terms enclosed by the title tags in RSS contents as characteristic terms. In addition, we propose a new recommendation method, which uses a Naive Bayes classifier in a Machine Learning-based approach. Via experiments, we compare the proposed methods and the existing method in a prototype recommendation system, and we show that the proposed methods outperform the existing method with respect to several evaluation measurements.
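
The Naive Bayes approach mentioned in this abstract can be sketched as a multinomial classifier with add-one (Laplace) smoothing that labels a feed item from its tokens. This is a generic textbook formulation, not the paper's method: the function names `train_nb` and `predict_nb`, the labels, and the toy training items are hypothetical, and the abstract does not describe which features or smoothing the authors used.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Train on a list of (tokens, label) pairs; returns the count tables."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_examples = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_examples)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy usage: items the user read ("interesting") versus ignored ("skip").
history = [
    (["python", "machine", "learning"], "interesting"),
    (["neural", "network", "learning"], "interesting"),
    (["celebrity", "gossip", "fashion"], "skip"),
    (["fashion", "week", "celebrity"], "skip"),
]
model = train_nb(history)
print(predict_nb(model, ["machine", "learning", "news"]))
```

A recommender built this way would surface only the items predicted as interesting; term weights such as TF-IDF could replace raw counts, but that extension is not shown here.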


A Big Data Text Coverless Information Hiding Based on Topic Distribution and TF-IDF

International Journal of Digital Crime and Forensics ◽

10.4018/ijdcf.20210701.oa4 ◽

2021 ◽

Vol 13 (4) ◽

pp. 40-56

Author(s):

Jiaohua Qin ◽

Zhuo Zhou ◽

Yun Tan ◽

Xuyu Xiang ◽

Zhibin He

Keyword(s):

Big Data ◽

Information Hiding ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Hiding Capacity ◽

Inverse Document Frequency ◽

Secret Information ◽

Document Frequency ◽

Topic Distribution ◽

Coverless Information Hiding

Coverless information hiding has become a hot topic in recent years. Because coverless steganography makes no modification to the carrier, existing steganalysis tools are ineffective against it. However, text coverless hiding has relatively low hiding capacity, so this paper proposes a big data text coverless information hiding method based on LDA (latent Dirichlet allocation) topic distribution and keyword TF-IDF (term frequency-inverse document frequency). First, the sender and receiver build a codebook, which involves word segmentation, word frequency and TF-IDF features, and LDA topic model clustering. The sender then splits the secret information into pieces, converts them into keyword IDs through the keywords-index table, and searches for texts containing the secret information keywords. Second, the retrieved text is used as the index tag according to the topic distribution and TF-IDF features. At the same time, random numbers are introduced to control the keyword order of the secret information.


Large expert-curated database for benchmarking document similarity detection in biomedical literature search

Database ◽

10.1093/database/baz085 ◽

2019 ◽

Vol 2019 ◽

Author(s):

Peter Brown ◽

Aik-Choon Tan ◽

Mohamed A El-Esawi ◽

Thomas Liehr ◽

Oliver Blanck ◽

...

Keyword(s):

Literature Search ◽

Relevant Literature ◽

Biomedical Literature ◽

Medical Subject Headings ◽

Document Similarity ◽

Inverse Document Frequency ◽

Research Fields ◽

Experience Levels ◽

Document Frequency ◽

Systematic Biases

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.


Poisson mixtures

Natural Language Engineering ◽

10.1017/s1351324900000139 ◽

1995 ◽

Vol 1 (2) ◽

pp. 163-190 ◽

Cited By ~ 146

Author(s):

Kenneth W. Church ◽

William A. Gale

Keyword(s):

Negative Binomial ◽

Probability Distributions ◽

Hidden Variables ◽

Heterogeneous Structure ◽

Text Compression ◽

Inverse Document Frequency ◽

Poisson Mixtures ◽

Document Frequency ◽

Wide Range ◽

Better Than

Shannon (1948) showed that a wide range of practical problems can be reduced to the problem of estimating probability distributions of words and n-grams in text. It has become standard practice in text compression, speech recognition, information retrieval and many other applications of Shannon's theory to introduce a “bag-of-words” assumption. But obviously, word rates vary from genre to genre, author to author, topic to topic, document to document, section to section, and paragraph to paragraph. The proposed Poisson mixture captures much of this heterogeneous structure by allowing the Poisson parameter θ to vary over documents subject to a density function φ. φ is intended to capture dependencies on hidden variables such as genre, author, topic, etc. (The Negative Binomial is a well-known special case where φ is a Γ distribution.) Poisson mixtures fit the data better than standard Poissons, producing more accurate estimates of the variance over documents (σ²), entropy (H), inverse document frequency (IDF), and adaptation (Pr(x ≥ 2 | x ≥ 1)).
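
To make the baseline in this abstract concrete, the sketch below computes what a single standard Poisson with rate θ (average occurrences of a word per document) predicts for two of the quantities listed: IDF, and the adaptation probability that a word appears at least twice given that it appears at least once. It covers only the plain Poisson case, not the proposed mixture, and the function names `poisson_idf` and `poisson_adaptation` are hypothetical.

```python
import math


def poisson_idf(theta):
    """IDF in bits predicted by a Poisson: -log2 Pr(x >= 1) = -log2(1 - e^-theta)."""
    return -math.log2(1.0 - math.exp(-theta))


def poisson_adaptation(theta):
    """Pr(x >= 2 | x >= 1) under a Poisson with rate theta."""
    p_at_least_one = 1.0 - math.exp(-theta)
    p_at_least_two = p_at_least_one - theta * math.exp(-theta)
    return p_at_least_two / p_at_least_one


# Toy usage: a word averaging one occurrence per hundred documents.
theta = 0.01
print(poisson_idf(theta), poisson_adaptation(theta))
```

For small θ the Poisson adaptation probability is roughly θ/2, so a rare word is predicted to almost never repeat within a document. Comparing such predictions with observed corpus values is one way to see the heterogeneity the abstract says a mixture captures better.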


Inverse document frequency-based sensitivity scoring for privacy analysis

Signal Image and Video Processing ◽

10.1007/s11760-021-02013-1 ◽

2021 ◽

Author(s):

Onder Coban ◽

Ali Inan ◽

Selma Ayse Ozel

Keyword(s):

Inverse Document Frequency ◽

Document Frequency ◽

Privacy Analysis
