Pemodelan Topik dengan LDA untuk Temu Kembali Informasi dalam Rekomendasi Tugas Akhir (Topic Modeling with LDA for Information Retrieval in Final Project Recommendation)

2021 ◽  
Vol 5 (3) ◽  
pp. 421-428
Author(s):  
Diana Purwitasari ◽  
Aida Muflichah ◽  
Novrindah Alvi Hasanah ◽  
Agus Zainal Arifin

The undergraduate thesis, or final project (Tugas Akhir in Indonesian), is a prerequisite for graduation, and completing it successfully is one of the expected learning outcomes. Choosing a final project topic that matches the student's ability is therefore important. One strategy is to read the literature, but this is time-consuming. A recommendation system can help students choose a topic that fits their abilities or subject understanding, as reflected in their academic transcripts. This study focuses on a system for final project topic recommendation based on evaluating the competencies shown in the academic transcripts of graduated students. The collected data of previous final projects, namely titles and abstracts, were weighted using TF-IDF (term frequency–inverse document frequency) and grouped using K-Means clustering. For each resulting cluster, we prepared candidate topics using Latent Dirichlet Allocation (LDA) with Gibbs sampling, focusing on the word distribution of each topic in the cluster. We evaluated the optimal number of clusters and number of topics, and then explored the recommendation results more thoroughly. Our experiments showed that the proposed system could recommend final project topic ideas based on student competence as represented in academic transcripts.
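The LDA-with-Gibbs-sampling step described above can be sketched as follows. This is a minimal collapsed Gibbs sampler in plain Python, not the authors' implementation: the function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, and the iteration count are all illustrative assumptions, and the input is assumed to be the tokenized titles and abstracts of one cluster.

```python
import random

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, assumed hyperparameters).

    docs: list of token lists. Returns (topic_word_counts, doc_topic_counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]       # tokens of doc d assigned to topic k
    n_kw = [[0] * V for _ in range(num_topics)]   # times word v is assigned to topic k
    n_k = [0] * num_topics                        # total tokens assigned to topic k

    def update(d, v, k, delta):
        n_dk[d][k] += delta
        n_kw[k][v] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, word_id[w], k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                update(d, v, z[d][i], -1)         # remove the token's current assignment
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(d, v, z[d][i], +1)         # record the newly sampled topic
    return n_kw, n_dk, vocab

def top_words(n_kw, vocab, k, n=3):
    """Most frequent words of topic k, usable as a candidate topic description."""
    ranked = sorted(range(len(vocab)), key=lambda v: -n_kw[k][v])
    return [vocab[v] for v in ranked[:n]]
```

Calling `lda_gibbs(cluster_docs, num_topics=2)` and then `top_words(n_kw, vocab, 0)` would list the highest-count words of the first topic. Which topic a given word lands in depends on the random seed.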

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Qianyao Zhu

To address the lack of accurate course recommendation and selection on online teaching platforms in the new form of higher education, a network course recommendation system based on a double-layer attention mechanism is proposed. First, the collected data are preprocessed, and the student and course information is normalized and classified. Then, the dual attention mechanism is introduced into the parallel neural network recommendation model to improve the model’s ability to mine important features. TF-IDF (term frequency-inverse document frequency) is improved based on the student score and course category. The recommendation results are classified according to the weight of course categories, so as to construct different types of course groups and complete the recommendation. The experimental results show that the proposed algorithm improves recommendation accuracy compared with other algorithms.


Repositor ◽  
2020 ◽  
Vol 2 (9) ◽  
Author(s):  
Meilina Agustina ◽  
Yufiz Azhar ◽  
Nur Hayatin

A recommendation system is software that provides recommendations to users about products they can use. Administration in the office of the Primary School Teacher Education department at Universitas Muhammadiyah Malang is a recurring problem for the administrative staff and part-timers. The manual system currently in use is considered inefficient in terms of time, space, and effort, so support in the form of an information system is needed. The information system is designed using the Okapi BM25 method, a ranking function used by search engines to rank matching documents according to their relevance to a search query, which here is a final project topic. BM25 satisfies three principles of good term weighting: it includes inverse document frequency (idf), term frequency (tf), and document length normalization.
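The three components named above (idf, tf, and document length normalization) can be seen together in a minimal BM25 scoring sketch. The function name `bm25_scores` and the parameter defaults `k1=1.5` and `b=0.75` are common illustrative choices, not values taken from the paper. The idf term uses the non-negative variant with `+ 1` inside the logarithm, which is one of several variants in use.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Inverse document frequency (non-negative variant).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Document length normalization: longer documents are penalized via b.
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            # Saturating term frequency.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores
```

With a query such as `["thesis", "topic"]`, documents containing neither term score 0, and among documents with the same term counts the shorter one scores higher because of the length normalization.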


2020 ◽  
Vol 10 (22) ◽  
pp. 8000
Author(s):  
Sukil Cha ◽  
Mun Y. Yi ◽  
Sekyoung Youm

As the number of researchers in South Korea has grown, there is increasing dissatisfaction with the selection process for national research and development (R&D) projects among unsuccessful applicants. In this study, we designed a system that can recommend the best possible R&D evaluators using big data that are collected from related systems, refined, and analyzed. Our big data recommendation system compares keywords extracted from applications with keywords extracted from the full text of the evaluator candidates’ achievements. Weights for different keywords are scored using the term frequency–inverse document frequency algorithm. Using the keywords extracted from the evaluator candidates’ achievements, a project comparison module searches, scores, and ranks these achievements by their similarity to the project applications. The similarity scoring module calculates the overall similarity scores for different candidates based on the project comparison module scores. To assess the performance of the evaluator candidate recommendation system, evaluator candidates were recommended by the system for 61 applications in three Review Board (RB) research fields (system fusion, organic biochemistry, and Korean literature) in the same manner as the RB’s recommendation. Our tests reveal that the evaluator candidates recommended by the Korean Review Board and those recommended by our system for the 61 applications in different areas were the same. However, our system performed the recommendation in less time, with no bias and fewer personnel. The system requires revisions to reflect qualitative indicators, such as journal reputation, before it can entirely replace the current evaluator recommendation process.
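The keyword comparison described above can be illustrated with a small sketch that weights keywords by TF-IDF and ranks candidates by cosine similarity to an application. The abstract does not specify the similarity measure, so cosine similarity is an assumption here, as are the function names `tfidf_vectors`, `cosine`, and `rank_candidates` and the smoothed idf formula.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dictionary."""
    n_docs = len(docs)
    df = Counter(term for d in docs for term in set(d))
    # Assumed smoothing: "+ 1" keeps terms shared by all documents at nonzero weight.
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in df}
    return [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(application, candidates):
    """Rank candidates (name -> achievement keywords) by similarity to an application."""
    names = list(candidates)
    vectors = tfidf_vectors([application] + [candidates[n] for n in names])
    scored = [(n, cosine(vectors[0], v)) for n, v in zip(names, vectors[1:])]
    return sorted(scored, key=lambda pair: -pair[1])
```

A candidate whose achievements share no keywords with the application receives a similarity of 0 and is ranked last.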


As internet usage increases, we are becoming more dependent on it in daily life. The Internet plays an essential role in simplifying our tight schedules. With such busy lives, it is important to stay aware of current affairs. People from different backgrounds and professions also have different preferences. This is where data mining techniques come into the picture, giving us the recommender system, which is capable of delivering more relevant and valuable results. Newspapers are a basic need for almost everyone who wants to stay updated and aware of the world. Nowadays, however, various solutions are being developed to convert paper news to digital news and to raise the bar for fast news delivery. This is how news recommender systems have earned an important place in our fast-paced lives. This paper investigates the news recommendation solution from its core, including its importance, performance, and suggestions for improvement. It discusses enhancing the performance of the stated solution using modified Term Frequency-Inverse Document Frequency (TF-IDF) algorithms. The proposed solution advocates the use of Java technology, which yields favourable results in the final graphs of accuracy, precision, and F-score. The BBC dataset is used for the comparison study.


Author(s):  
Chirag Variawa ◽  
Susan Mccahan

The goal of engineering examinations is to measure the academic performance of our students with respect to learning outcomes. On exams, students are often asked contextualized questions using vocabulary that might be unfamiliar. A study is being conducted that investigates the accessibility of language on engineering exams, with the goal of making language clearer for all students. Specifically, a Term Frequency-Inverse Document Frequency (TF-IDF) algorithm is used to characterize words on a given engineering exam. By comparing data across different test cases, the TF-IDF algorithm appears to accurately distinguish discipline-specific vocabulary from non-discipline-specific vocabulary. These results inform an approach that could maintain the integrity of engineering exams and create more accessible assessment tools.


Author(s):  
Incheon Paik ◽  
Hiroshi Mizugai ◽  

The widespread use of blogs has recently increased the number of RDF Site Summary (RSS) feeds, which are used for news updates and blogs. Because of this enormous quantity of material, much effort is now needed to search the contents of RSS feeds. To solve this problem, recommendation systems enable users to obtain relevant RSS contents easily and quickly. In previous research, an RSS recommendation system was proposed that used the similarity between the Term Frequency (TF) of the RSS contents and the TF derived from the contents of the user’s browsing history for RSS feeds. In this paper, we use Term Frequency-Inverse Document Frequency (TF-IDF) calculations to propose a Weighted TF-IDF method, which treats the terms enclosed in the title tags of RSS contents as characteristic terms. In addition, we propose a new recommendation method, which uses a Naive Bayes classifier in a machine learning-based approach. Through experiments, we compare the proposed methods and the existing method in a prototype recommendation system, and we show that the proposed methods outperform the existing method on several evaluation measures.
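One way to read the Weighted TF-IDF idea is that terms appearing in an item's title receive extra term-frequency weight before the idf factor is applied. The sketch below follows that reading only; the abstract does not give the weighting formula, so the additive boost `title_weight` and the function name `weighted_tfidf` are hypothetical.

```python
import math
from collections import Counter

def weighted_tfidf(items, title_weight=2.0):
    """Compute TF-IDF vectors in which title terms get an extra (assumed) boost.

    items: list of (title_tokens, body_tokens) pairs, one per RSS item.
    """
    n_items = len(items)
    # Document frequency over the union of title and body terms.
    df = Counter(t for title, body in items for t in set(title) | set(body))
    vectors = []
    for title, body in items:
        tf = Counter(body)
        for term in title:
            tf[term] += title_weight   # hypothetical boost for title-tag terms
        vectors.append({t: c * math.log(n_items / df[t]) for t, c in tf.items()})
    return vectors
```

Under this scheme a term that occurs once in the body and once in the title outweighs a term that occurs once in the body only, while terms present in every item still receive zero weight from the idf factor.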


2021 ◽  
Vol 13 (4) ◽  
pp. 40-56
Author(s):  
Jiaohua Qin ◽  
Zhuo Zhou ◽  
Yun Tan ◽  
Xuyu Xiang ◽  
Zhibin He

Coverless information hiding has become a hot topic in recent years. Existing steganalysis tools are ineffective against coverless steganography because it makes no modification to the carrier. However, because text-based coverless hiding has relatively low hiding capacity, this paper proposes a big data text coverless information hiding method based on LDA (latent Dirichlet allocation) topic distribution and keyword TF-IDF (term frequency-inverse document frequency). First, the sender and receiver build a codebook, which involves word segmentation, word frequency and TF-IDF features, and LDA topic model clustering. The sender then splits the secret information into fragments, converts it into keyword IDs through the keywords-index table, and searches for texts containing the secret information keywords. Second, the retrieved text is assigned an index tag according to the topic distribution and TF-IDF features. At the same time, random numbers are introduced to control the keyword order of the secret information.


Database ◽  
2019 ◽  
Vol 2019 ◽  
Author(s):  
Peter Brown ◽  
Aik-Choon Tan ◽  
Mohamed A El-Esawi ◽  
Thomas Liehr ◽  
Oliver Blanck ◽  
...  

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.


1995 ◽  
Vol 1 (2) ◽  
pp. 163-190
Author(s):  
Kenneth W. Church ◽  
William A. Gale

Shannon (1948) showed that a wide range of practical problems can be reduced to the problem of estimating probability distributions of words and n-grams in text. It has become standard practice in text compression, speech recognition, information retrieval and many other applications of Shannon's theory to introduce a “bag-of-words” assumption. But obviously, word rates vary from genre to genre, author to author, topic to topic, document to document, section to section, and paragraph to paragraph. The proposed Poisson mixture captures much of this heterogeneous structure by allowing the Poisson parameter θ to vary over documents subject to a density function φ. φ is intended to capture dependencies on hidden variables such as genre, author, topic, etc. (The Negative Binomial is a well-known special case where φ is a Γ distribution.) Poisson mixtures fit the data better than standard Poissons, producing more accurate estimates of the variance over documents (σ²), entropy (H), inverse document frequency (IDF), and adaptation (Pr(x ≥ 2 | x ≥ 1)).
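The quantities named above can be written out explicitly. The following is a sketch under assumed notation, not the paper's own: x is the count of a word in a document, and the Γ density is parameterized by a shape r and a scale s, both symbols introduced here for illustration.

```latex
% Poisson mixture: the Poisson rate \theta varies over documents with density \varphi
\Pr(x = k) = \int_0^\infty \varphi(\theta)\, \frac{e^{-\theta}\,\theta^{k}}{k!}\, d\theta

% Special case: \varphi is a Gamma density with shape r and scale s (assumed symbols),
% which yields the Negative Binomial
\Pr(x = k) = \frac{\Gamma(k + r)}{k!\,\Gamma(r)}
             \left(\frac{s}{1+s}\right)^{k} \left(\frac{1}{1+s}\right)^{r}

% Mean and variance of this Negative Binomial: the variance exceeds the mean,
% unlike a single Poisson, where the two are equal
\mathrm{E}[x] = r s, \qquad \sigma^2 = r s\,(1 + s)

% Adaptation, expressed through the probabilities of zero and one occurrences
\Pr(x \ge 2 \mid x \ge 1) = \frac{1 - \Pr(x = 0) - \Pr(x = 1)}{1 - \Pr(x = 0)}
```

The extra factor (1 + s) in the variance is what lets the mixture model word counts that are more spread out across documents than a single Poisson allows.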

