inverse document frequency
Recently Published Documents


Total documents: 224 (five years: 152)
H-index: 12 (five years: 4)

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Retrieving keywords from a text has attracted researchers for a long time, as it forms a base for many natural language applications such as information retrieval, text summarization, and document categorization. A text is a collection of words that naturally represent its theme, and bringing that naturalness under fixed rules is itself a challenging task. In the present paper, the authors evaluate different spatial-distribution-based keyword extraction methods available in the literature on three standard scientific texts. The authors choose the first few high-frequency words for evaluation to reduce complexity, since all the methods are in some way based on frequency. The authors find that the methods do not provide good results, particularly for the first few retrieved words. Thus, the authors propose a new measure based on frequency, inverse document frequency, variance, and Tsallis entropy. The methods are evaluated on the basis of precision, recall, and F-measure. Results show that the proposed method provides improved results.
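As a rough illustration of the frequency and inverse document frequency components mentioned above, the following minimal sketch ranks words by a basic TF-IDF score. It assumes the common definition tf × log(N / df) with whitespace tokenization; the function name `tf_idf` and the toy documents are invented for illustration, and the variance and Tsallis entropy terms of the proposed measure are not reproduced here.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document (basic tf * log(N/df))."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Toy corpus (invented): three short "documents".
docs = [
    "entropy of word positions in the text".split(),
    "the frequency of the word in the corpus".split(),
    "the variance of the text".split(),
]
scores = tf_idf(docs)
# Highest-scoring terms of the first document.
top = sorted(scores[0], key=scores[0].get, reverse=True)[:3]
```

Words that occur in every document, such as "the", receive a score of zero under this definition, which is why frequency alone is usually combined with the inverse document frequency.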


Author(s):  
Charan Lokku

Abstract: To reduce fraudulent job postings on the internet, we use a machine learning approach to predict the chance that a job posting is fake, so that candidates can stay alert and make informed decisions. The model uses NLP to analyze the sentiment and patterns in a job posting and a TF-IDF vectorizer for feature extraction. We use the Synthetic Minority Oversampling Technique (SMOTE) to balance the data and a Random Forest classifier for prediction, chosen because it runs efficiently on large datasets, gives high accuracy, and helps prevent overfitting. The final model takes in any relevant job posting data and determines whether the job is real or fake. Keywords: Natural Language Processing (NLP), Term Frequency-Inverse Document Frequency (TF-IDF), Synthetic Minority Oversampling Technique (SMOTE), Random Forest.
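The core idea behind SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration, not the authors' pipeline or a library implementation; the function name `smote_like`, its parameters, and the two-dimensional points (standing in for TF-IDF feature vectors) are assumptions.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Invented minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=6)
```

Each synthetic point lies on a line segment between two existing minority points, so the oversampled class stays inside the region the real minority samples already occupy.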


2021 ◽  
Vol 6 (3) ◽  
pp. 236-251
Author(s):  
Novira Azpiranda ◽  
Ahmad Afif Supianto ◽  
Nanang Yudi Setiawan ◽  
Endang Suryawati ◽  
R. Sandra Yuwana ◽  
...  

Al-Ghiff Steak is a restaurant located in Cirebon City that offers quality steaks at affordable prices. To maintain Al-Ghiff Steak's competitive advantage and reputation, it is important to build a good relationship with customers and have a business strategy that considers customer opinions. However, in practice, Al-Ghiff Steak has difficulty collecting and processing customer review data manually. Therefore, it is necessary to conduct sentiment analysis using Google Reviews to determine customer perspectives on Al-Ghiff Steak's products and services. This analysis was conducted on 968 Google reviews from 2016 to 2020 using the Support Vector Machine (SVM) and Term Frequency-Inverse Document Frequency (TF-IDF) methods. Classification testing is done with a confusion matrix against four parameters: accuracy, precision, recall, and F1-score. SVM with TF-IDF achieves an accuracy of 83%, precision of 64%, recall of 60%, and F1-score of 59%. The sentiment classification result is then visualized in the form of a dashboard. We use the System Usability Scale (SUS) for usability testing, which produces a value of 77.5. This result falls in the Acceptable category with an Excellent rating.
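The four measures named above can be derived from a confusion matrix as in the sketch below. It assumes macro averaging over classes, which the abstract does not specify; the function name `macro_metrics` and the counts are invented for illustration and are not the paper's data.

```python
def macro_metrics(matrix):
    """matrix[i][j] = number of items with true class i predicted as class j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Invented counts for a three-class sentiment task (negative, neutral, positive).
matrix = [[8, 1, 1],
          [2, 4, 2],
          [0, 1, 9]]
accuracy, precision, recall, f1 = macro_metrics(matrix)
```

With unbalanced classes, accuracy can sit well above the macro-averaged precision, recall, and F1-score, because the averaged measures give a weak minority class the same weight as a strong majority class.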


2021 ◽  
Vol 02 (02) ◽  
Author(s):  
Mohammed A. Ahmed ◽  
Hanif Baharin ◽  
Puteri N. E. Nohuddin ◽  
...  

Al-Quran is the primary text of Muslims' religion and practice. Millions of Muslims around the world use al-Quran as their reference guide, and so knowledge can be obtained from it by Muslims and Islamic scholars in general. Al-Quran has been translated into various languages, for example English, by several translators. Each translator brings ideas, comments, and statements to the translation of the verses, from which an interpretation (Tafseer) is obtained. Therefore, this paper tries to cluster the translation of the Tafseer using text clustering. Text clustering is a text mining method that groups related documents into the same cluster. The study adapted the k-means and mini-batch k-means clustering algorithms to explain and define the links between keywords, known as features or concepts, for the Al-Baqarah chapter of 286 verses. For this dataset, data preprocessing and feature extraction using Term Frequency-Inverse Document Frequency (TF-IDF) and Principal Component Analysis (PCA) were applied. Results are shown as two- and three-dimensional cluster plots that assign seven cluster categories (k = 7) to the Tafseer. The running time of the mini-batch k-means algorithm (0.05485 s) was shorter than that of the k-means algorithm (0.23334 s). Finally, the features 'god', 'people', and 'believe' were the most frequent features.
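For readers unfamiliar with the clustering step, a minimal sketch of plain k-means (Lloyd's algorithm) is given below. It is not the study's implementation: the function name `kmeans`, the deterministic initialization from the first k points, and the two-dimensional toy points (standing in for PCA-reduced TF-IDF vectors) are assumptions. Mini-batch k-means differs in that each update uses only a small random subset of the points.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Invented 2-D points forming two obvious groups.
points = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2], [4.9, 5.0]]
labels, centroids = kmeans(points, k=2)
```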


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Qianyao Zhu

In view of the lack of accurate recommendation and selection of courses on network teaching platforms in the new form of higher education, a network course recommendation system based on a double-layer attention mechanism is proposed. First, the collected data are preprocessed, and the student and course information data are normalized and classified. Then, the dual attention mechanism is introduced into the parallel neural network recommendation model to improve the model's ability to mine important features. TF-IDF (term frequency-inverse document frequency) is improved based on student score and course category. The recommendation results are classified according to the weight of course categories, so as to construct different types of course groups and complete the recommendation. The experimental results show that the proposed algorithm effectively improves recommendation accuracy compared with other algorithms.


Information ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 486
Author(s):  
Xiaoyan Zhang ◽  
Qiang Yan ◽  
Simin Zhou ◽  
Linye Ma ◽  
Siran Wang

The number of consumers playing virtual reality games is booming. To speed up product iteration, the user experience team needs to collect and analyze unsatisfying experiences in time. In this paper, we aim to detect the unsatisfying experiences hidden in online reviews of virtual reality exergames using a deep learning method and to identify the unmet psychological needs of users based on self-determination theory. Convolutional neural networks for sentence classification (textCNN) are used in this study to classify online reviews with unsatisfying experiences. For comparison, we set eXtreme gradient boosting (XGBoost) with lexical features as the machine learning baseline. Term frequency-inverse document frequency (TF-IDF) is used to extract keywords from every set of classified reviews. The micro-F1 score of the textCNN classifier is 90.00, which is better than the 82.69 of XGBoost. The top 10 keywords of every set of reviews reflect relevant topics of unmet psychological needs. This paper explores the potential problems causing unsatisfying experiences and unmet psychological needs in virtual reality exergames through text mining and supplements experimental studies of virtual reality exergames.


Author(s):  
E. Sri Vishva ◽  
D. Aju

Fundamentally, phishing is a common cybercrime perpetrated by intruders or hackers against naive and credulous individuals, making them reveal their unique and sensitive information through fictitious websites. The primary intention of this kind of cybercrime is to gain access to personal or classified information from the recipients. The obtained data comprises information that can readily be used to identify an individual. The stolen personal or sensitive information is commonly sold on the online dark market and is subsequently bought by identity thieves. Depending on the sensitivity and importance of the stolen information, the price of a single piece of information varies from a few dollars to thousands of dollars. Machine learning (ML) and deep learning (DL) are powerful methods to analyse and defend against these phishing attacks. A machine-learning-based phishing detection system is proposed to protect websites and users from such attacks. To improve the results, the TF-IDF (Term Frequency-Inverse Document Frequency) value of webpages is employed within the system. ML methods such as LR (Logistic Regression), RF (Random Forest), SVM (Support Vector Machine), NB (Naive Bayes) and SGD (Stochastic Gradient Descent) are applied for training and testing on the obtained dataset. As a result, a robust phishing website detection system is developed with 90.68% accuracy.


Author(s):  
Siwadol Sateanpattanakul ◽  
Duangpen Jetpipattanapong ◽  
Seksan Mathulaprangsan

Decompilation is an important process in software development, particularly when a program's lost source code must be retrieved. Although decompiling Java bytecode is easier than bytecode, many Java decompilers cannot recover the originally lost sources, especially selection statements, i.e., if statements. This deficiency directly affects decompilation performance. In this paper, we propose a methodology for guiding a Java decompiler to deal with this problem. In the framework, Java bytecode is transformed into two kinds of features, called the frame feature and the latent semantic feature. The former is extracted directly from the bytecode. The latter is obtained by a two-step transformation of the Java bytecode into bigrams and then into term frequency-inverse document frequency (TFIDF) values. After that, both are fed to a genetic algorithm to reduce their dimensions. The proposed feature is obtained by converting the selected TFIDF to a latent semantic feature and concatenating it with the selected frame feature. Finally, KNN is used to classify the proposed feature. The experimental results show that the decompilation accuracy is 93.68 percent, which is clearly better than that of Java Decompiler.
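The first of the two transformation steps, turning a bytecode instruction sequence into bigrams, might look like the sketch below. The opcode names are real JVM mnemonics, but the sequence itself is invented and is not taken from the paper; the helper name `bigrams` is also an assumption. The resulting bigram counts are the kind of term frequencies that a TFIDF weighting would then be applied to.

```python
from collections import Counter

def bigrams(tokens):
    """Return adjacent token pairs, e.g. [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(tokens, tokens[1:]))

# Invented opcode sequence, for illustration only.
opcodes = ["iload_1", "iload_2", "if_icmpge", "iload_1", "iload_2", "iadd"]
counts = Counter(bigrams(opcodes))
```

Treating each bigram as a single term keeps some of the local instruction order that a plain bag of opcodes would discard.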


Author(s):  
Irawan Dwi Wahyono ◽  
Khoirudin Asfani ◽  
Mohd Murtadha Mohamad ◽  
Djoko Saryono ◽  
Hari Putranto ◽  
...  

2021 ◽  
Vol 4 (3) ◽  
pp. 19-29
Author(s):  
Tanish Maheshwari ◽  
Tarpara Nisarg Bhaveshbhai ◽  
Mitali Halder ◽  
...  

The number of songs is increasing at a very high rate around the globe. Of the songs released every year, only the top few make it to the Billboard hit charts. The lyrics of a song play an important role in making it a big hit, combined with various other factors like loudness, liveness, speechiness, pop, etc. Artists face the problem of finding the most desired topics to write song lyrics about. This problem is further amplified when selecting the most unique, catchy words which, if added, could create more powerful lyrics. We propose a solution that finds a bag of unique evergreen words using the term frequency-inverse document frequency (TF-IDF) technique of natural language processing. Words from this bag could be added to song lyrics to create more powerful lyrics in the future.

