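TEXT MINING USING THE GENERATE ASSOCIATION RULE WITH WEIGHT (GARW) ALGORITHM FOR WEB CRAWLER TEXT ANALYSIS [original title: TEXT MINING MENGGUNAKAN GENERATE ASSOCIATION RULE WITH WEIGHT (GARW) ALGORITHM UNTUK ANALISIS TEKS WEB CRAWLER]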

Zulkifli Arsyad

doi:10.32627/internal.v2i2.86

INTERNAL (Information System Journal) ◽

10.32627/internal.v2i2.86 ◽

2020 ◽

Vol 2 (2) ◽

pp. 153-171

Author(s):

Zulkifli Arsyad

Keyword(s):

Text Mining ◽

Association Rule ◽

A Priori ◽

Relevant Information ◽

Web Content ◽

Web Crawler ◽

Search Results ◽

Textual Data ◽

Hidden Patterns ◽

The Many

Text mining is widely used to find hidden patterns and information in large collections of semi-structured and unstructured text. It extracts interesting patterns in order to derive knowledge from textual data sources. Association rule extraction with GARW (Generating Association Rule using Weighting Scheme) can be used to discover knowledge in a collection of web content without manually reading every page returned by a crawler. The GARW algorithm is an extension of the Apriori algorithm designed to produce relevant association rules. The knowledge discovered in this way helps internet users find information relevant to their search keywords without having to review, one by one, the web content returned by a search engine.
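
As a rough illustration of the kind of rule the abstract above refers to, the following minimal sketch mines keyword association rules with plain support and confidence thresholds in the Apriori style. It does not reproduce the GARW weighting scheme, which the abstract does not detail; the function name `mine_rules`, the thresholds, and the sample pages are all assumptions made for illustration.

```python
from itertools import combinations


def mine_rules(documents, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for keyword pairs.

    Plain support/confidence mining; the GARW weighting step is NOT included.
    """
    n = len(documents)
    doc_sets = [set(d) for d in documents]

    def support(items):
        # Fraction of documents that contain every keyword in `items`.
        return sum(1 for d in doc_sets if items <= d) / n

    # Apriori pruning: a pair can only be frequent if both members are frequent.
    keywords = sorted({k for d in doc_sets for k in d})
    frequent = [k for k in keywords if support({k}) >= min_support]

    rules = []
    for a, b in combinations(frequent, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical keyword sets extracted from four crawled pages.
pages = [
    ["crawler", "index", "search"],
    ["crawler", "search"],
    ["index", "search"],
    ["crawler", "search", "ranking"],
]
rules = mine_rules(pages)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs} (support={sup:.2f}, confidence={conf:.2f})")
```

A weighted variant would presumably filter or score keywords before this step, so that only highly weighted terms enter rule generation; that part is left out here because its details are not given in the abstract.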

Biclustering and coclustering: concepts, algorithms and viability for text mining

Revista de Informática Teórica e Aplicada ◽

10.22456/2175-2745.89063 ◽

2019 ◽

Vol 26 (2) ◽

pp. 81-117

Author(s):

Alexandra Katiuska Ramos Diaz ◽

Sarajane Marques Peres

Keyword(s):

Text Mining ◽

Real World ◽

Vector Space Model ◽

Relevant Information ◽

Similarity Criteria ◽

Vector Spaces ◽

Real World Data ◽

Space Model ◽

Textual Data ◽

Real World Problems

Biclustering and coclustering are data mining tasks capable of extracting relevant information from data by applying similarity criteria simultaneously to rows and columns of data matrices. Algorithms used to accomplish these tasks simultaneously cluster objects and attributes, enabling the discovery of biclusters or coclusters. Although similar, the natures and aims of these tasks are different, and coclustering can be seen as a generalization of biclustering. An accurate study on algorithms related to biclustering and coclustering is essential to achieve effectiveness when solving real-world problems. Determining the values appropriate for the parameters of these algorithms is even more difficult when complex real-world data are analyzed. For example, when biclustering or coclustering is applied to textual data (i.e., in text mining), a representation through a vector space model is required. Such representation usually generates vector spaces with a high number of dimensions and high sparsity, which influences the performance of many algorithms. This tutorial aims to didactically present concepts related to the biclustering and coclustering tasks and how two basic algorithms address these concepts. In addition, experiments are presented in data contexts with a high number of dimensions and high sparsity, represented by both a synthetic dataset and a corpus of real-world news. In general and comparative terms, the results obtained show the algorithm used for coclustering (i.e., NBVD) as the most appropriate for the experiments’ context. Although the biclustering algorithm (i.e., Cheng and Church) was responsible for producing less relevant results in textual data than NBVD, its application in data with a high number of dimensions and high sparsity provided a suitable study environment to understand its operation.
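
To make the remark about dimensionality and sparsity concrete, here is a small sketch that builds a bag-of-words term-document matrix and measures the share of zero cells. It is only an illustration of a vector space representation, not the tutorial's own pipeline; the helper names and the three toy documents are assumptions.

```python
def term_document_matrix(docs):
    """Build a document-by-term count matrix from whitespace-tokenized text."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in d.lower().split():
            matrix[i][index[w]] += 1
    return vocab, matrix


def sparsity(matrix):
    """Fraction of cells in the matrix that are zero."""
    cells = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for v in row if v == 0)
    return zeros / cells


# Hypothetical toy corpus: each new document adds mostly new columns.
docs = ["robots map rooms", "banks set rates", "robots plan paths"]
vocab, matrix = term_document_matrix(docs)
print(len(vocab), "dimensions, sparsity =", sparsity(matrix))
```

Even with three short documents most cells are zero, because each document uses only a small part of the shared vocabulary; with a real news corpus the vocabulary grows much faster than the number of terms per document.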

The Role of Global Appearance of Omnidirectional Images in Relative Distance and Orientation Retrieval

Sensors ◽

10.3390/s21103327 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3327

Author(s):

Vicente Román ◽

Luis Payá ◽

Adrián Peidró ◽

Mónica Ballesta ◽

Oscar Reinoso

Keyword(s):

Mobile Robotics ◽

A Priori ◽

Computational Cost ◽

Relevant Information ◽

Local Features ◽

New Family ◽

Lighting Conditions ◽

Omnidirectional Images ◽

Great Development

Over the last few years, mobile robotics has developed rapidly thanks to the wide variety of problems that can be solved with this technology. An autonomous mobile robot must be able to operate in a priori unknown environments, planning its trajectory and navigating to the required target points. To this end, it is crucial to solve the mapping and localization problems accurately and at an acceptable computational cost. Omnidirectional vision systems have emerged as a robust choice thanks to the large amount of information they can extract from the environment. The images must be processed to obtain relevant information that allows the mapping and localization problems to be solved robustly. The classical frameworks for this problem are based on the extraction, description and tracking of local features or landmarks. More recently, however, a new family of methods has emerged as a robust alternative in mobile robotics. It consists of describing each image as a whole, which leads to conceptually simpler algorithms. While methods based on local features have been extensively studied and compared in the literature, those based on global appearance still merit in-depth study to characterize their performance. In this work, a comparative evaluation of six global-appearance description techniques in localization tasks is carried out, in terms of both accuracy and computational cost. Several sets of images captured in a real environment are used for this purpose, including typical phenomena such as changes in lighting conditions, visual aliasing, partial occlusions and noise.

Enriching Textual Search Results at Query Time Using Entity Mining, Linked Data and Link Analysis

International Journal of Semantic Computing ◽

10.1142/s1793351x14400170 ◽

2014 ◽

Vol 08 (04) ◽

pp. 515-544 ◽

Cited By ~ 3

Author(s):

Pavlos Fafalios ◽

Panagiotis Papadakos ◽

Yannis Tzitzikas

Keyword(s):

Semantic Information ◽

A Priori ◽

Open Data ◽

Search Space ◽

Link Analysis ◽

Query Time ◽

Search Results ◽

Ranking Scheme ◽

Web Of Data ◽

Comparative Results

The integration of the classical Web (of documents) with the emerging Web of Data is a challenging vision. In this paper we focus on an integration approach during searching which aims at enriching the responses of non-semantic search systems with semantic information, i.e. Linked Open Data (LOD), and exploiting the outcome for offering advanced exploratory search services which provide an overview of the search space and allow the users to explore the related LOD. We use named entities identified in the search results for automatically connecting search hits with LOD and we consider a scenario where this entity-based integration is performed at query time with no human effort and no a-priori indexing which is beneficial in terms of configurability and freshness. However, the number of identified entities can be high and the same is true for the semantic information about these entities that can be fetched from the available LOD. To this end, in this paper we propose a Link Analysis-based method which is used for ranking (and thus selecting to show) the more important semantic information related to the search results. We report the results of a survey regarding the marine domain with promising results, and comparative results that illustrate the effectiveness of the proposed (PageRank-based) ranking scheme. Finally, we report experimental results regarding efficiency showing that the proposed functionality can be offered even at query time.
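
The abstract above names a PageRank-based ranking scheme without describing how its graph is built. The sketch below shows only generic PageRank by power iteration on a small directed graph, as background; the graph of search hits and entities, the node names, and the damping value are assumptions and are not taken from the paper.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Generic PageRank by power iteration.

    `graph` maps each node to the list of nodes it links to; every target
    must also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the same teleportation share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical graph: two search hits mention entities, which link to a broader entity.
graph = {
    "hit1": ["Tuna", "Salmon"],
    "hit2": ["Tuna"],
    "Tuna": ["Fish"],
    "Salmon": ["Fish"],
    "Fish": [],
}
ranks = pagerank(graph)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

Sorting entities by such a score is one way to pick the few pieces of semantic information worth showing when the number of identified entities is high.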

Semantic Web mining for Content-Based Online Shopping Recommender Systems

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2019100103 ◽

2019 ◽

Vol 15 (4) ◽

pp. 41-56 ◽

Cited By ~ 1

Author(s):

Ibukun Tolulope Afolabi ◽

Opeyemi Samuel Makinde ◽

Olufunke Oyejoke Oladipupo

Keyword(s):

Semantic Web ◽

Recommender Systems ◽

Web Mining ◽

Online Shopping ◽

Semantic Analysis ◽

Second Phase ◽

Web Content ◽

Textual Data ◽

Content Mining ◽

Bayes Algorithm

Currently, semantic analysis of webpage text appears to be a major problem for content-based recommendation. In this research, we present a semantic web content mining approach for recommender systems in online shopping. The methodology is based on two major phases. The first phase is the semantic preprocessing of textual data using a combination of a newly developed ontology and an existing ontology. The second phase uses the Naïve Bayes algorithm to make the recommendations. The output of the system is evaluated using precision, recall and F-measure. The results showed that the semantic preprocessing improved the recommendation accuracy of the recommender system by 5.2% over the existing approach. In addition, the developed system provides a platform for content-based recommendation in online shopping. The system has an edge over existing recommender approaches because it can analyze the textual content of users' feedback on a product in order to provide the appropriate product recommendation.

Combining Text Mining and Data Visualization Techniques to Understand Consumer Experiences of Electronic Cigarettes and Hookah in Online Forums

Online Journal of Public Health Informatics ◽

10.5210/ojphi.v7i1.5783 ◽

2015 ◽

Vol 7 (1) ◽

Cited By ~ 1

Author(s):

Annie T. Chen ◽

Shu-Hong Zhu ◽

Mike Conway

Keyword(s):

Text Mining ◽

Data Visualization ◽

Electronic Cigarettes ◽

Discussion Forums ◽

Online Forums ◽

Textual Data ◽

Consumer Experiences ◽

Visualization Techniques

Our aim in this work is to apply text mining and novel visualization techniques to textual data derived from online health discussion forums in order to better understand consumers' experiences and perceptions of electronic cigarettes and hookah.

Exploring Automated Text Classification to Improve Keyword Corpus Search Results for Bioinspired Design

Journal of Mechanical Design ◽

10.1115/1.4028167 ◽

2014 ◽

Vol 136 (11) ◽

Cited By ~ 8

Author(s):

Michael W. Glier ◽

Daniel A. McAdams ◽

Julie S. Linsey

Keyword(s):

Text Mining ◽

Text Classification ◽

Keyword Search ◽

Idea Generation ◽

Support Vector ◽

Biological Knowledge ◽

Svm Classifier ◽

Search Results ◽

Bioinspired Design ◽

Mining Algorithms

Bioinspired design is the adaptation of methods, strategies, or principles found in nature to solve engineering problems. One formalized approach to bioinspired solution seeking is to abstract the engineering problem into a functional need and then seek solutions to this function using a keyword-type search method on text-based biological knowledge. These function keyword search approaches have shown potential for success, but as with many text-based search methods, they produce a large number of results, many of little relevance to the problem in question. In this paper, we develop a method to train a computer to identify text passages more likely to suggest a solution to a human designer. The work presented examines the possibility of filtering biological keyword search results by using text mining algorithms to automatically identify which results are likely to be useful to a designer. The text mining algorithms are trained on a pair of surveys administered to human subjects to empirically identify a large number of sentences that are, or are not, helpful for idea generation. We develop and evaluate three text classification algorithms, namely, a Naïve Bayes (NB) classifier, a k nearest neighbors (kNN) classifier, and a support vector machine (SVM) classifier. Of these methods, the NB classifier generally had the best performance. Based on the analysis of 60 word stems, an NB classifier's precision is 0.87, recall is 0.52, and F score is 0.65. We find that word stem features that describe a physical action or process are correlated with helpful sentences. Similarly, we find that biological jargon feature words are correlated with unhelpful sentences.
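
The three figures reported above can be cross-checked with a short calculation. The sketch assumes that the reported F score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is an illustrative choice.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract for the NB classifier.
f1 = f_score(0.87, 0.52)
print(round(f1, 2))
```

Under that assumption the result rounds to the reported 0.65, so the three values are mutually consistent. The gap between precision and recall suggests the classifier is conservative: most of the sentences it flags are helpful, but it misses roughly half of the helpful ones.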

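IMPLEMENTATION OF THE GOOGLE LATENT SEMANTIC DISTANCE ALGORITHM FOR EXTRACTING KEYWORD SEQUENCES FROM SCIENTIFIC JOURNAL ARTICLES [original title: IMPLEMENTASI ALGORITMA GOOGLE LATENT SEMANTIC DISTANCE UNTUK EKSTRAKSI RANGKAIAN KATA KUNCI ARTIKEL JURNAL ILMIAH]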

Computatio : Journal of Computer Science and Information Systems ◽

10.24912/computatio.v2i2.2569 ◽

2018 ◽

Vol 2 (2) ◽

pp. 186

Author(s):

Novario Jaya Perdana

Keyword(s):

Search Engine ◽

Search Engines ◽

Semantic Distance ◽

Relevant Information ◽

High Accuracy ◽

Hard Work ◽

The Internet ◽

Search Results ◽

Search Result

The accuracy of search engine results depends on the keywords used. Keywords that carry too little information reduce the accuracy of the search results, which makes finding information on the internet hard work. In this research, software was built to create keyword sequences for documents. The software uses Google Latent Semantic Distance, which can extract relevant information from a document. This information is expressed as specific word sequences that can be used as keyword recommendations in search engines. The results show that the method for creating document keyword recommendations achieved high accuracy and could find the most relevant information in the top search results.

Novel IT Technologies on the Digital Battlefield: The Application of Big Data and Data Mining Technologies

Hadmérnök ◽

10.32567/hm.2020.4.10 ◽

2020 ◽

Vol 15 (4) ◽

pp. 141-158

Author(s):

Eszter Katalin Bognár

Keyword(s):

Data Mining ◽

Big Data ◽

Data Processing ◽

Relevant Information ◽

Military Operations ◽

Textual Data ◽

Modern Warfare ◽

Tools And Techniques

In modern warfare, the most important innovation to date has been the utilisation of information as a weapon. The basis of successful military operations is the ability to correctly assess a situation based on credible collected information. In today’s military, the primary challenge is not the actual collection of data. It has become more important to extract relevant information from that data. This requirement cannot be successfully completed without necessary improvements in tools and techniques to support the acquisition and analysis of data. This study defines Big Data and its concept as applied to military reconnaissance, focusing on the processing of imagery and textual data, bringing to light modern data processing and analytics methods that enable effective processing.

A Text Mining Analysis of Central Bank Monetary Policy Communication in Nigeria

Central Bank of Nigeria Journal of Applied Statistics ◽

10.33429/cjas.10219.3/6 ◽

2020 ◽

pp. 73-107

Author(s):

Mohammed M. Tumala ◽

Babatunde S. Omotosho

Keyword(s):

Monetary Policy ◽

Text Mining ◽

Central Bank ◽

Communication Strategy ◽

Central Bank Communication ◽

Monetary Policy Committee ◽

Textual Data ◽

Policy Communication ◽

Policy Objectives ◽

Monetary Policy Transparency

This paper employs text-mining techniques to analyse the communication strategy of the Central Bank of Nigeria (CBN) during the period 2004-2019. Since the policy communique released after each meeting of the CBN’s monetary policy committee (MPC) represents an important tool of central bank communication, we construct a corpus based on 87 policy communiques with a total of 123,353 words. Having processed the textual data into a form suitable for analysis, we examine the readability, sentiments, and topics of the policy documents. While the CBN’s communication has increased substantially over the years, implying increased monetary policy transparency, the computed Coleman and Liau readability index shows that the word and sentence structures of the policy communiques have become more complex, thus reducing their readability. In terms of monetary policy sentiments, we find an average net score of -10.5 per cent, reflecting the level of policy uncertainties faced by the MPC over the sample period. In addition, our results indicate that the topics driving the linguistic contents of the communiques were influenced by the Bank’s policy objectives as well as the nature of shocks hitting the economy in each period.
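
For readers unfamiliar with the readability measure mentioned above, the standard Coleman-Liau index is 0.0588 L - 0.296 S - 15.8, where L is the average number of letters per 100 words and S the average number of sentences per 100 words. The sketch below applies that formula with a deliberately simple tokenizer; the regular expressions and the two sample sentences are assumptions, and the paper's own preprocessing may differ.

```python
import re


def coleman_liau_index(text):
    """Coleman-Liau index: 0.0588 * L - 0.296 * S - 15.8.

    L = letters per 100 words, S = sentences per 100 words.
    Tokenization here is a simplification (ASCII letters, ., !, ? as sentence ends).
    """
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(len(w) for w in words)
    sentences = len(re.findall(r"[.!?]+", text)) or 1
    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = sentences / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Hypothetical sample sentences, not taken from any communique.
plain = "The bank cut the rate. It did so to help."
dense = ("Notwithstanding considerable macroeconomic uncertainties, the committee "
         "maintained its contractionary monetary policy stance.")
print(coleman_liau_index(plain) < coleman_liau_index(dense))
```

Longer words and longer sentences both push the index up, which matches the abstract's reading that more complex word and sentence structures mean lower readability.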

WEB CRAWLER APPLICATION FOR WEB CONTENT ON MOBILE PHONES [original title: APLIKASI WEB CRAWLER UNTUK WEB CONTENT PADA MOBILE PHONE]

JUTI Jurnal Ilmiah Teknologi Informasi ◽

10.12962/j24068535.v7i3.a79 ◽

2009 ◽

Vol 7 (3) ◽

pp. 127

Author(s):

Sarwosri Sarwosri ◽

Ahmad Hoirul Basori ◽

Wahyu Budi Surastyo

Keyword(s):

Mobile Phone ◽

Web Content ◽

Web Crawler
