Text Mining Using the Generate Association Rule with Weight (GARW) Algorithm for Web Crawler Text Analysis

2020
Vol 2 (2)
pp. 153-171
Author(s):  
Zulkifli Arsyad

Text mining is widely used to find hidden patterns and information in large collections of semi-structured and unstructured text. It extracts interesting patterns in order to discover knowledge from textual data sources. Association rule extraction with GARW (Generating Association Rule using Weighting Scheme) can be used to discover knowledge from a collection of web content without manually reading every page among the many results returned by a crawler. The GARW algorithm is a development of the Apriori algorithm that produces relevant association rules. The results of this knowledge discovery can help internet users find information relevant to their search keywords without having to review, one by one, the web content returned by a search engine.
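As a rough illustration of the association-rule idea behind this abstract, the sketch below computes support and confidence for keyword rules over a toy set of documents. It is plain Apriori-style counting restricted to keyword pairs, not the GARW weighting scheme itself; the documents, the thresholds, and the function names `support` and `pair_rules` are assumptions made for the example.

```python
from itertools import combinations

def support(itemset, docs):
    """Fraction of documents that contain every keyword in itemset."""
    itemset = set(itemset)
    return sum(1 for d in docs if itemset <= d) / len(docs)

def pair_rules(docs, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for keyword pairs."""
    vocab = sorted(set().union(*docs))
    rules = []
    for a, b in combinations(vocab, 2):
        s = support({a, b}, docs)
        if s < min_support:
            continue  # infrequent pair: no rule can be built from it
        for x, y in ((a, b), (b, a)):
            conf = s / support({x}, docs)  # estimate of P(y | x)
            if conf >= min_confidence:
                rules.append((x, y, s, conf))
    return rules

# Hypothetical keyword sets extracted from four crawled pages.
docs = [
    {"text", "mining", "web"},
    {"text", "mining"},
    {"web", "crawler"},
    {"text", "mining", "crawler"},
]

for x, y, s, c in pair_rules(docs):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

A weighting scheme such as GARW would additionally score keywords before rule generation; that step is deliberately left out here because the abstract does not describe it.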

2019
Vol 26 (2)
pp. 81-117
Author(s):
Alexandra Katiuska Ramos Diaz
Sarajane Marques Peres

Biclustering and coclustering are data mining tasks capable of extracting relevant information from data by applying similarity criteria simultaneously to rows and columns of data matrices. Algorithms used to accomplish these tasks simultaneously cluster objects and attributes, enabling the discovery of biclusters or coclusters. Although similar, the natures and aims of these tasks are different, and coclustering can be seen as a generalization of biclustering. An accurate study on algorithms related to biclustering and coclustering is essential to achieve effectiveness when solving real-world problems. Determining the values appropriate for the parameters of these algorithms is even more difficult when complex real-world data are analyzed. For example, when biclustering or coclustering is applied to textual data (i.e., in text mining), a representation through a vector space model is required. Such representation usually generates vector spaces with a high number of dimensions and high sparsity, which influences the performance of many algorithms. This tutorial aims to didactically present concepts related to the biclustering and coclustering tasks and how two basic algorithms address these concepts. In addition, experiments are presented in data contexts with a high number of dimensions and high sparsity, represented by both a synthetic dataset and a corpus of real-world news. In general and comparative terms, the results obtained show the algorithm used for coclustering (i.e., NBVD) as the most appropriate for the experiments’ context. Although the biclustering algorithm (i.e., Cheng and Church) was responsible for producing less relevant results in textual data than NBVD, its application in data with a high number of dimensions and high sparsity provided a suitable study environment to understand its operation.
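To make the biclustering criterion more concrete, the sketch below computes the mean squared residue, the coherence score that the Cheng and Church formulation seeks to keep low for a selected set of rows and columns. It shows only the score, not the full node-deletion search, and the toy matrix and the function name `mean_squared_residue` are assumptions for illustration.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue H(I, J) of the submatrix selected by rows and cols."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: deviation of a cell from the additive row/column model.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)

# Toy matrix: the first three columns follow an additive pattern,
# the fourth column does not.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 7.0],
]

coherent = mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 1, 2, 3])
print(coherent, whole)
```

The coherent submatrix scores (numerically) zero, while including the fourth column raises the score, which is the signal a biclustering search would use to drop that column.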


Sensors
2021
Vol 21 (10)
pp. 3327
Author(s):
Vicente Román
Luis Payá
Adrián Peidró
Mónica Ballesta
Oscar Reinoso

Over the last few years, mobile robotics has developed considerably thanks to the wide variety of problems that this technology can solve. An autonomous mobile robot must be able to operate in a priori unknown environments, planning its trajectory and navigating to the required target points. To this end, it is crucial to solve the mapping and localization problems accurately and at an acceptable computational cost. Omnidirectional vision systems have emerged as a robust choice thanks to the large amount of information they can extract from the environment. The images must be processed to obtain relevant information that permits the mapping and localization problems to be solved robustly. The classical frameworks for addressing this problem are based on the extraction, description and tracking of local features or landmarks. More recently, however, a new family of methods has emerged as a robust alternative in mobile robotics. It consists of describing each image as a whole, which leads to conceptually simpler algorithms. While methods based on local features have been extensively studied and compared in the literature, those based on global appearance still merit in-depth study to establish their performance. In this work, a comparative evaluation of six global-appearance description techniques in localization tasks is carried out, in terms of both accuracy and computational cost. Several sets of images captured in a real environment are used for this purpose, including typical phenomena such as changes in lighting conditions, visual aliasing, partial occlusions and noise.


2014
Vol 08 (04)
pp. 515-544
Author(s):
Pavlos Fafalios
Panagiotis Papadakos
Yannis Tzitzikas

The integration of the classical Web (of documents) with the emerging Web of Data is a challenging vision. In this paper we focus on an integration approach during searching which aims at enriching the responses of non-semantic search systems with semantic information, i.e. Linked Open Data (LOD), and exploiting the outcome for offering advanced exploratory search services which provide an overview of the search space and allow the users to explore the related LOD. We use named entities identified in the search results for automatically connecting search hits with LOD and we consider a scenario where this entity-based integration is performed at query time with no human effort and no a-priori indexing which is beneficial in terms of configurability and freshness. However, the number of identified entities can be high and the same is true for the semantic information about these entities that can be fetched from the available LOD. To this end, in this paper we propose a Link Analysis-based method which is used for ranking (and thus selecting to show) the more important semantic information related to the search results. We report the results of a survey regarding the marine domain with promising results, and comparative results that illustrate the effectiveness of the proposed (PageRank-based) ranking scheme. Finally, we report experimental results regarding efficiency showing that the proposed functionality can be offered even at query time.
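The PageRank-based ranking mentioned in this abstract can be illustrated with a generic power-iteration sketch. This is textbook PageRank over a small directed graph, not the authors' specific Link Analysis variant; the entity names, the damping factor of 0.85, and the iteration count are assumptions chosen for the example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Generic PageRank by power iteration.

    graph maps each node to the list of nodes it links to; every link
    target must also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical graph of entities found in search results and their links.
entity_graph = {
    "thunnus": ["tuna"],
    "tuna": ["fish"],
    "salmon": ["fish"],
    "fish": [],
}

ranks = pagerank(entity_graph)
for entity, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{entity}: {score:.3f}")
```

Sorting by score and keeping only the top entries mirrors the idea of selecting the more important semantic information to show alongside the search hits.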


2019
Vol 15 (4)
pp. 41-56
Author(s):
Ibukun Tolulope Afolabi
Opeyemi Samuel Makinde
Olufunke Oyejoke Oladipupo

Currently, for content-based recommendations, semantic analysis of text from webpages seems to be a major problem. In this research, we present a semantic web content mining approach for recommender systems in online shopping. The methodology is based on two major phases. The first phase is the semantic preprocessing of textual data using the combination of a developed ontology and an existing ontology. The second phase uses the Naïve Bayes algorithm to make the recommendations. The output of the system is evaluated using precision, recall and f-measure. The results from the system showed that the semantic preprocessing improved the recommendation accuracy of the recommender system by 5.2% over the existing approach. Also, the developed system is able to provide a platform for content-based recommendation in online shopping. This system has an edge over the existing recommender approaches because it is able to analyze the textual contents of users' feedback on a product in order to provide the necessary product recommendation.
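The second phase described above relies on Naïve Bayes. A minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing is given below. The ontology-based preprocessing is not modelled, and the feedback snippets, the labels `recommend` and `skip`, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (tokens, label). Returns the counts the model needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed word likelihood.
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical user feedback on products, already tokenized.
examples = [
    ("battery lasts long great".split(), "recommend"),
    ("great screen fast delivery".split(), "recommend"),
    ("battery died poor quality".split(), "skip"),
    ("poor screen slow delivery".split(), "skip"),
]
model = train(examples)
print(predict(model, "great battery".split()))
print(predict(model, "poor quality".split()))
```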


Author(s):  
Annie T. Chen ◽  
Shu-Hong Zhu ◽  
Mike Conway

Our aim in this work is to apply text mining and novel visualization techniques to textual data derived from online health discussion forums in order to better understand consumers' experiences and perceptions of electronic cigarettes and hookah.


2014 ◽  
Vol 136 (11) ◽  
Author(s):  
Michael W. Glier ◽  
Daniel A. McAdams ◽  
Julie S. Linsey

Bioinspired design is the adaptation of methods, strategies, or principles found in nature to solve engineering problems. One formalized approach to bioinspired solution seeking is the abstraction of the engineering problem into a functional need and then seeking solutions to this function using a keyword-type search method on text-based biological knowledge. These function keyword search approaches have shown potential for success, but as with many text-based search methods, they produce a large number of results, many of little relevance to the problem in question. In this paper, we develop a method to train a computer to identify text passages more likely to suggest a solution to a human designer. The work presented examines the possibility of filtering biological keyword search results by using text mining algorithms to automatically identify which results are likely to be useful to a designer. The text mining algorithms are trained on a pair of surveys administered to human subjects to empirically identify a large number of sentences that are, or are not, helpful for idea generation. We develop and evaluate three text classification algorithms, namely, a Naïve Bayes (NB) classifier, a k nearest neighbors (kNN) classifier, and a support vector machine (SVM) classifier. Of these methods, the NB classifier generally had the best performance. Based on the analysis of 60 word stems, an NB classifier's precision is 0.87, recall is 0.52, and F score is 0.65. We find that word stem features that describe a physical action or process are correlated with helpful sentences. Similarly, we find biological jargon feature words are correlated with unhelpful sentences.
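The reported F score can be checked against the reported precision and recall, assuming it is the balanced F1 measure (the harmonic mean of the two). The function names below are chosen for the example, and the confusion counts in the second part are hypothetical.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

# Values reported in the abstract for the NB classifier.
reported = f_score(0.87, 0.52)
print(round(reported, 2))  # 0.65

# Hypothetical counts, only to show how the inputs are obtained.
p, r = precision_recall(tp=8, fp=2, fn=8)
print(p, r, f_score(p, r))
```

With precision 0.87 and recall 0.52 the formula gives roughly 0.651, which agrees with the 0.65 stated in the abstract.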


Author(s):  
Novario Jaya Perdana

The accuracy of search engine results depends on the keywords used. Keywords that carry too little information can reduce the accuracy of the results, which makes searching for information on the internet hard work. In this research, software was built to create keyword sequences for documents. The software uses Google Latent Semantic Distance, which can extract relevant information from a document. The information is expressed as specific word sequences that can be used as keyword recommendations in search engines. The results show that the method for creating document keyword recommendations achieved high accuracy and could find the most relevant information in the top search results.


Hadmérnök
2020
Vol 15 (4)
pp. 141-158
Author(s):  
Eszter Katalin Bognár

In modern warfare, the most important innovation to date has been the utilisation of information as a weapon. The basis of successful military operations is the ability to correctly assess a situation based on credible collected information. In today’s military, the primary challenge is not the actual collection of data. It has become more important to extract relevant information from that data. This requirement cannot be successfully completed without necessary improvements in tools and techniques to support the acquisition and analysis of data. This study defines Big Data and its concept as applied to military reconnaissance, focusing on the processing of imagery and textual data, bringing to light modern data processing and analytics methods that enable effective processing.


Author(s):  
Mohammed M. Tumala ◽  
Babatunde S. Omotosho

This paper employs text-mining techniques to analyse the communication strategy of the Central Bank of Nigeria (CBN) during the period 2004-2019. Since the policy communique released after each meeting of the CBN’s monetary policy committee (MPC) represents an important tool of central bank communication, we construct a corpus based on 87 policy communiques with a total of 123,353 words. Having processed the textual data into a form suitable for analysis, we examined the readability, sentiments, and topics of the policy documents. While the CBN’s communication has increased substantially over the years, implying increased monetary policy transparency, the computed Coleman and Liau readability index shows that the word and sentence structures of the policy communiques have become more complex, thus reducing their readability. In terms of monetary policy sentiments, we find an average net score of -10.5 per cent, reflecting the level of policy uncertainties faced by the MPC over the sample period. In addition, our results indicate that the topics driving the linguistic contents of the communiques were influenced by the Bank’s policy objectives as well as the nature of shocks hitting the economy per period.
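The Coleman and Liau readability index used in this study is computed as 0.0588 L - 0.296 S - 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. The sketch below applies that formula with a deliberately simple tokenizer; the regular expressions and the sample sentences are assumptions, and the paper's own preprocessing may differ.

```python
import re

def coleman_liau(text):
    """Coleman-Liau index: 0.0588 * L - 0.296 * S - 15.8.

    L = letters per 100 words, S = sentences per 100 words.
    Tokenization here is a simplification: words are runs of ASCII letters
    and sentences are counted by terminal punctuation.
    """
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(len(w) for w in words)
    sentences = len(re.findall(r"[.!?]+", text)) or 1
    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = sentences / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

# Hypothetical sample sentences, not taken from the communiques.
simple = "The cat sat. The dog ran."
dense = "Monetary policy committee deliberations emphasised macroeconomic uncertainty."

print(coleman_liau(simple))
print(coleman_liau(dense))
```

Longer words and longer sentences both push the index up, which is the sense in which a rising index indicates reduced readability.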


2009
Vol 7 (3)
pp. 127
Author(s):
Sarwosri Sarwosri
Ahmad Hoirul Basori
Wahyu Budi Surastyo
