Enhancing Wikipedia search results using Text Mining

Author(s):  
K.D.C.G. Kapugama ◽  
S.A.S. Lorensuhewa ◽  
M.A.L. Kalyani
2014 ◽  
Vol 136 (11) ◽  
Author(s):  
Michael W. Glier ◽  
Daniel A. McAdams ◽  
Julie S. Linsey

Bioinspired design is the adaptation of methods, strategies, or principles found in nature to solve engineering problems. One formalized approach to bioinspired solution seeking is to abstract the engineering problem into a functional need and then seek solutions to this function using a keyword-type search method on text-based biological knowledge. These function keyword search approaches have shown potential for success, but as with many text-based search methods, they produce a large number of results, many of little relevance to the problem in question. In this paper, we develop a method to train a computer to identify text passages more likely to suggest a solution to a human designer. The work presented examines the possibility of filtering biological keyword search results by using text mining algorithms to automatically identify which results are likely to be useful to a designer. The text mining algorithms are trained on a pair of surveys administered to human subjects to empirically identify a large number of sentences that are, or are not, helpful for idea generation. We develop and evaluate three text classification algorithms, namely, a Naïve Bayes (NB) classifier, a k-nearest neighbors (kNN) classifier, and a support vector machine (SVM) classifier. Of these methods, the NB classifier generally had the best performance. Based on the analysis of 60 word stems, an NB classifier's precision is 0.87, recall is 0.52, and F score is 0.65. We find that word stem features that describe a physical action or process are correlated with helpful sentences. Similarly, we find that biological jargon feature words are correlated with unhelpful sentences.
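
The reported precision, recall, and F score can be checked against one another. The sketch below assumes the abstract's "F score" is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean (F1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values quoted in the abstract for the NB classifier on 60 word stems.
reported_precision = 0.87
reported_recall = 0.52

# Assumption: the reported F score is F1 (beta = 1).
print(round(f_score(reported_precision, reported_recall), 2))  # 0.65
```

Under that assumption the three reported figures are mutually consistent: 2 × 0.87 × 0.52 / (0.87 + 0.52) ≈ 0.651.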


2020 ◽  
Vol 2 (2) ◽  
pp. 153-171
Author(s):  
Zulkifli Arsyad

Text mining is widely used to find hidden patterns and information in large collections of semi-structured and unstructured text. Text mining extracts interesting patterns to explore knowledge from textual data sources. The association rule extraction method GARW (Generating Association Rule using Weighting Scheme) can be used to discover knowledge from a collection of web content without manually reading every page returned by a crawler. The GARW algorithm is a development of the Apriori algorithm that produces relevant association rules. The results of this knowledge discovery can help internet users find relevant information from search keywords without having to review, one by one, the web content returned by a search engine.
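
To make the idea of association rules over web content concrete, here is a minimal sketch of Apriori-style rule mining on keyword sets. It is not the GARW algorithm: it omits the weighting scheme entirely and only considers rules between pairs of keywords. The function name `mine_pair_rules`, the thresholds, and the sample documents are all hypothetical.

```python
from itertools import combinations


def mine_pair_rules(documents, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for keyword pairs."""
    n = len(documents)
    doc_sets = [set(d) for d in documents]

    def support(items):
        # Fraction of documents that contain every keyword in `items`.
        return sum(1 for d in doc_sets if items <= d) / n

    # Apriori step 1: keep single keywords that meet the support threshold.
    vocabulary = {k for d in doc_sets for k in d}
    frequent = sorted(k for k in vocabulary if support({k}) >= min_support)

    rules = []
    # Apriori step 2: a pair can only be frequent if both members are frequent.
    for a, b in combinations(frequent, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support({x})
            if confidence >= min_confidence:
                rules.append((x, y, pair_support, confidence))
    return rules


# Hypothetical keyword sets extracted from four crawled pages.
docs = [
    ["text", "mining", "web"],
    ["text", "mining"],
    ["text", "web"],
    ["mining", "text", "rule"],
]
for rule in mine_pair_rules(docs):
    print(rule)
```

A rule such as `("web", "text", 0.5, 1.0)` reads: half of the pages contain both keywords, and every page containing "web" also contains "text".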


The world faced a severe pandemic in 2020 due to the outbreak of Covid-19, the disease caused by Severe Acute Respiratory Syndrome Coronavirus-2. Questions about the outbreak piled up and research grew rapidly [1]. One study shows that in just six months, major databases were swamped with research articles, news, notes, and editorials related to the coronavirus. It estimates that 23,634 distinct published articles were indexed on Web of Science and Scopus between 1 January and 30 June 2020. Approximately 200,000 scholarly articles related to Covid-19 have now been published. This points to a need to simplify search results so that users, specifically scientists, can get answers to high-priority questions. Document clustering tools are currently used in many areas. A similar clustering tool built specifically for Covid-19 would help scientists and researchers get answers to high-priority questions about this pandemic. In this paper, we discuss the process of text mining, text categorization, and text clustering, and compare the algorithms used for clustering text data.


Author(s):  
Rengga Asmara ◽  
Nur Rasyid Mubtadai ◽  
Varidh Bimantara

Fiction books are among the most popular types of books in Indonesia. The five most popular fiction genres are fantasy, mystery, romance, sci-fi, and thriller. Each genre leaves a different impression on readers and appeals to them differently. People commonly choose a fiction book based on its title, author, or publisher. However, such searches do not give precise results. In this final project, an application was developed to find fiction books based on the semantic impression conveyed by the book cover. The impression of each book cover was obtained through a survey of fiction book lovers in Indonesia. Text mining is used to measure the closeness between the user's search and the impression survey data, and the cosine similarity algorithm is used to calculate how close each book is to the impression the user expects. The system displays the fiction book whose closeness value best matches the impression expected by the user, with an error rate of 3.93%.
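
As an illustration of cosine similarity between a search query and impression text, the following sketch uses plain word counts as vectors. The abstract does not describe the system's actual preprocessing or term weighting, so the whitespace tokenization, the book titles, and the impression words below are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between two bag-of-words count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Hypothetical survey impressions for two book covers.
impressions = {
    "Book A": "dark mysterious tense",
    "Book B": "bright romantic warm",
}
query = "dark tense"

# Rank books by how closely their impression matches the query.
ranked = sorted(
    impressions,
    key=lambda title: cosine_similarity(query, impressions[title]),
    reverse=True,
)
print(ranked)
```

A score of 1 means the two texts use the same words in the same proportions, and 0 means they share no words at all.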


2013 ◽  
Author(s):  
Ronald N. Kostoff ◽  
Henry A. Buchtel ◽  
John Andrews ◽  
Kirstin M. Pfiel

2020 ◽  
Vol 42 (5) ◽  
pp. 279-307
Author(s):  
Yonglim Joe

2019 ◽  
Vol 19 (2) ◽  
pp. 29-38
Author(s):  
Young-Hee Kim ◽  
Taek-Hyun Lee ◽  
Jong-Myoung Kim ◽  
Won-Hyung Park ◽  
...  
