Text data mining: a case study

Author(s):  
C.W. Ford ◽  
Chia-Chu Chiang ◽  
Hao Wu ◽  
R.R. Chilka ◽  
J.R. Talburt
2017 ◽  
Vol 2017 (66) ◽  
pp. 106-151
Author(s):  
Carlos M. Parra ◽  
Monica Chiarini Tremblay ◽  
Karen Paul ◽  
Arturo Castellanos

2002 ◽  
Author(s):  
Sai-Ming Li ◽  
Sanjeev Seereeram ◽  
Raman K. Mehra ◽  
Chris Miles

2005 ◽  
Vol 277-279 ◽  
pp. 259-265
Author(s):  
Jin Ah Park ◽  
Chang Su Lee ◽  
Jong C. Park

An abundant amount of information is produced in the digital domain, and an effective information extraction (IE) system is required to navigate this sea of information. In this paper, we show that an interactive visualization system works effectively to complement an IE system. In particular, three-dimensional (3D) visualization can turn a data-centric system into a user-centric one by enlisting the human visual system, a powerful pattern recognizer, as part of the IE cycle. Because information as data is multidimensional in nature, 2D visualization has been the preferred mode. However, we argue that the extra dimension available in a 3D mode provides valuable space in which to pack an orthogonal aspect of the available information. As candidates for this orthogonal information, we have considered the following two aspects: 1) abstraction of the unstructured source data, and 2) the history line of the discovery process. We have applied our proposal to text data mining in bioinformatics. Through case studies of data mining for molecular interaction in the yeast and mitogen-activated protein kinase pathways, we demonstrate the possibility of interpreting the extracted results with a 3D visualization system.


2019 ◽  
Vol 2019 (1) ◽  
pp. 10848
Author(s):  
Andres Fortino ◽  
Roy Lowrance ◽  
Qitong Zhong ◽  
WeiChieh Huang

Author(s):  
Yasufumi Takama ◽  
Takuma Tonegawa

This paper proposes an interactive document clustering system designed around the concept of coordinated multiple views (CMV). Interactive document clustering lets a user obtain a set of document groups from a document collection in an interactive manner. It is expected to be useful for various tasks such as text mining and document retrieval. Because the result of document clustering consists of multiple kinds of objects, such as clusters (document groups), documents, and words, each should be presented to users in a different way. Based on this consideration, the proposed system employs multiple views, each designed for a specific kind of object, such as documents or keywords. A prototype system is implemented on TETDM (Total Environment for Text Data Mining), one of the environments for developing text data mining tools; we chose it because it provides a mechanism for coordination between modules. The proposed system classifies the information to be presented into 4 levels: cluster, document, bag of words, and word, each of which is displayed in a different view. Experimental results with test participants show the effectiveness of the proposed system.


2003 ◽  
Vol 19 (Suppl 1) ◽  
pp. i331-i339
Author(s):  
A. S. Yeh ◽  
L. Hirschman ◽  
A. A. Morgan

2018 ◽  
Vol 13 (1) ◽  
pp. 183-194
Author(s):  
Megan Senseney ◽  
Eleanor Dickson ◽  
Beth Namachchivaya ◽  
Bertram Ludäscher

Text data mining and analysis has emerged as a viable research method for scholars, following the growth of mass digitization, digital publishing, and scholarly interest in data re-use. Yet the texts that comprise datasets for analysis are frequently protected by copyright or other intellectual property rights that limit their access and use. This article discusses the role of libraries at the intersection of data mining and intellectual property, asserting that academic libraries are vital partners in enabling scholars to effectively incorporate text data mining into their research. We report on activities leading up to an IMLS-funded National Forum of stakeholders and discuss preliminary findings from a systematic literature review, as well as initial results of interviews with forum stakeholders. Emerging themes suggest the need for a multi-pronged distributed approach that includes a public campaign for building awareness and advocacy, development of best practice guides for library support services and training, and international efforts toward data standardization and copyright harmonization.

