Interactive Text Mining with Pipeline Pilot: A Bibliographic Web-Based Tool for PubMed

2009 ◽  
Vol 9 (3) ◽  
pp. 366-374 ◽  
Author(s):  
S. Vellay ◽  
N. Miller Latimer ◽  
G. Paillard
2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Chih-Hsuan Wei ◽  
Hung-Yu Kao ◽  
Zhiyong Lu

The automatic recognition of gene names and their associated database identifiers from biomedical text has been widely studied in recent years, as these tasks play an important role in many downstream text-mining applications. Despite significant previous research, only a small number of tools are publicly available and these tools are typically restricted to detecting only mention level gene names or only document level gene identifiers. In this work, we report GNormPlus: an end-to-end and open source system that handles both gene mention and identifier detection. We created a new corpus of 694 PubMed articles to support our development of GNormPlus, containing manual annotations for not only gene names and their identifiers, but also closely related concepts useful for gene name disambiguation, such as gene families and protein domains. GNormPlus integrates several advanced text-mining techniques, including SimConcept for resolving composite gene names. As a result, GNormPlus compares favorably to other state-of-the-art methods when evaluated on two widely used public benchmarking datasets, achieving 86.7% F1-score on the BioCreative II Gene Normalization task dataset and 50.1% F1-score on the BioCreative III Gene Normalization task dataset. The GNormPlus source code and its annotated corpus are freely available, and the results of applying GNormPlus to the entire PubMed are freely accessible through our web-based tool PubTator.


2018 ◽  
Vol 22 (7) ◽  
pp. 1471-1488 ◽  
Author(s):  
Antonio Usai ◽  
Marco Pironti ◽  
Monika Mital ◽  
Chiraz Aouina Mejri

Purpose: The aim of this work is to increase awareness of the potential of text mining to discover knowledge and to further promote research collaboration between the knowledge management and information technology communities. Since its emergence, text mining has involved multidisciplinary studies, focused primarily on database technology, Web-based collaborative writing, text analysis, machine learning and knowledge discovery. However, owing to the large amount of research in this field, it is becoming increasingly difficult to identify existing studies and therefore to suggest new topics.
Design/methodology/approach: This article offers a systematic review of 85 academic outputs (articles and books) focused on knowledge discovery derived from the text mining technique. The systematic review is conducted by applying “text mining at the term level, in which knowledge discovery takes place on a more focused collection of words and phrases that are extracted from and label each document” (Feldman et al., 1998, p. 1).
Findings: The results revealed that the keywords associated with the main labels, i.e., knowledge discovery and text mining, can be categorized into two periods. From 1998 to 2009, the terms knowledge and text were always used. From 2010 to 2017, in addition to these terms, sentiment analysis, review manipulation, microblogging data and knowledgeable users were frequently used. Moreover, the terms present in the first period are technical and engineering in nature, whereas a diverse range of fields such as business, marketing and finance emerged from 2010 to 2017 owing to a greater interest in the online environment.
Originality/value: This is a first comprehensive systematic review on knowledge discovery and text mining through the use of a text mining technique at the term level, which helps to reduce redundant research and to avoid the possibility of missing relevant publications.


2019 ◽  
Vol 28 (01) ◽  
pp. 179-180

Abdellaoui R, Foulquié P, Texier N, Faviez C, Burgun A, Schück S. Detection of Cases of Noncompliance to Drug Treatment in Patient Forum Posts: Topic Model Approach. J Med Internet Res 2018;20(3):e85 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5874436/
Jones J, Pradhan M, Hosseini M, Kulanthaivel A, Hosseini M. Novel Approach to Cluster Patient-Generated Data Into Actionable Topics: Case Study of a Web-Based Breast Cancer. JMIR Med Inform 2018;6(4):e45 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6293240/
Park A, Conway M, Chen AT. Examining Thematic Similarity, Difference, and Membership in Three Online Mental Health Communities from Reddit: A Text Mining and Visualization Approach. Comput Human Behav 2018 Jan;78:98-112 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5810583/


2020 ◽  
Vol 17 (8) ◽  
pp. 3577-3580
Author(s):  
M. S. Roobini ◽  
B. Nikhil Chowdary ◽  
J. Madhav Chowdary ◽  
J. Aruna ◽  
Anitha Ponraj

Online reviews have a considerable effect on present-day business and commerce. The growth of e-commerce businesses has attracted many buyers because they offer a range of products at competitive prices. The main thing most buyers rely on when shopping online is product reviews, which inform their final choice. Purchase decisions for online products therefore depend largely on reviews given by customers. Hence, opportunistic individuals or groups try to manipulate product reviews for their own benefit. In view of the impact of these fake reviews, various methods to detect them have been proposed in the literature. Because of the nature of reviews, it is hard to classify them using only one classifier. Hence, the present research discusses a classifier for identifying such fake reviews. The study also presents supervised and semi-supervised text mining techniques for identifying fake online reviews and compares the effectiveness of the two approaches on datasets of hotel reviews.


In this era of competition, there is a culture of online reviews and feedback. This feedback may concern any product or service. However, the major issues are its unstructured textual form and its sheer volume: every user gives feedback in their own style. Studying and analyzing such a large, unorganized body of feedback, which grows every year, becomes a herculean task. This paper describes the mining of both structured data (tables) and unstructured data (text). An application from an academic environment involving structured and unstructured data is considered and discussed to aid researchers' understanding. The Stanford Parser plays a very useful role in understanding the semantics of a sentence. It provides a basis for extracting data from sources of information available in textual form, such as social media, tweets, news and books. It is also helpful for judging a teaching-learning process in terms of a teacher’s performance and a subject’s weaknesses, if any. The paper has five sections: the first is an introduction, the second covers the literature on text mining and its techniques, the third presents the proposed work and results, the fourth discusses future perspectives, and the fifth is the conclusion.

