A Prospective Comparison of Evidence Synthesis Search Strategies Developed With and Without Text-Mining Tools

Background: In an era of explosive growth in biomedical evidence, improving systematic review (SR) search processes is increasingly critical. Text-mining tools (TMTs) are a potentially powerful resource to improve and streamline search strategy development. Two types of TMTs are especially of interest to searchers: word frequency (useful for identifying the most used keyword terms, e.g., PubReMiner) and clustering (visualizing common themes, e.g., Carrot2).
Objectives: The objective of this study was to compare the benefits and trade-offs of searches with and without the use of TMTs for evidence synthesis products in real-world settings. Specific questions included: (1) Do TMTs decrease the time spent developing search strategies? (2) How do TMTs affect the sensitivity and yield of searches? (3) Do TMTs identify groups of records that can be safely excluded in the search evaluation step? (4) Does the complexity of a systematic review topic affect TMT performance? In addition to quantitative data, we collected librarians' comments on their experiences using TMTs to explore when and how these new tools may be useful in systematic review search creation.
Methods: In this prospective comparative study, we included seven SR projects and classified them as simple or complex topics. The project librarian used conventional “usual practice” (UP) methods to create the MEDLINE search strategy, while a paired TMT librarian simultaneously and independently created a search strategy using a variety of TMTs. TMT librarians could choose one or more freely available TMTs per category from a pre-selected list in each of three categories: (1) keyword/phrase tools: AntConc, PubReMiner; (2) subject term tools: MeSH on Demand, PubReMiner, Yale MeSH Analyzer; and (3) strategy evaluation tools: Carrot2, VOSviewer. We collected results from both MEDLINE searches (with and without TMTs), coded every citation’s origin (UP or TMT respectively), deduplicated them, and then sent the citation library to the review team for screening. When the draft report was submitted, we used the final list of included citations to calculate the sensitivity, precision, and number-needed-to-read for each search (with and without TMTs). Separately, we tracked the time spent on various aspects of search creation by each librarian. Simple and complex topics were analyzed separately to provide insight into whether TMTs could be more useful for one type of topic or another.
Results: Across all reviews, UP searches seemed to perform better than TMT searches, but because of the small sample size, none of these differences was statistically significant. UP searches were slightly more sensitive (92% [95% confidence interval (CI) 85–99%]) than TMT searches (84.9% [95% CI 74.4–95.4%]). The mean number-needed-to-read was 83 (SD 34) for UP and 90 (SD 68) for TMT. Keyword and subject term development using TMTs generally took less time than development using UP alone. The average total time to create a complete search strategy was 12 hours (SD 8) for the UP librarians and 5 hours (SD 2) for the TMT librarians. TMTs neither affected search evaluation time nor improved identification of exclusion concepts (irrelevant records) that can be safely removed from the search set.
Conclusion: Across all reviews but one, TMT searches were less sensitive than UP searches. For simple SR topics (i.e., single indication–single drug), TMT searches were slightly less sensitive, but reduced time spent in search design. For complex SR topics (e.g., multicomponent interventions), TMT searches were less sensitive than UP searches; nevertheless, in complex reviews, they identified unique eligible citations not found by the UP searches. TMT searches also reduced time spent in search strategy development. For all evidence synthesis types, TMT searches may be more efficient in reviews where comprehensiveness is not paramount, or as an adjunct to UP for evidence syntheses, because they can identify unique includable citations. If TMTs were easier to learn and use, their utility would be increased.
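
The sensitivity, precision, and number-needed-to-read figures in the abstract above come down to set arithmetic on citation identifiers. The sketch below is a minimal illustration only, assuming the conventional definitions (sensitivity = included citations retrieved / all included citations; precision = included citations retrieved / all citations retrieved; number-needed-to-read = citations retrieved / included citations retrieved, i.e. 1 / precision). The abstract does not give the study's exact computation; the function name `search_metrics` and the sample identifiers are hypothetical.

```python
def search_metrics(retrieved, included):
    """Return (sensitivity, precision, number_needed_to_read) for one search.

    retrieved: citation IDs returned by the search (hypothetical data).
    included:  citation IDs on the review's final included list.
    """
    retrieved, included = set(retrieved), set(included)
    hits = retrieved & included  # included citations that the search found

    sensitivity = len(hits) / len(included) if included else float("nan")
    precision = len(hits) / len(retrieved) if retrieved else float("nan")
    # Number-needed-to-read: records screened per included citation found.
    nnr = len(retrieved) / len(hits) if hits else float("inf")
    return sensitivity, precision, nnr


# Hypothetical example: a search returns 100 records and finds
# 4 of the 5 citations that the review finally included.
sens, prec, nnr = search_metrics(range(1, 101), {1, 2, 3, 4, 200})
print(f"sensitivity={sens:.2f} precision={prec:.2f} NNR={nnr:.1f}")
```

Running the same function on the UP and TMT result sets separately, against one shared included list, gives the per-search comparison the abstract describes.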

Development of Text Mining Tools for Information Retrieval from Patents

Advances in Intelligent Systems and Computing - 11th International Conference on Practical Applications of Computational Biology & Bioinformatics ◽

10.1007/978-3-319-60816-7_9 ◽

2017 ◽

pp. 66-73 ◽

Cited By ~ 2

Author(s):

Tiago Alves ◽

Rúben Rodrigues ◽

Hugo Costa ◽

Miguel Rocha

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Mining Tools

Detecting Health-Related Privacy Leaks in Social Networks Using Text Mining Tools

Advances in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-642-38457-8_3 ◽

2013 ◽

pp. 25-39 ◽

Cited By ~ 4

Author(s):

Kambiz Ghazinour ◽

Marina Sokolova ◽

Stan Matwin

Keyword(s):

Social Networks ◽

Text Mining ◽

Health Related ◽

Mining Tools

Text Mining

Handbook of Research on Public Information Technology ◽

10.4018/978-1-59904-857-4.ch054 ◽

2008 ◽

pp. 592-603 ◽

Cited By ~ 2

Author(s):

Antonina Durfee

Keyword(s):

Text Mining ◽

Deception Detection ◽

Text Summarization ◽

Authorship Attribution ◽

Venture Capitalists ◽

Help Desk ◽

News Agencies ◽

Textual Databases ◽

Available Information ◽

Mining Tools

Massive quantities of information continue to accumulate at about 1.5 billion gigabytes per year in numerous repositories held at news agencies, at libraries, on corporate intranets, on personal computers, and on the Web. A large portion of all available information exists in the form of text. Researchers, analysts, editors, venture capitalists, lawyers, help desk specialists, and even students face text analysis challenges. Text mining tools aim to discover knowledge in textual databases by isolating key bits of information from large amounts of text and identifying relationships among documents. Text mining technology is used for plagiarism detection and authorship attribution, text summarization and retrieval, and deception detection.

Semantic Integration, Text Mining, Tools and Technologies

Artificial Intelligence ◽

10.4018/978-1-5225-1759-7.ch056 ◽

2017 ◽

pp. 1361-1378

Author(s):

Chandrakant Ekkirala

Keyword(s):

Text Mining ◽

Mining Tools

An Application of Text Mining to Capture and Analyze eWOM

Advances in Marketing, Customer Relationship Management, and E-Services - Capturing, Analyzing, and Managing Word-of-Mouth in the Digital Marketplace ◽

10.4018/978-1-4666-9449-1.ch010 ◽

2016 ◽

pp. 168-186 ◽

Cited By ~ 2

Author(s):

Taşkın Dirsehan

Keyword(s):

Data Mining ◽

Text Mining ◽

Customer Relationship ◽

Competitive Advantages ◽

Strategic Decisions ◽

Data Mining Tool ◽

Mining Tool ◽

The Moment ◽

Mining Tools

The marketing concept has progressed through several phases of evolution. At present, customer relationship management is considered the latest era of marketing development. The main purpose of this approach is to build long-term, profitable relationships with customers, so companies need to know their customers better. This knowledge can be created through a deeper analysis of companies' data with data mining tools. Companies that are able to use data mining tools will gain strong competitive advantages for their strategic decisions. The hotel industry was selected for this study because it provides a warehouse of customer comments from which valuable knowledge can be obtained if text mining is used appropriately as a data mining tool. This study therefore attempts to explain the stages of text mining using RapidMiner. As a result, different approaches based on customer satisfaction and dissatisfaction are discussed as ways to build competitive advantages.

Text Mining Tools: Techniques and Visualizations

Exploring Big Historical Data ◽

10.1142/9781783266104_0003 ◽

2015 ◽

pp. 73-111

Keyword(s):

Text Mining ◽

Mining Tools

Text Mining, Tools

Encyclopedia of Systems Biology ◽

10.1007/978-1-4419-9863-7_181 ◽

2013 ◽

pp. 2160-2162

Author(s):

Jörg Hakenberg

Keyword(s):

Text Mining ◽

Mining Tools

Text mining tools for extracting information about microbial biodiversity in food

Food Microbiology ◽

10.1016/j.fm.2018.04.011 ◽

2019 ◽

Vol 81 ◽

pp. 63-75 ◽

Cited By ~ 5

Author(s):

Estelle Chaix ◽

Louise Deléger ◽

Robert Bossy ◽

Claire Nédellec

Keyword(s):

Text Mining ◽

Microbial Biodiversity ◽

Mining Tools

Networks Models of Actin Dynamics during Spermatozoa Postejaculatory Life: A Comparison among Human-Made and Text Mining-Based Models

BioMed Research International ◽

10.1155/2016/9795409 ◽

2016 ◽

Vol 2016 ◽

pp. 1-8 ◽

Cited By ~ 1

Author(s):

Nicola Bernabò ◽

Alessandra Ordinelli ◽

Marina Ramal Sanchez ◽

Mauro Mattioli ◽

Barbara Barboni

Keyword(s):

Text Mining ◽

Literature Search ◽

Human Spermatozoa ◽

Actin Dynamics ◽

Hierarchical Architecture ◽

Scale Free ◽

Fertilizing Ability ◽

Actin Remodelling ◽

Mining Tools ◽

Biological Context

Here we built a network-based model representing the process of actin remodelling that occurs during the acquisition of fertilizing ability by human spermatozoa (HumanMade_ActinSpermNetwork, HM_ASN). We then compared it with the networks provided by two different text mining tools: Agilent Literature Search (ALS) and PESCADOR. As a reference, we used data from the online repository Kyoto Encyclopaedia of Genes and Genomes (KEGG) on actin dynamics in a more general biological context. We found that HM_ASN and the networks from KEGG data shared the same scale-free topology following the Barabási-Albert model, suggesting that information spreads within the network quickly and efficiently. In contrast, the networks obtained by ALS and PESCADOR have a scale-free hierarchical architecture, which implies a different pattern of information transmission. The hubs identified within the networks also differ: the HM_ASN and KEGG networks contain as hubs several molecules known to be involved in actin signalling; ALS was unable to find any hub other than “actin,” whereas PESCADOR gave some nonspecific results. This suggests that human-made information retrieval could be a reliable strategy for a specific event such as actin dynamics in human spermatozoa.
