text mining tool
Recently Published Documents

TOTAL DOCUMENTS: 60 (FIVE YEARS: 24)
H-INDEX: 11 (FIVE YEARS: 2)

Author(s):  
Sonia Garcia Gonzalez-Moral ◽  
Aalya Al-Assaf ◽  
Savitri Pandey ◽  
Oladapo Ogunbayo ◽  
Dawn Craig

Introduction
The COVID-19 pandemic led to a significant surge in clinical research activity in the search for effective and safe treatments. As researchers attempted to disseminate early findings from clinical trials and accelerate patient access to promising treatments, a rise in the use of preprint repositories was observed. In the UK, the NIHR Innovation Observatory (NIHRIO) provided primary horizon-scanning intelligence on global trials to a multi-agency initiative on COVID-19 therapeutics. This intelligence included signals from preliminary results to support the selection, prioritisation and access to promising medicines.

Methods
A semi-automated text mining tool written in Python 3 uses the trial IDs (identifiers) of ongoing and completed studies, selected from major clinical trial registries according to pre-determined criteria, as search terms. Two sources, bioRxiv and medRxiv, are searched using these IDs. Each week, the tool automatically searches, de-duplicates, excludes reviews, and extracts the title, authors, publication date, URL and DOI. The output is verified by two reviewers, who manually screen and exclude studies that do not report results.

Results
A total of 36,771 publications were uploaded to bioRxiv and medRxiv between 3 March and 9 November 2020. Approximately 20–30 COVID-19 preprints per week were pre-selected by the tool. After manual screening and selection, a total of 123 preprints reporting preliminary clinical trial results were included. An additional 50 preprints presenting results of other study types on new vaccines and repurposed medicines for COVID-19 were also reported.

Conclusions
Using text mining to identify preliminary clinical trial results proved to be an efficient approach for dealing with the large volume of information. Semi-automation of searching increased efficiency, allowing the reviewers to focus on relevant papers. More consistent reporting of trial IDs would support further automation. Comparing the tool's accuracy when screening titles/abstracts versus full papers may support further refinement and increase efficiency gains.

This project is funded by the NIHR [(HSRIC-2016-10009)/Innovation Observatory]. The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
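The abstract describes the weekly pipeline only at a high level. The following is a minimal Python sketch of the screening step (review exclusion, de-duplication, trial-ID matching, field extraction), assuming a hypothetical list of preprint metadata records; it is not the NIHRIO tool's actual code, nor the bioRxiv/medRxiv search interface.

```python
# Sketch only: the record fields and their names are assumptions for illustration.
from typing import Iterable

def screen_preprints(records: Iterable[dict], trial_ids: set[str]) -> list[dict]:
    """Keep preprints that mention a known trial ID; drop reviews and duplicates."""
    seen_dois = set()
    hits = []
    for rec in records:                              # rec: hypothetical preprint metadata dict
        if rec.get("type", "").lower() == "review":
            continue                                 # exclude reviews
        doi = rec.get("doi")
        if doi in seen_dois:
            continue                                 # de-duplicate on DOI
        text = f"{rec.get('title', '')} {rec.get('abstract', '')}"
        if any(tid in text for tid in trial_ids):    # trial ID appears in title/abstract
            seen_dois.add(doi)
            hits.append({k: rec.get(k) for k in ("title", "authors", "date", "url", "doi")})
    return hits
```

In the real workflow the weekly output of such a step would then go to the two reviewers for manual screening.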


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nícia Rosário-Ferreira ◽  
Victor Guimarães ◽  
Vítor S. Costa ◽  
Irina S. Moreira

Abstract

Background
Blood cancers (BCs) are responsible for over 720,000 deaths worldwide each year. Their prevalence and mortality rate underline the relevance of research related to BCs. Despite the availability of different resources establishing Disease-Disease Associations (DDAs), the knowledge is scattered and not accessible to the scientific community in a straightforward way. Here, we propose SicknessMiner, a biomedical Text-Mining (TM) approach towards the centralization of DDAs. Our methodology encompasses Named Entity Recognition (NER) and Named Entity Normalization (NEN) steps, and the DDAs retrieved were compared to the DisGeNET resource for qualitative and quantitative comparison.

Results
We obtained the DDAs via co-mention using SicknessMiner or via gene- or variant-disease similarity on DisGeNET. SicknessMiner was able to retrieve around 92% of the DisGeNET results, and nearly 15% of the SicknessMiner results were specific to our pipeline.

Conclusions
SicknessMiner is a valuable tool to extract disease-disease relationships from a raw input corpus.
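To make the co-mention idea concrete, here is a minimal sketch of that step alone (SicknessMiner's NER and NEN models are not reproduced), assuming each abstract has already been reduced to a set of normalized disease identifiers; the identifier values in the example are illustrative.

```python
# Sketch of co-mention counting over pre-normalized disease IDs (assumed input format).
from collections import Counter
from itertools import combinations

def co_mention_ddas(abstract_diseases: list[set[str]]) -> Counter:
    """Count disease-disease co-mentions across abstracts."""
    pairs = Counter()
    for diseases in abstract_diseases:
        for a, b in combinations(sorted(diseases), 2):   # unordered pair per abstract
            pairs[(a, b)] += 1
    return pairs

# Illustrative call with placeholder identifiers:
counts = co_mention_ddas([{"D007938", "D008223"}, {"D007938", "D015470"}])
```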


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
M. Y. Eileen C. van der Stoep ◽  
Dagmar Berghuis ◽  
Robbert G. M. Bredius ◽  
Emilie P. Buddingh ◽  
Alexander B. Mohseny ◽  
...  

Abstract
Treosulfan is increasingly used as a myeloablative agent in conditioning regimens prior to allogeneic hematopoietic stem cell transplantation (HSCT). In our pediatric HSCT program, myalgia was regularly observed after treosulfan-based conditioning, a relatively unknown side effect. Using a natural language processing and text-mining tool (CDC), we investigated whether treosulfan, compared with busulfan, was associated with an increased risk of myalgia. Furthermore, among treosulfan users, we studied the characteristics of the treatment given for myalgia and the prognostic factors for developing myalgia during treosulfan use. Electronic Health Records (EHRs) up to 28 days after HSCT were screened using the CDC for myalgia and 22 synonyms. Time to myalgia, location of pain, duration, severity and drug treatment were collected. Pain severity was classified according to the WHO pain relief ladder. Logistic regression was performed to assess prognostic factors. 114 patients received treosulfan and 92 received busulfan. Myalgia was reported in 37 patients: 34 in the treosulfan group and 3 in the busulfan group (p = 0.01). In the treosulfan group, the median time to myalgia was 7 days (0–12) and the median duration of pain was 19 days (4–73). 44% of patients needed strong-acting opiates and adjuvant medicines (e.g. ketamine). Hemoglobinopathy was a significant risk factor compared with other underlying diseases (OR 7.16, 95% CI 2.09–30.03, p = 0.003). Myalgia appears to be a common adverse effect of treosulfan in pediatric HSCT, especially in hemoglobinopathy. Using the CDC, EHRs were easily screened to detect this previously unknown side effect, demonstrating the effectiveness of the tool. Recognition of treosulfan-induced myalgia is important for adequate pain management strategies and thereby for improving the quality of hospital stay.
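The screening step amounts to matching a keyword and its synonyms in free-text notes within a time window. Below is a minimal sketch of that idea, assuming a hypothetical note structure and an illustrative (not the study's) synonym list; it is not the CDC tool's implementation.

```python
# Sketch only: note fields and the synonym subset are assumptions for illustration.
import re

SYNONYMS = ["myalgia", "muscle pain", "muscle ache", "muscle soreness"]  # illustrative subset of the 23 terms
PATTERN = re.compile("|".join(re.escape(s) for s in SYNONYMS), re.IGNORECASE)

def flag_notes(notes: list[dict]) -> list[dict]:
    """Return notes written within 28 days after HSCT that mention myalgia or a synonym."""
    return [
        n for n in notes
        if n.get("days_after_hsct", 0) <= 28 and PATTERN.search(n.get("text", ""))
    ]
```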


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Allison Gates ◽  
Michelle Gates ◽  
Shannon Sim ◽  
Sarah A. Elliott ◽  
Jennifer Pillay ◽  
...  

Abstract

Background
Machine learning tools that semi-automate data extraction may create efficiencies in systematic review production. We evaluated a machine learning and text mining tool's ability to (a) automatically extract data elements from randomized trials, and (b) save time compared with manual extraction and verification.

Methods
For 75 randomized trials, we manually extracted and verified data for 21 data elements. We uploaded the randomized trials to an online machine learning and text mining tool (ExaCT) and quantified performance by evaluating its ability to identify the reporting of data elements (reported or not reported) and the relevance of the extracted sentences, fragments, and overall solutions. For each randomized trial, we measured the time to complete manual extraction and verification, and to review and amend the data extracted by the tool. We calculated the median (interquartile range [IQR]) time for manual and semi-automated data extraction, and the overall time savings.

Results
The tool identified the reporting (reported or not reported) of data elements with a median (IQR) accuracy of 91% (75% to 99%). Among the top five sentences for each data element, at least one sentence was relevant in a median (IQR) 88% (83% to 99%) of cases. For a median (IQR) 90% (86% to 97%) of relevant sentences, pertinent fragments had been highlighted by the tool; exact matches were unreliable (median (IQR) 52% [33% to 73%]). A median 48% of solutions were fully correct, but performance varied greatly across data elements (IQR 21% to 71%). Using ExaCT to assist the first reviewer resulted in a modest time saving compared with manual extraction by a single reviewer (17.9 vs. 21.6 h total extraction time across the 75 randomized trials).

Conclusions
Using ExaCT to assist with data extraction resulted in modest gains in efficiency compared with manual extraction. The tool was reliable for identifying the reporting of most data elements. Its ability to identify at least one relevant sentence and highlight pertinent fragments was generally good, but changes to sentence selection and/or highlighting were often required.

Protocol: https://doi.org/10.7939/DVN/RQPJKS
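For readers reproducing the summary statistics, a minimal sketch of how a median (IQR) across data elements can be computed in Python; the per-element accuracy values shown are placeholders, not the study's data.

```python
# Sketch: median and quartiles across per-element accuracies (placeholder values).
import statistics

def median_iqr(values: list[float]) -> tuple[float, float, float]:
    """Return (median, 25th percentile, 75th percentile)."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return med, q1, q3

element_accuracy = [0.75, 0.91, 0.99, 0.88, 0.95]   # placeholder per-element accuracies
print(median_iqr(element_accuracy))
```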


Author(s):  
Allison Gates ◽  
Michelle Gates ◽  
Shannon Sim ◽  
Sarah A. Elliott ◽  
Jennifer Pillay ◽  
...  

Background
Machine learning tools that semi-automate data extraction may create efficiencies in systematic review production. We prospectively evaluated an online machine learning and text mining tool's ability to (a) automatically extract data elements from randomized trials, and (b) save time compared with manual extraction and verification.

Methods
For 75 randomized trials published in 2017, we manually extracted and verified data for 21 unique data elements. We uploaded the randomized trials to ExaCT, an online machine learning and text mining tool, and quantified performance by evaluating the tool's ability to identify the reporting of data elements (reported or not reported) and the relevance of the extracted sentences, fragments, and overall solutions. For each randomized trial, we measured the time to complete manual extraction and verification, and to review and amend the data extracted by ExaCT (simulating semi-automated data extraction). We summarized the relevance of the extractions for each data element using counts and proportions, and calculated the median and interquartile range (IQR) across data elements. We calculated the median (IQR) time for manual and semi-automated data extraction, and the overall time savings.

Results
The tool identified the reporting (reported or not reported) of data elements with a median (IQR) accuracy of 91% (75% to 99%). Performance was perfect for four data elements: eligibility criteria, enrolment end date, control arm, and primary outcome(s). Among the top five sentences for each data element, at least one sentence was relevant in a median (IQR) 88% (83% to 99%) of cases. Performance was perfect for four data elements: funding number, registration number, enrolment start date, and route of administration. For a median (IQR) 90% (86% to 96%) of relevant sentences, pertinent fragments had been highlighted by the system; exact matches were unreliable (median (IQR) 52% [32% to 73%]). A median 48% of solutions were fully correct, but performance varied greatly across data elements (IQR 21% to 71%). Using ExaCT to assist the first reviewer resulted in a modest time saving compared with manual extraction by a single reviewer (17.9 vs. 21.6 hours total extraction time across the 75 randomized trials).

Conclusions
Using ExaCT to assist with data extraction resulted in modest gains in efficiency compared with manual extraction. The tool was reliable for identifying the reporting of most data elements. Its ability to identify at least one relevant sentence and highlight pertinent fragments was generally good, but changes to sentence selection and/or highlighting were often required.
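As a small worked calculation of the reported time saving, using only the totals given above (the per-trial figure is an average derived from those totals, not study data):

```python
# Arithmetic on the reported totals: manual vs. semi-automated extraction time.
manual_hours_total = 21.6          # single-reviewer manual extraction, 75 trials
semi_automated_hours_total = 17.9  # review and amend ExaCT output, 75 trials

savings_hours = manual_hours_total - semi_automated_hours_total
savings_pct = 100 * savings_hours / manual_hours_total
print(f"Saved {savings_hours:.1f} h ({savings_pct:.0f}%) across 75 trials, "
      f"about {60 * savings_hours / 75:.1f} min per trial on average")
```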


Author(s):  
Teresa Berry ◽  
Jeanine Williamson

To meet the research, teaching, and learning needs of their users, academic librarians, particularly those functioning as subject liaisons, are expected to know the institution’s curriculum and research areas so that they can help shape library strategies to meet those needs and to connect users to the library’s resources and services. The present study investigated the use of Refinitiv’s free web demo, Open Calais, as a text mining tool to help learn about the research areas in the University of Tennessee’s Tickle College of Engineering. We investigated the following research questions: What interdisciplinary research areas in the College does Open Calais reveal? What are the differences in Open Calais’ tagging of Scopus and web pages? What terms were uncovered by Open Calais that were unexpected by the subject librarian? The results showed a mixed picture of the usefulness of Open Calais for learning the research areas of the College of Engineering.  
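As a conceptual sketch of the comparison the second research question implies, the snippet below contrasts two tag sets (e.g. tags returned for Scopus records versus departmental web pages). The tag values are placeholders, and the Open Calais service itself is not called here; the study used Refinitiv's free web demo.

```python
# Sketch: compare tags produced for two input sources (placeholder tag values).
def compare_tags(scopus_tags: set[str], web_tags: set[str]) -> dict:
    """Partition tags into shared and source-specific sets."""
    return {
        "shared": scopus_tags & web_tags,
        "scopus_only": scopus_tags - web_tags,
        "web_only": web_tags - scopus_tags,
    }

print(compare_tags({"machine learning", "materials science"},
                   {"machine learning", "student recruitment"}))
```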


2021 ◽  
Author(s):  
Heejung Yang ◽  
Beomjun Park ◽  
Jinyoung Park ◽  
Jiho Lee ◽  
Hyeon Seok Jang ◽  
...  

Abstract
Biomedical databases grow by more than a thousand new publications every day. The large volume of biomedical literature being published at an unprecedented rate hinders the discovery of relevant knowledge from keywords of interest, the gathering of new insights, and the formation of hypotheses. A text-mining tool, PubTator, helps to automatically annotate bioentities, such as species, chemicals, genes, and diseases, from PubMed abstracts and full-text articles. However, the manual re-organization and analysis of bioentities is a non-trivial and highly time-consuming task. ChexMix was designed to extract the unique identifiers of bioentities from query results. Herein, ChexMix was used to construct a taxonomic tree of allied species among Korean native plants and to extract the Medical Subject Headings (MeSH) unique identifiers of the bioentities that co-occurred with the keywords in the same literature. ChexMix discovered the allied species related to a keyword of interest, experimentally demonstrating its usefulness for multi-species analysis.
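A minimal sketch of the co-occurrence step described above, assuming PubTator-style annotations have already been retrieved; the record format used here is an assumption for illustration, not PubTator's actual output schema, and ChexMix's own implementation may differ.

```python
# Sketch: count bioentity identifiers co-occurring with a keyword (assumed record format).
from collections import Counter

def co_occurring_ids(articles: list[dict], keyword: str) -> Counter:
    """Count bioentity identifiers in articles whose text mentions the keyword."""
    counts = Counter()
    for art in articles:                       # art: {"text": ..., "annotations": [...]}
        if keyword.lower() in art.get("text", "").lower():
            for ann in art.get("annotations", []):
                counts[ann.get("identifier")] += 1   # e.g. a MeSH unique identifier
    return counts
```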


2021 ◽  
Vol 124 ◽  
pp. 103357
Author(s):  
G. Fantoni ◽  
E. Coli ◽  
F. Chiarello ◽  
R. Apreda ◽  
F. Dell’Orletta ◽  
...  

IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 134518-134529
Author(s):  
Niku Gronberg ◽  
Antti Knutas ◽  
Timo Hynninen ◽  
Maija Hujala
