Analysis of the yearbook from the Korea Meteorological Administration using a text-mining algorithm

Author(s):  
Yung-Seop Lee ◽  
Changwon Lim ◽  
Hyunseok Sun
2021 ◽  
Vol 11 (15) ◽  
pp. 6834
Author(s):  
Pradeepa Sampath ◽  
Nithya Shree Sridhar ◽  
Vimal Shanmuganathan ◽  
Yangsun Lee

Tuberculosis (TB) is one of the top causes of death in the world. Though TB is known as the world’s most infectious killer, it can be treated with a combination of TB drugs. Some of these drugs are also active against infective agents other than TB. We propose a framework called TREASURE (Text mining algoRithm basEd on Affinity analysis and Set intersection to find the action of tUberculosis dRugs against other pathogEns), which focuses on extracting drug–pathogen relationships for eight TB drugs: pyrazinamide, moxifloxacin, ethambutol, isoniazid, rifampicin, linezolid, streptomycin and amikacin. More than 1500 research papers from PubMed are collected for each drug. The collected data are first preprocessed, and relation records are generated for each drug using affinity analysis. These records are then filtered based on the maximum co-occurrence value and the set intersection property to obtain the required inferences. The inferences produced by this framework can help medical researchers find cures for other bacterial diseases. Additionally, the analysis presented in this model can be used by medical experts in their disease and drug experiments.
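
The filtering described in this abstract can be illustrated with a minimal sketch. It is only one plausible reading of "maximum co-occurrence value" and "set intersection", not the authors' implementation: the function names, the placeholder identifiers `drug_x`, `drug_y`, `pathogen_a` and so on, and the toy sentences are all invented for illustration and carry no medical meaning.

```python
from collections import Counter


def cooccurrence_counts(abstracts, drug, pathogens):
    """Count the abstracts that mention both the drug and each pathogen."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        if drug.lower() not in lowered:
            continue
        for pathogen in pathogens:
            if pathogen.lower() in lowered:
                counts[pathogen] += 1
    return counts


def max_cooccurring(counts):
    """Keep only the pathogens whose count equals the maximum co-occurrence value."""
    if not counts:
        return set()
    best = max(counts.values())
    return {pathogen for pathogen, n in counts.items() if n == best}


# Toy corpus with placeholder names (assumption: not real data).
corpus = {
    "drug_x": [
        "drug_x inhibited pathogen_a in vitro.",
        "pathogen_a was susceptible to drug_x.",
        "drug_x had weak activity against pathogen_b.",
    ],
    "drug_y": [
        "drug_y cleared pathogen_a infection.",
        "drug_y and pathogen_a: a case series.",
        "drug_y was tested on pathogen_c.",
    ],
}
pathogens = ["pathogen_a", "pathogen_b", "pathogen_c"]

# Step 1: per-drug relation records, filtered by maximum co-occurrence.
per_drug = {
    drug: max_cooccurring(cooccurrence_counts(texts, drug, pathogens))
    for drug, texts in corpus.items()
}

# Step 2: set intersection across drugs gives the pathogens they share.
shared = set.intersection(*per_drug.values())
print(per_drug, shared)
```

Plain substring matching is used here for brevity; a real pipeline would need the preprocessing the abstract mentions (tokenization, synonym handling) before counting.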


2015 ◽  
Vol 6 (4) ◽  
pp. 35-49 ◽  
Author(s):  
Laurent Issertial ◽  
Hiroshi Tsuji

This paper proposes a system called CFP Manager, specialized for the IT field and designed to ease the search for conferences suited to one's needs. At present, the handling of calls for papers (CFPs) faces two problems: in email, the huge quantity of CFPs received means that they are easily skimmed over; on websites, a review of the main CFP aggregators available online points to a lack of usable search criteria. The system addresses these problems through an architecture of three components. The first is an information extraction module that extracts relevant information (date, location, etc.) from CFPs using a rule-based text mining algorithm. The second enriches the extracted data with external data from ontology models. The third displays the data and lets the end user run complex queries on the CFP dataset, so that only suitable CFPs are returned. To validate the proposal, the authors measure precision and recall for the information extraction component, obtaining averages of 0.95 for precision and 0.91 for recall on three different datasets of 100 CFPs each. Finally, the paper discusses the validity of the approach by comparing the system, over several queries, with two systems already available online (WikiCFP and IEEE Conference Search) and with a basic text search standing in for searching an email inbox. On a 100-CFP dataset, thanks to the wide variety of usable data and the ability to run complex queries, the system surpasses the basic text search and WikiCFP by not returning the false positives they usually return, and produces results close to those of the IEEE system.
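
To make the idea of rule-based extraction concrete, here is a small sketch that pulls a deadline and a location out of a CFP using regular expressions. The abstract does not give CFP Manager's actual rules, so the two patterns, the field names, and the sample text below are assumptions chosen only for illustration.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical rule: "deadline: <Month> <day>, <year>" (case-insensitive).
DEADLINE_RE = re.compile(
    rf"deadline\s*:\s*((?:{MONTHS})\s+\d{{1,2}},\s*\d{{4}})",
    re.IGNORECASE,
)
# Hypothetical rule: "location: <rest of the line>".
LOCATION_RE = re.compile(r"location\s*:\s*(.+)", re.IGNORECASE)


def extract_fields(cfp_text):
    """Return the fields that the rules above can find in one CFP."""
    fields = {}
    match = DEADLINE_RE.search(cfp_text)
    if match:
        fields["deadline"] = match.group(1)
    match = LOCATION_RE.search(cfp_text)
    if match:
        fields["location"] = match.group(1).strip()
    return fields


# Made-up sample CFP, not taken from the paper's dataset.
sample = (
    "Call for Papers\n"
    "Submission deadline: March 15, 2015\n"
    "Location: Kyoto, Japan\n"
)
extracted = extract_fields(sample)
print(extracted)
```

Rules of this kind are easy to audit but brittle: a CFP that writes the date as "15 March 2015" would be missed, which is one reason such systems are evaluated with precision and recall.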


2019 ◽  
Vol 59 (3) ◽  
pp. 1519-1552 ◽  
Author(s):  
Lu Wei ◽  
Guowen Li ◽  
Xiaoqian Zhu ◽  
Jianping Li

Author(s):  
Joanna Tsenn ◽  
Julie S. Linsey ◽  
Daniel A. McAdams

Natural materials are able to achieve a wide range and combination of properties through the arrangement of the material’s components. These biological materials are often more effective and better suited to their function than engineered materials, even with the use of a limited set of components. By mimicking a biological material’s component arrangement, or structure, man-made bioinspired materials can achieve improved properties as well. While considerable research has been conducted on biological materials, identifying the beneficial structural design principles can be time-intensive for a materials designer. Previously, a text mining algorithm and tool were developed to quickly extract passages describing property-specific structural design principles from a corpus of materials journals. Although the tool identified over 90% of the principles (recall), many irrelevant passages were returned as well, with approximately 32% of the passages being useful (precision). This paper discusses approaches to refine the program in order to improve precision. Three text classification techniques (machine learning classifiers, statistical features, and part-of-speech analyses) are evaluated for effectiveness in sorting passages into relevant and irrelevant classes. Manual identification of patterns in the returned passages is also employed to create a rule-based method, resulting in an updated algorithm. An evaluation comparing the revised algorithm to the previously developed algorithm is completed using a new set of journal articles. Although the revised algorithm’s recall was reduced to 80%, the precision increased to 45% and the number of returned passages was reduced by 22%, allowing a materials designer to more quickly identify potentially useful structures. The paper concludes with suggestions to improve the program’s usefulness and scope for future work.
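
The precision and recall figures quoted in this abstract follow the standard set-based definitions, which the sketch below spells out. The passage identifiers are invented, and the numbers it produces are unrelated to the paper's reported results.

```python
def precision_recall(returned, relevant):
    """Set-based precision and recall for a batch of returned passages.

    precision = |returned and relevant| / |returned|
    recall    = |returned and relevant| / |relevant|
    """
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical passage IDs: five returned, three truly relevant.
returned_passages = {"p1", "p2", "p3", "p4", "p5"}
relevant_passages = {"p1", "p2", "p6"}

precision, recall = precision_recall(returned_passages, relevant_passages)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The trade-off the abstract reports is visible in these formulas: returning fewer passages shrinks the precision denominator, but any relevant passage that is dropped lowers recall.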


2020 ◽  
Author(s):  
Emmanuel Bonnet ◽  
Daurès Jean-Pierre ◽  
Landais Paul

Background: Literature search is challenging when thousands of articles are potentially involved. To facilitate literature search, we created TEMAS, a Text Mining Algorithm-assisted Search tool, which we compared to a PubMed reference search (RS) in the context of etiological epidemiology. Methods: The four steps of TEMAS are: 1) a classic PubMed global search; 2) a first sort removing articles without abstracts or containing off-topic terms; 3) a clustering step using a descending hierarchical classification that groups articles into independent classes; 4) a final sort extracting from the targeted class the abstracts containing the terms of interest, with a link to the corresponding PubMed articles. Validation was performed for risk factors of breast cancer. We estimated the precision and recall rate compared to RS. Average precision and discounted cumulative gain (DCG) were also computed to perform a ranking-based evaluation. We also compared TEMAS results with articles selected in two meta-analyses. Results: For risk factors of breast cancer, breastfeeding, mammographic density, oral contraceptive, and menarche were explored. TEMAS consistently increased precision vs RS (from 23% to 32%), with a recall rate from 95% to 97%, and reduced the number of selected articles to read by a factor of 2.3 to 4.8. Mean average precision for 100 articles was 47.4% for TEMAS vs 20.9% for PubMed ranked by best match, and DCG showed a consistent improvement for TEMAS compared to PubMed best match. Discussion: TEMAS divided the results of a literature search by 3.2, and improved the precision rate, the average precision, and the DCG compared to RS for epidemiological studies. Reducing the number of selected articles inevitably impacted the recall rate. However, it remained satisfactory and did not bias the corpus of information. Moreover, the recall rate was 100% for the two meta-analyses we analyzed, which suggests that the loss of recall rate observed above concerned articles not relevant enough to be included in the meta-analyses. Conclusion: TEMAS provides a user-friendly interface for non-specialists of literature search confronted with thousands of articles and appeared useful for meta-analyses.
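
The two ranking-based metrics named in this abstract can be sketched as follows. The abstract does not state which variants were used, so this assumes binary relevance, average precision normalized by the number of relevant items found in the ranked list, and DCG with the common log2(rank + 1) discount. The relevance list is a made-up example.

```python
import math


def average_precision(relevance):
    """Average of precision@k taken at each rank k that holds a relevant item.

    `relevance` is a list of 0/1 flags in ranked order. Normalizing by the
    hits found in the list is an assumption; some definitions divide by the
    total number of relevant items in the collection instead.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def dcg(relevance):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance, start=1)
    )


# Hypothetical ranked list: relevant articles at ranks 1 and 3.
ranked = [1, 0, 1, 0]
ap_value = average_precision(ranked)
dcg_value = dcg(ranked)
print(f"AP={ap_value:.4f} DCG={dcg_value:.4f}")
```

Both metrics reward placing relevant articles early in the list, which is why they complement the set-based precision and recall rates reported alongside them.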


2020 ◽  
Vol 3 (2) ◽  
pp. 107-121
Author(s):  
Triss Ashton ◽  
Nicholas Evangelopoulos ◽  
Audhesh Paswan ◽  
Victor R. Prybutok ◽  
Robert Pavur