ISKE: An unsupervised automatic keyphrase extraction approach using the iterated sentences based on graph method

2021 ◽  
Vol 223 ◽  
pp. 107014
Author(s):  
Ling Chi ◽  
Liang Hu
2001 ◽  
Vol 321 (1) ◽  
pp. 102-131 ◽  
Author(s):  
M.V. Sofin ◽  
E.Yu. Kerimov ◽  
A.E. Chastukhin ◽  
N.A. Bazhanova ◽  
Yu.V. Balykova ◽  
...  

2021 ◽  
Author(s):  
Ling Chai ◽  
Xiaoming Wu ◽  
Yuan Ni ◽  
Guotong Xie ◽  
Liyu Cao ◽  
...  

BACKGROUND: With the growing number of biomedical scientific publications, it is of great value to characterize the research status of subtopics in this field, especially within specific disease areas. However, there has not been a fully automated pipeline for mining and analysing research hotspots in this field.

OBJECTIVE: We propose a fully automatic method based on natural language processing technology to analyse scientific innovations in a specific disease area.

METHODS: The pipeline consists of three steps: keyphrase extraction, clustering, and cluster naming. It extends existing literature analysis methods (including keyphrase extraction, document clustering, and paper ranking), adds advanced semantic mining technology (contextualized embeddings from pre-trained language models), and introduces a document cluster naming strategy based on core document mining and topic-related phrase mining. With this pipeline, a full picture of the field of a specific disease is established. Distinct document clusters are generated to describe the various subfields of disease-related research. Core documents and topic-related phrases are used to name the clusters and to interpret the issues researchers care about. In addition, the relations between clusters are analysed. Finally, several important clusters are analysed in detail; their core citation paths illustrate the research roadmap for a given subfield, and their phrases directly describe the hotspots in each subfield.

RESULTS: We applied the method to the field of cataracts. From 35,117 cataract publications, the proposed method extracted high-frequency phrases such as cataract extraction, cataract formation, and intraocular pressure. The method also identified the most important documents in this field, which reveal the flow of research hotspots over time. In total, 23 communities were generated, and the top 10 topic-related phrases and core documents were extracted to name them. The cluster with the most papers is mainly about cataract formation. The cluster with the most high-impact papers focuses on common cataract diseases related to cataract epidemiology surveys. The cluster with the highest novelty and the highest progressiveness is related to the femtosecond laser technique.

CONCLUSIONS: This fully automated method can produce a full picture of the research status of a specific disease field without expert annotation.
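The three steps named in the METHODS (keyphrase extraction, clustering, cluster naming) can be illustrated with a deliberately simplified sketch. This is not the authors' implementation. It swaps the contextualized embeddings for plain bigram overlap, clusters by connected components over Jaccard similarity, and names each cluster by its most frequent phrases. The function names, the stopword list, the similarity threshold, and the four example titles are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to",
             "with", "is", "are", "after", "by"}

def extract_phrases(text):
    """Step 1 (toy): candidate keyphrases = bigrams containing no stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    return [f"{a} {b}" for a, b in zip(words, words[1:])
            if a not in STOPWORDS and b not in STOPWORDS]

def cluster_documents(phrase_sets, threshold=0.2):
    """Step 2 (toy): link documents whose phrase sets have Jaccard
    similarity >= threshold, then return the connected components."""
    parent = list(range(len(phrase_sets)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(phrase_sets)):
        for j in range(i + 1, len(phrase_sets)):
            union = phrase_sets[i] | phrase_sets[j]
            shared = phrase_sets[i] & phrase_sets[j]
            if union and len(shared) / len(union) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(phrase_sets)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def name_cluster(doc_ids, phrase_lists, top_k=2):
    """Step 3 (toy): name a cluster by its top_k most frequent phrases."""
    counts = Counter(p for i in doc_ids for p in phrase_lists[i])
    return [phrase for phrase, _ in counts.most_common(top_k)]

# Made-up example titles, not data from the paper.
docs = [
    "Cataract extraction with femtosecond laser",
    "Femtosecond laser assisted cataract extraction outcomes",
    "Intraocular pressure after glaucoma surgery",
    "Intraocular pressure changes in glaucoma patients",
]

phrase_lists = [extract_phrases(d) for d in docs]
clusters = cluster_documents([set(p) for p in phrase_lists])
for ids in clusters:
    print(ids, name_cluster(ids, phrase_lists))
```

On these four titles the sketch groups the two laser-surgery titles together and the two intraocular-pressure titles together. The paper's pipeline additionally uses paper ranking and core citation paths, which this sketch does not attempt.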

