USING TEXT MINING AND RANDOM FORESTS FOR AUTHOR IDENTIFICATION. THE CASE OF CLARIVATE WEB OF SCIENCE DATABASE

2019 ◽  
Author(s):  
Marin FOTACHE
Author(s):  
Miquel Pans ◽  
Joaquin Madera ◽  
Luís-Millan González ◽  
Maite Pellicer-Chenoll

It is currently difficult to obtain a global, state-of-the-art view of certain scientific topics. In the field of physical activity (PA) and exercise, this is due to information overload. The present study aims to provide a solution by analysing a large body of scientific articles using text mining (TM). The purpose was to analyse what is being investigated in the PA health field among young people in primary, secondary and higher education. Titles and abstracts published in the Web of Science (WOS) database were analysed using TM on 24 November 2020; after removing duplicates, 85,368 remained. The results show 9960 unique words and the most frequently used bi-grams and tri-grams. A co-occurrence network was also generated. ‘Health’ was the most important term, and the most repeated bi-gram and tri-gram were ‘body_mass’ and ‘body_mass_index’, respectively. The analyses of the 20 topics identified focused on health-related terms, the social sphere, sports performance and research processes. The study also found that the terms health and exercise have become more important in recent years.
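The bi-gram and tri-gram counts mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the tokenizer, the function names `tokenize` and `ngram_counts`, and the two sample sentences are assumptions made for illustration, and only the underscore-joined form (as in ‘body_mass_index’) follows the abstract.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(docs, n):
    """Count n-grams across all documents, joining tokens with '_'."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            counts["_".join(tokens[i:i + n])] += 1
    return counts


# Made-up sample sentences, for illustration only.
docs = [
    "Body mass index and physical activity in children",
    "Physical activity lowers body mass index in adolescents",
]

bigrams = ngram_counts(docs, 2)
trigrams = ngram_counts(docs, 3)
print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

A real analysis would normally remove stop words and apply stemming or lemmatization before counting; those steps are left out here to keep the sketch short.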


Parasitology ◽  
2020 ◽  
Vol 147 (14) ◽  
pp. 1643-1657
Author(s):  
John T. Ellis ◽  
Bethany Ellis ◽  
Antonio Velez-Estevez ◽  
Michael P. Reichel ◽  
Manuel J. Cobo

Abstract: Bibliometric methods were used to analyse the major research trends, themes and topics over the last 30 years in the parasitology discipline. The tools used were SciMAT, VOSviewer and SWIFT-Review in conjunction with the parasitology literature contained in the MEDLINE, Web of Science, Scopus and Dimensions databases. The analyses show that the major research themes are dynamic and continually changing with time, although some themes identified based on keywords such as malaria, nematode, epidemiology and phylogeny are consistently referenced over time. We note the major impact that countries like Brazil have had on the literature of parasitology research. The increase in recent times of research productivity on ‘antiparasitics’ is discussed, as well as the change in emphasis on different antiparasitic drugs and insecticides over time. In summary, innovation in parasitology is global, extensive, multidisciplinary, constantly evolving and closely aligned with the availability of technology.


2020 ◽  
Vol 36 (3) ◽  
Author(s):  
João Paulo Ferreira ◽  
Richard Miskolci

Abstract: This study reviewed articles originating in Brazil, the United Kingdom, and the United States from 1970 to September 2018 in the Web of Science database. Text mining techniques were used, and a predominantly qualitative analysis was performed, including correspondence analysis and sentiment analysis using the R Software (version 3.5.0) tools. Results show a repathologization of homosexuality in the gerontological knowledge production. This includes studies performed in 51 areas of knowledge in the three countries. That was followed by the depsychiatrization of homosexuality during the peak of deaths caused by AIDS, and its consequent recognition as an epidemiological threat. The article concludes by reviewing the collected biomarkers, such as “sexual”, “risk”, “MSM”, and “HIV/AIDS”, which prove the progressive impact of sexual panic in gerontology studies and also associate AIDS with masculine homosexuality.


2019 ◽  
Author(s):  
Muhammad Malik Ar-Rahiem

A bibliometric analysis was carried out on 40 publications on groundwater in the Bandung Basin indexed in the Web of Science database. The analyses performed were a co-authorship analysis and a text-mining analysis, using the binary counting and full counting methods in the VOSviewer software. A total of 125 authors and 1,296 words/terms appeared and formed the basis of the analysis. The results show that groundwater research in the Bandung Basin can be divided into two large groups: land subsidence and groundwater in general. The groundwater group can be further divided into three smaller groups: aquifer modelling, contamination modelling, and groundwater mixing modelling. Bibliometric analysis is very helpful in mapping the development of research in the Bandung Basin. Nevertheless, the results of the bibliometric analysis still have many shortcomings, mainly because of the limited publication metadata available in the Web of Science database. In future, other publication metadata, especially from Google Scholar, needs to be converted into a template that can be analysed using bibliometric methods.
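The difference between the two counting methods named in this abstract can be sketched in a few lines. Under the usual definitions, binary counting records only whether a term appears in a document, while full counting records every occurrence. The code below is an illustrative sketch and not VOSviewer's implementation; the function name `term_counts` and the sample strings are hypothetical.

```python
import re
from collections import Counter


def term_counts(docs, binary):
    """Count terms across documents.

    binary=True  -> each document contributes at most 1 per term (presence only).
    binary=False -> every occurrence of a term is counted.
    """
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(set(tokens) if binary else tokens)
    return counts


# Made-up sample documents, for illustration only.
docs = [
    "groundwater groundwater aquifer",
    "groundwater subsidence",
]

full = term_counts(docs, binary=False)
binary = term_counts(docs, binary=True)
print(full["groundwater"], binary["groundwater"])
```

With binary counting, a term repeated many times in one abstract weighs no more than a term mentioned once, which limits the influence of a single long or repetitive document.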


2013 ◽  
Vol 859 ◽  
pp. 280-283
Author(s):  
Shiang Hau Wu ◽  
Jiann Jong Guo

The study analyzed the keywords of oil exploration research paper abstracts from 2012 and 2013 and used a random forests model for classification analysis in order to identify the importance of, and similarities between, the 2012 and 2013 research trends. The study makes two contributions. First, it used text mining to explore the content of oil exploration research paper abstracts. Second, it applied AdaBoost classification analysis to explore the relationship between the keywords of the two years.


2020 ◽  
Vol 16 (45) ◽  
pp. 337
Author(s):  
Denise Fukumi Tsunoda ◽  
Paulo Sergio da Conceição Moreira ◽  
André José Ribeiro Guimarães

The study aims to identify machine learning methods used to automate systematic reviews. Following the Preferred Reporting Items for Systematic Reviews recommendation, it analyses 29 of 211 scientific documents retrieved from the Web of Science and Scopus databases, with no restriction on language or time period. It shows a growing trend in output on the topic, with 65.51% of the records published after 2016. It indicates researchers' interest in text mining techniques, this being the keyword most used by authors. Regarding the methods found, it identifies the Support Vector Machine algorithm as the most frequent, used in eight works, followed by the heuristics Artificial Neural Networks and Naïve Bayes, with two applications each. It highlights that the methods were mostly applied to the medical field. It concludes, however, that none of the tools identified offers a solution applicable to any area of knowledge.


The world faced one of its worst pandemics in 2020 with the outbreak of Covid-19, the disease caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). Questions about the outbreak piled up and research grew rapidly [1]. One study shows that within just six months, major databases were swamped with research articles, news, notes, and editorials related to the coronavirus. It estimates that 23,634 distinct published articles were indexed in Web of Science and Scopus between 1 January and 30 June 2020. Since then the volume has grown further: approximately 200,000 scholarly articles related to Covid-19 have been published. This shows a need to simplify search results so that users, specifically scientists, can get answers to high-priority questions. Document clustering tools are currently used in many areas. A similar clustering tool could be built specifically for Covid-19 to help scientists and researchers get answers to high-priority questions about this pandemic. In this paper, we discuss the process of text mining, text categorization, and text clustering. We also compare the algorithms used for clustering, particularly on text data.
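To make the idea of text clustering concrete, here is a minimal sketch of one simple scheme: single-pass (leader) clustering over TF-IDF vectors with cosine similarity. The abstract does not say which algorithms the paper compares, so this is an assumed example and not the paper's method. The function names, the smoothed IDF formula, the 0.25 threshold, and the sample titles are all illustrative choices.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using a smoothed IDF: log((1+N)/(1+df)) + 1."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(docs, threshold):
    """Assign each document to the first cluster whose leader (first member)
    is at least `threshold` similar; otherwise start a new cluster."""
    vectors = tfidf_vectors(docs)
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up titles, for illustration only.
docs = [
    "vaccine trial efficacy results",
    "vaccine efficacy in older adults",
    "lockdown impact on school closures",
    "school closures and lockdown policy",
]

clusters = leader_cluster(docs, threshold=0.25)
print(clusters)
```

Leader clustering depends on document order and on the threshold, so it is best read as a teaching example; k-means or hierarchical clustering over the same TF-IDF vectors are common alternatives.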


2021 ◽  
Vol 13 (17) ◽  
pp. 9846
Author(s):  
Tian Tian ◽  
Stijn Speelman

Rural planning is in a state of flux, covering a range of topics. The objectives of planning have evolved over the years. To get an overview of the evolving themes and narratives on rural planning in China, a literature review is conducted here using text mining considering 145 papers published in Web of Science. Attention is given to trends over time in terms of the topics covered. Six evolving themes are revealed, namely: providing affordable and decent life under industrialization and urbanization progress, national ecological programs and practices, building a new (socialist) countryside and rural−urban relationship in planning, land planning and restructuring, rural tourism planning and activities, and other themes. It is highlighted that strategies and knowledge of “development” are a common instructional epistemology among agro-industrialism, agro-ruralism, scientific rationalism, and “economy oriented” humanism.

