LDAShiny: An R Package for Exploratory Review of Scientific Literature Based on a Bayesian Probabilistic Model and Machine Learning Tools

Mathematics ◽  
2021 ◽  
Vol 9 (14) ◽  
pp. 1671
Author(s):  
Javier De la Hoz-M ◽  
Mª José Fernández-Gómez ◽  
Susana Mendes

In this paper we propose an open source application called LDAShiny, which provides a graphical user interface for reviewing scientific literature with the latent Dirichlet allocation algorithm and machine learning tools in an interactive and easy-to-use way. The implemented procedures follow the familiar stages of topic modeling: preprocessing, modeling, and postprocessing. The tool can be used by researchers or analysts who are not familiar with the R environment. We demonstrated the application by reviewing the literature published in the last three decades on the species Oreochromis niloticus. In total we reviewed 6,196 abstracts of articles recorded in Scopus. LDAShiny allowed us to create the matrix of terms and documents. Preprocessing reduced the vocabulary from 530,143 unique terms to 3,268. Thus, the implemented options reduced both the number of unique terms and the computational needs. The results showed that 14 topics were sufficient to describe the corpus of the example used in the demonstration. We also found that the general research topics on this species were related to growth performance, body weight, heavy metals, genetics and water quality, among others.
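The preprocessing step described above, where the vocabulary is pruned before the term-document matrix is built, can be sketched in a few lines of Python. This is a minimal illustration and not LDAShiny's code: the stopword list, the toy documents, and the helper names `tokenize`, `build_vocabulary`, and `document_term_matrix` are all assumptions made for the example, and the filter used here (stopwords plus a minimum document frequency) is only one of several possible pruning rules.

```python
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "of", "in", "and", "on", "a"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]


def build_vocabulary(docs, min_doc_freq=2):
    """Return the sorted terms kept after stopword and document-frequency filtering."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return sorted(
        term
        for term, df in doc_freq.items()
        if term not in STOPWORDS and df >= min_doc_freq
    )


def document_term_matrix(docs, vocabulary):
    """Return one row of term counts per document, with columns ordered as vocabulary."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocabulary])
    return rows


# Toy corpus (invented), standing in for the abstracts.
docs = [
    "Growth performance of tilapia in ponds",
    "Heavy metals and water quality in tilapia ponds",
    "Water quality and growth of tilapia",
]
vocab = build_vocabulary(docs, min_doc_freq=2)
dtm = document_term_matrix(docs, vocab)
print(vocab)
print(dtm)
```

Raising `min_doc_freq` shrinks the vocabulary further, which is the same trade-off the abstract reports: fewer unique terms and a smaller matrix to fit the topic model on.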

2021 ◽  
Vol 11 ◽  
Author(s):  
Zeyu Zhang ◽  
Zhiming Wang ◽  
Yun Huang

Introduction: Cholangiocarcinoma (CCA) is the second most common hepatic malignancy. Progress has been made in the field of CCA management, along with an increasing number of scientific publications during the past decades, which reflect topics of general interest and suggest the future direction of studies. The purpose of this bibliometric study is to summarize scientific publications during the past 25 years in the field of CCA using a machine learning method. Material and Methods: Scientific publications focusing on CCA from 1995 to 2019 were searched in PubMed using the MeSH term “cholangiocarcinoma.” Full associated data were downloaded in PubMed format and extracted in the R platform. Latent Dirichlet allocation (LDA) was adopted to identify the research topics from the abstract of each publication using Python. Results: A total of 8,276 publications related to CCA from the last 25 years were found and included in this study. The distribution of publication types remained largely unchanged, while the proportion of clinical trials remained relatively low (7.24% at its highest) and, more significantly, showed a further downward trend in recent years (1.42% in 2019). Neoplasm staging, hepatectomy, and survival rate were the most prominent terms among those that are diagnosis-related, treatment-related, and prognosis-related, respectively. The LDA analyses showed chemotherapy, hepatectomy, and stent as the research topics of greatest interest in CCA treatment. Meanwhile, a need for translation from basic studies to clinical therapies was suggested by a poor connection between the clusters of treatment management and basic research. Conclusion: The number of publications on CCA has increased rapidly during the past 25 years. Survival analysis, differential diagnosis, and microRNA expression are the topics of greatest interest in CCA studies. In addition, there is an urgent need for high-quality clinical trials and for translation from basic studies to clinical therapies.


2019 ◽  
Author(s):  
José Padarian ◽  
Budiman Minasny ◽  
Alex B. McBratney

Abstract. The application of machine learning (ML) techniques in various fields of science has increased rapidly, especially in the last ten years. The increasing availability of soil data that can be efficiently acquired remotely and proximally, and freely available open-source algorithms, have led to an accelerated adoption of ML techniques to analyse soil data. Given the large number of publications, it is an impossible task to manually review all papers on the application of ML in soil science without narrowing the review down to ML applications for a specific research question. This paper aims to provide a comprehensive review of the application of ML techniques in soil science, aided by an ML algorithm (Latent Dirichlet Allocation) to find patterns in a large text corpus. The objective is to gain insight into publications of ML applications in soil science and to discuss the research gaps in this topic. We found that: a) there is an increasing usage of ML methods in soil sciences, mostly concentrated in developed countries, b) the reviewed publications can be grouped into 12 topics, namely remote sensing, soil organic carbon, water, contamination, methods (ensembles), erosion and parent material, methods (NN, SVM), spectroscopy, modelling (classes), crops, physical and modelling (continuous), c) advanced ML methods usually perform better than simpler approaches thanks to their capability to capture non-linear relationships. From these findings, we identified research gaps, in particular regarding the precautions that should be taken (parsimony) to avoid overfitting, and the interpretability of ML models, which is an important aspect to consider when applying advanced ML methods in order to improve our knowledge and understanding of soil. We foresee that a large number of studies will focus on the latter topic.


Author(s):  
Ahmed Sameer El Khatib

The aim of this paper is to provide a first comprehensive structuring of the literature applying machine learning to finance. We use a probabilistic topic modelling approach to make sense of this diverse body of research spanning the disciplines of finance, economics, computer sciences, and decision sciences. Through the topic modelling approach, Latent Dirichlet Allocation (LDA), we extract the 14 coherent research topics that are the focus of the 6,148 academic articles analysed, published during the years 1990-2019. We first describe and structure these topics, and then show how the topic focus has evolved over the last two decades. Our study thus provides a structured topography for finance researchers seeking to integrate machine learning research approaches in their exploration of finance phenomena. We also showcase the benefits to finance researchers of probabilistic topic modelling for deep comprehension of a body of literature, especially when that literature has diverse multi-disciplinary actors.
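One common way to examine how topic focus evolves over time is to average the per-document topic proportions within each publication year. The sketch below assumes that a fitted topic model has already produced one topic distribution per document. The function name `topic_share_by_year` and the toy numbers are invented for illustration, and this is not necessarily the procedure the authors used.

```python
from collections import defaultdict


def topic_share_by_year(doc_topics, years):
    """Average the per-document topic proportions within each publication year."""
    grouped = defaultdict(list)
    for dist, year in zip(doc_topics, years):
        grouped[year].append(dist)
    shares = {}
    for year, dists in sorted(grouped.items()):
        n_topics = len(dists[0])
        shares[year] = [
            sum(d[k] for d in dists) / len(dists) for k in range(n_topics)
        ]
    return shares


# Toy input (invented): three documents, two topics.
doc_topics = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
years = [1990, 1990, 2019]
shares = topic_share_by_year(doc_topics, years)
print(shares)
```

In this toy input the first topic dominates in 1990 and the second in 2019, which is the kind of shift a trend analysis would try to surface.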


2019 ◽  
Vol 23 (5) ◽  
pp. 116-121
Author(s):  
A. I. Nevorotin ◽  
I. V. Awsiewitsch ◽  
I. M. Sukhanov

This article continues the analysis and discussion of the book by Professor A. I. Nevorotin, "Matrix phraseological collection: a manual for writing a scientific article in English". The Matrix phraseological collection is a kind of catalog of text samples. The samples were taken from articles selected from the leading English-language scientific journals and were systematized in such a way that, when writing an article in English, a Russian researcher can easily find examples suitable for his/her own work. Furthermore, the selected samples can be transformed accordingly, preserving the semantic and syntactic relations between the elements, and finally be inserted into the text. The second part of this work is devoted to the detailed analysis of the English scientific literature and also the section "Legality of the provisions of the problem".


2019 ◽  
Vol 7 (4) ◽  
pp. 184-190
Author(s):  
Himani Maheshwari ◽  
Pooja Goswami ◽  
Isha Rana

2021 ◽  
Vol 192 ◽  
pp. 103181
Author(s):  
Jagadish Timsina ◽  
Sudarshan Dutta ◽  
Krishna Prasad Devkota ◽  
Somsubhra Chakraborty ◽  
Ram Krishna Neupane ◽  
...  

Author(s):  
Irzam Sarfraz ◽  
Muhammad Asif ◽  
Joshua D Campbell

Abstract. Motivation: R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results: To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation: ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset. Supplementary information: Supplementary data are available at Bioinformatics online.
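The core idea, keeping a single copy of the assay and recording each subset as row and column indices together with its parent, can be illustrated with a small Python sketch. This is a conceptual toy and not the ExperimentSubset API (which is an R package): the class `SubsetStore`, its method names, and the example subset names are all hypothetical.

```python
class SubsetStore:
    """Toy container: one parent matrix plus named subsets stored as index lists."""

    def __init__(self, matrix):
        self.matrix = matrix  # list of rows; the only copy of the data
        self.subsets = {}     # name -> (parent name or None, root row idx, root col idx)

    def create_subset(self, name, rows, cols, parent=None):
        """Register a subset; indices are relative to `parent`, or to the root matrix."""
        if parent is not None:
            _, parent_rows, parent_cols = self.subsets[parent]
            rows = [parent_rows[i] for i in rows]  # map back to root indices
            cols = [parent_cols[j] for j in cols]
        self.subsets[name] = (parent, list(rows), list(cols))

    def get(self, name):
        """Materialise the subset on demand from the root matrix."""
        _, rows, cols = self.subsets[name]
        return [[self.matrix[i][j] for j in cols] for i in rows]

    def lineage(self, name):
        """Return the chain of subset names from `name` back to its first ancestor."""
        chain = [name]
        while self.subsets[chain[-1]][0] is not None:
            chain.append(self.subsets[chain[-1]][0])
        return chain


# Toy assay (invented): 3 features (rows) x 4 samples (columns).
store = SubsetStore([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
store.create_subset("good_cells", rows=[0, 1, 2], cols=[0, 2, 3])
store.create_subset("top_genes", rows=[0, 2], cols=[0, 1], parent="good_cells")
print(store.get("top_genes"))
print(store.lineage("top_genes"))
```

Because only index lists are stored, adding a subset costs memory proportional to its dimensions rather than its contents, and the recorded parent name gives a simple form of provenance.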


i-com ◽  
2021 ◽  
Vol 20 (1) ◽  
pp. 19-32
Author(s):  
Daniel Buschek ◽  
Charlotte Anlauff ◽  
Florian Lachner

Abstract. This paper reflects on a case study of a user-centred concept development process for a Machine Learning (ML) based design tool, conducted at an industry partner. The resulting concept uses ML to match graphical user interface elements in sketches on paper to their digital counterparts to create consistent wireframes. A user study (N=20) with a working prototype shows that this concept is preferred by designers, compared to the previous manual procedure. Reflecting on our process and findings we discuss lessons learned for developing ML tools that respect practitioners’ needs and practices.


Author(s):  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S Quintana-Ortí

More than 10 years of research related to the development of efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent routines that are publicly available using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods behave so differently depending on the matrix structure that identifying general rules to select the optimal method for a given matrix becomes extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
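For readers unfamiliar with the operation, the following is a plain CPU reference sketch of SpMV in the compressed sparse row (CSR) format, together with a few structural features of the kind a classifier might take as input. It is not one of the GPU routines evaluated in the paper. The helper names and the choice of features (number of nonzeros, mean and maximum row length) are assumptions made for illustration.

```python
def dense_to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row in `values`
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def row_length_features(row_ptr):
    """Hypothetical structural features derived from the row lengths."""
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    return {
        "nnz": row_ptr[-1],
        "mean_row_nnz": sum(lengths) / len(lengths),
        "max_row_nnz": max(lengths),
    }


# Toy matrix and vector (invented).
dense = [[4, 0, 0], [0, 0, 2], [1, 3, 0]]
values, col_idx, row_ptr = dense_to_csr(dense)
y = spmv_csr(values, col_idx, row_ptr, [1, 2, 3])
features = row_length_features(row_ptr)
print(y)
print(features)
```

The spread between mean and maximum row length is one simple indicator of how irregular a matrix is, which is the sort of structural property that makes one SpMV realization faster than another.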


2021 ◽  
Vol 59 ◽  
pp. 102353
Author(s):  
Amber Grace Young ◽  
Ann Majchrzak ◽  
Gerald C. Kane
