Supervised topic modeling for predicting molecular substructure from mass spectrometry

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 403
Author(s):  
Gabriel K. Reder ◽  
Adamo Young ◽  
Jaan Altosaar ◽  
Jakub Rajniak ◽  
Noémie Elhadad ◽  
...  

Small-molecule metabolites are principal actors in myriad phenomena across biochemistry and serve as an important source of biomarkers and drug candidates. Given a sample of unknown composition, identifying the metabolites present is difficult because of the large number of small molecules, both known and yet to be discovered. Even for biofluids such as human blood, building reliable ways of identifying biomarkers is challenging. A workhorse method for characterizing individual molecules in such untargeted metabolomics studies is tandem mass spectrometry (MS/MS). MS/MS spectra provide rich information about chemical composition. However, structural characterization from spectra corresponding to unknown molecules remains a bottleneck in metabolomics. Current methods often rely on matching to pre-existing databases in one form or another. Here we develop a preprocessing scheme and a supervised topic modeling approach that identifies modular groups of spectrum fragments and neutral losses corresponding to chemical substructures, using labeled latent Dirichlet allocation (LLDA) to map spectrum features to known chemical structures. These structures appear in new unknown spectra and can be predicted. We find that LLDA is an interpretable and reliable method for structure prediction from MS/MS spectra. Specifically, the LLDA approach has the following advantages: (a) molecular topics are interpretable; (b) a practitioner can select any set of chemical structure labels relevant to their problem; (c) LLDA performs well and can exceed the performance of other methods in predicting substructures in novel contexts.
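The abstract does not spell out the preprocessing scheme, so the following is only a minimal sketch of one plausible way to turn an MS/MS spectrum into the "words" a topic model consumes: one token per retained fragment peak and one per neutral loss (precursor m/z minus fragment m/z). The function name `spectrum_to_tokens`, the relative-intensity threshold, and the two-decimal binning are all assumptions made for illustration, not the authors' settings.

```python
def spectrum_to_tokens(precursor_mz, peaks, min_rel_intensity=0.05, decimals=2):
    """Convert one spectrum into fragment and neutral-loss tokens.

    peaks is a list of (mz, intensity) pairs. The threshold and the
    rounding precision are illustrative assumptions, not published values.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return []
    tokens = []
    for mz, intensity in peaks:
        # Drop low-intensity peaks relative to the base peak.
        if intensity / base < min_rel_intensity:
            continue
        tokens.append(f"frag_{mz:.{decimals}f}")
        # A neutral loss is the mass difference between precursor and fragment.
        loss = precursor_mz - mz
        if loss > 0:
            tokens.append(f"loss_{loss:.{decimals}f}")
    return tokens


# Toy spectrum with made-up values, for illustration only.
toy_peaks = [(163.06, 100.0), (145.05, 40.0), (89.02, 2.0)]
toy_tokens = spectrum_to_tokens(181.07, toy_peaks)
print(toy_tokens)
```

Each spectrum then becomes a "document" of such tokens, and its known substructure annotations would serve as the labels in a labeled-LDA setup.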

2019 ◽  
Vol 26 (25) ◽  
pp. 4799-4831 ◽  
Author(s):  
Jiahua Cui ◽  
Xiaoyang Liu ◽  
Larry M.C. Chow

P-glycoprotein (P-gp), also known as ABCB1 in the ABC transporter family, confers simultaneous resistance of metastatic cancer cells to various anticancer drugs with different targets and diverse chemical structures. Scientists have pursued safe and specific inhibitors of this pump for the past four decades. Naturally occurring flavonoids, as benzopyrone derivatives, were recognized as a class of nontoxic inhibitors of P-gp. The recent advent of the synthetic flavonoid dimer FD18, a potent P-gp modulator that reverses multidrug resistance (MDR) both in vitro and in vivo, specifically targeted the pseudodimeric structure of the drug transporter and represented a new generation of inhibitors with high transporter binding affinity and low toxicity. This review covers recent updates on the structure-activity relationships of flavonoids as P-gp inhibitors, the molecular mechanisms of their action, and their ability to overcome P-gp-mediated MDR in preclinical studies. It has important implications for the discovery of new drug candidates that modulate the efflux of ABC transporters and provides some clues for future development in this promising area.


Molecules ◽  
2021 ◽  
Vol 26 (6) ◽  
pp. 1555
Author(s):  
Enas E. Eltamany ◽  
Usama Ramadan Abdelmohsen ◽  
Dina M. Hal ◽  
Amany K. Ibrahim ◽  
Hashim A. Hassanean ◽  
...  

Chemical investigation of the methanolic extract of the Red Sea cucumber Holothuria spinifera led to the isolation of a new cerebroside, holospiniferoside (1), together with thymidine (2), methyl-α-d-glucopyranoside (3), a new triacylglycerol (4), and cholesterol (5). Their chemical structures were established by NMR and mass spectrometric analysis, including gas chromatography–mass spectrometry (GC–MS) and high-resolution mass spectrometry (HRMS). All the isolated compounds are reported in this species for the first time. Moreover, compound 1 exhibited promising in vitro antiproliferative effect on the human breast cancer cell line (MCF-7) with IC50 of 20.6 µM compared to the IC50 of 15.3 µM for the drug cisplatin. To predict the possible mechanism underlying the cytotoxicity of compound 1, a docking study was performed to elucidate its binding interactions with the active site of the protein Mdm2–p53. Compound 1 displayed an apoptotic activity via strong interaction with the active site of the target protein. This study highlights the importance of marine natural products in the design of new anticancer agents.


2021 ◽  
pp. 1-16
Author(s):  
Ibtissem Gasmi ◽  
Mohamed Walid Azizi ◽  
Hassina Seridi-Bouchelaghem ◽  
Nabiha Azizi ◽  
Samir Brahim Belhaouari

Context-aware recommender systems (CARS) suggest more relevant services by adapting them to the user’s specific context. Nevertheless, using many contextual factors can increase data sparsity, while using few context parameters fails to introduce contextual effects into recommendations. Moreover, several CARSs are based on similarity measures, such as cosine similarity and the Pearson correlation coefficient, which are not very effective on sparse datasets. This paper presents a context-aware model that integrates contextual factors into the prediction process when there are insufficient co-rated items. The proposed algorithm uses latent Dirichlet allocation (LDA) to learn the latent interests of users from the textual descriptions of items. It then integrates both the explicit contextual factors and their degree of importance into the prediction process by introducing a weighting function. The particle swarm optimization (PSO) algorithm is employed to learn and optimize the weights of these features. The results on the MovieLens 1M dataset show that the proposed model can achieve an F-measure of 45.51% with a precision of 68.64%. Furthermore, the improvements in MAE and RMSE reach 41.63% and 39.69%, respectively, compared with state-of-the-art techniques.
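The abstract reports an F-measure and a precision but no recall. If the F-measure is assumed to be the balanced F1 score (the abstract does not say which variant is used), the implied recall follows from F1 = 2PR / (P + R), which rearranges to R = F1·P / (2P − F1). The helper name `recall_from_f1` below is hypothetical.

```python
def recall_from_f1(f1, precision):
    """Solve F1 = 2*P*R / (P + R) for R, assuming a balanced F1 score."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall exists for this F1/precision pair")
    return f1 * precision / denom


# Values taken from the abstract, expressed as fractions.
implied_recall = recall_from_f1(0.4551, 0.6864)
print(f"implied recall: {implied_recall:.4f}")
```

Under that assumption the implied recall is about 0.34, so the model would be considerably more precise than it is exhaustive.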


2021 ◽  
Vol 16 (4) ◽  
pp. 1042-1065
Author(s):  
Anne Gottfried ◽  
Caroline Hartmann ◽  
Donald Yates

The business intelligence (BI) market has grown at a tremendous rate in the past decade due to technological advancements, big data, and the availability of open source content. Despite this growth, the use of open government data (OGD) as a source of information is very limited in the private sector due to a lack of knowledge of its benefits. Scant evidence on the use of OGD by private organizations suggests that it can lead to the creation of innovative ideas as well as assist in making better-informed decisions. Given the benefits of OGD but its limited use for generating business intelligence, we extend research in this area by exploring how OGD can be used to generate business intelligence for the identification of market opportunities and strategy formulation, an area of research that is still in its infancy. Using a two-industry case study approach (footwear and lumber), we use latent Dirichlet allocation (LDA) topic modeling to extract emerging topics in these two industries from OGD, and a data visualization tool (pyLDAvis) to visualize the topics in order to interpret and transform the data into business intelligence. Additionally, we perform environmental scanning for the two industries to validate the usability of the information obtained. The results provide evidence that OGD can be a valuable source of information for generating business intelligence and demonstrate how topic modeling and visualization tools can assist organizations in extracting and analyzing information for the identification of market opportunities.
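To make the LDA step concrete, here is a minimal collapsed Gibbs sampler written with the standard library only. It is a teaching sketch, not the tooling the authors used: the function names, hyperparameters (`alpha`, `beta`), iteration count, and the toy footwear and lumber documents are all assumptions for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, topic-word matrix)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    # Smoothed topic-word distributions; each row sums to 1.
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
         for v in range(n_words)]
        for k in range(n_topics)
    ]
    return vocab, phi


def top_words(vocab, phi, n=3):
    """Return the n highest-probability words of each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


# Made-up toy documents standing in for tokenized OGD text.
toy_docs = [
    "shoe sole leather import shoe".split(),
    "leather shoe retail import".split(),
    "timber sawmill lumber export".split(),
    "lumber timber housing sawmill lumber".split(),
]
toy_vocab, toy_phi = lda_gibbs(toy_docs, n_topics=2)
print(top_words(toy_vocab, toy_phi))
```

The rows of `toy_phi` are the per-topic term distributions that a visualization tool would display; in practice a maintained library implementation would be preferable to this hand-rolled sampler.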


2021 ◽  
Vol 13 (5) ◽  
pp. 2876
Author(s):  
Anne Parlina ◽  
Kalamullah Ramli ◽  
Hendri Murfi

The literature discussing the concepts, technologies, and ICT-based urban innovation approaches of smart cities has been growing, along with initiatives from cities all over the world that are competing to improve their services and become smart and sustainable. However, studies that provide a comprehensive understanding and reveal smart and sustainable city research trends and characteristics are still lacking. Meanwhile, policymakers and practitioners alike need to pursue progressive development. In response to this shortcoming, this research offers content analysis studies based on topic modeling approaches to capture the evolution and characteristics of topics in the scientific literature on smart and sustainable city research. More importantly, a novel topic-detecting algorithm based on deep learning and clustering techniques, namely deep autoencoders-based fuzzy C-means (DFCM), is introduced for analyzing research topic trends. The topics generated by the proposed algorithm have relatively higher coherence values than those generated by previously used topic detection methods, namely non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), and eigenspace-based fuzzy C-means (EFCM). The 30 main topics that appeared in topic modeling with the DFCM algorithm were classified into six groups (technology, energy, environment, transportation, e-governance, and human capital and welfare) that characterize the six dimensions of smart, sustainable city research.


2021 ◽  
Author(s):  
Faizah Faizah ◽  
Bor-Shen Lin

BACKGROUND: The World Health Organization (WHO) declared COVID-19 a global pandemic on January 30, 2020. However, the pandemic is not yet over, and in the first quarter of 2021 some countries faced a third wave. During this difficult time, the development of vaccines for COVID-19 accelerated rapidly. Understanding public perception of the COVID-19 vaccine from data collected on social media can widen the perspective on the state of the global pandemic.
OBJECTIVE: This study explores and analyzes the latent topics in COVID-19 vaccine tweets posted by individuals from various countries using two-stage topic modeling.
METHODS: A two-stage topic modeling analysis was proposed to investigate people’s reactions in five countries. The first stage is latent Dirichlet allocation (LDA), which produces latent topics with corresponding term distributions that help investigators understand the main issues or opinions. The second stage performs agglomerative clustering on the latent topics based on Hellinger distance, merging close topics hierarchically into topic clusters so that they can be visualized in either tree or graph views.
RESULTS: In general, the topics discussed regarding the COVID-19 vaccine are similar across the five countries. Topic themes such as "first vaccine" and "vaccine effect" dominate the public discussion. Notably, some countries have distinctive topic themes, such as "politician opinion" and "stay home" in Canada, "emergency" in India, and "blood clots" in the United Kingdom. The analysis also shows which COVID-19 vaccine is the most popular and is gaining more public interest.
CONCLUSIONS: With LDA and hierarchical clustering, two-stage topic modeling is powerful for visualizing latent topics and understanding public perception of the COVID-19 vaccine.
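The second stage described in the methods can be sketched as follows: compute the Hellinger distance between topic-term distributions, then repeatedly merge the two closest clusters. This is a standard-library sketch under stated assumptions; the abstract does not name a linkage criterion, so average linkage is assumed here, and the function names and toy topic distributions are invented for illustration.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (range 0 to 1)."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


def agglomerate(topics, n_clusters):
    """Merge topics bottom-up until n_clusters remain (average linkage assumed)."""
    clusters = [[i] for i in range(len(topics))]

    def linkage(a, b):
        # Mean pairwise Hellinger distance between members of two clusters.
        pairs = [(i, j) for i in a for j in b]
        return sum(hellinger(topics[i], topics[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up topic-term distributions over a four-word vocabulary.
toy_topics = [
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
    [0.0, 0.1, 0.3, 0.6],
]
toy_clusters = agglomerate(toy_topics, n_clusters=2)
print(toy_clusters)
```

Recording the order and distance of each merge, rather than only the final clusters, would give the hierarchy needed for the tree view mentioned in the abstract.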


Molecules ◽  
2018 ◽  
Vol 23 (11) ◽  
pp. 3003 ◽  
Author(s):  
Seoung Rak Lee ◽  
Dahae Lee ◽  
Jae Sik Yu ◽  
René Benndorf ◽  
Sullim Lee ◽  
...  

In recent years, investigations into the biochemistry of insect-associated bacteria have increased. When combined with analytical dereplication processes, these studies provide a powerful strategy to identify structurally and/or biologically novel compounds. Non-ribosomally synthesized cyclic peptides have a broad bioactivity spectrum with high medicinal potential. Here, we report the discovery of three new cyclic tripeptides: natalenamides A–C (compounds 1–3). These compounds were identified from the culture broth of the fungus-growing termite-associated Actinomadura sp. RB99 using a liquid chromatography (LC)/ultraviolet (UV)/mass spectrometry (MS)-based dereplication method. Chemical structures of the new compounds (1–3) were established by analysis of comprehensive spectroscopic methods, including one-dimensional (1H and 13C) and two-dimensional (1H-1H-COSY, HSQC, HMBC) nuclear magnetic resonance spectroscopy (NMR), together with high-resolution electrospray ionization mass spectrometry (HR-ESIMS) data. The absolute configurations of the new compounds were elucidated using Marfey’s analysis. Through several bioactivity tests for the tripeptides, we found that compound 3 exhibited significant inhibitory effects on 3-isobutyl-1-methylxanthine (IBMX)-induced melanin production. The effect of compound 3 was similar to that of kojic acid, a compound extensively used as a cosmetic material with a skin-whitening effect.

