literature mining
Recently Published Documents



2022
Author(s): Yao Gong, Gaurav Behera, Luke Erber, Ang Luo, Yue Chen

Proline hydroxylation (Hyp) regulates protein structure, stability and protein-protein interaction and is widely involved in diverse metabolic and physiological pathways in cells and diseases. To reveal functional features of the proline hydroxylation proteome, we integrated various data sources for deep profiling of the proline hydroxylation proteome in humans and developed HypDB (https://www.HypDB.site), an annotated database and web server for the proline hydroxylation proteome. HypDB provides site-specific evidence of modification based on extensive LC-MS analysis and literature mining, with 15319 non-redundant Hyp sites and 8226 high-confidence sites on human proteins. Annotation analysis revealed significant enrichment of proline hydroxylation on key functional domains and a tissue-specific distribution of Hyp abundance across 26 types of human organs and fluids and 6 cell lines. The network connectivity analysis further revealed a critical role of proline hydroxylation in mediating protein-protein interactions. Moreover, the spectral library generated by HypDB enabled data-independent analysis (DIA) of clinical tissues and the identification of novel Hyp biomarkers in lung cancer and kidney cancer. Taken together, our integrated analysis of the human proteome with the publicly accessible HypDB revealed the functional diversity of Hyp substrates and provides a quantitative data source to characterize proline hydroxylation in pathways and diseases.


2022, Vol 44 (1), pp. 309-328
Author(s): Masoumeh Naserkheil, Farzad Ghafouri, Sonia Zakizadeh, Nasrollah Pirany, Zeinab Manzari, ...

Mastitis, inflammation of the mammary gland, is the most prevalent disease in dairy cattle that has a potential impact on profitability and animal welfare. Specifically designed multi-omics studies can be used to prioritize candidate genes and identify biomarkers and the molecular mechanisms underlying mastitis in dairy cattle. Hence, the present study aimed to explore the genetic basis of bovine mastitis by integrating microarray and RNA-Seq data containing healthy and mastitic samples in comparative transcriptome analysis with the results of published genome-wide association studies (GWAS) using a literature mining approach. The integration of different information sources resulted in the identification of 33 common and relevant genes associated with bovine mastitis. Among these, seven genes—CXCR1, HCK, IL1RN, MMP9, S100A9, GRO1, and SOCS3—were identified as the hub genes (highly connected genes) for mastitis susceptibility and resistance, and were subjected to protein-protein interaction (PPI) network and gene regulatory network construction. Gene ontology annotation and enrichment analysis revealed 23, 7, and 4 GO terms related to mastitis in the biological process, molecular function, and cellular component categories, respectively. Moreover, the main metabolic-signalling pathways responsible for the regulation of immune or inflammatory responses were significantly enriched in cytokine–cytokine-receptor interaction, the IL-17 signaling pathway, viral protein interaction with cytokines and cytokine receptors, and the chemokine signaling pathway. Consequently, the identification of these genes, pathways, and their respective functions could contribute to a better understanding of the genetics and mechanisms regulating mastitis and can be considered a starting point for future studies on bovine mastitis.


2022
Author(s): Bo Dong, Jing Liu, Bing Chen, Yuqi Huang, Peng Ai, ...

Purpose: The adaptability of the blue-spotted mudskipper (Boleophthalmus Periophthalmodon; BP) and the giant-fin mudskipper (Periophthalmus magnuspinnatus; PM) has previously been reported at the genome level to explain their amphibious life. However, the role of the gastrointestinal (GI) microbiota in their adaptation to terrestrial life remains to be explored. Methods: In this study, we mainly used metagenomic data from these two representative mudskippers and typical aquicolous fish species to obtain the microbial composition, diversity, abundance and potential functions of the GI microbiota for comparison between amphibious and aquicolous fishes. We also summarized, by literature mining, the GI microbiota results of representative seawater fishes, freshwater fishes, amphibians, and terrestrial animals for comparison with those of the mudskippers. Results: Interestingly, the content of each dominant phylum was strikingly different among BP, PM and aquicolous fishes. We also observed that the GI microbiota profile of mudskippers simultaneously contained the bacterial families typical of terrestrial animals, (freshwater and seawater) fishes, and amphibians, which is consistent with their lifestyle of water-to-land and freshwater-to-seawater transition. More interestingly, certain bacterial strains such as S24-7, previously thought to be specific to terrestrial animals, were also identified in both BP and PM. Conclusion: The varied composition and diversity of the mudskipper GI microflora are therefore considered to contribute to terrestrial adaptation in these amphibious fishes.


2021
Author(s): Adam J H Newton, David Chartash, Steven H Kleinstein, Robert A McDougal

Objective: The accelerating pace of biomedical publication has made retrieving papers and extracting specific comprehensive scientific information a key challenge. A timely example of such a challenge is to retrieve the subset of papers that report on immune signatures (coherent sets of biomarkers) to understand the immune response mechanisms which drive differential SARS-CoV-2 infection outcomes. A systematic and scalable approach is needed to identify and extract COVID-19 immune signatures in a structured and machine-readable format. Materials and Methods: We used SPECTER embeddings with SVM classifiers to automatically identify papers containing immune signatures. A generic web platform was used to manually screen papers and allow anonymous submission. Results: We demonstrate a classifier that retrieves papers with human COVID-19 immune signatures with a positive predictive value of 86%. Semi-automated queries to the corresponding authors of these publications requesting signature information achieved a 31% response rate. This demonstrates the efficacy of using an SVM classifier with document embeddings of the abstract and title to retrieve papers with scientifically salient information, even when that information is rarely present in the abstract. Additionally, classification based on the embeddings identified the type of immune signature (e.g., gene expression vs. other types of profiling) with a positive predictive value of 74%. Conclusions: Coupling a classifier based on document embeddings with direct author engagement offers a promising pathway to build a semi-structured representation of scientifically relevant information. Through this approach, partially automated literature mining can help rapidly create semi-structured knowledge repositories for automatic analysis of emerging health threats.
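
The positive predictive value quoted in the abstract above is the fraction of papers flagged by the classifier that truly contain what was being looked for, TP / (TP + FP). The following is a minimal sketch of that calculation only; the function name and the toy labels are hypothetical and are not taken from the study, and it does not reproduce the SPECTER or SVM pipeline.

```python
def positive_predictive_value(y_true, y_pred):
    """Return TP / (TP + FP) for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    if tp + fp == 0:
        # With no positive predictions the ratio is undefined.
        raise ValueError("no positive predictions; PPV is undefined")
    return tp / (tp + fp)


# Toy labels, invented for illustration (not data from the paper).
# 1 = paper reports an immune signature, 0 = it does not.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

ppv = positive_predictive_value(y_true, y_pred)
print(f"PPV = {ppv:.2f}")
```

In this toy case five papers are flagged and four of them are true positives, so the sketch reports a PPV of 0.80.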


2021, Vol 12 (1), pp. 154
Author(s): Ziheng Zhang, Feng Han, Hongjian Zhang, Tomohiro Aoki, Katsuhiko Ogasawara

Biomedical terms extracted using Word2vec, the most popular word embedding model in recent years, serve as the foundation for various natural language processing (NLP) applications, such as biomedical information retrieval, relation extraction, and recommendation systems. The objective of this study is to examine how changes in the ratio of the biomedical domain to general domain data in the corpus affect the extraction of similar biomedical terms using Word2vec. We downloaded abstracts of 214,892 articles from PubMed Central (PMC) and the 3.9 GB Billion Word (BW) benchmark corpus from the computer science community. The datasets were preprocessed and grouped into 11 corpora based on the ratio of BW to PMC, ranging from 0:10 to 10:0, and then Word2vec models were trained on these corpora. The cosine similarities between the biomedical terms obtained from the Word2vec models were then compared in each model. The results indicated that the models trained with both BW and PMC data outperformed the model trained only with medical data. The similarity between the biomedical terms extracted by the Word2vec model increased when the ratio of the biomedical domain to general domain data was 3:7 to 5:5. This study allows NLP researchers to apply Word2vec based on more information and increase the similarity of extracted biomedical terms to improve their effectiveness in NLP applications, such as biomedical information extraction.
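
Two pieces of the setup described above lend themselves to a short illustration: the 11 mixing ratios of BW to PMC data (0:10 through 10:0) and the cosine similarity used to compare term vectors. This is a minimal sketch under stated assumptions: the function names are hypothetical, the vectors are toy values rather than trained Word2vec embeddings, and no model training is shown.

```python
import math


def mixing_ratios(parts=10):
    """List (BW, PMC) ratio pairs from 0:parts to parts:0."""
    return [(bw, parts - bw) for bw in range(parts + 1)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


ratios = mixing_ratios()
print(len(ratios), ratios[0], ratios[-1])

# Toy 3-dimensional "embeddings", invented for illustration only.
term_a = [0.2, 0.7, 0.1]
term_b = [0.4, 1.4, 0.2]   # same direction as term_a
term_c = [0.7, -0.2, 0.0]  # orthogonal to term_a
print(cosine_similarity(term_a, term_b))
print(cosine_similarity(term_a, term_c))
```

Cosine similarity depends only on the direction of the vectors, not their length, which is why it is a common choice for comparing embeddings trained on corpora of different sizes.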


2021, Vol 2021, pp. 1-17
Author(s): Ting Tao, Qing Zhang, Zibo Liu, Ting Zhang, Lingyu Wang, ...

Polygonum cuspidatum (PC) has been reported to exert a potent antihyperlipidemic effect. However, its mechanisms of action and active ingredients remain elusive and require further research. In this study, we first conducted in vivo experiments to validate that Polygonum cuspidatum extract (PCE) could ameliorate the blood lipid level in hyperlipidemia model rats. Then, ultrahigh performance liquid chromatography coupled with Q-Exactive MS/MS (UPLC-QE-MS/MS) was applied to verify its 12 main active ingredients. A pharmacophore matching model was employed to predict the targets of the active ingredients, and 27 overlapping genes were identified via database and literature mining. The STRING online database and Cytoscape software were used to construct a protein-protein interaction (PPI) network, followed by functional annotation analysis and pathway enrichment analysis. The results showed that the PI3K/AKT signaling pathway and its downstream FOXO3/ERα factors were significantly enriched. Furthermore, in vitro experiments were performed to determine the lipid content and oxidative stress (OS) indicators in OA-induced HepG2 cells, and immunofluorescence and western blotting analyses were carried out to analyze the effects of PCE on related proteins. Our experimental results show that the antihyperlipidemic mechanism of PCE is related to the activation of the PI3K/AKT signaling pathway and its downstream FOXO3/ERα factors, and that polydatin and resveratrol are the main active ingredients in PCE that exert antihyperlipidemic effects.


2021
Author(s): Mohammed S. M. Almuslehi, Monokesh K. Sen, Peter J. Shortland, David A. Mahns, Jens R. Coorssen

A change in visual perception is a frequent early symptom of multiple sclerosis (MS), the pathoetiology of which remains unclear. Following a slow demyelination process caused by 12 weeks of low-dose (0.1%) cuprizone (CPZ) consumption, histology and proteomics were used to investigate components of the visual pathway in young adult mice. Histological investigation did not identify demyelination or gliosis in the optic tracts, pretectal nuclei, superior colliculi, lateral geniculate nuclei or visual cortices. However, top-down proteomic assessment of the optic nerve/tract revealed a significant change in the abundance of 34 spots in high-resolution 2D gels. Subsequent liquid chromatography-tandem mass spectrometry analysis identified alterations in 75 proteoforms. Literature mining revealed the relevance of these proteoforms in terms of proteins previously implicated in animal models and human MS. Importantly, 24 proteoforms were not previously described in any animal models of MS or MS itself. Bioinformatic analysis indicated involvement of these proteoforms in cytoskeleton organization, metabolic dysregulation, protein aggregation, and axonal support. Collectively, these results indicate that continuous CPZ-feeding, which evokes a slow demyelination, results in proteomic changes that precede any clear histological changes in the visual pathway and that these proteoforms may be potential early markers of degenerative demyelinating conditions.


Pharmaceutics, 2021, Vol 13 (12), pp. 2001
Author(s): Laura E. McCoubrey, Stavriani Thomaidou, Moe Elbadawi, Simon Gaisford, Mine Orlu, ...

Over 150 drugs are currently recognised as being susceptible to metabolism or bioaccumulation (together described as depletion) by gastrointestinal microorganisms; however, the true number is likely higher. Microbial drug depletion is often variable between and within individuals, depending on their unique composition of gut microbiota. Such variability can lead to significant differences in pharmacokinetics, which may be associated with dosing difficulties and lack of medication response. In this study, literature mining and unsupervised learning were used to curate a dataset of 455 drug–microbiota interactions. From this, 11 supervised learning models were developed that could predict drugs’ susceptibility to depletion by gut microbiota. The best model, a tuned extremely randomised trees classifier, achieved performance metrics of AUROC: 75.1% ± 6.8; weighted recall: 79.2% ± 3.9; balanced accuracy: 69.0% ± 4.6; and weighted precision: 80.2% ± 3.7 when validated on 91 drugs. This machine learning model is the first of its kind and provides a rapid, reliable, and resource-friendly tool for researchers and industry professionals to screen drugs for susceptibility to depletion by gut microbiota. The recognition of drug–microbiome interactions can support successful drug development and promote better formulations and dosage regimens for patients.
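
Two of the metrics reported above, balanced accuracy and weighted recall, are both derived from per-class recall: the first is the unweighted mean of the per-class recalls, the second weights each class by its number of true samples. The sketch below illustrates the definitions only; the function names and toy labels are hypothetical and the code is not the authors' evaluation pipeline.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return ({class: recall}, {class: support}) for the classes in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = {c: hits[c] / n for c, n in support.items()}
    return recalls, support


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recalls, _ = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to class support."""
    recalls, support = per_class_recall(y_true, y_pred)
    return sum(recalls[c] * support[c] for c in support) / len(y_true)


# Toy labels, invented for illustration (1 = depleted, 0 = not depleted).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]

print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.3f}")
print(f"weighted recall   = {weighted_recall(y_true, y_pred):.3f}")
```

On imbalanced data the two numbers diverge, because weighted recall is dominated by the larger class while balanced accuracy treats every class equally. That is one reason a paper may report both.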


Cells, 2021, Vol 10 (11), pp. 3169
Author(s): Ning Zhang, Yameng Wu, Yu Guo, Yu Sa, Qifeng Li, ...

In the field of glioma research, the broad availability of genetic and image information generated by computer technologies and the boom in biomedical publications have led to the advent of the big-data era. Machine learning methods have been applied as possible approaches to speed up data mining. In this article, we reviewed the present situation and future directions of machine learning applications in glioma within the context of workflows that integrate analyses for precision cancer care. Publicly available tools and algorithms for key machine learning technologies in literature mining for glioma clinical research were reviewed and compared. Further, existing machine learning solutions and their limitations in glioma prediction and diagnostics, such as overfitting and class imbalance, were critically analyzed.

