data browsing
Recently Published Documents


TOTAL DOCUMENTS

41
(FIVE YEARS 10)

H-INDEX

5
(FIVE YEARS 2)

2021 ◽  
Author(s):  
Luke Reilly ◽  
Lirong Peng ◽  
Erika Lara ◽  
Daniel Ramos ◽  
Michael Fernandopulle ◽  
...  

Fully automated proteomic pipelines have the potential to achieve deep coverage of cellular proteomes with high throughput and scalability. However, it is important to evaluate their performance, including both reproducibility and the ability to provide meaningful levels of biological insight. Here, we present an approach that combines a high-field asymmetric waveform ion mobility spectrometry (FAIMS) interface with data-independent acquisition (DIA) proteomics, developed as part of the induced pluripotent stem cell (iPSC) Neurodegenerative Disease Initiative (iNDI), a large-scale effort to understand how inherited diseases may manifest in neuronal cells. Our FAIMS-DIA approach identified more than 8000 proteins per mass spectrometry (MS) acquisition and showed superior total identification, reproducibility, and accuracy compared to other existing DIA methods. Next, we applied this approach to perform longitudinal proteomic profiling of the differentiation of iPSC-derived neurons from the KOLF2.1J parental line used in iNDI. This analysis demonstrated a steady increase in expression of mature cortical neuron markers over the course of neuron differentiation. We validated the performance of our proteomics pipeline by comparing it to single-cell RNA-Seq datasets obtained in parallel, confirming expression of key markers and cell type annotations. An interactive webapp of this temporal data is available for aligned-UMAP visualization and data browsing (https://share.streamlit.io/anant-droid/singlecellumap). In summary, we report an extensively optimized and validated proteomic pipeline that will be suitable for large-scale studies such as iNDI.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Chengkun Wu ◽  
Xinyi Xiao ◽  
Canqun Yang ◽  
JinXiang Chen ◽  
Jiacai Yi ◽  
...  

Abstract Background: Interactions of microbes and diseases are of great importance for biomedical research. However, a large number of microbe–disease interactions are hidden in the biomedical literature, and structured databases for these interactions are limited. In this paper, we aim to construct a large-scale database for microbe–disease interactions automatically. We attained this goal by applying text mining methods based on a deep learning model with a moderate curation cost. We also built a user-friendly web interface that allows researchers to navigate and query required information. Results: First, we manually constructed a golden-standard corpus and a silver-standard corpus (SSC) for microbe–disease interactions for curation. We then proposed a text mining framework for microbe–disease interaction extraction based on a pretrained model, BERE. We applied named entity recognition tools to detect microbe and disease mentions in free biomedical text. After that, we fine-tuned the pretrained model BERE, which was originally built for drug–target interactions or drug–drug interactions, to recognize relations between targeted entities. The introduction of SSC for model fine-tuning greatly improved detection performance for microbe–disease interactions, with an average reduction in error of approximately 10%. The MDIDB website offers data browsing, custom searching for specific diseases or microbes, and batch downloading. Conclusions: Evaluation results demonstrate that our method outperforms the baseline model (rule-based PKDE4J) with an average F1-score of 73.81%. For further validation, we randomly sampled nearly 1000 interactions predicted by our model and manually checked the correctness of each interaction, which gave an accuracy of 73%. The MDIDB website is freely available through http://dbmdi.com/index/


2021 ◽  
Author(s):  
Chengkun Wu ◽  
Xinyi Xiao ◽  
Canqun Yang ◽  
JinXiang Chen ◽  
Jiacai Yi ◽  
...  

Abstract Background: Interactions of microbes and diseases are of great importance for biomedical research. However, large-scale curated databases for microbe-disease interactions are missing, as the amount of related literature is enormous and the curation process is costly and time-consuming. In this paper, we aim to construct a large-scale database for microbe-disease interactions automatically. We attained this goal by applying text mining methods based on a deep learning model with a moderate curation cost. We also built a user-friendly web interface to allow researchers to navigate and query desired information. Results: For curation, we manually constructed a golden-standard corpus (GSC) and a silver-standard corpus (SSC) for microbe-disease interactions. Then we proposed a text mining framework for microbe-disease interaction extraction without having to build a model from scratch. First, we applied named entity recognition (NER) tools to detect microbe and disease mentions in texts. Then we transferred a deep learning model, BERE, which was originally built for drug-target interactions or drug-drug interactions, to recognize relations between entities. The introduction of SSC for model fine-tuning greatly improves the performance of detection for microbe-disease interactions, with an average reduction in error of approximately 10%. The resulting MDIDB website offers data browsing, custom search for specific diseases or microbes, and batch download. Conclusions: Evaluation results demonstrate that our method outperforms the baseline model (rule-based PKDE4J) with an average F1-score of 73.81%. For further validation, we randomly sampled nearly 1,000 interactions predicted by our model and manually checked the correctness of each interaction, which gave an accuracy of 73%. The MDIDB website is freely available through http://dbmdi.com/index/


2020 ◽  
Vol 49 (D1) ◽  
pp. D86-D91
Author(s):  
Bailing Zhou ◽  
Baohua Ji ◽  
Kui Liu ◽  
Guodong Hu ◽  
Fei Wang ◽  
...  

Abstract Long non-coding RNAs (lncRNAs) play important functional roles in many diverse biological processes. However, not all expressed lncRNAs are functional. Thus, it is necessary to manually collect all experimentally validated functional lncRNAs (EVlncRNA) with their sequences, structures, and functions annotated in a central database. The first release of such a database (EVLncRNAs) was made using the literature prior to 1 May 2016. Since then (until 15 May 2020), 19 245 articles related to lncRNAs have been published. In EVLncRNAs 2.0, these articles were manually examined for a major expansion of the data collected. Specifically, the numbers of annotated EVlncRNAs, associated diseases, lncRNA-disease associations, and interaction records were increased by 260%, 320%, 484% and 537%, respectively. Moreover, the database has added several new categories: 8 lncRNA structures, 33 exosomal lncRNAs, 188 circular RNAs, and 1079 drug-resistant, chemoresistant, and stress-resistant lncRNAs. All records have been checked against known retracted and fake articles. This release also comes with a highly interactive visual interaction network that helps users track the underlying relations among lncRNAs, miRNAs, proteins, genes and other functional elements. Furthermore, it provides links to four new bioinformatics tools with improved data browsing and searching functionality. EVLncRNAs 2.0 is freely available at https://www.sdklab-biophysics-dzu.net/EVLncRNAs2/.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Henry Loeffler-Wirth ◽  
Jasmin Reikowski ◽  
Siras Hakobyan ◽  
Jonas Wagner ◽  
Hans Binder

Abstract Background: oposSOM is a comprehensive, machine learning based open-source data analysis software combining functionalities such as diversity analyses, biomarker selection, function mining, and visualization. Results: These functionalities are now available as an interactive web-browser application for a broader user audience interested in extracting detailed information from high-throughput omics data sets pre-processed by oposSOM. It enables interactive browsing of single-gene and gene set profiles, of molecular ‘portrait landscapes’, of associated phenotype diversity, and of signalling pathway activation patterns. Conclusion: The oposSOM-Browser makes available interactive data browsing for five transcriptome data sets of cancer (melanomas, B-cell lymphomas, gliomas) and of peripheral blood (sepsis and healthy individuals) at www.izbi.uni-leipzig.de/opossom-browser.


2020 ◽  
Vol 39 (3) ◽  
pp. 469-481
Author(s):  
Chao Ma ◽  
Ye Zhao ◽  
Shamal AL‐Dohuki ◽  
Jing Yang ◽  
Xinyue Ye ◽  
...  

2020 ◽  
Vol 19 (8) ◽  
pp. 1396-1408 ◽  
Author(s):  
Veit Schwämmle ◽  
Christina E. Hagensen ◽  
Adelina Rogowska-Wrzesinska ◽  
Ole N. Jensen

Statistical testing remains one of the main challenges for high-confidence detection of differentially regulated proteins or peptides in large-scale quantitative proteomics experiments by mass spectrometry. Statistical tests need to be sufficiently robust to deal with experiment-intrinsic data structures and variations, and often also with reduced feature coverage across different biological samples due to ubiquitous missing values. A robust statistical test provides accurate confidence scores for large-scale proteomics results, regardless of instrument platform, experimental protocol and software tools. However, the multitude of different combinations of experimental strategies, mass spectrometry techniques and informatics methods complicates the choice of appropriate statistical approaches. We address this challenge by introducing PolySTest, a user-friendly web service for statistical testing, data browsing and data visualization. We introduce a new method, Miss test, that simultaneously tests for missingness and feature abundance, thereby complementing common statistical tests by rescuing otherwise discarded data features. We demonstrate that PolySTest with integrated Miss test achieves higher confidence and higher sensitivity for artificial and experimental proteomics data sets with known ground truth. Application of PolySTest to mass spectrometry based large-scale proteomics data obtained from differentiating muscle cells resulted in the rescue of 10–20% additional proteins in the identified molecular networks relevant to muscle differentiation. We conclude that PolySTest is a valuable addition to existing tools and instrument enhancements that improve coverage and depth of large-scale proteomics experiments. A fully functional demo version of PolySTest and Miss test is available via http://computproteomics.bmb.sdu.dk/Apps/PolySTest.


Information ◽  
2019 ◽  
Vol 10 (10) ◽  
pp. 310 ◽  
Author(s):  
Ronzhin ◽  
Folmer ◽  
Maria ◽  
Brattinga ◽  
Beek ◽  
...  

After more than a decade, the supply-driven approach to publishing public (open) data has resulted in an ever-growing number of data silos. Hundreds of thousands of datasets have been catalogued and can be accessed at data portals at different administrative levels. However, usually, users do not think in terms of datasets when they search for information. Instead, they are interested in information that is most likely scattered across several datasets. In the world of proprietary in-company data, organizations invest heavily in connecting data in knowledge graphs and/or store data in data lakes with the intention of having an integrated view of the data for analysis. With the rise of machine learning, it is a common belief that governments can improve their services, for example, by allowing citizens to get answers related to government information from virtual assistants like Alexa or Siri. To provide high-quality answers, these systems need to be fed with knowledge graphs. In this paper, we share our experience of constructing and using the first open government knowledge graph in the Netherlands. Based on the developed demonstrators, we elaborate on the value of having such a graph and demonstrate its use in the context of improved data browsing, multicriteria analysis for urban planning, and the development of location-aware chat bots.


2019 ◽  
Vol 48 (D1) ◽  
pp. D226-D232 ◽  
Author(s):  
Yanbo Yang ◽  
Qiong Zhang ◽  
Ya-Ru Miao ◽  
Jiajun Yang ◽  
Wenqian Yang ◽  
...  

Abstract Alternative polyadenylation (APA) is an important post-transcriptional regulatory mechanism that recognizes different polyadenylation signals (PASs), resulting in transcripts with different 3′ untranslated regions, thereby influencing a series of biological processes and functions. Recent studies have revealed that some single nucleotide polymorphisms (SNPs) could contribute to tumorigenesis and development by dysregulating APA. However, the associations between SNPs and APA in human cancers remain largely unknown. Here, using genotype and APA data of 9082 samples from The Cancer Genome Atlas (TCGA) and The Cancer 3′UTR Atlas (TC3A), we systematically identified SNPs affecting APA events across 32 cancer types and defined them as APA quantitative trait loci (apaQTLs). As a result, a total of 467 942 cis-apaQTLs and 30 721 trans-apaQTLs were identified. By integrating apaQTLs with survival and genome-wide association studies (GWAS) data, we further identified 2154 apaQTLs associated with patient survival time and 151 342 apaQTLs located in GWAS loci. In addition, we designed an online tool to predict the effects of SNPs on PASs using a PAS motif prediction tool. Finally, we developed SNP2APA, a user-friendly and intuitive database (http://gong_lab.hzau.edu.cn/SNP2APA/) for data browsing, searching, and downloading. SNP2APA will significantly improve our understanding of genetic variants and APA in human cancers.


2019 ◽  
Author(s):  
Veit Schwämmle ◽  
Christina E Hagensen ◽  
Adelina Rogowska-Wrzesinska ◽  
Ole N. Jensen

Abstract Statistical testing remains one of the main challenges for high-confidence detection of differentially regulated proteins or peptides in large-scale quantitative proteomics experiments by mass spectrometry. Statistical tests need to be sufficiently robust to deal with experiment-intrinsic data structures and variations, and often also with reduced feature coverage across different biological samples due to ubiquitous missing values. A robust statistical test provides accurate confidence scores for large-scale proteomics results, regardless of instrument platform, experimental protocol and software tools. However, the multitude of different combinations of experimental strategies, mass spectrometry techniques and informatics methods complicates the choice of appropriate statistical approaches. We address this challenge by introducing PolySTest, a user-friendly web service for statistical testing, data browsing and data visualization. We introduce a new method, Miss Test, that simultaneously tests for missingness and feature abundance, thereby complementing common statistical tests by rescuing otherwise discarded data features. We demonstrate that PolySTest with integrated Miss Test achieves higher confidence and higher sensitivity for artificial and experimental proteomics data sets with known ground truth. Application of PolySTest to mass spectrometry based large-scale proteomics data obtained from differentiating muscle cells resulted in the rescue of 10%-20% additional proteins in the identified molecular networks relevant to muscle differentiation. We conclude that PolySTest is a valuable addition to existing tools and instrument enhancements that improve coverage and depth of large-scale proteomics experiments. A fully functional demo version of PolySTest and Miss Test is available via http://computproteomics.bmb.sdu.dk/Apps/PolySTest.

