Faculty Opinions recommendation of Thylakoid proteome modulation in pea plants grown at different irradiances: quantitative proteomic profiling in a non-model organism aided by transcriptomic data integration.

Transcriptomic data is often more expensive and difficult to generate in large cohorts than genomic data. It is therefore often important to integrate multiple transcriptomic datasets, from both microarray and next-generation sequencing (NGS) platforms, across similar experiments or clinical trials to improve analytical power and the discovery of novel transcripts and genes. However, transcriptomic data integration presents several challenges, including re-annotation and batch effect removal. We developed the Gene Expression Data Integration (GEDI) R package to enable transcriptomic data integration by combining existing R packages. With just four functions, the GEDI R package makes constructing a transcriptomic data integration pipeline straightforward. Together, the functions overcome the complications of transcriptomic data integration by automatically re-annotating the data and removing the batch effect. The removal of the batch effect is verified with Principal Component Analysis, and the data integration is verified using a logistic regression model with forward stepwise feature selection. To demonstrate the functionality of the GEDI package, we integrated five bovine endometrial transcriptomic datasets from the NCBI Gene Expression Omnibus. The datasets included Affymetrix, Agilent, and RNA-sequencing data. Furthermore, we compared the GEDI package to existing tools and found that GEDI is the only tool that provides a full transcriptomic data integration pipeline, including verification of both batch effect removal and data integration.

Interpretation of psychiatric genome-wide association studies with multispecies heterogeneous functional genomic data integration

Neuropsychopharmacology ◽

10.1038/s41386-020-00795-5 ◽

2020 ◽

Vol 46 (1) ◽

pp. 86-97 ◽

Cited By ~ 1

Author(s):

Timothy Reynolds ◽

Emma C. Johnson ◽

Spencer B. Huggett ◽

Jason A. Bubier ◽

Rohan H. C. Palmer ◽

...

Keyword(s):

Data Integration ◽

Genetic Variants ◽

Association Studies ◽

Model Organism ◽

Genomic Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Behavioral Traits ◽

Genome Wide ◽

Advanced Analysis

Abstract: Genome-wide association studies and other discovery genetics methods provide a means to identify previously unknown biological mechanisms underlying behavioral disorders that may point to new therapeutic avenues, augment diagnostic tools, and yield a deeper understanding of the biology of psychiatric conditions. Recent advances in psychiatric genetics have been made possible through large-scale collaborative efforts. These studies have begun to unearth many novel genetic variants associated with psychiatric disorders and behavioral traits in human populations. Significant challenges remain in characterizing the resulting disease-associated genetic variants and prioritizing functional follow-up to make them useful for mechanistic understanding and development of therapeutics. Model organism research has generated extensive genomic data that can provide insight into the neurobiological mechanisms of variant action, but a cohesive effort must be made to establish which aspects of the biological modulation of behavioral traits are evolutionarily conserved across species. Scalable computing, new data integration strategies, and advanced analysis methods outlined in this review provide a framework to efficiently harness model organism data in support of clinically relevant psychiatric phenotypes.

Assessing key decisions for transcriptomic data integration in biochemical networks

PLoS Computational Biology ◽

10.1371/journal.pcbi.1007185 ◽

2019 ◽

Vol 15 (7) ◽

pp. e1007185 ◽

Cited By ~ 12

Author(s):

Anne Richelle ◽

Chintan Joshi ◽

Nathan E. Lewis

Keyword(s):

Data Integration ◽

Biochemical Networks ◽

Transcriptomic Data

Proteomic profiling and integrated analysis with transcriptomic data bring new insights in the stress responses of Kluyveromyces marxianus after an arrest during high-temperature ethanol fermentation

Biotechnology for Biofuels ◽

10.1186/s13068-019-1390-2 ◽

2019 ◽

Vol 12 (1) ◽

Cited By ~ 6

Author(s):

Pengsong Li ◽

Xiaofen Fu ◽

Ming Chen ◽

Lei Zhang ◽

Shizhong Li

Keyword(s):

High Temperature ◽

Stress Responses ◽

Ethanol Fermentation ◽

Kluyveromyces Marxianus ◽

Integrated Analysis ◽

Proteomic Profiling ◽

Transcriptomic Data

Machado: Open source genomics data integration framework

GigaScience ◽

10.1093/gigascience/giaa097 ◽

2020 ◽

Vol 9 (9) ◽

Cited By ~ 1

Author(s):

Mauricio de Alvarenga Mudadu ◽

Adhemar Zerlotini

Keyword(s):

Data Integration ◽

Open Source ◽

Model Organism ◽

Database Schema ◽

Generic Model ◽

Model Organism Database ◽

Integration Framework ◽

Object Relational ◽

Transcriptomics Data ◽

Relational Framework

Abstract. Background: Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge. All this information is supposed to be accessible and, if possible, browsable afterwards. Computational biologists have been dealing with this scenario for more than a decade and have been implementing software and databases to meet this challenge. The GMOD's (Generic Model Organism Database) biological relational database schema, known as Chado, is one of the few successful open source initiatives; it is widely adopted and many software packages are able to connect to it. Findings: We have been developing an open source software package named Machado, a genomics data integration framework implemented in Python, to enable research groups to both store and visualize genomics data. The framework relies on the Chado database schema and, therefore, should be very intuitive for current developers to adopt it or have it running on top of already existing databases. It has several data-loading tools for genomics and transcriptomics data and also for annotation results from tools such as BLAST, InterproScan, OrthoMCL, and LSTrAP. There is an API to connect to JBrowse, and a web visualization tool is implemented using Django Views and Templates. The Haystack library integrated with the ElasticSearch engine was used to implement a Google-like search, i.e., single auto-complete search box that provides fast results and filters. Conclusion: Machado aims to be a modern object-relational framework that uses the latest Python libraries to produce an effective open source resource for genomics research.

Machado: open source genomics data integration framework

10.1101/2020.05.08.084731 ◽

2020 ◽

Author(s):

Mauricio de Alvarenga Mudadu ◽

Adhemar Zerlotini

Keyword(s):

Data Integration ◽

Open Source ◽

Model Organism ◽

Database Schema ◽

Generic Model ◽

Model Organism Database ◽

Integration Framework ◽

Object Relational ◽

Transcriptomics Data ◽

Relational Framework

Abstract. Background: Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge. All this information is supposed to be accessible and, if possible, browsable afterwards. Computational biologists have been dealing with this scenario for over a decade and have been implementing software libraries, toolkits, platforms, and databases to meet this challenge. The GMOD's (Generic Model Organism Database project) biological relational database schema, known as Chado, is one of the few successful open source initiatives; it is widely adopted and many software packages are able to connect to it. Results: We have been developing an open source software package named Machado (https://github.com/lmb-embrapa/machado), a genomics data integration framework implemented in Python, to enable research groups to store, browse, query, and visualize genomics data. The framework relies on the Chado database schema and, therefore, should be very intuitive for current developers to adopt it or have it running on top of already existing databases. It has several data-loading tools for genomics and transcriptomics data and also for annotation results from tools such as BLAST, InterproScan, OrthoMCL, and LSTrAP. There is an API to connect to JBrowse, and a web browsing visualization tool is implemented using Django Views and Templates. The Haystack library integrated with the ElasticSearch engine was used to implement a Google-like search, i.e., a single auto-complete search box that provides fast results and incremental filters. Conclusion: Machado aims to be a modern object-relational framework that uses the latest Python libraries to produce an effective open source resource for genomics research.
