A semi-variance approach to visualising phylogenetic autocorrelation

2021 ◽  
Author(s):  
Michael J Noonan ◽  
William F Fagan ◽  
Christen Herbert Fleming

Comparing traits across species has been a hallmark of biological research for centuries. While inter-specific comparisons can be highly informative, phylogenetic inertia can bias estimates if not properly accounted for in comparative analyses. In response, researchers typically treat phylogenetic inertia as a form of autocorrelation that can be detected, modelled, and corrected for. Despite the range of methods available for quantifying the strength of phylogenetic autocorrelation, no tools exist for visualising these autocorrelation structures. Here we derive variogram methods suitable for phylogenetic data, and show how they can be used to straightforwardly visualise phylogenetic autocorrelation. We then demonstrate their utility for three empirical examples: sexual size dimorphism (SSD) in the Musteloidea, maximum per capita rate of population growth, r, in the Carnivora, and brain size in the Artiodactyla. When modelling musteloid SSD, the empirical variogram showed a tendency for the variance in SSD to stabilise over time, a characteristic feature of Ornstein-Uhlenbeck (OU) evolution. In agreement with this visual assessment, model selection identified the OU model as the best fit to the data. In contrast, the infinitely diffusive Brownian Motion (BM) model did not capture the asymptotic behaviour of the variogram and was less supported than the OU model. Phylogenetic variograms proved equally useful in understanding why an OU model was selected when modelling r in the Carnivora, and why BM was the selected evolutionary model for brain size in the Artiodactyla. Because the variograms of the various evolutionary processes each have different theoretical profiles, comparing fitted semi-variance functions against empirical semi-variograms can serve as a useful diagnostic tool, allowing researchers to understand why any given evolutionary model might be selected over another, which features are well captured by the model, and which are not.
This allows for fitted models to be compared against the empirical variogram, facilitating model identification prior to subsequent analyses. We therefore recommend that any phylogenetic analysis begin with a non-parametric estimate of the autocorrelation structure of the data that can be visualized. The methods developed in this work are openly available in the new R package ctpm.
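As a rough illustration of the idea of an empirical semi-variogram on phylogenetic data, the sketch below bins species pairs by phylogenetic distance and averages half the squared trait difference within each bin. It is not the estimator implemented in ctpm; the function name `empirical_semivariogram`, the unweighted binning scheme, and the toy trait values and distance matrix are all assumptions made for illustration.

```python
from itertools import combinations


def empirical_semivariogram(traits, dist, bin_width):
    """Return {bin midpoint: mean semi-variance} over all species pairs.

    traits: one trait value per species (hypothetical toy data below).
    dist:   symmetric matrix of pairwise phylogenetic distances.
    """
    sums, counts = {}, {}
    for i, j in combinations(range(len(traits)), 2):
        b = int(dist[i][j] // bin_width)          # distance bin index
        sv = 0.5 * (traits[i] - traits[j]) ** 2   # semi-variance of the pair
        sums[b] = sums.get(b, 0.0) + sv
        counts[b] = counts.get(b, 0) + 1
    return {(b + 0.5) * bin_width: sums[b] / counts[b] for b in sorted(sums)}


# Toy example: two pairs of close relatives separated by a deep split.
traits = [1.0, 1.2, 2.0, 2.4]
dist = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 2],
    [6, 6, 2, 0],
]
variogram = empirical_semivariogram(traits, dist, bin_width=2.0)
for lag, gamma in variogram.items():
    print(f"lag {lag}: semi-variance {gamma:.3f}")
```

Plotting such values against lag is what would let a reader judge by eye whether the semi-variance keeps growing (as under BM) or levels off (as under OU).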

Author(s):  
Shuo Wang ◽  
Wei Su ◽  
Chuanfan Zhong ◽  
Taowei Yang ◽  
Wenbin Chen ◽  
...  

Prostate cancer (PCa) is a high-morbidity malignancy in males, and biochemical recurrence (BCR) may appear after surgery. Our study was designed to build a risk score model for PCa using circular RNA sequencing data. The dataset is from the GEO database, using a cohort of 144 patients in Canada. We removed the low-abundance circRNAs (FPKM < 1) and obtained 546 circRNAs for the next step. BCR-related circRNAs were selected by logistic regression using the “survival” and “survminer” R packages. Least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation and penalty was used to construct a risk score model with the “glmnet” R package. In total, eight circRNAs (circ_30029, circ_117300, circ_176436, circ_112897, circ_178252, circ_115617, circ_14736, and circ_17720) were involved in our risk score model. Further, we identified differentially expressed mRNAs between high and low risk score groups. The subsequent Gene Ontology (GO) analysis was visualized with Omicshare online tools. According to the GO analysis results, pathways related to the tumor immune microenvironment are significantly enriched. The “CIBERSORT” and “ESTIMATE” R packages were used to detect tumor-infiltrating immune cells and compare the level of microenvironment scores between high and low risk score groups. In addition, we verified the circular characteristics of two of the eight circRNAs (circ_14736 and circ_17720) and tested their biological function with qPCR and CCK8 in vitro. circ_14736 and circ_17720 were detected in exosomes of PCa patients’ plasma. This is the first bioinformatics study to establish a prognosis model for prostate cancer using circRNA. These circRNAs were associated with CD8+ T cell activities and may serve as a circRNA-based liquid biopsy panel for disease prognosis.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4718 ◽  
Author(s):  
Wouter van der Bijl

Confirmatory path analysis allows researchers to evaluate and compare causal models using observational data. This tool has great value for comparative biologists since they are often unable to gather experimental data on macro-evolutionary hypotheses, but is cumbersome and error-prone to perform. I introduce phylopath, an R package that implements phylogenetic path analysis (PPA) as described by von Hardenberg & Gonzalez-Voyer (2013). In addition to the published method, I provide support for the inclusion of binary variables. I illustrate PPA and phylopath by recreating part of a study on the relationship between brain size and vulnerability to extinction. The package aims to make the analysis straightforward, providing convenience functions and several plotting methods, which I hope will encourage the spread of the method.


Author(s):  
Xiaoyu Liang ◽  
Ying Hu ◽  
Chunhua Yan ◽  
Ke Xu

Motivation: High-quality imaging analyses have been proposed to drive innovation in biomedical and biological research. However, the application of images remains underexploited because of the limited capacity of human vision and the challenges in extracting quantitative information from images. Computationally extracting quantitative information from images is critical to overcoming this limitation. Here, we present a novel R package, i2d, to simulate data from an image based on digital convolution. Results: The R package i2d allows users to transform an image into a simulated dataset that can be used to extract and analyze complex information in biomedical and biological research. The package also includes three novel and efficient methods for graph clustering based on simulated data, which can be used to dissect complex gene networks into sub-clusters that have similar biological functions. Availability and implementation: The code, the documentation, a tutorial and example data are available as open source at: github.com/XiaoyuLiang/i2d. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Kuan-Hao Chao ◽  
Yi-Wen Hsiao ◽  
Yi-Fang Lee ◽  
Chien-Yueh Lee ◽  
Liang-Chuan Lai ◽  
...  

RNA-Seq analysis has revolutionized researchers' understanding of the transcriptome in biological research. Assessing the differences in transcriptomic profiles between tissue samples or patient groups enables researchers to explore the underlying biological impact of transcription. RNA-Seq analysis requires multiple processing steps and huge computational capabilities. There are many well-developed R packages for individual steps; however, there are few R/Bioconductor packages that integrate existing software tools into a comprehensive RNA-Seq analysis and provide fundamental end-to-end results in pure R environment so that researchers can quickly and easily get fundamental information in big sequencing data. To address this need, we have developed the open source R/Bioconductor package, RNASeqR. It allows users to run an automated RNA-Seq analysis with only six steps, producing essential tabular and graphical results for further biological interpretation. The features of RNASeqR include: six-step analysis, comprehensive visualization, background execution version, and the integration of both R and command-line software. RNASeqR provides fast, light-weight, and easy-to-run RNA-Seq analysis pipeline in pure R environment. It allows users to efficiently utilize popular software tools, including both R/Bioconductor and command-line tools, without predefining the resources or environments. RNASeqR is freely available for Linux and macOS operating systems from Bioconductor (https://bioconductor.org/packages/release/bioc/html/RNASeqR.html).


2020 ◽  
Author(s):  
George G. Vega Yon ◽  
Duncan C. Thomas ◽  
John Morrison ◽  
Huaiyu Mi ◽  
Paul D. Thomas ◽  
...  

Motivation: Gene function annotation is important for a variety of downstream analyses of genetic data. Yet experimental characterization of function remains costly and slow, making computational prediction an important endeavor. In this paper we use a probabilistic evolutionary model built upon phylogenetic trees and experimental Gene Ontology functional annotations that allows automated prediction of function for unannotated genes. Results: We have developed a computationally efficient model of evolution of gene annotations using phylogenies based on a Bayesian framework using Markov Chain Monte Carlo for parameter estimation. Unlike previous approaches, our method is able to estimate parameters over many different phylogenetic trees and functions. The resulting parameters agree with biological intuition, such as the increased probability of function change following gene duplication. The method performs well on leave-one-out validation, and we further validated some of the predictions in the experimental scientific literature. Availability: Our method has been implemented as an R package and it is available online at https://github.com/USCBiostats/aphylo. Code needed to reproduce the tables and figures can be found in https://github.com/USCbiostats/aphylo-simulations. Author summary: Understanding the individual role that genes play in life is a key issue in biomedical sciences. While information regarding gene functions is continuously growing, the number of genes with unknown biological purpose is yet greater. Because of this, scientists have dedicated much of their time to building and designing tools that automatically infer gene functions. In this paper, we present yet another attempt to do so.
While very simple, our model of gene-function evolution has some key features that have the potential to generate an impact in the field: (a) compared to other methods, ours is highly scalable, which means that it is possible to simultaneously analyze hundreds of what are known as gene families, comprising thousands of genes, (b) it supports our biological intuition, as our model’s data-driven results coherently agree with what theory dictates regarding how gene functions evolved, (c) notwithstanding its simplicity, the model’s prediction accuracy is comparable to other more complex alternatives, and (d) perhaps most importantly, it can be used both to support new annotations and to suggest areas in which existing annotations show inconsistencies that may indicate errors or controversies.


2015 ◽  
Vol 1092-1093 ◽  
pp. 1317-1325
Author(s):  
Jin Da Qi ◽  
Wei Li ◽  
Wei Cong Fu ◽  
Jian Wen Dong ◽  
Shuang Yi Lin

This paper uses Qishan National Forest Park as a sample, applying step analysis and cluster analysis to 14 attractions in the park using GIS spatial analysis functions. More precisely, based on planar space theory, visibility, continuity, clarity, comfort and other six factors were selected for analysis. The results show that one attraction has the best landscape resources, six attractions have better landscape resources, two attractions are average and five are poor. The results are used to verify the feasibility of a landscape visual assessment model based on GIS technology. Furthermore, this would also provide technical support for the visual landscape assessment of forest parks.


2019 ◽  
Vol 35 (20) ◽  
pp. 4196-4199 ◽  
Author(s):  
David S Robertson ◽  
Jan Wildenhain ◽  
Adel Javanmard ◽  
Natasha A Karp

Summary: In many areas of biological research, hypotheses are tested in a sequential manner, without having access to future P-values or even the number of hypotheses to be tested. A key setting where this online hypothesis testing occurs is in the context of publicly available data repositories, where the family of hypotheses to be tested is continually growing as new data is accumulated over time. Recently, Javanmard and Montanari proposed the first procedures that control the FDR for online hypothesis testing. We present an R package, onlineFDR, which implements these procedures and provides wrapper functions to apply them to a historic dataset or a growing data repository. Availability and implementation: The R package is freely available through Bioconductor (http://www.bioconductor.org/packages/onlineFDR). Supplementary information: Supplementary data are available at Bioinformatics online.
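To make the online setting concrete, here is a minimal sketch of one such procedure, LOND ("levels based on number of discoveries"), in which the i-th test level is a fixed weight multiplied by one plus the number of earlier discoveries. It does not use or mirror the onlineFDR package interface; the function name `lond`, the particular weight sequence (proportional to 1/i², scaled to sum to alpha) and the example P-values are assumptions chosen for illustration.

```python
import math


def lond(p_values, alpha=0.05):
    """Sequentially test p_values; return a list of reject decisions.

    The weights beta_i = alpha * 6 / (pi^2 * i^2) sum to alpha over all i
    (an assumed choice; any non-negative sequence summing to alpha works).
    """
    decisions = []
    discoveries = 0
    for i, p in enumerate(p_values, start=1):
        beta_i = alpha * 6.0 / (math.pi ** 2 * i ** 2)
        level = beta_i * (discoveries + 1)   # level grows with past discoveries
        reject = p <= level
        decisions.append(reject)
        discoveries += int(reject)
    return decisions


# Hypothetical stream of P-values arriving one at a time.
decisions = lond([0.001, 0.2, 0.0005, 0.04, 0.003])
print(decisions)
```

Each decision depends only on the current P-value and on past decisions, which is what allows the procedure to run on a repository that keeps growing.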


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7255
Author(s):  
Mahmoud Ahmed ◽  
Trang Huyen Lai ◽  
Deok Ryong Kim

Background: The co-localization analysis of fluorescence microscopy images is a widely used technique in biological research. It is often used to determine the co-distribution of two proteins inside the cell, suggesting that these two proteins could be functionally or physically associated. The limiting step in conducting microscopy image analysis in a graphical interface tool is the selection of the regions of interest for the co-localization of two proteins. Implementation: This package provides a simple, straightforward workflow for loading fluorescence images, choosing regions of interest and calculating co-localization measurements. Included in the package is a shiny app that can be invoked locally to interactively select the regions of interest where two proteins are co-localized. Availability: colocr is available on the Comprehensive R Archive Network, and the source code is available on GitHub under the GPL-3 license as part of the ROpenSci collection, https://github.com/ropensci/colocr.
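One common co-localization measurement is the Pearson correlation between the two channel intensities over the pixels of a region of interest. The sketch below illustrates that general idea only; the abstract does not say which statistics colocr reports, and the helper names `roi_pixels` and `pearson`, the binary mask representation and the tiny images are assumptions made for this example.

```python
import math


def roi_pixels(image, mask):
    """Flatten the pixels of `image` where `mask` is truthy."""
    return [px for row, mrow in zip(image, mask)
            for px, keep in zip(row, mrow) if keep]


def pearson(ch1, ch2):
    """Pearson correlation coefficient of two equal-length intensity lists."""
    n = len(ch1)
    m1, m2 = sum(ch1) / n, sum(ch2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(ch1, ch2))
    var1 = sum((a - m1) ** 2 for a in ch1)
    var2 = sum((b - m2) ** 2 for b in ch2)
    return cov / math.sqrt(var1 * var2)


# Hypothetical 2x3 two-channel image; the mask selects the left two columns.
red = [[10, 20, 0], [30, 40, 0]]
green = [[12, 22, 90], [33, 41, 5]]
mask = [[1, 1, 0], [1, 1, 0]]

pcc = pearson(roi_pixels(red, mask), roi_pixels(green, mask))
print(f"Pearson correlation inside ROI: {pcc:.3f}")
```

Restricting the calculation to the mask is the reason region selection matters: pixels outside the region, such as the bright green pixel excluded here, would otherwise distort the coefficient.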


2018 ◽  
Author(s):  
Haley R. Eidem ◽  
Jacob Steenwyk ◽  
Jennifer Wisecaver ◽  
John A. Capra ◽  
Patrick Abbot ◽  
...  

Background: The integration of high-quality, genome-wide analyses offers a robust approach to elucidating genetic factors involved in complex human diseases. Even though several methods exist to integrate heterogeneous omics data, most biologists still manually select candidate genes by examining the intersection of lists of candidates stemming from analyses of different types of omics data that have been generated by imposing hard (strict) thresholds on quantitative variables, such as P-values and fold changes, increasing the chance of missing potentially important candidates. Methods: To better facilitate the unbiased integration of heterogeneous omics data collected from diverse platforms and samples, we propose a desirability function framework for identifying candidate genes with strong evidence across data types as targets for follow-up functional analysis. Our approach is targeted towards disease systems with sparse, heterogeneous omics data, so we tested it on one such pathology: spontaneous preterm birth (sPTB). Results: We developed the software integRATE, which uses desirability functions to rank genes both within and across studies, identifying well-supported candidate genes according to the cumulative weight of biological evidence rather than based on imposition of hard thresholds of key variables. Integrating 10 sPTB omics studies identified both genes in pathways previously suspected to be involved in sPTB as well as novel genes never before linked to this syndrome. integRATE is available as an R package on GitHub (https://github.com/haleyeidem/integRATE). Conclusions: Desirability-based data integration is a solution most applicable in biological research areas where omics data is especially heterogeneous and sparse, allowing for the prioritization of candidate genes that can be used to inform more targeted downstream functional analyses.
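The contrast between hard thresholds and desirability-based ranking can be sketched as follows. This is an illustrative toy, not the integRATE implementation: the linear "low is best" desirability function, the arithmetic mean used to combine studies, the function names and the gene P-values are all assumptions.

```python
def desirability_low(x, low, high):
    """Map x to [0, 1]: 1 at or below `low`, 0 at or above `high`, linear between."""
    if x <= low:
        return 1.0
    if x >= high:
        return 0.0
    return (high - x) / (high - low)


def rank_genes(pvals_by_gene):
    """Rank genes by mean desirability of their per-study P-values."""
    scores = {
        gene: sum(desirability_low(p, 0.0, 1.0) for p in pvals) / len(pvals)
        for gene, pvals in pvals_by_gene.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical P-values for three genes in two studies.
pvals_by_gene = {
    "geneA": [0.06, 0.001],   # misses a hard 0.05 cut-off in study 1
    "geneB": [0.04, 0.04],
    "geneC": [0.5, 0.01],
}
ranking, scores = rank_genes(pvals_by_gene)
print(ranking)
```

With a hard cut-off of 0.05 in both studies, geneA would drop out of the intersection despite strong evidence in the second study; scoring every value on a continuous scale keeps it in the ranking.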

