Prediction With Dimension Reduction of Multiple Molecular Data Sources for Patient Survival

2017 ◽  
Vol 16 ◽  
pp. 117693511771851 ◽  
Author(s):  
Adam Kaplan ◽  
Eric F Lock

Predictive modeling from high-dimensional genomic data is often preceded by a dimension reduction step, such as principal component analysis (PCA). However, the application of PCA is not straightforward for multisource data, wherein multiple sources of ‘omics data measure different but related biological components. In this article, we use recent advances in the dimension reduction of multisource data for predictive modeling. In particular, we apply exploratory results from Joint and Individual Variation Explained (JIVE), an extension of PCA for multisource data, for prediction of differing response types. We conduct simulations to illustrate the practical advantages and interpretability of our approach. As an application example, we consider predicting survival for patients with glioblastoma multiforme from 3 data sources measuring messenger RNA expression, microRNA expression, and DNA methylation. We also introduce a method to estimate JIVE scores for new samples that were not used in the initial dimension reduction and study its theoretical properties; this method is implemented in the R package R.JIVE on CRAN, in the function jive.predict.
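The idea of scoring new samples against a reduction fitted on other data can be illustrated with ordinary PCA. The sketch below is not JIVE and does not reproduce jive.predict; it is a minimal single-source analogue, assuming the loading vector and column means are fixed from the training data and new samples are simply centered and projected. All function names and the toy data are hypothetical.

```python
import math

def center(rows, means=None):
    """Subtract column means; return the centered rows and the means used."""
    if means is None:
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows], means

def first_loading(centered, iters=200):
    """Leading eigenvector of the sample covariance, found by power iteration."""
    n, p = len(centered), len(centered[0])
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # assumed starting vector; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def scores(rows, means, loading):
    """Project rows onto a fixed loading after centering with the training means."""
    centered, _ = center(rows, means)
    return [sum(x * l for x, l in zip(row, loading)) for row in centered]

# Toy training data (made up): the second feature is roughly twice the first.
train = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
centered_train, train_means = center(train)
loading = first_loading(centered_train)

# Score a new sample without refitting the reduction.
new_scores = scores([[5.0, 10.0]], train_means, loading)
```

The key design point is that the new sample never influences the loading or the means, so the scores used for prediction stay comparable between training and new data.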

2016 ◽  
Vol 113 (51) ◽  
pp. 14662-14667 ◽  
Author(s):  
Zhixiang Lin ◽  
Can Yang ◽  
Ying Zhu ◽  
John Duchi ◽  
Yao Fu ◽  
...  

Dimension reduction methods are commonly applied to high-throughput biological datasets. However, the results can be hindered by confounding factors, either biological or technical in origin. In this study, we extend principal component analysis (PCA) to propose AC-PCA for simultaneous dimension reduction and adjustment for confounding (AC) variation. We show that AC-PCA can adjust for (i) variations across individual donors present in a human brain exon array dataset and (ii) variations of different species in a model organism ENCODE RNA sequencing dataset. Our approach is able to recover the anatomical structure of neocortical regions and to capture the shared variation among species during embryonic development. For gene selection purposes, we extend AC-PCA with sparsity constraints and propose and implement an efficient algorithm. The methods developed in this paper can also be applied to more general settings. The R package and MATLAB source code are available at https://github.com/linzx06/AC-PCA.


Author(s):  
Derek W Brown ◽  
Timothy A Myers ◽  
Mitchell J Machiela

Summary: A concern when conducting genome-wide association studies (GWAS) is the potential for population stratification, i.e. ancestry-based genetic differences between cases and controls, which, if not properly accounted for, could lead to biased association results. We developed PCAmatchR as an open source R package for performing optimal case-control matching using principal component analysis (PCA) to aid in selecting controls that are well matched by ancestry to cases. PCAmatchR takes user-supplied PCA outputs and selects matching controls for cases by utilizing a weighted Mahalanobis distance metric which weights each principal component by the percent of genetic variation explained. Results from the 1000 Genomes Project data demonstrate both the functionality and performance of PCAmatchR for selecting matching controls for case populations as well as reducing inflation of association test statistics. PCAmatchR improves genomic similarity between matched cases and controls, which minimizes the effects of population stratification in GWAS analyses. Availability: PCAmatchR is freely available for download on GitHub (https://github.com/machiela-lab/PCAmatchR) or through CRAN (https://cran.r-project.org/web/packages/PCAmatchR/index.html). Supplementary information: Supplementary data are available at Bioinformatics online.
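A variance-weighted distance of the kind described above can be sketched as follows. This is an illustration only, not PCAmatchR's implementation: it assumes the principal component scores have been standardized to unit variance (so the Mahalanobis covariance term is the identity), weights each squared difference by the proportion of variance explained, and finds the 1:1 matching by brute force, which is feasible only for tiny inputs. The function names, weights, and scores are hypothetical.

```python
import math
from itertools import permutations

def weighted_distance(a, b, weights):
    """Distance between two PC score vectors, each PC weighted by variance explained.

    Assumes scores are standardized to unit variance on every component.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def match_controls(cases, controls, weights):
    """Return the control index chosen for each case, minimizing total distance.

    Brute force over all assignments; a real analysis would use a proper
    optimal-matching solver.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(controls)), len(cases)):
        cost = sum(weighted_distance(cases[i], controls[j], weights)
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best), best_cost

# Made-up PC scores on two components and made-up variance proportions.
weights = [0.8, 0.2]
cases = [[0.0, 0.0], [1.0, 1.0]]
controls = [[0.9, 1.2], [0.1, -0.1], [5.0, 5.0]]

matches, total_cost = match_controls(cases, controls, weights)
```

Here the first case is paired with the second control and the second case with the first control, while the distant third control is left unmatched.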


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Javier Fernández-López ◽  
M. Teresa Telleria ◽  
Margarita Dueñas ◽  
Mara Laguna-Castro ◽  
Klaus Schliep ◽  
...  

The use of different sources of evidence has been recommended in order to conduct species delimitation analyses to solve taxonomic issues. In this study, we use a maximum likelihood framework to combine morphological and molecular traits to study the case of Xylodon australis (Hymenochaetales, Basidiomycota) using the locate.yeti function from the phytools R package. Xylodon australis has been considered a single species distributed across Australia, New Zealand and Patagonia. Multi-locus phylogenetic analyses were conducted to unmask the actual diversity under X. australis as well as its kinship relations with respect to its relatives. To assess the taxonomic position of each clade, the locate.yeti function was used to place the X. australis type material, for which no molecular data were available, in a molecular phylogeny using continuous morphological traits. Two different species were distinguished under the X. australis name, one from Australia–New Zealand and another from Patagonia. In addition, a close relationship with Xylodon lenis, a species from Southeast Asia, was confirmed for the Patagonian clade. We discuss the implications of our results for the biogeographical history of this genus and we evaluate the potential of this method to be used with historical collections for which molecular data are not available.


Molecules ◽  
2021 ◽  
Vol 26 (4) ◽  
pp. 1180
Author(s):  
Rafał Wawrzyniak ◽  
Wiesław Wasiak ◽  
Beata Jasiewicz ◽  
Alina Bączkiewicz ◽  
Katarzyna Buczkowska

Aneura pinguis (L.) Dumort. is a representative of the simple thalloid liverworts, one of the three main types of liverwort gametophytes. According to classical taxonomy, A. pinguis represents one morphologically variable species; however, genetic data reveal that this species is a complex consisting of 10 cryptic species (named by letters from A to J), of which four are further subdivided into two or three evolutionary lineages. The objective of this work was to develop an efficient method for the characterisation of plant material using marker compounds. The volatile chemical constituents of cryptic species within the liverwort A. pinguis were analysed by GC-MS. The compounds were isolated from plant material using the HS-SPME technique. Of the 66 compounds examined, 40 were identified. Of these 40 compounds, nine were selected for use as marker compounds of individual cryptic species of A. pinguis. A guide was then developed that clarified how these markers could be used for the rapid identification of the genetic lineages of A. pinguis. Multivariate statistical analyses (principal component and cluster analysis) revealed that the chemical compounds in A. pinguis made it possible to distinguish individual cryptic species (including genetic lineages), with the exception of cryptic species G and H. The classification of samples based on the volatile compounds by cluster analysis reflected phylogenetic relationships between cryptic species and genetic lineages of A. pinguis revealed based on molecular data.


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1432
Author(s):  
Xwégnon Ghislain Agoua ◽  
Robin Girard ◽  
Georges Kariniotakis

The efficient integration of photovoltaic (PV) production in energy systems is conditioned by the capacity to anticipate its variability, that is, the capacity to provide accurate forecasts. From the classical forecasting methods in the state of the art dealing with a single power plant, the focus has moved in recent years to spatio-temporal approaches, where geographically dispersed data are used as input to improve forecasts of a site for horizons up to 6 h ahead. These spatio-temporal approaches perform differently depending on the data sources available, but the impact of each source on actual forecasting performance has not yet been evaluated. In this paper, we propose a flexible spatio-temporal model to generate PV production forecasts for horizons up to 6 h ahead and we use this model to evaluate the effect of different spatial and temporal data sources on the accuracy of the forecasts. The sources considered are measurements from neighboring PV plants, local meteorological stations, Numerical Weather Predictions (NWP), and satellite images. The evaluation of the performance is carried out using a real-world test case featuring 136 PV plants. The forecasting error has been evaluated for each data source using the Mean Absolute Error and Root Mean Square Error. The results show that neighboring PV plants help to achieve around a 10% reduction in forecasting error for the first three hours, followed by satellite images, which yield an additional 3% reduction across all horizons up to 6 h ahead. The NWP data show no improvement for horizons up to 6 h but are essential for longer horizons.
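The two error measures named above, and the relative reduction used to compare data sources, can be computed as in the following sketch. The series and the "baseline" and "improved" forecasts are made-up numbers for illustration, not results from the paper, and the helper names are hypothetical.

```python
import math

def mae(actual, forecast):
    """Mean Absolute Error: average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Square Error: square root of the mean squared difference."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def relative_reduction(baseline_error, new_error):
    """Fraction by which the error drops relative to the baseline."""
    return (baseline_error - new_error) / baseline_error

# Made-up PV production values and two hypothetical forecasts.
actual = [0.0, 2.0, 4.0, 6.0]
baseline = [1.0, 3.0, 3.0, 8.0]   # e.g. a model using local data only (assumption)
improved = [0.5, 2.5, 4.0, 7.0]   # e.g. the same model with an extra source (assumption)

baseline_mae = mae(actual, baseline)
improved_mae = mae(actual, improved)
baseline_rmse = rmse(actual, baseline)
gain = relative_reduction(baseline_mae, improved_mae)
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason to report both when comparing sources.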


2021 ◽  
Vol 736 ◽  
pp. 137-182
Author(s):  
Daniel Burckhardt ◽  
David Ouvrard ◽  
Diana M. Percy

The classification of the superfamily Psylloidea is revised to incorporate findings from recent molecular studies, and to integrate a reassessment of monophyla primarily based on molecular data with morphological evidence and previous classifications. We incorporate a reinterpretation of relevant morphology in the light of the molecular findings and discuss conflicts with respect to different data sources and sampling strategies. Seven families are recognised of which four (Calophyidae, Carsidaridae, Mastigimatidae and Triozidae) are strongly supported, and three (Aphalaridae, Liviidae and Psyllidae) weakly or moderately supported. Although the revised classification is mostly similar to those recognised by recent authors, there are some notable differences, such as Diaphorina and Katacephala which are transferred from Liviidae to Psyllidae. Five new subfamilies and one new genus are described, and one secondary homonym is replaced by a new species name. A new or revised status is proposed for one family, four subfamilies, four tribes, seven subtribes and five genera. One tribe and eight genera / subgenera are synonymised, and 32 new and six revised species combinations are proposed. All recognised genera of Psylloidea (extant and fossil) are assigned to family level taxa, except for one which is considered a nomen dubium.


Genetika ◽  
2019 ◽  
Vol 51 (1) ◽  
pp. 1-15 ◽  
Author(s):  
Aleksandra Savic ◽  
Milka Brdar-Jokanovic ◽  
Miodrag Dimitrijevic ◽  
Sofija Petrovic ◽  
Milan Zdravkovic ◽  
...  

The characterization of 41 common bean cultivars and landraces from the breeding collection of the Institute of Field and Vegetable Crops, Novi Sad, Serbia, was done based on phenotypic traits and microsatellite markers. Phenotypic traits were chosen from the Bioversity International descriptor list. In addition, the main yield components were investigated. Analysis of phaseolin type revealed the affiliation of cultivars and landraces to the Mesoamerican or Andean gene pool. Cultivars and landraces demonstrated a significant level of diversity with regard to the studied phenotypic traits. The identified variation showed high potential for developing new cultivars with desirable combinations of traits. Principal component analysis based on phenotypic traits separated bean cultivars and landraces into two groups, which corresponded to the Mesoamerican and Andean gene pools determined according to phaseolin type. Putative hybrids, with combinations of traits from both gene pools, were also identified. Analysis of microsatellite data, using twenty-two SSR primer pairs, showed medium gene diversity in the studied material. Microsatellite-based cluster analysis separated genotypes into two discrete clusters and several subclusters. No clear separation according to gene pool was found between the clusters; however, grouping according to gene pool, and patterns of phenotypic variation following these gene pools, were observed within subclusters. Knowledge of the detailed relationships of cultivars and landraces based on phenotypic and molecular data would facilitate identification of candidates for future breeding.

