Dealing with ‘the spectre of "spurious" correlations': hazards in comparing ratios and other derived variables with a randomization test to determine if a biological interpretation is justified

Oikos ◽  
2021 ◽  
Author(s):  
Matthew R. Williams ◽  
Byron B. Lamont ◽  
Tianhua He
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kalifa Manjang ◽  
Shailesh Tripathi ◽  
Olli Yli-Harja ◽  
Matthias Dehmer ◽  
Galina Glazko ◽  
...  

Abstract The identification of prognostic biomarkers for predicting cancer progression is an important problem for two reasons. First, such biomarkers find practical application in a clinical context for the treatment of patients. Second, interrogation of the biomarkers themselves is assumed to lead to novel insights into disease mechanisms and the underlying molecular processes that cause the pathological behavior. For breast cancer, many signatures based on gene expression values have been reported to be associated with overall survival. Consequently, such signatures have been used for suggesting biological explanations of breast cancer and drug mechanisms. In this paper, we demonstrate for a large number of breast cancer signatures that such an implication is not justified. Our approach systematically eliminates all traces of biological meaning of signature genes and shows that among the remaining genes, surrogate gene sets can be formed with indistinguishable prognostic prediction capabilities and opposite biological meaning. Hence, our results demonstrate that none of the studied signatures has a sensible biological interpretation or meaning with respect to disease etiology. Overall, this shows that prognostic signatures are black-box models with sensible predictions of breast cancer outcome but no value for revealing causal connections. Furthermore, we show that the number of such surrogate gene sets is not small but very large.


Genetics ◽  
2002 ◽  
Vol 160 (4) ◽  
pp. 1707-1719
Author(s):  
Chiara Sabatti ◽  
Neil Risch

Abstract We illustrate how homozygosity of haplotypes can be used to measure the level of disequilibrium between two or more markers. An excess of either homozygosity or heterozygosity signals a departure from the gametic phase equilibrium: We describe the specific form of dependence that is associated with high (low) homozygosity and derive various linkage disequilibrium measures. They feature a clear biological interpretation, can be used to construct tests, and are standardized to allow comparison across loci and populations. They are particularly advantageous to measure linkage disequilibrium between highly polymorphic markers.
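
As a rough illustration of the idea in this abstract (not the authors' measures, which are not given here), the sketch below compares observed haplotype homozygosity with the value expected under gametic phase equilibrium, where haplotype frequencies are products of allele frequencies. The function names and the toy haplotype samples are assumptions made for the example.

```python
from collections import Counter


def homozygosity(items):
    """Sum of squared relative frequencies: the chance that two random draws match."""
    counts = Counter(items)
    n = len(items)
    return sum((c / n) ** 2 for c in counts.values())


def excess_homozygosity(haplotypes):
    """Observed haplotype homozygosity minus its expectation under equilibrium.

    haplotypes: list of equal-length tuples, one allele per marker.
    Under equilibrium the expected haplotype homozygosity is the product
    of the per-marker homozygosities.
    """
    observed = homozygosity(haplotypes)
    expected = 1.0
    for marker_alleles in zip(*haplotypes):  # one column of alleles per marker
        expected *= homozygosity(marker_alleles)
    return observed - expected


# Hypothetical toy samples (not data from the paper).
linked = [("A", "B")] * 5 + [("a", "b")] * 5                      # alleles always co-occur
independent = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]    # all combinations equally common

print(excess_homozygosity(linked))
print(excess_homozygosity(independent))
```

A positive value indicates more homozygosity than equilibrium predicts, a negative value less; this raw difference is not standardized, so it should not be compared across loci as the measures in the paper are.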


2021 ◽  
Vol 83 (3) ◽  
Author(s):  
Ginger Egberts ◽  
Fred Vermolen ◽  
Paul van Zuijlen

Abstract To deal with permanent deformations and residual stresses, we consider a morphoelastic model for the scar formation as the result of wound healing after a skin trauma. Next to the mechanical components such as strain and displacements, the model accounts for biological constituents such as the concentration of signaling molecules, the cellular densities of fibroblasts and myofibroblasts, and the density of collagen. Here we present stability constraints for the one-dimensional counterpart of this morphoelastic model, for both the continuous and (semi-) discrete problem. We show that the truncation error between these eigenvalues associated with the continuous and semi-discrete problem is of order $\mathcal{O}(h^2)$. Next we perform numerical validation to these constraints and provide a biological interpretation of the (in)stability. For the mechanical part of the model, the results show the components reach equilibria in a (non) monotonic way, depending on the value of the viscosity. The results show that the parameters of the chemical part of the model need to meet the stability constraint, depending on the decay rate of the signaling molecules, to avoid unrealistic results.


1993 ◽  
Vol 73 (4) ◽  
pp. 985-994 ◽  
Author(s):  
G. Saindon ◽  
G. B. Schaalje

Genotype × environment (GE) effects in regional registration trials for dry beans in western Canada were studied to determine whether geographic distribution of sites could be rationalized. The structure of the GE effects on the maturity, seed weight and seed yield of seven dry bean check cultivars grown at eight locations during 4 yr was investigated using GE mean squares decomposition, cluster analysis and the additive main effect and multiplicative interaction (AMMI) method. The analyses revealed a high level of redundancy in the locations which suggested that most GE effects can be captured with fewer testing sites. The partition of the GE mean squares demonstrated the possibility of reproducing the GE structure of the entire data sets with as few as three sites; however, more locations may be needed to compensate for unpredictable environmental effects. Based on biological interpretation of groupings and visual assessment of the AMMI displays, a five-location set fully represented the GE effects on maturity, seed weight and seed yield and accounted for the inconsistent clustering of the Brooks site for the three traits. Also, the set should allow for site losses due to unpredictable environmental events. The dry bean industry in western Canada is expanding to non-traditional growing areas and the establishment of trials in these areas should be considered as they may create GE effects not considered to date. Key words: Phaseolus vulgaris L., genotype × environment interactions, cluster analysis, AMMI analysis


2015 ◽  
Vol 2015 ◽  
pp. 1-13
Author(s):  
Igor Sandalov ◽  
Leonid Padyukov

To identify putative relations between different genetic factors in the human genome in the development of common complex disease, we mapped the genetic data to an ensemble of spin chains and analysed the data as a quantum system. Each SNP is considered as a spin with three states corresponding to possible genotypes. The combined genotype represents a multispin state, described by the product of individual-spin states. Each person is characterized by a single genetic vector (GV) and individuals with identical GVs comprise the GV group. This consolidation of genotypes into GVs provides integration of multiple genetic variants for a single statistical test and excludes ambiguity of biological interpretation known for allele and haplotype associations. We analyzed two independent cohorts, with 2633 rheumatoid arthritis cases and 2108 healthy controls, and data for 6 SNPs from the HTR2A locus plus shared epitope allele. We found that GVs based on selected markers are highly informative and overlap for 98.3% of the healthy population between two cohorts. Interestingly, some of the GV groups contain either only controls or only cases, thus demonstrating extreme susceptibility or protection features. By using this new approach we confirmed previously detected univariate associations and demonstrated the most efficient selection of SNPs for combined analyses for functional studies.
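
The grouping step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: genotypes are coded 0, 1, 2 for the three states of each SNP, the function names are hypothetical, and the sample data are invented.

```python
from collections import defaultdict


def group_by_genetic_vector(individuals):
    """Count cases and controls for each distinct genetic vector (GV).

    individuals: iterable of (genotypes, is_case) pairs, where genotypes is a
    sequence with one code (0, 1 or 2, an assumed coding) per SNP.
    """
    groups = defaultdict(lambda: {"cases": 0, "controls": 0})
    for genotypes, is_case in individuals:
        key = tuple(genotypes)  # identical GVs fall into the same group
        groups[key]["cases" if is_case else "controls"] += 1
    return dict(groups)


def one_sided_groups(groups):
    """Return GV groups that contain only cases or only controls."""
    return {
        gv: counts
        for gv, counts in groups.items()
        if counts["cases"] == 0 or counts["controls"] == 0
    }


# Hypothetical toy cohort with three SNPs per person (not data from the paper).
cohort = [
    ((0, 1, 2), True),
    ((0, 1, 2), True),
    ((1, 1, 0), False),
    ((1, 1, 0), True),
    ((2, 2, 2), False),
]

gv_groups = group_by_genetic_vector(cohort)
print(one_sided_groups(gv_groups))
```

In a real cohort, small one-sided groups can arise by chance, so group size would need to be taken into account before reading them as susceptibility or protection signals.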


Author(s):  
Richard L. Scheaffer ◽  
Ann Watkins ◽  
Mrudulla Gnanadesikan ◽  
Jeffrey A. Witmer

2013 ◽  
Vol 25 (4) ◽  
pp. 406-417 ◽  
Author(s):  
Márlon de Castro Vasconcelos ◽  
Adriano Sanches Melo ◽  
Albano Schwarzbold

AIM: We evaluated five stream classification systems by examining: 1) differences in richness, abundance and macroinvertebrate communities among stream classes within classification systems; and 2) whether some classification systems perform better than others using macroinvertebrates. Additionally, we evaluated the effects of taxonomic resolution and data type (abundance and presence) on results. METHODS: Five stream classification systems were used: two based on hydroregions, one based on ecoregions by FEOW, a fourth based on stream orders and the last based on clusters of environmental variables sampled in 37 streams in Rio Grande do Sul state, Brazil. We used a randomization test to evaluate differences in richness and abundance, a db-MANOVA to evaluate differences in species assemblages and Classification Strength (CS) to evaluate classification performance. RESULTS: There were differences in richness and abundance among stream classes within each stream classification. The same result was found for community data, except for stream order classifications at family level. We observed that stream classes obtained for each stream classification differed in terms of environmental variables (db-MANOVA). The classification based on environmental variables showed higher CS values than other classification systems. The taxonomic resolution was important to the observed results. Data at genus level presented CS values 12% higher than family level for cluster classification, and the effect of data type depended on the classification system and taxonomic resolution employed. CONCLUSION: Our results indicate that the classification based on clusters of environmental variables performed better than other stream classification systems, and that results similar to those at genus level can be obtained for management programs using family resolution in a geographical context similar to this study.
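
A randomization test of the kind mentioned in the METHODS can be sketched as below. The abstract does not say which test statistic was used, so the between-group sum of squares here is an assumption, as are the function names, the number of permutations and the toy richness values.

```python
import random


def between_group_ss(values, labels):
    """Between-group sum of squares: spread of class means around the grand mean."""
    grand_mean = sum(values) / len(values)
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return sum(
        counts[g] * (sums[g] / counts[g] - grand_mean) ** 2 for g in sums
    )


def randomization_test(values, labels, n_perm=9999, seed=0):
    """Approximate p-value for differences among classes by shuffling labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = between_group_ss(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties count as "at least as extreme".
        if between_group_ss(values, shuffled) >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Hypothetical richness values for two stream classes (not data from the paper).
richness = [1, 2, 3, 2, 1, 10, 11, 12, 11, 10]
stream_class = ["a"] * 5 + ["b"] * 5
print(randomization_test(richness, stream_class))
```

Because only the labels are shuffled, the test makes no distributional assumption about richness or abundance; the p-value is the share of label arrangements that separate the classes at least as strongly as the observed one.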

