A Statistical Framework for QTL Hotspot Detection

2020 ◽  
Author(s):  
Po-Ya Wu ◽  
Man-Hsia Yang ◽  
Chen-Hung Kao

ABSTRACT
Quantitative trait loci (QTL) hotspots (genomic locations enriched in QTL) are a common and notable feature when collecting many QTL for various traits in many areas of biological studies. The QTL hotspots are important and attractive since they are highly informative and may harbor genes for the quantitative traits. So far, the current statistical methods for QTL hotspot detection use either the individual-level data from genetical genomics experiments or the summarized data from public QTL databases to carry out the detection analysis. These detection methods attempt to address some of the concerns, including the correlation structure among traits, the magnitude of LOD scores within a hotspot, and computational cost, that arise during the process of QTL hotspot detection. In this article, we describe a statistical framework that can handle both types of data as well as address all of these concerns at once. Our statistical framework operates directly on the QTL matrix and hence has a very low computational cost, and it is deployed to take advantage of the QTL mapping results in assisting the detection analysis. Two special devices, trait grouping and the top γn,α profile, are introduced into the framework. Trait grouping attempts to group closely linked or pleiotropic traits together to account for true linkages and to cope with the underestimation of hotspot thresholds caused by non-genetic correlations (which arise when the correlation structure among traits is ignored), making it possible to obtain much stricter thresholds and dismiss spurious hotspots. The top γn,α profile is designed to outline the LOD-score pattern of a hotspot across different hotspot architectures, so that it can serve to identify and characterize the types of QTL hotspots with varying sizes and LOD-score distributions.
Real examples, numerical analyses, and a simulation study are performed to validate our statistical framework, investigate its detection properties, and compare it with the current methods in QTL hotspot detection. The results demonstrate that the proposed statistical framework can effectively accommodate the correlation structure among traits, identify the types of hotspots, and still keep the notable features of easy implementation and fast computation for practical QTL hotspot detection.

2021 ◽  
Vol 11 (4) ◽  
Author(s):  
Po-Ya Wu ◽  
Man-Hsia Yang ◽  
Chen-Hung Kao

Abstract
Quantitative trait loci (QTL) hotspots (genomic locations enriched in QTL) are a common and notable feature when collecting many QTL for various traits in many areas of biological studies. The QTL hotspots are important and attractive since they are highly informative and may harbor genes for the quantitative traits. So far, the current statistical methods for QTL hotspot detection use either the individual-level data from genetical genomics experiments or the summarized data from public QTL databases to carry out the detection analysis. These methods may suffer from ignoring the correlation structure among traits, neglecting the magnitude of LOD scores for the QTL, or paying a very high computational cost, which often lead, respectively, to the detection of excessive spurious hotspots, failure to discover biologically interesting hotspots composed of a small-to-moderate number of QTL with strong LOD scores, and computational intractability during the detection process. In this article, we describe a statistical framework that can handle both types of data as well as address all of these problems at once. Our statistical framework operates directly on the QTL matrix and hence has a very low computational cost, and it is deployed to take advantage of the QTL mapping results in assisting the detection analysis. Two special devices, trait grouping and the top γn,α profile, are introduced into the framework. Trait grouping attempts to group the traits controlled by closely linked or pleiotropic QTL into the same trait groups and randomly allocates these QTL together across the genomic positions, separately by trait group, to account for the correlation structure among traits, making it possible to obtain much stricter thresholds and dismiss spurious hotspots.
The top γn,α profile is designed to outline the LOD-score pattern of the QTL in a hotspot across different hotspot architectures, so that it can serve to identify and characterize the types of QTL hotspots with varying sizes and LOD-score distributions. Real examples, numerical analyses, and a simulation study are performed to validate our statistical framework, investigate its detection properties, and compare it with the current methods in QTL hotspot detection. The results demonstrate that the proposed statistical framework can effectively accommodate the correlation structure among traits, identify the types of hotspots, and still keep the notable features of easy implementation and fast computation for practical QTL hotspot detection.
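The trait-grouping permutation described above can be illustrated with a small sketch. This is not the authors' implementation: the function name, the circular-shift randomization, and all parameters are illustrative assumptions standing in for the framework's group-wise random allocation of QTL across genomic positions.

```python
import numpy as np

rng = np.random.default_rng(1)

def hotspot_threshold(qtl_bins, groups, n_bins, n_perm=1000, alpha=0.05):
    """Genome-wide QTL-count threshold via permutation, keeping the QTL
    of each trait group together to respect the correlation among traits.

    qtl_bins: array of detected-QTL bin indices, one entry per QTL.
    groups:   array of the same length giving each QTL's trait group.
    Returns the (1 - alpha) quantile of the per-permutation maxima.

    Illustrative sketch only: a single circular shift per group stands in
    for the framework's group-wise random allocation, and the top gamma(n,
    alpha) profile is omitted entirely.
    """
    maxima = np.empty(n_perm)
    for p in range(n_perm):
        counts = np.zeros(n_bins)
        for g in np.unique(groups):
            idx = qtl_bins[groups == g]
            # shift the whole group by one random offset so that closely
            # linked / pleiotropic QTL move together, not independently
            shift = rng.integers(n_bins)
            np.add.at(counts, (idx + shift) % n_bins, 1)
        maxima[p] = counts.max()
    return np.quantile(maxima, 1 - alpha)
```

Because grouped QTL move as a block, a cluster produced by one pleiotropic locus recreates itself in every permutation, which is what pushes the threshold upward relative to permuting each QTL independently.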


2018 ◽  
Author(s):  
Man-Hsia Yang ◽  
Dong-Hong Wu ◽  
Chen-Hung Kao

ABSTRACT
Genome-wide detection of quantitative trait loci (QTL) hotspots underlying variation in many molecular and phenotypic traits has been a key step in various biological studies, since QTL hotspots are highly informative and can be linked to the genes for the quantitative traits. Several statistical methods have been proposed to detect QTL hotspots. These hotspot detection methods rely heavily on permutation tests performed on summarized QTL data or on individual-level data (with genotypes and phenotypes) from genetical genomics experiments. In this article, we propose a statistical procedure for QTL hotspot detection that uses the summarized QTL (interval) data collected in public, web-accessible databases. First, a simple statistical method based on the uniform distribution is derived to convert the QTL interval data into an expected QTL frequency (EQF) matrix. Then, to account for the correlation structure among traits, the QTL for correlated traits are grouped together into the same categories to form a reduced EQF matrix. Furthermore, a permutation algorithm on the EQF elements or on the QTL intervals is developed to compute a sliding scale of EQF thresholds, ranging from strict to liberal, for assessing the significance of QTL hotspots. With grouping, much stricter thresholds can be obtained to avoid the detection of spurious hotspots. A real-example analysis and a simulation study are carried out to illustrate our procedure, evaluate its performance, and compare it with other methods. The results show that our procedure can control the genome-wide error rates at the target levels, provides appropriate thresholds for correlated data, and is comparable to the methods using individual-level data in hotspot detection. Depending on the thresholds used, more than 100 hotspots are detected in the GRAMENE rice database. We also perform a genome-wide comparative analysis of the detected hotspots and the known genes collected in the Rice Q-TARO database.
The comparative analysis reveals that the hotspots and genes are conformable in the sense that they co-localize closely and are functionally related to relevant traits. Our statistical procedure can provide a framework for exploring the networks among QTL hotspots, genes, and quantitative traits in biological studies. The R code that produces both numerical and graphical outputs of QTL hotspot detection in the genome is available at http://www.stat.sinica.edu.tw/~chkao/.
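The first step of the procedure, converting QTL interval data into EQF values under the uniform assumption, can be sketched as follows. The function name and binning scheme are illustrative assumptions; the idea is simply that each QTL contributes probability mass to each bin in proportion to the overlap between its support interval and that bin.

```python
import numpy as np

def eqf_matrix(qtl_intervals, bin_edges):
    """Expected QTL frequency (EQF) per genomic bin, assuming each QTL
    is uniformly distributed over its reported support interval.

    qtl_intervals: list of (start, end) positions in cM, one per QTL.
    bin_edges:     1-D array of bin boundaries in cM (len = n_bins + 1).
    Returns an (n_qtl, n_bins) matrix; each row sums to 1 when the
    QTL interval lies entirely inside the binned region.
    """
    edges = np.asarray(bin_edges, dtype=float)
    m = np.zeros((len(qtl_intervals), len(edges) - 1))
    for q, (a, b) in enumerate(qtl_intervals):
        width = b - a
        # overlap of [a, b] with each bin, divided by the interval width
        lo = np.clip(edges[:-1], a, b)
        hi = np.clip(edges[1:], a, b)
        m[q] = (hi - lo) / width
    return m

# a 10-cM QTL interval spread over five 5-cM bins
m = eqf_matrix([(3.0, 13.0)], np.arange(0.0, 26.0, 5.0))
```

Summing such rows within trait categories would give the reduced EQF matrix on which the permutation thresholds are computed.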


2021 ◽  
Vol 13 (15) ◽  
pp. 2869
Author(s):  
MohammadAli Hemati ◽  
Mahdi Hasanlou ◽  
Masoud Mahdianpari ◽  
Fariba Mohammadimanesh

With uninterrupted space-based data collection since 1972, Landsat plays a key role in systematic monitoring of the Earth’s surface, enabled by an extensive and free, radiometrically consistent, global archive of imagery. Governments and international organizations rely on Landsat time series for monitoring and deriving a systematic understanding of the dynamics of the Earth’s surface at a spatial scale relevant to management, scientific inquiry, and policy development. In this study, we identify trends in Landsat-informed change detection studies by surveying 50 years of published applications, processing, and change detection methods. Specifically, a representative database was created, resulting in 490 relevant journal articles derived from the Web of Science and Scopus. From these articles, we provide a review of recent developments, opportunities, and trends in Landsat change detection studies. The impact of the Landsat free and open data policy in 2008 is evident in the literature as a turning point in the number and nature of change detection studies. Based upon the search terms used and articles included, the average number of Landsat images used per study increased from 10 images before 2008 to 100,000 images in 2020. The 2008 opening of the Landsat archive resulted in a marked increase in the number of images used per study, typically providing the basis for the other trends in evidence. These key trends include an increase in automated processing, use of analysis-ready data (especially those with atmospheric correction), and use of cloud computing platforms, all over increasingly large areas. The nature of change methods has evolved from representative bi-temporal pairs to time series of images capturing dynamics and trends, capable of revealing both gradual and abrupt changes. The results also revealed a greater use of nonparametric classifiers for Landsat change detection analysis.
Landsat-9, to be launched in September 2021, in combination with the continued operation of Landsat-8 and integration with Sentinel-2, enhances opportunities for improved monitoring of change over increasingly larger areas with greater intra- and interannual frequency.


Author(s):  
Benjamin D. Youngman ◽  
David B. Stephenson

We develop a statistical framework for simulating natural hazard events that combines extreme value theory and geostatistics. Robust generalized additive model forms represent generalized Pareto marginal distribution parameters while a Student’s t-process captures spatial dependence and gives a continuous-space framework for natural hazard event simulations. Efficiency of the simulation method allows many years of data (typically over 10 000) to be obtained at relatively little computational cost. This makes the model viable for forming the hazard module of a catastrophe model. We illustrate the framework by simulating maximum wind gusts for European windstorms, which are found to have realistic marginal and spatial properties, and validate well against wind gust measurements.
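The generalized Pareto marginal component of such a hazard simulator can be sketched with inverse-transform sampling. This is a minimal stand-in, not the authors' model: it draws i.i.d. threshold exceedances only, omitting the generalized additive model for the parameters and the Student's t-process spatial dependence, and the scale/shape values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_gpd(n, scale, shape):
    """Draw n generalized Pareto exceedances by inverse-transform sampling:
    X = (scale / shape) * ((1 - U)^(-shape) - 1)  for shape != 0,
    with the exponential distribution as the shape -> 0 limit."""
    u = rng.uniform(size=n)
    if abs(shape) < 1e-12:
        return -scale * np.log(1.0 - u)  # exponential limit case
    return scale / shape * ((1.0 - u) ** (-shape) - 1.0)

# e.g. 10,000 simulated gust exceedances above a site-specific threshold
exc = sample_gpd(10_000, scale=5.0, shape=0.1)
```

Cheap sampling of this kind is what makes simulating the equivalent of over 10 000 years of events computationally feasible; the spatial model then correlates these marginals across sites.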


2010 ◽  
Vol 19 (8) ◽  
pp. 996 ◽  
Author(s):  
Philip E. Higuera ◽  
Daniel G. Gavin ◽  
Patrick J. Bartlein ◽  
Douglas J. Hallett

Over the past several decades, high-resolution sediment–charcoal records have been increasingly used to reconstruct local fire history. Data analysis methods usually involve a decomposition that detrends a charcoal series and then applies a threshold value to isolate individual peaks, which are interpreted as fire episodes. Despite the proliferation of these studies, methods have evolved largely in the absence of a thorough statistical framework. We describe eight alternative decomposition models (four detrending methods used with two threshold-determination methods) and evaluate their sensitivity to a set of known parameters integrated into simulated charcoal records. Results indicate that the combination of a globally defined threshold with specific detrending methods can produce strongly biased results, depending on whether or not variance in a charcoal record is stationary through time. These biases are largely eliminated by using a locally defined threshold, which adapts to changes in variability throughout a charcoal record. Applying the alternative decomposition methods to three previously published charcoal records largely supports our conclusions from simulated records. We also present a minimum-count test for empirical records, which reduces the likelihood of false positives when charcoal counts are low. We conclude by discussing how to evaluate when peak detection methods are warranted with a given sediment–charcoal record.
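A locally defined threshold of the kind the authors recommend can be sketched as follows. This is a generic moving-window version, assuming a moving-median detrend and a local residual quantile; the eight published decomposition models differ in their detrending and threshold-determination details.

```python
import numpy as np

def detect_peaks(series, window=50, q=0.95):
    """Peak detection with a locally defined threshold: detrend with a
    moving median, then flag samples whose residual ("peak" component)
    exceeds a local quantile of the residuals in a sliding window.

    Sketch of the general local-threshold idea only, not the published
    decomposition models; window and q are illustrative defaults.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    peaks = np.zeros(n, dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        local = x[lo:hi]
        trend = np.median(local)         # local background (detrend) level
        resid = local - trend
        thr = np.quantile(resid, q)      # locally adaptive threshold
        peaks[i] = (x[i] - trend) > thr
    return peaks
```

Because the threshold is recomputed within each window, it rises and falls with the local variability of the record, which is precisely what removes the bias that a single global threshold introduces when variance is non-stationary.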


2019 ◽  
Vol 116 (38) ◽  
pp. 18962-18970 ◽  
Author(s):  
Sushant Kumar ◽  
Declan Clarke ◽  
Mark B. Gerstein

Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence-based approaches. Some of these methods also employ 3D protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite its essential role in protein function. We present a framework to identify cancer driver genes using a dynamics-based search of mutational hotspot communities. Mutations are mapped to protein structures, which are partitioned into distinct residue communities. These communities are identified in a framework where residue–residue contact edges are weighted by correlated motions (as inferred by dynamics-based models). We then search for signals of positive selection among these residue communities to identify putative driver genes, applying our method to the TCGA (The Cancer Genome Atlas) PanCancer Atlas missense mutation catalog. Overall, we predict one or more mutational hotspots within the resolved structures of proteins encoded by 434 genes. These genes were enriched among biological processes associated with tumor progression. Additionally, a comparison between our approach and existing cancer hotspot detection methods using structural data suggests that including protein dynamics significantly increases the sensitivity of driver detection.


2004 ◽  
Vol 13 (3) ◽  
pp. 275 ◽  
Author(s):  
R. Pu ◽  
P. Gong ◽  
Z. Li ◽  
J. Scarborough

A wildfire-mapping algorithm based on fire dynamics, called the dynamic algorithm, is proposed. It is applied to daily NOAA/AVHRR/HRPT data for wildland areas (scrub, chaparral, grassland, marsh, riparian forest, woodland, rangeland, and forests) in California for September and October 1999. Daily AVHRR images acquired on two successive days are compared for active fire detection and burn scar mapping. The algorithm consists of four stages: data preparation; hotspot detection; burn scar mapping; and final confirmation of potential burn scar pixels. Preliminary comparisons between the results mapped by the dynamic algorithm and the fire polygons collected by the California Department of Forestry and Fire Protection through ground survey indicate that the algorithm can track burn scars at different developmental stages at a daily level. Comparisons between the wildfire mapping results of the dynamic algorithm and those of a modified version of an existing algorithm further support this conclusion. This daily tracking capability is the major contribution of this algorithm to wildfire detection methods. The dynamic algorithm requires highly precise registration between consecutive images.
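The two-date comparison at the heart of the dynamic algorithm can be sketched in simplified form. The thresholds, band choices, and function name below are illustrative assumptions, not the published values; the actual algorithm's four stages involve additional tests and a final confirmation of potential burn-scar pixels.

```python
import numpy as np

def compare_days(bt_day1, bt_day2, ndvi_day1, ndvi_day2,
                 dt_fire=10.0, d_ndvi=0.15):
    """Two-date comparison in the spirit of the dynamic algorithm:
    flag candidate active fires where the thermal-band brightness
    temperature (K) jumps between successive days, and candidate burn
    scars where vegetation greenness (NDVI) drops sharply.

    dt_fire and d_ndvi are illustrative placeholder thresholds.
    """
    hot = (bt_day2 - bt_day1) > dt_fire        # sudden thermal anomaly
    scar = (ndvi_day1 - ndvi_day2) > d_ndvi    # abrupt vegetation loss
    return hot, scar
```

Because both tests difference co-located pixels across days, any misregistration between the two images shows up as spurious change, which is why the algorithm requires highly precise registration between consecutive images.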


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Ronald de Vlaming ◽  
Eric A. W. Slob ◽  
Philip R. Jansen ◽  
Alain Dagher ◽  
Philipp D. Koellinger ◽  
...  

Abstract
Human variation in brain morphology and behavior are related and highly heritable. Yet, it is largely unknown to what extent specific features of brain morphology and behavior are genetically related. Here, we introduce a computationally efficient approach for multivariate genomic-relatedness-based restricted maximum likelihood (MGREML) to estimate the genetic correlation between a large number of phenotypes simultaneously. Using individual-level data (N = 20,190) from the UK Biobank, we provide estimates of the heritability of gray-matter volume in 74 regions of interest (ROIs) in the brain and we map genetic correlations between these ROIs and health-relevant behavioral outcomes, including intelligence. We find four genetically distinct clusters in the brain that are aligned with standard anatomical subdivision in neuroscience. Behavioral traits have distinct genetic correlations with brain morphology which suggests trait-specific relevance of ROIs. These empirical results illustrate how MGREML can be used to estimate internally consistent and high-dimensional genetic correlation matrices in large datasets.

