A FLEXIBLE NONPARAMETRIC APPROACH TO FIND CANDIDATE GENES ASSOCIATED WITH DISEASE IN MICROARRAY EXPERIMENTS

2013 ◽  
Vol 11 (02) ◽  
pp. 1250021 ◽  
Author(s):  
AHMED HOSSAIN ◽  
ANDREW R. WILLAN ◽  
JOSEPH BEYENE

Biologists are often interested in the biological function of a particular gene. Because that function may depend on other genes, finding other genes in the same biological pathway can deepen our understanding of it. We are therefore interested in finding candidate genes whose expression values are highly correlated with that of a "seed" gene. The seed gene, which is known to be associated with a disease, serves as a reference for extracting candidate genes from microarray experiments and enriched pathways. We propose a nonparametric procedure for selecting these candidate genes, whose expression levels are correlated with that of the seed gene in microarray experiments. The proposed test statistic compares two areas under receiver operating characteristic curves (AUCs) for gene pairs, taking the implicit correlation between the two AUCs into account. The performance of our method is compared with that of other well-known methods through simulation and real data analysis.
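The abstract does not give the form of the statistic. As a hedged illustration of how the correlation between two AUCs estimated on the same samples can be taken into account, the sketch below implements the standard DeLong, DeLong and Clarke-Pearson (1988) test for two correlated empirical AUCs. This is a swapped-in, well-known technique, not necessarily the authors' statistic; the function names `psi`, `sample_cov` and `delong_compare`, and the 1/0 disease coding of `labels`, are assumptions for illustration.

```python
import math
from statistics import mean


def psi(x, y):
    """Mann-Whitney kernel: 1 if the case value exceeds the control value, 0.5 on ties."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences (denominator n - 1)."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def delong_compare(gene_a, gene_b, labels):
    """Compare two correlated empirical AUCs measured on the same samples.

    labels: 1 = disease, 0 = control (assumed coding).
    Returns (auc_a, auc_b, z, two-sided p-value).
    """
    parts = []
    for g in (gene_a, gene_b):
        cases = [x for x, lab in zip(g, labels) if lab == 1]
        ctrls = [x for x, lab in zip(g, labels) if lab == 0]
        v10 = [mean(psi(x, y) for y in ctrls) for x in cases]  # one component per case
        v01 = [mean(psi(x, y) for x in cases) for y in ctrls]  # one component per control
        parts.append((mean(v10), v10, v01))
    (auc_a, v10a, v01a), (auc_b, v10b, v01b) = parts
    m, n = len(v10a), len(v01a)
    # Var(AUC_a - AUC_b) includes the covariance between the two AUCs.
    var_diff = (
        (sample_cov(v10a, v10a) + sample_cov(v10b, v10b) - 2 * sample_cov(v10a, v10b)) / m
        + (sample_cov(v01a, v01a) + sample_cov(v01b, v01b) - 2 * sample_cov(v01a, v01b)) / n
    )
    z = (auc_a - auc_b) / math.sqrt(var_diff)  # assumes var_diff > 0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p
```

For example, `delong_compare([5, 6, 7, 1, 2, 3], [1, 5, 3, 2, 4, 6], [1, 1, 1, 0, 0, 0])` gives AUCs of 1 and 1/3 and z = √6 ≈ 2.449 (two-sided p ≈ 0.014). Ignoring the covariance terms would treat the two AUCs as independent and misstate the variance of their difference.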

Symmetry ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 936
Author(s):  
Dan Wang

In this paper, a ratio test based on bootstrap approximation is proposed to detect a persistence change in heavy-tailed observations. The paper focuses on the two symmetric testing problems of a change from I(1) to I(0) and from I(0) to I(1). The test statistic is constructed in ratio form from the residual CUSUM. I derive the null distribution of the test statistic and also discuss its consistency under the alternative hypothesis. However, this null distribution contains an unknown tail index. To address this challenge, I present a bootstrap approximation method for determining the rejection region of the test. Simulation studies on artificial data are conducted to assess finite-sample performance; they show that this method outperforms the kernel method in all listed cases. The analysis of real data also demonstrates the good performance of the method.
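The abstract does not state the statistic explicitly. A common ratio form for persistence-change tests, in the spirit of Kim (2000), compares scaled sums of squared residual CUSUMs after and before each candidate break point and takes the maximum over break points. The sketch below assumes that form, with a constant-mean fit in each segment and a 20% trimming fraction (both assumptions). It does not reproduce the paper's heavy-tailed null distribution or its bootstrap rejection region, which would supply the critical value.

```python
def residual_cusum_ss(segment):
    """Sum of squared partial sums of residuals from a constant-mean fit."""
    mu = sum(segment) / len(segment)
    partial, total = 0.0, 0.0
    for x in segment:
        partial += x - mu
        total += partial * partial
    return total


def ratio_statistic(y, trim=0.2):
    """Max over candidate breaks k of the post-/pre-break normalized CUSUM ratio.

    Large values point toward an I(0)-to-I(1) change (the series becomes
    nonstationary after the break). Swapping numerator and denominator
    targets the I(1)-to-I(0) direction.
    """
    T = len(y)
    best = float("-inf")
    for k in range(int(trim * T), int((1 - trim) * T) + 1):
        pre = residual_cusum_ss(y[:k]) / k ** 2
        post = residual_cusum_ss(y[k:]) / (T - k) ** 2
        best = max(best, post / pre)
    return best
```

As a rough check, a series that alternates around zero and then follows a linear trend (a crude deterministic stand-in for a nonstationary segment) yields a ratio orders of magnitude larger than the purely alternating series.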


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
K. S. Au ◽  
L. Hebert ◽  
P. Hillman ◽  
C. Baker ◽  
M. R. Brown ◽  
...  

Myelomeningocele (MMC) affects one in 1000 newborns annually worldwide, and each surviving child faces tremendous lifetime medical and caregiving burdens. Both genetic and environmental factors contribute to disease risk, but the mechanism is unclear. This study examined 506 MMC subjects for ultra-rare deleterious variants (URDVs: variants absent from gnomAD v2.1.1 controls and having a Combined Annotation Dependent Depletion (CADD) score ≥ 20) in candidate genes that are either known to cause abnormal neural tube closure in animals or were previously associated with human MMC in the current study cohort. Approximately 70% of the study subjects carried one to nine URDVs among 302 candidate genes. Half of the study subjects carried heterozygous URDVs in multiple genes involved in the structure and/or function of the cilium, cytoskeleton, extracellular matrix, WNT signaling, and/or cell migration. Another 20% carried heterozygous URDVs in candidate genes associated with gene transcription regulation, folate metabolism, or glucose metabolism. The presence of URDVs in candidate genes from these biological function groups may elevate the risk of developing myelomeningocele in the study cohort.
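The URDV definition amounts to a two-part filter: absent from the gnomAD v2.1.1 controls and CADD score ≥ 20. The sketch below applies that filter and counts URDVs per subject within a candidate gene list. The `Variant` record, its field names (`gnomad_controls_ac`, `cadd_phred`) and the gene names used in testing are hypothetical placeholders, not actual gnomAD or CADD annotation keys; a real pipeline would read these values from an annotated VCF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    subject: str
    gene: str
    gnomad_controls_ac: int      # hypothetical field: allele count in gnomAD v2.1.1 controls
    cadd_phred: Optional[float]  # hypothetical field: CADD score, None if unannotated


def is_urdv(v: Variant, cadd_min: float = 20.0) -> bool:
    """Ultra-rare (absent from controls) and predicted deleterious (CADD >= cutoff)."""
    return v.gnomad_controls_ac == 0 and v.cadd_phred is not None and v.cadd_phred >= cadd_min


def urdv_counts(variants, candidate_genes):
    """Number of URDVs per subject, restricted to the candidate gene list."""
    counts = {}
    for v in variants:
        if v.gene in candidate_genes and is_urdv(v):
            counts[v.subject] = counts.get(v.subject, 0) + 1
    return counts
```

Subjects absent from the returned dictionary carry no URDVs in the candidate genes.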


Author(s):  
Lingtao Kong

The exponential distribution has been widely used in engineering and the social and biological sciences. In this paper, we propose a new goodness-of-fit test for fuzzy exponentiality using the α-pessimistic value. The test statistic is based on Kullback-Leibler information. Using the Monte Carlo method, we obtain empirical critical points of the test statistic at four different significance levels. To evaluate the proposed test, we compare it with four commonly used tests through simulations. The results show that the proposed test has higher power than the other tests in most cases; in particular, it performs best against the uniform and linear failure rate alternatives. A real data example illustrates the application of the test.
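To illustrate the Kullback-Leibler idea and the Monte Carlo critical points, the sketch below implements the crisp (non-fuzzy) analogue in the style of Ebrahimi, Habibullah and Soofi (1992): the entropy H is estimated with Vasicek's spacing estimator, and the divergence from the fitted exponential is KL = -H + log(x̄) + 1. The fuzzy α-pessimistic construction of the paper is not reproduced; the window `m = 3`, the replication count and the function names are assumptions.

```python
import math
import random


def vasicek_entropy(x, m):
    """Vasicek spacing estimator of differential entropy with window m.

    Order statistics beyond the sample are clamped to the minimum/maximum.
    Assumes continuous data without ties inside a window (else log(0) fails).
    """
    xs = sorted(x)
    n = len(xs)
    total = 0.0
    for i in range(n):
        spacing = xs[min(i + m, n - 1)] - xs[max(i - m, 0)]
        total += math.log(n / (2 * m) * spacing)
    return total / n


def kl_exp_statistic(x, m=3):
    """Estimated KL divergence from the fitted exponential; large values reject."""
    return -vasicek_entropy(x, m) + math.log(sum(x) / len(x)) + 1.0


def mc_critical_value(n, alpha=0.05, m=3, reps=2000, seed=1):
    """Empirical (1 - alpha) quantile of the statistic under exponential samples.

    The statistic is scale invariant, so simulating Exp(1) suffices.
    """
    rng = random.Random(seed)
    sims = sorted(
        kl_exp_statistic([rng.expovariate(1.0) for _ in range(n)], m)
        for _ in range(reps)
    )
    return sims[math.ceil((1 - alpha) * reps) - 1]
```

Repeating `mc_critical_value` for several values of `alpha` gives a table of empirical critical points; a sample is declared non-exponential when its statistic exceeds the tabulated value.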


Author(s):  
Claudia Angelini ◽  
Daniela De Canditiis ◽  
Margherita Mutarelli ◽  
Marianna Pensky

The objective of the present paper is to develop a truly functional Bayesian method specifically designed for time-series microarray data. The method identifies differentially expressed genes in a time-course microarray experiment, ranks them and estimates their expression profiles. Each gene expression profile is modeled as an expansion over an orthonormal basis, where the coefficients and the number of basis functions are estimated from the data. The procedure handles various technical difficulties that arise in typical microarray experiments, such as a small number of observations, non-uniform sampling intervals, and missing or replicated data. It can account for various types of errors and offers a good compromise between nonparametric techniques and techniques based on normality assumptions. In addition, all evaluations are performed using analytic expressions, so the entire procedure requires very little computational effort. The procedure is studied using both simulated and real data and is compared with competitive recent approaches. Finally, it is applied to a case study of a human breast cancer cell line stimulated with estrogen, where we identified new significant genes that had not been flagged in earlier work on the same dataset.
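To make the basis expansion concrete, the sketch below represents a profile on an orthonormal basis (shifted Legendre polynomials on [0, 1], an assumed choice) and estimates the coefficients by ordinary least squares at unevenly spaced time points. This is only the deterministic skeleton: the paper estimates the coefficients and the number of basis functions within a Bayesian model, which is not reproduced here, and `L` is fixed by hand. Function names are hypothetical.

```python
import math


def legendre_basis(t, L):
    """First L shifted Legendre polynomials, orthonormal on [0, 1], evaluated at t."""
    x = 2.0 * t - 1.0
    p = [1.0, x]
    for k in range(1, L - 1):  # Bonnet recurrence for the standard Legendre P_k
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt(2 * k + 1) * p[k] for k in range(L)]


def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = (M[i][n] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return c


def fit_profile(times, values, L):
    """Least-squares coefficients of one gene profile on L basis functions.

    times must be rescaled to [0, 1]; they may be unevenly spaced or repeated
    (replicates), and missing time points are simply omitted.
    """
    Phi = [legendre_basis(t, L) for t in times]
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(L)] for i in range(L)]
    b = [sum(row[i] * y for row, y in zip(Phi, values)) for i in range(L)]
    return solve(A, b)
```

Because the first basis function is the constant 1, a flat profile is captured entirely by the first coefficient, and departures from flatness show up in the higher-order coefficients.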


2018 ◽  
Vol 19 (9) ◽  
pp. 2794 ◽  
Author(s):  
Rong Zhou ◽  
Komivi Dossa ◽  
Donghua Li ◽  
Jingyin Yu ◽  
Jun You ◽  
...  

Sesame is poised to become a major oilseed crop owing to its high oil quality and adaptation to various ecological areas. However, its seed yield is very low, and the underlying genetic basis is still elusive. Here, we performed genome-wide association studies of 39 seed yield-related traits, categorized into five major trait groups, in three different environments using 705 diverse lines. Extensive variation was observed for the traits, and capsule size, capsule number and seed size-related traits were found to be highly correlated with seed yield indexes. In total, 646 loci were significantly associated with the 39 traits (p < 10⁻⁷) and resolved to 547 quantitative trait loci (QTLs). We identified six multi-environment QTLs and 76 pleiotropic QTLs associated with two to five different traits. By analyzing the candidate genes for the assayed traits, we retrieved 48 potential genes containing significant functional loci. Several Arabidopsis homologs of these candidate genes have been described as involved in seed or biomass formation. We also identified novel candidate genes, such as SiLPT3 and SiACS8, which may control capsule length and capsule number. Altogether, this study provides a basis for genetic and functional genomic research toward seed yield improvement in sesame.


Epigenomics ◽  
2020 ◽  
Vol 12 (9) ◽  
pp. 747-755
Author(s):  
Veronika Suni ◽  
Fatemeh Seyednasrollah ◽  
Bishwa Ghimire ◽  
Sini Junttila ◽  
Asta Laiho ◽  
...  

Aim: DNA methylation is a key epigenetic mechanism regulating gene expression. Identifying differentially methylated regions is integral to DNA methylation analysis, and there is a need for robust tools that reliably detect regions with significant differences in methylation status. Materials & methods: We present a reproducibility-optimized test statistic (ROTS) for detecting differential DNA methylation from high-throughput sequencing or array-based data. Results: Using both simulated and real data, we demonstrate the ability of ROTS to identify differential methylation between sample groups. Conclusion: Compared with state-of-the-art methods, ROTS shows competitive sensitivity and specificity in detecting consistently differentially methylated regions.
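The ROTS family of statistics has the form d = |m1 − m2| / (α1 + α2·s), where m1 and m2 are group means and s is a standard error, and the parameters (together with a top-list size k) are chosen to maximize the reproducibility of the feature ranking across bootstrap resamples. The sketch below shows only these two ingredients in simplified form. The Welch-type standard error, the plain top-k overlap measure and the absence of a null (permutation) comparison are simplifying assumptions, so this is not the ROTS software's algorithm; function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def rots_stat(g1, g2, a1, a2):
    """ROTS-type statistic |m1 - m2| / (a1 + a2 * s) for one feature.

    s is a Welch-type standard error (an assumption). Use a1 > 0 so the
    denominator stays positive when a resample has zero variance.
    """
    s = math.sqrt(variance(g1) / len(g1) + variance(g2) / len(g2))
    return abs(mean(g1) - mean(g2)) / (a1 + a2 * s)


def top_k(data1, data2, a1, a2, k):
    """Indices of the k features with the largest statistic (rows = features)."""
    scores = [rots_stat(x, y, a1, a2) for x, y in zip(data1, data2)]
    return set(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])


def reproducibility(data1, data2, a1, a2, k, B=50, seed=0):
    """Mean top-k overlap between pairs of bootstrap resamples of the samples."""
    rng = random.Random(seed)
    n1, n2 = len(data1[0]), len(data2[0])
    overlaps = []
    for _ in range(B):
        tops = []
        for _ in range(2):
            i1 = [rng.randrange(n1) for _ in range(n1)]
            i2 = [rng.randrange(n2) for _ in range(n2)]
            r1 = [[row[i] for i in i1] for row in data1]
            r2 = [[row[i] for i in i2] for row in data2]
            tops.append(top_k(r1, r2, a1, a2, k))
        overlaps.append(len(tops[0] & tops[1]) / k)
    return mean(overlaps)
```

Optimizing over a grid of (α1, α2, k) values with `reproducibility` as the objective mimics the idea of letting the data choose the statistic; setting α2 = 0 reduces the statistic to a scaled mean difference.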


2018 ◽  
Vol 28 (8) ◽  
pp. 2418-2438
Author(s):  
Xi Shen ◽  
Chang-Xing Ma ◽  
Kam C Yuen ◽  
Guo-Liang Tian

Bilateral correlated data are often encountered in medical research such as ophthalmologic (or otolaryngologic) studies, in which each unit contributes information from paired organs, and the measurements from such paired organs are generally highly correlated. Various statistical methods have been developed to handle intra-class correlation in bilateral correlated data analysis. In practice, it is very important to adjust for confounding in statistical inference, since ignoring either the intra-class correlation or the confounding effect may lead to biased results. In this article, we propose three approaches for testing a common risk difference for stratified bilateral correlated data under the assumption of equal correlation. Five confidence intervals for the common difference of two proportions are derived. The performance of the proposed tests and confidence interval estimators is evaluated by Monte Carlo simulation. The simulation results show that the score test statistic outperforms the other statistics in that it has robust type I error rates with high power. The score confidence interval induced from the score test statistic performs satisfactorily in terms of coverage probability with reasonable interval widths. A real data set from an otolaryngologic study is used to illustrate the proposed methods.


2018 ◽  
Vol 28 (9) ◽  
pp. 2868-2875
Author(s):  
Zhongxue Chen ◽  
Qingzhong Liu ◽  
Kai Wang

Several gene- or set-based association tests have been proposed recently in the literature, but more powerful statistical approaches are still highly desirable in this area. In this paper we propose a novel statistical association test that uses information from the burden component and its complement in the genotypes. The new test statistic has a simple null distribution, a special, simplified variance-gamma distribution, so its p-value can be easily calculated. Through a comprehensive simulation study, we show that the new test controls the type I error rate and has superior power compared with some popular existing methods. We also apply the new approach to a real data set; the results suggest that this test is promising.


2016 ◽  
Vol 27 (2) ◽  
pp. 541-548 ◽  
Author(s):  
Tsung-Shan Tsou

Intuitively, comparing positive predictive values requires only patients with two positive screening test results, and contrasting negative predictive values requires only those with two negative screening test results. Nevertheless, existing methods rely on a multinomial model that includes parameters unnecessary for these specific comparisons, which results in complex statistical formulas. We introduce a novel likelihood approach that fits this intuition by including only the minimum number of parameters of interest in paired designs. We demonstrate that our robust score test statistic is identical to a newly proposed weighted generalized score test statistic. Simulations and real data analysis are used for illustration.


2008 ◽  
Vol 2 ◽  
pp. BBI.S473 ◽  
Author(s):  
Akihiro Hirakawa ◽  
Yasunori Sato ◽  
Chikuma Hamada ◽  
Isao Yoshimura

Choosing an appropriate statistic and precisely evaluating the false discovery rate (FDR) are both essential for devising an effective method for identifying differentially expressed genes in microarray data. The t-type score proposed by Pan et al. (2003) succeeded in suppressing false positives by controlling the underestimation of variance but left the overestimation uncontrolled. For controlling the overestimation, we devised a new test statistic (variance stabilized t-type score) by placing shrunken sample variances of the James-Stein type in the denominator of the t-type score. Since the relative superiority of the mean and median FDRs was unclear in the widely adopted Significance Analysis of Microarrays (SAM), we conducted simulation studies to examine the performance of the variance stabilized t-type score and the characteristics of the two FDRs. The variance stabilized t-type score was generally better than or at least as good as the t-type score, irrespective of the sample size and proportion of differentially expressed genes. In terms of accuracy, the median FDR was superior to the mean FDR when the proportion of differentially expressed genes was large. The variance stabilized t-type score with the median FDR was applied to actual colorectal cancer data and yielded a reasonable result.
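The abstract names the ingredient (James-Stein-type shrunken sample variances in the denominator of a t-type score) but not the estimator. The sketch below is a generic stand-in, not the authors' variance stabilized t-type score or Pan et al.'s t-type score: gene-wise variances are shrunk toward their mean with a data-driven intensity (the ratio of the summed estimated variances of the sample variances to their summed squared deviations from the target, truncated to [0, 1]), which lifts very small, underestimated variances and pulls down very large, overestimated ones. The mean target, the intensity rule and the function names are assumptions; FDR estimation (mean or median, as in SAM) is not shown.

```python
import math
from statistics import mean


def shrink_variances(rows):
    """James-Stein-type shrinkage of gene-wise variances toward their mean.

    rows: one list of expression values per gene (same samples for every gene).
    Returns (shrunken variances, shrinkage intensity in [0, 1]).
    """
    n = len(rows[0])
    s2, var_s2 = [], []
    for x in rows:
        xbar = mean(x)
        w = [(v - xbar) ** 2 for v in x]
        wbar = mean(w)
        s2.append(n / (n - 1) * wbar)  # unbiased sample variance
        # estimated sampling variance of that sample variance
        var_s2.append(n / (n - 1) ** 3 * sum((wi - wbar) ** 2 for wi in w))
    target = mean(s2)
    spread = sum((v - target) ** 2 for v in s2)
    lam = 1.0 if spread == 0 else min(1.0, sum(var_s2) / spread)
    return [lam * target + (1 - lam) * v for v in s2], lam


def shrinkage_t(group1, group2):
    """Two-sample t-type scores with shrunken variances in the denominator."""
    v1, _ = shrink_variances(group1)
    v2, _ = shrink_variances(group2)
    n1, n2 = len(group1[0]), len(group2[0])
    return [
        (mean(a) - mean(b)) / math.sqrt(va / n1 + vb / n2)
        for a, b, va, vb in zip(group1, group2, v1, v2)
    ]
```

For example, a gene whose four values are identical (sample variance 0) receives a strictly positive shrunken variance when shrunk together with a more variable gene, so its score stays finite instead of exploding.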

