yeast dataset

BMC Genomics ◽  
2020 ◽  
Vol 21 (S10) ◽  
Author(s):  
Ju Hun Choi ◽  
Taegun Kim ◽  
Junghyun Jung ◽  
Jong Wha J. Joo

Abstract Background Regulatory hotspots are genetic variations that may regulate the expression levels of many genes. It has been of great interest to find those hotspots utilizing expression quantitative trait locus (eQTL) analysis. However, it has been reported that many of the findings are spurious hotspots induced by various unknown confounding factors. Recently, methods utilizing complicated statistical models have been developed that successfully identify genuine hotspots. Next-generation Intersample Correlation Emended (NICE) is one of the methods that show high sensitivity and a low false-discovery rate in finding regulatory hotspots. Even though the methods successfully find genuine hotspots, they have not been widely used due to their non-user-friendly interfaces and complex running processes. Furthermore, most of the methods are impractical due to their prohibitively high computational complexity. Results To overcome the limitations of existing methods, we developed a fully automated web-based tool, referred to as NICER (NICE Renew), which is based on the NICE program. First, we dramatically reduced the burden of installing and running NICE. Second, we significantly reduced running time by incorporating multi-processing. Third, besides our web-based NICER, users can use NICER on Google Compute Engine and can readily install and run the NICER web service on their local computers. Finally, we provide different input formats and visualization tools to show results. Utilizing a yeast dataset, we show that NICER can be successfully used in an eQTL analysis to identify many genuine regulatory hotspots, more than half of which were previously reported elsewhere. Conclusions Even though many hotspot analysis tools have been proposed, they have not been widely used for many practical reasons. NICER is a fully automated web-based solution for eQTL mapping and regulatory hotspot analysis. NICER provides a user-friendly interface and has made hotspot analysis more viable by reducing the running time significantly. We believe that NICER will become the method of choice for increasing the power of eQTL hotspot analysis.


2020 ◽  
Vol 36 (12) ◽  
pp. 3803-3810
Author(s):  
Jia Wen ◽  
Colby T Ford ◽  
Daniel Janies ◽  
Xinghua Shi

Abstract Motivation Epistasis reflects the distortion on a particular trait or phenotype resulting from the combinatorial effect of two or more genes or genetic variants. Epistasis is an important genetic foundation underlying quantitative traits in many organisms as well as in complex human diseases. However, there are two major barriers in identifying epistasis using large genomic datasets. One is that epistasis analysis will induce over-fitting of an over-saturated model given the high dimensionality of a genomic dataset. Therefore, the problem of identifying epistasis demands efficient statistical methods. The second barrier comes from the intensive computing time for epistasis analysis, even when the appropriate model and data are specified. Results In this study, we combine statistical techniques and computational techniques to scale up epistasis analysis using Empirical Bayesian Elastic Net (EBEN) models. Specifically, we first apply a matrix manipulation strategy for pre-computing the correlation matrix and a pre-filter to narrow down the search space for epistasis analysis. We then develop a parallelized approach to further accelerate the modeling process. Our experiments on synthetic and empirical genomic data demonstrate that our parallelized methods offer speed-ups of tens of fold in comparison with the classical EBEN method, which runs in a sequential manner. We applied our parallelized approach to a yeast dataset, and we were able to identify both main and epistatic effects of genetic variants associated with traits such as fitness. Availability and implementation The software is available at github.com/shilab/parEBEN.
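The pre-filtering idea in this abstract can be illustrated with a small sketch. This is not the parEBEN implementation: the abstract describes a matrix-based pre-computation of the correlation matrix, while the version below is a plain loop over marker pairs, written only to show what "narrowing the search space" means. The function names `pearson` and `prefilter_pairs`, the product-term encoding of an interaction, and the threshold value are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def prefilter_pairs(genotypes, phenotype, threshold):
    """Keep marker pairs whose product term correlates with the phenotype.

    genotypes: dict mapping a (hypothetical) marker name to a list of values,
               one value per sample.
    Returns a list of (marker_a, marker_b, correlation) tuples with
    |correlation| >= threshold. Only these pairs would be passed on to a
    more expensive model.
    """
    kept = []
    for a, b in combinations(sorted(genotypes), 2):
        # Assumed encoding: the interaction term is the element-wise product.
        interaction = [u * v for u, v in zip(genotypes[a], genotypes[b])]
        r = pearson(interaction, phenotype)
        if abs(r) >= threshold:
            kept.append((a, b, r))
    return kept


if __name__ == "__main__":
    toy_genotypes = {
        "g1": [0, 1, 0, 1],
        "g2": [0, 0, 1, 1],
        "g3": [1, 1, 1, 1],
    }
    toy_phenotype = [0, 0, 0, 1]  # driven by g1 AND g2 in this toy example
    print(prefilter_pairs(toy_genotypes, toy_phenotype, threshold=0.9))
```

Because each pair is scored independently, the loop body is also the natural unit to distribute across workers, which is the kind of parallelism the abstract refers to.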


2019 ◽  
Vol 35 (24) ◽  
pp. 5078-5085 ◽  
Author(s):  
Javad Ansarifar ◽  
Lizhi Wang

Abstract Motivation Epistasis, which is the phenomenon of genetic interactions, plays a central role in many scientific discoveries. However, due to the combinatorial nature of the problem, it is extremely challenging to decipher the exact combinations of genes that trigger the epistatic effects. Many existing methods only focus on two-way interactions. Some of the most effective methods used machine learning techniques, but many were designed for special case-and-control studies or suffer from overfitting. We propose three new algorithms for multi-effect and multi-way epistases detection, with one guaranteeing global optimality and the other two being local optimization oriented heuristics. Results The computational performance of the proposed heuristic algorithm was compared with several state-of-the-art methods using a yeast dataset. Results suggested that searching for the global optimal solution could be extremely time consuming, but the proposed heuristic algorithm was much more effective and efficient than others at finding a close-to-optimal solution. Moreover, it was able to provide biological insight on the exact configurations of epistases, besides achieving a higher prediction accuracy than the state-of-the-art methods. Availability and implementation Data source was publicly available and details are provided in the text.


2019 ◽  
Author(s):  
Sean R. Hackett ◽  
Edward A. Baltz ◽  
Marc Coram ◽  
Bernd J. Wranik ◽  
Griffin Kim ◽  
...  

Abstract We present an approach for inferring genome-wide regulatory causality and demonstrate its application on a yeast dataset constructed by independently inducing hundreds of transcription factors and measuring timecourses of the resulting gene expression responses. We discuss the regulatory cascades in detail for a single transcription factor, Aft1; however, we have 201 TF induction timecourses that include >100,000 signal-containing dynamic responses. From a single TF induction timecourse we can often discriminate the direct from the indirect effects of the induced TF. Across our entire dataset, however, we find that the majority of expression changes are indirectly driven by unknown regulators. By integrating all timecourses into a single whole-cell transcriptional model, potential regulators of each gene can be predicted without incorporating prior information. In doing so, the indirect effects of a TF are understood as a series of direct regulatory predictions that capture how regulation propagates over time to create a causal regulatory network. This approach, which we call CANDID (Causal Attribution Networks Driven by Induction Dynamics), resulted in the prediction of multiple transcriptional regulators that were validated experimentally.


2016 ◽  
Vol 66 (2) ◽  
pp. 113 ◽  
Author(s):  
Ashok P. ◽  
G.M Kadhar Nawaz

Rough set theory handles uncertainty and incomplete information by describing each set with two approximations, a lower and an upper one. In this paper, the clustering process is improved by adapting the preliminary centroid selection method of the rough K-means (RKM) algorithm. The entropy-based rough K-means (ERKM) method is developed by applying entropy-based preliminary centroid selection to RKM, and it is executed and validated using cluster validity indexes. An example shows that ERKM performs effectively when preliminary centroids are selected by entropy. In addition, outlier detection is an important task in data mining; outliers are objects that differ markedly from the rest of the objects in the cluster. The entropy-based rough outlier factor (EROF) method is used to detect outliers effectively in the yeast dataset. An example shows that EROF detects outliers effectively on protein localisation sites and that the ERKM clustering algorithm performs effectively. Further, experimental readings show that the ERKM and EROF methods outperformed the other methods.
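The lower and upper approximations mentioned in this abstract can be sketched for a single assignment step of a rough K-means style algorithm. This is an illustration under stated assumptions, not the ERKM method from the paper: it uses one-dimensional points, fixed centroids, and a distance-difference rule with a hypothetical `threshold` parameter to decide when an object is ambiguous. The entropy-based centroid selection and the centroid update are not shown.

```python
def rough_assign(points, centroids, threshold):
    """One rough-clustering assignment step on 1-D data.

    An object goes into the lower approximation of its nearest cluster when
    no other centroid is almost as close; otherwise it is placed only in the
    upper approximations of every cluster it could plausibly belong to.
    Returns (lower, upper), each a list of sets of point indices.
    """
    lower = [set() for _ in centroids]
    upper = [set() for _ in centroids]
    for idx, p in enumerate(points):
        dists = [abs(p - c) for c in centroids]
        nearest = min(range(len(centroids)), key=dists.__getitem__)
        # Assumed ambiguity rule: another centroid is within `threshold`
        # of the nearest distance.
        close = [
            j for j, d in enumerate(dists)
            if j != nearest and d - dists[nearest] <= threshold
        ]
        upper[nearest].add(idx)  # every object is in its nearest cluster's upper approximation
        if close:
            for j in close:
                upper[j].add(idx)
        else:
            lower[nearest].add(idx)
    return lower, upper


if __name__ == "__main__":
    lower, upper = rough_assign([0.0, 1.0, 5.0, 9.0, 10.0], [0.0, 10.0], threshold=1.0)
    print("lower:", lower)
    print("upper:", upper)
```

In the toy call, the point at 5.0 is equidistant from both centroids, so it lands in both upper approximations and in neither lower approximation; the boundary region (upper minus lower) is where the uncertainty is recorded.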


2014 ◽  
Vol 07 (01) ◽  
pp. 1450018 ◽  
Author(s):  
S. R. KANNAN ◽  
S. RAMTHILAGAM ◽  
R. DEVI ◽  
T. P. HONG

Finding subtypes of cancer in a breast cancer database is an extremely difficult task because of heavy noise caused by measurement error. Most recent clustering techniques that separate a breast cancer database into cancerous and noncancerous groups often reduce the interpretability of the structure. Hence, this paper tries to find effective Fuzzy C-Means-based clustering techniques to identify the proper subtypes of cancer in a breast cancer database. This paper obtains the objective function of effective Fuzzy C-Means clustering techniques by incorporating the kernel-induced distance function, Renyi's entropy function, a weighted distance measure and neighborhood terms-based spatial context. The effectiveness of the proposed methods is demonstrated through experiments on the Lung cancer database, IRIS dataset, Wine dataset, Checkerboard dataset, Time Series dataset and Yeast dataset. Finally, the proposed methods are implemented successfully to cluster the breast cancer database into cancerous and noncancerous. The clustering accuracy has been validated through the error matrix and the silhouette method.


2012 ◽  
Vol 2012 ◽  
pp. 1-7
Author(s):  
Enrico Capobianco

Static representations of protein interaction networks (PINs) reflect measurements referred to a variety of conditions, including time. To partially bypass this limitation, gene expression information is usually integrated in the network to measure its “activity level.” In general, the entire PIN modular organization (complexes, pathways) can reveal changes of configuration whose functional significance depends on biological annotation. However, since network dynamics are based on the presence of different conditions leading to comparisons between normal and disease states, or between networks observed sequentially in time, our working hypothesis refers to the analysis of differential networks based on varying modularity and uncertainty. Two popular methods, k-core and Q-modularity, were applied and evaluated over a reference yeast dataset comprising a PIN of literature-curated data obtained from the fusion of heterogeneous measurement sources. While the functional aspect of interest is cell cycle and the corresponding interactions were isolated, the PIN dynamics were externally induced by time-course measured gene expression values, which we consider one of the “modularity drivers.” Notably, due to the nature of such expression values referred to the “just-in-time method,” we could specialize our approach according to three constrained modular configurations, which were then comparatively assessed through local entropy measures.
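Of the two methods named in this abstract, k-core is the simpler to illustrate: the k-core of a graph is the largest subgraph in which every node has at least k neighbours, and it can be found by repeatedly pruning low-degree nodes. The sketch below is a generic implementation on a toy graph, not the pipeline used in the paper; the function name `k_core`, the adjacency-dict input format, and the protein labels are assumptions for illustration. The input is assumed to be an undirected graph given as a symmetric adjacency mapping in which every neighbour also appears as a key.

```python
def k_core(adjacency, k):
    """Return the set of nodes in the k-core of an undirected graph.

    adjacency: dict mapping each node to an iterable of its neighbours
               (assumed symmetric, with every neighbour present as a key).
    """
    # Work on a copy so the caller's graph is left untouched.
    alive = {node: set(nbrs) for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if len(alive[node]) < k:
                # Removing a node lowers the degree of its neighbours,
                # which may push them below k on a later pass.
                for nbr in alive[node]:
                    alive[nbr].discard(node)
                del alive[node]
                changed = True
    return set(alive)


if __name__ == "__main__":
    # Toy interaction network: a triangle with one pendant node.
    toy_pin = {
        "A": ["B", "C", "D"],
        "B": ["A", "C"],
        "C": ["A", "B"],
        "D": ["A"],
    }
    print(sorted(k_core(toy_pin, 2)))
```

Comparing the k-cores of networks built under different conditions (for example, different expression time points) is one way to track how the densely connected part of a PIN changes.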

