ezGeno: An Automatic Model Selection Package for Genomic Data Analysis

2020 ◽  
Author(s):  
Jun-Liang Lin ◽  
Tsung-Ting Hsieh ◽  
Yi-An Tung ◽  
Xuan-Jun Chen ◽  
Yu-Chun Hsiao ◽  
...  

Abstract: To facilitate the process of tailor-making a deep neural network for exploring the dynamics of genomic DNA, we have developed a hands-on package called ezGeno that automates the search for parameters and network structure. ezGeno considers three different search spaces, namely, the number of filters, dilation factors, and the connectivity between different layers. ezGeno can be applied to any kind of 1D genomic input such as genomic sequences, histone modifications, DNase feature data and so on. Combinations of multiple abovementioned 1D features are also applicable. Specifically, for the task of predicting TF binding using genomic sequences as the input, ezGeno can consistently return the best performing set of parameters and network structure, as well as highlight the important segments within the original sequences. For the task of predicting tissue-specific enhancer activity using both sequence and DNase feature data as the input, ezGeno also regularly outperforms the hand-designed models. In this study, we demonstrate that ezGeno is superior in efficiency and accuracy when compared to AutoKeras, a general open-source AutoML package. The average AUC of ezGeno is also consistently higher than the result of using a one-layer DeepBind model. With the flexibility of ezGeno, we expect that this package can provide future researchers not only support of model design in their analysis of genomic studies but also more insights into the regulatory landscape.
Availability: The ezGeno package can be freely accessed at https://github.com/ailabstw/ezGeno.
Contact: Dr. Chien-Yu Chen, [email protected]
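To make the three search dimensions named in the abstract concrete, the following is a minimal sketch of how such a search space could be enumerated. It is not ezGeno's API or its search strategy, which the abstract does not describe. The function names, the dictionary layout, and the idea of modelling connectivity as an on/off skip link between consecutive layers are all assumptions made for illustration. The receptive-field formula is the standard one for stacked stride-1 dilated 1D convolutions.

```python
from itertools import product


def receptive_field(kernel_size, dilations):
    """Receptive field (in positions) of stacked stride-1 dilated 1D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def enumerate_search_space(filter_choices, dilation_choices, n_layers):
    """Yield every candidate architecture in a small, hypothetical search space."""
    per_layer = list(product(filter_choices, dilation_choices))
    for layers in product(per_layer, repeat=n_layers):
        # Assumption: connectivity is one on/off skip link per pair of consecutive layers.
        for skips in product([False, True], repeat=n_layers - 1):
            yield {
                "filters": [f for f, _ in layers],
                "dilations": [d for _, d in layers],
                "skips": list(skips),
            }


candidates = list(enumerate_search_space([32, 64], [1, 2], n_layers=2))
print(len(candidates))  # 32 = (2 * 2) ** 2 layer choices * 2 skip choices

# Larger dilation factors widen the stretch of sequence each output position sees.
widest = max(candidates, key=lambda c: receptive_field(3, c["dilations"]))
print(widest["dilations"], receptive_field(3, widest["dilations"]))
```

Even this toy space grows exponentially with the number of layers, which is the usual motivation for an automated search rather than exhaustive enumeration.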

2021 ◽  
pp. 104225872110335
Author(s):  
Jake Duke ◽  
Taha Havakhor ◽  
Rachel Mui ◽  
Owen Parker

Building on the behavioral theory of the firm, we empirically examine how starting strategies and syndication networks can influence venture capital (VC) firms’ problemistic search. We propose that: (a) depending on a VC’s strategic starting point, that is, the VC’s extent of specialization, the directionality of problemistic search may change to either expanding or contracting search activities; and (b) depending on search direction, structural holes in syndication networks can either impede or facilitate the problemistic search process. In a sample of U.S. VC firms, we find results consistent with our predictions, which have important implications for entrepreneurship and organizational strategy research.


2021 ◽  
Author(s):  
Kaixian Yu ◽  
Zihan Cui ◽  
Xin Sui ◽  
Xing Qiu ◽  
Jinfeng Zhang

Abstract: Bayesian networks (BNs) provide a probabilistic, graphical framework for modeling high-dimensional joint distributions with complex correlation structures. BNs have wide applications in many disciplines, including biology, social science, finance and biomedical science. Despite extensive studies in the past, network structure learning from data is still a challenging open question in BN research. In this study, we present a sequential Monte Carlo (SMC)-based three-stage approach, GRowth-based Approach with Staged Pruning (GRASP). A double filtering strategy was first used for discovering the overall skeleton of the target BN. To search for the optimal network structures, we designed an adaptive SMC (adSMC) algorithm to increase the quality and diversity of sampled networks, which were further improved by a third stage to reclaim edges missed in the skeleton discovery step. GRASP gave very satisfactory results when tested on benchmark networks. Finally, BN structure learning using multiple types of genomics data illustrates GRASP’s potential in discovering novel biological relationships in integrative genomic studies.


Methods ◽  
2013 ◽  
Vol 62 (3) ◽  
pp. 216-225 ◽  
Author(s):  
Minaka Ishibashi ◽  
Alejandro S. Mechaly ◽  
Thomas S. Becker ◽  
Silke Rinkwitz

2019 ◽  
Author(s):  
Katherine A. Alexander ◽  
María J. García-García

Abstract: Imprinting at the Dlk1-Dio3 cluster is controlled by the IG-DMR, an imprinting control region differentially methylated between maternal and paternal chromosomes. The maternal IG-DMR is essential for imprinting control, functioning as a cis enhancer element. Meanwhile, DNA methylation at the paternal IG-DMR is thought to prevent enhancer activity. To explore whether suppression of enhancer activity at the methylated IG-DMR requires the transcriptional repressor TRIM28, we analyzed Trim28chatwo embryos and performed epistatic experiments with IG-DMR deletion mutants. We found that while TRIM28 regulates the enhancer properties of the paternal IG-DMR, it also controls imprinting through other mechanisms. Additionally, we found that the paternal IG-DMR, previously deemed dispensable for imprinting, is required in certain tissues, demonstrating that imprinting is regulated in a tissue-specific manner. Using PRO-seq to analyze nascent transcription, we identified 30 novel transcribed regulatory elements, including 23 that are tissue-specific. These results demonstrate that different tissues have a distinctive regulatory landscape at the Dlk1-Dio3 cluster and provide insight into potential mechanisms of tissue-specific imprinting control. Together, our findings challenge the premise that Dlk1-Dio3 imprinting is regulated through a single mechanism and demonstrate that different tissues use distinct strategies to accomplish imprinted gene expression.


2017 ◽  
Author(s):  
Camille Berthelot ◽  
Diego Villar ◽  
Julie E. Horvath ◽  
Duncan T. Odom ◽  
Paul Flicek

Abstract: To gain insight into how mammalian gene expression is controlled by rapidly evolving regulatory elements, we jointly analysed promoter and enhancer activity with downstream transcription levels in liver samples from twenty species. Genes associated with complex regulatory landscapes generally exhibit high expression levels that remain evolutionarily stable. While the number of regulatory elements is the key driver of transcriptional output and resilience, regulatory conservation matters: elements active across mammals most effectively stabilise gene expression. In contrast, recently-evolved enhancers typically contribute weakly, consistent with their high evolutionary plasticity. These effects are observed across the entire mammalian clade and robust to potential confounders, such as gene expression level. Overall, our results illuminate how the evolutionary stability of gene expression is profoundly entwined with both the number and conservation of surrounding promoters and enhancers.
Highlights: Gene expression levels and stability are linked to the number of elements in the regulatory landscape. Conserved regulatory elements associate with tightly controlled, highly expressed genes. Recently evolved enhancers weakly influence gene expression, but promoters are similarly active regardless of conservation. The interplay between complexity of the regulatory landscape and conservation of individual promoters and enhancers shapes gene expression in mammals.


2020 ◽  
Vol 6 (49) ◽  
pp. eabe2955
Author(s):  
Yann Le Poul ◽  
Yaqun Xin ◽  
Liucong Ling ◽  
Bettina Mühling ◽  
Rita Jaenichen ◽  
...  

Developmental enhancers control the expression of genes prefiguring morphological patterns. The activity of an enhancer varies among cells of a tissue, but collectively, expression levels in individual cells constitute a spatial pattern of gene expression. How the spatial and quantitative regulatory information is encoded in an enhancer sequence is elusive. To link spatial pattern and activity levels of an enhancer, we used systematic mutations of the yellow spot enhancer, active in developing Drosophila wings, and tested their effect in a reporter assay. Moreover, we developed an analytic framework based on the comprehensive quantification of spatial reporter activity. We show that the quantitative enhancer activity results from densely packed regulatory information along the sequence, and that a complex interplay between activators and multiple tiers of repressors carves the spatial pattern. Our results shed light on how an enhancer reads and integrates trans-regulatory landscape information to encode a spatial quantitative pattern.


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 444
Author(s):  
Zhao Yang ◽  
Shengbing Zhang ◽  
Ruxu Li ◽  
Chuxi Li ◽  
Miao Wang ◽  
...  

Combining deep learning with edge computing can make artificial intelligence ubiquitous. Because edge devices have constrained computation resources, research on on-device deep learning focuses not only on model accuracy but also on model efficiency, for example, inference latency. There are many attempts to optimize existing deep learning models so that they can be deployed on edge devices and meet specific application requirements while maintaining high accuracy. Such work requires both professional knowledge and a large number of experiments, which limits the customization of neural networks for varied devices and application scenarios. To reduce human intervention in designing and optimizing the neural network structure, multi-objective neural architecture search methods have been proposed that automatically search for neural networks with high accuracy that also satisfy certain hardware performance requirements. However, current methods commonly use accuracy and inference latency only as performance indicators during the search process, and sample numerous network structures to obtain the required neural network. Without the search objectives regulating the search direction, a large number of useless networks are generated during the search, which greatly reduces search efficiency. Therefore, in this paper, an efficient resource-aware search method is proposed. Firstly, a network inference consumption profiling model for any specific device is established, which directly provides the resource consumption of each operation in the network structure and the inference latency of the entire sampled network. Next, on the basis of Bayesian search, a resource-aware Pareto Bayesian search is proposed. Accuracy and inference latency are set as the constraints to regulate the search direction. With a clearer search direction, the overall search efficiency is improved. Furthermore, a cell-based structure and lightweight operations are applied to optimize the search space and further enhance search efficiency. The experimental results demonstrate that with our method, the inference latency of the searched network structure was reduced by 94.71% without sacrificing accuracy. At the same time, the search efficiency increased by 18.18%.
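As a small illustration of the multi-objective idea in this abstract, the sketch below selects the Pareto-optimal candidates when accuracy should be high and inference latency low. It shows plain Pareto filtering only, not the authors' resource-aware Pareto Bayesian search. The field names `accuracy` and `latency_ms` and the example values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a["accuracy"] >= b["accuracy"] and a["latency_ms"] <= b["latency_ms"]
    better = a["accuracy"] > b["accuracy"] or a["latency_ms"] < b["latency_ms"]
    return no_worse and better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical sampled networks with made-up measurements.
sampled = [
    {"name": "A", "accuracy": 0.90, "latency_ms": 10.0},
    {"name": "B", "accuracy": 0.85, "latency_ms": 5.0},
    {"name": "C", "accuracy": 0.80, "latency_ms": 12.0},
    {"name": "D", "accuracy": 0.90, "latency_ms": 8.0},
]

front = pareto_front(sampled)
print([c["name"] for c in front])  # ['B', 'D']
```

Here D dominates both A (same accuracy, lower latency) and C, while B and D trade accuracy against latency, so neither dominates the other. A search that steers sampling toward this front avoids spending evaluations on networks like A and C.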


2021 ◽  
Vol 12 ◽  
Author(s):  
Kaixian Yu ◽  
Zihan Cui ◽  
Xin Sui ◽  
Xing Qiu ◽  
Jinfeng Zhang

Bayesian networks (BNs) provide a probabilistic, graphical framework for modeling high-dimensional joint distributions with complex correlation structures. BNs have wide applications in many disciplines, including biology, social science, finance and biomedical science. Despite extensive studies in the past, network structure learning from data is still a challenging open question in BN research. In this study, we present a sequential Monte Carlo (SMC)-based three-stage approach, GRowth-based Approach with Staged Pruning (GRASP). A double filtering strategy was first used for discovering the overall skeleton of the target BN. To search for the optimal network structures we designed an adaptive SMC (adSMC) algorithm to increase the quality and diversity of sampled networks which were further improved by a third stage to reclaim edges missed in the skeleton discovery step. GRASP gave very satisfactory results when tested on benchmark networks. Finally, BN structure learning using multiple types of genomics data illustrates GRASP’s potential in discovering novel biological relationships in integrative genomic studies.


2016 ◽  
Vol 113 (48) ◽  
pp. E7720-E7729 ◽  
Author(s):  
Ruben Schep ◽  
Anamaria Necsulea ◽  
Eddie Rodríguez-Carballo ◽  
Isabel Guerreiro ◽  
Guillaume Andrey ◽  
...  

Vertebrate Hox genes encode transcription factors operating during the development of multiple organs and structures. However, the evolutionary mechanism underlying this remarkable pleiotropy remains to be fully understood. Here, we show that Hoxd8 and Hoxd9, two genes of the HoxD complex, are transcribed during mammary bud (MB) development. However, unlike in other developmental contexts, their coexpression does not rely on the same regulatory mechanism. Hoxd8 is regulated by the combined activity of closely located sequences and the most distant telomeric gene desert. On the other hand, Hoxd9 is controlled by an enhancer-rich region that is also located within the telomeric gene desert but has no impact on Hoxd8 transcription, thus constituting an exception to the global regulatory logic systematically observed at this locus. The latter DNA region is also involved in Hoxd gene regulation in other contexts and strongly interacts with Hoxd9 in all tissues analyzed thus far, indicating that its regulatory activity was already operational before the appearance of mammary glands. Within this DNA region and neighboring a strong limb enhancer, we identified a short sequence conserved in therian mammals and capable of enhancer activity in the MBs. We propose that Hoxd gene regulation in embryonic MBs evolved by hijacking a preexisting regulatory landscape that was already at work before the emergence of mammals in structures such as the limbs or the intestinal tract.


2019 ◽  
Author(s):  
Kaixian Yu ◽  
Zihan Cui ◽  
Xing Qiu ◽  
Jinfeng Zhang

Abstract: Bayesian networks (BNs) provide a probabilistic, graphical framework for modeling high-dimensional joint distributions with complex dependence structures. BNs can be used to infer complex biological networks using heterogeneous data from different sources with missing values. Despite extensive studies in the past, network structure learning from data is still a challenging open question in BN research. In this study, we present a sequential Monte Carlo (SMC)-based three-stage approach, GRowth-based Approach with Staged Pruning (GRASP). A double filtering strategy was first used for discovering the overall skeleton of the target BN. To search for the optimal network structures, we designed an adaptive SMC (adSMC) algorithm to increase the diversity of sampled networks, which were further improved by a new stage to reclaim edges missed in the skeleton discovery step. GRASP gave very satisfactory results when tested on benchmark networks. Finally, BN structure learning using multiple types of genomics data illustrates GRASP’s potential in discovering novel biological relationships in integrative genomic studies.

