Explainability in transformer models for functional genomics

Author(s):  
Jim Clauwaert ◽  
Gerben Menschaert ◽  
Willem Waegeman

Abstract The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has been successful in gathering insights on the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation purposes. We find that the majority of subunits (attention heads) of the model are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of the transcription process. With the specialization of the attention heads occurring automatically, we believe transformer models to be of high interest towards the creation of explainable neural networks in this field.

2020 ◽  
Author(s):  
Jim Clauwaert ◽  
Gerben Menschaert ◽  
Willem Waegeman

Abstract The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally comprises the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present several methods that can be used to gather insights on biological processes that drive any genome annotation task. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation purposes. We find that the majority of sub-units (attention heads) of the model are specialized towards identifying DNA binding sites. Working with a neural network trained to detect transcription start sites in E. coli, we successfully characterize both locations and consensus sequences of transcription factor binding sites, including both well-known and potentially novel elements involved in the initiation of the transcription process.


2020 ◽  
Vol 26 ◽  
Author(s):  
Xiaoping Min ◽  
Fengqing Lu ◽  
Chunyan Li

Enhancer-promoter interactions (EPIs) in the human genome are of great significance to transcriptional regulation, which tightly controls gene expression. Identifying EPIs can help us better decipher gene regulation and understand disease mechanisms. However, experimental methods for identifying EPIs are constrained by funding, time and manpower, while computational methods using DNA sequences and genomic features are viable alternatives. Deep learning methods have shown promising prospects in classification and have been applied to identify EPIs. In this survey, we focus specifically on sequence-based deep learning methods and conduct a comprehensive review of the literature on them. We first briefly introduce existing sequence-based frameworks for EPI prediction and their technical details. After that, we elaborate on the datasets, pre-processing methods and evaluation strategies. Finally, we discuss the challenges these methods face and suggest several future opportunities.


Energies ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 1749
Author(s):  
Elzbieta Szychta ◽  
Leszek Szychta

Energy efficiency of systems of water pumping is a complex problem since efficiency of two distinct interacting systems needs to be combined: water and power supply. This paper introduces a non-intrusive method of calculating the so-called “collective losses” of a cage induction motor. The term “collective losses”, which the authors define, allows for accurate estimation of motor efficiency. Control system of a pump determines operating point of a pumping station, and thus its efficiency. General estimated performance characteristics of a motor, components of a control system, are assumed to serve selection of a range of pumping speed variations. Rotational speed has a direct effect on motor load torque, pump power and head, and thus on motor performance. Hellwig’s statistical method was used to specify characteristics of estimated collective losses on the basis of experimental studies of 21 motors rated at up to 2.2 kW. The results of simulations and experiments are used to verify validity and efficiency of the suggested method. The method is non-intrusive, simple to use, and requires minimum data.


Genome ◽  
2010 ◽  
Vol 53 (11) ◽  
pp. 1002-1016 ◽  
Author(s):  
B.R. Cullis ◽  
A.B. Smith ◽  
C.P. Beeck ◽  
W.A. Cowling

Exploring and exploiting variety by environment (V × E) interaction is one of the major challenges facing plant breeders. In paper I of this series, we presented an approach to modelling V × E interaction in the analysis of complex multi-environment trials using factor analytic models. In this paper, we develop a range of statistical tools which explore V × E interaction in this context. These tools include graphical displays such as heat-maps of genetic correlation matrices as well as so-called E-scaled uniplots that are a more informative alternative to the classical biplot for large plant breeding multi-environment trials. We also present a new approach to prediction for multi-environment trials that include pedigree information. This approach allows meaningful selection indices to be formed either for potential new varieties or potential parents.


1987 ◽  
Vol 7 (8) ◽  
pp. 2933-2940
Author(s):  
H Honkawa ◽  
W Masahashi ◽  
S Hashimoto ◽  
T Hashimoto-Gotoh

A number of deletion mutants were isolated, including 5', 3', and internal deletions in the 5'-flanking region of the human cellular oncogene related to the Harvey sarcoma virus (c-H-ras), and their transforming activities were examined in NIH 3T3 cells. DNA sequences which could not be deleted without losing transforming activity were localized to a relatively short stretch upstream of the region which showed homology to the 5'-flanking region of v-H-ras oncogene. S1 nuclease analysis indicated that there were two clusters of mRNA start sites at positions that were about 1,371 and 1,298 base pairs upstream of the first coding ATG. The minimum region required for promoter function was estimated to be a 51-base-pair-long (or less) DNA segment. The promoter was GC rich (78%) and did not contain the consensus sequences that are usually observed in PolII-directed promoters but contained a GC box within which one of the mRNA start sites was included. In addition, two sets of positive and negative elements seemed to be located between the promoter and the protein-coding region, which appeared to influence positively and negatively, respectively, the efficiency of transformation with the c-H-ras oncogene.


2019 ◽  
Vol 24 (1) ◽  
pp. 147-169 ◽  
Author(s):  
Britta Søgaard ◽  
Heather Dawn Skipworth ◽  
Michael Bourlakis ◽  
Carlos Mena ◽  
Richard Wilding

Purpose: This paper aims to explore how purchasing could respond to disruptive technologies by examining the assumptions underlying purchasing strategic alignment and purchasing maturity through a contingency lens.
Design/methodology/approach: This study uses a systematic review across purchasing maturity and purchasing strategic alignment literature. This is supplemented with exploratory case studies to include practitioners’ views.
Findings: This research demonstrates that neither purchasing maturity nor purchasing strategic alignment are suitable approaches to respond to disruptive technologies. Purchasing maturity does not allow purchasing managers to select relevant practices. It also shows no consideration of any contingencies, which practitioners highlight as important for the selection of practices. Purchasing strategic alignment includes the company strategy as a contingency but does not provide any practices to choose from. It does not include any other contextual contingencies considered important by practitioners. The findings indicate that linking the two research streams may provide a more suitable approach to responding to disruptive technologies.
Research limitations/implications: This research demonstrates the requirement to develop a new approach to responding to disruptive technologies, by linking purchasing maturity and purchasing strategic alignment to contextual contingencies. This is a currently unexplored approach in academic literature, which refutes the generally accepted premise that higher maturity unilaterally supports a better positioning towards technological disruption. This research also highlights a requirement for practitioners to shift their approach to “best practices”.
Originality/value: This is the first research to systematically review the relationships between purchasing maturity and purchasing strategic alignment. It adds to contingency theory by suggesting that purchasing maturity models can support the achievement of strategic alignment. Also, future research directions are suggested to explore these relationships.


1981 ◽  
Vol 27 (3) ◽  
pp. 405-421 ◽  
Author(s):  
Dan A. Lewis ◽  
Greta Salem

Crime prevention strategies often aim at changing the motivations and predispositions of offenders. A new approach has developed within the last decade which focuses on changing the behavior of potential victims. The authors explore the theoretical foundations of the new strategies for reducing crime, commonly known as community crime prevention. They suggest that the innovation is a result of a major shift in the research paradigm for studying the effects of crime. The orientation underlying community crime prevention is labeled the "victimization perspective." Following a description of some limitations in that perspective, the authors offer, as an alternative, a perspective oriented toward social control. The social control perspective, which is based on the empirical findings of several recently completed research projects, offers a theoretical foundation both for a fresh approach to the study of the effects of crime and for the development of policies for community crime prevention.


2005 ◽  
Vol 360 (1460) ◽  
pp. 1597-1603 ◽  
Author(s):  
Maria De Iorio ◽  
Eric de Silva ◽  
Michael P.H. Stumpf

The variation of the recombination rate along chromosomal DNA is one of the important determinants of the patterns of linkage disequilibrium. A number of inferential methods have been developed which estimate the recombination rate and its variation from population genetic data. The majority of these methods are based on modelling the genealogical process underlying a sample of DNA sequences and thus explicitly include a model of the demographic process. Here we propose a different inferential procedure based on a previously introduced framework where recombination is modelled as a point process along a DNA sequence. The approach infers regions containing putative hotspots based on the inferred minimum number of recombination events; it thus depends only indirectly on the underlying population demography. A Poisson point process model with local rates is then used to infer patterns of recombination rate estimation in a fully Bayesian framework. We illustrate this new approach by applying it to several population genetic datasets, including a region with an experimentally confirmed recombination hotspot.

