On the sparsity of fitness functions and implications for learning

2021 ◽  
Author(s):  
David H Brookes ◽  
Amirali Aghazadeh ◽  
Jennifer Listgarten

Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the amount of fitness data available to learn these functions is typically small relative to the large combinatorial space of sequences; characterizing how much data is needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. In particular, we present theoretical results that allow us to test the effect of the Generalized NK (GNK) model's interpretable parameters (sequence length, alphabet size, and assumed interactions between sequence positions) on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. Further, we show that GNK fitness functions with parameters set according to protein structural contacts can be used to accurately approximate the number of samples required to estimate two empirical protein fitness functions, and are able to identify important higher-order epistatic interactions in these functions using only structural information.
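
The link between assumed interactions and sparsity can be illustrated with a minimal sketch. It is not the authors' code, and it makes several assumptions: a binary alphabet only, a hypothetical "adjacent" neighborhood scheme in which each position interacts with the next two positions (wrapping around), and illustrative function names such as `sample_nk_like_fitness`. It relies on one standard fact: a function that depends only on the positions in a set V has Walsh-Hadamard (epistatic) coefficients only on subsets of V, so the nonzero coefficients of a sum of such functions lie in the union of the power sets of the neighborhoods.

```python
import itertools
import numpy as np

def fwht(values):
    """Fast Walsh-Hadamard transform, normalised by 1/len(values)."""
    a = np.array(values, dtype=float)
    n = a.size
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            left = a[start:start + h].copy()
            right = a[start + h:start + 2 * h].copy()
            a[start:start + h] = left + right
            a[start + h:start + 2 * h] = left - right
        h *= 2
    return a / n

def sample_nk_like_fitness(length, neighborhoods, rng):
    """Sum of random sub-functions, one per neighborhood (binary alphabet)."""
    seqs = np.array(list(itertools.product([0, 1], repeat=length)))
    fitness = np.zeros(len(seqs))
    for hood in neighborhoods:
        # One i.i.d. normal value per setting of the positions in this neighborhood.
        table = rng.normal(size=2 ** len(hood))
        idx = np.zeros(len(seqs), dtype=int)
        for pos in hood:
            idx = 2 * idx + seqs[:, pos]
        fitness += table[idx]
    return fitness

def expected_support_size(neighborhoods):
    """Size of the union of the power sets of the neighborhoods."""
    subsets = set()
    for hood in neighborhoods:
        for r in range(len(hood) + 1):
            subsets.update(itertools.combinations(sorted(hood), r))
    return len(subsets)

rng = np.random.default_rng(0)
length = 6
# Hypothetical neighborhood scheme: position j interacts with j+1 and j+2 (cyclic).
neighborhoods = [(j, (j + 1) % length, (j + 2) % length) for j in range(length)]

fitness = sample_nk_like_fitness(length, neighborhoods, rng)
coeffs = fwht(fitness)
n_nonzero = int(np.sum(np.abs(coeffs) > 1e-9))

print("nonzero coefficients:", n_nonzero, "of", coeffs.size)
print("union of power sets: ", expected_support_size(neighborhoods))
```

For this toy setting the union of power sets contains 25 of the 64 possible interactions (the empty set, 6 single positions, 12 pairs and 6 triples), so most epistatic coefficients vanish even though every position participates in several interactions.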

2021 ◽  
Vol 119 (1) ◽  
pp. e2109649118
Author(s):  
David H. Brookes ◽  
Amirali Aghazadeh ◽  
Jennifer Listgarten

Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the fitness datasets available to learn these functions are typically small relative to the large combinatorial space of sequences; characterizing how much data are needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we develop a framework to study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. In particular, we present results that allow us to test the effect of the Generalized NK (GNK) model’s interpretable parameters—sequence length, alphabet size, and assumed interactions between sequence positions—on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. We validate our framework by demonstrating that GNK models with parameters set according to structural considerations can be used to accurately approximate the number of samples required to recover two empirical protein fitness functions and an RNA fitness function. In addition, we show that these GNK models identify important higher-order epistatic interactions in the empirical fitness functions using only structural information.
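
As a rough illustration of how such a scaling law turns sparsity into a sample budget, the sketch below uses a commonly quoted Compressed Sensing form, N ≈ C · S · log(M), where S is the number of nonzero coefficients and M = q^L is the size of the sequence space. The abstract does not give the formula or the constant, so both the exact form and the parameter `constant` (set to 1.0 here purely as a placeholder) are assumptions, and the function name is hypothetical.

```python
import math

def cs_sample_estimate(sparsity, alphabet_size, length, constant=1.0):
    """Rough sample count N = C * S * log(q^L).

    `constant` is a problem-dependent value that must be calibrated
    separately; 1.0 is only a placeholder, not a value from the paper.
    """
    if sparsity < 0 or alphabet_size < 2 or length < 1:
        raise ValueError("invalid model parameters")
    # log(q^L) is computed as L * log(q) to avoid forming the huge number q^L.
    log_space_size = length * math.log(alphabet_size)
    return math.ceil(constant * sparsity * log_space_size)

# Example: 100 nonzero coefficients, binary alphabet, length-20 sequences.
estimate = cs_sample_estimate(sparsity=100, alphabet_size=2, length=20)
total_sequences = 2 ** 20
print(estimate, "samples versus", total_sequences, "possible sequences")
```

The point of the comparison is the gap between the two numbers: the estimate grows linearly in the sparsity and only logarithmically in the size of the sequence space.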


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
María Elena Domínguez-Jiménez ◽  
David Luengo ◽  
Gabriela Sansigre-Vidal

The problem of channel estimation for multicarrier communications is addressed. We focus on systems employing the Discrete Cosine Transform Type-I (DCT1) even at both the transmitter and the receiver, presenting an algorithm which achieves an accurate estimation of symmetric channel filters using only a small number of training symbols. The solution is obtained by using either matrix inversion or compressed sensing algorithms. We provide the theoretical results which guarantee the validity of the proposed technique for the DCT1. Numerical simulations illustrate the good behaviour of the proposed algorithm.


Textiles ◽  
2021 ◽  
Vol 1 (2) ◽  
pp. 322-336
Author(s):  
Julia Orlik ◽  
Maxime Krier ◽  
David Neusius ◽  
Kathrin Pietsch ◽  
Olena Sivak ◽  
...  

In many textiles and fiber structures, the behavior of the material is determined by the structural arrangements of the fibers, their thickness and cross-section, as well as their material properties. Textiles are thin plates made of thin long yarns in frictional contact with each other that are connected via a rule defined by a looping diagram. The yarns themselves are stretchable or non-stretchable. All these structural parameters of a textile define its macroscopic behavior. Its folding is determined by all these parameters and the kind of boundary fixation or loading direction. The next influencing characteristic is the value of the loading. The same textile can behave similarly to a shell and work just for bending, or behave as a membrane with large tension deformations under different magnitudes of the loading forces. In our research, bounds on the loading and frictional parameters for both types of behavior are found. Additionally, algorithms for the computation of effective textile properties based on the structural information are proposed. A further focus of our research is the nature of folding, induced by pre-strain in yarns and some in-plane restriction of the textile movements, or by the local knitting or weaving pattern and the yarn’s cross-sections. Further investigations concern different applications with spacer fabrics. Structural parameters influencing the macroscopic fabric behavior are investigated and an approach to optimization is proposed. An overview of our published mathematical and numerical papers with developed algorithms is given and our numerical tools based on these theoretical results are demonstrated.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Amirali Aghazadeh ◽  
Hunter Nisonoff ◽  
Orhan Ocal ◽  
David H. Brookes ◽  
Yijie Huang ◽  
...  

Despite recent advances in high-throughput combinatorial mutagenesis assays, the number of labeled sequences available to predict molecular functions has remained small for the vastness of the sequence space combined with the ruggedness of many fitness functions. While deep neural networks (DNNs) can capture high-order epistatic interactions among the mutational sites, they tend to overfit to the small number of labeled sequences available for training. Here, we developed Epistatic Net (EN), a method for spectral regularization of DNNs that exploits evidence that epistatic interactions in many fitness functions are sparse. We built a scalable extension of EN, usable for larger sequences, which enables spectral regularization using fast sparse recovery algorithms informed by coding theory. Results on several biological landscapes show that EN consistently improves the prediction accuracy of DNNs and enables them to outperform competing models which assume other priors. EN estimates the higher-order epistatic interactions of DNNs trained on massive sequence spaces, a computational problem that otherwise takes years to solve.


2020 ◽  
Vol 12 (24) ◽  
pp. 4040
Author(s):  
Ke Xu ◽  
Jingchao Zhang ◽  
Huaimin Li ◽  
Weixing Cao ◽  
Yan Zhu ◽  
...  

The accurate estimation of nitrogen accumulation is of great significance to nitrogen fertilizer management in wheat production. To overcome the shortcomings of spectral technology, which ignores the anisotropy of canopy structure when predicting the nitrogen accumulation in wheat, resulting in low accuracy and unstable prediction results, we propose a method for predicting wheat nitrogen accumulation based on the fusion of spectral and canopy structure features. After depth images are repaired using a hole-filling algorithm, RGB images and depth images are fused through IHS transformation, and textural features of the fused images are then extracted in order to express the three-dimensional structural information of the canopy. The fused images contain depth information of the canopy, which breaks through the limitation of extracting canopy structure features from a two-dimensional image. By comparing the experimental results of multiple regression analyses and BP neural networks, we found that the characteristics of the canopy structure effectively compensated for the model prediction of nitrogen accumulation based only on spectral characteristics. Our prediction model displayed better accuracy and stability, with prediction accuracy values (R²) based on BP neural network for the leaf layer nitrogen accumulation (LNA) and shoot nitrogen accumulation (SNA) during a full growth period of 0.74 and 0.73, respectively, and corresponding relative root mean square errors (RRMSEs) of 40.13% and 35.73%.


1993 ◽  
Vol 1 (4) ◽  
pp. 335-360 ◽  
Author(s):  
Heinz Mühlenbein ◽  
Dirk Schlierkamp-Voosen

The breeder genetic algorithm (BGA) models artificial selection as performed by human breeders. The science of breeding is based on advanced statistical methods. In this paper a connection between genetic algorithm theory and the science of breeding is made. We show how the response to selection equation and the concept of heritability can be applied to predict the behavior of the BGA. Selection, recombination, and mutation are analyzed within this framework. It is shown that recombination and mutation are complementary search operators. The theoretical results are obtained under the assumption of additive gene effects. For general fitness landscapes, regression techniques for estimating the heritability are used to analyze and control the BGA. The method of decomposing the genetic variance into an additive and a nonadditive part connects the case of additive fitness functions with the general case.
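
The response to selection equation mentioned above is not written out in the abstract. In its standard quantitative-genetics form it reads R = h² · S, where S is the selection differential (mean of the selected parents minus the population mean), h² is the heritability, and R is the expected change in the population mean in the next generation. The sketch below assumes that standard form; the function names and the toy fitness values are illustrative and do not come from the paper.

```python
from statistics import mean

def selection_differential(population_fitness, selected_fitness):
    """S: mean fitness of the selected parents minus the population mean."""
    return mean(selected_fitness) - mean(population_fitness)

def response_to_selection(heritability, differential):
    """R = h^2 * S, the expected shift of the mean in the next generation."""
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must lie in [0, 1]")
    return heritability * differential

# Toy example: truncation selection keeps the two fittest of six individuals.
population = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
selected = sorted(population)[-2:]

s = selection_differential(population, selected)   # 5.5 - 3.5
r = response_to_selection(0.5, s)                  # assumed heritability 0.5
print("selection differential:", s)
print("predicted response:", r)
```

With a heritability of 1 the offspring mean would be expected to move all the way to the mean of the selected parents; lower heritability shrinks the predicted gain proportionally.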


2019 ◽  
Vol 25 (2) ◽  
pp. 301-308 ◽  
Author(s):  
Isabelle Mouton ◽  
Shyam Katnagallu ◽  
Surendra Kumar Makineni ◽  
Oana Cojocaru-Mirédin ◽  
Torsten Schwarz ◽  
...  

Although atom probe tomography (APT) reconstructions do not directly influence the local elemental analysis, any structural inferences from APT volumes demand a reliable reconstruction of the point cloud. Accurate estimation of the reconstruction parameters is crucial to obtain reliable spatial scaling. In the current work, a new automated approach of calibrating atom probe reconstructions is developed using only one correlative projection electron microscopy (EM) image. We employed an algorithm that implements a 2D cross-correlation of microstructural features observed in both the APT reconstructions and the corresponding EM image. We apply this protocol to calibrate reconstructions in a Cu(In,Ga)Se2-based semiconductor and in a Co-based superalloy. This work enables us to couple chemical precision to structural information with relative ease.


2012 ◽  
Vol 60 (2) ◽  
pp. 101-114 ◽  
Author(s):  
Jana Pařílková ◽  
Jaromír Říha ◽  
Zbyněk Zachoval

The Influence of Roughness on the Discharge Coefficient of a Broad-Crested Weir

The use of environmentally-friendly materials in hydraulic engineering (e.g. the stone lining of weirs at levees) calls for the more accurate estimation of the discharge coefficient for broad-crested weirs with a rough crest surface. However, in the available literature sources the discharge coefficient of broad-crested weirs is usually expressed for a smooth crest. The authors of this paper have summarized the theoretical knowledge related to the effect of weir crest surface roughness on the discharge coefficient. The method of determination of the head-discharge relation for broad-crested weirs with a rough crest surface is proposed based on known discharge coefficient values for smooth surfaces and on the roughness parameters of the weir. For selected scenarios the theoretical results were compared with experimental research carried out at the Laboratory of Water Management Research, Faculty of Civil Engineering (FCE), Brno University of Technology (BUT).


2017 ◽  
Vol 2017 ◽  
pp. 1-16
Author(s):  
Yue Wang ◽  
Liang Chen ◽  
Li Zhang ◽  
Haifeng Li ◽  
Huaihu Cao ◽  
...  

We make a detailed study on carrier sensing of 802.11 in Nakagami fading channels. We prove that to maximize sensing accuracy, the optimal channel accessing probability is solely determined by the path-loss SIR (Signal to Interference Ratio). We define pfail-interference range and pbusy-carrier sensing range for fading channels and prove that their scaling laws in Nakagami fading channels are similar to those in the static channel. The newly derived theoretical results show a unified property between the static and fading channels. By extensive simulations, we reveal that fading depresses the probability of a dominating transmission state, and therefore it can mitigate severe hidden and exposed terminal problems, but fading harms the average sensing accuracy for an optimally adjusted carrier sensing threshold.


1999 ◽  
Vol 32 (5) ◽  
pp. 963-967 ◽  
Author(s):  
Angela Altomare ◽  
Carmelo Giacovazzo ◽  
Antonietta Guagliardi ◽  
Anna Grazia Giuseppine Moliterni ◽  
Rosanna Rizzi

In direct procedures for crystal structure solution from powder data, information on the location and orientation of a molecular fragment may readily become available. Such information may be used retrospectively to improve the powder-pattern decomposition, with favourable effects on the phasing process. A method is described by which accurate estimation of a large number of structure-factor moduli is possible by exploiting the prior partial structural information.

