combinatorial space
Recently Published Documents


Total documents: 39 (five years: 20)
H-index: 5 (five years: 1)

2022, Vol 41 (2), pp. 1-14
Author(s): Marco Livesu, Luca Pitzalis, Gianmarco Cherchi

Hexahedral meshes are a ubiquitous domain for the numerical resolution of partial differential equations. Computing a pure hexahedral mesh from an adaptively refined grid is a prominent approach to automatic hexmeshing, and requires the ability to restore the all-hex property around the hanging nodes that arise at the interface between cells of different sizes. The most advanced tools to accomplish this task are based on mesh dualization. These approaches use topological schemes to regularize the valence of inner vertices and edges, such that dualizing the grid yields a pure hexahedral mesh. In this article, we study the dual approach in detail and propose four main contributions to it: (i) we enumerate all the possible transitions that dual methods must be able to handle, showing that prior schemes do not natively cover all of them; (ii) we show that schemes are internally asymmetric, so not only is their construction ambiguous, but different implementation choices also lead to hexahedral meshes with different singular structures; (iii) we explore the combinatorial space of dual schemes, selecting the minimum set that covers all the possible configurations and also yields the simplest singular structure in the output hexmesh; (iv) we enlarge the class of adaptive grids that can be transformed into pure hexahedral meshes, relaxing one of the tight topological requirements imposed by previous approaches. Our extensive experiments show that our transition schemes consistently outperform prior art in terms of ability to converge to a valid solution, number and distribution of singular mesh edges, and element count. Finally, we publicly release our code and disclose a considerable number of technical details that were overlooked in previous literature, lowering an entry barrier that was hard to overcome for practitioners in the field.


2021
Author(s): Amir Pandi, Christoph Diehl, Ali Yazdizadeh Kharrazi, Lèon Faure, Scott A. Scholz, ...

The study, engineering and application of biological networks require practical and efficient approaches. Current optimization efforts for these systems are often limited by wet lab labor and cost, as well as by the lack of convenient, easily adoptable computational tools. Aimed at democratization and standardization, we describe METIS, a modular and versatile active machine learning workflow with a simple online interface for the optimization of biological target functions with minimal experimental datasets. We demonstrate our workflow for various applications, from simple to complex gene circuits and metabolic networks, including several cell-free transcription and translation systems, a LacI-based multi-level controller and a 27-variable synthetic CO2-fixation cycle (CETCH cycle). Using METIS, we could improve the above systems by one to two orders of magnitude compared to their original setup with minimal experimental effort. For the CETCH cycle, we explored the combinatorial space of ~10^25 conditions with only 1,000 experiments to yield the most efficient CO2-fixation cascade described to date. Beyond optimization, our workflow also quantifies the relative importance of individual factors to the performance of a system. This allows the identification of previously unknown interactions and bottlenecks in complex systems, which paves the way for their hypothesis-driven improvement. We demonstrate this for the LacI multi-level controller, which we were able to improve 100-fold after identifying resource competition as the limiting factor. Overall, our workflow opens the way for convenient optimization and prototyping of genetic and metabolic networks with customizable adjustments according to user experience, experimental setup, and laboratory facilities.


2021
Author(s): Vladimir Gligorijevic, Daniel Berenberg, Stephen Ra, Andrew Watkins, Simon Kelow, ...

Protein design is challenging because it requires searching through a vast combinatorial space that is only sparsely functional. Self-supervised learning approaches offer the potential to navigate through this space more effectively and thereby accelerate protein engineering. We introduce a sequence denoising autoencoder (DAE) that learns the manifold of protein sequences from a large amount of potentially unlabelled proteins. This DAE is combined with a function predictor that guides sampling towards sequences with higher levels of desired functions. We train the sequence DAE on more than 20M unlabeled protein sequences spanning many evolutionarily diverse protein families and train the function predictor on approximately 0.5M sequences with known function labels. At test time, we sample from the model by iteratively denoising a sequence while exploiting the gradients from the function predictor. We present a few preliminary case studies of protein design that demonstrate the effectiveness of this proposed approach, which we refer to as "deep manifold sampling", including metal binding site addition, function-preserving diversification, and global fold change.


2021, Vol 119 (1), pp. e2109649118
Author(s): David H. Brookes, Amirali Aghazadeh, Jennifer Listgarten

Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the fitness datasets available to learn these functions are typically small relative to the large combinatorial space of sequences; characterizing how much data are needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we develop a framework to study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. In particular, we present results that allow us to test the effect of the Generalized NK (GNK) model’s interpretable parameters—sequence length, alphabet size, and assumed interactions between sequence positions—on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. We validate our framework by demonstrating that GNK models with parameters set according to structural considerations can be used to accurately approximate the number of samples required to recover two empirical protein fitness functions and an RNA fitness function. In addition, we show that these GNK models identify important higher-order epistatic interactions in the empirical fitness functions using only structural information.
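As a rough illustration of the kind of random field model this abstract refers to, the sketch below samples an NK-style fitness function over sequences of a given length and alphabet size. It is not the authors' implementation: the function name `sample_gnk_like`, the standard normal contributions, and the circular adjacent neighborhoods are all assumptions made for this example.

```python
import itertools
import random


def sample_gnk_like(alphabet_size, neighborhoods, seed=0):
    """Return a random NK-style fitness function (illustrative sketch only).

    neighborhoods[i] lists the positions that interact at site i. Each site
    contributes a random value for every joint assignment of its neighborhood.
    """
    rng = random.Random(seed)
    tables = []
    for hood in neighborhoods:
        # One random contribution per assignment of the neighborhood's positions.
        table = {
            assignment: rng.gauss(0.0, 1.0)
            for assignment in itertools.product(range(alphabet_size), repeat=len(hood))
        }
        tables.append(table)

    def fitness(seq):
        # Fitness is the sum of the per-site contributions.
        return sum(
            table[tuple(seq[j] for j in hood)]
            for hood, table in zip(neighborhoods, tables)
        )

    return fitness


# Hypothetical parameters: length 6, binary alphabet, each site interacting
# with itself and its right-hand neighbor (wrapping around).
L, q, K = 6, 2, 2
hoods = [tuple((i + d) % L for d in range(K)) for i in range(L)]
f = sample_gnk_like(q, hoods)

# Enumerate the full combinatorial space, which has q**L sequences.
landscape = {s: f(s) for s in itertools.product(range(q), repeat=L)}
print(len(landscape))
```

Even in this toy setting the space grows as q**L, which is why full enumeration is only feasible for very short sequences.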


2021, Vol 22 (18), pp. 10163
Author(s): Lauren V. Cairns, Katrina M. Lappin, Alexander Mutch, Ahlam Ali, Kyle B. Matchett, ...

Paediatric acute myeloid leukaemia (AML) is a heterogeneous disease characterised by the malignant transformation of myeloid precursor cells with impaired differentiation. Standard therapy for paediatric AML has remained largely unchanged for over four decades and, combined with inadequate understanding of the biology of paediatric AML, has limited the progress of targeted therapies in this cohort. In recent years, the search for novel targets for the treatment of paediatric AML has accelerated in parallel with advanced genomic technologies which explore the mutational and transcriptional landscape of this disease. Exploiting the large combinatorial space of existing drugs provides an untapped resource for the identification of potential combination therapies for the treatment of paediatric AML. We have previously designed a multiplex screening strategy known as Multiplex Screening for Interacting Compounds in AML (MuSICAL); using an algorithm designed in-house, we screened all pairings of 384 FDA-approved compounds in fewer than 4000 wells by pooling drugs into 10 compounds per well. This approach maximised the probability of identifying new compound combinations with therapeutic potential while minimising cost, replication and redundancy. This screening strategy identified the triple combination of glimepiride, a sulfonylurea; pancuronium dibromide, a neuromuscular blocking agent; and vinblastine sulfate, a vinca alkaloid, as a potential therapy for paediatric AML. We envision that this approach can be used for a variety of disease-relevant screens, allowing the efficient repurposing of drugs that can be rapidly moved into the clinic.
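The pooling figures quoted in this abstract can be sanity-checked with a short counting argument. The sketch below only computes how many compound pairs exist and a simple lower bound on the number of 10-compound wells needed to cover each pair at least once. It says nothing about the in-house algorithm itself, and the variable names are chosen for this example.

```python
from math import ceil, comb

n_compounds = 384  # FDA-approved compounds, as stated in the abstract
pool_size = 10     # compounds pooled per well, as stated in the abstract

# Number of distinct compound pairs that must each share a well at least once.
total_pairs = comb(n_compounds, 2)

# Each well of 10 compounds contains this many distinct pairs.
pairs_per_well = comb(pool_size, 2)

# Counting lower bound: no design can cover all pairs with fewer wells.
min_wells = ceil(total_pairs / pairs_per_well)

print(total_pairs, pairs_per_well, min_wells)
```

Testing every pair in its own well would take 73,536 wells, while the counting bound for pools of 10 is 1,635 wells, so the reported figure of fewer than 4000 wells sits between the two.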


Author(s): Yuri Tanuma, Toru Maekawa, Chris Ewels

Hydrogenated small fullerenes (Cn, n<60) are of interest as potential astrochemical species, and as intermediates in hydrogen-catalysed fullerene growth. However, computational identification of key stable species is difficult due to the vast combinatorial space of structures. In this study we explore routes to predict stable hydrogenated small fullerenes. We show that neither local fullerene geometry nor local electronic structure analysis is able to correctly predict subsequent low-energy hydrogenation sites, and indeed sequential stable addition searches also sometimes fail to identify the most stable hydrogenated fullerene isomers. Of the empirical and semi-empirical methods tested, GFN2-xTB consistently gives highly accurate energy correlation (r>0.99) to full DFT-LDA calculations at a fraction of the computational cost. This allows identification of the most stable hydrogenated fullerenes up to 4H for four fullerenes, namely two isomers of C28 and C40, via “brute force” systematic testing of all symmetry-inequivalent combinations. The approach shows promise for wider systematic studies of smaller hydrogenated fullerenes.


2021, Vol 12 (1)
Author(s): Pei-Pei Yang, Yi-Jing Li, Yan Cao, Lu Zhang, Jia-Qi Wang, ...

Self-assembling peptides have shown tremendous potential in the fields of material sciences, nanoscience, and medicine. Because of the vast combinatorial space of even short peptides, identification of self-assembling sequences remains a challenge. Herein, we develop an experimental method to rapidly screen a huge array of peptide sequences for self-assembling properties, using the one-bead one-compound (OBOC) combinatorial library method. In this approach, peptides on beads are N-terminally capped with nitro-1,2,3-benzoxadiazole, a hydrophobicity-sensitive fluorescent molecule. Beads displaying self-assembling peptides fluoresce in an aqueous environment. Using this approach, we identify eight pentapeptides, all of which are able to self-assemble into nanoparticles or nanofibers. Some of them are able to interact with and are taken up efficiently by HeLa cells. Intracellular distribution varied among these non-toxic peptidic nanoparticles. This simple screening strategy has enabled rapid identification of self-assembling peptides suitable for the development of nanostructures for various biomedical and material applications.


2021
Author(s): Yuchi Qiu, Jian Hu, Guo-Wei Wei

Directed evolution (DE), a strategy for protein engineering, optimizes protein properties (i.e. fitness) by expensive and time-consuming screening or selection over a large combinatorial sequence space. Machine learning-assisted directed evolution (MLDE), which screens variant properties in silico, can reduce the experimental burden. However, MLDE that relies on small, experimentally labeled training data from random sampling yields low global maximal fitness hitting rates. This work introduces a cluster learning-assisted directed evolution (CLADE) framework, particularly designed for systems without high-throughput screening assays, that combines sampling through hierarchical unsupervised clustering and supervised learning to guide protein engineering. Based on general biological information, CLADE splits the genetic combinatorial space into various subspaces with heterogeneous evolutionary traits, which guides the selection of experimental sampling sets and the subsequent construction of supervised learning training sets. In virtual screening of two four-site combinatorial fitness landscapes from protein G domain B1 (GB1) and PhoQ, CLADE consistently showed a nearly 3-fold improvement in global maximal fitness hitting rate compared with using randomly sampled training data. CLADE can be easily applied to various biological systems and customized for systems with different throughput levels to maximize its accuracy and efficiency. It promises a significant impact on protein engineering.


2021, Vol 18 (179), pp. 20210348
Author(s): Alan R. Pacheco, Daniel Segrè

Despite a growing understanding of how environmental composition affects microbial communities, it remains difficult to apply this knowledge to the rational design of synthetic multispecies consortia. This is because natural microbial communities can harbour thousands of different organisms and environmental substrates, making up a vast combinatorial space that precludes exhaustive experimental testing and computational prediction. Here, we present a method based on the combination of machine learning and metabolic modelling that selects optimal environmental compositions to produce target community phenotypes. In this framework, dynamic flux balance analysis is used to model the growth of a community in candidate environments. A genetic algorithm is then used to evaluate the behaviour of the community relative to a target phenotype, and subsequently adjust the environment to allow the organisms to approach this target. We apply this iterative process to thousands of in silico communities of varying sizes, showing how it can rapidly identify environments that yield desired taxonomic compositions and patterns of metabolic exchange. Moreover, this combination of approaches produces testable predictions for the assembly of experimental microbial communities with specific properties and can facilitate rational environmental design processes for complex microbiomes.


2021
Author(s): David H Brookes, Amirali Aghazadeh, Jennifer Listgarten

Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the amount of fitness data available to learn these functions is typically small relative to the large combinatorial space of sequences; characterizing how much data is needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. In particular, we present theoretical results that allow us to test the effect of the Generalized NK (GNK) model's interpretable parameters (sequence length, alphabet size, and assumed interactions between sequence positions) on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. Further, we show that GNK fitness functions with parameters set according to protein structural contacts can be used to accurately approximate the number of samples required to estimate two empirical protein fitness functions, and are able to identify important higher-order epistatic interactions in these functions using only structural information.

