sequence design
Recently Published Documents


TOTAL DOCUMENTS: 521 (FIVE YEARS 98)

H-INDEX: 35 (FIVE YEARS 7)

2021 ◽  
Vol 65 ◽  
pp. 18-27
Author(s):  
Zachary Wu ◽  
Kadina E. Johnston ◽  
Frances H. Arnold ◽  
Kevin K. Yang

2021 ◽  
Author(s):  
Chengxi Li ◽  
Genwei Zhang ◽  
Somesh Mohapatra ◽  
Alex Callahan ◽  
Andrei Loas ◽  
...  

Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, here we leverage machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data were collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. Our optimized ML model achieves 93% prediction accuracy and a Pearson’s r of 0.97. The predicted synthesis scores were validated to correlate with the experimental HPLC crude purities (correlation coefficient R² = 0.95). Furthermore, we demonstrated the general applicability of ML by designing synthetically accessible antisense PNA sequences from 102,315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS-CoV-2, HIV, as well as selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for rational PNA sequence design.
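The abstract above reports agreement between predicted and measured values as a Pearson's r and an R². As a minimal sketch of how such a correlation is computed, the function below evaluates Pearson's r for two equal-length lists. The function name `pearson_r` and the toy numbers are hypothetical and are not taken from the paper; for a simple linear fit, squaring r gives the coefficient of determination.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical toy values, NOT data from the paper.
    predicted_scores = [0.2, 0.4, 0.6, 0.8]
    measured_purities = [25.0, 45.0, 50.0, 80.0]
    r = pearson_r(predicted_scores, measured_purities)
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```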


Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2413
Author(s):  
Yeong Jun Kim ◽  
Muhammad Asim ◽  
Tae Ho Im ◽  
Yong Soo Cho

In underwater acoustic cellular (UWAC) systems, underwater equipment or sensor nodes (UE/SN) should perform downlink synchronisation and a cell search during the initial access stage using the preambles received from adjacent underwater base stations (UWBSs). The UE/SN needs to estimate accurate timing and cell ID (CID) using the received preambles, and synchronise with a serving UWBS, even in high-Doppler environments. In this paper, a sequence design technique for joint estimation of accurate timing and CID in high-Doppler UWAC systems is proposed to decrease the receiver complexity and processing time. A generalised Zadoff–Chu (ZC) sequence is proposed for the preamble design. This sequence is decomposed into multiple short sub-sequences to reduce the effect of Doppler shift on the timing and CID estimation. The performance loss caused by the short sequence length is compensated by combining the sub-sequences using the repetition property of the ZC sequence. The properties (autocorrelation and cross-correlation) of the proposed sequence are derived analytically in the presence of Doppler shift and compared with the simulation results. The simulation results reveal that the proposed technique performs better than existing techniques in both additive white Gaussian noise and multipath channels with high Doppler. It is concluded that the proposed technique is suitable for accurate timing estimation and CID detection in high-Doppler UWAC systems.
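For readers unfamiliar with the correlation properties mentioned above, the following is a minimal sketch of the standard (not the paper's generalised) Zadoff–Chu sequence of odd length, together with a periodic correlation helper. The function names `zadoff_chu` and `periodic_correlation` and the chosen roots and length are assumptions for illustration; the sub-sequence decomposition and Doppler analysis from the paper are not reproduced here.

```python
import cmath
from math import gcd


def zadoff_chu(root: int, length: int) -> list[complex]:
    """Standard Zadoff-Chu sequence x[n] = exp(-j*pi*root*n*(n+1)/length).

    This form assumes an odd length and a root coprime with the length.
    """
    if length % 2 == 0:
        raise ValueError("this sketch covers odd lengths only")
    if gcd(root, length) != 1:
        raise ValueError("root must be coprime with length")
    return [
        cmath.exp(-1j * cmath.pi * root * n * (n + 1) / length)
        for n in range(length)
    ]


def periodic_correlation(a: list[complex], b: list[complex], lag: int) -> complex:
    """Periodic (circular) correlation of a against b at the given lag."""
    n = len(a)
    return sum(a[(k + lag) % n] * b[k].conjugate() for k in range(n))


if __name__ == "__main__":
    length = 61  # prime, so any two distinct roots below it are usable
    seq_a = zadoff_chu(5, length)
    seq_b = zadoff_chu(7, length)

    # Autocorrelation: a single peak at lag 0, near zero elsewhere.
    peak = abs(periodic_correlation(seq_a, seq_a, 0))
    worst_sidelobe = max(
        abs(periodic_correlation(seq_a, seq_a, lag)) for lag in range(1, length)
    )
    print(f"autocorrelation peak = {peak:.3f}, worst sidelobe = {worst_sidelobe:.2e}")

    # Cross-correlation between different roots stays flat and low.
    cross = [abs(periodic_correlation(seq_a, seq_b, lag)) for lag in range(length)]
    print(f"cross-correlation magnitude range = {min(cross):.3f} .. {max(cross):.3f}")
```

The sharp autocorrelation peak is what makes such sequences useful for timing estimation, and the low cross-correlation between roots is what lets a receiver distinguish preambles assigned to different cells.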


2021 ◽  
Vol 118 (40) ◽  
pp. e2106808118
Author(s):  
Oliver G. Hayes ◽  
Benjamin E. Partridge ◽  
Chad A. Mirkin

The structural and functional diversity of materials in nature depends on the controlled assembly of discrete building blocks into complex architectures via specific, multistep, hierarchical assembly pathways. Achieving similar complexity in synthetic materials through hierarchical assembly is challenging due to difficulties with defining multiple recognition areas on synthetic building blocks and controlling the sequence through which those recognition sites direct assembly. Here, we show that we can exploit the chemical anisotropy of proteins and the programmability of DNA ligands to deliberately control the hierarchical assembly of protein–DNA materials. Through DNA sequence design, we introduce orthogonal DNA interactions with disparate interaction strengths (“strong” and “weak”) onto specific geometric regions of a model protein, stable protein 1 (Sp1). We show that the spatial encoding of DNA ligands leads to highly directional assembly via strong interactions and that, by design, the first stage of assembly increases the multivalency of weak DNA–DNA interactions that give rise to an emergent second stage of assembly. Furthermore, we demonstrate that judicious DNA design not only directs assembly along a given pathway but can also direct distinct structural outcomes from a single pathway. This combination of protein surface and DNA sequence design allows us to encode the structural and chemical information necessary into building blocks to program their multistep hierarchical assembly. Our findings represent a strategy for controlling the hierarchical assembly of proteins to realize a diverse set of protein–DNA materials by design.


2021 ◽  
Vol 17 (9) ◽  
pp. e1009037
Author(s):  
Jack B. Maguire ◽  
Daniele Grattarola ◽  
Vikram Khipple Mulligan ◽  
Eugene Klyshko ◽  
Hans Melo

Graph representations are traditionally used to represent protein structures in sequence design protocols in which the protein backbone conformation is known. This infrequently extends to machine learning projects: existing graph convolution algorithms have shortcomings when representing protein environments. One reason for this is the lack of emphasis on edge attributes during message-passing operations. Another reason is the traditionally shallow nature of graph neural network architectures. Here we introduce an improved message-passing operation that is better equipped to model local kinematics problems such as protein design. Our approach, XENet, pays special attention to both incoming and outgoing edge attributes. We compare XENet against existing graph convolutions in an attempt to decrease rotamer sample counts in Rosetta’s rotamer substitution protocol, used for protein side-chain optimization and sequence design. This use case is motivating because it both reduces the size of the search space for classical side-chain optimization algorithms, and allows larger protein design problems to be solved with quantum algorithms on near-term quantum computers with limited qubit counts. XENet outperformed competing models while also displaying a greater tolerance for deeper architectures. We found that XENet was able to decrease rotamer counts by 40% without loss in quality. This decreased the memory consumption for classical pre-computation of rotamer energies in our use case by more than a factor of 3, the qubit consumption for an existing sequence design quantum algorithm by 40%, and the size of the solution space by a factor of 165. Additionally, XENet displayed an ability to handle deeper architectures than competing convolutions.
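To make the idea of attending to both incoming and outgoing edge attributes concrete, here is a heavily simplified sketch of one message-passing step on a directed graph. It is not the XENet layer itself: the function names, the single tanh-activated linear map, the plain sum aggregation, and the toy weights are all assumptions chosen for illustration. Each message for edge (i, j) is built from both endpoint features plus the forward attribute e_ij and the reverse attribute e_ji, and each node then collects its incoming and outgoing messages separately.

```python
from math import tanh

Edge = tuple[int, int]


def linear_tanh(weights: list[list[float]], vector: list[float]) -> list[float]:
    """Apply a bias-free linear map followed by tanh (one output per weight row)."""
    return [tanh(sum(w * v for w, v in zip(row, vector))) for row in weights]


def edge_aware_step(
    nodes: list[list[float]],
    edges: dict[Edge, list[float]],
    weights: list[list[float]],
) -> tuple[list[list[float]], dict[Edge, list[float]]]:
    """One simplified message-passing step that uses both edge directions.

    Returns updated node features and per-edge messages (usable as new
    edge features). A missing reverse edge is treated as all zeros.
    """
    msg_dim = len(weights)
    messages: dict[Edge, list[float]] = {}
    for (i, j), e_ij in edges.items():
        e_ji = edges.get((j, i), [0.0] * len(e_ij))
        # Message input: source node, target node, forward edge, reverse edge.
        messages[(i, j)] = linear_tanh(weights, nodes[i] + nodes[j] + e_ij + e_ji)

    new_nodes: list[list[float]] = []
    for k, features in enumerate(nodes):
        incoming = [0.0] * msg_dim
        outgoing = [0.0] * msg_dim
        for (i, j), msg in messages.items():
            if j == k:
                incoming = [a + b for a, b in zip(incoming, msg)]
            if i == k:
                outgoing = [a + b for a, b in zip(outgoing, msg)]
        # Keep incoming and outgoing aggregates in separate slots.
        new_nodes.append(features + incoming + outgoing)
    return new_nodes, messages


if __name__ == "__main__":
    # Toy graph with hypothetical features: 2 nodes, one edge in each direction.
    nodes = [[1.0], [0.5]]
    edges = {(0, 1): [0.2], (1, 0): [-0.3]}
    # 2 message channels; input size = 1 + 1 + 1 + 1 = 4.
    weights = [[0.5, -0.25, 1.0, 0.75], [0.1, 0.2, -0.4, 0.3]]
    new_nodes, new_edges = edge_aware_step(nodes, edges, weights)
    print(new_nodes)
    print(new_edges)
```

In a trained model the weights would be learned and the aggregation would typically be more expressive than a plain sum; the sketch only shows where the edge attributes of both directions enter the computation.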

