sequence composition
Recently Published Documents


TOTAL DOCUMENTS

257
(FIVE YEARS 109)

H-INDEX

29
(FIVE YEARS 5)

2022 ◽  
Author(s):  
William P Robins ◽  
John J Mekalanos

SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that likely emerged from animal reservoirs. Differences in nucleotide and protein sequence composition within related β-coronaviruses are often used to better understand CoV evolution, host adaptation, and their emergence as human pathogens. Here we report a comprehensive analysis of amino acid residue changes that have occurred in lineage B β-coronaviruses (sarbecoviruses) and that show covariance with each other. This analysis revealed patterns of covariance within conserved viral proteins that potentially define conserved interactions within and between core proteins encoded by SARS-CoV-2-related β-coronaviruses. We identified not only individual pairs but also networks of amino acid residues that exhibited statistically high frequencies of covariance with each other using an independent pair model followed by a tandem model approach. Using 149 different CoV genomes that vary in their relatedness, we identified networks of unique combinations of alleles that can be incrementally traced genome by genome within different phylogenetic lineages. Remarkably, the most abundantly represented covariant residues and their respective regions are implicated in the emergence of SARS-CoV-2 and are also enriched in dominant SARS-CoV-2 variants.
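
As a rough illustration of what residue covariance between two alignment positions can mean, the sketch below scores a pair of columns with mutual information. This is a commonly used stand-in, not the independent pair or tandem model described in the abstract, and the toy alignment and the function name `mutual_information` are assumptions made for the example.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical 4-sequence alignment; each string is one aligned protein.
toy_alignment = ["AKLD", "AKLE", "GRLD", "GRLE"]
columns = list(zip(*toy_alignment))  # transpose rows into columns

covarying = mutual_information(columns[0], columns[1])    # A/G tracks K/R
independent = mutual_information(columns[0], columns[3])  # no relationship
print(covarying, independent)
```

In this toy case the first two columns always change together, so they score higher than a pair of columns that vary independently. A real analysis would also need to correct for phylogenetic relatedness, which this sketch ignores.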


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Matthew W Parker ◽  
Jonchee A Kao ◽  
Alvin Huang ◽  
James M Berger ◽  
Michael R Botchan

Liquid-liquid phase separation (LLPS) of intrinsically disordered regions (IDRs) in proteins can drive the formation of membraneless compartments in cells. Phase-separated structures enrich for specific partner proteins and exclude others. Previously, we showed that the IDRs of metazoan DNA replication initiators drive DNA-dependent phase separation in vitro and chromosome binding in vivo, and that initiator condensates selectively recruit replication-specific partner proteins (Parker et al., 2019). How initiator IDRs facilitate LLPS and maintain compositional specificity is unknown. Here, using D. melanogaster (Dm) Cdt1 as a model initiation factor, we show that phase separation results from a synergy between electrostatic DNA-bridging interactions and hydrophobic inter-IDR contacts. Both sets of interactions depend on sequence composition (but not sequence order), are resistant to 1,6-hexanediol, and do not depend on aromaticity. These findings demonstrate that distinct sets of interactions drive condensate formation and specificity across different phase-separating systems and advance efforts to predict IDR LLPS propensity and partner selection a priori.
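
The distinction between sequence composition and sequence order can be made concrete with a scrambling control: shuffling residues changes the order while leaving the composition untouched. The sketch below is only an illustration of that idea; the sequence `idr` is a made-up placeholder, not the Cdt1 IDR, and `scramble` is a hypothetical helper name.

```python
import random
from collections import Counter


def scramble(seq, seed=0):
    """Return a shuffled copy of seq with identical residue composition."""
    residues = list(seq)
    random.Random(seed).shuffle(residues)  # seeded for reproducibility
    return "".join(residues)


# Placeholder disordered-like sequence (an assumption, not real data).
idr = "MSKKSPAKRTSSEEKPAKRVLNDSGSEDEKPAA"
scrambled = scramble(idr)

# Composition (residue counts) is preserved; order generally is not.
same_composition = Counter(idr) == Counter(scrambled)
print(same_composition, scrambled)
```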


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Carlo A. Klein ◽  
Marc Teufel ◽  
Carl J. Weile ◽  
Patrick Sobetzko

Abstract Transcription, the first step to gene expression, is a central coordination process in all living matter. Besides a plethora of regulatory mechanisms, the promoter architecture sets the foundation of expression strength, timing and the potential for further regulatory modulation. In this study, we investigate the effects of promoter spacer length and sequence composition on strength and supercoiling sensitivity in bacteria. Combining transcriptomics data analysis and standardized synthetic promoter libraries, we exclude effects of specific promoter sequence contexts. Analysis of promoter activity shows a strong variance with spacer length and spacer sequence composition. A detailed study of the spacer sequence composition under selective conditions reveals an extension to the -10 region that enhances RNAP binding but damps promoter activity. Using physiological changes in DNA supercoiling levels, we link promoter supercoiling sensitivity to overall spacer GC-content. Time-resolved promoter activity screens, only possible with a novel mild treatment approach, reveal strong promoter timing potentials solely based on DNA supercoiling sensitivity in the absence of regulatory sites or alternative sigma factors.
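
The two spacer properties discussed above, length and GC-content, are simple to compute once the -35 and -10 elements are located. The following is a minimal sketch under stated assumptions: the promoter is built from the E. coli consensus hexamers with an invented 17-nt spacer, and the element coordinates are supplied by hand rather than detected.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def spacer_features(promoter, minus35_end, minus10_start):
    """Return (length, GC fraction) of the spacer between the two elements."""
    spacer = promoter[minus35_end:minus10_start]
    return len(spacer), gc_content(spacer)


# Illustrative promoter: consensus -35, hypothetical spacer, consensus -10.
promoter = "TTGACA" + "ATGCGCATTAGCGGTAT" + "TATAAT"
length, gc = spacer_features(promoter, minus35_end=6, minus10_start=23)
print(length, round(gc, 3))
```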


2021 ◽  
Author(s):  
Zexiang Han ◽  
Shayna Hilburg ◽  
Alfredo Alexander-Katz

Synthetic random heteropolymers (RHPs) with high chemical heterogeneity can self-assemble into single-chain nanoparticles that exhibit features reminiscent of natural proteins, such as topological polymorphism. Using all-atom molecular dynamics simulations, this work investigates the structure and single-chain mechanical unfolding of a library of four-component RHPs in water, studying the effects of sequence, composition, configuration, and molecular weight. Results show that compactified RHPs can have highly dynamic unfolding behaviors which are dominated by complex side-chain interactions and prove markedly different from their homopolymer counterparts. For a given sequence and conformation, an RHP’s backbone topology can strongly impact its unfolding response, hinting at the importance of topological design in the nanoscale mechanics of heteropolymers. In addition, we identify enthalpically-driven reconfiguration upon unfolding, observing a solvent-shielding protection mechanism similar to protein stabilization by PEGylation. This work provides the first computational evidence for the force-induced unfolding of protein-inspired multicomponent heteropolymers.


Statement of the Problem: The combinatorial paraphernalia in protein synthesis to be surveyed are multifarious, embracing phenomena, processes, activities and materials, all characterized by plurality and dissimilarity. The materials usable are phenomenal and must be a set of discrete, plural and dissimilar objects, e.g. the four RNA bases Adenine, Uracil, Guanine and Cytosine (A, U, G, C), for the activity of permutation for building genetic code sequences for protein-type sequence composition, proliferation and diversification, as inherent in protein synthesis. Methodology and Theoretical Orientation: We are in for combinatorics, which is the scientific study of the phenomenon of input/output productivity exhibited by a duality of numeral entities, as in the permutation of a specified set (n) of dissimilar, discrete, plural things and a selection (r) of them. The Dalina apparatus of the Input/Output Multiplicative Replication system, equipped with the Square Kinematics View Mixing Technique sourced from the inchoate Numeration Science literature being developed by this author, is used for the computation of the 4-from-4 permutations of the four RNA bases A, U, G, C constituting the 24 quadruplet genetic code as the workforce in protein synthesis. Findings: The combinatorial paraphernalia in protein synthesis identified and surveyed comprise 14 characteristics, 3 materials and 11 processes/operatives. Conclusion and Significance: The relevance of the several identified and surveyed combinatorial paraphernalia in protein synthesis has been demonstrated by the test of agreeability with the working of the Dalina apparatus of the Input/Output Multiplicative Replication Combinatorial System using the Square Kinematics View Mixing technique for the computation of permutations of the four RNA bases A, U, G, C making up the 24 quadruplet genetic code as the workforce in protein synthesis for the substance of all plants and animals throughout CREATION.
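
The count of 24 quoted above is the number of permutations of 4 distinct objects taken 4 at a time, n!/(n-r)! with n = r = 4. The sketch below checks that arithmetic with the Python standard library; it does not reproduce the author's apparatus, and the helper name `count_permutations` is an assumption.

```python
from itertools import permutations
from math import factorial


def count_permutations(n, r):
    """Number of ordered selections of r items from n distinct items."""
    return factorial(n) // factorial(n - r)


bases = "AUGC"
# Every ordering of the four bases, each base used exactly once.
quadruplets = ["".join(p) for p in permutations(bases, 4)]

print(len(quadruplets), count_permutations(4, 4))
```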


Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1689
Author(s):  
Lan Huang ◽  
Shaoqing Jiao ◽  
Sen Yang ◽  
Shuangquan Zhang ◽  
Xiaopeng Zhu ◽  
...  

Long noncoding RNA (lncRNA) plays a crucial role in many critical biological processes and participates in complex human diseases through interaction with proteins. Considering that identifying lncRNA–protein interactions through experimental methods is expensive and time-consuming, we propose a novel method based on deep learning that combines raw sequence composition features and hand-designed features, called LGFC-CNN, to predict lncRNA–protein interactions. Two sequence preprocessing methods and CNN modules (GloCNN and LocCNN) are used to extract the global and local features of the raw sequence. Meanwhile, we select hand-designed features by comparing the predictive effect of different combinations of lncRNA and protein features. Furthermore, we obtain the structure features and unify their dimensions through the Fourier transform. In the end, the four types of features are integrated to comprehensively predict the lncRNA–protein interactions. Compared with other state-of-the-art methods on three lncRNA–protein interaction datasets, LGFC-CNN achieves the best performance, with an accuracy of 94.14% on RPI21850, 92.94% on RPI7317, and 98.19% on RPI1847. The results show that our LGFC-CNN can effectively predict the lncRNA–protein interactions by combining raw sequence composition features and hand-designed features.


2021 ◽  
Author(s):  
Ibani Kapur ◽  
Elodie Boulier ◽  
Nicole Francis

Abstract The Polycomb group (PcG) complex PRC1 localizes in the nucleus in the form of condensed structures called Polycomb bodies. The PRC1 subunit Polyhomeotic (Ph) contains a polymerizing sterile alpha motif (SAM) that is implicated in both PcG body formation and chromatin organization in Drosophila and mammalian cells. A truncated version of Ph containing the SAM (mini-Ph) forms phase-separated condensates with DNA or chromatin in vitro, suggesting PcG bodies may form by phase separation. In cells, Ph forms multiple condensates, while mini-Ph forms a single large nuclear condensate. We therefore hypothesize that sequences outside of mini-Ph are required for proper condensate formation. We identified three distinct Intrinsically Disordered Regions (IDRs) in Ph based on sequence composition and complexity. We tested the role of each IDR in Ph condensates using live imaging of transfected Drosophila S2 cells. We find that each IDR uniquely affects Ph SAM-dependent condensate size, number, and morphology.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Joel Gustafsson ◽  
Peter Norberg ◽  
Jan R. Qvick-Wester ◽  
Alexander Schliep

Abstract Background: Alignment-free methods are a popular approach for comparing biological sequences, including complete genomes. The methods range from probability distributions of sequence composition to first and higher-order Markov chains, where a k-th order Markov chain over DNA has $$4^k$$ formal parameters. To circumvent this exponential growth in parameters, variable-length Markov chains (VLMCs) have gained popularity for applications in molecular biology and other areas. VLMCs adapt the depth depending on sequence context and thus curtail excesses in the number of parameters. The scarcity of available fast, or even parallel software tools, prompted the development of a parallel implementation using lazy suffix trees and a hash-based alternative. Results: An extensive evaluation was performed on genomes ranging from 12Mbp to 22Gbp. Relevant learning parameters were chosen guided by the Bayesian Information Criterion (BIC) to avoid over-fitting. Our implementation greatly improves upon the state-of-the-art even in serial execution. It exhibits very good parallel scaling with speed-ups for long sequences close to the optimum indicated by Amdahl’s law of 3 for 4 threads and about 6 for 16 threads, respectively. Conclusions: Our parallel implementation released as open-source under the GPLv3 license provides a practically useful alternative to the state-of-the-art which allows the construction of VLMCs even for very large genomes significantly faster than previously possible. Additionally, our parameter selection based on BIC gives guidance to end-users comparing genomes.
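
To make the parameter growth mentioned above tangible, the sketch below counts the contexts of a fixed k-th order Markov chain over DNA and fits one from raw counts. It is a plain fixed-order model for illustration only, not the VLMC construction, suffix-tree code, or BIC selection from the paper; the function names and the toy sequence are assumptions.

```python
from collections import Counter, defaultdict


def formal_parameters(k, alphabet_size=4):
    """Number of length-k contexts, i.e. 4**k for DNA."""
    return alphabet_size ** k


def fit_markov_chain(seq, k):
    """Estimate P(next base | previous k bases) from observed counts."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - k):
        context = seq[i:i + k]
        counts[context][seq[i + k]] += 1
    # Normalise each context's counts into a probability distribution.
    return {
        context: {base: c / sum(nxt.values()) for base, c in nxt.items()}
        for context, nxt in counts.items()
    }


for k in (1, 2, 5, 10):
    print(k, formal_parameters(k))  # grows exponentially with k

model = fit_markov_chain("ACGTACGTAC", k=2)  # toy sequence
print(model["AC"])
```

Only contexts that actually occur in the sequence appear in `model`, which hints at why adapting the context depth to the data can save parameters.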


2021 ◽  
Vol 16 (10) ◽  
pp. 75-77
Author(s):  
Parul Johri ◽  
Mala Trivedi ◽  
Sujeet Pratap Singh

Sequence analysis is a computational biology method to study protein sequences by comparing the amino acids of one protein sequence with those of another (residue-level comparison). This study introduces a new concept of comparing protein sequences at their basic atomic level. Aquaporins from various origins were compared at the atomic level, and the study revealed that all the aquaporin proteins have a narrow range of 31.0% to 34.2% carbon atoms irrespective of their origin and amino acid sequence. Further, the protein interaction and functional enrichment analysis of AQP7 showed significant interaction with glycerol kinase and ATP-sensitive inward rectifier potassium channel protein. Our in silico analysis of aquaporin proteins showed that nature tends to maintain the overall carbon atom composition in the proteins regardless of their amino acid sequence composition, which could be further used for their classification. Also, the most highly interacting partners for AQPs are the potassium buffering channel proteins.

