Unevolved De Novo Proteins Have Innate Tendencies to Bind Transition Metals

Life as we know it would not exist without the ability of protein sequences to bind metal ions. Transition metals, in particular, play essential roles in a wide range of structural and catalytic functions. The ubiquitous occurrence of metalloproteins in all organisms leads one to ask whether metal binding is an evolved trait that occurred only rarely in ancestral sequences, or alternatively, whether it is an innate property of amino acid sequences, occurring frequently in unevolved sequence space. To address this question, we studied 52 proteins from a combinatorial library of novel sequences designed to fold into 4-helix bundles. Although these sequences were neither designed nor evolved to bind metals, the majority of them have innate tendencies to bind the transition metals copper, cobalt, and zinc with high nanomolar to low-micromolar affinity.


In silico analysis of virulence associated genes in genomes of Escherichia coli strains causing colibacillosis in poultry

Journal of Veterinary Research, DOI 10.1515/jvetres-2017-0051, 2017, Vol 61 (4), pp. 421-426, Cited By ~ 2

Author(s): Joanna Kołsut, Paulina Borówka, Błażej Marciniak, Ewelina Wójcik, Arkadiusz Wojtasik, ...

Keyword(s): Escherichia Coli, Amino Acid, Virulence Factors, De Novo, Protein Sequences, In Silico Analysis, Amino Acid Sequences, Common Disease, Bacterial Genomes, E Coli

Introduction: Colibacillosis, the most common disease of poultry, is caused mainly by avian pathogenic Escherichia coli (APEC). However, thus far, no pattern to the molecular basis of the pathogenicity of these bacteria has been established beyond dispute. In this study, genomes of APEC were investigated to assess the importance and explore the distribution of 16 genes recognised as their virulence factors. Material and Methods: A total of 14 E. coli strains pathogenic for poultry were isolated, and their DNA was sequenced, assembled de novo, and annotated. Amino acid sequences from these bacteria and an additional 16 freely available APEC amino acid sequences were analysed with the DIFFIND tool to define their virulence factors. Results: The DIFFIND tool enabled quick, reliable, and convenient assessment of the differences between compared amino acid sequences from bacterial genomes. The presence of 16 protein sequences indicated as pathogenicity factors in poultry resulted in the generation of a heatmap which categorises genomes in terms of the existence and similarity of the analysed protein sequences. Conclusion: The proposed method of detection of virulence factors using the capabilities of the DIFFIND tool may be useful in the analysis of similarities of E. coli and other sequences deriving from bacteria. Phylogenetic analysis resulted in reliable segregation of 30 APEC strains into five main clusters containing various virulence associated genes (VAGs).


De novo protein design by deep network hallucination

DOI 10.1101/2020.07.22.211482, 2020, Cited By ~ 2

Author(s): Ivan Anishchenko, Tamuka M. Chidyausiku, Sergey Ovchinnikov, Samuel J. Pellock, David Baker

Keyword(s): Amino Acid, Protein Design, Structure Prediction, De Novo, Protein Structures, Monte Carlo Sampling, Amino Acid Sequences, Wide Range, Physically Based, Folded Proteins

There has been considerable recent progress in protein structure prediction using deep neural networks to infer distance constraints from amino acid residue co-evolution [1–3]. We investigated whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generated random amino acid sequences and input them into the trRosetta structure prediction network to predict starting distance maps, which as expected are quite featureless. We then carried out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (KL-divergence) between the distance distributions predicted by the network and the background distribution. Optimization from different random starting points resulted in a wide range of proteins with diverse sequences and all-alpha, all-beta-sheet, and mixed alpha-beta structures. We obtained synthetic genes encoding 129 of these network-hallucinated sequences, expressed and purified the proteins in E. coli, and found that 27 folded to monomeric stable structures with circular dichroism spectra consistent with the hallucinated structures. Thus deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute, alongside traditional physically based models, to the de novo design of proteins with new functions.
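The sampling loop described in this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `toy_predict` is a hypothetical stand-in for a structure-prediction network (it is not trRosetta), the background is taken to be uniform, and the bin count, temperature, and step count are arbitrary. The sketch only shows the general shape of Metropolis sampling in sequence space that favours a higher KL divergence from the background.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 4  # hypothetical number of distance bins


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def toy_predict(seq):
    """Hypothetical stand-in for a prediction network (NOT trRosetta).

    Returns one distance distribution per residue pair, built from
    arbitrary deterministic logits so the example is self-contained.
    """
    dists = []
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            a = AMINO_ACIDS.index(seq[i])
            b = AMINO_ACIDS.index(seq[j])
            logits = [math.sin((a + 1) * (k + 1) + b) for k in range(N_BINS)]
            z = sum(math.exp(x) for x in logits)
            dists.append([math.exp(x) / z for x in logits])
    return dists


def contrast(seq, background):
    """Mean KL divergence between predicted and background distributions."""
    dists = toy_predict(seq)
    return sum(kl_divergence(d, background) for d in dists) / len(dists)


def hallucinate(length=8, steps=200, temperature=0.05, seed=0):
    """Metropolis sampling over sequences; returns the best sequence seen."""
    rng = random.Random(seed)
    background = [1.0 / N_BINS] * N_BINS  # assumed uniform background
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = contrast(seq, background)
    start_score = score
    best_seq, best_score = list(seq), score
    for _ in range(steps):
        # Propose a single random point mutation.
        candidate = list(seq)
        candidate[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
        new_score = contrast(candidate, background)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        accept = new_score >= score or rng.random() < math.exp(
            (new_score - score) / temperature
        )
        if accept:
            seq, score = candidate, new_score
            if score > best_score:
                best_seq, best_score = list(seq), score
    return "".join(best_seq), start_score, best_score


if __name__ == "__main__":
    print(hallucinate())
```

In a real setting the toy predictor would be replaced by an actual network and the background by a learned distribution; the acceptance rule and the scoring by mean KL divergence are the parts this sketch is meant to convey.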


Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution

Genome Research, DOI 10.1101/gr.275638.121, 2021

Author(s): Chris Papadopoulos, Isabelle Callebaut, Jean-Christophe Gelly, Isabelle Hatin, Olivier Namy, ...

Keyword(s): Protein Structure, Amino Acid, De Novo, Protein Structures, Building Blocks, Amino Acid Sequences, Novel Genes, Noncoding Sequences, Ancestral Sequences, De Novo Gene

The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences’ properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states’ diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.


Application of the Innovative and Non-Invasive Technique, Molecular Music Therapy (MMT), Bio-Frequency Therapy, for the Treatment of a Wide Range of Disorders and Pathologies, with Consequent Verification of Molecular Parameters by Using Bi-Digital O-Ring Test (BDORT)

Acupuncture & Electro-Therapeutics Research, DOI 10.3727/036012920x15779969212928, 2020, Vol 44 (3), pp. 177-189

Author(s): Momir Dunjic, Stefano Turini, Dejan Krstic, Katarina Dunjic, Marija Dunjic, ...

Keyword(s): Amino Acid, Music Therapy, Amino Acid Sequences, Sirtuin 1, Ring Test, Invasive Technique, Specific Gene, Radiofrequency Therapy, Wide Range, Clinical Pictures

Radiofrequency therapy is an unconventional method that has been applied for some time, with numerous results across numerous clinical pictures. Our group has developed software, later called SONGENPROT-SOLARIS, capable of directly converting nucleotide sequences (DNA and/or RNA) and amino acid sequences (polypeptides and proteins) into musical sequences. The conversion is based on mathematical matrices designed by the French physicist and musician Joel Sternheimer, which associate a musical note with a nucleotide or an amino acid. The innovation in our software is that its algorithm directly implements a variant that reproduces sounds phase-shifted by 30 Hz between one ear and the other, reproducing the phenomenon of binaural tones, capable of inducing a specific brain activity and also the release of particles called solitons. Thanks to this software we have developed a technique called Molecular Music Therapy (MMT), and we are currently applying the technique to a cohort of 91 patients with a broad spectrum of clinical pictures, examining them with the Bi-Digital O-Ring Test (BDORT) before and after treatment with MMT. The aim of the project is to stimulate the expression of a specific gene (the same genetic sequence that the patient listens to, translated into music) only through the use of sound sequences. We have concentrated our attention on three main molecules: Sirtuin-1, telomeres, and TP-53. The results obtained with BDORT after treatment with MMT showed a significant increase in the values of the three molecules in all the examined patients, demonstrating the operative efficacy of the technique and its applicability to numerous diseases. To confirm the data obtained by BDORT, we propose, with the help of an accredited laboratory, to perform epigenetic tests on the three parameters listed above, paving the way to understanding how frequencies can influence gene expression.


Computational Analysis of Therapeutic Enzyme Uricase from Different Source Organisms

Current Proteomics, DOI 10.2174/1570164616666190617165107, 2020, Vol 17 (1), pp. 59-77

Author(s): Anand Kumar Nelapati, JagadeeshBabu PonnanEttiyappan

Keyword(s): Uric Acid, Amino Acid, Sequence Alignment, Multiple Sequence Alignment, Protein Sequences, Amino Acid Sequences, Amino Acid Residues, Multiple Sequence, Physiochemical Properties, Pharmaceutical Industries

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that enzymatically reduces the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are widely available from several sources, such as bacteria, fungi, yeast, plants and animals. Objective: The present study aims to elucidate the structure and physiochemical properties of uricase by in silico analysis. Methods: A total of sixty amino acid sequences of uricase from different sources were obtained from NCBI, and different analyses were performed, including Multiple Sequence Alignment (MSA), homology search, phylogenetic relation, motif search, domain architecture and physiochemical properties including pI, EC, Ai, and Ii. Results: Multiple sequence alignment of all the selected protein sequences exhibited distinct differences between bacterial, fungal, plant and animal sources based on the position-specific existence of conserved amino acid residues. The maximum homology of all the selected protein sequences is between 51-388. Within individual categories, homology is between 16-337 for bacterial uricase, 14-339 for fungal uricase, 12-317 for plant uricase, and 37-361 for animal uricase. The phylogenetic tree constructed from the amino acid sequences disclosed clusters indicating that the uricases are from different sources. The physiochemical features revealed that the uricase sequences are between 300 and 338 amino acid residues long, with a molecular weight of 33-39 kDa and a theoretical pI ranging from 4.95 to 8.88. The amino acid composition results showed that valine has a high average frequency of 8.79% compared with the other amino acids in all analyzed species. Conclusion: In the field of bioinformatics, this work might be informative and a stepping-stone for other researchers to get an idea of the physicochemical features, evolutionary history and structural motifs of uricase, which can be widely used in the biotechnological and pharmaceutical industries. Therefore, the proposed in silico analysis can be considered for protein engineering work, as well as for gout therapy.
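As a rough illustration of how an average amino acid frequency such as the valine figure above could be computed, the following Python sketch calculates per-sequence composition percentages and averages them across sequences. The function names and the toy sequences are hypothetical; they are not the uricase sequences or the tools used in the study.

```python
from collections import Counter


def composition_percent(seq):
    """Percentage of each residue within a single sequence."""
    counts = Counter(seq.upper())
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def mean_composition(sequences):
    """Average the per-sequence percentages over all sequences.

    A residue absent from a sequence contributes 0% for that sequence.
    """
    totals = Counter()
    for seq in sequences:
        for aa, pct in composition_percent(seq).items():
            totals[aa] += pct
    return {aa: total / len(sequences) for aa, total in totals.items()}


if __name__ == "__main__":
    toy_sequences = ["AAVV", "VVVV"]  # toy data, not real uricase sequences
    for aa, pct in sorted(mean_composition(toy_sequences).items()):
        print(f"{aa}: {pct:.2f}%")
```

Averaging per-sequence percentages weights every sequence equally regardless of length; pooling all residues before dividing would instead weight longer sequences more heavily. Which convention the study used is not stated in the abstract.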


Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal, DOI 10.1042/bj1870065, 1980, Vol 187 (1), pp. 65-74, Cited By ~ 12

Author(s): D Penny, M D Hendy, L R Foulds

Keyword(s): Amino Acid, Phylogenetic Tree, Protein Sequence, Phylogenetic Trees, Sequence Data, Protein Sequences, Nucleotide Sequences, Amino Acid Sequences, Minimal Tree, Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
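To make the notion of a "minimal" (shortest) tree concrete, the sketch below scores candidate trees by the number of character changes they require, using Fitch's small-parsimony counting, and compares the three possible groupings of four taxa. This is a standard textbook technique swapped in for illustration; it is not the authors' taxon-by-taxon construction or their minimality proof. The taxon names and the toy alignment are hypothetical.

```python
def fitch_site(tree, states):
    """Fitch counting for one site on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping leaf name -> character at this site.
    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets force one extra change on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Total number of changes the tree requires over all aligned sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


if __name__ == "__main__":
    # Toy nucleotide alignment (hypothetical, not haemoglobin data).
    alignment = {"a": "AAC", "b": "AAT", "c": "GGC", "d": "GGT"}
    candidates = [
        (("a", "b"), ("c", "d")),
        (("a", "c"), ("b", "d")),
        (("a", "d"), ("b", "c")),
    ]
    for tree in candidates:
        print(tree, tree_length(tree, alignment))
    print("shortest:", min(candidates, key=lambda t: tree_length(t, alignment)))
```

Exhaustive comparison like this is only feasible for a handful of taxa, because the number of possible trees grows very quickly with the number of sequences, which is why constructive methods such as the one described in the abstract are of interest.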


In vitro and In Silico Approach For Characterization of Antimicrobial Peptide From Probiotics Against Staphylococcus Aureus and Escherichia Coli

DOI 10.21203/rs.3.rs-366314/v2, 2021

Author(s): Amrutha Bindu, Lakshmi Devi

Keyword(s): Amino Acid, Antimicrobial Peptide, Lactobacillus Plantarum, Exchange Chromatography, Amino Acid Sequences, Proteinase K, Kinetic Assay, Wide Range

The focus of the present study was to characterize the antimicrobial peptide produced by the probiotic cultures Enterococcus durans DB-1aa (MCC4243), Lactobacillus plantarum Cu2-PM7 (MCC4246) and Lactobacillus fermentum Cu3-PM8 (MCC4233) against Staphylococcus aureus and E. coli. The growth kinetic assay revealed 24 h of incubation to be optimum for bacteriocin production. The partially purified compound after ion-exchange chromatography was found to be thermoresistant and stable over a wide range of pH. The compound was sensitive to proteinase K, but resistant to trypsin, α-amylase and lipase. The apparent molecular weight of the bacteriocin from MCC4243 and MCC4246 was found to be 3.5 kDa. The translated partial amino acid sequence of the plnA gene in MCC4246 displayed 48 amino acid sequences showing 100% similarity with plantaricin A of Lactobacillus plantarum (WP_0036419). The sequence revealed 7 β sheets, 6 α sheets, 6 predicted coils and 9 predicted turns. The functions on cytoplasm show an isoelectric point of 10.82 and 48.6% hydrophobicity. The molecular approach of using Geneious Prime software and a protein prediction database for characterization of the bacteriocin is novel and predicts “KSSAYSLQMGATAIKQVKKLFKKWGW” as the peptide responsible for antimicrobial activity. The study provides information about a broad-spectrum bacteriocin in native probiotic culture and paves the way towards its application in functional foods as a biopreservative agent.


Evolutionary Algorithms for Improving De Novo Peptide Sequencing

DOI 10.26686/wgtn.17145581.v1, 2021

Author(s): Samaneh Azari

Keyword(s): Amino Acid, De Novo, Peptide Identification, Peptide Sequencing, De Novo Sequencing, Amino Acid Sequences, Full Length, Multi Objective, De Novo Peptide Sequencing, De Novo Peptide

De novo peptide sequencing algorithms have been developed for peptide identification in proteomics from tandem mass spectra (MS/MS), which can be used to identify and discover novel peptides and proteins that do not have a database available. Despite improvements in MS instrumentation and de novo sequencing methods, a significant number of CID MS/MS spectra still remain unassigned with the current algorithms, often leading to low confidence of peptide assignments to the spectra. Moreover, current algorithms often fail to construct the completely matched sequences, and produce partial matches. Therefore, identification of full-length peptides remains challenging. Another major challenge is the existence of noise in MS/MS spectra, which makes the data highly imbalanced. Also, missing peaks caused by incomplete MS fragmentation make it more difficult to infer a full-length peptide sequence. In addition, the large search space of all possible amino acid sequences for each spectrum leads to a high false discovery rate.

This thesis focuses on improving the performance of current methods by developing new algorithms corresponding to three steps of preprocessing, sequence optimisation and post-processing using machine learning for more comprehensive interrogation of MS/MS datasets. From the machine learning point of view, the three steps can be addressed by solving different tasks such as classification, optimisation, and symbolic regression. Since Evolutionary Algorithms (EAs), as effective global search techniques, have shown promising results in solving these problems, this thesis investigates the capability of EAs in improving de novo peptide sequencing.

In the preprocessing step, this thesis proposes an effective GP-based method for classification of signal and noise peaks in highly imbalanced MS/MS spectra with the purpose of having a positive influence on the reliability of the peptide identification. The results show that the proposed algorithm is the most stable classification method across various noise ratios, outperforming six other benchmark classification algorithms. The experimental results show a significant improvement in high confidence peptide assignments to MS/MS spectra when the data is preprocessed by the proposed GP method. Moreover, the first multi-objective GP approach for classification of peaks in MS/MS data, aiming at maximising the accuracy of the minority class (signal peaks) and the accuracy of the majority class (noise peaks), is also proposed in this thesis. The results show that the multi-objective GP method outperforms the single objective GP algorithm and a popular multi-objective approach in terms of retaining more signal peaks and removing more noise peaks. The multi-objective GP approach significantly improved the reliability of peptide identification.

This thesis proposes a GA-based method to solve the complex optimisation task of de novo peptide sequencing, aiming at constructing full-length sequences. The proposed GA method benefits from the GA capability of searching a large search space of potential amino acid sequences to find the most likely full-length sequence. The experimental results show that the proposed method outperforms the most commonly used de novo sequencing method at both amino acid level and peptide level.

This thesis also proposes a novel method for re-scoring and re-ranking the peptide spectrum matches (PSMs) from the result of de novo peptide sequencing, aiming at minimising the false discovery rate as a post-processing approach. The proposed GP method evolves the computer programs to perform regression and classification simultaneously in order to generate an effective scoring function for finding the correct PSMs from many incorrect ones. The results show that the new GP-based PSM scoring function significantly improves the identification of full-length peptides when it is used to post-process the de novo sequencing results.


Expression of intermediate filament proteins during development of Xenopus laevis. I. cDNA clones encoding different forms of vimentin

Development, DOI 10.1242/dev.105.2.279, 1989, Vol 105 (2), pp. 279-298

Author(s): H. Herrmann, B. Fouquet, W.W. Franke

Keyword(s): Amino Acid, Xenopus Laevis, Intermediate Filament, De Novo, Mesenchymal Cell, Amino Acid Sequences, Cytoskeletal Proteins, Cdna Clones, Mammalian Development, Intermediate Filament Proteins

To provide a basis for studies of the expression of genes encoding the diverse kinds of intermediate-filament (IF) proteins during embryogenesis of Xenopus laevis, we have isolated and characterized IF protein cDNA clones. Here we report the identification of two types of Xenopus vimentin, Vim1 and Vim4, with their complete amino acid sequences as deduced from the cloned cDNAs, both of which are expressed during early embryogenesis. In addition, we have obtained two further vimentin cDNAs (Vim2 and Vim3) which are sequence variants of the closely related Vim1. The high evolutionary conservation of the amino acid sequences (Vim1: 458 residues; Mr approximately 52,800; Vim4: 463 residues; Mr approximately 53,500) to avian and mammalian vimentin and, to a lesser degree, to desmin from the same and higher vertebrate species, is emphasized, including conserved oligopeptide motifs in their head domains. Using these cDNAs in RNA blot and ribonuclease protection assays of various embryonic stages, we observed a dramatic increase of vimentin RNA at stage 14, in agreement with immunocytochemical results obtained with antibody VIM-3B4. The significance of very weak mRNA signals detected in earlier stages is discussed in relation to negative immunocytochemical results obtained in these stages. The first appearance of vimentin has been localized to a distinct mesenchymal cell layer underlying the neural plate or tube, respectively. The results are discussed in relation to programs of de novo synthesis of other cytoskeletal proteins in amphibian and mammalian development.


Differentiating closely affiliated Dehalococcoides lineages by a novel genetic marker identified via computational pangenome analysis

Applied and Environmental Microbiology, DOI 10.1128/aem.02181-21, 2021

Author(s): Siyan Zhao, Chen Zhang, Matthew J. Rogers, Xuejie Zhao, Jianzhong He

Keyword(s): Amino Acid, Microbial Communities, Genetic Marker, Genetic Markers, Unknown Function, Computational Approach, Amino Acid Sequences, Reductive Dehalogenation, Wide Range

As a group, Dehalococcoides dehalogenate a wide range of organohalide pollutants, but the range of organohalide compounds that can be utilized for reductive dehalogenation differs among the Dehalococcoides strains. Dehalococcoides lineages cannot be reliably disambiguated in mixed communities using typical phylogenetic markers, which often confounds bioremediation efforts. Here, we describe a computational approach to identify Dehalococcoides genetic markers with improved discriminatory resolution. Screening core genes from the Dehalococcoides pangenome for degree of similarity and frequency of 100% identity found a candidate genetic marker encoding a bacterial neuraminidase repeat (BNR)-containing protein of unknown function. This gene exhibits the fewest completely identical amino acid sequences and among the lowest average amino acid sequence identity in the core pangenome. Primers targeting BNR could effectively discriminate between 40 available BNR sequences (in silico) and 10 different Dehalococcoides isolates (in vitro). Amplicon sequencing of BNR fragments generated from 22 subsurface soil samples revealed a total of 109 amplicon sequence variants, suggesting a high diversity of Dehalococcoides distributed in the environment. Therefore, the BNR gene can serve as an alternative genetic marker to differentiate strains of Dehalococcoides in complicated microbial communities.

Importance: The challenge of discriminating between phylogenetically similar but functionally distinct bacterial lineages is particularly relevant to the development of technologies seeking to exploit the metabolic or physiological characteristics of specific members of bacterial genera. A computational approach was developed to expedite screening of potential genetic markers among phylogenetically affiliated bacteria. Using this approach, a gene encoding a bacterial neuraminidase repeat (BNR)-containing protein of unknown function was selected and evaluated as a genetic marker to differentiate strains of Dehalococcoides, an environmentally relevant genus of bacteria whose members can transform and detoxify a range of halogenated organic solvents and persistent organic pollutants, in complex microbial communities to demonstrate the validity of the approach. Moreover, many apparently phylogenetically distinct, currently uncharacterized Dehalococcoides were detected in environmental samples derived from contaminated sites.
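The screening criterion described in this abstract (few completely identical sequence pairs and low average sequence identity) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: sequences are taken to be pre-aligned and of equal length, the function names are hypothetical, and the toy gene names and sequences are invented rather than drawn from the Dehalococcoides pangenome.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def marker_stats(gene_alignments):
    """Per gene: (number of 100%-identical pairs, mean pairwise identity)."""
    stats = {}
    for gene, seqs in gene_alignments.items():
        identities = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
        identical_pairs = sum(i == 100.0 for i in identities)
        stats[gene] = (identical_pairs, sum(identities) / len(identities))
    return stats


def best_marker(gene_alignments):
    """Pick the gene with the fewest identical pairs, then the lowest mean identity."""
    stats = marker_stats(gene_alignments)
    return min(stats, key=lambda gene: stats[gene])


if __name__ == "__main__":
    # Toy data: one aligned sequence per strain for each core gene.
    toy = {
        "geneA": ["MKLV", "MKLV", "MKLI"],
        "geneB": ["MKLV", "MRLI", "AKLI"],
    }
    print(marker_stats(toy))
    print("candidate marker:", best_marker(toy))
```

Ranking by the tuple (identical pairs, mean identity) is one simple way to combine the two criteria; the abstract does not specify how the authors weighted them, so that ordering is an assumption of this sketch.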
