PhANNs, a fast and accurate tool and web server to classify phage structural proteins

2020 ◽  
Vol 16 (11) ◽  
pp. e1007845
Author(s):  
Vito Adrian Cantu ◽  
Peter Salamon ◽  
Victor Seguritan ◽  
Jackson Redfield ◽  
David Salamon ◽  
...  

For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50–90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an “other” category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if not predicted to fall in one of the ten classes, as “other,” providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally.
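The split described above keeps homologous proteins out of different subsets. One way to picture that constraint is to assign whole clusters, rather than individual proteins, to subsets. The following is a minimal Python sketch under stated assumptions: cluster labels are taken as already computed (the abstract does not say how), and `split_by_cluster`, the greedy balancing rule, and the toy protein IDs are hypothetical illustrations, not the PhANNs code.

```python
from collections import defaultdict


def split_by_cluster(cluster_of, n_subsets=11):
    """Assign whole clusters to subsets so that no cluster spans two subsets.

    cluster_of: dict mapping protein ID -> cluster ID (assumed precomputed).
    Returns a list of n_subsets lists of protein IDs.
    """
    members = defaultdict(list)
    for protein, cluster in cluster_of.items():
        members[cluster].append(protein)

    subsets = [[] for _ in range(n_subsets)]
    # Greedy balancing (an illustrative choice, not from the paper):
    # place the largest clusters first, each into the currently smallest subset.
    for cluster in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_subsets), key=lambda i: len(subsets[i]))
        subsets[smallest].extend(sorted(members[cluster]))
    return subsets


# Toy data: 40 hypothetical proteins grouped into 20 clusters of 2.
toy = {f"prot{i}": f"clu{i // 2}" for i in range(40)}
subsets = split_by_cluster(toy)
# One subset held out for testing, ten used for cross-validation.
test_set, cv_folds = subsets[0], subsets[1:]
```

Because each cluster is placed as a unit, any two proteins that share a cluster always land in the same subset, which is the property the abstract requires of its split.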

2020 ◽  
Author(s):  
Vito Adrian Cantu ◽  
Peter Salamon ◽  
Victor Seguritan ◽  
Jackson Redfield ◽  
David Salamon ◽  
...  

Abstract. For any given bacteriophage genome or phage sequences in metagenomic data sets, we are unable to assign a function to 50–90% of genes. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an “other” category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten classes, and non-phage proteins are classified as “other”, providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally.
Author Summary. Bacteriophages (phages, viruses that infect bacteria) are the most abundant biological entities on Earth. They outnumber bacteria by a factor of ten. Because phages differ greatly from one another and from bacteria, and comparatively few phage genes are represented in our database, we are unable to assign a function to 50–90% of phage genes. In this work, we developed PhANNs, a machine learning tool that can classify a phage gene as one of ten structural roles, or “other”. This approach does not require a similar gene to be known.
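The abstract reports a test F1-score and a test accuracy for a multi-class problem. As a reminder of how the two numbers differ, here is a small Python sketch that computes both from true and predicted labels. The abstract does not state how F1 is averaged across classes, so the macro average used here is an assumption, and the class names and toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for six proteins.
y_true = ["tail", "tail", "portal", "other", "other", "other"]
y_pred = ["tail", "portal", "portal", "other", "other", "tail"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Accuracy counts every protein equally, while macro F1 counts every class equally, so the two can diverge when classes are imbalanced.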


2018 ◽  
Vol 10 (2) ◽  
pp. 84-94 ◽  
Author(s):  
M. Pershina ◽  
V.S. Bouksim ◽  
K. Arhid ◽  
F.R. Zakani ◽  
M. Aboulfatah ◽  
...  

Cells ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 821
Author(s):  
Rohitash Yadav ◽  
Jitendra Kumar Chaudhary ◽  
Neeraj Jain ◽  
Pankaj Kumar Chaudhary ◽  
Supriya Khanra ◽  
...  

Coronaviruses belong to the family Coronaviridae, have a single-stranded, positive-sense RNA genome (+ ssRNA) of around 26 to 32 kilobases, and are known to infect a myriad of mammalian hosts, such as humans, cats, bats, civets, dogs, and camels, with varied consequences in terms of death and debilitation. Strikingly, the novel coronavirus (2019-nCoV), later renamed severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and found to be the causative agent of coronavirus disease-19 (COVID-19), shows 88% sequence identity with bat-SL-CoVZC45 and bat-SL-CoVZXC21, 79% with SARS-CoV, and 50% with MERS-CoV. Despite variability in key amino acid residues, there is a remarkable structural similarity between the receptor binding domain (RBD) of the spike protein (S) of SARS-CoV-2 and that of SARS-CoV. During infection, the spike protein of SARS-CoV-2 displays 10–20 times greater affinity than that of SARS-CoV for its cognate host cell receptor, angiotensin-converting enzyme 2 (ACE2), leading to proteolytic cleavage of the S protein by transmembrane protease serine 2 (TMPRSS2). Following cellular entry, ORF-1a and ORF-1ab, located downstream of the 5′ end of the + ssRNA genome, undergo translation, thereby forming two large polyproteins, pp1a and pp1ab. These polyproteins, following protease-induced cleavage and molecular assembly, form the functional viral RNA polymerase, also referred to as replicase. Thereafter, uninterrupted, orchestrated replication-transcription events lead to the synthesis of multiple nested sets of subgenomic mRNAs (sgRNAs), which are finally translated into several structural and accessory proteins participating in structure formation and various molecular functions of the virus, respectively. These structural proteins assemble and encapsulate genomic RNA (gRNA), resulting in numerous viral progeny, which eventually exit the host cell and spread infection to the rest of the body.
In this review, we primarily focus on genomic organization, structural and non-structural protein components, and potential prospective molecular targets for the development of therapeutic drugs, convalescent plasma therapy, and a myriad of potential vaccines to tackle SARS-CoV-2 infection.


2007 ◽  
Vol 7 (5) ◽  
pp. 557-570 ◽  
Author(s):  
M. C. Tunusluoglu ◽  
C. Gokceoglu ◽  
H. Sonmez ◽  
H. A. Nefeslioglu

Abstract. Various statistical, mathematical and artificial intelligence techniques have been used in engineering geology, rock engineering and geomorphology for many years. Among these techniques, however, artificial neural networks (ANNs) are a relatively new approach, in engineering geology in particular. The attractiveness of ANNs for engineering geological problems comes from the information processing characteristics of the system, such as non-linearity, high parallelism, robustness, fault and failure tolerance, learning, the ability to handle imprecise and fuzzy information, and the capability to generalize. The purposes of the present study are therefore to apply an ANN to an engineering geology problem with a very large database and to introduce a new approach to accelerate convergence. For these purposes, an ANN architecture with 5 neurons in one hidden layer was constructed. During the training stages, a total of 40 000 training cycles were performed, and the minimum RMSE values were obtained at approximately the 10 000th cycle. At this cycle, the minimum RMSE value is 0.22 for the second training set, while the corresponding value for the second test set is 0.064. Using the ANN model trained to the 10 000th cycle on the second random sampling, the debris source area susceptibility map was produced and adjusted. Finally, a potential debris source susceptibility map for the study area was produced. Judged against the field observations and the existing inventory map, the produced map has a high prediction capacity and can be used when assessing debris flow hazard mitigation efforts.
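The procedure in this abstract, a network with 5 neurons in one hidden layer whose RMSE is tracked over training cycles so that the cycle with the minimum RMSE can be selected, can be sketched in plain Python. Everything beyond those two facts is an assumption made for illustration: the sigmoid activations, the plain stochastic gradient descent update, the learning rate, the 200-cycle budget, and the synthetic two-input data are hypothetical, and the authors' approach to accelerating convergence is not reproduced.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(net, x):
    """Return hidden activations and the single network output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w1"], net["b1"])]
    out = sigmoid(sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"])
    return hidden, out


def rmse(net, data):
    """Root-mean-square error of the network over (input, target) pairs."""
    return math.sqrt(sum((forward(net, x)[1] - y) ** 2 for x, y in data) / len(data))


def train(train_set, test_set, n_inputs, n_hidden=5, cycles=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    net = {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }
    history = []  # test RMSE after each training cycle
    for _ in range(cycles):
        for x, y in train_set:
            hidden, out = forward(net, x)
            d_out = (out - y) * out * (1 - out)  # gradient at the output neuron
            for j, h in enumerate(hidden):
                # Hidden-layer gradient uses w2[j] before it is updated.
                d_hid = d_out * net["w2"][j] * h * (1 - h)
                net["w2"][j] -= lr * d_out * h
                net["b1"][j] -= lr * d_hid
                for i, xi in enumerate(x):
                    net["w1"][j][i] -= lr * d_hid * xi
            net["b2"] -= lr * d_out
        history.append(rmse(net, test_set))
    # 1-based index of the cycle with the lowest test RMSE.
    best_cycle = min(range(cycles), key=history.__getitem__) + 1
    return history, best_cycle


# Synthetic data (hypothetical): label is 1 when the two inputs sum to more than 1.
data_rng = random.Random(1)
points = [(data_rng.random(), data_rng.random()) for _ in range(60)]
data = [([a, b], 1.0 if a + b > 1.0 else 0.0) for a, b in points]
history, best_cycle = train(data[:40], data[40:], n_inputs=2)
```

Recording the RMSE after every cycle is what makes it possible to report, as the abstract does, the cycle at which the error was lowest rather than only the final value.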


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Ju Hyeon Kim ◽  
Do Eun Lee ◽  
SangYoun Park ◽  
John M. Clark ◽  
Si Hyeock Lee

Abstract. Background: Head louse females secrete liquid glue during oviposition, which solidifies to form the nit sheath over the egg. Recently, two homologous proteins, named louse nit sheath protein (LNSP) 1 and LNSP 2, were identified as adhesive proteins, but the precise mechanism of nit sheath solidification is unknown. Methods: We determined the temporal transcriptome profiles of the head louse accessory glands plus oviduct, from which putative major structural proteins and those with functional importance were deduced. A series of RNA interference (RNAi) experiments and treatment with an inhibitor were conducted to elucidate the function and action mechanism of each component. Results: By transcriptome profiling of genes expressed in the louse accessory glands plus uterus, LNSP1 and LNSP2, along with two hypothetical proteins, were confirmed to be the major structural proteins. In addition, several proteins with functional importance, including transglutaminase (TG), defensin 1 and defensin 2, were identified. When LNSP1 was knocked down via RNA interference, most eggs became nonviable via desiccation, suggesting its role in desiccation resistance. Knockdown of LNSP2, however, resulted in oviposition failure, which suggests that LNSP2 may serve as the basic platform to form the nit sheath and may have an additional function of lubrication. Knockdown of TG also impaired egg hatching, demonstrating its role in the cross-linking of nit sheath proteins. The role of TG in cross-linking was further confirmed by injection or hair coating with GGsTop, a TG inhibitor. Conclusions: Both LNSP1 and LNSP2 are essential for maintaining egg viability in addition to their function as glue. The TG-mediated cross-linking plays critical roles in water preservation that are essential for ensuring normal embryogenesis. The TG-mediated cross-linking mechanism can be employed as a therapeutic target to control human louse eggs, and any topically applied TG inhibitors can be exploited as potential ovicidal agents.


1998 ◽  
Vol 09 (01) ◽  
pp. 71-85 ◽  
Author(s):  
A. Bevilacqua ◽  
D. Bollini ◽  
R. Campanini ◽  
N. Lanconelli ◽  
M. Galli

This study investigates the possibility of using an Artificial Neural Network (ANN) for reconstructing Positron Emission Tomography (PET) images. The network is trained with simulated data that include physical effects such as attenuation and scattering. Once the training ends, the weights of the network are held constant. The network is able to reconstruct every type of source distribution contained inside the area mapped during the learning. The reconstruction of a simulated brain phantom in a noiseless case shows an improvement compared with Filtered Back-Projection (FBP) reconstruction. In noisy cases there is still an improvement, even if we do not compensate for noise fluctuations. These results show that it is possible to reconstruct PET images using ANNs. Initially we used a DEC Alpha; then, due to the high data parallelism of this reconstruction problem, we ported the learning to a Quadrics (SIMD) machine, suited for the realization of a small dedicated medical system. These results encourage us to continue with further studies that will make it possible to reconstruct images larger than those used in the present work (32 × 32 pixels).

