protein families
Recently Published Documents


TOTAL DOCUMENTS

657
(FIVE YEARS 124)

H-INDEX

81
(FIVE YEARS 6)

2022 ◽  
Vol 12 ◽  
Author(s):  
Theo Tasoulis ◽  
Tara L. Pukala ◽  
Geoffrey K. Isbister

Understanding snake venom proteomes is becoming increasingly important for understanding snake venom biology and evolution, and especially the clinical effects of venoms and approaches to antivenom development. To explore the current state of snake venom proteomics and transcriptomics, we investigated venom proteomic methods, associations between methodological and biological variability, and the diversity and abundance of protein families. We reviewed available studies on snake venom proteomes from September 2017 to April 2021. This included 81 studies characterising venom proteomes of 79 snake species, providing data on relative toxin abundance for 70 species and toxin diversity (number of different toxins) for 37 species. Methodologies utilised in these studies were summarised and compared. Several comparative studies showed that preliminary decomplexation of crude venom by chromatography leads to increased protein identification, as does the use of transcriptomics. Combining different methodological strategies in venomic approaches appears to maximize proteome coverage. 48% of studies used the RP-HPLC → 1D SDS-PAGE → in-gel trypsin digestion → ESI-LC-MS/MS pathway. Protein quantification by MS1-based spectral intensity was used twice as commonly as MS2-based spectral counting (33 vs. 15 studies). Total toxin diversity was 25–225 toxins/species, with a median of 48. The relative mean abundance of the four dominant protein families was, for elapids: 3FTx 52%, PLA2 27%, SVMP 2.8%, and SVSP 0.1%; and for vipers: 3FTx 0.5%, PLA2 24%, SVMP 27%, and SVSP 12%. Viper venoms were compositionally more complex than elapid venoms in terms of the number of protein families making up most of the venom; in contrast, elapid venoms were made up of fewer, but more toxin-diverse, protein families. No relationship was observed between relative toxin diversity and abundance.
For equivalent comparisons to be made between studies, there is a need to clarify the differences between methodological approaches and for acceptance of a standardised protein classification, nomenclature and reporting procedure. Correctly measuring and comparing toxin diversity and abundance is essential for understanding biological, clinical and evolutionary implications of snake venom composition.
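The family-level abundances quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the mean percentages reported above; the variable and function names are illustrative, and because the inputs are means across species, the totals describe the groups on average rather than any single venom.

```python
# Mean relative abundances (%) of the four dominant protein families,
# copied from the abstract above (means across species, not one venom).
mean_abundance = {
    "elapids": {"3FTx": 52.0, "PLA2": 27.0, "SVMP": 2.8, "SVSP": 0.1},
    "vipers": {"3FTx": 0.5, "PLA2": 24.0, "SVMP": 27.0, "SVSP": 12.0},
}


def summarise(families):
    """Return (combined share of the four families, most abundant family)."""
    total = round(sum(families.values()), 1)
    top_family = max(families, key=families.get)
    return total, top_family


summary = {group: summarise(fams) for group, fams in mean_abundance.items()}

for group, (total, top_family) in summary.items():
    print(f"{group}: four families sum to {total}%; largest is {top_family}")
```

On these figures the four families account for a larger share of elapid venom than of viper venom, which is consistent with the abstract's remark that viper venom is spread over more protein families.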


2022 ◽  
Vol 8 (1) ◽  
pp. 67
Author(s):  
Małgorzata Orłowska ◽  
Anna Muszewska

Early-diverging fungi (EDF) are ubiquitous and versatile. Their diversity is reflected in their genome sizes and complexity. For instance, multiple protein families have been reported to expand or disappear either in particular genomes or even in whole lineages. The most commonly mentioned are CAZymes (carbohydrate-active enzymes), peptidases and transporters that serve multiple biological roles connected to, e.g., metabolism and nutrient intake. In order to study the link between ecology and its genomic underpinnings in a more comprehensive manner, we carried out a systematic in silico survey of protein family expansions and losses among EDF with diverse lifestyles. We found that 86 protein families are represented differently according to EDF ecological features (assessed by median count differences). Among these there are 19 families of proteases, 43 CAZymes and 24 transporters. Some of these protein families have been recognized before as serine and metallopeptidases, cellulases and other nutrition-related enzymes. Other clearly pronounced differences relate to cell wall remodelling and glycosylation. We hypothesize that these protein families altogether define the preliminary fungal adaptasome. However, our findings need experimental validation. Many of the protein families have never been characterized in fungi and are discussed in the light of fungal ecology for the first time.
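The abstract states that differences were assessed by median count differences but gives no further detail. A minimal sketch of that idea follows, assuming per-genome copy numbers of one protein family in two lifestyle groups. The group names and counts are hypothetical illustrations, not data from the study, and the authors' actual procedure may differ.

```python
from statistics import median


def median_count_difference(counts_a, counts_b):
    """Difference between the median per-genome counts of two groups."""
    return median(counts_a) - median(counts_b)


# Hypothetical copy numbers of a single protein family, one value per genome.
saprotroph_counts = [12, 15, 9, 14, 11]
parasite_counts = [3, 5, 4, 2, 6]

diff = median_count_difference(saprotroph_counts, parasite_counts)
print(f"median count difference: {diff}")
```

Medians are less sensitive than means to a single genome with an unusually large expansion, which is one plausible reason to prefer them for this kind of comparison.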


Author(s):  
S. Dinesh

Abstract: Homology detection plays a major role in bioinformatics, and different types of methods are used for it. Here we extract information from protein sequences and then use various algorithms to predict the similarity between protein families. The SVM is the most commonly used algorithm in homology detection. Classification techniques are not well suited to homology detection because they do not handle high-dimensional datasets well. Reducing the dimensionality is therefore very important, so that the similarity of protein families can be predicted more easily. Keywords: Homology detection, Protein, Sequence, Reducing dimensionality, BLAST, SCOP.


2021 ◽  
Vol 3 (2) ◽  
pp. 3-18
Author(s):  
Partha Mukherjee ◽  
Youakim Badr ◽  
Srushti Karvekar ◽  
Shanmugapriya Viswanathan

The world is currently going through a serious pandemic due to the coronavirus disease (COVID-19). In this study, we investigate the gene structure similarity of coronavirus genomes isolated from COVID-19 patients, Severe Acute Respiratory Syndrome (SARS) patients and bats. We also explore the extent of similarity between their genome structures to find out whether the new coronavirus is similar to either of the other genome structures. Our experimental results show that there is 82.42% similarity between the CoV-2 genome structure and the bat genome structure. Moreover, we have used a bidirectional Gated Recurrent Unit (GRU) model as the deep learning technique, together with an improved variant of recurrent neural networks (i.e., the Bidirectional Long Short Term Memory model), to classify the protein families of these genomes and isolate the prominent protein family accession. The accuracy of the GRU model is 98% for labeled protein sequences against the protein families. Comparing the performance of the GRU model with that of the Bidirectional Long Short Term Memory (Bi-LSTM) model, we found that the GRU model is 1.6% more accurate than the Bi-LSTM model for our multiclass protein classification problem. Our experimental results could further support medical research that targets protein family similarity to better understand the coronavirus genomic structure.


2021 ◽  
Author(s):  
Michael Y. Galperin ◽  
Shan-Ho Chou

The HD-GYP domain, named after two of its conserved sequence motifs, was first described in 1999 as a specialized version of the widespread HD phosphohydrolase domain that had additional highly conserved amino acid residues. Domain associations of HD-GYP indicated its involvement in bacterial signal transduction and distribution patterns of this domain suggested that it could serve as a hydrolase of the bacterial second messenger c-di-GMP, in addition to or instead of the EAL domain. Subsequent studies confirmed the ability of various HD-GYP domains to hydrolyze c-di-GMP to linear pGpG and/or GMP. Certain HD-GYP-containing proteins hydrolyze another second messenger, cGAMP, and some HD-GYP domains participate in regulatory protein-protein interactions. The recently solved structures of HD-GYP domains from four distinct organisms clarified the mechanisms of c-di-GMP binding and metal-assisted hydrolysis. However, the HD-GYP domain is poorly represented in public domain databases, which causes certain confusion about its phylogenetic distribution, functions, and domain architectures. Here, we present a refined sequence model for the HD-GYP domain and describe the roles of its most conserved residues in metal and/or substrate binding. We also calculate the numbers of HD-GYPs encoded in various genomes and list the most common domain combinations involving HD-GYP, such as the RpfG (REC–HD-GYP), Bd1817 (DUF3391–HD-GYP), and PmGH (GAF–HD-GYP) protein families. We also provide the descriptions of six HD-GYP–associated domains, including four novel integral membrane sensor domains. This work is expected to stimulate studies of diverse HD-GYP-containing proteins, their N-terminal sensor domains and the signals to which they respond.
IMPORTANCE The HD-GYP domain forms class II of c-di-GMP phosphodiesterases that control the cellular levels of the universal bacterial second messenger c-di-GMP and therefore affect flagellar and/or twitching motility, cell development, biofilm formation, and, often, virulence. Despite more than 20 years of research, HD-GYP domains are insufficiently characterized; they are often confused with ‘classical’ HD domains that are involved in various housekeeping activities and may participate in signaling, hydrolyzing (p)ppGpp and c-di-AMP. This work provides an updated description of the HD-GYP domain, including its sequence conservation, phylogenetic distribution, domain architectures, and the most widespread HD-GYP-containing protein families. This work shows that HD-GYP domains are widespread in many environmental bacteria and are predominant c-di-GMP hydrolases in many lineages, including clostridia and deltaproteobacteria.


Insects ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1116
Author(s):  
Elkin Aguirre-Ramirez ◽  
Sandra Velasco-Cuervo ◽  
Nelson Toro-Perea

Anastrepha obliqua (Macquart) (Diptera: Tephritidae) is an important pest in the neotropical region. It is considered a polyphagous insect, meaning it infests plants of different taxonomic families and readily colonizes new host plants. The change to new hosts can lead to diversification and the formation of host races. Previous studies investigating the effect of host plants on population structure and selection in Anastrepha obliqua have focused on data from mitochondrial DNA sequences and nuclear microsatellite markers, and there are no analyses at the genomic level. To better understand this issue, we used a pooled restriction site-associated DNA sequencing (pooled RAD-seq) approach to assess genomic differentiation and population structure across sympatric populations of Anastrepha obliqua in the Inter-Andean Valley of the Cauca River in southwestern Colombia that infest three host plants: Spondias purpurea (red mombin) and Mangifera indica (mango), both of the family Anacardiaceae, and Averrhoa carambola (carambola) of the family Oxalidaceae. Our results show genomic differentiation of populations from carambola compared to mango and red mombin populations, but the genetic structure was mainly established by geography rather than by the host plant. On the other hand, we identified 54 SNPs in 23 sequences significantly associated with the use of the host plant. Of these 23 sequences, we identified 17 candidate genes and nine protein families, of which four protein families are involved in the nutrition of these flies. Future studies should investigate the adaptive processes undergone by phytophagous insects in the Neotropics, using fruit flies as a model and state-of-the-art molecular tools.


Author(s):  
Soledad N. Gonzalez ◽  
Valeria Sulzyk ◽  
Mariana Weigel Muñoz ◽  
Patricia S. Cuasnicu

Mammalian fertilization is a complex process involving a series of successive sperm-egg interaction steps mediated by different molecules and mechanisms. Studies carried out during the past 30 years, using a group of proteins named CRISP (Cysteine-RIch Secretory Proteins), have significantly contributed to elucidating the molecular mechanisms underlying mammalian gamete interaction. The CRISP family is composed of four members (i.e., CRISP1-4) in mammals, mainly expressed in the male tract, present in spermatozoa and exhibiting Ca2+ channel regulatory abilities. Biochemical, molecular and genetic approaches show that each CRISP protein participates in more than one stage of gamete interaction (i.e., cumulus penetration, sperm-ZP binding, ZP penetration, gamete fusion) by either ligand-receptor interactions or the regulation of several capacitation-associated events (e.g., protein tyrosine phosphorylation, acrosome reaction, hyperactivation), likely through their ability to regulate different sperm ion channels. Moreover, deletion of different numbers and combinations of Crisp genes, leading to the generation of single, double, triple and quadruple knockout mice, showed that CRISP proteins are essential for male fertility and are involved not only in gamete interaction but also in previous and subsequent steps such as sperm transport within the female tract and early embryo development. Collectively, these observations reveal that CRISP proteins have evolved to perform redundant as well as specialized functions and are organized in functional modules within the family that work through independent pathways and contribute distinctly to fertility success.
Redundancy and compensation mechanisms within protein families are particularly important for spermatozoa which are transcriptionally and translationally inactive cells carrying numerous protein families, emphasizing the importance of generating multiple knockout models to unmask the true functional relevance of family proteins. Considering the high sequence and functional homology between rodent and human CRISP proteins, these observations will contribute to a better understanding and diagnosis of human infertility as well as the development of new contraceptive options.


2021 ◽  
Author(s):  
Geoffroy Dubourg-Felonneau ◽  
Shahab Shams ◽  
Eyal Akiva ◽  
Lawrence Lee

We present a method to provide a biologically meaningful representation of the space of protein sequences. While billions of protein sequences are available, organizing this vast amount of information into functional categories is daunting, time-consuming and incomplete. We present an unsupervised approach that combines Transformer protein language models, UMAP graphs, and spectral clustering to create meaningful clusters in protein space. To demonstrate the meaningfulness of the clusters, we show that they preserve most of the signal present in a dataset of manually curated enzyme protein families.


2021 ◽  
Author(s):  
Hiral Sanghavi ◽  
Richa Rashmi ◽  
Anirban Dasgupta ◽  
Sharmistha Majumdar

Guanine nucleotide binding proteins are characterized by a structurally and mechanistically conserved GTP-binding domain (G domain), indispensable for binding GTP. The G domain comprises five adjacent consensus motifs called G boxes, which are separated by amino acid spacers of different lengths. Several G proteins, discovered over time, are characterized by diverse function and sequence. This sequence diversity is also observed in the G box motifs (specifically the G5 box) as well as the inter-G box spacer length. The Spacers and Mismatch Algorithm (SMA) introduced in this study can predict G domains in a given protein sequence, based on user-specified constraints for approximate G-box patterns and inter-box gaps in each G protein family. The SMA parameters can be customized as more G proteins are discovered and characterized structurally. Family-specific G box motifs, including the less characterized G5 box, were predicted with higher accuracy. Overall, our analysis suggests the possible classification of G protein families based on family-specific G box sequences and lengths of inter-G box spacers. SMA can be run via a web-based server at https://labs.iitgn.ac.in/datascience/gboxes/
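The general idea of matching approximate G-box patterns under spacer constraints can be sketched as below. This is not the authors' SMA implementation: the function names, the toy sequence, the spacer ranges and the mismatch limit are all assumptions made for illustration. Only three boxes are used, written in the classic GTPase consensus forms GxxxxGK[ST] (G1), DxxG (G3) and [NT]KxD (G4).

```python
# Each pattern element lists the residues allowed at that position;
# "x" means any residue. G2 and G5 are omitted to keep the sketch short.
BOXES = [
    ("G1", ["G", "x", "x", "x", "x", "G", "K", "ST"]),
    ("G3", ["D", "x", "x", "G"]),
    ("G4", ["NT", "K", "x", "D"]),
]
# Hypothetical (min, max) spacer lengths between consecutive boxes.
GAPS = [(3, 10), (2, 10)]


def mismatches(pattern, window):
    """Count positions where the window violates the pattern."""
    return sum(
        1 for allowed, aa in zip(pattern, window)
        if allowed != "x" and aa not in allowed
    )


def find_g_boxes(seq, boxes, gaps, max_mismatch=0, k=0, start=0, stop=None):
    """Return [(box name, start index), ...] for the first arrangement in
    which every box matches within max_mismatch and every spacer length
    falls inside its allowed range, or None. Backtracks over candidates."""
    name, pattern = boxes[k]
    n = len(pattern)
    last = len(seq) - n if stop is None else min(stop, len(seq) - n)
    for i in range(start, last + 1):
        if mismatches(pattern, seq[i:i + n]) > max_mismatch:
            continue
        if k + 1 == len(boxes):
            return [(name, i)]
        lo, hi = gaps[k]
        end = i + n  # first position after this box
        rest = find_g_boxes(seq, boxes, gaps, max_mismatch,
                            k + 1, end + lo, end + hi)
        if rest is not None:
            return [(name, i)] + rest
    return None


# Toy sequence built for this example, not a real protein.
toy_seq = "MAGAAAAGKSLLLLLDTAGQEEEENKCDLL"
hits = find_g_boxes(toy_seq, BOXES, GAPS)
print(hits)
```

Raising `max_mismatch` tolerates degenerate boxes but also makes short motifs such as DxxG match more often by chance, so the spacer ranges carry more of the specificity as the mismatch limit grows.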

