pfam domain
Recently Published Documents


TOTAL DOCUMENTS

27
(FIVE YEARS 18)

H-INDEX

6
(FIVE YEARS 2)

2022 ◽  
Vol 8 ◽  
Author(s):  
Sucharita Dey ◽  
Jaime Prilusky ◽  
Emmanuel D. Levy

The identification of physiologically relevant quaternary structures (QSs) in crystal lattices is challenging. To predict the physiological relevance of a particular QS, QSalign searches for homologous structures in which subunits interact in the same geometry. This approach proved accurate but was limited to structures already present in the Protein Data Bank (PDB). Here, we introduce a web server (www.QSalign.org) allowing users to submit homo-oligomeric structures of their choice to the QSalign pipeline. Given a user-uploaded structure, the sequence is extracted and used to search for homologs based on sequence similarity and PFAM domain architecture. If structural conservation is detected between a homolog and the user-uploaded QS, physiological relevance is inferred. To widen the predictions, the web server also generates alternative QSs with PISA and processes them in the same way as the submitted query. The result page also shows representative QSs in the protein family of the query, which is informative if no QS conservation was detected or if the protein appears monomeric. These representative QSs can also serve as a starting point for homology modeling.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Stefan Wichmann ◽  
Siegfried Scherer ◽  
Zachary Ardern

Background: Overlapping genes (OLGs) with long protein-coding overlapping sequences are disallowed by standard genome annotation programs, outside of viruses. Recently, however, they have been discovered in Archaea, diverse Bacteria, and Mammals. The biological factors underlying life’s ability to create overlapping genes require more study, and may have important applications in understanding evolution and in biotechnology. A previous study claimed that protein domains from viruses were much better suited to forming overlaps than those from other cellular organisms; in this study we assessed this claim, in order to discover what might underlie taxonomic differences in the creation of gene overlaps. Results: After overlapping arbitrary Pfam domain pairs and evaluating them with Hidden Markov Models, we find OLG construction to be much less constrained than expected. For instance, close to 10% of the constructed sequences cannot be distinguished from typical sequences in their protein family. Most are also indistinguishable from natural protein sequences regarding identity and secondary structure. Surprisingly, contrary to a previous study, virus domains were much less suitable for designing OLGs than bacterial or eukaryotic domains were. In general, the amount of amino acid change required to force a domain to overlap is approximately equal to the variation observed within a typical domain family. The resulting high similarity between natural sequences and those altered so as to overlap is mostly due to the combination of high redundancy in the genetic code and the evolutionary exchangeability of many amino acids. Conclusions: Synthetic overlapping genes which closely resemble natural gene sequences, as measured by HMM profiles, are remarkably easy to construct, and most arbitrary domain pairs can be altered so as to overlap while retaining high similarity to the original sequences. Future work, however, will need to assess important factors not considered here, such as intragenic interactions which affect protein folding. While the analysis here is not sufficient to guarantee functional folding proteins, further analysis of constructed OLGs will improve our understanding of the origin of these remarkable genetic elements across life and open up exciting possibilities for synthetic biology.


2021 ◽  
Vol 12 ◽  
Author(s):  
Wenying Xu ◽  
Tong Liu ◽  
Huiying Zhang ◽  
Hong Zhu

DIRIGENT (DIR) genes are key players in environmental stress responses that have been identified in many vascular plant species. However, few studies have examined the VrDIR genes in mungbean. In this study, we characterized 37 VrDIR genes in mungbean using a genome-wide identification method. VrDIRs were distributed on seven of the 11 mungbean chromosomes, and chromosome three contained the most VrDIR genes, with seven members. Thirty-two of the 37 VrDIRs contained a typical DIR gene structure, with one exon; the conserved DIR domain (i.e., Pfam domain) occupied most of the protein in 33 of the 37 VrDIRs. The gene structures of VrDIR genes were analyzed, and a total of 19 distinct motifs were detected. VrDIR genes were classified into five groups based on their phylogenetic relationships, and 13 duplicated gene pairs were identified. In addition, a total of 92 cis-acting elements were detected in all 37 VrDIR promoter regions, and VrDIR genes contained different numbers and types of cis-acting elements. As a result, VrDIR genes showed distinct expression patterns in different tissues and in response to salt and drought stress.


2021 ◽  
Author(s):  
Tülay Karakulak ◽  
Damian Szklarczyk ◽  
Holger Moch ◽  
Christian von Mering ◽  
Abdullah Kahraman

Motivation: Alternative splicing, as an essential regulatory mechanism in normal mammalian cells, is frequently disturbed in cancer. Switches in the expression of alternative isoforms can alter protein interaction networks of associated genes, giving rise to cancer progression and metastases. We have recently analysed the pathogenic impact of switching events in 1209 cancer samples covering 24 different cancer types. Here, we present CanIsoNet (Cancer Isoform specific interaction Network), a database to view, browse and search these isoform switching events. CanIsoNet is the first web server that incorporates isoform expression data with STRING interaction networks and COSMIC annotations to predict the pathogenic impact of isoform switching events in various cancer types. Results: Data in CanIsoNet can be browsed by cancer types or searched by genes or isoforms in annotation-rich data tables. Various annotations for 11,041 isoforms and 31,748 unique isoform switching events are provided across 24 cancer types, including proximity information to COSMIC cancer census genes, network density data for each cancer-specific isoform, PFAM domain IDs of disrupted interactions, domain structure visualization of transcripts and expression data of switched isoforms for each sample. Availability: CanIsoNet is freely available at https://caniso.net under a Creative Commons License. The source code can be found at https://github.com/KarakulakTulay/CanIsoNet_Web


2021 ◽  
Author(s):  
Marc-André Lemay ◽  
Jonas A. Sibbesen ◽  
Davoud Torkamaneh ◽  
Jérémie Hamel ◽  
Roger C. Levesque ◽  
...  

Background: Structural variant (SV) discovery based on short reads is challenging due to their complex signatures and tendency to occur in repeated regions. The increasing availability of long-read technologies has greatly facilitated SV discovery; however, these technologies remain too costly to apply routinely to population-level studies. Here, we combined short-read and long-read sequencing technologies to provide a comprehensive population-scale assessment of structural variation in a panel of Canadian soybean cultivars. Results: We used Oxford Nanopore sequencing data (~12X mean coverage) for 17 samples to both benchmark SV calls made from the Illumina data and predict SVs that were subsequently genotyped in a population of 102 samples using Illumina data. Benchmarking results show that variants discovered using Oxford Nanopore can be accurately genotyped from the Illumina data. We first use the genotyped SVs for population structure analysis and show that results are comparable to those based on single-nucleotide variants. We observe that the population frequency and distribution within the genome of SVs are constrained by the location of genes. Gene Ontology and PFAM domain enrichment analyses also confirm previous reports that genes harboring high-frequency SVs are enriched for functions in defense response. Finally, we discover polymorphic transposable elements from the SVs and report evidence of the recent activity of a Stowaway MITE. Conclusions: Our results demonstrate that long-read and short-read sequencing technologies can be efficiently combined to enhance SV analysis in large populations, providing a reusable framework for their study in a wider range of samples and non-model species.
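The abstract above does not say which statistical test was used for the PFAM domain enrichment analysis. As an illustration only, a one-sided hypergeometric test is a common choice for this kind of question: given a genome in which some genes carry a domain, is that domain over-represented among a selected gene set (for example, genes harboring high-frequency SVs)? The function name and all parameter names below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_domain,
                           selected_genes, selected_with_domain):
    """One-sided hypergeometric p-value for domain enrichment.

    Returns P(X >= selected_with_domain), where X is the number of
    domain-carrying genes in a random draw of `selected_genes` genes
    from a genome of `total_genes` genes, `genes_with_domain` of which
    carry the domain. This is an illustrative choice of test, not
    necessarily the one used in the study.
    """
    # X cannot exceed either the draw size or the number of carriers.
    upper = min(genes_with_domain, selected_genes)
    # Number of ways to draw the selected set, ignoring the domain.
    denom = comb(total_genes, selected_genes)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(genes_with_domain, k)
        * comb(total_genes - genes_with_domain, selected_genes - k)
        for k in range(selected_with_domain, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy numbers, not taken from the study.
    print(hypergeom_enrichment_p(10, 5, 5, 5))
```

In practice one such p-value would be computed per domain and then corrected for multiple testing.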


2021 ◽  
Author(s):  
Jorge Augusto Hongo ◽  
Giovanni Marques de Castro ◽  
Agnello Cesar Rios Picorelli ◽  
Thieres Tayroni Martins da Silva ◽  
Eddie Luidy Imada ◽  
...  

The increasing availability of high-quality genomic, annotation and phenotypic data for different species contrasts with the lack of general software for comparative genomics that integrates these data types in a statistically sound framework in order to produce biologically meaningful knowledge. In this work, we present CALANGO (Comparative AnaLysis with ANnotation-based Genomic cOmponentes), a first-principles comparative genomics tool to search for annotation terms, such as GO terms or Pfam domain IDs, associated with a quantitative variable used to rank species data, after correcting for phylogenetic relatedness. This information can be used to annotate genomes at any level, including protein domains, genes, or promoters, allowing comparative analyses of genomes at several resolutions and from distinct functional and evolutionary angles. CALANGO outputs a set of HTML5 files that can be opened in any conventional web browser, featuring interactive heatmaps, scatter plots, and tables, stimulating scientific reproducibility, data sharing, and exploratory analysis. Detailed results and a reproducibility-focused data structure are also returned after each run of the tool. CALANGO provides classic association statistics used in comparative genomics, including correlation coefficients, probabilities, and phylogeny-aware linear models. To illustrate how CALANGO can be used to produce biologically meaningful, statistically sound knowledge, we present a case study of the co-evolution of Escherichia coli and their integrated bacteriophages (prophages). Through controlled in silico experiments, we demonstrate that terms from a functional annotation are both more prevalent across genomes and more abundant than the homology-based annotation terms commonly used in traditional comparative genomics studies. This result demonstrates how GO-based annotation captures information of non-homologous sequences fulfilling the same biological roles. 
Most homologous regions positively associated with prophage occurrence are found in genes of viral origin (e.g. capsids, lysozymes, and integrases), as expected, while the second most abundant category is virulence factors. The removal of viral genes demonstrated that most of the virulence factors associated with prophage density are located outside viral genes, suggesting a more complex biological process than the archetypal bacteriophage-mediated horizontal gene transfer of virulence factors. The functional annotation performed by CALANGO revealed several GO terms describing general and specific aspects of viral biology (e.g. "viral life cycle", "DNA integration"). We also found an association of the GO term "pathogenicity", used to annotate several non-homologous virulence factors, as well as terms describing several known virulence mechanisms in pathogenic E. coli (e.g. "Type III secretion system"). Moreover, CALANGO also detected previously unknown associations that unveil a richer scenario of the bacteriophage-host biological interaction. An interesting example is the association of the GO term "response to stress", used to annotate several classes of non-homologous genes that are components of distinct stress response mechanisms (e.g. peroxidases, DNA repair enzymes, heat shock proteins), indicating that the horizontal transfer of such genes may be adaptive for host cells and, consequently, advantageous for the integrated prophages as well. CALANGO is provided as a fully operational, out-of-the-box R package that can be freely installed directly from CRAN. Usage examples and longer-format documentation are also available on the maintainer's GitHub page.


2021 ◽  
Author(s):  
Sarah E Jensen ◽  
Edward S. Buckler

The increase in global temperatures predicted by climate change models presents a serious problem for agriculture because high temperatures reduce crop yields. Protein biochemistry is at the core of plant heat stress response, and understanding the interactions between protein biochemistry and temperature will be key to developing heat-tolerant crop varieties. Current experimental studies of proteome-wide plant thermostability are limited by the complexity of plant proteomes: evaluating function for thousands of proteins across a variety of temperatures is simply not feasible with existing technologies. In this paper, we use homologous prokaryote sequences to predict plant Pfam temperature adaptation and gain insights into how thermostability varies across the proteome for three species: maize, Arabidopsis, and poplar. We find that patterns of Pfam domain adaptation across organelles are consistent and highly significant between species, with cytosolic proteins having the largest range of predicted Pfam stabilities and a long tail of highly-stable ribosomal proteins. Pfam adaptation in leaf and root organs varies between species, and maize root proteins have more low-temperature Pfam domains than do Arabidopsis or poplar root proteins. Both poplar and maize populations have an excess of low-temperature mutations in Pfam domains, but only the mutations identified in poplar accessions have a negative effect on Pfam temperature adaptation overall. These Pfam domain adaptation profiles provide insight into how different plant structures adapt to their surrounding environment and can help inform breeding or protein editing strategies to produce heat-tolerant crops.


2021 ◽  
Author(s):  
Sarah E Jensen ◽  
Lynn C Johnson ◽  
Terry Casstevens ◽  
Edward S. Buckler

Protein thermostability is important for fitness but difficult to measure across the proteome. Fortunately, protein thermostability is correlated with prokaryote optimal growth temperatures (OGTs), which can be predicted from genome features. Models that can predict temperature sensitivity across the prokaryote-eukaryote divide would help inform how eukaryotes adapt to elevated temperatures, such as those predicted by climate change models. In this study we test whether prediction models can cross the prokaryote-eukaryote divide to predict protein stability in both prokaryotes and eukaryotes. We compare models built using a) the whole proteome, b) Pfam domains, and c) individual amino acid residues. Proteome-wide models accurately predict prokaryote optimal growth temperatures (r² up to 0.93), while site-specific models demonstrate that nearly half of the proteome is associated with optimal growth temperature in both Archaea and Bacteria. Comparisons with the small number of eukaryotes with temperature sensitivity data suggest that site-specific models are the most transferable across the prokaryote-eukaryote divide. Using the site-specific models, we evaluated temperature sensitivity for 323,850 amino acid residues in 2,088 Pfam domain clusters in Archaea and Bacteria species separately. 59.0% of tested residues are significantly associated with OGT in Archaea and 75.2% of tested residues are significantly associated with OGT in Bacteria species at a 5% false discovery rate. These models make it possible to identify which Pfam domains and amino acid residues are involved in temperature adaptation and facilitate future research questions about how species will fare in the face of increasing environmental temperatures.
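The abstract reports significance "at a 5% false discovery rate" without naming the correction procedure. A widely used option is the Benjamini-Hochberg step-up procedure, sketched below under that assumption; the function name and the toy p-values are hypothetical and are not data from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag hypotheses rejected by the Benjamini-Hochberg procedure.

    Returns a list of booleans in the same order as `p_values`;
    True means the test is called significant at FDR level `alpha`.
    Assumed here for illustration; the study may have used a
    different FDR method.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # One toy p-value per tested residue.
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9]))
```

Because the procedure is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value meets a later threshold.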


F1000Research ◽  
2021 ◽  
Vol 9 ◽  
pp. 1395
Author(s):  
Shahram Mesdaghi ◽  
David L. Murphy ◽  
Filomeno Sánchez Rodríguez ◽  
J. Javier Burgos-Mármol ◽  
Daniel J. Rigden

Background: Recent strides in computational structural biology have opened up an opportunity to understand previously uncharacterised proteins. The under-representation of transmembrane proteins in the Protein Data Bank highlights the need to apply new and advanced bioinformatics methods to shed light on their structure and function. This study focuses on a family of transmembrane proteins containing the Pfam domain PF09335 ('SNARE_ASSOC'/'VTT'/'Tvp38'/'DedA'). One prominent member, Tmem41b, has been shown to be involved in early stages of autophagosome formation and is vital in mouse embryonic development as well as being identified as a viral host factor of SARS-CoV-2. Methods: We used evolutionary covariance-derived information to construct and validate ab initio models, make domain boundary predictions and infer local structural features. Results: The results from the structural bioinformatics analysis of Tmem41b and its homologues showed that they contain a tandem repeat that is clearly visible in evolutionary covariance data but much less so by sequence analysis. Furthermore, cross-referencing of other prediction data with covariance analysis showed that the internal repeat features two-fold rotational symmetry. Ab initio modelling of Tmem41b and homologues reinforces these structural predictions. Local structural features predicted to be present in Tmem41b were also present in Cl-/H+ antiporters. Conclusions: The results of this study strongly point to Tmem41b and its homologues being transporters for an as-yet uncharacterised substrate and possibly using H+ antiporter activity as their mechanism for transport.


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0246287
Author(s):  
Signe Tang Karlsen ◽  
Tammi Camilla Vesth ◽  
Gunnar Oregaard ◽  
Vera Kuzina Poulsen ◽  
Ole Lund ◽  
...  

Lactococcus lactis strains are important components in industrial starter cultures for cheese manufacturing. They have many strain-dependent properties, which affect the final product. Here, we explored the use of machine learning to create systematic, high-throughput screening methods for these properties. Fast acidification of milk is one such strain-dependent property. To predict the maximum hourly acidification rate (Vmax), we trained Random Forest (RF) models on four different genomic representations: presence/absence of gene families, counts of Pfam domains, the 8-nucleotide-long subsequences of their DNA (8-mers), and the 9-nucleotide-long subsequences of their DNA (9-mers). Vmax was measured at different temperatures, volumes, and in the presence or absence of yeast extract. These conditions were added as features in each RF model. The four models were trained on 257 strains, and the correlation between the measured Vmax and the predicted Vmax was evaluated with Pearson Correlation Coefficients (PC) on a separate dataset of 85 strains. The models all had high PC scores: 0.83 (gene presence/absence model), 0.84 (Pfam domain model), 0.76 (8-mer model), and 0.85 (9-mer model). The models all based their predictions on relevant genetic features and showed consensus on systems for lactose metabolism, degradation of casein, and pH stress response. Each model also predicted a set of features not found by the other models.
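The PC scores quoted above compare measured and predicted Vmax on held-out strains. The following minimal sketch shows how a Pearson correlation coefficient can be computed from two equal-length lists; the function name and the toy values are hypothetical, and the study's own evaluation code is not shown in the abstract.

```python
from math import sqrt


def pearson_correlation(measured, predicted):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    # Sum of co-deviations, and each sequence's sum of squared deviations.
    cov = sum((m - mean_m) * (p - mean_p)
              for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_m * var_p)


if __name__ == "__main__":
    # Toy Vmax values for five held-out strains (not from the study).
    measured = [0.42, 0.55, 0.31, 0.60, 0.48]
    predicted = [0.40, 0.50, 0.35, 0.58, 0.52]
    print(round(pearson_correlation(measured, predicted), 3))
```

A value near 1 means the predicted Vmax rises and falls with the measured Vmax across strains. It does not by itself show that the predictions are on the correct absolute scale.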

