Identification of Essential Protein Domains From High-density Transposon Insertion Sequencing

Author(s):  
A.S.M. Zisanur Rahman ◽  
Lukas Timmerman ◽  
Flyn Gallardo ◽  
Silvia T. Cardona

Abstract A first clue to gene function can be obtained by examining whether a gene is required for life in certain standard conditions, that is, whether a gene is essential. In bacteria, essential genes are usually identified by high-density transposon mutagenesis followed by sequencing of insertion sites (Tn-seq). These studies assign the term “essential” to whole genes rather than to the protein domain sequences that confer the essential functions. However, genes can code for multiple protein domains that evolve their functions independently. Therefore, when an essential gene codes for more than one protein domain, it may be that only one of them is essential. In this study, we defined this subset of genes as “essential domain-containing” (EDC) genes. Using a Tn-seq data set built in Burkholderia cenocepacia K56-2, we developed an in silico pipeline to identify EDC genes and the essential protein domains they encode. We found forty candidate EDC genes and demonstrated growth defect phenotypes using CRISPR interference (CRISPRi). This analysis included two knockdowns of genes encoding the protein domains of unknown function DUF2213 and DUF4148. These essential domains are conserved in more than two hundred bacterial species, including human and plant pathogens. Together, our study suggests that essentiality should be assigned to individual protein domains rather than genes, contributing to a first functional characterization of protein domains of unknown function.
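The idea of scoring essentiality per domain rather than per gene can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the coordinates, the insertion sites, and the density cutoff are all hypothetical, and a real analysis would need a statistical model of insertion frequency rather than a fixed threshold.

```python
from typing import Dict, List, Tuple


def insertions_per_kb(sites: List[int], start: int, end: int) -> float:
    """Transposon insertions per kilobase within [start, end] (1-based, inclusive)."""
    length = end - start + 1
    hits = sum(1 for s in sites if start <= s <= end)
    return 1000.0 * hits / length


def flag_candidate_domains(
    sites: List[int],
    domains: Dict[str, Tuple[int, int]],
    max_density: float,
) -> List[str]:
    """Return names of domains whose insertion density is at or below max_density."""
    return [
        name
        for name, (start, end) in domains.items()
        if insertions_per_kb(sites, start, end) <= max_density
    ]


# Hypothetical gene with two annotated domains (nucleotide coordinates).
domains = {"domain_A": (1, 300), "domain_B": (301, 900)}
# Hypothetical insertion sites: none fall in domain_A, six fall in domain_B.
sites = [350, 420, 515, 640, 700, 880]

# The cutoff of 1 insertion per kb is an arbitrary value chosen for illustration.
candidates = flag_candidate_domains(sites, domains, max_density=1.0)
print(candidates)
```

In this toy case only the insertion-free domain is flagged, which mirrors the situation the abstract describes: a gene that tolerates insertions in one domain but not in another.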

2019 ◽  
Author(s):  
Daniel Buchan ◽  
David Jones

Abstract In this paper, using word2vec, we demonstrate that protein domains may have semantic “meaning” in the context of multi-domain proteins. Word2vec is a group of models which can be used to produce semantically meaningful embeddings of words or tokens in a vector space. In this work we treat multi-domain proteins as “sentences” where domain identifiers are tokens which may be considered as “words”. Using all InterPro (Finn, Attwood et al. 2017) eukaryotic proteins as a corpus of “sentences” we demonstrate that Word2vec creates functionally meaningful embeddings of protein domains. We additionally show how this can be applied to identifying the putative functional roles for Pfam (Finn, Coggill et al. 2016) Domains of Unknown Function.
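To make the "proteins as sentences" framing concrete, the sketch below shows how domain architectures could be turned into the (target, context) pairs that a skip-gram word2vec model is trained on. It covers only corpus preparation, not training, and the domain tokens, the window size, and the function name are hypothetical placeholders rather than details taken from the paper.

```python
from typing import List, Tuple


def skipgram_pairs(sentence: List[str], window: int) -> List[Tuple[str, str]]:
    """List (target, context) pairs for every token and its neighbours within `window`."""
    pairs = []
    for i, target in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((target, sentence[j]))
    return pairs


# Hypothetical multi-domain proteins, each written as an ordered list of domain tokens.
proteins = [
    ["DOM_A", "DOM_B", "DOM_C"],
    ["DOM_B", "DOM_C"],
]

# Each protein is one "sentence"; pairs never cross protein boundaries.
corpus_pairs = [pair for protein in proteins for pair in skipgram_pairs(protein, window=1)]
print(corpus_pairs)
```

Domains that repeatedly share neighbours across many proteins end up with similar embeddings once a model is trained on such pairs, which is the property the abstract relies on when suggesting roles for Domains of Unknown Function.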


Proteomes ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 19
Author(s):  
Yoji Igarashi ◽  
Daisuke Mori ◽  
Susumu Mitsuyama ◽  
Kazutoshi Yoshitake ◽  
Hiroaki Ono ◽  
...  

Metagenomic data have mainly been addressed by showing the composition of organisms based on a small part of a well-examined genomic sequence, such as ribosomal RNA genes and mitochondrial DNAs. In contrast, whole metagenomic data obtained by the shotgun sequencing method have not often been fully analyzed through a homology search because the genomic data in databases for living organisms on earth are insufficient. In order to complement the results obtained through homology-search-based methods with shotgun metagenome data, we focused on the composition of protein domains deduced from the sequences of genomes and metagenomes, and we utilized them in characterizing genomes and metagenomes, respectively. First, we compared the relationships based on similarities in the protein domain composition with the relationships based on sequence similarities. Using the Pfam database, we searched for protein domains in 325 bacterial species. Next, the correlation coefficients of protein domain compositions between every pair of bacteria were examined. Every pairwise genetic distance was also calculated from 16S rRNA or DNA gyrase subunit B. We compared the results of these methods and found a moderate correlation between them. Essentially, the same results were obtained when we used partial random 100 bp DNA sequences of the bacterial genomes, which simulated raw sequence data obtained from short-read next-generation sequencing. Then, we applied the method to actual environmental data obtained by shotgun sequencing. The method showed that the transition of the microbial phase occurred because of the seasonal change in water temperature. These results showed the usability of the method in characterizing metagenomic data based on protein domain compositions.
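The pairwise comparison of domain compositions can be sketched as follows. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the domain labels, hit lists, and function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import List


def composition(domain_hits: List[str], vocabulary: List[str]) -> List[float]:
    """Relative frequency of each vocabulary domain among a genome's domain hits."""
    counts = Counter(domain_hits)
    total = sum(counts[d] for d in vocabulary)
    return [counts[d] / total for d in vocabulary]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; undefined (division by zero) for constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Hypothetical domain hits for two genomes (one entry per detected domain).
genome_a = ["D1", "D1", "D2", "D3"]
genome_b = ["D1", "D2", "D2", "D3"]

# A shared, ordered vocabulary keeps the two composition vectors aligned.
vocabulary = sorted(set(genome_a) | set(genome_b))
r = pearson(composition(genome_a, vocabulary), composition(genome_b, vocabulary))
print(round(r, 3))
```

Repeating this for every pair of genomes gives a matrix of composition similarities that could then be compared with pairwise genetic distances, as the abstract describes.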


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Wei He ◽  
Liang Zhang ◽  
Oscar D. Villarreal ◽  
Rongjie Fu ◽  
Ella Bedford ◽  
...  

Abstract High-throughput CRISPR-Cas9 knockout screens using a tiling-sgRNA design permit in situ evaluation of protein domain function. Here, to facilitate de novo identification of essential protein domains from such screens, we propose ProTiler, a computational method for the robust mapping of CRISPR knockout hyper-sensitive (CKHS) regions, which refer to the protein regions associated with a strong sgRNA dropout effect in the screens. Applied to a published CRISPR tiling screen dataset, ProTiler identifies 175 CKHS regions in 83 proteins. Of these CKHS regions, more than 80% overlap with annotated Pfam domains, including all of the 15 known drug targets in the dataset. ProTiler also reveals unannotated essential domains, including the N-terminus of the SWI/SNF subunit SMARCB1, which is validated experimentally. Surprisingly, the CKHS regions are negatively correlated with phosphorylation and acetylation sites, suggesting that protein domains and post-translational modification sites have distinct sensitivities to CRISPR-Cas9-mediated amino acid loss.


2019 ◽  
Author(s):  
Wei He ◽  
Liang Zhang ◽  
Oscar D. Villarreal ◽  
Rongjie Fu ◽  
Ella Bedford ◽  
...  

Abstract High-throughput CRISPR/Cas9 knockout screens using a tiling-sgRNA design permit in situ evaluation of protein domain function. To facilitate de novo identification of essential protein domains from such screens, we developed ProTiler, a computational method for the robust mapping of CRISPR knockout hyper-sensitive (CKHS) regions, which refer to the protein regions that are associated with a strong sgRNA dropout effect in the screens. We used ProTiler to analyze a published CRISPR tiling screen dataset, and identified 175 CKHS regions in 83 proteins. Of these CKHS regions, more than 80% overlapped with annotated Pfam domains, including all of the 15 known drug targets in the dataset. ProTiler also revealed unannotated essential domains, including the N-terminus of the SWI/SNF subunit SMARCB1, which we validated experimentally. Surprisingly, the CKHS regions were negatively correlated with phosphorylation and acetylation sites, suggesting that protein domains and post-translational modification sites have distinct sensitivities to CRISPR/Cas9-mediated amino acid loss.


2018 ◽  
Author(s):  
Stefania Daghino ◽  
Luigi Di Vietro ◽  
Luca Petiti ◽  
Elena Martino ◽  
Cristina Dallabona ◽  
...  

Abstract Protein domains are structurally and functionally distinct units responsible for particular protein functions or interactions. Although protein domains contribute to the overall protein function(s) and can be used for protein classification, about 20% of protein domains are currently annotated as “domains of unknown function” (DUFs). DUF 614, a cysteine-rich domain better known as PLAC8 (Placenta-Specific Gene 8), occurs in proteins found in the majority of Eukaryotes. PLAC8-containing proteins play important yet diverse roles in different organisms, such as control of cell proliferation in animals and plants or heavy metal resistance in plants and fungi. For example, Onzin from Mus musculus is a key regulator of cell proliferation, whereas FCR1 from the ascomycete Oidiodendron maius confers cadmium resistance. Onzin and FCR1 are small, single-domain PLAC8 proteins and we hypothesized that, despite their apparently different roles, a common molecular function of these proteins may be linked to the PLAC8 domain. To address this hypothesis, we compared these two PLAC8-containing proteins by heterologous expression in the PLAC8-free yeast Saccharomyces cerevisiae. When expressed in yeast, both Onzin and FCR1 improved cadmium resistance, reduced cadmium-induced DNA mutagenesis, localized in the nucleus and induced similar transcriptional changes. Our results support the hypothesis of a common ancestral function of the PLAC8 domain that may link some mitochondrial biosynthetic pathways (i.e. leucine biosynthesis and Fe-S cluster biogenesis) with the control of DNA damage, thus opening new perspectives to understand the role of this protein domain in the cellular biology of Eukaryotes.
Author Summary Protein domains are the functional units of proteins and typically have distinct structure and function. However, many widely distributed protein domains are currently annotated as “domains of unknown function” (DUFs). We have focused on DUF 614, a protein domain found in many Eukaryotes and better known as PLAC8 (Placenta-Specific Gene 8). The functional role of DUF 614 is unclear because PLAC8 proteins seem to play important yet different roles in taxonomically distant organisms such as animals, plants and fungi. We used S. cerevisiae to test whether these apparently different functions, namely in cell proliferation and metal tolerance, respectively reported for the murine Onzin and the fungal FCR1, are mediated by the same molecular mechanisms. Our data demonstrate that the two PLAC8 proteins induced the same growth phenotype and transcriptional changes in S. cerevisiae. In particular, they both induced the biosynthesis of the amino acid leucine and of the iron-sulfur cluster, one of the most ancient protein cofactors. These similarities support the hypothesis of an ancestral function of the DUF 614 domain, whereas the transcriptomic data open new perspectives to understand the role of PLAC8 proteins in Eukaryotes.


2020 ◽  
Vol 9 (2) ◽  
pp. 78-88
Author(s):  
Mulugeta Mulat ◽  
Raksha Anand ◽  
Fazlurrahman Khan

The diversity of indole concerning its production and functional role has increased in both prokaryotic and eukaryotic systems. Bacterial species produce indole and use it as a signaling molecule at the interspecies, intraspecies, and even interkingdom level to control drug resistance, virulence, and biofilm formation. Numerous indole derivatives have been found to play an important role in different systems and are reported to occur in various bacteria, plants, and human and plant pathogens. Indole and its derivatives have been recognized for a defensive role against pests and insects in the plant kingdom. These indole derivatives are produced as a result of the breakdown of glucosinolate products at the time of insect attack or physical damage. Apart from the defensive role of these products in plants, they also exhibit several other secondary responses that may contribute directly or indirectly to growth and development. The present review summarizes recent progress on the functional properties of indole and its derivatives in different plant systems. The molecular mechanism involved in the defensive role played by indole and its derivatives in plants is also explained. Furthermore, the perspectives of indole and its derivatives (natural or synthetic) in understanding the involvement of these compounds in diverse plants are also discussed.


2017 ◽  
Vol 727 ◽  
pp. 447-449 ◽  
Author(s):  
Jun Dai ◽  
Hua Yan ◽  
Jian Jian Yang ◽  
Jun Jun Guo

To evaluate the aging behavior of high density polyethylene (HDPE) under an artificial accelerated environment, principal component analysis (PCA) was used to establish a non-dimensional expression Z from a data set of multiple degradation parameters of HDPE. In this study, HDPE samples were exposed to the accelerated thermal oxidative environment for different time intervals up to 64 days. The results showed that the combined evaluating parameter Z was characterized by three-stage changes. The combined evaluating parameter Z increased quickly in the first 16 days of exposure and then leveled off. After 40 days, it began to increase again. Among the 10 degradation parameters, branching degree, carbonyl index and hydroxyl index are strongly associated. The tensile modulus is highly correlated with the impact strength. The tensile strength, tensile modulus and impact strength are negatively correlated with the crystallinity.


2016 ◽  
Vol 7 (1) ◽  
Author(s):  
Susan R. McCouch ◽  
Mark H. Wright ◽  
Chih-Wei Tung ◽  
Lyza G. Maron ◽  
Kenneth L. McNally ◽  
...  

Abstract Increasing food production is essential to meet the demands of a growing human population, with its rising income levels and nutritional expectations. To address the demand, plant breeders seek new sources of genetic variation to enhance the productivity, sustainability and resilience of crop varieties. Here we launch a high-resolution, open-access research platform to facilitate genome-wide association mapping in rice, a staple food crop. The platform provides an immortal collection of diverse germplasm, a high-density single-nucleotide polymorphism data set tailored for gene discovery, well-documented analytical strategies, and a suite of bioinformatics resources to facilitate biological interpretation. Using grain length, we demonstrate the power and resolution of our new high-density rice array, the accompanying genotypic data set, and an expanded diversity panel for detecting major and minor effect QTLs and subpopulation-specific alleles, with immediate implications for rice improvement.


Insects ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 86
Author(s):  
Xiaohui Yang ◽  
Yu Hui ◽  
Daohong Zhu ◽  
Yang Zeng ◽  
Lvquan Zhao ◽  
...  

Dryocosmus kuriphilus (Hymenoptera: Cynipidae) induces galls on chestnut trees, which results in massive yield losses worldwide. Torymus sinensis (Hymenoptera: Torymidae) is a host-specific parasitoid that phenologically synchronizes with D. kuriphilus. Bacteria play important roles in the life cycle of galling insects. The aim of this research is to investigate the bacterial communities and predominant bacteria of D. kuriphilus, T. sinensis, D. kuriphilus galls and the galled twigs of Castanea mollissima. We sequenced the V5–V7 region of the bacterial 16S ribosomal RNA in D. kuriphilus, T. sinensis, D. kuriphilus galls and galled twigs using high-throughput sequencing for the first time. We provide the first evidence that D. kuriphilus shares most bacterial species with T. sinensis, D. kuriphilus galls and galled twigs. The predominant bacteria of D. kuriphilus are Serratia sp. and Pseudomonas sp. Furthermore, the bacterial community structures of D. kuriphilus and T. sinensis clearly differ from those of the other groups. Many species of the Serratia and Pseudomonas genera are plant pathogenic bacteria, and we suggest that D. kuriphilus may be a potential vector of plant pathogens. Furthermore, a total of 111 bacteria are common to D. kuriphilus adults, T. sinensis, D. kuriphilus galls and galled twigs, and we suggest that the bacteria may transmit horizontally among D. kuriphilus, T. sinensis, D. kuriphilus galls and galled twigs on the basis of their ecological associations.


2017 ◽  
Author(s):  
Marcus M. Dillon ◽  
Way Sung ◽  
Michael Lynch ◽  
Vaughn S. Cooper

ABSTRACT The causes and consequences of spatiotemporal variation in mutation rates remain to be explored in nearly all organisms. Here we examine relationships between local mutation rates and replication timing in three bacterial species whose genomes have multiple chromosomes: Vibrio fischeri, Vibrio cholerae, and Burkholderia cenocepacia. Following five evolution experiments with these bacteria conducted in the near-absence of natural selection, the genomes of clones from each lineage were sequenced and analyzed to identify variation in mutation rates and spectra. In lineages lacking mismatch repair, base-substitution mutation rates vary in a mirrored wave-like pattern on opposing replichores of the large chromosome of V. fischeri and V. cholerae, where concurrently replicated regions experience similar base-substitution mutation rates. The base-substitution mutation rates on the small chromosome are less variable in both species but occur at similar rates as the concurrently replicated regions of the large chromosome. Neither nucleotide composition nor frequency of nucleotide motifs differed among regions experiencing high and low base-substitution rates, which along with the inferred ~800 Kb wave period suggests that the source of the periodicity is not sequence-specific but rather a systematic process related to the cell cycle. These results support the notion that base-substitution mutation rates are likely to vary systematically across many bacterial genomes, which exposes certain genes to elevated deleterious mutational load.

