Origins and structural properties of novel and de novo protein domains during insect evolution

FEBS Journal ◽  
2018 ◽  
Vol 285 (14) ◽  
pp. 2605-2625 ◽  
Author(s):  
Steffen Klasberg ◽  
Tristan Bitard‐Feildel ◽  
Isabelle Callebaut ◽  
Erich Bornberg‐Bauer
2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Wei He ◽  
Liang Zhang ◽  
Oscar D. Villarreal ◽  
Rongjie Fu ◽  
Ella Bedford ◽  
...  

2016 ◽  
Author(s):  
Walter Basile ◽  
Oxana Sachenkova ◽  
Sara Light ◽  
Arne Elofsson

Abstract De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. De novo created proteins need, at the bare minimum, not to cause serious harm to the organism, meaning, for instance, that they should not cause aggregation. Therefore, although the creation of the short ORFs could be truly random, their fixation should be subject to some selective pressure. The selective forces acting on de novo created proteins have been elusive, and contradictory results have been reported. In Drosophila they are more disordered, i.e. enriched in polar residues, than ancient proteins, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied the structural properties and age of all proteins in 187 eukaryotic species. We find that, on average, there are only small differences between proteins of different ages, with the exception that younger proteins are shorter. However, when we take GC content into account, we find that it can explain the opposite trends observed in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder-promoting amino acids, and inversely correlated with transmembrane-, helix- and sheet-promoting residues. We find that for the youngest proteins, i.e. the ones most likely to be de novo created, there is a strong correlation between GC content and structural properties. In contrast, this strong relationship is not seen for ancient proteins. This leads us to propose that structural features are not a strong determining factor for the fixation of de novo created genes. Instead, these proteins resemble random proteins given a particular GC level. The dependency on GC content is then gradually weakened during evolution.

Author Summary We show that the GC content of a genomic region is of great importance for the properties of a protein-coding de novo created gene. The GC content affects the frequency of the codons, and this in turn affects the probability of each amino acid being included in a de novo created protein. The codons encoding Ala, Pro and Gly contain 80% GC, while the codons for Lys, Phe, Asn, Tyr and Ile contain 20% or less. Pro and Gly are disorder-promoting, while Phe, Tyr and Ile are order-promoting. Therefore, random protein sequences created at high GC will be more disordered than those created at low GC. The structural properties of the youngest (orphan) proteins match, to a large degree, the properties of random proteins when the GC content is taken into account. In contrast, the structural properties of ancient proteins show only a weak correlation with GC content. This suggests that, even after fixation, de novo created proteins largely resemble random proteins given a certain GC content. Thereafter, during evolution, the correlation between structural properties and GC weakens.
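The codon arithmetic behind the Author Summary can be checked with a short sketch. It is an illustration only and not the authors' pipeline. The function names are hypothetical, and the random-ORF model (each base drawn independently, with G and C equally likely and A and T equally likely) is a simplifying assumption introduced here. Under the standard genetic code, the mean codon GC fraction comes out at about 83% for Ala, Pro and Gly, about 17% for Lys, Phe, Asn and Tyr, and about 11% for Ile, in line with the rounded "80%" and "20% or less" quoted above.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_gc_fraction(codon):
    """Fraction of the three codon positions occupied by G or C."""
    return sum(base in "GC" for base in codon) / 3


def mean_codon_gc_by_amino_acid():
    """Unweighted mean GC fraction over the synonymous codons of each amino acid."""
    per_aa = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # skip stop codons
            continue
        per_aa.setdefault(aa, []).append(codon_gc_fraction(codon))
    return {aa: sum(values) / len(values) for aa, values in per_aa.items()}


def expected_composition(gc):
    """Amino acid frequencies of random codons at a given GC level.

    Assumption (not from the source): bases are independent, with
    P(G) = P(C) = gc / 2 and P(A) = P(T) = (1 - gc) / 2. Stop codons are
    dropped and the remaining probabilities are renormalized.
    """
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    freq = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        p = 1.0
        for base in codon:
            p *= base_prob[base]
        freq[aa] = freq.get(aa, 0.0) + p
    total = sum(freq.values())
    return {aa: p / total for aa, p in freq.items()}


if __name__ == "__main__":
    gc_by_aa = mean_codon_gc_by_amino_acid()
    for aa in "APGKFNYI":
        print(aa, round(gc_by_aa[aa], 2))
    low, high = expected_composition(0.3), expected_composition(0.7)
    for aa in "PGFYI":
        print(aa, round(low[aa], 3), round(high[aa], 3))
```

Under this simplified model, the disorder-promoting residues Pro and Gly become more frequent as GC rises, while the order-promoting residues Phe, Tyr and Ile become less frequent, which is the direction of the effect the summary describes.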


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Wei He ◽  
Liang Zhang ◽  
Oscar D. Villarreal ◽  
Rongjie Fu ◽  
Ella Bedford ◽  
...  

Abstract High-throughput CRISPR-Cas9 knockout screens using a tiling-sgRNA design permit in situ evaluation of protein domain function. Here, to facilitate de novo identification of essential protein domains from such screens, we propose ProTiler, a computational method for the robust mapping of CRISPR knockout hyper-sensitive (CKHS) regions, which refer to the protein regions associated with a strong sgRNA dropout effect in the screens. Applied to a published CRISPR tiling screen dataset, ProTiler identifies 175 CKHS regions in 83 proteins. Of these CKHS regions, more than 80% overlap with annotated Pfam domains, including all of the 15 known drug targets in the dataset. ProTiler also reveals unannotated essential domains, including the N-terminus of the SWI/SNF subunit SMARCB1, which is validated experimentally. Surprisingly, the CKHS regions are negatively correlated with phosphorylation and acetylation sites, suggesting that protein domains and post-translational modification sites have distinct sensitivities to CRISPR-Cas9-mediated amino acid loss.


2019 ◽  
Author(s):  
Wei He ◽  
Liang Zhang ◽  
Oscar D. Villarreal ◽  
Rongjie Fu ◽  
Ella Bedford ◽  
...  

Abstract High-throughput CRISPR/Cas9 knockout screens using a tiling-sgRNA design permit in situ evaluation of protein domain function. To facilitate de novo identification of essential protein domains from such screens, we developed ProTiler, a computational method for the robust mapping of CRISPR knockout hyper-sensitive (CKHS) regions, which refer to the protein regions that are associated with a strong sgRNA dropout effect in the screens. We used ProTiler to analyze a published CRISPR tiling screen dataset and identified 175 CKHS regions in 83 proteins. Of these CKHS regions, more than 80% overlapped with annotated Pfam domains, including all of the 15 known drug targets in the dataset. ProTiler also revealed unannotated essential domains, including the N-terminus of the SWI/SNF subunit SMARCB1, which we validated experimentally. Surprisingly, the CKHS regions were negatively correlated with phosphorylation and acetylation sites, suggesting that protein domains and post-translational modification sites have distinct sensitivities to CRISPR/Cas9-mediated amino acid loss.

