Identification of residue inversions in large phylogenies of duplicated proteins

Mapping Intimacies ◽

10.1101/2021.11.04.467263 ◽

2021 ◽

Author(s):

Stefano Pascarelli ◽

Paola Laurino

Keyword(s):

Gene Duplication ◽

Protein Sequence ◽

Protein Function ◽

High Throughput Sequencing ◽

Functional Divergence ◽

Growth Factor Receptor ◽

Sequence Evolution ◽

Protein Database ◽

Sequencing Studies ◽

Homology Relationship

Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. Protein database annotation helps to bridge this gap; however, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify "residue inversions", i.e., sites where the relationship between the ancestry and the functional signal is decoupled. Residues in these sites play a role in functional divergence and could indicate a shift in protein function. We develop a method to recognize residue inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 positions that went through inversion after gene duplication, mostly located at the ligand-binding extracellular domain.
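The notion of a residue inversion can be illustrated with a toy example. The sketch below is not the authors' method, which operates on a phylogeny; it assumes a simplified setting in which every species carries exactly two paralogs (labelled A and B here) in one shared alignment. It flags a column when some species carries, in its A copy, the residue typical of clade B, and vice versa. The function names, the majority-vote consensus, and the four toy sequences are all hypothetical.

```python
from collections import Counter


def consensus(residues):
    """Most common residue in an iterable of single characters."""
    return Counter(residues).most_common(1)[0][0]


def inverted_sites(copies_a, copies_b):
    """Return [(column, [species...])] for columns where a species' two
    paralogs carry each other's clade-typical residue.

    copies_a / copies_b map species name -> aligned sequence (equal lengths).
    This is an illustrative heuristic, not the published algorithm.
    """
    species = sorted(set(copies_a) & set(copies_b))
    length = len(copies_a[species[0]])
    hits = []
    for col in range(length):
        cons_a = consensus(copies_a[s][col] for s in species)
        cons_b = consensus(copies_b[s][col] for s in species)
        if cons_a == cons_b:
            continue  # the paralog clades do not differ at this site
        swapped = [
            s for s in species
            if copies_a[s][col] == cons_b and copies_b[s][col] == cons_a
        ]
        if swapped:
            hits.append((col, swapped))
    return hits


# Hypothetical toy alignment: column 1 is swapped in sp4 only.
copies_a = {"sp1": "MKLV", "sp2": "MKLV", "sp3": "MKLV", "sp4": "MRLV"}
copies_b = {"sp1": "MRLI", "sp2": "MRLI", "sp3": "MRLI", "sp4": "MKLI"}
print(inverted_sites(copies_a, copies_b))
```

Column 3 differs between the clades (V versus I) but is not flagged, because no species swaps the two residues; only a within-species swap counts as an inversion in this toy definition.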


Evidence for an episodic model of protein sequence evolution

Biochemical Society Transactions ◽

10.1042/bst0370783 ◽

2009 ◽

Vol 37 (4) ◽

pp. 783-786 ◽

Cited By ~ 12

Author(s):

Romain A. Studer ◽

Marc Robinson-Rechavi

Keyword(s):

Amino Acid ◽

Protein Sequence ◽

Protein Function ◽

Rapid Change ◽

Sequence Evolution ◽

Functional Changes ◽

Codon Models ◽

Episodic Evolution ◽

Protein Sequence Evolution ◽

Amino Acid Conservation

The evolution of protein function appears to involve alternating periods of conservative evolution and of relatively rapid change. Evidence for such episodic evolution, consistent with some theoretical expectations, comes from the application of increasingly sophisticated models of evolution to large sequence datasets. We present here some of the recent methods to detect functional shifts, using amino acid or codon models. Both provide evidence for punctual shifts in patterns of amino acid conservation, including the fixation of key changes by positive selection. Although a link to gene duplication, a presumed source of functional changes, has been difficult to establish, this episodic model appears to apply to a wide variety of proteins and organisms.


A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast

Genome Research ◽

10.1101/gr.6341207 ◽

2007 ◽

Vol 18 (1) ◽

pp. 137-147 ◽

Cited By ~ 74

Author(s):

D. R. Scannell ◽

K. H. Wolfe

Keyword(s):

Gene Duplication ◽

Protein Sequence ◽

Sequence Evolution ◽

Follow Gene Duplication ◽

Protein Sequence Evolution


A Computational Strategy for Protein Function Assignment which Addresses the Multidomain Problem

Comparative and Functional Genomics ◽

10.1002/cfg.208 ◽

2002 ◽

Vol 3 (5) ◽

pp. 423-440 ◽

Cited By ~ 6

Author(s):

A. J. Pérez ◽

A. Rodríguez ◽

O. Trelles ◽

G. Thode

Keyword(s):

Protein Function ◽

Sequence Similarity ◽

Protein Database ◽

Specific Sequence ◽

Functional Annotations ◽

Cluster Set ◽

Sequence Position ◽

Computational Strategy ◽

Function Assignment

A method for assigning functions to unknown sequences based on finding correlations between short signals and functional annotations in a protein database is presented. This approach is based on keyword (KW) and feature (FT) information stored in the SWISS-PROT database. The former refers to particular protein characteristics and the latter locates these characteristics at a specific sequence position. In this way, a certain keyword is only assigned to a sequence if sequence similarity is found in the position described by the FT field. Exhaustive tests performed over sequences with homologues (cluster set) and without homologues (singleton set) in the database show that assigning functions is much ‘cleaner’ when information about domains (FT field) is used than when only the keywords are used.
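The position-gated transfer rule described above can be sketched as follows. This is a minimal illustration, not the published implementation: it assumes each keyword has already been linked to one feature span on the annotated protein, and that the similarity hit is given as a start/end range on that same protein. The `Feature` class, the `transferable_keywords` function, and the `min_coverage` threshold are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    keyword: str  # keyword (KW) associated with this feature (assumed mapping)
    start: int    # 1-based, inclusive position on the annotated protein
    end: int      # 1-based, inclusive


def transferable_keywords(features, hit_start, hit_end, min_coverage=1.0):
    """Keywords whose feature span is covered by the similarity hit.

    min_coverage is the fraction of the feature that must fall inside the
    hit; 1.0 (an assumed default) requires the whole feature to be covered.
    """
    keywords = []
    for feature in features:
        overlap = min(feature.end, hit_end) - max(feature.start, hit_start) + 1
        covered = max(overlap, 0) / (feature.end - feature.start + 1)
        if covered >= min_coverage:
            keywords.append(feature.keyword)
    return keywords


# Hypothetical annotated protein with two features.
features = [Feature("Zinc-finger", 10, 40), Feature("Transmembrane", 200, 220)]
print(transferable_keywords(features, 5, 60))   # hit covers only the first feature
print(transferable_keywords(features, 5, 30))   # hit covers it only partially
```

A keyword-only strategy would transfer both keywords for any hit to this protein; gating on the feature span is what restricts the transfer to the region that is actually similar.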


First report of Cherry virus F infecting Japanese plum in Korea

Plant Disease ◽

10.1094/pdis-08-20-1725-pdn ◽

2020 ◽

Author(s):

Yeonhwa Jo ◽

Hoseong Choi ◽

Jin Kyong Cho ◽

Won Kyong Cho

Keyword(s):

Czech Republic ◽

High Throughput Sequencing ◽

Prunus Avium ◽

De Novo ◽

Protein Database ◽

Specific Primers ◽

The Czech Republic ◽

First Report ◽

Japanese Plum ◽

Leaf Spots

Cherry virus F (CVF) is a tentative member of the genus Fabavirus in the family Secoviridae, whose genome consists of two RNA segments (Koloniuk et al. 2018). To date, CVF has been documented only in sweet cherry (Prunus avium) in the Czech Republic (Koloniuk et al. 2018), Canada, and Greece. In May 2014, we collected leaf samples from four symptomatic (leaf spots and dapple fruits) and two asymptomatic Japanese plum cultivars (Sun and Gadam) grown in an orchard in Hoengseong, South Korea, to identify viruses and viroids infecting plum trees. Total RNA from individual plum trees was extracted using two commercial kits: Fruit-mate for RNA Purification Kit (Takara, Shiga, Japan) and RNeasy Plant Mini Kit (Qiagen, Hilden, Germany). We generated six mRNA libraries from the six different plum cultivars for RNA-sequencing using the TruSeq RNA Library Preparation Kit v2 (Illumina, CA, U.S.A.) as described previously (Jo et al. 2017). The mRNA libraries were paired-end (2 × 100 bp) sequenced with a HiSeq 2000 system (Macrogen, Seoul, Korea). The raw sequence reads were de novo assembled with the Trinity program v. 2.8.6 using default parameters (Haas et al. 2013). The assembled contigs were subjected to BLASTX search against the non-redundant protein database in NCBI. Of the two asymptomatic cultivars, the transcriptome of asymptomatic plum cv. Gadam contained five contigs specific to CVF. Two and three contigs were specific to CVF RNA1 (2,571 reads, coverage 42.15%) and RNA2 (2,025 reads, coverage 53.04%), respectively. The size of these five contigs ranged from 241 to 5,986 bp. Contigs of 5,986 and 3,867 bp in length, referred to as CVF isolate Gadam RNA1 (GenBank MN896996) and RNA2 (GenBank MN896995), respectively, were subjected to BLASTP search against NCBI’s non-redundant protein database. The results showed that the polyprotein sequences of RNA1 and RNA2 shared 95.3% and 93.11% amino acid identities with isolates SwC-H_1a from the Czech Republic (GenBank acc. no. AWB36326) and Stac-3B_c8 from Canada (AZZ10055), respectively. To confirm the infection of CVF in cv. Gadam, RT-PCR was conducted using CVF RNA1-specific primers designed based on the CVF reference genome sequences (MH998210 and MH998216), including 5’-CCACCAAATAGGCAAGAGGTCAC-3’ (position 3190–3212) and 5’-CACAATCACCATCAATGGTCTCTGC-3’ (position 3742–3766), and CVF RNA2-specific primers, including 5’-CTGCTTTATGATGCTAGACATCAAGATG-3’ (position 1015–1042) and 5’-ACAATAGGCATGCTCATCTCAACCTC-3’ (position 1594–1619). We amplified 577-bp RNA1-specific and 605-bp RNA2-specific amplicons, which were cloned and Sanger sequenced. Sequencing of the cloned amplicons for isolate Gadam RNA1 (GenBank MN896993) and RNA2 (GenBank MN896994) revealed 99.48% and 99.17% nucleotide identity to the RNA1 and RNA2 sequences determined by high-throughput sequencing, respectively. Additionally, we tested five plants for each of the six plum cultivars grown in the same orchard. The detection of CVF was carried out through PCR using the primers and protocol described above. Of the 30 trees, CVF was detected in three trees of cv. Gadam by both primer pairs. To our knowledge, this is the first report of CVF infecting Japanese plum and the first report of the virus in Korea. However, its prevalence in other Prunus species, including apricot, European plum, and peach, should be further elucidated.
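The reported amplicon sizes follow directly from the primer coordinates given in the abstract. The short check below is only a consistency sketch: the primer sequences and positions are copied from the text, while the dictionary layout and helper names are hypothetical. It assumes the positions are 1-based and inclusive on the reference segment, so an amplicon spans from the first base of the forward primer to the last base of the reverse primer.

```python
# Primer sequence, start, end (1-based inclusive), as quoted in the abstract.
primers = {
    "RNA1": {
        "forward": ("CCACCAAATAGGCAAGAGGTCAC", 3190, 3212),
        "reverse": ("CACAATCACCATCAATGGTCTCTGC", 3742, 3766),
    },
    "RNA2": {
        "forward": ("CTGCTTTATGATGCTAGACATCAAGATG", 1015, 1042),
        "reverse": ("ACAATAGGCATGCTCATCTCAACCTC", 1594, 1619),
    },
}


def amplicon_length(pair):
    """Length from the forward primer start to the reverse primer end."""
    return pair["reverse"][2] - pair["forward"][1] + 1


def primer_lengths_match(pair):
    """True if each primer's length equals the span of its stated positions."""
    return all(len(seq) == end - start + 1 for seq, start, end in pair.values())


for segment, pair in primers.items():
    print(segment, amplicon_length(pair), primer_lengths_match(pair))
```

Under that inclusive-coordinate assumption, the computed lengths agree with the 577-bp and 605-bp amplicons stated in the text.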


Assessing and Maximizing Cultivated Diversity with Plate-Wash PCR and High Throughput Sequencing

10.1101/2020.11.19.390864 ◽

2020 ◽

Author(s):

Emily N. Junkins ◽

Bradley S. Stevenson

Keyword(s):

High Throughput ◽

Drug Targets ◽

High Throughput Sequencing ◽

Multidrug Resistant ◽

Molecular Techniques ◽

Bioactive Metabolites ◽

Plate Count ◽

Molecular Tools ◽

Vast Number ◽

Sequencing Studies

Abstract: Molecular techniques continue to reveal a growing disparity between the immense diversity of microbial life and the small proportion that is in pure culture. The disparity, originally dubbed “the great plate count anomaly” by Staley and Konopka, has become even more vexing given our increased understanding of the importance of microbiomes to a host and the role of microorganisms in the vital biogeochemical functions of our biosphere. Searching for novel antimicrobial drug targets often focuses on screening a broad diversity of microorganisms. If diverse microorganisms are to be screened, they need to be cultivated. Recent innovative research has used molecular techniques to assess the efficacy of cultivation efforts, providing invaluable feedback to cultivation strategies for isolating targeted and/or novel microorganisms. Here, we aimed to determine the efficiency of cultivating representative microorganisms from a non-human, mammalian microbiome, identify those microorganisms, and determine the bioactivity of isolates. Molecular methods indicated that around 57% of the ASVs detected in the original inoculum were cultivated in our experiments, but nearly 53% of the total ASVs that were present in our cultivation experiments were not detected in the original inoculum. In light of our controls, our data suggest that when molecular tools were used to characterize our cultivation efforts, they provided a more complete, albeit more complex, understanding of which organisms were present compared to what was eventually cultivated. Lastly, about 3% of the isolates collected from our cultivation experiments showed inhibitory bioactivity against a multidrug-resistant pathogen panel, further highlighting the importance of informing and directing future cultivation efforts with molecular tools.

Importance: Cultivation is the definitive tool to understand a microorganism’s physiology, metabolism, and ecological role(s). Despite continuous efforts to hone this skill, researchers are still observing yet-to-be cultivated organisms through high-throughput sequencing studies. Here, we use the very same tool that highlights biodiversity to assess cultivation efficiency. When applied to drug discovery, where screening a vast number of isolates for bioactive metabolites is common, cultivating redundant organisms is a hindrance. However, we observed that cultivating in combination with molecular tools can expand the observed diversity of an environment and its community, potentially increasing the number of microorganisms to be screened for natural products.
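The two percentages quoted in the abstract use different denominators: one is a share of the ASVs in the inoculum, the other a share of the ASVs seen in cultivation. The sketch below makes that distinction explicit with plain set arithmetic. It does not reproduce the study's data or pipeline; the function name and the toy ASV identifiers are hypothetical.

```python
def cultivation_overlap(inoculum, cultivated):
    """Return (recovered, novel) as fractions.

    recovered: share of inoculum ASVs that also appear in cultivation.
    novel:     share of cultivated ASVs that were not seen in the inoculum.
    """
    recovered = len(inoculum & cultivated) / len(inoculum)
    novel = len(cultivated - inoculum) / len(cultivated)
    return recovered, novel


# Hypothetical toy ASV sets, chosen only to show the two denominators.
inoculum = {"ASV1", "ASV2", "ASV3", "ASV4"}
cultivated = {"ASV1", "ASV2", "ASV3", "ASV7", "ASV8", "ASV9"}

recovered, novel = cultivation_overlap(inoculum, cultivated)
print(f"recovered from inoculum: {recovered:.0%}")
print(f"cultivated but not in inoculum: {novel:.0%}")
```

Because the denominators differ, the two fractions need not sum to 100%, which is why figures such as 57% and 53% can both hold for the same experiment.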


Gli proteins encode context-dependent positive and negative functions: implications for development and disease

Development ◽

10.1242/dev.126.14.3205 ◽

1999 ◽

Vol 126 (14) ◽

pp. 3205-3216 ◽

Cited By ~ 14

Author(s):

A. Ruiz i Altaba

Keyword(s):

Nuclear Localization ◽

Protein Function ◽

Hedgehog Signaling ◽

Functional Divergence ◽

Dominant Negative ◽

Full Length ◽

Gli Proteins ◽

A Cell ◽

Context Dependent ◽

Modified Forms

Several lines of evidence implicate zinc finger proteins of the Gli family in the final steps of Hedgehog signaling in normal development and disease. C-terminally truncated mutant GLI3 proteins are also associated with human syndromes, but it is not clear whether these C-terminally truncated Gli proteins fulfil the same function as full-length ones. Here, structure-function analyses of Gli proteins have been performed using floor plate and neuronal induction assays in frog embryos, as well as induction of alkaline phosphatase (AP) in SHH-responsive mouse C3H10T1/2 (10T1/2) cells. These assays show that C-terminal sequences are required for positive inducing activity and cytoplasmic localization, whereas N-terminal sequences determine dominant negative function and nuclear localization. Analyses of nuclear targeted Gli1 and Gli2 proteins suggest that both activator and dominant negative proteins are modified forms. In embryos and COS cells, tagged Gli cDNAs yield C-terminally deleted forms similar to that of Ci. These results thus provide a molecular basis for the human Polydactyly type A and Pallister-Hall Syndrome phenotypes, derived from the deregulated production of C-terminally truncated GLI3 proteins. Analyses of full-length Gli function in 10T1/2 cells suggest that nuclear localization of activating forms is a regulated event and show that only Gli1 mimics SHH in inducing AP activity. Moreover, full-length Gli3 and all C-terminally truncated forms act antagonistically whereas Gli2 is inactive in this assay. In 10T1/2 cells, protein kinase A (PKA), a known inhibitor of Hh signaling, promotes Gli3 repressor formation and inhibits Gli1 function. Together, these findings suggest a context-dependent functional divergence of Gli protein function, in which a cell that represses Gli3 and activates Gli1/2 prevents the formation of repressor Gli forms to respond to Shh. Interpretation of Hh signals by Gli proteins therefore appears to involve a fine balance of divergent functions within each and among different Gli proteins, the misregulation of which has profound biological consequences.


Identifying Biomarkers to Pair with Targeting Treatments within Triple Negative Breast Cancer for Improved Patient Stratification

Cancers ◽

10.3390/cancers11121864 ◽

2019 ◽

Vol 11 (12) ◽

pp. 1864 ◽

Cited By ~ 2

Author(s):

Holly Tovey ◽

Maggie Chon U. Cheang

Keyword(s):

Breast Cancer ◽

Triple Negative Breast Cancer ◽

High Throughput Sequencing ◽

Triple Negative ◽

Treatment Options ◽

Unmet Need ◽

Growth Factor Receptor ◽

Parp Inhibitors ◽

Genetic Features ◽

The Uk

The concept of precision medicine has been around for many years, and recent advances in high-throughput sequencing techniques are enabling it to become reality. Within the field of breast cancer, a number of signatures have been developed to molecularly sub-classify tumours. Notable examples recently approved by the National Institute for Health and Care Excellence in the UK to guide treatment decisions for oestrogen receptor-positive (ER+), human epidermal growth factor receptor 2-negative (HER2-) patients include the Prosigna® test, EndoPredict®, and Oncotype DX®. However, patients with triple negative breast cancer (TNBC) remain a population with unmet need. Accounting for 15–20% of patients, this population has a comparatively poor prognosis and as yet no targeted treatment options. Studies have shown that some patients with TNBC respond favourably to DNA damaging drugs (carboplatin) or agents which inhibit DNA damage response (poly ADP ribose polymerase (PARP) inhibitors). Because TNBC is known to be a heterogeneous population, there is a need to identify further TNBC patients who may benefit from these treatments. A number of signatures have been identified based on association with treatment response or specific genetic features/pathways; however, many of these were not restricted to TNBC patients and are not yet common practice in the clinic.
