Integrated synteny- and similarity-based inference on the polyploidization–fractionation cycle

2021 ◽  
Vol 11 (4) ◽  
pp. 20200059 ◽  
Author(s):  
Yue Zhang ◽  
Zhe Yu ◽  
Chunfang Zheng ◽  
David Sankoff

Whole-genome doubling, tripling or replicating to a greater degree, due to fixation of polyploidization events, is attested in almost all lineages of the flowering plants, recurring in the ancestry of some plants two, three or more times in retracing their history to the earliest angiosperm. This major mechanism in plant genome evolution, which generally appears as instantaneous on the evolutionary time scale, sets in operation a compensatory process called fractionation, the loss of duplicate genes, initially rapid, but continuing at a diminishing rate over millions and tens of millions of years. We study this process by statistically comparing the distribution of duplicate gene pairs as a function of their time of creation through polyploidization, as measured by sequence similarity. The stochastic model that accounts for this distribution, though exceedingly simple, still has too many parameters to be estimated based only on the similarity distribution, while the computational procedures for compiling the distribution from annotated genomic data are heavily biased against earlier polyploidization events (syntenic ‘crumble’). Other parameters, such as the size of the initial gene complement and the ploidy of the various events giving rise to duplicate gene pairs, are even more inaccessible to estimation. Here, we show how the frequency of unpaired genes, identified via their embedding in stretches of duplicate pairs, together with previously established constraints among some parameters, adds enormously to the range of successive polyploidization events that can be analysed. This also allows us to estimate the initial gene complement and to correct for the bias due to crumble. We explore the applicability of our methodology to four flowering plant genomes covering a range of different polyploidization histories.

2015 ◽  
Vol 16 (S17) ◽  
Author(s):  
David Sankoff ◽  
Chunfang Zheng ◽  
Baoyong Wang ◽  
Carlos Fernando Buen Abad Najar

Plants ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 1004
Author(s):  
Salvatore Esposito ◽  
Riccardo Aversano ◽  
Pasquale Tripodi ◽  
Domenico Carputo

Whole-genome doubling (polyploidy) is common in angiosperms. Several studies have indicated that it is often associated with molecular, physiological, and phenotypic changes. Mounting evidence has pointed out that micro-RNAs (miRNAs) may have an important role in whole-genome doubling. However, an integrative approach that compares miRNA expression in polyploids is still lacking. Here, a re-analysis of already published RNAseq datasets was performed to identify microRNAs’ precursors (pre-miRNAs) in diploids (2x) and tetraploids (4x) of five species (Arabidopsis thaliana L., Morus alba L., Brassica rapa L., Isatis indigotica Fort., and Solanum commersonii Dun). We found 3568 pre-miRNAs, three of which (pre-miR414, pre-miR5538, and pre-miR5141) were abundant in all 2x, and were absent/low in their 4x counterparts. They are predicted to target more than one mRNA transcript, many belonging to transcription factors (TFs), DNA repair mechanisms, and related to stress. Sixteen pre-miRNAs were found in common in all 2x and 4x. Among them, pre-miRNA482, pre-miRNA2916, and pre-miRNA167 changed their expression after polyploidization, being induced or repressed in 4x plants. Based on our results, a common ploidy-dependent response was triggered in all species under investigation, which involves DNA repair, ATP-synthesis, terpenoid biosynthesis, and several stress-responsive transcripts. In addition, an ad hoc pre-miRNA expression analysis carried out solely on 2x vs. 4x samples of S. commersonii indicated that ploidy-dependent pre-miRNAs seem to actively regulate the nucleotide metabolism, probably to cope with the increased requirement for DNA building blocks caused by the augmented DNA content. Overall, the results outline the critical role of microRNA-mediated responses following autopolyploidization in plants.


2021 ◽  
Vol 5 (1) ◽  
Author(s):  
Raul Caso ◽  
James G. Connolly ◽  
Jian Zhou ◽  
Kay See Tan ◽  
James J. Choi ◽  
...  

While next-generation sequencing (NGS) is used to guide therapy in patients with metastatic lung adenocarcinoma (LUAD), use of NGS to determine pathologic LN metastasis prior to surgery has not been assessed. To bridge this knowledge gap, we performed NGS using MSK-IMPACT in 426 treatment-naive patients with clinical N2-negative LUAD. A multivariable logistic regression model that considered preoperative clinical and genomic variables was constructed. Most patients had cN0 disease (85%) with pN0, pN1, and pN2 rates of 80%, 11%, and 9%, respectively. Genes altered at higher rates in pN-positive than in pN-negative tumors were STK11 (p = 0.024), SMARCA4 (p = 0.006), and SMAD4 (p = 0.011). Fraction of genome altered (p = 0.037), copy number amplifications (p = 0.001), and whole-genome doubling (p = 0.028) were higher in pN-positive tumors. Multivariable analysis revealed solid tumor morphology, tumor SUVmax, clinical stage, SMARCA4 and SMAD4 alterations were independently associated with pathologic LN metastasis. Incorporation of clinical and tumor genomic features can identify patients at risk of pathologic LN metastasis; this may guide therapy decisions before surgical resection.


Author(s):  
Jeong Eun Kim ◽  
Jaeyong Choi ◽  
Chang-Ohk Sung ◽  
Yong Sang Hong ◽  
Sun Young Kim ◽  
...  

The global incidence of early-onset colorectal cancer (EO-CRC) is rapidly rising. However, the reason for this rise in incidence as well as the genomic characteristics of EO-CRC remain largely unknown. We performed whole-exome sequencing in 47 cases of EO-CRC and targeted deep sequencing in 833 cases of CRC. Mutational profiles of EO-CRC were compared with previously published large-scale studies. EO-CRC and The Cancer Genome Atlas (TCGA) data were further investigated according to copy number profiles and mutation timing. We classified colorectal cancer into three subgroups: the hypermutated group consisted of mutations in POLE and mismatch repair genes; the whole-genome doubling group had early functional loss of TP53 that led to whole-genome doubling and focal oncogene amplification; the genome-stable group had mutations in APC and KRAS, similar to conventional colon cancer. Among non-hypermutated samples, whole-genome doubling was more prevalent in early-onset than in late-onset disease (54% vs 38%, Fisher’s exact P = 0.04). More than half of non-hypermutated EO-CRC cases involved early TP53 mutation and whole-genome doubling, which led to notable differences in mutation frequencies between age groups. Alternative carcinogenesis involving genomic instability via loss of TP53 may be related to the rise in EO-CRC.


2000 ◽  
Vol 15 (1) ◽  
pp. 26-32 ◽  
Author(s):  
M. Cattaneo ◽  
R. Orlandi ◽  
C. Ronchini ◽  
P. Granelli ◽  
G. Malferrari ◽  
...  

We have previously reported on the isolation and chromosomal mapping of a novel human gene (SEL1L), which shows sequence similarity to sel-1, an extragenic suppressor of C. elegans. sel-1 functions as a negative regulator of lin-12 activity, the latter being implicated in the control of diverse cellular differentiation events. In the present study we compare the expression patterns of SEL1L and TAN-1, the human ortholog of lin-12 in normal and neoplastic cells. We found that, whereas both genes are expressed in fetal tissues at similar levels, they are differentially expressed in normal adult and neoplastic cells. In normal adult cells SEL1L is generally present at very low levels; only in the cells of the pancreas does it show maximum expression. By contrast, SEL1L is generally well represented in most neoplastic cells but not in those of pancreatic and gastric carcinomas, where transcription is either downregulated or completely repressed. TAN-1 on the other hand is well represented in almost all normal and neoplastic cells, with very few exceptions. Our observations suggest that SEL1L is presumably implicated in pancreatic and gastric carcinogenesis and that, along with TAN-1, it is very important for normal cell function. Alterations in the expression of SEL1L may be used as a prognostic marker for gastric and pancreatic cancers.


2020 ◽  
Author(s):  
Jennifer E. Hurtig ◽  
Minseon Kim ◽  
Luisa J. Orlando-Coronel ◽  
Jellisa Ewan ◽  
Michelle Foreman ◽  
...  

Many eukaryotes use alternative splicing to express multiple proteins from the same gene. However, while the majority of mammalian genes are alternatively spliced, other eukaryotes use this process less frequently. The budding yeast Saccharomyces cerevisiae has been successfully used to study the mechanism of splicing and the splicing machinery, but alternative splicing in yeast is relatively rare and has not been extensively studied. We have recently shown that the alternative splicing of SKI7/HBS1 is widely conserved, but that yeast and a few other eukaryotes have replaced this one alternatively spliced gene with a pair of duplicated unspliced genes as part of a whole genome doubling (WGD). Here we show that other examples of alternative splicing that were previously found to have functional consequences are widely conserved within the Saccharomycotina. We also show that the most common mechanism by which alternative splicing has disappeared is by the replacement of an alternatively spliced gene with duplicate genes. Saccharomycetaceae that diverged before WGD use alternative splicing more frequently than S. cerevisiae. This suggests that the WGD is a major reason for the low frequency of alternative splicing in yeast. We anticipate that whole genome doublings in other lineages may have had the same effect.


2021 ◽  
Author(s):  
Sara Vanessa Bernhard ◽  
Katarzyna Seget-Trzensiok ◽  
Christian Kuffer ◽  
Dragomir B. Krastev ◽  
Lisa-Marie Stautmeister ◽  
...  

Background: Whole genome doubling is a frequent event during cancer evolution and shapes the cancer genome due to the occurrence of chromosomal instability. Yet, erroneously arising human tetraploid cells usually do not proliferate due to p53 activation that leads to CDKN1A expression, cell cycle arrest, senescence and/or apoptosis. Methods: To uncover the barriers that block the proliferation of tetraploids, we performed an RNAi-mediated genome-wide screen in a human colorectal cancer cell line (HCT116). Results: We identified 140 genes whose depletion improved the survival of tetraploid cells and characterized in depth two of them: SPINT2 and USP28. We found that SPINT2 is a general regulator of CDKN1A transcription via histone acetylation. Using mass spectrometry and immunoprecipitation, we found that USP28 interacts with NuMA1 and affects centrosome clustering. Tetraploid cells accumulate DNA damage and loss of USP28 reduces checkpoint activation, thus facilitating their proliferation. Conclusions: Our results indicate three aspects that contribute to the survival of tetraploid cells: (i) increased mitogenic signaling and reduced expression of cell cycle inhibitors, (ii) the ability to establish functional bipolar spindles and (iii) reduced DNA damage signaling.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Swati Sinha ◽  
Andrew M. Lynn ◽  
Dhwani K. Desai

Background: Homology-based methods are among the most important and widely used approaches for functional annotation of high-throughput microbial genome data. A major limitation of these methods is the absence of well-characterized sequences for certain functions. Non-homology methods based on the context and the interactions of a protein are very useful for identifying missing metabolic activities and for functional annotation in the absence of significant sequence similarity. In the current work, we employ both homology and context-based methods, incrementally, to identify local holes and chokepoints whose presence in the Mycobacterium tuberculosis genome is indicated by their interactions with known proteins in a metabolic network context but which have not been annotated. We have developed two computational procedures using network theory to identify orphan enzymes (‘Hole finding protocol’) coupled with the identification of candidate proteins for the predicted orphan enzyme (‘Hole filling protocol’). We propose an integrated interaction score based on scores from the STRING database to identify candidate protein sequences for the orphan enzymes from M. tuberculosis, as a case study, which are most likely to perform the missing function. Results: The application of an automated homology-based enzyme identification protocol, ModEnzA, to the M. tuberculosis genome yielded 56 novel enzyme predictions. We further predicted 74 putative local holes, 6 chokepoints, and 3 high confidence local holes in the genome using the ‘Hole finding protocol’. The ‘Hole filling protocol’ was validated on the E. coli genome using artificial in-silico enzyme knockouts where our method showed 25% increased accuracy, compared to other methods, in assigning the correct sequence for the knocked-out enzyme amongst the top 10 ranks. The method was further validated on 8 additional genomes.
Conclusions: We have developed methods that can be generalized to augment homology-based annotation to identify missing enzyme coding genes and to predict a candidate protein for them. For pathogens such as M. tuberculosis, this work holds significance in terms of increasing the protein repertoire and thereby, the potential for identifying novel drug targets.


2020 ◽  
Vol 11 ◽  
Author(s):  
Zhe Yu ◽  
Chunfang Zheng ◽  
Victor A. Albert ◽  
David Sankoff

We take advantage of synteny blocks, the analytical construct enabled at the evolutionary moment of speciation or polyploidization, to follow the independent loss of duplicate genes in two sister species or the loss through fractionation of syntenic paralogs in a doubled genome. By examining how much sequence remains after a contiguous series of genes is deleted, we find that this residue remains at a constant low level independent of how many genes are lost—there are few if any relics of the missing sequence. Pseudogenes are rare or extremely transient in this context. The potential exceptions lie exclusively with a few examples of speciation, where the synteny blocks in some larger genomes tolerate degenerate sequence during genomic divergence of two species, but not after whole genome doubling in the same species where fractionation pressure eliminates virtually all non-coding sequence.

