Evidence that inconsistent gene prediction can mislead analysis of algal genomes

2019 ◽  
Author(s):  
Yibi Chen ◽  
Raúl A. González-Pech ◽  
Timothy G. Stephens ◽  
Debashish Bhattacharya ◽  
Cheong Xin Chan

Abstract Comparative algal genomics often relies on predicted gene models from de novo assembled genomes. However, the artifacts introduced by different gene-prediction approaches, and their impact on comparative genomic analysis, remain poorly understood. Here, using available genome data from six dinoflagellate species in Symbiodiniaceae, we identified potential methodological biases in the published gene models that were predicted using different approaches. We developed and applied a comprehensive customized workflow to predict genes from these genomes. The observed variation among predicted gene models resulting from our workflow agreed with current understanding of phylogenetic relationships among these taxa, whereas those published earlier were largely biased by the distinct approaches used in each instance. Importantly, these biases mislead the inference of homologous gene families and synteny among genomes, thus impacting biological interpretation of these data. Our results demonstrate that a consistent gene-prediction approach is critical for comparative genomics, particularly for non-model algal genomes.

mSphere ◽  
2019 ◽  
Vol 4 (6) ◽  
Author(s):  
Marian Dominguez-Mirazo ◽  
Rong Jin ◽  
Joshua S. Weitz

ABSTRACT Huanglongbing disease (HLB; yellow shoot disease) is a severe worldwide infectious disease for citrus family plants. The pathogen “Candidatus Liberibacter asiaticus” is an alphaproteobacterium of the Rhizobiaceae family that has been identified as the causative agent of HLB. The virulence of “Ca. Liberibacter asiaticus” has been attributed, in part, to prophage-carried genes. Prophage and prophage-like elements have been identified in 12 of the 15 available “Ca. Liberibacter asiaticus” genomes and are classified into three prophage types. Here, we reexamined all 15 “Ca. Liberibacter asiaticus” genomes using a de novo prediction approach and expanded the number of prophage-like elements from 16 to 33. Further, we found that all of the “Ca. Liberibacter asiaticus” genomes contained at least one prophage-like sequence. Comparative analysis revealed a prevalent, albeit previously unknown, prophage-like sequence type that is a remnant of an integrated prophage. Notably, this remnant prophage is found in the Ishi-1 “Ca. Liberibacter asiaticus” strain that had previously been reported as lacking prophages. Our findings provide both a resource for data and new insights into the evolutionary relationship between phage and “Ca. Liberibacter asiaticus” pathogenicity. IMPORTANCE Huanglongbing (HLB) disease is threatening citrus production worldwide. The causative agent is “Candidatus Liberibacter asiaticus.” Prior work using mapping-based approaches identified prophage-like sequences in some “Ca. Liberibacter asiaticus” genomes but not all. Here, we utilized a de novo approach that expands the number of prophage-like elements found in “Ca. Liberibacter asiaticus” from 16 to 33 and identified at least one prophage-like sequence in all “Ca. Liberibacter asiaticus” strains. Furthermore, we identified a prophage-like sequence type that is a remnant of an integrated prophage—expanding the number of prophage types in “Ca. Liberibacter asiaticus” from 3 to 4. 
Overall, the findings will help researchers investigate the role of prophage in the ecology, evolution, and pathogenicity of “Ca. Liberibacter asiaticus.”


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Xian-Gui Yi ◽  
Xia-Qing Yu ◽  
Jie Chen ◽  
Min Zhang ◽  
Shao-Wei Liu ◽  
...  

Abstract Cerasus serrulata is a flowering cherry germplasm resource for ornamental purposes. In this work, we present a de novo chromosome-scale genome assembly of C. serrulata using Nanopore and Hi-C sequencing technologies. The assembled C. serrulata genome is 265.40 Mb across 304 contigs and 67 scaffolds, with a contig N50 of 1.56 Mb and a scaffold N50 of 31.12 Mb. It contains 29,094 coding genes, 27,611 (94.90%) of which are annotated in at least one functional database. Synteny analysis indicated that C. serrulata and C. avium have 333 syntenic blocks composed of 14,072 genes. Blocks on chromosome 01 of C. serrulata are distributed on all chromosomes of C. avium, implying that chromosome 01 is the most ancient or active of the chromosomes. The comparative genomic analysis confirmed that C. serrulata has 740 expanded gene families, 1031 contracted gene families, and 228 rapidly evolving gene families. Using 656 single-copy orthologs, a phylogenetic tree composed of 10 species was constructed. The present C. serrulata species diverged from Prunus yedoensis ~17.34 million years ago (Mya), while the divergence of C. serrulata and C. avium was estimated to have occurred ~21.44 Mya. In addition, a total of 148 MADS-box family gene members were identified in C. serrulata, accompanying the loss of the AGL32 subfamily and the expansion of the SVP subfamily. The MYB and WRKY gene families, comprising 372 and 66 genes, could be divided into seven and eight subfamilies in C. serrulata, respectively, based on clustering analysis. Nine hundred forty-one plant disease-resistance genes (R-genes) were detected by searching C. serrulata within the PRGdb. This research provides high-quality genomic information about C. serrulata as well as insights into the evolutionary history of Cerasus species.


2019 ◽  
Author(s):  
Marian Dominguez-Mirazo ◽  
Rong Jin ◽  
Joshua S. Weitz

Abstract Huanglongbing (HLB; yellow shoot disease) is a severe worldwide infectious disease for citrus family plants. The pathogen Candidatus Liberibacter asiaticus (CLas) is an alphaproteobacterium of the Rhizobiaceae family that has been identified as the cause. The virulence of CLas has been attributed, in part, to prophage-encoded genes. Prophage and prophage-like elements have been identified in 12 of the 15 available CLas genomes, and are classified into three prophage types. Here, we re-examined all 15 CLas genomes using a de novo prediction approach and expanded the number of prophage-like elements from 16 to 33. Further, we find that all CLas genomes contain at least one prophage-like sequence. Comparative analysis reveals a prevalent, albeit previously unknown, prophage-like sequence type that is a remnant of an integrated prophage. Notably, this remnant prophage is found in the Ishi-1 CLas strain that had previously been reported as lacking prophages. Our findings provide both a resource and new insights into the evolutionary relationship between phage and CLas pathogenicity.


Author(s):  
Natalia Zajac ◽  
Stefan Zoller ◽  
Katri Seppälä ◽  
David Moi ◽  
Christophe Dessimoz ◽  
...  

Abstract Gene duplications and novel genes have been shown to play a major role in helminth adaptation to a parasitic lifestyle because they provide the novelty necessary for adaptation to a changing environment, such as living in multiple hosts. Here we present the de novo sequenced and annotated genome of the parasitic trematode Atriophallophorus winterbourni and its comparative genomic analysis with other major parasitic trematodes. First, we reconstructed the species phylogeny, and dated the split of A. winterbourni from the Opisthorchiata suborder to approximately 237.4 MYA (± 120.4 MY). We then addressed the question of which expanded gene families and gained genes are potentially involved in adaptation to parasitism. To do this, we used Hierarchical Orthologous Groups to reconstruct three ancestral genomes on the phylogeny leading to A. winterbourni and performed a GO enrichment analysis of the gene composition of each ancestral genome, allowing us to characterize the subsequent genomic changes. Out of the 11,499 genes in the A. winterbourni genome, as many as 24% have arisen through duplication events since the speciation of A. winterbourni from the Opisthorchiata, and as many as 31.9% appear to be novel, i.e. newly acquired. We found 13 gene families in A. winterbourni to have had more than 10 genes arising through these recent duplications, all of which have functions potentially relating to host behavioural manipulation, host tissue penetration, and hiding from host immunity through antigen presentation. We identified several families with genes evolving under positive selection. Our results provide a valuable resource for future studies on the genomic basis of adaptation to parasitism and point to specific candidate genes putatively involved in antagonistic host-parasite adaptation.


2020 ◽  
Author(s):  
Ke Cao ◽  
Zhen Peng ◽  
Xing Zhao ◽  
Yong Li ◽  
Kuozhan Liu ◽  
...  

Abstract As a foundation for understanding the molecular mechanisms of peach evolution and high-altitude adaptation, we performed de novo genome assembly of four wild relatives of P. persica: P. mira, P. kansuensis, P. davidiana and P. ferganensis. Through comparative genomic analysis, abundant genetic variations were identified in the four wild species when compared to P. persica. Among them, a deletion located at the promoter of Prupe.2G053600 in P. kansuensis was validated to regulate resistance to nematodes. Next, a pan-genome was constructed which comprised 15,216 core gene families among the four wild peaches and P. persica. We identified the expanded and contracted gene families in different species and investigated their roles during peach evolution. Our results indicated that P. mira was the primitive ancestor of cultivated peach, that peach evolution was non-linear, and that a cross event might have occurred between P. mira and P. dulcis during the process. Combined with the selective sweeps identified using accessions of P. mira originating from different altitude regions, we proposed that nitrogen recovery was essential for high-altitude adaptation of P. mira through increasing its resistance to low temperature. The pan-genome constructed in our study provides a valuable resource for developing elite cultivars, studying peach evolution, and characterizing high-altitude adaptation in perennial crops.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Xiaodong Qin ◽  
Zhonghua Zhang ◽  
Qunfeng Lou ◽  
Lei Xia ◽  
Ji Li ◽  
...  

Abstract Cucumis hystrix Chakr. (2n = 2x = 24) is a wild species that can hybridize with cultivated cucumber (C. sativus L., 2n = 2x = 14), a globally important vegetable crop. However, cucumber breeding is hindered by its narrow genetic base. Therefore, introgression from C. hystrix has been anticipated to bring a breakthrough in cucumber improvement. Here, we report the chromosome-scale assembly of C. hystrix genome (289 Mb). Scaffold N50 reached 14.1 Mb. Over 90% of the sequences were anchored onto 12 chromosomes. A total of 23,864 genes were annotated using a hybrid method. Further, we conducted a comprehensive comparative genomic analysis of cucumber, C. hystrix, and melon (C. melo L., 2n = 2x = 24). Whole-genome comparisons revealed that C. hystrix is phylogenetically closer to cucumber than to melon, providing a molecular basis for the success of its hybridization with cucumber. Moreover, expanded gene families of C. hystrix were significantly enriched in “defense response,” and C. hystrix harbored 104 nucleotide-binding site–encoding disease resistance gene analogs. Furthermore, 121 genes were positively selected, and 12 (9.9%) of these were involved in responses to biotic stimuli, which might explain the high disease resistance of C. hystrix. The alignment of whole C. hystrix genome with cucumber genome and self-alignment revealed 45,417 chromosome-specific sequences evenly distributed on C. hystrix chromosomes. Finally, we developed four cucumber–C. hystrix alien addition lines and identified the exact introgressed chromosome using molecular and cytological methods. The assembled C. hystrix genome can serve as a valuable resource for studies on Cucumis evolution and interspecific introgression breeding of cucumber.


2021 ◽  
Author(s):  
Zhenghui Liu ◽  
Yitong Zhao ◽  
Frederick Leo Sossah ◽  
Benjamin Azu Okorley ◽  
Daniel G. Amoako ◽  
...  

Since 2016, devastating bacterial blotch affecting the fruiting bodies of Agaricus bisporus, Cordyceps militaris, Flammulina filiformis, and Pleurotus ostreatus in China has caused severe economic losses. We isolated 102 bacterial strains and characterized them polyphasically. We identified the causal agent as Pseudomonas tolaasii and confirmed the pathogenicity of the strains. A host range test further confirmed the pathogen’s ability to infect multiple hosts. This is the first report in China of bacterial blotch in C. militaris caused by P. tolaasii. Whole-genome sequences were generated for three strains: Pt11 (6.48 Mb), Pt51 (6.63 Mb), and Pt53 (6.80 Mb), and pangenome analysis was performed with 13 other publicly accessible P. tolaasii genomes to determine their genetic diversity, virulence, antibiotic resistance, and mobile genetic elements. The pangenome of P. tolaasii is open, and many more gene families are likely to emerge with further genome sequencing. Multilocus sequence analysis using the sequences of four common housekeeping genes (glnS, gyrB, rpoB, and rpoD) showed high genetic variability among the P. tolaasii strains, with 115 strains clustered into a monophyletic group. The P. tolaasii strains possess various genes for secretion systems, virulence factors, carbohydrate-active enzymes, toxins, secondary metabolites, and antimicrobial resistance that are associated with pathogenesis and adaptation to different environments. The myriad insertion sequences, integrons, prophages, and genome islands encoded in the strains may contribute to genome plasticity, virulence, and antibiotic resistance. These findings advance understanding of the determinants of virulence, which can be targeted for the effective control of bacterial blotch disease.


2021 ◽  
Author(s):  
Xinxin Yi ◽  
Jing Liu ◽  
Shengcai Chen ◽  
Hao Wu ◽  
Min Liu ◽  
...  

Cultivated soybean (Glycine max) is an important source of protein and oil. Many elite cultivars with different traits have been developed for different conditions. Each soybean strain has its own genetic diversity, and the availability of more high-quality soybean genomes can enhance comparative genomic analysis for identifying the genetic underpinnings of its unique traits. In this study, we constructed a high-quality de novo assembly of an elite soybean cultivar, Jidou 17 (JD17), with chromosome contiguity and high accuracy. We annotated 52,840 gene models and reconstructed 74,054 high-quality full-length transcripts. We performed a genome-wide comparative analysis based on the reference genome of JD17 with three published soybean genomes (WM82, ZH13 and W05), which identified five large inversions and two large translocations specific to JD17, 20,984–46,912 PAVs spanning 13.1–46.9 Mb in size, and 5–53 large PAV clusters larger than 500 kb. Between JD17 and these three genomes, 1,695,741–3,664,629 SNPs and 446,689–800,489 indels were identified and annotated. Symbiotic nitrogen fixation (SNF) genes were identified and the effects of these variants were further evaluated. It was found that the coding sequences of 9 nitrogen fixation-related genes were greatly affected. The high-quality genome assembly of JD17 can serve as a valuable reference for soybean functional genomics research.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Pingping Liang ◽  
Hafiz Sohaib Ahmed Saqib ◽  
Xiaomin Ni ◽  
Yingjia Shen

Abstract Background Marine medaka (Oryzias melastigma) is considered an important ecotoxicological indicator for studying the biochemical, physiological and molecular responses of marine organisms to increasing amounts of pollutants in marine and estuarine waters. Results In this study, we reported a high-quality and accurate de novo genome assembly of marine medaka through the integration of single-molecule sequencing, Illumina paired-end sequencing, and 10X Genomics linked-reads. The 844.17 Mb assembly is estimated to cover more than 98% of the genome and is more continuous, with fewer gaps and errors, than the previous genome assembly. Comparison of O. melastigma with closely related species showed significant expansion of gene families associated with DNA repair and ATP-binding cassette (ABC) transporter pathways. We identified 274 genes that appear to be under significant positive selection and are involved in DNA repair, cellular transportation processes, and conservation and stability of the genome. The positive selection of genes and the considerable expansion in gene numbers, especially those related to stimulus responses, provide strong support for adaptations of O. melastigma under varying environmental stresses. Conclusions The highly contiguous marine medaka genome and comparative genomic analyses will increase our understanding of the underlying mechanisms related to its extraordinary adaptation capability, helping to accelerate ongoing and future investigations in marine ecotoxicology.


2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Dmitrii E. Polev ◽  
Iuliia K. Karnaukhova ◽  
Larisa L. Krukovskaya ◽  
Andrei P. Kozlov

Human gene LOC100505644 uncharacterized LOC100505644 [Homo sapiens] (Entrez Gene ID 100505644) is abundantly expressed in tumors but weakly expressed in few normal tissues. Until now the function of this gene has remained unknown. Here we identified the chromosomal borders of the transcribed region and the major splice form of the LOC100505644-specific transcript. We characterised the major regulatory motifs of the gene and its splice sites. Analysis of the secondary structure of the major transcript variant revealed a hairpin-like structure characteristic of precursor microRNAs. Comparative genomic analysis of the locus showed that it originated in primates de novo. Taken together, our data indicate that human gene LOC100505644 encodes some non-protein coding RNA, likely a microRNA. It was assigned a gene symbol ELFN1-AS1 (ELFN1 antisense RNA 1 (non-protein coding)). This gene combines features of evolutionary novelty and predominant expression in tumors.

