The community-curated Pristionchus pacificus genome facilitates automated gene annotation improvement in related nematodes

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Christian Rödelsperger

Abstract Background The nematode Pristionchus pacificus is an established model organism for comparative studies with Caenorhabditis elegans. Over the past years, it developed into an independent animal model organism for elucidating the genetic basis of phenotypic plasticity. Community-based curations were employed recently to improve the quality of gene annotations of P. pacificus and to more easily facilitate reverse genetic studies using candidate genes from C. elegans. Results Here, I demonstrate that the reannotation of phylogenomic data from nine related nematode species using the community-curated P. pacificus gene set as homology data substantially improves the quality of gene annotations. Benchmarking of universal single copy orthologs (BUSCO) estimates a median completeness of 84% which corresponds to a 9% increase over previous annotations. Nevertheless, the ability to infer gene models based on homology already drops beyond the genus level reflecting the rapid evolution of nematode lineages. This also indicates that the highly curated C. elegans genome is not optimally suited for annotating non-Caenorhabditis genomes based on homology. Furthermore, comparative genomic analysis of apparently missing BUSCO genes indicates a failure of ortholog detection by the BUSCO pipeline due to the insufficient sample size and phylogenetic breadth of the underlying OrthoDB data set. As a consequence, the quality of multiple divergent nematode genomes might be underestimated. Conclusions This study highlights the need for optimizing gene annotation protocols and it demonstrates the benefit of a high quality genome for phylogenomic data of related species.

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Marko Premzl

Abstract The eutherian connexins were characterized as protein constituents of gap junctions implicated in cell-cell communications between adjoining cells in multiple cell types, regulation of major physiological processes and disease pathogeneses. However, conventional connexin gene and protein classifications could be regarded as unsuitable in descriptions of comprehensive eutherian connexin gene data sets, due to ambiguities and inconsistencies in connexin gene and protein nomenclatures. Using eutherian comparative genomic analysis protocol and 35 public eutherian reference genomic sequence data sets, the present analysis attempted to update and revise comprehensive eutherian connexin gene data sets, and address and resolve major discrepancies in their descriptions. Among 631 potential coding sequences, the tests of reliability of eutherian public genomic sequences annotated, in aggregate, 349 connexin complete coding sequences. The most comprehensive curated eutherian connexin gene data set described 21 major gene clusters, 4 of which included evidence of differential gene expansions. For example, the present gene annotations initially described human CXNK1 gene and annotated 22 human connexin genes. Phylogenetic tree calculations and calculations of pairwise nucleotide sequence identity patterns proposed revised and updated phylogenetic classification of eutherian connexin genes. Therefore, the present study integrating gene annotations, phylogenetic analysis and protein molecular evolution analysis proposed new nomenclature of eutherian connexin genes and proteins.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Marina Athanasouli ◽  
Hanh Witte ◽  
Christian Weiler ◽  
Tobias Loschko ◽  
Gabi Eberhardt ◽  
...  

Abstract Background Nematode model organisms such as Caenorhabditis elegans and Pristionchus pacificus are powerful systems for studying the evolution of gene function at a mechanistic level. However, the identification of P. pacificus orthologs of candidate genes known from C. elegans is complicated by the discrepancy in the quality of gene annotations, a common problem in nematode and invertebrate genomics. Results Here, we combine comparative genomic screens for suspicious gene models with community-based curation to further improve the quality of gene annotations in P. pacificus. We extend previous curations of one-to-one orthologs to larger gene families and also orphan genes. Cross-species comparisons of protein lengths, screens for atypical domain combinations and species-specific orphan genes resulted in 4311 candidate genes that were subject to community-based curation. Corrections for 2946 gene models were implemented in a new version of the P. pacificus gene annotations. The new set of gene annotations contains 28,896 genes and has a single copy ortholog completeness level of 97.6%. Conclusions Our work demonstrates the effectiveness of comparative genomic screens to identify suspicious gene models and the scalability of community-based approaches to improve the quality of thousands of gene models. Similar community-based approaches can help to improve the quality of gene annotations in other invertebrate species, including parasitic nematodes.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Christian Rödelsperger ◽  
Marina Athanasouli ◽  
Maša Lenuzzi ◽  
Tobias Theska ◽  
Shuai Sun ◽  
...  

Abstract Nematodes such as Caenorhabditis elegans are powerful systems to study virtually all aspects of biology. Their species richness together with tremendous genetic knowledge from C. elegans facilitate the evolutionary study of biological functions using reverse genetics. However, the ability to identify orthologs of candidate genes in other species can be hampered by erroneous gene annotations. To improve gene annotation in the nematode model organism Pristionchus pacificus, we performed a genome-wide screen for C. elegans genes with potentially incorrectly annotated P. pacificus orthologs. We initiated a community-based project to manually inspect more than two thousand candidate loci and to propose new gene models based on recently generated Iso-seq and RNA-seq data. In most cases, misannotation of C. elegans orthologs was due to artificially fused gene predictions and completely missing gene models. The community-based curation raised the gene count from 25,517 to 28,036 and increased the single copy ortholog completeness level from 86% to 97%. This pilot study demonstrates how even small-scale crowdsourcing can drastically improve gene annotations. In the future, similar approaches can be used for other species, gene sets, and even larger communities, thus making manual annotation of large parts of the genome feasible.


2020 ◽  
Author(s):  
Marina Athanasouli ◽  
Hanh Witte ◽  
Christian Weiler ◽  
Tobias Loschko ◽  
Gabi Eberhardt ◽  
...  

Abstract Background Nematode model organisms such as Caenorhabditis elegans and Pristionchus pacificus are powerful systems for studying the evolution of gene function at a mechanistic level. However, the identification of P. pacificus orthologs of candidate genes known from C. elegans is complicated by the discrepancy in the quality of gene annotations, a common problem in nematode and invertebrate genomics. Results Here, we combine comparative genomic screens for suspicious gene models with community-based curation to further improve the quality of gene annotations in P. pacificus. We extend previous curations of one-to-one orthologs to larger gene families and also orphan genes. Cross-species comparisons of protein lengths and screens for atypical domain combinations and species-specific orphan genes resulted in 4,221 candidate genes that were subject to community-based curation. Corrections for 2,851 gene models were implemented in a new version of the P. pacificus gene annotations. The new set of gene annotations contains 28,896 genes and has a single copy ortholog completeness level of 97.6%. Conclusions Our work demonstrates the effectiveness of comparative genomic screens to identify suspicious gene models and the scalability of community-based approaches to improve the quality of thousands of gene models. Similar community-based approaches can help to improve the quality of gene annotations in other invertebrate species, including parasitic nematodes.


Biosensors ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 257
Author(s):  
Sebastian Fudickar ◽  
Eike Jannik Nustede ◽  
Eike Dreyer ◽  
Julia Bornhorst

Caenorhabditis elegans (C. elegans) is an important model organism for studying molecular genetics, developmental biology, neuroscience, and cell biology. Advantages of the model organism include its rapid development and aging, easy cultivation, and genetic tractability. C. elegans has been proven to be a well-suited model to study toxicity, with identified toxic compounds closely matching those observed in mammals. For phenotypic screening, the worm number and the locomotion are of central importance. Traditional methods such as human counting or analyzing high-resolution microscope images are time-consuming and rather low throughput. The article explores the feasibility of low-cost, low-resolution do-it-yourself microscopes for image acquisition and automated evaluation by deep learning methods to reduce cost and allow high-throughput screening strategies. An image acquisition system is proposed within these constraints and used to create a large data set of whole Petri dishes containing C. elegans. By utilizing the object detection framework Mask R-CNN, the nematodes are located, classified, and their contours predicted. The system has a precision of 0.96 and a recall of 0.956, resulting in an F1-Score of 0.958. Considering only correctly located C. elegans with an AP@0.5 IoU, the system achieved an average precision of 0.902 and a corresponding F1 Score of 0.906.
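The F1-Score quoted in the abstract above follows from the reported precision and recall as their harmonic mean. A minimal sketch of that arithmetic is given here; the function name `f1_score` is an illustrative choice and not taken from the article.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 0.96, recall 0.956.
print(round(f1_score(0.96, 0.956), 3))
```

Rounded to three decimals, the result agrees with the F1-Score of 0.958 stated in the abstract.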


2019 ◽  
Author(s):  
Céline N. Martineau ◽  
André E. X. Brown ◽  
Patrick Laurent

Abstract Ageing affects a wide range of phenotypes at all scales, but an objective measure of ageing remains challenging, even in simple model organisms. We assumed that a wide range of phenotypes at the organismal scale rather than a limited number of biomarkers of ageing would best describe the ageing process. Hundreds of morphological, postural and behavioural features are extracted at once from high-resolution videos. A quantitative model using this multi-parametric dataset can predict the biological age and lifespan of individual C. elegans. We show that the quality of predictions on a held-out data set increases with the number of features added to the model, supporting our initial hypothesis. Despite the large diversity of ageing mechanisms, including stochastic insults, our results highlight a robust ageing trajectory, but variable ageing rates along that trajectory. We show that healthspan, which we defined as the range of abilities of the animals, is correlated to lifespan in wild-type worms.


2021 ◽  
Vol 22 (19) ◽  
pp. 10741
Author(s):  
Yaqian Xiao ◽  
Panning Wang ◽  
Xuesi Zhu ◽  
Zhixiong Xie

Pseudomonas donghuensis HYS is more virulent than P. aeruginosa toward Caenorhabditis elegans but the mechanism underlying virulence is unclear. This study is the first to report that the specific gene cluster gtrA/B/II in P. donghuensis HYS is involved in the virulence of this strain toward C. elegans, and there are no reports of GtrA, GtrB and GtrII in any Pseudomonas species. The pathogenicity of P. donghuensis HYS was evaluated using C. elegans as a host. Based on the prediction of virulence factors and comparative genomic analysis of P. donghuensis HYS, we identified 42 specific virulence genes in P. donghuensis HYS. Slow-killing assays of these genes showed that the gtrAB mutation had the greatest effect on the virulence of P. donghuensis HYS, and GtrA, GtrB and GtrII all positively affected P. donghuensis HYS virulence. Two critical GtrII residues (Glu47 and Lys480) were identified in P. donghuensis HYS. Transmission electron microscopy (TEM) showed that GtrA, GtrB and GtrII were involved in the glucosylation of lipopolysaccharide (LPS) O-antigen in P. donghuensis HYS. Furthermore, colony-forming unit (CFU) assays showed that GtrA, GtrB and GtrII significantly enhanced P. donghuensis HYS colonization in the gut of C. elegans, and glucosylation of LPS O-antigen and colonization in the host intestine contributed to the pathogenicity of P. donghuensis HYS. In addition, experiments using the worm mutants ZD101, KU4 and KU25 revealed a correlation between P. donghuensis HYS virulence and the TIR-1/SEK-1/PMK-1 pathways of the innate immune p38 MAPK pathway in C. elegans. In conclusion, these results reveal that the specific virulence gene cluster gtrA/B/II contributes to the unique pathogenicity of HYS compared with other pathogenic Pseudomonas, and that this process also involves C. elegans innate immunity. These findings significantly increase the available information about GtrA/GtrB/GtrII-based virulence mechanisms in the genus Pseudomonas.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3266 ◽  
Author(s):  
Emma C. Wallace ◽  
Lina M. Quesada-Ocampo

Downy mildew pathogens affect several economically important crops worldwide but, due to their obligate nature, few genetic resources are available for genomic and population analyses. Draft genomes for emergent downy mildew pathogens such as the oomycete Pseudoperonospora cubensis, causal agent of cucurbit downy mildew, have been published and can be used to perform comparative genomic analysis and develop tools such as microsatellites to characterize pathogen population structure. We used bioinformatics to identify 2,738 microsatellites in the P. cubensis predicted transcriptome and evaluate them for transferability to the hop downy mildew pathogen, Pseudoperonospora humuli, since no draft genome is available for this species. We also compared the microsatellite repertoire of P. cubensis to that of the model organism Hyaloperonospora arabidopsidis, which causes downy mildew in Arabidopsis. Although trends in frequency of motif-type were similar, the percentage of SSRs identified from P. cubensis transcripts differed significantly from H. arabidopsidis. The majority of a subset of microsatellites selected for laboratory validation (92%) produced a product in P. cubensis isolates, and 83 microsatellites demonstrated transferability to P. humuli. Eleven microsatellites were found to be polymorphic and consistently amplified in P. cubensis isolates. Analysis of Pseudoperonospora isolates from diverse hosts and locations revealed higher diversity in P. cubensis compared to P. humuli isolates. These microsatellites will be useful in efforts to better understand relationships within Pseudoperonospora species and P. cubensis on a population level.
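The abstract above does not name the tool used to find microsatellites, so the following is only a rough sketch of how simple sequence repeats (SSRs) can be located in a transcript with a regular expression. The function name `find_ssrs` and the thresholds (repeat units of 2 to 6 bases, at least 4 copies) are assumptions chosen for illustration, not the authors' settings.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for each tandem repeat found in seq.

    Thresholds are hypothetical defaults, not values from the study.
    """
    # A unit of min_unit..max_unit bases followed by at least
    # (min_repeats - 1) further copies of the same unit. The lazy
    # quantifier prefers the shortest unit that satisfies the pattern.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAAAAAA.
            continue
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical toy sequence containing five copies of the motif AC.
print(find_ssrs("TTGACACACACACGGT"))
```

Dedicated SSR tools additionally handle compound and imperfect repeats, which this sketch does not attempt.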


2021 ◽  
Author(s):  
Milyausha Kaskinova ◽  
Bayazit Yunusbayev ◽  
Radick Altinbaev ◽  
Rika Raffiudin ◽  
Madeline H. Carpenter ◽  
...  

Abstract Apis mellifera L., the western honey bee, is a major crop pollinator that plays a key role in beekeeping and serves as an important model organism in social behavior studies. Recent efforts have improved the quality of the honey bee reference genome and developed a chromosome-level assembly of sixteen chromosomes, two of which are gapless. However, the rest suffer from 51 gaps, 160 unplaced/unlocalized scaffolds, and the lack of 2 distal telomeres. The gaps are located at hard-to-assemble, extended, highly repetitive chromosomal regions that may contain functional genomic elements. Here, we use de novo re-assemblies from the raw reads of the most recent reference genome, Amel_HAv_3.1, and other long-read-based assemblies (INRA_AMelMel_1.0, ASM1384120v1, and ASM1384124v1) of the honey bee genome to resolve 13 gaps, five unplaced/unlocalized scaffolds, and the missing telomeres of Amel_HAv_3.1. The total length of the resolved gaps is 848,747 bp. The accuracy of the corrected assembly was validated by mapping PacBio reads and performing gene annotation assessment. Comparative analysis suggests that the PacBio-reads-based assemblies of the honey bee genomes failed in the same highly repetitive extended regions of the chromosomes, especially on chromosome 10. To fully resolve these extended repetitive regions, further work using ultra-long Nanopore sequencing would be needed. Our updated assembly facilitates more accurate reference-guided scaffolding and marker/sequence mapping in honey bee genomics studies.


2021 ◽  
Vol 9 ◽  
Author(s):  
Chun Fu ◽  
WenCong Long ◽  
ChaoBing Luo ◽  
Xiong Nong ◽  
XiMeng Xiao ◽  
...  

Background: The bamboo-snout beetle (Cyrtotrachelus buqueti) causes the most severe insect damage to bamboo shoots. Bamboo is a perennial plant that has significant economic value. Besides causing damage, C. buqueti also plays a vital role in the degradation of bamboo lignocellulose. The genome sequencing and functional gene annotation of C. buqueti are of great significance for revealing the molecular mechanism of its efficient degradation of bamboo fiber and for the development of the bamboo industry. Results: The size of the C. buqueti genome was estimated at close to 600.92 Mb by building one paired-end (PE) library and performing k-mer analysis. Then, we developed nine 20-kb SMRTbell libraries for genome sequencing and obtained a total of 51.12 Gb of raw PacBio Sequel reads. Furthermore, after filtering with a coverage depth of 85.06×, 48.71 Gb of clean reads were obtained. The final size of the assembled C. buqueti genome is 633.85 Mb, and its contig N50 is 27.93 Mb. The value of contig N50 shows that the assembly quality of the C. buqueti genome exceeds that of most published insect genomes. The size of the gene sequence located on chromosomes reaches 630.86 Mb, accounting for 99.53% of the genome sequence. Using Benchmarking Universal Single-Copy Orthologs (BUSCO), 1,063 of 1,066 conserved genes (99.72%) were found in the assembled genome. Moreover, 63.78% of the C. buqueti genome is repetitive, and 57.15% is redundant with long-term elements. A total of 12,569 protein-coding genes distributed on 12 chromosomes were acquired after function annotation, of which 96.18% were functional genes. The comparative genomic analysis results revealed that C. buqueti was similar to D. ponderosae. Moreover, the comparative analysis of specific genes in the C. buqueti genome showed that it had 244 unique lignocellulose degradation genes and 240 genes related to energy production and conversion. At the same time, 73 P450 genes and 30 GST genes were identified in the C. buqueti genome. Conclusion: A high-quality C. buqueti genome has been obtained in the present study. The assembly level of this insect’s genome is higher than that of most other reported insect genomes. The phylogenetic analysis of the P450 and GST gene families showed that C. buqueti had a vital detoxification function against plant chemical components.
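Several of the assembly statistics in the abstract above can be checked or illustrated with simple arithmetic. The sketch below computes a contig N50 on a hypothetical toy list of contig lengths (the real contig lengths are not given in the abstract) and reproduces the two percentages from the reported counts; the helper names `n50` and `percent` are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


def percent(part, whole):
    """Return part / whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Hypothetical toy contig lengths, for illustration only.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))

# Figures reported in the abstract.
print(percent(1063, 1066))      # BUSCO genes found out of those searched
print(percent(630.86, 633.85))  # Mb anchored on chromosomes vs. assembly size
```

The two percentages come out at 99.72 and 99.53, matching the values stated in the abstract.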

