Curated BLAST for Genomes

ABSTRACT Curated BLAST for Genomes finds candidate genes for a process or an enzymatic activity within a genome of interest. In contrast to annotation tools, which usually predict a single activity for each protein, Curated BLAST asks if any of the proteins in the genome are similar to characterized proteins that are relevant. Given a query such as an enzyme’s name or an EC number, Curated BLAST searches the curated descriptions of over 100,000 characterized proteins, and it compares the relevant characterized proteins to the predicted proteins in the genome of interest. In case of errors in the gene models, Curated BLAST also searches the six-frame translation of the genome. Curated BLAST is available at http://papers.genomics.lbl.gov/curated. IMPORTANCE Given a microbe’s genome sequence, we often want to predict what capabilities the organism has, such as which nutrients it requires or which energy sources it can use. Or, we know the organism has a capability and we want to find the genes involved. Scientists often use automated gene annotations to find relevant genes, but automated annotations are often vague or incorrect. Curated BLAST finds candidate genes for a capability without relying on automated annotations. First, Curated BLAST finds proteins (usually from other organisms) whose functions have been studied experimentally and whose curated descriptions match a query. Then, it searches the genome of interest for similar proteins and returns a list of candidates. Curated BLAST is fast and often finds relevant genes that are missed by automated annotation.
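The six-frame translation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the standard genetic code, and the function names (`reverse_complement`, `translate`, `six_frame_translation`) are hypothetical. It only shows how one DNA sequence yields six candidate protein sequences, which a tool could then search for matches that the gene models missed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption:
# the abstract does not say which translation table is used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
for label, protein in frames.items():
    print(label, protein)
```

Stop codons appear as `*`, so each frame can be split on `*` to obtain open reading frames before any similarity search.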

Curated BLAST for Genomes

10.1101/533430 ◽

2019 ◽

Author(s):

Morgan N. Price ◽

Adam P. Arkin

Keyword(s):

Enzymatic Activity ◽

Candidate Genes ◽

Link Type ◽

A Genome ◽

Gene Models

Abstract “Curated BLAST for Genomes” finds candidate genes for a process or an enzymatic activity within a genome of interest. In contrast to annotation tools, which usually predict a single activity for each protein, Curated BLAST asks if any of the proteins in the genome are similar to characterized proteins that are relevant. Given a query such as an enzyme’s name or an EC number, Curated BLAST searches the curated descriptions of over 100,000 characterized proteins, and it compares the relevant characterized proteins to the predicted proteins in the genome of interest. In case of errors in the gene models, Curated BLAST also searches the six-frame translation of the genome. Curated BLAST is available at http://papers.genomics.lbl.gov/curated.

Comparative genomics and community curation further improve gene annotations in the nematode Pristionchus pacificus

BMC Genomics ◽

10.1186/s12864-020-07100-0 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Marina Athanasouli ◽

Hanh Witte ◽

Christian Weiler ◽

Tobias Loschko ◽

Gabi Eberhardt ◽

...

Keyword(s):

Candidate Genes ◽

Model Organisms ◽

Parasitic Nematodes ◽

Comparative Genomic ◽

Orphan Genes ◽

Community Based ◽

Pristionchus Pacificus ◽

Gene Annotations ◽

Gene Models

Abstract Background Nematode model organisms such as Caenorhabditis elegans and Pristionchus pacificus are powerful systems for studying the evolution of gene function at a mechanistic level. However, the identification of P. pacificus orthologs of candidate genes known from C. elegans is complicated by the discrepancy in the quality of gene annotations, a common problem in nematode and invertebrate genomics. Results Here, we combine comparative genomic screens for suspicious gene models with community-based curation to further improve the quality of gene annotations in P. pacificus. We extend previous curations of one-to-one orthologs to larger gene families and also orphan genes. Cross-species comparisons of protein lengths, screens for atypical domain combinations and species-specific orphan genes resulted in 4311 candidate genes that were subject to community-based curation. Corrections for 2946 gene models were implemented in a new version of the P. pacificus gene annotations. The new set of gene annotations contains 28,896 genes and has a single copy ortholog completeness level of 97.6%. Conclusions Our work demonstrates the effectiveness of comparative genomic screens to identify suspicious gene models and the scalability of community-based approaches to improve the quality of thousands of gene models. Similar community-based approaches can help to improve the quality of gene annotations in other invertebrate species, including parasitic nematodes.

Development and mixed-methods evaluation of an online animation for young people about genome sequencing

European Journal of Human Genetics ◽

10.1038/s41431-019-0564-5 ◽

2020 ◽

Vol 28 (7) ◽

pp. 896-906 ◽

Cited By ~ 1

Author(s):

Celine Lewis ◽

Saskia C. Sanderson ◽

Jennifer Hammond ◽

Melissa Hill ◽

Beverly Searle ◽

...

Keyword(s):

Young People ◽

Genome Sequencing ◽

Genome Sequence ◽

Active Role ◽

Genomic Variation ◽

Future Research ◽

Children And Young People ◽

Link Type ◽

Mixed Methods Evaluation ◽

A Genome

Abstract Children and young people with rare and inherited diseases will be significant beneficiaries of genome sequencing. However, most educational resources are developed for adults. To address this gap in informational resources, we have co-designed, developed and evaluated an educational resource about genome sequencing for young people. The first animation explains what a genome is, genomic variation and genome sequencing (“My Genome Sequence”: http://bit.ly/mygenomesequence); the second focuses on the limitations and uncertainties of genome sequencing (“My Genome Sequence part 2”: http://bit.ly/mygenomesequence2). In total, 554 school pupils (11–15 years) took part in the quantitative evaluation. Mean objective knowledge increased from before to after watching one or both animations (4.24 vs 7.60 respectively; t = 32.16, p < 0.001). Self-rated awareness and understanding of the words ‘genome’ and ‘genome sequencing’ increased significantly after watching the animation. Most pupils felt they understood the benefits of sequencing after watching one (75.4%) or both animations (76.6%). Only 17.3% felt they understood the limitations and uncertainties after watching the first; however, this was higher among those watching both (58.5%, p < 0.001). Twelve young people, 14 parents and 3 health professionals consenting in the 100,000 Genomes Project reported that the animation was clear and engaging, eased concerns about the process and empowered young people to take an active role in decision-making. To increase accessibility, subtitles in other languages could be added, and the script could be made available in a leaflet format for those who do not have internet access. Future research could focus on formally evaluating the animations in a clinical setting.

Crowdsourcing and the feasibility of manual gene annotation: A pilot study in the nematode Pristionchus pacificus

Scientific Reports ◽

10.1038/s41598-019-55359-5 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 6

Author(s):

Christian Rödelsperger ◽

Marina Athanasouli ◽

Maša Lenuzzi ◽

Tobias Theska ◽

Shuai Sun ◽

...

Keyword(s):

Pilot Study ◽

Reverse Genetics ◽

Gene Annotation ◽

Small Scale ◽

Community Based ◽

Pristionchus Pacificus ◽

C Elegans ◽

Gene Annotations ◽

A Genome ◽

Gene Models

Abstract Nematodes such as Caenorhabditis elegans are powerful systems to study basically all aspects of biology. Their species richness together with tremendous genetic knowledge from C. elegans facilitate the evolutionary study of biological functions using reverse genetics. However, the ability to identify orthologs of candidate genes in other species can be hampered by erroneous gene annotations. To improve gene annotation in the nematode model organism Pristionchus pacificus, we performed a genome-wide screen for C. elegans genes with potentially incorrectly annotated P. pacificus orthologs. We initiated a community-based project to manually inspect more than two thousand candidate loci and to propose new gene models based on recently generated Iso-seq and RNA-seq data. In most cases, misannotation of C. elegans orthologs was due to artificially fused gene predictions and completely missing gene models. The community-based curation raised the gene count from 25,517 to 28,036 and increased the single copy ortholog completeness level from 86% to 97%. This pilot study demonstrates how even small-scale crowdsourcing can drastically improve gene annotations. In future, similar approaches can be used for other species, gene sets, and even larger communities thus making manual annotation of large parts of the genome feasible.

Comparative genomics and community curation further improve gene annotations in the nematode Pristionchus pacificus

10.1101/2020.08.03.233726 ◽

2020 ◽

Author(s):

Marina Athanasouli ◽

Hanh Witte ◽

Christian Weiler ◽

Tobias Loschko ◽

Gabi Eberhardt ◽

...

Keyword(s):

Candidate Genes ◽

Model Organisms ◽

Parasitic Nematodes ◽

Comparative Genomic ◽

Orphan Genes ◽

Community Based ◽

Pristionchus Pacificus ◽

Gene Annotations ◽

Gene Models

Abstract Background Nematode model organisms such as Caenorhabditis elegans and Pristionchus pacificus are powerful systems for studying the evolution of gene function at a mechanistic level. However, the identification of P. pacificus orthologs of candidate genes known from C. elegans is complicated by the discrepancy in the quality of gene annotations, a common problem in nematode and invertebrate genomics. Results Here, we combine comparative genomic screens for suspicious gene models with community-based curation to further improve the quality of gene annotations in P. pacificus. We extend previous curations of one-to-one orthologs to larger gene families and also orphan genes. Cross-species comparisons of protein lengths and screens for atypical domain combinations and species-specific orphan genes resulted in 4,221 candidate genes that were subject to community-based curation. Corrections for 2,851 gene models were implemented in a new version of the P. pacificus gene annotations. The new set of gene annotations contains 28,896 genes and has a single copy ortholog completeness level of 97.6%. Conclusions Our work demonstrates the effectiveness of comparative genomic screens to identify suspicious gene models and the scalability of community-based approaches to improve the quality of thousands of gene models. Similar community-based approaches can help to improve the quality of gene annotations in other invertebrate species, including parasitic nematodes.

Faculty Opinions recommendation of Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.737748193.793574951 ◽

2020 ◽

Author(s):

Zhaosheng Kong

Keyword(s):

Gossypium Hirsutum ◽

Genome Evolution ◽

Genome Sequence ◽

Gossypium Arboreum ◽

A Genome ◽

Gossypium Herbaceum

The Genetic Architecture of Ovariole Number in Drosophila melanogaster: Genes with Major, Quantitative, and Pleiotropic Effects

G3 Genes|Genome|Genetics ◽

10.1534/g3.117.042390 ◽

2017 ◽

Vol 7 (7) ◽

pp. 2391-2403 ◽

Cited By ~ 11

Author(s):

Amanda S Lobell ◽

Rachel R Kaspari ◽

Yazmin L Serrano Negron ◽

Susan T Harbison

Keyword(s):

Candidate Genes ◽

Genome Wide Association Study ◽

Natural Populations ◽

Direct Role ◽

Genome Wide ◽

A Genome ◽

Fitness Trait ◽

Sleep Parameters ◽

Activity Behavior ◽

Ovariole Number

Abstract Ovariole number has a direct role in the number of eggs produced by an insect, suggesting that it is a key morphological fitness trait. Many studies have documented the variability of ovariole number and its relationship to other fitness and life-history traits in natural populations of Drosophila. However, the genes contributing to this variability are largely unknown. Here, we conducted a genome-wide association study of ovariole number in a natural population of flies. Using mutations and RNAi-mediated knockdown, we confirmed the effects of 24 candidate genes on ovariole number, including a novel gene, anneboleyn (formerly CG32000), that impacts both ovariole morphology and numbers of offspring produced. We also identified pleiotropic genes between ovariole number traits and sleep and activity behavior. While few polymorphisms overlapped between sleep parameters and ovariole number, 39 candidate genes were nevertheless in common. We verified the effects of seven genes on both ovariole number and sleep: bin3, blot, CG42389, kirre, slim, VAChT, and zfh1. Linkage disequilibrium among the polymorphisms in these common genes was low, suggesting that these polymorphisms may evolve independently.

Integrating genome sequence and structural data for statistical learning to predict transcription factor binding sites

Nucleic Acids Research ◽

10.1093/nar/gkaa1134 ◽

2020 ◽

Vol 48 (22) ◽

pp. 12604-12617

Author(s):

Pengpeng Long ◽

Lu Zhang ◽

Bin Huang ◽

Quan Chen ◽

Haiyan Liu

Keyword(s):

Genome Sequence ◽

Energy Function ◽

Structural Information ◽

Structural Data ◽

P Values ◽

A Genome ◽

Z Scores ◽

Transcription Regulators ◽

Dna Specificity ◽

Tetracycline Repressor

Abstract We report an approach to predict DNA specificity of the tetracycline repressor (TetR) family transcription regulators (TFRs). First, a genome sequence-based method was streamlined with quantitative P-values defined to filter out reliable predictions. Then, a framework was introduced to incorporate structural data and to train a statistical energy function to score the pairing between TFR and TFR binding site (TFBS) based on sequences. When the predictions were benchmarked against experiments, TFBSs for 29 out of 30 TFRs were correctly predicted by either the genome sequence-based or the statistical energy-based method. Using P-values or Z-scores as indicators, we estimate that 59.6% of TFRs are covered with relatively reliable predictions by at least one of the two methods, while only 28.7% are covered by the genome sequence-based method alone. Our approach predicts a large number of new TFBSs which cannot be correctly retrieved from public databases such as FootprintDB. High-throughput experimental assays suggest that the statistical energy can model the TFBSs of a significant number of TFRs reliably. Thus the energy function may be applied to search for new TFBSs in respective genomes. It is possible to extend our approach to other transcriptional factor families with sufficient structural information.

Genomic Characterization Provides an Insight into the Pathogenicity of the Poplar Canker Bacterium Lonsdalea populi

Genes ◽

10.3390/genes12020246 ◽

2021 ◽

Vol 12 (2) ◽

pp. 246

Author(s):

Xiaomeng Chen ◽

Rui Li ◽

Yonglin Wang ◽

Aining Li

Keyword(s):

Genome Sequence ◽

Extracellular Enzymes ◽

De Novo ◽

Whole Genome Sequence ◽

Hybrid Poplars ◽

A Genome ◽

Conserved Genes ◽

Genomic Characterization ◽

Molecular Bases ◽

Insight Into

An emerging poplar canker caused by the gram-negative bacterium, Lonsdalea populi, has led to high mortality of hybrid poplars Populus × euramericana in China and Europe. The molecular bases of pathogenicity and bark adaptation of L. populi have become a focus of recent research. This study revealed the whole genome sequence and identified putative virulence factors of L. populi. A high-quality L. populi genome sequence was assembled de novo, with a genome size of 3,859,707 bp, containing approximately 3434 genes and 107 RNAs (75 tRNA, 22 rRNA, and 10 ncRNA). The L. populi genome contained 380 virulence-associated genes, mainly encoding for adhesion, extracellular enzymes, secretory systems, and two-component transduction systems. The genome had 110 carbohydrate-active enzyme (CAZy)-coding genes and putative secreted proteins. The antibiotic-resistance database annotation listed that L. populi was resistant to penicillin, fluoroquinolone, and kasugamycin. Analysis of comparative genomics found that L. populi exhibited the highest homology with the L. britannica genome and L. populi encompassed 1905 specific genes, 1769 dispensable genes, and 1381 conserved genes, suggesting high evolutionary diversity and genomic plasticity. Moreover, the pan genome analysis revealed that the N-5-1 genome is an open genome. These findings provide important resources for understanding the molecular basis of the pathogenicity and biology of L. populi and the poplar-bacterium interaction.

Genome-Wide Association Analysis of Growth Curve Parameters in Chinese Simmental Beef Cattle

Animals ◽

10.3390/ani11010192 ◽

2021 ◽

Vol 11 (1) ◽

pp. 192

Author(s):

Xinghai Duan ◽

Bingxing An ◽

Lili Du ◽

Tianpeng Chang ◽

Mang Liang ◽

...

Keyword(s):

Beef Cattle ◽

Candidate Genes ◽

Growth Curve ◽

Growth And Development ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Coefficient Of Determination ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

A Genome

The objective of the present study was to perform a genome-wide association study (GWAS) for growth curve parameters using nonlinear models that fit original weight–age records. In this study, data from 808 Chinese Simmental beef cattle that were weighed at 0, 6, 12, and 18 months of age were used to fit the growth curve. The Gompertz model showed the highest coefficient of determination (R² = 0.954). The parameters mature body weight (A), time-scale parameter (b), and maturity rate (K) were treated as phenotypes for single-trait GWAS and multi-trait GWAS. In total, 9, 49, and 7 significant single nucleotide polymorphisms (SNPs) associated with A, b, and K, respectively, were identified by single-trait GWAS; 22 significant SNPs were identified by multi-trait GWAS. Among them, we observed several candidate genes, including PLIN3, KCNS3, TMCO1, PRKAG3, ANGPTL2, IGF-1, SHISA9, and STK3, which were previously reported to be associated with growth and development. Further research on these candidate genes may be useful for exploring the full genetic architecture underlying growth and development traits in livestock.
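As a rough illustration of how the three growth curve parameters relate to body weight, the Python sketch below evaluates a Gompertz curve. The abstract does not give the equation, so the commonly used form W(t) = A · exp(−b · exp(−K · t)) is an assumption here, and the parameter values are hypothetical rather than estimates from the study.

```python
import math


def gompertz_weight(t, A, b, K):
    """Predicted weight at age t under the assumed Gompertz form.

    A: mature (asymptotic) body weight
    b: time-scale parameter
    K: maturity rate
    """
    return A * math.exp(-b * math.exp(-K * t))


# Hypothetical parameter values, chosen only to show the curve's shape.
A, b, K = 700.0, 2.8, 0.12

# Ages (in months) at which the animals in the study were weighed.
for age in (0, 6, 12, 18):
    print(age, round(gompertz_weight(age, A, b, K), 1))

# Under this form the inflection point is at t = ln(b) / K,
# where the predicted weight equals A / e.
t_inflection = math.log(b) / K
```

In a GWAS of this kind, A, b, and K would be estimated separately for each animal from its weight records, and the fitted values would then serve as the phenotypes.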
