Gene Prediction, Homology Based (Extrinsic Gene Prediction, Look-Up Gene Prediction, Sequence Similarity-Based Gene Prediction)

Author(s):  
Roderic Guigó
2006 ◽  
pp. 95-106 ◽  
Author(s):  
Roderic Guigó ◽  
Moisés Burset ◽  
Pankaj Agarwal ◽  
Josep F. Abril ◽  
Randall F. Smith ◽  
...  

2019 ◽  
Author(s):  
Luke Sargent ◽  
Yating Liu ◽  
Wilson Leung ◽  
Nathan T. Mortimer ◽  
David Lopatto ◽  
...  

Abstract Scientists are sequencing new genomes at an increasing rate with the goal of associating genome contents with phenotypic traits. After a new genome is sequenced and assembled, structural gene annotation is often the first step in analysis. Despite advances in computational gene prediction algorithms, most eukaryotic genomes still benefit from manual gene annotation. Undergraduates can become skilled annotators, and in the process learn both about genes/genomes and about how to utilize large datasets. Data visualizations provided by a genome browser are essential for manual gene annotation, enabling annotators to quickly evaluate multiple lines of evidence (e.g., sequence similarity, RNA-Seq, gene predictions, repeats). However, creating genome browsers requires extensive computational skills; lack of the expertise required remains a major barrier for many biomedical researchers and educators. To address these challenges, the Genomics Education Partnership (GEP; https://gep.wustl.edu/) has partnered with the Galaxy Project (https://galaxyproject.org) to develop G-OnRamp (http://g-onramp.org), a web-based platform for creating UCSC Assembly Hubs and JBrowse genome browsers. G-OnRamp can also convert a JBrowse instance into an Apollo instance for collaborative genome annotations in research and educational settings. G-OnRamp enables researchers to easily visualize their experimental results, educators to create Course-based Undergraduate Research Experiences (CUREs) centered on genome annotation, and students to participate in genomics research. Development of G-OnRamp was guided by extensive user feedback from in-person workshops. Sixty-five researchers and educators from over 40 institutions participated in these workshops, which produced over 20 genome browsers now available for research and education.
For example, genome browsers for four parasitoid wasp species were used in a CURE engaging 142 students taught by 13 faculty members, producing a total of 192 gene models. G-OnRamp can be deployed on a personal computer or on cloud computing platforms, and the genome browsers produced can be transferred to the CyVerse Data Store for long-term access.


2020 ◽  
Author(s):  
Zhe Zhang ◽  
Lei Liu ◽  
Melis Kucukoglu ◽  
Dongdong Tian ◽  
Robert M. Larkin ◽  
...  

Abstract Background: The CLV3/ESR-RELATED (CLE) gene family encodes small secreted peptides (SSPs) and plays vital roles in plant growth and development by promoting cell-to-cell communication. The prediction and classification of CLE genes is challenging because of their low sequence similarity. Results: We developed a machine learning-aided method for predicting CLE genes by using a CLE motif-specific residual score matrix and a novel clustering method based on the Euclidean distance of 12 amino acid residues from the CLE motif in a site-weight dependent manner. In total, 2156 CLE candidates—including 627 novel candidates—were predicted from 69 plant species. The results from our CLE motif-based clustering are consistent with previous reports using the entire pre-propeptide. Characterization of CLE candidates provided systematic statistics on protein lengths, signal peptides, relative motif positions, amino acid compositions of different parts of the CLE precursor proteins, and decisive factors of CLE prediction. The approach taken here provides information on the evolution of the CLE gene family and provides evidence that the CLE and IDA/IDL genes share a common ancestor. Conclusions: Our new approach is applicable to SSPs or other proteins with short conserved domains and hence, provides a useful tool for gene prediction, classification and evolutionary analysis.
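
The site-weighted Euclidean distance mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how residues are encoded or how site weights are chosen, so the per-site numeric scores, the weight vector, and the function name below are all assumptions made for illustration; this is not the authors' implementation.

```python
import math

MOTIF_LENGTH = 12  # the abstract states that 12 residues of the CLE motif are compared


def site_weighted_distance(scores_a, scores_b, site_weights):
    """Weighted Euclidean distance between two motifs.

    Each motif is assumed to be already encoded as 12 numeric per-site
    scores (hypothetical encoding); site_weights holds one non-negative
    weight per motif position (also hypothetical).
    """
    if not (len(scores_a) == len(scores_b) == len(site_weights) == MOTIF_LENGTH):
        raise ValueError("expected 12 values per motif and 12 site weights")
    # Sum the squared per-site differences, each scaled by its site weight.
    weighted_sum = sum(
        w * (a - b) ** 2 for a, b, w in zip(scores_a, scores_b, site_weights)
    )
    return math.sqrt(weighted_sum)
```

With all weights equal to 1 this reduces to the ordinary Euclidean distance; larger weights make disagreement at a given motif position count more when motifs are clustered.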


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Zhe Zhang ◽  
Lei Liu ◽  
Melis Kucukoglu ◽  
Dongdong Tian ◽  
Robert M. Larkin ◽  
...  

Abstract Background: The CLV3/ESR-RELATED (CLE) gene family encodes small secreted peptides (SSPs) and plays vital roles in plant growth and development by promoting cell-to-cell communication. The prediction and classification of CLE genes is challenging because of their low sequence similarity. Results: We developed a machine learning-aided method for predicting CLE genes by using a CLE motif-specific residual score matrix and a novel clustering method based on the Euclidean distance of 12 amino acid residues from the CLE motif in a site-weight dependent manner. In total, 2156 CLE candidates, including 627 novel candidates, were predicted from 69 plant species. The results from our CLE motif-based clustering are consistent with previous reports using the entire pre-propeptide. Characterization of CLE candidates provided systematic statistics on protein lengths, signal peptides, relative motif positions, amino acid compositions of different parts of the CLE precursor proteins, and decisive factors of CLE prediction. The approach taken here provides information on the evolution of the CLE gene family and provides evidence that the CLE and IDA/IDL genes share a common ancestor. Conclusions: Our new approach is applicable to SSPs or other proteins with short conserved domains and hence, provides a useful tool for gene prediction, classification and evolutionary analysis.


2020 ◽  
Author(s):  
Zhe Zhang ◽  
Lei Liu ◽  
Melis Kucukoglu ◽  
Dongdong Tian ◽  
Robert M. Larkin ◽  
...  

Abstract Background: The CLV3/ESR-RELATED (CLE) gene family encodes small secreted peptides (SSPs) and plays vital roles in plant growth and development through cell-to-cell communication. The prediction and classification of CLE genes is challenging because of their low sequence similarity. Results: We developed a machine learning-aided method for predicting CLE genes by using a CLE motif-specific residual score matrix and a novel clustering method based on the Euclidean distance of the 12 amino acid residues from CLE motifs in a site-weight dependent manner. In total, 2156 CLE candidates, including 627 novel candidates, were predicted from 69 plant species. The results from our CLE motif-based clustering are consistent with previous reports using the entire pre-propeptide. Characterization of CLE candidates provided systematic statistics on protein lengths, signal peptides, relative motif positions, amino acid compositions of different parts of the CLE precursor proteins, and decisive factors of CLE prediction. The approach taken here provides information on the evolution of the CLE gene family and provides evidence that the CLE and IDA/IDL genes share a common ancestor. Conclusions: Our new approach is applicable to SSPs or other proteins with short conserved domains and hence, provides a useful tool for gene prediction, classification and evolutionary analysis.


2012 ◽  
Vol 40 (W1) ◽  
pp. W186-W192 ◽  
Author(s):  
Shiliang Wang ◽  
Jaideep P. Sundaram ◽  
Timothy B. Stockwell

Abstract A gene prediction program, VIGOR (Viral Genome ORF Reader), was developed at the J. Craig Venter Institute in 2010 and has been successfully performing gene calling in coronavirus, influenza, rhinovirus and rotavirus for projects at the Genome Sequencing Center for Infectious Diseases. VIGOR uses sequence similarity search against custom protein databases to identify protein coding regions, start and stop codons and other gene features. Ribonucleic acid editing and other features are accurately identified based on sequence similarity and signature residues. VIGOR produces four output files: a gene prediction file, a complementary DNA file, an alignment file, and a gene feature table file. The gene feature table can be used to create a GenBank submission. VIGOR takes a single input: viral genomic sequences in FASTA format. VIGOR has been extended to predict genes for 12 viruses: measles virus, mumps virus, rubella virus, respiratory syncytial virus, alphavirus and Venezuelan equine encephalitis virus, norovirus, metapneumovirus, yellow fever virus, Japanese encephalitis virus, parainfluenza virus and Sendai virus. VIGOR accurately detects complex gene features such as ribonucleic acid editing, stop codon leakage and ribosomal shunting. Precisely identifying the mat_peptide cleavage for some viruses is a built-in feature of VIGOR. The gene predictions for these viruses have been evaluated by testing from 27 to 240 genomes from GenBank.
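
As a rough illustration of the general idea behind similarity-based gene calling (locating a coding region by comparing translated genomic sequence with a known protein), here is a toy sketch. It is not VIGOR's algorithm: it scans only the three forward reading frames, uses ungapped identity counting instead of a real alignment, assumes the standard genetic code, and every function name in it is hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame):
    """Translate one forward reading frame; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


def best_ungapped_hit(dna, reference_protein):
    """Return (identities, frame, nucleotide_start) of the best ungapped match.

    The reference protein is slid along each translated frame and matching
    residues are counted; the first best-scoring position is reported.
    """
    dna = dna.upper()
    best = (0, 0, 0)
    for frame in range(3):
        protein = translate(dna, frame)
        for offset in range(len(protein) - len(reference_protein) + 1):
            window = protein[offset:offset + len(reference_protein)]
            identities = sum(x == y for x, y in zip(window, reference_protein))
            if identities > best[0]:
                best = (identities, frame, frame + 3 * offset)
    return best
```

For instance, calling `best_ungapped_hit("CCATGGCTAAATGA", "MAK")` locates the reference in reading frame 2, starting at nucleotide index 2 (0-based). A production tool would use a gapped similarity search, both strands, and curated reference databases, as the abstract describes.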


1998 ◽  
Vol 80 (08) ◽  
pp. 242-245 ◽  
Author(s):  
Yoshihide Fukuda ◽  
Tetsuo Hayakawa ◽  
Junki Takamatsu ◽  
Hidehiko Saito ◽  
Hiroaki Okamoto ◽  
...  

Summary Japanese haemophiliacs have been at high risk for infection with parenterally-transmissible viruses through the use of blood products, especially imported ones. Recently, a novel transfusion-transmissible virus, GB virus C (GBV-C)/hepatitis G virus (HGV), was isolated. We investigated the origin and route of transmission of GBV-C/HGV isolates in haemophiliacs in Japan. GBV-C/HGV RNA was measured by nested reverse transcription polymerase chain reaction in 91 Japanese haemophiliacs. Phylogenetic analysis and genotypic grouping of GBV-C/HGV isolates in Japanese haemophiliacs were performed based on sequences in the 5’ untranslated region, and the characteristics were compared with those of reported isolates. GBV-C/HGV infection was present in 19 of 91 haemophiliacs (20.9%). Sequence analysis showed that 15 of the 19 isolates (78.9%) showed sequence similarity to a group in which mainly West African isolates have been reported. The other 4 isolates (21.1%) showed sequence similarity to Asian isolates. None of the GBV-C/HGV isolates showed sequences similar to those generally found in isolates from USA and Europe. The majority of GBV-C/HGV isolates found in Japanese haemophiliacs who are considered to have been infected by imported blood products were similar to those detected in West Africa.


1995 ◽  
Vol 74 (05) ◽  
pp. 1316-1322 ◽  
Author(s):  
Mary Ann McLane ◽  
Jagadeesh Gabbeta ◽  
A Koneti Rao ◽  
Lucia Beviglia ◽  
Robert A Lazarus ◽  
...  

Summary Naturally-occurring fibrinogen receptor antagonists and platelet aggregation inhibitors that are found in snake venom (disintegrins) and leeches share many common features, including an RGD sequence, high cysteine content, and low molecular weight. There are, however, significant selectivity and potency differences. We compared the effect of three proteins on platelet function: albolabrin, a 7.5 kDa disintegrin, eristostatin, a 5.4 kDa disintegrin in which part of the disintegrin domain is deleted, and decorsin, a 4.5 kDa non-disintegrin derived from the leech Macrobdella decora, which has very little sequence similarity with either disintegrin. Decorsin was about two times less potent than albolabrin and six times less potent than eristostatin in inhibiting ADP-induced human platelet aggregation. It had a different pattern of interaction with glycoprotein IIb/IIIa as compared to the two disintegrins. Decorsin bound with a low affinity to resting platelets (409 nM) and to ADP-activated platelets (270 nM), and with high affinity to thrombin-activated platelets (74 nM). At concentrations up to 685 nM, it did not cause expression of a ligand-induced binding site epitope on the β3 subunit of the GPIIb/IIIa complex. It did not significantly inhibit isolated GPIIb/IIIa binding to immobilized von Willebrand Factor. At low doses (1.5-3.0 μg/mouse), decorsin protected mice against death from pulmonary thromboembolism, showing an effect similar to eristostatin. This suggested that decorsin is a much more potent inhibitor of platelet aggregation in vivo than in vitro, and it may have potential as an antiplatelet drug.

