Gene Finding – Gene Prediction

2018 ◽  
pp. 403-403
2021 ◽  
Vol 17 (2) ◽  
pp. e1008727
Author(s):  
Markus J. Sommer ◽  
Steven L. Salzberg

Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retrained on each new genome. We have developed a universal model of prokaryotic genes by fitting a temporal convolutional network to amino-acid sequences from a large, diverse set of microbial genomes. We incorporated the new model into a gene finding system, Balrog (Bacterial Annotation by Learned Representation Of Genes), which does not require genome-specific training and which matches or outperforms other state-of-the-art gene finding tools. Balrog is freely available under the MIT license at https://github.com/salzberg-lab/Balrog.


2005 ◽  
Vol 2 (1) ◽  
pp. 74-83
Author(s):  
Andigoni Malousi ◽  
Vassilis Koutkias ◽  
Nicos Maglaveras

Summary: While the biological processes underlying gene expression are still under experimental research, computational gene prediction techniques have reached a high level of sophistication, employing efficient intrinsic and extrinsic methods that identify protein-coding regions within query genomic sequences. Their ability to delineate exact exon boundaries, however, is characterized by a trade-off between sensitivity and specificity, and it remains prone to alterations in gene regulation during transcription and splicing and to inherent complexities introduced by the implemented methodology. Evaluation studies have shown that combinatorial approaches exhibit improved accuracy by integrating evidence from multiple resources, which is further assessed to arrive at the most probable gene assembly. In this work, we present an integration and information handling architecture that exploits evidence derived from multiple gene finding resources to generate machine-readable representations of optimal and suboptimal gene structure predictions, signal feature identification, and high-scoring similarity matches. Unlike most combinatorial techniques, which end up with the most probable gene assembly, the objective of this architecture is to support advanced information handling mechanisms that may give more in-depth insight into the underlying gene expression machinery and the alterations that may occur. Technically, XML was adopted to build and interchange structured data among the architecture’s components, together with related technologies that offer graphical representations and query formulation and execution over single or multiple information sources.
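As a rough illustration of the kind of machine-readable representation described in this summary, the sketch below serializes optimal and suboptimal predictions from several gene finders into one XML tree and then queries it. The element names, attribute names, tool names, and coordinates are all hypothetical; the abstract does not specify the architecture's actual schema.

```python
import xml.etree.ElementTree as ET


def build_prediction_xml(sequence_id, predictions):
    """Serialize gene structure predictions from several tools into one XML tree.

    The element and attribute names are hypothetical, chosen for illustration;
    they are not the schema used by the architecture in the abstract.
    """
    root = ET.Element("geneEvidence", {"sequence": sequence_id})
    for pred in predictions:
        gene = ET.SubElement(root, "prediction", {
            "source": pred["source"],        # which gene finder produced it
            "rank": pred["rank"],            # "optimal" or "suboptimal"
            "score": f"{pred['score']:.2f}",
        })
        for start, end in pred["exons"]:
            ET.SubElement(gene, "exon", {"start": str(start), "end": str(end)})
    return root


def exons_supported_by(root, min_sources=2):
    """Return exon coordinates reported by at least min_sources distinct tools."""
    support = {}
    for pred in root.findall("prediction"):
        for exon in pred.findall("exon"):
            key = (int(exon.get("start")), int(exon.get("end")))
            support.setdefault(key, set()).add(pred.get("source"))
    return sorted(k for k, tools in support.items() if len(tools) >= min_sources)


# Made-up predictions from two hypothetical tools on one query sequence.
predictions = [
    {"source": "toolA", "rank": "optimal", "score": 0.91,
     "exons": [(100, 250), (400, 560)]},
    {"source": "toolB", "rank": "optimal", "score": 0.84,
     "exons": [(100, 250), (410, 560)]},
    {"source": "toolA", "rank": "suboptimal", "score": 0.62,
     "exons": [(100, 250), (410, 560)]},
]

root = build_prediction_xml("query_seq_1", predictions)
print(ET.tostring(root, encoding="unicode"))
print(exons_supported_by(root))
```

Keeping suboptimal predictions in the same tree, rather than discarding them, is what allows a query such as `exons_supported_by` to surface alternative exon boundaries that a single most-probable assembly would hide.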


2014 ◽  
Vol 42 (15) ◽  
pp. e119-e119 ◽  
Author(s):  
Alexandre Lomsadze ◽  
Paul D. Burns ◽  
Mark Borodovsky

Abstract: We present a new approach to automatic training of a eukaryotic ab initio gene finding algorithm. With the advent of Next-Generation Sequencing, automatic training has become paramount, allowing genome annotation pipelines to keep pace with the speed of genome sequencing. Earlier we developed GeneMark-ES, currently the only gene finding algorithm for eukaryotic genomes that performs automatic training in unsupervised ab initio mode. The new algorithm, GeneMark-ET, augments GeneMark-ES with a novel method that integrates RNA-Seq read alignments into the self-training procedure. Use of ‘assembled’ RNA-Seq transcripts is far from trivial; a significant assembly error rate was revealed in recent assessments. We demonstrated in computational experiments that the proposed method of incorporating ‘unassembled’ RNA-Seq reads improves the accuracy of gene prediction; in particular, for the 1.3 Gb genome of Aedes aegypti, the mean value of prediction Sensitivity and Specificity at the gene level increased over GeneMark-ES by 24.5%. In the current surge of genomic data, when the need for accurate sequence annotation is higher than ever, GeneMark-ET will be a valuable addition to the narrow arsenal of automatic gene prediction tools.
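The accuracy measure quoted in this abstract, the mean of gene-level Sensitivity and Specificity, can be sketched as follows. This is a minimal illustration under two assumptions not stated in the abstract: a predicted gene counts as correct only if its exon structure matches an annotated gene exactly, and Specificity follows the common gene-finding convention TP / (TP + FP). The gene structures below are made up.

```python
def gene_level_accuracy(predicted, annotated):
    """Return (sensitivity, specificity, mean) at the gene level.

    Each gene is a hashable description of its structure, here a tuple of
    (start, end) exon coordinates. Assumption: a prediction is a true
    positive only on an exact match with an annotated gene.
    """
    predicted, annotated = set(predicted), set(annotated)
    tp = len(predicted & annotated)
    # Sensitivity: fraction of annotated genes that were predicted exactly.
    sn = tp / len(annotated) if annotated else 0.0
    # Specificity (gene-finding convention): fraction of predictions that are correct.
    sp = tp / len(predicted) if predicted else 0.0
    return sn, sp, (sn + sp) / 2


# Made-up example: 4 annotated genes, 5 predictions, 3 exact matches.
annotated = [
    ((100, 250), (400, 560)),
    ((900, 1200),),
    ((1500, 1600), (1700, 1850)),
    ((2000, 2400),),
]
predicted = [
    ((100, 250), (400, 560)),        # exact match
    ((900, 1200),),                  # exact match
    ((1500, 1600), (1710, 1850)),    # one exon boundary off, so not counted
    ((2000, 2400),),                 # exact match
    ((3000, 3300),),                 # no annotated counterpart
]

sn, sp, mean = gene_level_accuracy(predicted, annotated)
print(f"Sn={sn:.3f} Sp={sp:.3f} mean={mean:.3f}")
```

The exact-match criterion makes gene-level figures much stricter than exon- or nucleotide-level ones, since a single misplaced boundary turns an otherwise good prediction into both a false positive and a false negative.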


Cells ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 1003
Author(s):  
Margarita L. Martinez-Fierro ◽  
Idalia Garza-Veloz

MicroRNAs are important regulators of cell processes and have been proposed as potential preeclampsia biomarkers. We evaluated serum microRNA expression profiling to identify microRNAs involved in preeclampsia development. Serum microRNA expression profiling was evaluated at 12, 16, and 20 weeks of gestation (WG), and at the time of preeclampsia diagnosis. Two groups were evaluated using TaqMan low-density array plates: a control group with 18 normotensive pregnant women and a case group with 16 patients who developed preeclampsia during the follow-up period. Fifty-three circulating microRNAs were differentially expressed between groups (p < 0.05). Compared with controls, hsa-miR-628-3p showed the highest relative quantity values (at 12 WG = 7.7 and at 20 WG = 3.45) and the hsa-miRs -151a-3p and -573 remained differentially expressed from 16 to 20 WG (p < 0.05). Signaling pathways including cancer-related, axon guidance, neurotrophin, GnRH, VEGF, and B/T cell receptor pathways were most commonly altered. Further target gene prediction revealed that the nuclear factor of activated T-cells 5 gene was among the transcriptional targets of preeclampsia-modulated microRNAs. Specific microRNAs including hsa-miRs -628-3p, -151a-3p, and -573 were differentially expressed in the serum of pregnant women before they developed preeclampsia compared with controls, and their participation in preeclampsia development should be considered.


2021 ◽  
Vol 32 ◽  
pp. S290
Author(s):  
Daisuke Kotani ◽  
Satoshi Fujii ◽  
Tomoyuki Yamada ◽  
Mizuto Suzuki ◽  
Takayuki Yoshino
