TransPrise: a novel machine learning approach for eukaryotic promoter prediction

Author(s):  
Stepan Pachganov ◽  
Khalimat Murtazalieva ◽  
Alexei Zarubin ◽  
Dmitry Sokolov ◽  
Duane Chartier ◽  
...  

As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites (TSS). In this paper we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of TransPrise with the TSSPlant approach for the well-annotated genome of Oryza sativa. On a computer equipped with a graphics processing unit, TransPrise runs in 250 minutes on a 374 Mb genome. We provide the full basis for the comparison and encourage users to freely access our set of computational tools to facilitate and streamline their own analyses. A ready-to-use Docker image with all necessary packages, models, and code, as well as the source code of the TransPrise algorithm, is available at http://compubioverne.group/. The source code is ready to use and can be customized to predict TSS in any eukaryotic organism.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7990 ◽  
Author(s):  
Stepan Pachganov ◽  
Khalimat Murtazalieva ◽  
Aleksei Zarubin ◽  
Dmitry Sokolov ◽  
Duane R. Chartier ◽  
...  

As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites (TSS). In this paper we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. Our pipeline consists of two parts: a binary classifier runs first, and if a sequence is classified as TSS-containing, a regression step follows that identifies the precise location of the TSS. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of the TransPrise classification and regression models with the TSSPlant approach for the well-annotated genome of Oryza sativa. On a computer equipped with a graphics processing unit, TransPrise runs in 250 minutes on a 374 Mb genome. The Matthews correlation coefficient for the TransPrise classification model is 0.79, more than twice the 0.31 obtained for the TSSPlant classification models, which represents a high level of prediction accuracy. Additionally, the mean absolute error of the regression model is 29.19 nt, allowing for accurate prediction of TSS location. TransPrise was also tested on Homo sapiens, where the mean absolute error of the regression model was 47.986 nt. We provide the full basis for the comparison and encourage users to freely access our set of computational tools to facilitate and streamline their own analyses. A ready-to-use Docker image with all necessary packages, models, and code, as well as the source code of the TransPrise algorithm, is available at http://compubioverne.group/. The source code is ready to use and can be customized to predict TSS in any eukaryotic organism.
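The two-stage flow and the two metrics quoted in this abstract can be illustrated with a minimal sketch. This is not the TransPrise code: the function names and the `classify` and `locate` callables are hypothetical placeholders for the trained classification and regression models, and the metrics follow their standard textbook definitions.

```python
import math
from typing import Callable, Optional, Sequence


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mean_absolute_error(true_pos: Sequence[int], pred_pos: Sequence[int]) -> float:
    """Mean absolute distance (in nt) between annotated and predicted TSS positions."""
    if len(true_pos) != len(pred_pos) or not true_pos:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_pos, pred_pos)) / len(true_pos)


def predict_tss(
    window: str,
    classify: Callable[[str], bool],  # hypothetical stand-in for the classifier
    locate: Callable[[str], int],     # hypothetical stand-in for the regressor
) -> Optional[int]:
    """Run the classifier first; only TSS-containing windows reach the regressor."""
    if not classify(window):
        return None
    return locate(window)
```

Under these definitions, an MCC of 1 means perfect agreement, 0 means no better than chance, and -1 means total disagreement, which is the scale on which the reported values of 0.79 and 0.31 should be read.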


2019 ◽  
Author(s):  
Bo Yan ◽  
George Tzertzinis ◽  
Ira Schildkraut ◽  
Laurence Ettwiller

Methodologies for determining eukaryotic Transcription Start Sites (TSS) rely on the selection of the 5’ canonical cap structure of Pol-II transcripts and consequently ignore entire classes of TSS derived from other RNA polymerases, which play critical roles in various cell functions. To overcome this limitation, we developed ReCappable-seq and identified TSS from Pol-II and non-Pol-II transcripts at nucleotide resolution. Applied to the human transcriptome, ReCappable-seq identifies Pol-II TSS with higher specificity than CAGE and reveals a rich landscape of TSS associated notably with Pol-III transcripts, which it has not previously been possible to study on a genome-wide scale. Novel TSS consistent with non-Pol-II transcripts can be found in the nuclear and mitochondrial genomes. By identifying TSS derived from all RNA polymerases, ReCappable-seq reveals distinct epigenetic marks among Pol-II and non-Pol-II TSS and provides a unique opportunity to concurrently interrogate the regulatory landscape of coding and non-coding RNA.


2007 ◽  
Author(s):  
Fredrick H. Rothganger ◽  
Kurt W. Larson ◽  
Antonio Ignacio Gonzales ◽  
Daniel S. Myers

2021 ◽  
Vol 22 (10) ◽  
pp. 5212
Author(s):  
Andrzej Bak

A key question confronting computational chemists concerns the preferable ligand geometry that fits complementarily into the receptor pocket. Typically, the postulated ‘bioactive’ 3D ligand conformation is constructed as a ‘sophisticated guess’ (unnecessarily geometry-optimized) mirroring the pharmacophore hypothesis, sometimes based on an erroneous prerequisite. Hence, the 4D-QSAR scheme and its ‘dialects’ have been practically implemented as a higher level of model abstraction that allows the examination of multiple molecular conformations, orientations, and protonation representations. Nearly a quarter of a century has passed since the eminent work of Hopfinger appeared on the stage; therefore the natural question arises of whether the 4D-QSAR approach is still appealing to the scientific community. With no intention to be comprehensive, a review of the current state of the art in the field of receptor-independent (RI) and receptor-dependent (RD) 4D-QSAR methodology is provided, with a brief examination of the ‘mainstream’ algorithms. In fact, a myriad of 4D-QSAR methods have been implemented and applied practically to a diverse range of molecules. It seems that the 4D-QSAR approach has been experiencing a promising renaissance of interest that might be fuelled by the rising power of graphics processing unit (GPU) clusters applied to full-atom MD-based simulations of protein-ligand complexes.


2021 ◽  
Vol 20 (3) ◽  
pp. 1-22
Author(s):  
David Langerman ◽  
Alan George

High-resolution, low-latency apps in computer vision are ubiquitous in today’s world of mixed-reality devices. These innovations provide a platform that can leverage the improving technology of depth sensors and embedded accelerators to enable higher-resolution, lower-latency processing for 3D scenes using depth-upsampling algorithms. This research demonstrates that filter-based upsampling algorithms are feasible for mixed-reality apps using low-power hardware accelerators. We parallelized and evaluated a depth-upsampling algorithm on two different devices: a reconfigurable-logic FPGA embedded within a low-power SoC, and a fixed-logic embedded graphics processing unit. We demonstrate that both accelerators can meet the real-time requirement of 11 ms latency for mixed-reality apps.

