Spliceator: multi-species splice site prediction using convolutional neural networks

BMC Bioinformatics ◽
2021 ◽
Vol 22 (1) ◽  
Author(s):  
Nicolas Scalzitti ◽  
Arnaud Kress ◽  
Romain Orhand ◽  
Thomas Weber ◽  
Luc Moulinier ◽  
...  

Abstract
Background: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking.
Results: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89–92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms.
Conclusions: Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy.
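The abstract does not describe the network itself; purely as an illustration of the general approach, the sketch below (PyTorch) shows a minimal 1D convolutional classifier over one-hot-encoded DNA windows centred on a candidate splice site. The 200-nt window, layer widths, and two-class output are assumptions, not Spliceator's published architecture.

```python
# Minimal sketch of a 1D CNN splice-site classifier (NOT Spliceator's
# published architecture; window size and layer widths are assumptions).
import torch
import torch.nn as nn

class SpliceCNN(nn.Module):
    def __init__(self, window: int = 200):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(4, 32, kernel_size=9, padding=4),  # 4 channels: A,C,G,T one-hot
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (window // 4), 64),
            nn.ReLU(),
            nn.Linear(64, 2),  # splice site vs. non-site
        )

    def forward(self, x):  # x: (batch, 4, window)
        return self.classifier(self.features(x))

def one_hot(seq: str) -> torch.Tensor:
    """One-hot encode a DNA string into a (4, len) tensor."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    t = torch.zeros(4, len(seq))
    for i, base in enumerate(seq.upper()):
        if base in idx:  # ambiguous bases stay all-zero
            t[idx[base], i] = 1.0
    return t
```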

Processes ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. 649
Author(s):  
Yifeng Liu ◽  
Wei Zhang ◽  
Wenhao Du

Deep learning built on large volumes of high-quality data plays an important role in many industries. However, deep learning is hard to embed directly in real-time systems, because such systems accumulate data only through real-time acquisition, while their analysis tasks must also be carried out in real time; accumulating data over a long period before analysis is therefore not an option. To address the combined problems of high-quality data accumulation, the high timeliness required of data analysis, and the difficulty of embedding deep-learning algorithms directly in real-time systems, this paper proposes a new progressive deep-learning framework and evaluates it with experiments on image recognition. The experimental results show that the proposed framework is effective, performs well, and reaches conclusions similar to those of a deep-learning framework trained on large-scale data.
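The paper's framework is only summarized above; as a hedged sketch of the underlying pattern (train progressively as data arrives instead of accumulating a large dataset first), the loop below fine-tunes a model on each freshly acquired mini-batch. The `acquire_batch` callable and the fixed round count are hypothetical placeholders.

```python
# Hedged sketch of progressive training on streaming data (a generic
# pattern, not the paper's framework; acquire_batch() is hypothetical).
import torch
import torch.nn as nn

def progressive_train(model: nn.Module, acquire_batch, rounds: int = 100):
    """Fine-tune `model` each time a new batch arrives from the
    real-time system, instead of accumulating a large dataset first."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(rounds):
        x, y = acquire_batch()  # blocks until fresh data arrives
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()        # the model improves as data accumulates
    return model
```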


Biology ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 932
Author(s):  
Nuno M. Rodrigues ◽  
João E. Batista ◽  
Pedro Mariano ◽  
Vanessa Fonseca ◽  
Bernardo Duarte ◽  
...  

Over recent decades, the world has experienced the adverse consequences of the uncontrolled development of multiple human activities. In recent years, a considerable share of total chemical production has consisted of environmentally harmful compounds, the majority of which have significant environmental impacts. These emerging contaminants (ECs) include a wide range of man-made chemicals in worldwide use, such as pesticides, cosmetics, personal and household care products, and pharmaceuticals. Several of these ECs raise concerns regarding their ecotoxicological effects and how to assess them efficiently. This is of particular interest when marine diatoms are considered as potential target species, due to their widespread distribution, their status as the most abundant phytoplankton group in the oceans, and their key ecological roles. Bio-optical ecotoxicity methods appear to be reliable, fast, high-throughput screening (HTS) techniques, providing large datasets with biological relevance to the mode of action of these ECs in phototrophic organisms such as diatoms. However, from the large datasets produced, only a small portion of the data is normally extracted for physiological evaluation, leaving out a large amount of information on EC exposure. In the present paper, we use all the available information and evaluate several machine learning and deep learning algorithms for predicting the exposure of model organisms to different ECs at different doses, using a model marine diatom (Phaeodactylum tricornutum) as the test organism. The results show that 2D convolutional neural networks are the best method for predicting the type of EC to which the cultures were exposed, achieving a median accuracy of 97.65%, while Rocket is the best at predicting the concentration to which the cultures were subjected, achieving a median accuracy of 100%.
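As a hedged illustration of the Rocket step, the snippet below uses sktime's implementation on synthetic multivariate traces; the authors' actual pipeline, feature layout, and data shapes are not given in the abstract and are assumed here.

```python
# Sketch of time-series classification with ROCKET via sktime
# (generic usage, not the authors' pipeline; shapes and data are toy).
import numpy as np
from sktime.classification.kernel_based import RocketClassifier

# X: (n_cultures, n_channels, n_timepoints) bio-optical traces;
# y: one of four hypothetical dose levels per culture.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3, 120))
y = rng.integers(0, 4, size=40)

clf = RocketClassifier(num_kernels=1000)
clf.fit(X, y)
print(clf.predict(X[:5]))  # predicted dose level for the first cultures
```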


2017 ◽  
Author(s):  
Robi Tacutu ◽  
Daniel Thornton ◽  
Emily Johnson ◽  
Arie Budovsky ◽  
Diogo Barardo ◽  
...  

Abstract In spite of a growing body of research and data, human ageing remains a poorly understood process. To facilitate studies of ageing, over 10 years ago we developed the Human Ageing Genomic Resources (HAGR), which are now the leading online resource for biogerontologists. In this update, we present HAGR's main functionalities, including new additions and improvements. HAGR consists of six databases: 1) the GenAge database of ageing-related genes, in turn composed of a dataset of >300 human ageing-related genes and a dataset with >2000 genes associated with ageing or longevity in model organisms; 2) the AnAge database of animal ageing and longevity, featuring >4000 species; 3) the GenDR database with >200 genes associated with the life-extending effects of dietary restriction; 4) the LongevityMap database of human genetic association studies of longevity with >500 entries; 5) the DrugAge database with >400 ageing- or longevity-associated drugs or compounds; 6) the CellAge database with >200 genes associated with cell senescence. All our databases are manually curated by experts to ensure high-quality data, and are presented in an intuitive and clear interface that includes cross-links across our databases and to external resources. HAGR is freely available online (http://genomics.senescence.info/).
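HAGR's datasets can be downloaded from the site as flat files; below is a minimal, hedged sketch of exploring a local GenAge export with pandas. The file name and column names are hypothetical and should be adapted to the actual download.

```python
# Hedged sketch: exploring a local GenAge export with pandas.
# File name and column names are hypothetical; adapt to the real
# download from http://genomics.senescence.info/.
import pandas as pd

genage = pd.read_csv("genage_human.csv")  # hypothetical export file
print(len(genage), "human ageing-related genes")

# Tally genes per inclusion criterion (column name assumed).
if "why" in genage.columns:
    print(genage["why"].value_counts())
```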


2020 ◽  
Author(s):  
Guy M. Hagen ◽  
Justin Bendesky ◽  
Rosa Machado ◽  
Tram-Anh Nguyen ◽  
Tanmay Kumar ◽  
...  

Abstract
Background: Fluorescence microscopy is an important technique in many areas of biological research. Two factors which limit the usefulness and performance of fluorescence microscopy are photobleaching of fluorescent probes during imaging and, when imaging live cells, phototoxicity caused by light exposure. Recently developed machine learning methods can greatly improve the signal-to-noise ratio of acquired images. This allows researchers to record images with much shorter exposure times, which in turn minimizes photobleaching and phototoxicity by reducing the dose of light reaching the sample.
Findings: To employ deep learning methods, a large amount of data is needed to train the underlying convolutional neural network. One way to do this involves pairs of fluorescence microscopy images acquired with long and short exposure times. We provide high-quality data sets which can be used to train and evaluate deep learning methods under development.
Conclusion: The availability of high-quality data is vital for training the convolutional neural networks used in current machine learning approaches.
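A natural way to consume such data is as paired (short-exposure input, long-exposure target) images. Below is a minimal PyTorch `Dataset` sketch under assumed conventions (TIFF files in `short/` and `long/` subdirectories, 16-bit depth); it is not the authors' code.

```python
# Hedged sketch: a paired short/long-exposure dataset for training a
# denoising CNN (directory layout and bit depth are assumptions).
from pathlib import Path
import numpy as np
import torch
from torch.utils.data import Dataset
from tifffile import imread  # assumes TIFF images; swap for your format

class ExposurePairs(Dataset):
    """Yields (short_exposure, long_exposure) image pairs; the noisy
    short-exposure frame is the input, the long exposure the target."""
    def __init__(self, root: str):
        self.short = sorted(Path(root, "short").glob("*.tif"))
        self.long = sorted(Path(root, "long").glob("*.tif"))
        assert len(self.short) == len(self.long), "unpaired images"

    def __len__(self):
        return len(self.short)

    def __getitem__(self, i):
        x = imread(self.short[i]).astype(np.float32)
        y = imread(self.long[i]).astype(np.float32)
        # Add a channel axis and normalize to [0, 1] (16-bit assumed).
        to_t = lambda a: torch.from_numpy(a / 65535.0).unsqueeze(0)
        return to_t(x), to_t(y)
```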


2019 ◽  
Vol 88 (1) ◽  
Author(s):  
Tomasz Henryk Szymura ◽  
Magdalena Szymura

Grasslands provide a wide range of ecosystem services; however, their area and quality are still diminishing in Europe. Nowadays, they often form isolated patches within a "sea" of other habitats. We examined basic structural landscape metrics of grasslands in Poland using the CORINE land cover database. We examined characteristics both for all individual patches and as average values for a 10 × 10-km grid covering Poland. We also assessed the percentage of grasslands within protected areas and ecological corridors. We found that rather small patches (0.3–1 km²) dominate in Poland, usually located 200–500 m away from each other. The grasslands had a clumped distribution; thus, there are large areas in Poland where grassland patches are separated from each other by kilometers. Almost all indices calculated for the 10 × 10-km grid squares were correlated, i.e., in regions with a high percentage of grasslands, the patches were large, more numerous, placed close to each other, and had more irregular shapes. Our results revealed that the percentage of grasslands within protected areas and ecological corridors did not differ from the average value for Poland. On the other hand, forests were significantly over-represented in protected areas and ecological corridors. These findings suggest that there is no planned scheme for grassland protection at the landscape scale in Poland. Developing such a scheme is urgent and requires high-quality data on the distribution of seminatural grassland patches. In practice, nature conservationists and managers should consider spatial processes in their plans in order to maintain grassland biodiversity.
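For readers wanting to reproduce this kind of analysis, here is a hedged sketch of two basic structural metrics (patch area and nearest-neighbour distance) computed from a CORINE-style polygon layer with geopandas; the file name, attribute layout, and the brute-force neighbour search are assumptions, not the authors' workflow.

```python
# Hedged sketch: basic structural metrics for grassland patches from a
# CORINE-style polygon layer (file name and attributes assumed).
import geopandas as gpd

patches = gpd.read_file("corine_grasslands.gpkg")  # hypothetical file
patches = patches.to_crs(epsg=3035)  # equal-area CRS for Europe

# Patch area in km^2.
patches["area_km2"] = patches.geometry.area / 1e6

# Distance from each patch to its nearest neighbouring patch (metres);
# a brute-force O(n^2) search, fine for a sketch.
def nearest_distance(row):
    others = patches.drop(row.name)
    return others.distance(row.geometry).min()

patches["nn_dist_m"] = patches.apply(nearest_distance, axis=1)
print(patches[["area_km2", "nn_dist_m"]].describe())
```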


Author(s):  
Sethu Arun Kumar ◽  
Thirumoorthy Durai Ananda Kumar ◽  
Narasimha M Beeraka ◽  
Gurubasavaraj Veeranna Pujar ◽  
Manisha Singh ◽  
...  

Predicting novel small-molecule bioactivities for target deconvolution and hit-to-lead optimization in drug discovery research requires effective molecular representation. Previous reports have demonstrated that machine learning (ML) and deep learning (DL) have substantial implications for virtual screening, peptide synthesis, drug ADMET screening, and biomarker discovery. These strategies can increase positive outcomes in the drug discovery process while keeping false-positive rates low, and can do so cost-effectively and quickly through high-quality data acquisition. This review discusses recent updates in AI tools as cheminformatics applications in medicinal chemistry for data-driven decision making in drug discovery, along with the challenges of acquiring high-quality data in the pharmaceutical industry while improving small-molecule bioactivities and properties.
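As a hedged example of the molecular-representation step the review refers to, the snippet below computes Morgan fingerprints with RDKit and fits a random forest on toy activity labels; the SMILES strings and labels are placeholders, not a real bioactivity dataset.

```python
# Hedged sketch: Morgan fingerprints (RDKit) as a molecular
# representation feeding a random forest (toy data, not a real set).
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN"]
labels = [0, 1, 1, 0]  # toy active/inactive labels

def fingerprint(smi: str) -> np.ndarray:
    """SMILES -> 1024-bit Morgan fingerprint as a numpy array."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)
    return np.array(fp)

X = np.stack([fingerprint(s) for s in smiles])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict_proba(X)[:, 1])  # predicted activity probabilities
```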


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S757-S757
Author(s):  
Charlotte L Eost-Telling ◽  
Paul Kingston ◽  
Louise Taylor ◽  
Jan Bailey

Abstract The Mass Observation Project, established in 1937, documents the lives of ordinary people living in the UK and explores a wide range of social issues. The Project distributes a set of written questions ("Directives") to a panel of 500 members of the British public ("Observers") three times each year; "Observers" respond in writing. From the initial commissioning of a "Directive" to the data becoming available for analysis takes between four and six months. This approach offers researchers an opportunity to capture in-depth qualitative data from individuals with a range of demographic backgrounds who live across the UK. As there are no word limits on "Observers'" responses and respondents remain anonymous, a "Directive" often yields rich, high-quality data. Additionally, compared with alternative methods of collecting large volumes of qualitative data from a heterogeneous population, commissioning a "Directive" is cost-effective in terms of time and resources.


2020 ◽  
Author(s):  
Kevin Jooss ◽  
John P. McGee ◽  
Rafael D. Melani ◽  
Neil L. Kelleher

Abstract Native mass spectrometry (nMS) is a rapidly growing method for the characterization of large proteins and protein complexes that preserves "native" non-covalent inter- and intramolecular interactions. Direct infusion of purified analytes into a mass spectrometer represents the standard approach for conducting nMS experiments. Alternatively, capillary zone electrophoresis (CZE) can be performed under native conditions, providing high separation performance while consuming trace amounts of sample material. Here, we provide standard operating procedures for acquiring high-quality data using CZE in native mode coupled online to various Orbitrap mass spectrometers via a commercial sheathless interface, covering a wide range of analytes from 30–800 kDa. Using a standard protein mix, the influence of various CZE method parameters was evaluated, such as the composition of the background electrolyte (BGE) and conductive liquid, and the separation voltage. Additionally, a universal approach to optimizing fragmentation settings for protein subunit and metalloenzyme characterization is discussed in detail for model analytes. A short section is dedicated to troubleshooting the nCZE-MS setup. This study aims to help standardize nCZE-MS practices across the CE community and to provide a resource for the production of reproducible, high-quality data.


2021 ◽  
Vol 12 ◽  
pp. 435
Author(s):  
Adnan A. Khan ◽  
Hamza Ibad ◽  
Kaleem Sohail Ahmed ◽  
Zahra Hoodbhoy ◽  
Shahzad M. Shamim

Deep learning (DL) is a relatively new subdomain of machine learning (ML) with incredible potential for certain applications in the medical field. Given recent advances in its use in neuro-oncology, its role in diagnosing, prognosticating, and managing the care of cancer patients has been the subject of many research studies. Across these studies, the landscape of algorithmic methods has improved steadily with each iteration since the field's inception. With the increasing availability of high-quality data, larger training sets will allow for higher-fidelity models. However, logistical and ethical concerns over a prospective trial comparing the prognostic abilities of DL and physicians severely limit the ability of this technology to be widely adopted. One of the tenets of medicine is judgment, a facet of medical decision making that is often missing in DL because of its inherent nature as a "black box." A natural distrust of newer technology, combined with a lack of the autonomy normally expected in current medical practice, is just one of several important limitations to implementation. In this review, we first define and outline the different types of artificial intelligence (AI) as well as the role of AI in the current advances of clinical medicine. We briefly highlight several salient studies using different DL methods in the realm of neuroradiology and summarize the key findings and challenges faced when using this nascent technology, particularly the ethical challenges that users of DL could face.


2020 ◽  
Author(s):  
James McDonagh ◽  
William Swope ◽  
Richard L. Anderson ◽  
Michael Johnston ◽  
David J. Bray

Digitization offers significant opportunities for the formulated product industry to transform the way it works and to develop new methods of business. R&D is one area of operation where taking advantage of these technologies is challenging, due to its high level of domain specialisation and creativity, but the benefits could be significant. Recent developments in base-level technologies such as artificial intelligence (AI)/machine learning (ML), robotics and high performance computing (HPC), to name a few, present disruptive and transformative technologies which could offer new insights, discovery methods and enhanced chemical control when combined in a digital ecosystem of connectivity, distributive services and decentralisation. At the fundamental level, research in these technologies has shown that new physical and chemical insights can be gained, which in turn can augment experimental R&D approaches through physics-based chemical simulation, data-driven models and hybrid approaches. In all of these cases, high-quality data is required to build and validate models, in addition to the skills and expertise needed to exploit such methods. In this article we give an overview of some of the digital technology demonstrators we have developed for formulated product R&D, and we discuss the challenges in building and deploying these demonstrators.

