Regression plane concept: analysing continuous cellular processes with machine learning

2020 ◽  
Author(s):  
Abel Szkalisity ◽  
Filippo Piccinini ◽  
Attila Beleon ◽  
Tamas Balassa ◽  
Istvan Gergely Varga ◽  
...  

Biological processes are inherently continuous, and the chance of phenotypic discovery is significantly restricted by discretising them. Using multi-parametric active regression, we introduce a novel concept to describe and explore biological data in a continuous manner. We have implemented Regression Plane (RP), the first user-friendly discovery tool enabling class-free phenotypic supervised machine learning.


2012 ◽  
pp. 616-630
Author(s):  
Yoshihiro Yamanishi ◽  
Hisashi Kashima

In silico prediction of compound-protein interactions from heterogeneous biological data is critical in the process of drug development. In this chapter the authors review several supervised machine learning methods to predict unknown compound-protein interactions from chemical structure and genomic sequence information simultaneously. The authors review several kernel-based algorithms from two different viewpoints: binary classification and dimension reduction. In the results, they demonstrate the usefulness of the methods on the prediction of drug-target interactions and ligand-protein interactions from chemical structure data and genomic sequence data.


2017 ◽  
Vol 474 (4) ◽  
pp. 493-515 ◽  
Author(s):  
Rossana Zaru ◽  
Michele Magrane ◽  
Claire O'Donovan ◽  

Protein kinases form one of the largest protein families and are found in all species, from viruses to humans. They catalyze the reversible phosphorylation of proteins, often modifying their activity and localization. They are implicated in virtually all cellular processes and are one of the most intensively studied protein families. In recent years, they have become key therapeutic targets in drug development as natural mutations affecting kinase genes are the cause of many diseases. The vast amount of data contained in the primary literature and across a variety of biological data collections highlights the need for a repository where this information is stored in a concise and easily accessible manner. The UniProt Knowledgebase meets this need by providing the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. Here, we describe the expert curation process for kinases, focusing on the Caenorhabditis elegans kinome. The C. elegans kinome is composed of 438 kinases and almost half of them have been functionally characterized, highlighting that C. elegans is a valuable and versatile model organism to understand the role of kinases in biological processes.


Author(s):  
Md Mehedi Hasan ◽  
Shaherin Basith ◽  
Mst Shamima Khatun ◽  
Gwang Lee ◽  
Balachandran Manavalan ◽  
...  

DNA N6-methyladenine (6mA) is an important epigenetic modification that is responsible for various cellular processes. The accurate identification of 6mA sites is one of the challenging tasks in genome analysis, and it leads to an understanding of their biological functions. To date, several species-specific machine learning (ML)-based models have been proposed, but the majority of them have not been tested on other species. Hence, their practical application to other plant species is quite limited. In this study, we explored 10 different feature encoding schemes, with the goal of capturing key characteristics around 6mA sites. We selected five feature encoding schemes, based on physicochemical and position-specific information, that possess high discriminative capability. The resultant feature sets were fed into six commonly used ML methods (random forest, support vector machine, extremely randomized tree, logistic regression, naïve Bayes and AdaBoost). The Rosaceae genome was employed to train the above classifiers, which generated 30 baseline models. To integrate their individual strengths, we propose Meta-i6mA, which combines the baseline models using a meta-predictor approach. In extensive independent tests, Meta-i6mA showed high Matthews correlation coefficient values of 0.918, 0.827 and 0.635 on Rosaceae, rice and Arabidopsis thaliana, respectively, and outperformed the existing predictors. We anticipate that Meta-i6mA can be applied across different plant species. Furthermore, we developed a user-friendly online web server, which is available at http://kurata14.bio.kyutech.ac.jp/Meta-i6mA/.
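
For readers unfamiliar with the metric reported above, the Matthews correlation coefficient can be computed from the four counts of a binary confusion matrix. The following is a minimal sketch of the standard definition; the function name `matthews_corrcoef_from_counts` and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; 1 is perfect prediction, 0 is no better
    than chance, and -1 is total disagreement.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention (an assumption here): report 0 when any
        # marginal total is zero and the ratio is undefined.
        return 0.0
    return numerator / denominator


# Hypothetical counts for illustration only (not data from the paper).
example_mcc = matthews_corrcoef_from_counts(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {example_mcc:.3f}")
```

Unlike plain accuracy, the coefficient uses all four cells of the confusion matrix, which is why it is often preferred when positive and negative classes are imbalanced.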


2017 ◽  
Vol 01 (01) ◽  
pp. 1630017
Author(s):  
Charles C. N. Wang ◽  
Jeffrey J. P. Tsai

Bioinformatics conceptualizes biological processes in terms of genomics and applies computer science (derived from disciplines such as applied modeling, data mining, machine learning and statistics) to extract knowledge from biological data. This paper introduces the working definitions of bioinformatics and its applications and challenges. We also identify the resources that are popular in bioinformatics analysis, review some primary methods used to analyze bioinformatics problems, and review the data mining, semantic computing and deep learning technologies that may be applied in bioinformatics analysis.


2019 ◽  
Vol 63 (3-4-5) ◽  
pp. 235-244 ◽  
Author(s):  
Anna Ajduk ◽  
Maciej Szkulmowski

In recent years, we have witnessed an unprecedented advancement of light microscopy techniques, which has allowed us to better understand biological processes occurring during oogenesis and early embryonic development in mammals. In short, two modes of cellular imaging are now available: those involving fluorescent labels and those which are fluorophore-free. Fluorescence microscopy, in its various forms, is used predominantly in research, as it provides detailed information about cellular processes; however, it can involve an increased risk of photodamage. Fluorophore-free techniques, on the other hand, provide a smaller amount of biological data, but they are safer for cells and can therefore potentially be used in a clinical setting. Here, we review various fluorescence and fluorophore-free visualisation approaches and discuss their applicability in developmental biology and reproductive medicine.


2021 ◽  
Vol 6 (61) ◽  
pp. 3073
Author(s):  
Begüm Topçuoğlu ◽  
Zena Lapp ◽  
Kelly Sovacool ◽  
Evan Snitkin ◽  
Jenna Wiens ◽  
...  

2020 ◽  
Vol 14 (2) ◽  
pp. 140-159
Author(s):  
Anthony-Paul Cooper ◽  
Emmanuel Awuni Kolog ◽  
Erkki Sutinen

This article builds on previous research exploring the content of church-related tweets. It does so by exploring whether the qualitative thematic coding of such tweets can, in part, be automated by the use of machine learning. It compares three supervised machine learning algorithms to understand how useful each algorithm is at a classification task, based on a dataset of human-coded church-related tweets. The study finds that one such algorithm, Naïve-Bayes, performs better than the other algorithms considered, returning Precision, Recall and F-measure values which each exceed an acceptable threshold of 70%. This has far-reaching consequences at a time when the high volume of social media data, in this case Twitter data, means that the resource-intensity of manual coding approaches can act as a barrier to understanding how the online community interacts with, and talks about, church. The findings presented in this article offer a way forward for scholars of digital theology to better understand the content of online church discourse.


2017 ◽  
Author(s):  
Sabrina Jaeger ◽  
Simone Fulle ◽  
Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
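
The compound-encoding step described above (summing the vectors of a compound's substructures) can be illustrated with a minimal sketch. This is not the Mol2vec implementation: the substructure identifiers, the three-dimensional toy embeddings, the function name `compound_vector`, and the choice to skip unseen substructures are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def compound_vector(
    substructures: Sequence[str],
    embeddings: Dict[str, Vector],
    dim: int,
) -> Vector:
    """Encode a compound as the element-wise sum of its substructure vectors."""
    total = [0.0] * dim
    for identifier in substructures:
        vector = embeddings.get(identifier)
        if vector is None:
            # Assumption for this sketch: substructures without a learned
            # embedding are simply skipped.
            continue
        for i, value in enumerate(vector):
            total[i] += value
    return total


# Hypothetical pre-trained embeddings (toy values, 3 dimensions).
toy_embeddings = {
    "sub_a": [1.0, 0.0, 2.0],
    "sub_b": [0.5, 1.5, -1.0],
}

# A compound is treated as a "sentence" of substructure identifiers;
# repeated substructures contribute once per occurrence.
encoded = compound_vector(["sub_a", "sub_b", "sub_a"], toy_embeddings, dim=3)
print(encoded)
```

The resulting fixed-length vector could then serve as the input features of any supervised model, which is the role the abstract describes for the summed representation.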

