Single-cell ChIP-seq imputation with SIMPA by leveraging bulk ENCODE data

Abstract: Single-cell ChIP-seq analysis is challenging due to data sparsity. We present SIMPA (https://github.com/salbrec/SIMPA), a single-cell ChIP-seq data imputation method leveraging predictive information within bulk ENCODE data to impute missing protein-DNA interacting regions of target histone marks or transcription factors. Machine learning models trained for each single cell, each target, and each genomic region enable drastic improvement in cell-type clustering and gene identification.

A pitfall for machine learning methods aiming to predict across cell types

Genome Biology ◽

10.1186/s13059-020-02177-y ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Jacob Schreiber ◽

Ritambhara Singh ◽

Jeffrey Bilmes ◽

William Stafford Noble

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cell Types ◽

Chromatin Domain ◽

Learning Models ◽

Machine Learning Methods ◽

Domain Boundaries ◽

Average Activity ◽

Test Sets ◽

Machine Learning Models

Abstract: Machine learning models that predict genomic activity are most useful when they make accurate predictions across cell types. Here, we show that when the training and test sets contain the same genomic loci, the resulting model may falsely appear to perform well by effectively memorizing the average activity associated with each locus across the training cell types. We demonstrate this phenomenon in the context of predicting gene expression and chromatin domain boundaries, and we suggest methods to diagnose and avoid the pitfall. We anticipate that, as more data becomes available, future projects will increasingly risk suffering from this issue.
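The pitfall described in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy activity values, the cell-type names, and the helper names `pearson` and `average_activity_baseline` are all invented for illustration. The sketch assumes the same loci appear in both the training cell types and the held-out cell type, and shows that a "model" which only averages each locus over the training cell types can still correlate strongly with the held-out measurements.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_activity_baseline(train):
    """Per-locus mean activity across the training cell types.

    `train` maps a cell-type name to a list of activities, one per locus.
    The baseline uses no information about the held-out cell type.
    """
    return [mean(values) for values in zip(*train.values())]


# Hypothetical toy data: 6 shared loci, 3 training cell types.
train = {
    "cell_A": [0.1, 5.0, 0.3, 9.8, 2.1, 7.0],
    "cell_B": [0.2, 4.6, 0.1, 9.1, 2.5, 6.4],
    "cell_C": [0.0, 5.3, 0.4, 10.2, 1.8, 7.3],
}
# Hypothetical held-out cell type measured at the same 6 loci.
held_out = [0.3, 4.9, 0.2, 9.5, 2.9, 6.1]

baseline = average_activity_baseline(train)
overall_r = pearson(baseline, held_out)
print(f"baseline vs. held-out correlation: {overall_r:.3f}")
```

A high correlation here reflects differences between loci, not anything learned about the held-out cell type, so under these assumptions a real model's score is only informative when compared against such a baseline.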

Evaluation of machine learning approaches for cell-type identification from single-cell transcriptomics data

Briefings in Bioinformatics ◽

10.1093/bib/bbab035 ◽

2021 ◽

Author(s):

Yixuan Huang ◽

Peng Zhang

Keyword(s):

Machine Learning ◽

Single Cell ◽

Computation Time ◽

Support Vector ◽

Cell Phenotype ◽

Learning Models ◽

Cell Type ◽

Sequencing Data ◽

Phenotype Classification ◽

Machine Learning Models

Abstract: Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotations to determine cell identities, which is time-consuming and irreproducible, and canonical markers are sometimes lacking for certain cell types. There is a growing realization of the potential of machine learning models as a supervised classification approach that can significantly aid decision-making processes for cell-type identification. In this work, we performed a comprehensive and impartial evaluation of 10 machine learning models that automatically assign cell phenotypes. The performance of the classification methods is estimated using 20 publicly accessible single-cell RNA sequencing datasets with different sizes, technologies, species and levels of complexity. The performance of each model in within-dataset (intra-dataset) and across-dataset (inter-dataset) experiments is evaluated based on classification accuracy and computation time. In addition, the sensitivity to the number of input features, different annotation levels and dataset complexity was also estimated. Results showed that most classifiers perform well on a variety of datasets, with decreased accuracy for complex datasets, while the Linear Support Vector Machine (linear-SVM) and Logistic Regression classifier models have the best overall performance with remarkably fast computation time. Our work provides a guideline for researchers to select and apply suitable machine learning-based classification models in their analysis workflows and sheds some light on the potential direction of future improvement of automated cell phenotype classification tools based on single-cell sequencing data.

Improving XGBoost with Imagination Sampling

Communications of the Blyth Institute ◽

10.33014/issn.2640-5652.2.1.holloway.1 ◽

2020 ◽

Vol 2 (1) ◽

pp. 3-6

Author(s):

Eric Holloway

Keyword(s):

Machine Learning ◽

General System ◽

Learning Models ◽

Starting Point ◽

Machine Learning Models

Imagination Sampling is the usage of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling for obtaining multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.

Development of Machine Learning Models to Predict Student Performance in Computer Literacy Courses

International Review on Computers and Software (IRECOS) ◽

10.15866/irecos.v13i1.16863 ◽

2018 ◽

Vol 13 (1) ◽

pp. 21

Author(s):

George Anderson ◽

Oduronke T. Eyitayo

Keyword(s):

Machine Learning ◽

Student Performance ◽

Computer Literacy ◽

Learning Models ◽

Machine Learning Models

Experimental Comparison of Machine Learning Models in Malware Packing Detection

2020 21st Asia-Pacific Network Operations and Management Symposium (APNOMS) ◽

10.23919/apnoms50412.2020.9237007 ◽

2020 ◽

Author(s):

Jong-Wouk Kim ◽

Juhong Namgung ◽

Yang-Sae Moon ◽

Mi-Jung Choi

Keyword(s):

Machine Learning ◽

Experimental Comparison ◽

Learning Models ◽

Machine Learning Models

Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽

2021 ◽

Author(s):

Norberto Sánchez-Cruz ◽

Jose L. Medina-Franco

Keyword(s):

Machine Learning ◽

Small Molecules ◽

Predictive Models ◽

Large Scale ◽

Target Prediction ◽

Quantitative Measure ◽

Learning Models ◽

Discovery Research ◽

Drug Discovery Research ◽

Machine Learning Models

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.

A Comparative Study of Machine Learning Models for Stock Market Rate Prediction

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i6.985990 ◽

2019 ◽

Vol 7 (6) ◽

pp. 985-990

Author(s):

reeraksha M S ◽

Bhargavi M S

Keyword(s):

Machine Learning ◽

Stock Market ◽

Comparative Study ◽

Learning Models ◽

Rate Prediction ◽

Market Rate ◽

Machine Learning Models

An Intelligent Approach for Prediction of Liver Disease using Machine Learning Models

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2020/568102020 ◽

2020 ◽

Vol 8 (10) ◽

pp. 6974-6983

Keyword(s):

Machine Learning ◽

Liver Disease ◽

Learning Models ◽

Intelligent Approach ◽

Machine Learning Models

Utilizing Blockchain Technology in Social Media Bot Identification

10.36227/techrxiv.12049374 ◽

2020 ◽

Author(s):

Shreya Reddy ◽

Lisa Ewen ◽

Pankti Patel ◽

Prerak Patel ◽

Ankit Kundal ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Gold Standard ◽

The Internet ◽

Learning Models ◽

Current Time ◽

Machine Learning Methods ◽

Blockchain Technology ◽

Modern Age ◽

Machine Learning Models

As bots become more prevalent and smarter in the modern age of the internet, it becomes ever more important that they be identified and removed. Recent research has indicated that machine learning methods are accurate and are the gold standard of bot identification on social media. Unfortunately, machine learning models come with drawbacks such as lengthy training times, difficult feature selection, and overwhelming pre-processing tasks. To overcome these difficulties, we propose a blockchain framework for bot identification. At the current time, it is unknown how this method will perform, but it serves to demonstrate the existence of a substantial research gap in this area.

Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models

10.21236/ada622645 ◽

2015 ◽

Author(s):

Katya Scheinberg

Keyword(s):

Machine Learning ◽

Complex Systems ◽

Learning Models ◽

Statistical Machine Learning ◽

Derivative Free Optimization ◽

Derivative Free ◽

Machine Learning Models
