Explainable t-SNE for single-cell RNA-seq data analysis

Single-cell RNA sequencing (scRNA-seq) technologies enable the study of gene expression in individual cells and reveal the diversity within cell populations. To measure cell-to-cell similarity based on transcription and gene expression, many dimension reduction methods are used to obtain low-dimensional embeddings of the input scRNA-seq data for clustering. However, these methods lack explainability and may not perform well on scRNA-seq data, because they are often borrowed from other fields and are not customized for high-dimensional, sparse scRNA-seq data. In this study, we propose an explainable t-SNE, cell-driven t-SNE (c-TSNE), that fuses the cell differences reflected by biologically meaningful distance metrics for input scRNA-seq data. Our study shows that the proposed method not only enhances the interpretation of the original t-SNE visualization for scRNA-seq data but also achieves favorable single-cell segregation performance on benchmark datasets compared with state-of-the-art peers. Robustness analysis shows that the proposed cell-driven t-SNE is robust to dropout and noise in dimension reduction and clustering. It provides a novel and practical way to investigate the interpretability of t-SNE in scRNA-seq data analysis. Contrary to the general assumption that the explainability of a machine learning method must come at the cost of learning efficiency, the proposed explainable t-SNE improves both clustering efficiency and explainability in scRNA-seq analysis. More importantly, our work suggests that the widely used t-SNE can easily be misused in existing scRNA-seq analysis, because its default Euclidean distance can introduce biases or produce meaningless results when evaluating cell differences in high-dimensional, sparse scRNA-seq data. To the best of our knowledge, it is the first explainable t-SNE proposed for scRNA-seq analysis, and it will inspire the development of other explainable machine learning methods in the field.
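The abstract's central warning is that t-SNE's default Euclidean distance can mislead on sparse scRNA-seq data. The toy sketch below is not c-TSNE and uses made-up counts; it only illustrates one mechanism behind that warning: two cells with the same expression program but different sequencing depth look far apart under Euclidean distance, yet identical under a correlation-based distance. Pearson correlation is an assumption chosen as one example of an alternative metric; the abstract does not name the paper's "biologically meaningful distance metrics" or its fusion rule.

```python
import math

def euclidean(x, y):
    """Straight-line distance; grows with overall expression magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_distance(x, y):
    """1 - Pearson correlation; compares the shape of two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A constant profile (e.g., an all-zero cell) has undefined correlation
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (sx * sy)

def distance_matrix(cells, metric):
    """Square pairwise distance matrix for any metric function."""
    return [[metric(p, q) for q in cells] for p in cells]

# Hypothetical toy counts over six genes (not data from the paper)
cell_a = [0, 5, 0, 10, 0, 1]
cell_b = [0, 20, 0, 40, 0, 4]  # same program as cell_a, 4x sequencing depth
cell_c = [8, 0, 6, 0, 4, 0]    # a different expression program

# Euclidean: cell_a looks closer to the unrelated cell_c than to cell_b
print(euclidean(cell_a, cell_b), euclidean(cell_a, cell_c))
# Correlation distance: cell_a and cell_b coincide, cell_c is far away
print(correlation_distance(cell_a, cell_b), correlation_distance(cell_a, cell_c))
```

A square matrix from `distance_matrix` could be passed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE(metric="precomputed")`, which in recent versions also requires `init="random"`.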

Data analysis for a set of university student lists using the k-Nearest Neighbors machine learning method

SSRN Electronic Journal ◽

10.2139/ssrn.3849881 ◽

2020 ◽

Author(s):

Alex Francisco Estupiñán López

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Nearest Neighbors ◽

University Student ◽

Machine Learning Method ◽

Learning Method ◽

K Nearest Neighbors

New interpretable machine learning method for single-cell data reveals correlates of clinical response to cancer immunotherapy

10.1101/702118 ◽

2019 ◽

Cited By ~ 3

Author(s):

Evan Greene ◽

Greg Finak ◽

Leonard A. D’Amico ◽

Nina Bhardwaj ◽

Candice D. Church ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

T Cell ◽

Single Cell ◽

Cancer Immunotherapy ◽

Effector Memory ◽

Machine Learning Method ◽

Learning Method ◽

Modeling Framework ◽

Interpretable Machine Learning

Abstract High-dimensional single-cell cytometry is routinely used to characterize patient responses to cancer immunotherapy and other treatments. This has produced a wealth of datasets ripe for exploration but whose biological and technical heterogeneity makes them difficult to analyze with current tools. We introduce a new interpretable machine learning method for single-cell mass and flow cytometry studies, FAUST, that robustly performs unbiased cell population discovery and annotation. FAUST processes data on a per-sample basis and returns biologically interpretable cell phenotypes that can be compared across studies, making it well suited for the analysis and integration of complex datasets. We demonstrate how FAUST can be used for candidate biomarker discovery and validation by applying it to a flow cytometry dataset from a Merkel cell carcinoma anti-PD-1 trial, discovering new CD4+ and CD8+ effector-memory T cell correlates of outcome co-expressing PD-1, HLA-DR, and CD28. We then use FAUST to validate these correlates in an independent CyTOF dataset from a published metastatic melanoma trial. Importantly, neither existing state-of-the-art computational discovery approaches nor prior manual analysis detected these or any other statistically significant T cell sub-populations associated with anti-PD-1 treatment in either dataset. We further validate our methodology by using FAUST to replicate the discovery of a previously reported myeloid correlate in a different published melanoma trial, and validate the correlate by identifying it de novo in two additional independent trials. FAUST's phenotypic annotations can be used to perform cross-study data integration in the presence of heterogeneous data and diverse immunophenotyping staining panels, enabling hypothesis-driven inference about cell sub-population abundance through a multivariate modeling framework we call Phenotypic and Functional Differential Abundance (PFDA).
We demonstrate this approach on data from myeloid and T cell panels across multiple trials. Together, these results establish FAUST as a powerful and versatile new approach for unbiased discovery in single-cell cytometry.

Methodology Proposal of ADHD Classification of Children Based on Cross Recurrence Plots

10.21203/rs.3.rs-163507/v1 ◽

2021 ◽

Author(s):

Marco Aceves-Fernandez

Keyword(s):

Machine Learning ◽

Spectral Distribution ◽

High Dimensional ◽

Control Group ◽

Machine Learning Method ◽

Learning Method ◽

Recurrence Plots ◽

EEG Signals ◽

Power Spectral

Abstract Dealing with electroencephalogram (EEG) signals is often not easy. The unpredictability and complexity of such non-stationary, noisy, high-dimensional signals make them challenging to analyze. Cross Recurrence Plots (CRP) have been used extensively to detect subtle changes in signals, even when noise is embedded in the signal. In this contribution, a total of 121 children performed visual attention experiments, and a proposed methodology using CRP and a Welch power spectral distribution was used to classify them into those with ADHD and the control group. Additional tools were presented to determine to what extent the proposed methodology classifies accurately and avoids misclassifications, demonstrating that this methodology is feasible for classifying EEG signals from subjects with ADHD. Lastly, the results were compared with a baseline machine learning method to show experimentally that the methodology is consistent and the results repeatable.
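For readers unfamiliar with CRPs, here is a minimal sketch of the textbook thresholded cross recurrence matrix: after time-delay embedding both signals, entry (i, j) is 1 when state i of the first signal lies within radius `eps` of state j of the second. The embedding dimension, delay and `eps` are hypothetical parameters; the abstract does not give the author's settings or describe how the Welch spectral features are combined with the CRP, so this only illustrates the CRP construction itself.

```python
import math

def embed(signal, dim=2, delay=1):
    """Time-delay embedding: state k = (s[k], s[k+delay], ..., s[k+(dim-1)*delay])."""
    n = len(signal) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("signal too short for this embedding")
    return [tuple(signal[k + m * delay] for m in range(dim)) for k in range(n)]

def cross_recurrence_matrix(x, y, eps, dim=2, delay=1):
    """CR[i][j] = 1 if embedded states x_i and y_j are within eps (Euclidean), else 0."""
    xs, ys = embed(x, dim, delay), embed(y, dim, delay)
    return [[1 if math.dist(a, b) <= eps else 0 for b in ys] for a in xs]

def recurrence_rate(cr):
    """Fraction of recurrent points, the simplest CRP summary statistic."""
    total = sum(len(row) for row in cr)
    return sum(sum(row) for row in cr) / total

# Hypothetical short signals standing in for two EEG channels or trials
x = [0, 1, 0, 1, 0]
y = [0, 1, 0, 1, 0]
cr = cross_recurrence_matrix(x, y, eps=0.1, dim=2, delay=1)
for row in cr:
    print(row)
print(recurrence_rate(cr))
```

Richer CRP measures (e.g., statistics of diagonal line structures) are built on the same binary matrix.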

A unified machine learning method for task-related and resting state fMRI data analysis

2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society ◽

10.1109/embc.2014.6945099 ◽

2014 ◽

Author(s):

Xiaomu Song ◽

Nan-kuei Chen

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Resting State ◽

Resting State fMRI ◽

fMRI Data ◽

Machine Learning Method ◽

Learning Method ◽

fMRI Data Analysis

Studying the capabilities of the analytical system based on the machine learning method

Radio Industry (Russia) ◽

10.21778/2413-9599-2020-30-3-112-126 ◽

2020 ◽

Vol 30 (3) ◽

pp. 112-126

Author(s):

S. V. Palmov

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Predictive Models ◽

Arithmetic Mean ◽

Machine Learning Method ◽

Learning Method ◽

Learning Tools ◽

Analytical System ◽

Reliability And Robustness ◽

Almost All

Data analysis carried out with machine learning tools now covers almost all areas of human activity. This is due to the large amount of data that must be processed, for example, to predict specific events (an emergency, a customer contacting an organization's technical support, a natural disaster, etc.) or to formulate recommendations for interacting with a certain group of people (personalized offers for a customer, a person's reaction to advertising, etc.). The paper examines the capabilities of the Multitool analytical system, built on the "decision tree" machine learning method, for building predictive models suitable for solving practical data analysis problems. To this end, a series of ten experiments was conducted in which the results generated by the system were evaluated for reliability and robustness using five criteria: arithmetic mean, standard deviation, variance, probability, and F-measure. The experiments showed that Multitool, despite its limited functionality, can create predictive models of sufficient quality for practical use.
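Several of the listed criteria are standard summaries that can be made concrete. A minimal sketch, assuming each experiment yields binary-classification confusion counts: compute an F-measure per run, then the arithmetic mean, standard deviation and variance across runs. The counts are hypothetical (three runs rather than the paper's ten, to keep it short), the use of sample rather than population statistics is an assumption, and the abstract's "probability" criterion is omitted because it is not defined there.

```python
import statistics

def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from confusion counts; beta=1 gives the usual F1."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def summarize(scores):
    """Spread of a metric across repeated experiments (sample statistics, n >= 2)."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "variance": statistics.variance(scores),
    }

# Hypothetical (tp, fp, fn) counts from repeated runs; not the paper's results
runs = [(40, 5, 7), (38, 6, 9), (42, 4, 6)]
f_scores = [f_measure(*r) for r in runs]
print(summarize(f_scores))
```

A small standard deviation across runs is one simple indication that the models are robust to the variation between experiments.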

Data analysis for a set of university student lists using the k-Nearest Neighbors machine learning method

Journal of Physics Conference Series ◽

10.1088/1742-6596/1514/1/012011 ◽

2020 ◽

Vol 1514 ◽

pp. 012011

Author(s):

D Pedrozo ◽

F Barajas ◽

A Estupiñán ◽

K L Cristiano ◽

D A Triana

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Nearest Neighbors ◽

University Student ◽

Machine Learning Method ◽

Learning Method ◽

K Nearest Neighbors

New Ensemble Machine Learning Method for Classification and Prediction on Gene Expression Data

2006 International Conference of the IEEE Engineering in Medicine and Biology Society ◽

10.1109/iembs.2006.4398195 ◽

2006 ◽

Author(s):

Ching Wei Wang

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Machine Learning Method ◽

Learning Method ◽

Expression Data ◽

Ensemble Machine Learning

New interpretable machine-learning method for single-cell data reveals correlates of clinical response to cancer immunotherapy

Patterns ◽

10.1016/j.patter.2021.100372 ◽

2021 ◽

pp. 100372

Author(s):

Evan Greene ◽

Greg Finak ◽

Leonard A. D'Amico ◽

Nina Bhardwaj ◽

Candice D. Church ◽

...

Keyword(s):

Machine Learning ◽

Single Cell ◽

Clinical Response ◽

Cancer Immunotherapy ◽

Machine Learning Method ◽

Learning Method ◽

Interpretable Machine Learning ◽

Cell Data

Cell type prioritization in single-cell data

10.1101/2019.12.20.884916 ◽

2019 ◽

Cited By ~ 1

Author(s):

Michael A. Skinnider ◽

Jordan W. Squair ◽

Claudia Kathe ◽

Mark A. Anderson ◽

Matthieu Gautier ◽

...

Keyword(s):

Single Cell ◽

Neural Circuits ◽

Cell Types ◽

Chromatin Accessibility ◽

High Dimensional ◽

Machine Learning Method ◽

Learning Method ◽

RNA-Seq ◽

Cell Type ◽

Cell Data

We present a machine-learning method to prioritize the cell types most responsive to biological perturbations within high-dimensional single-cell data. We validate our method, Augur (https://github.com/neurorestore/Augur), on a compendium of single-cell RNA-seq, chromatin accessibility, and imaging transcriptomics datasets. We apply Augur to expose the neural circuits that enable walking after paralysis in response to spinal cord neurostimulation.
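The abstract states the goal (rank cell types by how strongly they respond to a perturbation) but not the mechanics. One simple way to make "responsiveness" concrete, used here purely as an illustrative assumption and not as Augur's implementation, is to ask how well perturbed and control cells of each type can be told apart. The sketch scores that separability with a rank-based AUC on a single hypothetical per-cell score; the cell types, conditions and scores are made up, and a real analysis would work from the full high-dimensional measurements.

```python
from collections import defaultdict

def auc(pos, neg):
    """P(random perturbed cell scores above random control cell); ties count 0.5."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def prioritize(cells):
    """cells: iterable of (cell_type, condition, score), condition 'perturbed' or 'control'.
    Returns (cell_type, separability) pairs, most separable first."""
    groups = defaultdict(lambda: {"perturbed": [], "control": []})
    for cell_type, condition, score in cells:
        groups[cell_type][condition].append(score)
    ranking = []
    for cell_type, g in groups.items():
        if g["perturbed"] and g["control"]:  # need both conditions to compare
            a = auc(g["perturbed"], g["control"])
            # Separability is symmetric: AUC 0.2 is as informative as 0.8
            ranking.append((cell_type, max(a, 1.0 - a)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)

# Hypothetical cells: (type, condition, per-cell score)
cells = [
    ("neuron", "perturbed", 5), ("neuron", "perturbed", 6), ("neuron", "perturbed", 7),
    ("neuron", "control", 1), ("neuron", "control", 2), ("neuron", "control", 3),
    ("glia", "perturbed", 2), ("glia", "perturbed", 3),
    ("glia", "control", 2), ("glia", "control", 3),
]
print(prioritize(cells))
```

An AUC near 0.5 means the perturbation left that cell type indistinguishable from control, so it ranks last.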

Predicting candidate genes from phenotypes, functions and anatomical site of expression

Bioinformatics ◽

10.1093/bioinformatics/btaa879 ◽

2020 ◽

Author(s):

Jun Chen ◽

Azza Althagafi ◽

Robert Hoehndorf

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Prioritization ◽

Supplementary Information ◽

Model Organisms ◽

Anatomical Site ◽

Machine Learning Method ◽

Gene Products ◽

Learning Method ◽

Biomedical Ontologies

Abstract

Motivation: Over the past years, many computational methods have been developed that incorporate information about phenotypes into the disease–gene prioritization task. These methods generally compute the similarity between a patient's phenotypes and a database of gene–phenotype associations to find the most phenotypically similar match. The main limitation of these methods is their reliance on knowledge about the phenotypes associated with particular genes, which is incomplete in humans as well as in many model organisms, such as mouse and fish. Information about the functions of gene products and the anatomical site of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine-learning models.

Results: We developed a novel graph-based machine-learning method for biomedical ontologies that is able to exploit axioms in ontologies and other graph-structured data. Using this method, we embed genes based on their associated phenotypes, the functions of their gene products and the anatomical location of their expression. We then developed a machine-learning model that predicts gene–disease associations based on the associations between genes and multiple biomedical ontologies; this model significantly improves over state-of-the-art methods. Furthermore, we significantly extend phenotype-based gene prioritization to all genes that are associated with phenotypes, functions or sites of expression.

Availability and implementation: Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec.

Supplementary information: Supplementary data are available at Bioinformatics online.
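The Motivation paragraph describes the conventional approach: score each gene by how similar its known phenotypes are to the patient's, then rank. Below is a minimal sketch of that baseline, with plain set overlap (the Jaccard index) standing in for the ontology-aware semantic similarity that real tools use; it is not DL2Vec, and the gene and phenotype identifiers are hypothetical. It also makes the stated limitation visible: a gene with no phenotype annotations can never score above zero, which is the gap the abstract addresses by also embedding gene function and expression-site annotations.

```python
def jaccard(a, b):
    """Set overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_genes(patient_phenotypes, gene_phenotypes):
    """Rank genes by phenotype similarity to the patient (highest first, ties by name)."""
    scores = [
        (gene, jaccard(patient_phenotypes, annotations))
        for gene, annotations in gene_phenotypes.items()
    ]
    return sorted(scores, key=lambda item: (-item[1], item[0]))

# Hypothetical patient and gene annotations
patient = {"phenotype_1", "phenotype_2"}
genes = {
    "GENE_A": {"phenotype_1", "phenotype_2", "phenotype_3"},
    "GENE_B": {"phenotype_3"},
    "GENE_C": set(),  # no known phenotypes: invisible to this baseline
}
for gene, score in rank_genes(patient, genes):
    print(gene, round(score, 3))
```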
