High-dimensional role of AI and machine learning in cancer research

2017 ◽

Author(s):

Kacper Sokol ◽

Peter Flach

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Spatial Perception ◽

Black Box ◽

High Dimensional ◽

Box Models ◽

Machine Learning Applications ◽

Black Box Models ◽

Machine Learning Models

Understanding data, models and predictions is important for machine learning applications. Due to the limitations of our spatial perception and intuition, analysing high-dimensional data is inherently difficult. Furthermore, black-box models achieving high predictive accuracy are widely used, yet the logic behind their predictions is often opaque. Textualisation, a natural language narrative of selected phenomena, can tackle these shortcomings. Extending it with argumentation theory, we could envisage machine learning models and predictions arguing persuasively for their choices.

Investigating the role of Simpson’s paradox in the analysis of top-ranked features in high-dimensional bioinformatics datasets

Briefings in Bioinformatics ◽

10.1093/bib/bby126 ◽

2019 ◽

Vol 21 (2) ◽

pp. 421-428 ◽

Cited By ~ 1

Author(s):

Alex A Freitas

Keyword(s):

Machine Learning ◽

High Dimensional ◽

Feature Ranking ◽

Ranking Methods ◽

Small Set ◽

Simpson's Paradox ◽

High Dimensional Datasets ◽

Class Variable

An important problem in bioinformatics is identifying the most important features (or predictors) among the large number of features in a given classification dataset. This problem is often addressed by using a machine learning–based feature ranking method to identify a small set of top-ranked predictors (i.e. the most relevant features for classification). The many studies in this area share, however, an important limitation: they ignore the possibility that the top-ranked predictors occur in an instance of Simpson’s paradox, where the positive or negative association between a predictor and a class variable reverses sign upon conditioning on each of the values of a third (confounder) variable. In this work, we review and investigate the role of Simpson’s paradox in the analysis of top-ranked predictors in high-dimensional bioinformatics datasets, in order to avoid the potential danger of misinterpreting an association between a predictor and the class variable. We perform computational experiments using four well-known feature ranking methods from the machine learning field and five high-dimensional datasets of ageing-related genes, where the predictors are Gene Ontology terms. The results show that occurrences of Simpson’s paradox involving top-ranked predictors are much more common for one of the feature ranking methods.
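
The sign reversal described in this abstract can be illustrated with a small check. The following is a minimal sketch, not the paper's method: it assumes a binary predictor, a binary class and a single categorical confounder, measures association as the difference in positive-class rate between predictor = 1 and predictor = 0, and uses illustrative counts. The names `COUNTS` and `simpsons_paradox` are hypothetical.

```python
from typing import Dict, List, Tuple

# counts[stratum][predictor_value] = (positive-class instances, total instances)
# Illustrative numbers only (an assumption, not data from the paper).
COUNTS: Dict[str, Dict[int, Tuple[int, int]]] = {
    "confounder=a": {1: (81, 87), 0: (234, 270)},
    "confounder=b": {1: (192, 263), 0: (55, 80)},
}


def sign(x: float) -> int:
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def association_sign(pos1: int, n1: int, pos0: int, n0: int) -> int:
    """Sign of P(class=1 | predictor=1) - P(class=1 | predictor=0)."""
    return sign(pos1 / n1 - pos0 / n0)


def simpsons_paradox(
    counts: Dict[str, Dict[int, Tuple[int, int]]]
) -> Tuple[int, List[int], bool]:
    """Return (overall sign, per-stratum signs, whether every stratum reverses)."""
    totals = {0: [0, 0], 1: [0, 0]}
    strata_signs = []
    for by_predictor in counts.values():
        (pos1, n1), (pos0, n0) = by_predictor[1], by_predictor[0]
        strata_signs.append(association_sign(pos1, n1, pos0, n0))
        for value in (0, 1):
            totals[value][0] += by_predictor[value][0]
            totals[value][1] += by_predictor[value][1]
    overall = association_sign(totals[1][0], totals[1][1], totals[0][0], totals[0][1])
    # The paradox holds when the pooled association is non-zero and the
    # association within every stratum has the opposite sign.
    reversed_everywhere = overall != 0 and all(s == -overall for s in strata_signs)
    return overall, strata_signs, reversed_everywhere


overall_sign, stratum_signs, paradox = simpsons_paradox(COUNTS)
print(overall_sign, stratum_signs, paradox)
```

With these counts the predictor looks negatively associated with the class when the strata are pooled, yet positively associated within each stratum, which is the kind of misreading the abstract warns about for top-ranked features.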

Exchange Spin Coupling from Gaussian Process Regression

10.26434/chemrxiv.12589541.v3 ◽

2020 ◽

Author(s):

Marc Philipp Bahlke ◽

Natnael Mogos ◽

Jonny Proppe ◽

Carmen Herrmann

Keyword(s):

Machine Learning ◽

Gaussian Process ◽

Gaussian Process Regression ◽

Molecular Magnets ◽

Molecular Structures ◽

Spin Coupling ◽

Structure Property ◽

Data Set ◽

Uncertainty Estimates

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets for potential application in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger, experimentally relevant complexes. Exchange spin coupling is about as easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor, and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.
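
To show what "uncertainty estimates along with predicted values" means in practice, here is a minimal Gaussian process regression sketch in plain Python. It is a toy under stated assumptions, not the authors' setup: the input is a single made-up scalar rather than a molecular descriptor, the kernel is a squared-exponential (RBF) kernel with fixed hyperparameters, and the names `rbf`, `solve` and `gp_predict` are hypothetical.

```python
import math
from typing import List, Tuple


def rbf(a: float, b: float, length: float = 1.0) -> float:
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gp_predict(
    xs: List[float], ys: List[float], x_star: float, noise: float = 1e-6
) -> Tuple[float, float]:
    """Return the posterior mean and standard deviation at x_star."""
    n = len(xs)
    k = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x, x_star) for x in xs]
    alpha = solve(k, ys)                      # (K + noise*I)^-1 y
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve(k, k_star)                      # (K + noise*I)^-1 k_star
    var = rbf(x_star, x_star) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))     # clamp tiny negative round-off


# Toy training data (an assumption): four samples of sin(x).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [math.sin(x) for x in train_x]

mean_near, std_near = gp_predict(train_x, train_y, 1.0)   # at a training point
mean_far, std_far = gp_predict(train_x, train_y, 10.0)    # far from all data
print(mean_near, std_near)
print(mean_far, std_far)
```

The error bar is small at a training point and grows back towards the prior standard deviation far from the data, which is why such estimates are useful when extrapolating beyond a small training set. A library implementation such as scikit-learn's `GaussianProcessRegressor` would normally be used instead of this hand-rolled solver.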

Leveraging Machine Learning to Characterize the Role of Socio-Economic Determinants on Physical Health and Well-Being Among Veterans

SSRN Electronic Journal ◽

10.2139/ssrn.3686845 ◽

2020 ◽

Author(s):

Christos Makridis ◽

David Zhao ◽

Cosmin (Adi) Bejan ◽

Gil Alterovitz

Keyword(s):

Machine Learning ◽

Physical Health ◽

Well Being ◽

Economic Determinants ◽

Health And Well Being

Classification of Brainwaves for Sleep Stages by High-Dimensional FFT Features from EEG Signals

Applied Sciences ◽

10.3390/app10051797 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1797 ◽

Cited By ~ 2

Author(s):

Mera Kartika Delimayanti ◽

Bedy Purnama ◽

Ngoc Giang Nguyen ◽

Mohammad Reza Faisal ◽

Kunti Robiatul Mahmudah ◽

...

Keyword(s):

Machine Learning ◽

Sleep Stage ◽

Machine Learning Algorithms ◽

High Dimensional ◽

Sleep Stages ◽

Eeg Signals ◽

Stage Classification ◽

Sleep Stage Classification ◽

Low Dimensional

Manual classification of sleep stages is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. Previous work has applied low-dimensional fast Fourier transform (FFT) features with many machine learning algorithms. In this paper, we demonstrate the use of features extracted from EEG signals via FFT to improve the performance of automated sleep stage classification through machine learning methods. Unlike previous work using FFT, we incorporated thousands of FFT features in order to classify the sleep stages into 2–6 classes. Using the expanded version of the Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the-art methods. This result indicates that high-dimensional FFT features in combination with a simple feature selection are effective for improving automated sleep stage classification.
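
The idea of turning a signal epoch into frequency-domain features can be sketched as follows. This is an illustration under assumptions, not the paper's pipeline: it uses a synthetic one-second signal instead of EEG data, an assumed sampling rate of 100 Hz, and a naive discrete Fourier transform (which yields the same magnitudes an FFT would, only more slowly). The names `dft_magnitudes`, `SAMPLING_HZ` and `epoch` are hypothetical.

```python
import cmath
import math
from typing import List

SAMPLING_HZ = 100  # assumed sampling rate, for illustration only


def dft_magnitudes(signal: List[float]) -> List[float]:
    """Magnitude spectrum for frequency bins 0 .. n/2 of a real signal."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(acc))
    return magnitudes


# Synthetic one-second epoch: a pure 10 Hz sine wave.
epoch = [math.sin(2 * math.pi * 10 * t / SAMPLING_HZ) for t in range(SAMPLING_HZ)]

features = dft_magnitudes(epoch)              # one feature per frequency bin
peak_bin = max(range(len(features)), key=features.__getitem__)
peak_hz = peak_bin * SAMPLING_HZ / len(epoch)
print(len(features), peak_hz)
```

Each bin magnitude becomes one feature, so longer epochs yield proportionally more features, which is how a feature vector can reach thousands of dimensions. For real recordings, a fast implementation such as `numpy.fft.rfft` would replace the naive loop.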

Role of machine learning algorithms over heart diseases prediction

10.1063/5.0030743 ◽

2020 ◽

Author(s):

Siva Kumar Jonnavithula ◽

Abhilash Kumar Jha ◽

Modepalli Kavitha ◽

Singaraju Srinivasulu

Keyword(s):

Machine Learning ◽

Heart Diseases ◽

Learning Algorithms ◽

Machine Learning Algorithms

The role of residential history in cancer research: A scoping review

Social Science & Medicine ◽

10.1016/j.socscimed.2020.113657 ◽

2021 ◽

Vol 270 ◽

pp. 113657

Author(s):

S. Namin ◽

Y. Zhou ◽

J. Neuner ◽

K. Beyer

Keyword(s):

Cancer Research ◽

Scoping Review ◽

Residential History

The role of machine learning analytics and metrics in retailing research

Journal of Retailing ◽

10.1016/j.jretai.2020.12.001 ◽

2020 ◽

Author(s):

Xin (Shane) Wang ◽

Jun Hyun (Joseph) Ryoo ◽

Neil Bendle ◽

Praveen K. Kopalle

Keyword(s):

Machine Learning ◽

Learning Analytics

The Role of Network Science in Glioblastoma

Cancers ◽

10.3390/cancers13051045 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1045

Author(s):

Marta B. Lopes ◽

Eduarda P. Martins ◽

Susana Vinga ◽

Bruno M. Costa

Keyword(s):

Personalized Medicine ◽

Drug Development ◽

Information Flow ◽

Clinical Studies ◽

Network Science ◽

Cancer Genomics ◽

High Dimensional ◽

Network Discovery ◽

Software Implementations

Network science has long been recognized as a well-established discipline across many biological domains. In the particular case of cancer genomics, network discovery is challenged by the multitude of available high-dimensional heterogeneous views of data. Glioblastoma (GBM) is an example of such a complex and heterogeneous disease that can be tackled by network science. Identifying the architecture of molecular GBM networks is essential to understanding the information flow and better informing drug development and pre-clinical studies. Here, we review network-based strategies that have been used in the study of GBM, along with the available software implementations for reproducibility and further testing on new datasets. Promising results have been obtained from both bulk and single-cell GBM data, placing network discovery at the forefront of developing molecularly informed personalized medicine.

Whither AutoML? Understanding the Role of Automation in Machine Learning Workflows

Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems ◽

10.1145/3411764.3445306 ◽

2021 ◽

Author(s):

Doris Xin ◽

Eva Yiwei Wu ◽

Doris Jung-Lin Lee ◽

Niloufar Salehi ◽

Aditya Parameswaran

Keyword(s):

Machine Learning

The Role of Textualisation and Argumentation in Understanding the Machine Learning Process