BowSaw: inferring higher-order trait interactions associated with complex biological phenotypes

bioRxiv ◽

10.1101/839357 ◽

2019 ◽

Author(s):

Demetrius DiMucci ◽

Mark Kon ◽

Daniel Segrè

Keyword(s):

Random Forest ◽

Human Microbiome ◽

Human Microbiome Project ◽

Black Box ◽

Random Forest Algorithm ◽

Biological Complexity ◽

Noise Levels ◽

Mechanistic Insight ◽

System Variables

Machine learning is helping the interpretation of biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, e.g., from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can help search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While, in many instances, treating an algorithm as a black box is sufficient, it is interesting to pursue an enhanced understanding of how system variables end up contributing to a specific output, as an avenue towards new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest algorithm to identify combinations of variables (“rules”) frequently used for classification. We first apply BowSaw to a simulated dataset, and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn’s disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.

BowSaw: Inferring Higher-Order Trait Interactions Associated With Complex Biological Phenotypes

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.663532 ◽

2021 ◽

Vol 8 ◽

Author(s):

Demetrius DiMucci ◽

Mark Kon ◽

Daniel Segrè

Keyword(s):

Random Forest ◽

Human Microbiome ◽

Human Microbiome Project ◽

Black Box ◽

Random Forest Algorithm ◽

Biological Complexity ◽

Noise Levels ◽

Mechanistic Insight ◽

System Variables

Machine learning is helping the interpretation of biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, e.g., from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can help search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While, in many instances, treating an algorithm as a black box is sufficient, it is interesting to pursue an enhanced understanding of how system variables end up contributing to a specific output, as an avenue toward new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest algorithm to identify combinations of variables (“rules”) frequently used for classification. We first apply BowSaw to a simulated dataset and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn’s disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.
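The idea of reading variable combinations off the trees of a forest can be illustrated with a toy sketch. This is not the BowSaw implementation: the tuple-based tree encoding, the function names `path_features` and `count_cooccurring_features`, and the example taxa `A`, `B`, `C` are all hypothetical, chosen only to show how features that appear together on decision paths might be counted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy encoding (an assumption, not BowSaw's data structure):
# a node is either a leaf label (str) or a tuple
# (feature_name, subtree_if_absent, subtree_if_present).

def path_features(tree, sample):
    """Follow one sample down one tree; return (features tested, leaf label)."""
    used = set()
    node = tree
    while not isinstance(node, str):
        feature, if_absent, if_present = node
        used.add(feature)
        node = if_present if sample[feature] else if_absent
    return used, node

def count_cooccurring_features(forest, samples, size=2):
    """Count how often each feature combination shares a decision path."""
    counts = Counter()
    for sample in samples:
        for tree in forest:
            used, _ = path_features(tree, sample)
            for combo in combinations(sorted(used), size):
                counts[combo] += 1
    return counts

# Three hand-written trees standing in for a trained forest.
forest = [
    ("A", "healthy", ("B", "healthy", "disease")),
    ("B", "healthy", ("A", "healthy", "disease")),
    ("C", "healthy", "disease"),
]

# Presence (1) / absence (0) of three made-up taxa in three samples.
samples = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 1},
    {"A": 0, "B": 1, "C": 0},
]

pair_counts = count_cooccurring_features(forest, samples)
for combo, n in pair_counts.most_common():
    print(combo, n)
```

In this toy forest only `A` and `B` are ever tested on the same path, so they surface as the single candidate pair. A real analysis would work on the trees of a fitted model and would need to assess whether such combinations actually separate the classes.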

Image Classification of Rice Leaf Diseases Using Random Forest Algorithm

2021 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunication Engineering ◽

10.1109/ectidamtncon51128.2021.9425696 ◽

2021 ◽

Author(s):

Panuwat Mekha ◽

Nutnicha Teeyasuksaet

Keyword(s):

Random Forest ◽

Image Classification ◽

Random Forest Algorithm ◽

Rice Leaf

Classification of Headache Disorder Using Random Forest Algorithm

2020 4th International Conference on Informatics and Computational Sciences (ICICoS) ◽

10.1109/icicos51170.2020.9299105 ◽

2020 ◽

Author(s):

Dhiyaussalam ◽

Adi Wibowo ◽

Fajar Agung Nugroho ◽

Eko Adi Sarwoko ◽

I Made Agus Setiawan

Keyword(s):

Random Forest ◽

Headache Disorder ◽

Random Forest Algorithm

Semi-supervised classification of hyperspectral image using random forest algorithm

2014 IEEE Geoscience and Remote Sensing Symposium ◽

10.1109/igarss.2014.6947074 ◽

2014 ◽

Cited By ~ 7

Author(s):

S. Amini ◽

S. Homayouni ◽

A. Safari

Keyword(s):

Random Forest ◽

Supervised Classification ◽

Hyperspectral Image ◽

Random Forest Algorithm

Big Data Analysis and Classification of Biomedical Signal Using Random Forest Algorithm

Advances in Intelligent Systems and Computing - New Paradigm in Decision Science and Management ◽

10.1007/978-981-13-9330-3_20 ◽

2019 ◽

pp. 217-224

Author(s):

Saumendra Kumar Mohapatra ◽

Mihir Narayan Mohanty

Keyword(s):

Big Data ◽

Data Analysis ◽

Random Forest ◽

Big Data Analysis ◽

Random Forest Algorithm ◽

Biomedical Signal

Classification of pathological disorders in children using random forest algorithm

2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE) ◽

10.1109/ic-etite47903.2020.253 ◽

2020 ◽

Author(s):

Sujit Bebortta ◽

Manoranjan Panda ◽

Shradhanjali Panda

Keyword(s):

Random Forest ◽

Random Forest Algorithm

Classification of Continuous Sky Brightness Data Using Random Forest

Advances in Astronomy ◽

10.1155/2020/5102065 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Rhorom Priyatikanto ◽

Lidia Mayangsari ◽

Rudi A. Prihandoko ◽

Agustinus G. Admiranto

Keyword(s):

Random Forest ◽

Observation Time ◽

Classification Model ◽

Posterior Distributions ◽

Random Forest Algorithm ◽

Negative Effect ◽

Sky Brightness ◽

Global Statistics ◽

Effect Of Light

Measuring and monitoring sky brightness is required to mitigate the negative effects of light pollution, a byproduct of modern civilization. Handling a large volume of sky brightness data well includes evaluating and classifying the data according to quality and characteristics, so that further analysis and inference can be conducted properly. This study aims to develop a classification model based on the Random Forest algorithm and to evaluate its performance. Using sky brightness data from 1250 nights with minute temporal resolution, acquired at eight different stations in Indonesia, datasets consisting of 15 features were created to train and test the model. These features were extracted from the observation time, the global statistics of nightly sky brightness, or the light curve characteristics. Among them, 10 are considered the most important for the classification task. The model was trained to classify the data into six classes (1: peculiar data, 2: overcast, 3: cloudy, 4: clear, 5: moonlit-cloudy, and 6: moonlit-clear) and then tested, achieving high accuracy (92%) and scores (F-score = 84% and G-mean = 84%). Some misclassifications exist, but the classification results are good overall, as indicated by the posterior distributions of sky brightness as a function of class. Data classified as class 4 have a sharp distribution with a typical full width at half maximum of 1.5 mag/arcsec², while the distributions of classes 2 and 3 are left-skewed, with the latter having a lighter tail. Because of moonlight, the distributions of class 5 and class 6 data are more smeared, that is, they have a larger spread. These results demonstrate that the established classification model is reasonably good and consistent.
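The abstract reports accuracy, F-score, and G-mean without stating how the multi-class scores are averaged. The sketch below assumes one common convention: macro-averaged F1 and the geometric mean of per-class recalls. The paper's own definitions may differ, and the labels and predictions here are made up purely for illustration.

```python
import math

def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn)} computed one-vs-rest for each class."""
    stats = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        stats[c] = (tp, fp, fn)
    return stats

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed averaging convention)."""
    scores = []
    for tp, fp, fn in per_class_counts(y_true, y_pred, labels).values():
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def g_mean(y_true, y_pred, labels):
    """Geometric mean of per-class recalls (assumed definition)."""
    recalls = []
    for tp, _fp, fn in per_class_counts(y_true, y_pred, labels).values():
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return math.prod(recalls) ** (1 / len(recalls))

# Made-up labels using three of the six class codes (2: overcast, 3: cloudy, 4: clear).
labels = [2, 3, 4]
y_true = [2, 2, 3, 3, 4, 4, 4, 4]
y_pred = [2, 3, 3, 3, 4, 4, 4, 2]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
gm = g_mean(y_true, y_pred, labels)
print(f"accuracy={acc:.3f} macro-F1={f1:.3f} G-mean={gm:.3f}")
```

Because G-mean multiplies per-class recalls, a single poorly recovered class pulls it down sharply, which makes it a useful complement to accuracy when classes are imbalanced.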
