SMILE: systems metabolomics using interpretable learning and evolution

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Chengyuan Sha ◽  
Miroslava Cuperlovic-Culf ◽  
Ting Hu

Abstract. Background: The direct link between metabolism and cell and organism phenotype in health and disease makes metabolomics, a high-throughput study of small-molecule metabolites, an essential methodology for understanding and diagnosing disease development and progression. Machine learning methods have seen increasing adoption in metabolomics thanks to their powerful predictive abilities. However, the “black-box” nature of many machine learning models remains a major obstacle to wide acceptance and utility because it makes the decision process difficult to interpret. This challenge is particularly acute in biomedical research, where understanding the underlying decision-making mechanism is essential for ensuring safety and gaining new knowledge. Results: In this article, we propose a novel computational framework, Systems Metabolomics using Interpretable Learning and Evolution (SMILE), for supervised metabolomics data analysis. Our methodology uses an evolutionary algorithm to learn interpretable predictive models and to identify the most influential metabolites and their interactions in association with disease. Moreover, we have developed a web application with a graphical user interface that can be used for easy analysis, interpretation and visualization of the results. The performance of the method and the use of the web interface are demonstrated on metabolomics data for Alzheimer’s disease (AD). Conclusions: SMILE was able to identify several metabolites influential in AD and to provide interpretable predictive models that can be used to better understand the metabolic background of AD. SMILE addresses the emerging issue of interpretability and explainability in machine learning, and contributes to more transparent and powerful applications of machine learning in bioinformatics.

2021 ◽  
Author(s):  
Marco Del Giudice

In this paper, I highlight a problem that has become ubiquitous in scientific applications of machine learning methods, and can lead to seriously distorted inferences about the phenomena under study. I call it the prediction-explanation fallacy. The fallacy occurs when researchers use prediction-optimized models for explanatory purposes, without considering the tradeoffs between explanation and prediction. This is a problem for at least two reasons. First, prediction-optimized models are often deliberately biased and unrealistic in order to prevent overfitting, and hence fail to accurately explain the phenomenon of interest. In other cases, they have an exceedingly complex structure that is hard or impossible to interpret, which greatly limits their explanatory value. Second, different predictive models trained on the same or similar data can be biased in different ways, so that multiple models may predict equally well but suggest conflicting explanations of the underlying phenomenon. In this note I introduce the tradeoffs between prediction and explanation in a non-technical fashion, present some illustrative examples from neuroscience, and end by discussing some mitigating factors and methods that can be used to limit or circumvent the problem.


2022 ◽  
Vol 2 (14) ◽  
pp. 26-34
Author(s):  
Nguyen Manh Thang ◽  
Tran Thi Luong

Abstract: Most applications being developed aim to be as accessible as possible to users on the Internet. Many applications store their data in cyberspace to support more effective work and entertainment, such as Google Docs, email, cloud storage, maps, weather, and news. Attacks on Web resources most often occur at the application level, in the form of HTTP/HTTPS requests to the site, where traditional firewalls have limited capabilities for analyzing and detecting attacks. Special tools, Web Application Firewalls (WAFs), exist to protect Web resources from application-level attacks. This article presents an anomaly detection algorithm and describes how it works in the open-source web application firewall ModSecurity, using machine learning methods with 8 proposed features to detect attacks on web applications.


2022 ◽  
Author(s):  
Marcus Kubsch ◽  
Christina Krist ◽  
Joshua Rosenberg

Machine learning has become commonplace in educational research and science education research, especially to support assessment efforts. Such applications of machine learning have shown their promise in replicating and scaling human-driven codes of students’ work. Despite this promise, we and other scholars argue that machine learning has not achieved its transformational potential. We argue that this is because our field is currently lacking frameworks for supporting creative, principled, and critical endeavors to use machine learning in science education research. To offer considerations for science education researchers’ use of ML, we present a framework, Distributing Epistemic Functions and Tasks (DEFT), that highlights the functions and tasks that pertain to generating knowledge that can be carried out by either trained researchers or machine learning algorithms. Such considerations are critical decisions that should occur alongside those about, for instance, the type of data or algorithm used. We apply this framework to two cases, one that exemplifies the cutting-edge use of machine learning in science education research and another that offers a wholly different means of using machine learning and human-driven inquiry together. We conclude with strategies for researchers to adopt machine learning and call for the field to rethink how we prepare science education researchers in an era of great advances in computational power and access to machine learning methods.


2019 ◽  
Vol 24 (34) ◽  
pp. 3998-4006
Author(s):  
Shijie Fan ◽  
Yu Chen ◽  
Cheng Luo ◽  
Fanwang Meng

Background: On the tide of big data, machine learning is coming into its own. Drawing on the huge amounts of epigenetic data coming from biological experiments and the clinic, machine learning can help detect epigenetic features in the genome, find correlations between phenotypes and modifications in histones or genes, accelerate the screening of lead compounds targeting epigenetic diseases, and support many other aspects of epigenetics research, thereby advancing the hope of precision medicine. Methods: In this minireview, we review the fundamentals and applications of machine learning methods that are regularly used in the epigenetics field and explain their features. Their advantages and disadvantages are also discussed. Results: Machine learning algorithms have accelerated studies in precision medicine targeting epigenetic diseases. Conclusion: To make full use of machine learning algorithms, one should become familiar with their pros and cons, so that the most suitable method(s) can be chosen to benefit from big data.


2020 ◽  
Vol 4 (2) ◽  
pp. 61
Author(s):  
Yi Di Boon ◽  
Sunil Chandrakant Joshi ◽  
Somen Kumar Bhudolia ◽  
Goram Gohel

Advanced manufacturing techniques, such as automated fiber placement and additive manufacturing, enable the fabrication of fiber-reinforced polymer composite components with customized material and structural configurations. In order to take advantage of this customizability, the design process for fiber-reinforced polymer composite components needs to be improved. Machine learning methods have been identified as potential techniques capable of handling the complexity of the design problem. In this review, the applications of machine learning methods in various aspects of structural component design are discussed. They include studies on microstructure-based material design, applications of machine learning models in stress analysis, and topology optimization of fiber-reinforced polymer composites. A design automation framework for performance-optimized fiber-reinforced polymer composite components is also proposed. The proposed framework aims to provide a comprehensive and efficient approach for the design and optimization of fiber-reinforced polymer composite components. The challenges in building the models required for the proposed framework are also discussed briefly.


2017 ◽  
Vol 114 (40) ◽  
pp. 10601-10605 ◽  
Author(s):  
Daniel M. Sussman ◽  
Samuel S. Schoenholz ◽  
Ekin D. Cubuk ◽  
Andrea J. Liu

Nanometrically thin glassy films depart strikingly from the behavior of their bulk counterparts. We investigate whether the dynamical differences between a bulk and thin film polymeric glass former can be understood by differences in local microscopic structure. Machine learning methods have shown that local structure can serve as the foundation for successful, predictive models of particle rearrangement dynamics in bulk systems. By contrast, in thin glassy films, we find that particles at the center of the film and those near the surface are structurally indistinguishable despite exhibiting very different dynamics. Next, we show that structure-independent processes, already present in bulk systems and demonstrably different from simple facilitated dynamics, are crucial for understanding glassy dynamics in thin films. Our analysis suggests a picture of glassy dynamics in which two dynamical processes coexist, with relative strengths that depend on the distance from an interface. One of these processes depends on local structure and is unchanged throughout most of the film, while the other is purely Arrhenius, does not depend on local structure, and is strongly enhanced near the free surface of a film.


2017 ◽  
Author(s):  
Fadhl M Alakwaa ◽  
Kumardeep Chaudhary ◽  
Lana X Garmire

Abstract: Metabolomics holds promise as a new technology for diagnosing highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine learning based classification methods. However, it remains unknown whether deep neural networks, a class of increasingly popular machine learning methods, are suitable for classifying metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 estrogen receptor positive (ER+) and 67 estrogen receptor negative (ER-), to test the accuracy of an autoencoder, a deep learning (DL) framework, as well as six widely used machine learning models, namely Random Forest (RF), Support Vector Machines (SVM), Recursive Partitioning and Regression Trees (RPART), Linear Discriminant Analysis (LDA), Prediction Analysis for Microarrays (PAM), and Generalized Boosted Models (GBM). The DL framework has the highest area under the curve (AUC), 0.93, in classifying ER+/ER- patients, compared to the other six machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals eight commonly enriched significant metabolomics pathways (adjusted P-value<0.05) that cannot be discovered by other machine learning methods. Among them, the protein digestion & absorption and ATP-binding cassette (ABC) transporters pathways are also confirmed in an integrated analysis of metabolomics and gene expression data in these samples. In summary, the deep learning method shows advantages for metabolomics-based breast cancer ER status classification, with both the highest prediction accuracy (AUC=0.93) and better revelation of disease biology. We encourage the adoption of autoencoder-based deep learning methods in the metabolomics research community for classification.
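
For readers unfamiliar with the AUC figure quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed for a binary classifier. It is not the authors' pipeline: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration. The sketch uses the pairwise interpretation of AUC, the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 3 positives, 2 negatives.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The quadratic pairwise loop is chosen for readability; for cohorts larger than a few thousand samples a rank-based formulation or a library routine would be the more practical choice.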


2021 ◽  
Vol 4 (s1) ◽  
Author(s):  
Cecile Valsecchi ◽  
Francesca Grisoni ◽  
Viviana Consonni ◽  
Davide Ballabio ◽  
Roberto Todeschini

Nuclear receptors (NRs) are involved in fundamental human health processes and are a relevant target for toxicological risk assessment. Computational models can be a useful tool for prioritizing chemicals that can mimic natural hormones and act as endocrine disruptors [1,2]. In this work we i) created an exhaustive collection of NR modulators and ii) applied machine learning methods to fill the data gap and prioritize NR modulators by building predictive models.

