Opening the Black Box: Interpretable Machine Learning for Geneticists

Author(s):  
Christina B. Azodi ◽  
Jiliang Tang ◽  
Shin-Han Shiu

Machine learning (ML) has emerged as a critical tool for making sense of the growing amount of genetic and genomic data because of its ability to find complex patterns in high-dimensional and heterogeneous data. While the complexity of ML models is what makes them powerful, it also makes them difficult to interpret. Fortunately, recent efforts to develop approaches that make the inner workings of ML models understandable to humans have improved our ability to derive novel biological insights using ML. Here we discuss the importance of interpretable ML, different strategies for interpreting ML models, and examples of how these strategies have been applied. Finally, we identify challenges and promising future directions for interpretable ML in genetics and genomics.
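
To make the idea of an interpretation strategy concrete, the following is a minimal, hypothetical sketch of one common model-agnostic approach (permutation feature importance) applied to synthetic genotype-like data; the mock SNP features, labels, and model choice are illustrative assumptions and are not taken from the review.

```python
# Minimal sketch: permutation feature importance on mock genotype data.
# All data and names (e.g. "SNP_i") are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(500, 100)).astype(float)   # 500 samples x 100 mock SNPs (0/1/2)
y = (X[:, 0] + X[:, 5] > 2).astype(int)                  # phenotype driven by two loci

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature and measure the drop in held-out accuracy:
# large drops flag features the model relies on.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
top = np.argsort(result.importances_mean)[::-1][:5]
for i in top:
    print(f"SNP_{i}: importance = {result.importances_mean[i]:.3f}")
```

Features whose shuffling causes the largest drop in held-out performance are the ones the model relies on most, which is the kind of signal that can then be followed up biologically.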

Author(s):  
Kacper Sokol ◽  
Peter Flach

Understanding data, models and predictions is important for machine learning applications. Due to the limitations of our spatial perception and intuition, analysing high-dimensional data is inherently difficult. Furthermore, black-box models achieving high predictive accuracy are widely used, yet the logic behind their predictions is often opaque. Textualisation -- a natural-language narrative of selected phenomena -- can tackle these shortcomings. When extended with argumentation theory, we can envisage machine learning models and predictions that argue persuasively for their choices.
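
As a rough illustration of what a textualised explanation might look like, below is a hypothetical sketch that narrates the top per-feature contributions behind a single prediction of a linear model; the dataset, wording template, and `textualise` helper are illustrative assumptions, not the authors' system.

```python
# Hypothetical sketch of "textualisation": turn a linear model's per-feature
# contributions for one prediction into a short natural-language narrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

def textualise(x, top_k=3):
    """Narrate the top feature contributions behind a single prediction."""
    scaler = pipe.named_steps["standardscaler"]
    clf = pipe.named_steps["logisticregression"]
    z = scaler.transform(x.reshape(1, -1))[0]
    contrib = clf.coef_[0] * z                      # per-feature contribution to the logit
    order = np.argsort(np.abs(contrib))[::-1][:top_k]
    label = data.target_names[pipe.predict(x.reshape(1, -1))[0]]
    parts = [f"{data.feature_names[i]} pushed the score {'up' if contrib[i] > 0 else 'down'}"
             for i in order]
    return f"The model predicted '{label}' mainly because " + "; ".join(parts) + "."

print(textualise(X[0]))
```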


Author(s):  
Christopher J. Hansen ◽  
Dominic DiCostanzo ◽  
Randall J. Mumaw ◽  
Emily S. Patterson

The fields of healthcare and aviation can learn from one another about alerts and their potential for effective application through predictive analytics. We conducted a series of interactive discussions between an expert in alerts in aviation cockpits and graduate students specializing in the application of machine learning in healthcare, and particularly with respect to image analysis. We present our findings regarding insights for healthcare on alerts and for aviation on machine learning. Our findings suggest that ‘opening up the black box’ is important for highly skilled pilots to be able to process recommendations from complex algorithms in aviation, and that considering whether an alert or alarm is ‘actionable’ is important when directing the attention of nurses caring for more than one patient at a time in a hospital environment.


Author(s):  
Qianfan Wu ◽  
Adel Boueiz ◽  
Alican Bozkurt ◽  
Arya Masoomi ◽  
Allan Wang ◽  
...  

Predicting disease status for a complex human disease using genomic data is an important, yet challenging, step in personalized medicine. Among many challenges, the so-called curse of dimensionality leads to unsatisfactory performance for many state-of-the-art machine learning algorithms. A major recent advance in machine learning is the rapid development of deep learning algorithms that can efficiently extract meaningful features from high-dimensional and complex datasets through a stacked, hierarchical learning process. Deep learning has shown breakthrough performance in several areas, including image recognition, natural language processing, and speech recognition. However, the performance of deep learning in predicting disease status from genomic datasets is still not well studied. In this article, we review four relevant articles identified through a thorough literature search. All four articles used auto-encoders to project high-dimensional genomic data into a low-dimensional space and then applied state-of-the-art machine learning algorithms to predict disease status from the low-dimensional representations. This deep learning approach outperformed existing prediction approaches, such as those based on probe-wise screening or on principal component analysis. We also discuss the limitations of the current deep learning approach and possible improvements.
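
The pipeline described in the review (an auto-encoder for dimensionality reduction, followed by a conventional classifier on the learned codes) can be sketched roughly as follows; the layer sizes, mock expression data, and hyperparameters are illustrative placeholders rather than the settings used in the reviewed articles.

```python
# Sketch: auto-encoder compresses high-dimensional (mock) expression data,
# then a standard classifier predicts disease status from the learned codes.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2000)).astype("float32")   # 600 samples x 2000 mock probes
y = (X[:, :10].sum(axis=1) > 0).astype(int)          # synthetic disease label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Auto-encoder: 2000 -> 64 -> 2000, trained to reconstruct its own input.
inputs = tf.keras.Input(shape=(2000,))
code = tf.keras.layers.Dense(64, activation="relu")(inputs)
recon = tf.keras.layers.Dense(2000)(code)
autoencoder = tf.keras.Model(inputs, recon)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_tr, X_tr, epochs=20, batch_size=32, verbose=0)

# Encode both splits and train a conventional classifier on the 64-d codes.
encoder = tf.keras.Model(inputs, code)
clf = SVC().fit(encoder.predict(X_tr), y_tr)
print("test accuracy:", clf.score(encoder.predict(X_te), y_te))
```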


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 218936-218953
Author(s):  
Jose Tapia-Galisteo ◽  
Jose M. Iniesta ◽  
Carmen Perez-Gandia ◽  
Gema Garcia-Saez ◽  
Diego Urgeles Puertolas ◽  
...  

2018 ◽  
Vol 66 (4) ◽  
pp. 283-290 ◽  
Author(s):  
Johannes Brinkrolf ◽  
Barbara Hammer

Classification by means of machine learning models constitutes one relevant technology in process automation and predictive maintenance. However, common techniques such as deep networks or random forests suffer from their black-box characteristics and susceptibility to adversarial examples. In this contribution, we give an overview of a popular alternative technology from machine learning, namely modern variants of learning vector quantization (LVQ), which, due to their combined discriminative and generative nature, offer interpretability and the possibility of explicit reject options for irregular samples. We give an explicit bound on the minimum change required to alter the classification of an LVQ network with a reject option, and we demonstrate the efficiency of reject options in two examples.
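
To illustrate the flavour of an LVQ classifier with a reject option, here is a toy, hypothetical sketch: prototypes are adapted with LVQ1-style updates, and a sample is rejected when the relative distance margin between the two nearest prototypes falls below a threshold. The update rule, threshold, and data are illustrative and do not reproduce the bound derived in the paper.

```python
# Toy sketch: LVQ1-style classifier with a distance-margin reject option.
import numpy as np

class LVQWithReject:
    def __init__(self, lr=0.05, epochs=30, reject_threshold=0.1, seed=0):
        self.lr, self.epochs, self.tau = lr, epochs, reject_threshold
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # one prototype per class, initialised at the class mean
        self.w_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        for _ in range(self.epochs):
            for i in self.rng.permutation(len(X)):
                d = np.linalg.norm(self.w_ - X[i], axis=1)
                j = d.argmin()
                sign = 1.0 if self.classes_[j] == y[i] else -1.0   # LVQ1 update
                self.w_[j] += sign * self.lr * (X[i] - self.w_[j])
        return self

    def predict(self, X):
        out = []
        for x in X:
            d = np.linalg.norm(self.w_ - x, axis=1)
            d1, d2 = np.sort(d)[:2]
            margin = (d2 - d1) / (d2 + d1)          # relative distance margin
            out.append(None if margin < self.tau else self.classes_[d.argmin()])
        return out

# tiny demonstration: two Gaussian blobs plus an ambiguous point in between
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = LVQWithReject().fit(X, y)
print(model.predict(np.array([[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0]])))  # None marks a reject
```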


2021 ◽  
Author(s):  
Marcelo Cajias ◽  
Willwersch Jonas ◽  
Lorenz Felix ◽  
Franz Fuerst


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1861
Author(s):  
João Brito ◽  
Hugo Proença

Interpretability has made significant strides in recent years, enabling formerly black-box models to reach new levels of transparency. Such models can be particularly useful for broadening the applicability of machine learning-based systems to domains where, apart from the predictions, appropriate justifications are also required (e.g., forensics and medical image analysis). In this context, techniques that focus on visual explanations are of particular interest, due to their ability to directly portray the reasons that support a given prediction. Therefore, in this document, we present the core principles of interpretability and describe the main methods that deliver visual cues, including one that we designed specifically for periocular recognition. Based on these intuitions, our experiments show explanations that attempt to highlight the periocular components most important to a non-match decision. Finally, some particularly challenging scenarios are presented to support our conclusions and thoughts regarding future directions.
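
As a rough illustration of a visual-explanation technique of the kind discussed, the sketch below computes a simple gradient-based saliency map for an untrained toy CNN; the architecture, random input, and "match score" output are placeholders, not the models or data used in the paper.

```python
# Sketch: gradient-based saliency map for a toy, untrained (peri)ocular matcher.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # stand-in match / non-match score
])

image = tf.convert_to_tensor(np.random.rand(1, 64, 64, 1).astype("float32"))

with tf.GradientTape() as tape:
    tape.watch(image)
    score = model(image)[0, 0]          # predicted (non-)match probability

# |d score / d pixel|: large values mark pixels that most influence the decision.
saliency = tf.abs(tape.gradient(score, image))[0, :, :, 0].numpy()
print("most influential pixel:", np.unravel_index(saliency.argmax(), saliency.shape))
```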

