Machine learning in prognosis research

Author(s):  
Mihaela van der Schaar ◽  
Harry Hemingway

Machine learning offers an alternative to the methods for prognosis research in large and complex datasets and for delivering dynamic models of prognosis. Machine learning foregrounds the capacity to learn from large and complex data about the pathways, predictors, and trajectories of health outcomes in individuals. This reflects wider societal drives for data-driven modelling embedded and automated within powerful computers to analyse large amounts of data. Machine learning derives algorithms that can learn from data and can allow the data full freedom, for example, to follow a pragmatic approach in developing a prognostic model. Rather than choosing factors for model development in advance, machine learning allows the data to reveal which features are important for which predictions. This chapter introduces key machine learning concepts relevant to each of the four prognosis research types, explains where it may enhance prognosis research, and highlights challenges.

2021 ◽  
Vol 4 ◽  
Author(s):  
Shailesh Tripathi ◽  
David Muhr ◽  
Manuel Brunner ◽  
Herbert Jodlbauer ◽  
Matthias Dehmer ◽  
...  

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely accepted framework in production and manufacturing. This data-driven knowledge discovery framework provides an orderly partition of the often complex data mining processes to ensure a practical implementation of data analytics and machine learning models. However, the practical application of robust industry-specific data-driven knowledge discovery models faces multiple data- and model development-related issues. These issues need to be carefully addressed by allowing a flexible, customized and industry-specific knowledge discovery framework. For this reason, extensions of CRISP-DM are needed. In this paper, we provide a detailed review of CRISP-DM and summarize extensions of this model into a novel framework we call Generalized Cross-Industry Standard Process for Data Science (GCRISP-DS). This framework is designed to allow dynamic interactions between different phases to adequately address data- and model-related issues for achieving robustness. Furthermore, it emphasizes also the need for a detailed business understanding and the interdependencies with the developed models and data quality for fulfilling higher business objectives. Overall, such a customizable GCRISP-DS framework provides an enhancement for model improvements and reusability by minimizing robustness-issues.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Xisong Liang ◽  
Zeyu Wang ◽  
Ziyu Dai ◽  
Hao Zhang ◽  
Quan Cheng ◽  
...  

Malignant neoplasms are characterized by poor therapeutic efficacy, high recurrence rate, and extensive metastasis, leading to short survival. Previous methods for grouping prognostic risks are based on anatomic, clinical, and pathological features that exhibit lower distinguishing capability compared with genetic signatures. The update of sequencing techniques and machine learning promotes the genetic panels-based prognostic model development, especially the RNA-panel models. Gliomas harbor the most malignant features and the poorest survival among all tumors. Currently, numerous glioma prognostic models have been reported. We systematically reviewed all 138 machine-learning-based genetic models and proposed novel criteria in assessing their quality. Besides, the biological and clinical significance of some highly overlapped glioma markers in these models were discussed. This study screened out markers with strong prognostic potential and 27 models presenting high quality. Conclusively, we comprehensively reviewed 138 prognostic models combined with glioma genetic panels and presented novel criteria for the development and assessment of clinically important prognostic models. This will guide the genetic models in cancers from laboratory-based research studies to clinical applications and improve glioma patient prognostic management.


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254062
Author(s):  
Ramona Leenings ◽  
Nils Ralf Winter ◽  
Lucas Plagwitz ◽  
Vincent Holstein ◽  
Jan Ernsting ◽  
...  

PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development. It functions as a unifying framework allowing the user to easily access and combine algorithms from different toolboxes into custom algorithm sequences. It is especially designed to support the iterative model development process and automates the repetitive training, hyperparameter optimization and evaluation tasks. Importantly, the workflow ensures unbiased performance estimates while still allowing the user to fully customize the machine learning analysis. PHOTONAI extends existing solutions with a novel pipeline implementation supporting more complex data streams, feature combinations, and algorithm selection. Metrics and results can be conveniently visualized using the PHOTONAI Explorer and predictive models are shareable in a standardized format for further external validation or application. A growing add-on ecosystem allows researchers to offer data modality specific algorithms to the community and enhance machine learning in the areas of the life sciences. Its practical utility is demonstrated on an exemplary medical machine learning problem, achieving a state-of-the-art solution in few lines of code. Source code is publicly available on Github, while examples and documentation can be found at www.photon-ai.com.


2012 ◽  
Author(s):  
Michael Ghil ◽  
Mickael D. Chekroun ◽  
Dmitri Kondrashov ◽  
Michael K. Tippett ◽  
Andrew Robertson ◽  
...  

2020 ◽  
Vol 21 ◽  
Author(s):  
Sukanya Panja ◽  
Sarra Rahem ◽  
Cassandra J. Chu ◽  
Antonina Mitrofanova

Background: In recent years, the availability of high throughput technologies, establishment of large molecular patient data repositories, and advancement in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, requires a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective: In this review, we will discuss machine learning techniques utilized for modeling of treatment response in cancer, including Random Forests, support vector machines, neural networks, and linear and logistic regression. We will overview their mathematical foundations and discuss their limitations and alternative approaches all in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4736
Author(s):  
Sk. Tanzir Mehedi ◽  
Adnan Anwar ◽  
Ziaur Rahman ◽  
Kawsar Ahmed

The Controller Area Network (CAN) bus works as an important protocol in the real-time In-Vehicle Network (IVN) systems for its simple, suitable, and robust architecture. The risk of IVN devices has still been insecure and vulnerable due to the complex data-intensive architectures which greatly increase the accessibility to unauthorized networks and the possibility of various types of cyberattacks. Therefore, the detection of cyberattacks in IVN devices has become a growing interest. With the rapid development of IVNs and evolving threat types, the traditional machine learning-based IDS has to update to cope with the security requirements of the current environment. Nowadays, the progression of deep learning, deep transfer learning, and its impactful outcome in several areas has guided as an effective solution for network intrusion detection. This manuscript proposes a deep transfer learning-based IDS model for IVN along with improved performance in comparison to several other existing models. The unique contributions include effective attribute selection which is best suited to identify malicious CAN messages and accurately detect the normal and abnormal activities, designing a deep transfer learning-based LeNet model, and evaluating considering real-world data. To this end, an extensive experimental performance evaluation has been conducted. The architecture along with empirical analyses shows that the proposed IDS greatly improves the detection accuracy over the mainstream machine learning, deep learning, and benchmark deep transfer learning models and has demonstrated better performance for real-time IVN security.


Author(s):  
Yun Peng ◽  
Byron Choi ◽  
Jianliang Xu

AbstractGraphs have been widely used to represent complex data in many applications, such as e-commerce, social networks, and bioinformatics. Efficient and effective analysis of graph data is important for graph-based applications. However, most graph analysis tasks are combinatorial optimization (CO) problems, which are NP-hard. Recent studies have focused a lot on the potential of using machine learning (ML) to solve graph-based CO problems. Most recent methods follow the two-stage framework. The first stage is graph representation learning, which embeds the graphs into low-dimension vectors. The second stage uses machine learning to solve the CO problems using the embeddings of the graphs learned in the first stage. The works for the first stage can be classified into two categories, graph embedding methods and end-to-end learning methods. For graph embedding methods, the learning of the the embeddings of the graphs has its own objective, which may not rely on the CO problems to be solved. The CO problems are solved by independent downstream tasks. For end-to-end learning methods, the learning of the embeddings of the graphs does not have its own objective and is an intermediate step of the learning procedure of solving the CO problems. The works for the second stage can also be classified into two categories, non-autoregressive methods and autoregressive methods. Non-autoregressive methods predict a solution for a CO problem in one shot. A non-autoregressive method predicts a matrix that denotes the probability of each node/edge being a part of a solution of the CO problem. The solution can be computed from the matrix using search heuristics such as beam search. Autoregressive methods iteratively extend a partial solution step by step. At each step, an autoregressive method predicts a node/edge conditioned to current partial solution, which is used to its extension. In this survey, we provide a thorough overview of recent studies of the graph learning-based CO methods. The survey ends with several remarks on future research directions.


Author(s):  
Ekaterina Kochmar ◽  
Dung Do Vu ◽  
Robert Belfer ◽  
Varun Gupta ◽  
Iulian Vlad Serban ◽  
...  

AbstractIntelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning as compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, in this paper we propose a machine learning approach to generate personalized feedback in an automated way, which takes individual needs of students into account, while alleviating the need of expert intervention and design of hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in , a large-scale dialogue-based ITS with around 20,000 students launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.


Sign in / Sign up

Export Citation Format

Share Document