Pattern discovery and disentanglement on relational datasets

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Andrew K. C. Wong ◽  
Pei-Yuan Zhou ◽  
Zahid A. Butt

Abstract Machine learning has made impressive advances in many applications akin to human cognition for discernment. However, success has been limited on relational datasets, particularly for data with low volume, imbalanced groups, and mislabeled cases, and the outputs typically lack transparency and interpretability. The difficulties arise from the subtle overlapping and entanglement of functional and statistical relations at the source level. Hence, we have developed the Pattern Discovery and Disentanglement (PDD) system, which is able to discover explicit patterns from data of various sizes and with imbalanced groups, and to screen out anomalies. We present herein four case studies on biomedical datasets to substantiate the efficacy of PDD. It improves prediction accuracy and facilitates transparent interpretation of discovered knowledge in an explicit representation framework, the PDD Knowledge Base, that links the sources, the patterns, and individual patients. Hence, PDD promises broad and ground-breaking applications in genomic and biomedical machine learning.

Author(s):  
Anik Das ◽  
Mohamed M. Ahmed

Accurate real-time lane-change prediction is essential to safely operate Autonomous Vehicles (AVs) on the roadways, especially at the early stage of AV deployment, when AVs will interact with human-driven vehicles. This study proposed reliable lane-change prediction models considering features from vehicle kinematics, machine vision, driver, and roadway geometric characteristics using the trajectory-level SHRP2 Naturalistic Driving Study and Roadway Information Database. Several machine learning algorithms were trained, validated, tested, and comparatively analyzed, including Classification And Regression Trees (CART), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K Nearest Neighbor (KNN), and Naïve Bayes (NB), based on six different sets of features. In each feature set, relevant features were extracted through a wrapper-based algorithm named Boruta. The results showed that, considering all features, the XGBoost model outperformed all other models, with the highest overall prediction accuracy (97%) and F1-score (95.5%). However, the highest overall prediction accuracy of 97.3% and F1-score of 95.9% were observed in the XGBoost model based on vehicle kinematics features. Moreover, it was found that XGBoost was the only model that achieved a reliable and balanced prediction performance across all six feature sets. Furthermore, a simplified XGBoost model was developed for each feature set considering the practical implementation of the model. The proposed prediction model could help in trajectory planning for AVs and could be used to develop more reliable advanced driver assistance systems (ADAS) in a cooperative connected and automated vehicle environment.
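The abstract above compares models by overall accuracy and F1-score. As a minimal sketch of how those two metrics are computed for a binary task, the following uses only the Python standard library. The function name `accuracy_and_f1` and the toy labels are assumptions made for illustration; they are not taken from the study or its data.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for two equal-length lists of binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels, made up for illustration: 1 = lane change, 0 = no lane change.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Reporting F1 alongside accuracy matters when classes are imbalanced, because accuracy alone can look high for a model that rarely predicts the minority class.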


SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A166-A166
Author(s):  
Ankita Paul ◽  
Karen Wong ◽  
Anup Das ◽  
Diane Lim ◽  
Miranda Tan

Abstract Introduction Cancer patients are at an increased risk of moderate-to-severe obstructive sleep apnea (OSA). The STOP-Bang score is a commonly used screening questionnaire to assess risk of OSA in the general population. We hypothesize that cancer-relevant features, like radiation therapy (RT), may be used to determine the risk of OSA in cancer patients. Machine learning (ML) with non-parametric regression is applied to increase the prediction accuracy of OSA risk. Methods Ten features, namely STOP-Bang score, history of RT to the head/neck/thorax, cancer type, cancer stage, metastasis, hypertension, diabetes, asthma, COPD, and chronic kidney disease, were extracted from a database of cancer patients with a sleep study. The ML technique K-Nearest-Neighbor (KNN), with a range of k values (5 to 20), was chosen because, unlike Logistic Regression (LR), KNN makes no assumptions about the data distribution or mapping function and supports non-linear relationships among features. A correlation heatmap was computed to identify features having high correlation with OSA. Principal Component Analysis (PCA) was performed on the correlated features and then KNN was applied on the components to predict the risk of OSA. Receiver Operating Characteristic (ROC) Area Under Curve (AUC) and Precision-Recall curves were computed to compare and validate performance for different test sets and majority class scenarios. Results In our cohort of 174 cancer patients, the accuracy in determining OSA among cancer patients using STOP-Bang score was 82.3% (LR) and 90.69% (KNN), but it fell to 89.9% in KNN using all 10 features mentioned above. PCA + KNN using STOP-Bang score and RT as features increased prediction accuracy to 94.1%. We validated our ML approach using a separate cohort of 20 cancer patients; the accuracies in OSA prediction were 85.57% (LR), 91.1% (KNN), and 92.8% (PCA + KNN).
Conclusion STOP-Bang score and history of RT can be useful to predict risk of OSA in cancer patients with the PCA + KNN approach. This ML technique can refine screening tools to improve prediction accuracy of OSA in cancer patients. Larger studies investigating additional features using ML may improve OSA screening accuracy in various populations.
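The KNN step described in the Methods can be illustrated with a minimal standard-library sketch: classify a query point by majority vote among its k nearest training points. This is not the authors' pipeline. The PCA step is omitted, the function name `knn_predict` is hypothetical, and the two-feature toy data (STOP-Bang score, RT history as 0/1) are invented for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Predict a label for `query` by majority vote of its k nearest neighbours."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")

    # Euclidean distance from the query to every training point.
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Invented toy data: (STOP-Bang score, RT history 0/1) -> OSA label (1 = OSA).
train_X = [(1, 0), (2, 0), (2, 1), (3, 0), (5, 1), (6, 0), (6, 1), (7, 1)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

high_risk = knn_predict(train_X, train_y, (6, 1), k=5)
low_risk = knn_predict(train_X, train_y, (2, 0), k=5)
print(high_risk, low_risk)
```

An odd k avoids tied votes in a two-class problem. In practice features on different scales would usually be standardized before computing distances, which this sketch leaves out.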


2021 ◽  
pp. 027836492098785
Author(s):  
Julian Ibarz ◽  
Jie Tan ◽  
Chelsea Finn ◽  
Mrinal Kalakrishnan ◽  
Peter Pastor ◽  
...  

Deep reinforcement learning (RL) has emerged as a promising approach for autonomously acquiring complex behaviors from low-level sensor observations. Although a large portion of deep RL research has focused on applications in video games and simulated control, which does not connect with the constraints of learning in real environments, deep RL has also demonstrated promise in enabling physical robots to learn complex skills in the real world. At the same time, real-world robotics provides an appealing domain for evaluating such algorithms, as it connects directly to how humans learn: as an embodied agent in the real world. Learning to perceive and move in the real world presents numerous challenges, some of which are easier to address than others, and some of which are often not considered in RL research that focuses only on simulated domains. In this review article, we present a number of case studies involving robotic deep RL. Building on these case studies, we discuss commonly perceived challenges in deep RL and how they have been addressed in these works. We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting and are not often the focus of mainstream RL research. Our goal is to provide a resource for both roboticists and machine learning researchers who are interested in furthering the progress of deep RL in the real world.


Semantic Web ◽  
2020 ◽  
pp. 1-21
Author(s):  
Franziska Pannach ◽  
Caroline Sporleder ◽  
Wolfgang May ◽  
Aravind Krishnan ◽  
Anusharani Sewchurran

Vladimir Propp’s theory Morphology of the Folktale identifies 31 invariant functions, subfunctions, and seven classes of folktale characters to describe the narrative structure of the Russian magic tale. Since it was first published in 1928, Propp’s approach has been used on various folktales of different cultural backgrounds. ProppOntology models Propp’s theory by describing narrative functions using a combination of a function class hierarchy and characteristic relationships between the Dramatis Personae for each function. A special focus lies on the restrictions Propp defined regarding which Dramatis Personae fulfill a certain function. This paper investigates how an ontology can assist traditional Humanities research in examining how well Propp’s theory fits folktales outside of the Russian–European folktale culture. For this purpose, a lightweight query system has been implemented. To determine how well both the annotation schema and the query system work, twenty African tales and fifteen tales from the Kerala region in India were annotated. The system is evaluated by examining two case studies regarding the representation of characters and the use of Proppian functions in African and Indian tales. The findings are in line with traditional analogous Humanities research. This project shows how carefully modelled ontologies can be utilized as a knowledge base for comparative folklore research.


Author(s):  
Joel Weijia Lai ◽  
Candice Ke En Ang ◽  
U. Rajendra Acharya ◽  
Kang Hao Cheong

Artificial Intelligence in healthcare employs machine learning algorithms to emulate human cognition in the analysis of complicated or large sets of data. Specifically, artificial intelligence taps into the ability of computer algorithms and software with allowable thresholds to make deterministic approximate conclusions. In comparison to traditional technologies in healthcare, artificial intelligence enhances the process of data analysis without the need for human input, producing nearly equally reliable, well-defined output. Schizophrenia is a chronic mental health condition that affects millions worldwide, with impairment in thinking and behaviour that may be significantly disabling to daily living. Multiple artificial intelligence and machine learning algorithms have been utilized to analyze the different components of schizophrenia, such as prediction of disease and assessment of current prevention methods. These are carried out in the hope of assisting with diagnosis and providing viable options for affected individuals. In this paper, we review the progress of the use of artificial intelligence in schizophrenia.


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi139-vi139
Author(s):  
Jan Lost ◽  
Tej Verma ◽  
Niklas Tillmanns ◽  
W R Brim ◽  
Harry Subramanian ◽  
...  

Abstract PURPOSE Identifying molecular subtypes in gliomas has prognostic and therapeutic value, traditionally after invasive neurosurgical tumor resection or biopsy. Recent advances using artificial intelligence (AI) show promise in using pre-therapy imaging for predicting molecular subtype. We performed a systematic review of recent literature on AI methods used to predict molecular subtypes of gliomas. METHODS A literature review conforming to PRISMA guidelines was performed for publications prior to February 2021 using 4 databases: Ovid Embase, Ovid MEDLINE, Cochrane trials (CENTRAL), and Web of Science core-collection. Keywords included: artificial intelligence, machine learning, deep learning, radiomics, magnetic resonance imaging, glioma, and glioblastoma. Non-machine learning and non-human studies were excluded. Screening was performed using Covidence software. Bias analysis was done using TRIPOD guidelines. RESULTS 11,727 abstracts were retrieved. After applying initial screening exclusion criteria, 1,135 full text reviews were performed, with 82 papers remaining for data extraction. 57% used retrospective single center hospital data, 31.6% used TCIA and BRATS, and 11.4% analyzed multicenter hospital data. An average of 146 patients (range 34-462 patients) were included. Algorithms predicting IDH status comprised 51.8% of studies, MGMT 18.1%, and 1p19q 6.0%. Machine learning methods were used in 71.4%, deep learning in 27.4%, and 1.2% directly compared both methods. The most common machine learning algorithm was the support vector machine (43.3%), and the most common deep learning algorithm was the convolutional neural network (68.4%). Mean prediction accuracy was 76.6%. CONCLUSION Machine learning is the predominant method for image-based prediction of glioma molecular subtypes. Major limitations include limited datasets (60.2% with under 150 patients) and thus limited generalizability of findings.
We recommend using larger annotated datasets for AI network training and testing in order to create more robust AI algorithms, which will provide better prediction accuracy on real-world clinical datasets and provide tools that can be translated to clinical practice.


2021 ◽  
Author(s):  
Luc Blassel ◽  
Anna Tostevin ◽  
Christian Julian Villabona-Arenas ◽  
Martine Peeters ◽  
Stephane Hue ◽  
...  

Drug resistance mutations (DRMs) appear in HIV under treatment pressure. DRMs are commonly transmitted to naive patients. The standard approach to reveal new DRMs is to test for significant frequency differences of mutations between treated and naive patients. However, this approach considers each mutation individually and cannot capture interactions between several mutations. Here, we aim to leverage the ever-growing quantity of high-quality sequence data and machine learning methods to study such interactions (i.e. epistasis), as well as try to find new DRMs. We trained classifiers to discriminate between Reverse Transcriptase Inhibitor (RTI)-experienced and RTI-naive samples on a large HIV-1 reverse transcriptase (RT) sequence dataset from the UK (n ≈ 55,000), using all observed mutations as binary representation features. To assess the robustness of our findings, our classifiers were evaluated on independent data sets, both from the UK and Africa. Important representation features for each classifier were then extracted as potential DRMs. To find novel DRMs, we repeated this process by removing either features or samples associated with known DRMs. When keeping all known resistance signal, we detected sufficiently prevalent known DRMs, thus validating the approach. When removing features corresponding to known DRMs, our classifiers retained some prediction accuracy, and six new mutations significantly associated with resistance were identified. These six mutations have a low genetic barrier, are correlated with known DRMs, and are spatially close to either the RT active site or the regulatory binding pocket. When removing both known DRM features and sequences containing at least one known DRM, our classifiers lost all prediction accuracy. These results likely indicate that all mutations directly conferring resistance have been found, and that our newly discovered DRMs are accessory or compensatory mutations.
Moreover, we did not find any significant signal of epistasis, beyond the standard resistance scheme associating major DRMs to auxiliary mutations.
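To make the "binary representation features" idea concrete, here is a minimal standard-library sketch that encodes each sample as a 0/1 vector over all observed mutations and then computes the per-mutation frequency difference between treated and naive groups (the quantity the standard approach tests). The function names and the four toy samples are assumptions for illustration; the mutation labels are placeholders and nothing here reproduces the study's data, classifiers, or statistical tests.

```python
def encode_mutations(samples):
    """Encode samples (sets of mutation names) as rows of 0/1 features.

    Returns (feature_names, matrix), with one column per observed mutation.
    """
    features = sorted(set().union(*samples))
    matrix = [[1 if m in sample else 0 for m in features] for sample in samples]
    return features, matrix


def frequency_difference(matrix, labels):
    """Per-feature frequency in treated (label 1) minus naive (label 0) samples."""
    treated = [row for row, y in zip(matrix, labels) if y == 1]
    naive = [row for row, y in zip(matrix, labels) if y == 0]
    if not treated or not naive:
        raise ValueError("both treated and naive samples are required")
    n_features = len(matrix[0])
    return [
        sum(r[j] for r in treated) / len(treated)
        - sum(r[j] for r in naive) / len(naive)
        for j in range(n_features)
    ]


# Toy samples, invented for illustration; labels: 1 = RTI-experienced, 0 = RTI-naive.
samples = [{"M184V", "K103N"}, {"K103N"}, set(), {"T215Y"}]
labels = [1, 1, 0, 0]

features, matrix = encode_mutations(samples)
diffs = frequency_difference(matrix, labels)
for name, d in zip(features, diffs):
    print(name, d)
```

A per-mutation difference like this looks at one column at a time, which is why it cannot reveal interactions between mutations; a classifier trained on the full binary matrix can, in principle, use combinations of columns.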

