Data-Driven Machine Learning Approaches for Advanced Battery Modeling

Analysis of long-term multichannel EEG signals for automatic seizure detection is an active area of research that has seen application of methods from different domains of signal processing and machine learning. The majority of approaches developed in this context consist of extraction of hand-crafted features that are used to train a classifier for eventual seizure detection. Approaches that are data-driven, do not use hand-crafted features, and use small amounts of patients' historical EEG data for classifier training are few in number. The approach presented in this paper falls in the latter category, and is based on a signal-derived empirical dictionary approach, which utilizes empirical mode decomposition (EMD) and discrete wavelet transform (DWT) based dictionaries learned using a framework inspired by traditional methods of dictionary learning. Three features associated with traditional dictionary learning approaches, namely projection coefficients, coefficient vector and reconstruction error, are extracted from both EMD and DWT based dictionaries for automated seizure detection. This is the first time these features have been applied for automatic seizure detection using an empirical dictionary approach. Small amounts of patients' historical multi-channel EEG data are used for classifier training, and multiple classifiers are used for seizure detection using newer data. In addition, the seizure detection results are validated using 5-fold cross-validation to rule out any bias in the results. The CHB-MIT benchmark database containing long-term EEG recordings of pediatric patients is used for validation of the approach, and seizure detection performance comparable to the state-of-the-art is obtained. Seizure detection is performed using five classifiers, thereby allowing a comparison of the dictionary approaches, features extracted, and classifiers used. 
The best seizure detection performance is obtained using the EMD-based dictionary with the reconstruction error feature and a support vector machine classifier, giving accuracy, sensitivity, and specificity values of 88.2%, 90.3%, and 88.1%, respectively. Comparison is also made with other recent studies using the same database. The methodology presented in this paper is shown to be computationally efficient and robust for patient-specific automatic seizure detection. A data-driven methodology utilizing a small amount of patients' historical data is hence demonstrated as a practical solution for automatic seizure detection.
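The three figures reported above (accuracy, sensitivity, specificity) follow from the counts of correct and incorrect segment labels. A minimal sketch of that arithmetic is given below; the function name `seizure_metrics` and the ten example labels are hypothetical and are not taken from the paper or from the CHB-MIT database.

```python
def seizure_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels, 1 = seizure.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # seizures detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-seizures kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed seizures
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # fraction of true seizures found
    specificity = tn / (tn + fp)   # fraction of non-seizures correctly rejected
    return accuracy, sensitivity, specificity


# Hypothetical labels for ten EEG segments (illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
acc, sens, spec = seizure_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because seizure segments are rare in long-term recordings, accuracy alone can look high even when many seizures are missed, which is presumably why sensitivity and specificity are reported alongside it.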


Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease

10.1101/256008 ◽

2018 ◽

Cited By ~ 1

Author(s):

Andrew J. Steele ◽

S. Aylin Cakiroglu ◽

Anoop D. Shah ◽

Spiros C. Denaxas ◽

Harry Hemingway ◽

...

Keyword(s):

Machine Learning ◽

Clinical Practice ◽

Electronic Health Records ◽

Random Forests ◽

Missing Values ◽

Elastic Net ◽

Data Driven ◽

Learning Approaches ◽

Health Records ◽

Cox Models

Prognostic modelling is important in clinical practice and epidemiology for patient management and research. Electronic health records (EHR) provide large quantities of data for such models, but conventional epidemiological approaches require significant researcher time to implement. Expert selection of variables, fine-tuning of variable transformations and interactions, and imputing missing values in datasets are time-consuming and could bias subsequent analysis, particularly given that missingness in EHR is both high and may carry meaning. Using a cohort of over 80,000 patients from the CALIBER programme, we performed a systematic comparison of several machine-learning approaches in EHR. We used Cox models and random survival forests with and without imputation on 27 expert-selected variables to predict all-cause mortality. We also used Cox models, random forests and elastic net regression on an extended dataset with 586 variables to build prognostic models and identify novel prognostic factors without prior expert input. We observed that data-driven models used on an extended dataset can outperform conventional models for prognosis, without data preprocessing or imputing missing values, and with no need to scale or transform continuous data. An elastic net Cox regression based on 586 unimputed variables with continuous values discretised achieved a C-index of 0.801 (bootstrapped 95% CI 0.799 to 0.802), compared to 0.793 (0.791 to 0.794) for a traditional Cox model comprising 27 expert-selected variables with imputation for missing values. We also found that data-driven models allow identification of novel prognostic variables; that the absence of values for particular variables carries meaning, and can have significant implications for prognosis; and that variables often have a nonlinear association with mortality, which discretised Cox models and random forests can elucidate. This demonstrates that machine-learning approaches applied to raw EHR data can be used to build reliable models for use in research and clinical practice, and identify novel predictive variables and their effects to inform future research.
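The C-index quoted above measures how often a model ranks patients correctly by risk. The sketch below shows one common definition (Harrell's concordance index) in plain Python; it is an illustration under stated assumptions, not the authors' code. The function name `concordance_index` and the four-patient example are hypothetical, and tied risk scores are counted as half-concordant, which is a convention rather than something the abstract specifies.

```python
def concordance_index(times, events, risks):
    """Harrell's C for right-censored survival data.

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if censored
    risks:  model risk score (higher = expected to die sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Pair is comparable when i died strictly before j's last follow-up.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # model ranked the earlier death as riskier
                elif risks[i] == risks[j]:
                    concordant += 0.5   # assumed tie convention
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients (illustration only).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported difference between 0.793 and 0.801 is a small but measurable gain in discrimination.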


Modeling Evapotranspiration Response to Climatic Forcings Using Data-Driven Techniques in Grassland Ecosystems

Advances in Meteorology ◽

10.1155/2018/1824317 ◽

2018 ◽

Vol 2018 ◽

pp. 1-18 ◽

Cited By ~ 5

Author(s):

Xianming Dou ◽

Yongguo Yang

Keyword(s):

Neural Network ◽

Machine Learning ◽

Water Vapor ◽

Data Driven ◽

Machine Learning Techniques ◽

Support Vector ◽

Group Method ◽

Generalized Regression Neural Network ◽

Learning Approaches ◽

Inference System

Remarkable progress has been made over the last decade toward characterizing the mechanisms that dominate the exchange of water vapor between the biosphere and the atmosphere. This is attributed partly to the considerable development of machine learning techniques that allow the scientific community to use these advanced tools for approximating the nonlinear processes affecting the variation of water vapor in terrestrial ecosystems. Three novel machine learning approaches, namely, group method of data handling, extreme learning machine (ELM), and adaptive neurofuzzy inference system (ANFIS), were developed to simulate and forecast the daily evapotranspiration (ET) at four different grassland sites based on the flux tower data using the eddy covariance method. These models were compared with the extensively utilized data-driven models, including artificial neural network, generalized regression neural network, and support vector machine (SVM). Moreover, the influences of internal functions on their corresponding models (SVM, ELM, and ANFIS) were investigated together. It was demonstrated that most of the developed models did a good job of simulating and forecasting daily ET at the four sites. In addition to the strengths of robustness and simplicity, the newly proposed methods achieved estimates comparable to those of the conventional approaches and accordingly can be used as promising alternatives to traditional methods. It was further discovered that the generalization performance of the ELM, ANFIS, and SVM models strongly depended on their respective internal functions, especially for SVM.


Human-Centric AI: The Symbiosis of Human and Artificial Intelligence

Entropy ◽

10.3390/e23030332 ◽

2021 ◽

Vol 23 (3) ◽

pp. 332

Author(s):

Davor Horvatić ◽

Tomislav Lipic

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Human Life ◽

Data Driven ◽

Learning Approaches ◽

Second Wave

Well-evidenced advances of data-driven complex machine learning approaches emerging within the so-called second wave of artificial intelligence (AI) fostered the exploration of possible AI applications in various domains and aspects of human life, practices, and society [...]


Data Driven Encoding of Structures and Link Predictions in Large XML Document Collections

Advances in Data Mining and Database Management - XML Data Mining ◽

10.4018/978-1-61350-356-0.ch010 ◽

2011 ◽

pp. 219-241

Author(s):

Markus Hagenbuchner ◽

Chung Tsoi ◽

Shu Jia Zhang ◽

Milly Kc

Keyword(s):

Machine Learning ◽

Data Driven ◽

Training Dataset ◽

Web Pages ◽

Learning Approaches ◽

Document Collections ◽

Related Data ◽

Xml Documents ◽

Significant Research ◽

Clustering Problems

In recent years there has been significant research on the processing of related data, particularly the relatedness among atomic elements in one structure with those in another structure. A number of approaches have been developed with various degrees of success. This chapter provides an overview of machine learning approaches for the encoding of related atomic elements in one structure with those in other structures. The chapter briefly reviews a number of unsupervised approaches for such data structures which can be used for solving generic classification, regression, and clustering problems. We will apply this approach to a particularly interesting and challenging problem: the prediction of both the number and the locations of the in-links and out-links of a set of XML documents. In this problem, we are given a set of XML pages, which may represent web pages on the Internet, with in-links and out-links. Based on this training dataset, we wish to predict the number and locations of in-links and out-links of a set of XML documents, which are as yet not linked to other existing XML documents. To the best of our knowledge, this is the only known data-driven unsupervised machine learning approach for the prediction of in-links and out-links of XML documents.


Biomarker-Informed Machine Learning Model of Cognitive Fatigue from a Heart Rate Response Perspective

Sensors ◽

10.3390/s21113843 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3843

Author(s):

Kar Fye Alvin Lee ◽

Woon-Seng Gan ◽

Georgios Christopoulos

Keyword(s):

Machine Learning ◽

Heart Rate ◽

Nervous Activity ◽

Self Report ◽

Data Driven ◽

Psychological State ◽

Research Progress ◽

Physiological Data ◽

Learning Approaches ◽

Cognitive Fatigue

Cognitive fatigue is a psychological state characterised by feelings of tiredness and impaired cognitive functioning arising from high cognitive demands. This paper examines the recent research progress on the assessment of cognitive fatigue and provides informed recommendations for future research. Traditionally, cognitive fatigue is introspectively assessed through self-report or objectively inferred from a decline in behavioural performance. However, more recently, researchers have attempted to explore the biological underpinnings of cognitive fatigue to understand and measure this phenomenon. In particular, there is evidence indicating that the imbalance between sympathetic and parasympathetic nervous activity appears to be a physiological correlate of cognitive fatigue. This imbalance has been indexed through various heart rate variability indices that have also been proposed as putative biomarkers of cognitive fatigue. Moreover, in contrast to traditional inferential methods, there is also a growing research interest in using data-driven approaches to assessing cognitive fatigue. The ubiquity of wearables with the capability to collect large amounts of physiological data appears to be a major facilitator in the growth of data-driven research in this area. Preliminary findings indicate that such large datasets can be used to accurately predict cognitive fatigue through various machine learning approaches. Overall, the potential of combining domain-specific knowledge gained from biomarker research with machine learning approaches should be further explored to build more robust predictive models of cognitive fatigue.


Artefact Detection in Impedance Pneumography Signals: A Machine Learning Approach

Sensors ◽

10.3390/s21082613 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2613

Author(s):

Jonathan Moeyersons ◽

John Morales ◽

Nick Seeuws ◽

Chris Van Hoof ◽

Evelien Hermeling ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Reference Signal ◽

Machine Learning Algorithms ◽

Classification Model ◽

Data Driven ◽

Learning Approaches ◽

Impedance Pneumography ◽

Artefact Detection ◽

Feature Based

Impedance pneumography has been suggested as an ambulatory technique for the monitoring of respiratory diseases. However, its ambulatory nature makes the recordings more prone to noise sources. It is important that such noisy segments are identified and removed, since they could have a huge impact on the performance of data-driven decision support tools. In this study, we investigated the added value of machine learning algorithms to separate clean from noisy bio-impedance signals. We compared three approaches: a heuristic algorithm, a feature-based classification model (SVM) and a convolutional neural network (CNN). The dataset consists of 47 chronic obstructive pulmonary disease patients who performed an inspiratory threshold loading protocol. During this protocol, their respiration was recorded with a bio-impedance device and a spirometer, which served as a gold standard. Four annotators scored the signals for the presence of artefacts, based on the reference signal. We have shown that the accuracy of both machine learning approaches (SVM: 87.77 ± 2.64% and CNN: 87.20 ± 2.78%) is significantly higher, compared to the heuristic approach (84.69 ± 2.32%). Moreover, no significant differences could be observed between the two machine learning approaches. The feature-based and neural network model obtained a respective AUC of 92.77 ± 2.95% and 92.51 ± 1.74%. These findings show that a data-driven approach could be beneficial for the task of artefact detection in respiratory thoracic bio-impedance signals.


Artificial Intelligence in Drug Discovery: A Comprehensive Review of Data-driven and Machine Learning Approaches

Biotechnology and Bioprocess Engineering ◽

10.1007/s12257-020-0049-y ◽

2020 ◽

Vol 25 (6) ◽

pp. 895-930

Author(s):

Hyunho Kim ◽

Eunyoung Kim ◽

Ingoo Lee ◽

Bongsung Bae ◽

Minsu Park ◽

...

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Drug Discovery ◽

Data Driven ◽

Learning Approaches ◽

Comprehensive Review


Modeling Motivation for Alcohol in Humans using Traditional and Machine Learning Approaches

10.31234/osf.io/ahwsg ◽

2020 ◽

Author(s):

Erica Grodin ◽

Amanda Kay Montoya ◽

Spencer Bujarski ◽

Lara A. Ray

Keyword(s):

Machine Learning ◽

A Priori ◽

Progressive Ratio ◽

Data Driven ◽

Learning Approaches ◽

Self Administration ◽

Breath Alcohol ◽

Data Driven Approach ◽

Preclinical And Clinical Studies ◽

Alcohol Seeking

Given the significant cost of alcohol use disorder, identifying risk factors for alcohol seeking represents a research priority. Prominent addiction theories emphasize the role of motivation in the alcohol seeking process, which has largely been studied using preclinical models. In order to bridge the gap between preclinical and clinical studies, this study examined predictors of motivation for alcohol self-administration using a novel paradigm. Heavy drinkers (n=67) completed an alcohol infusion consisting of an alcohol challenge (target breath alcohol = 60 mg%) and a progressive-ratio alcohol self-administration paradigm (maximum breath alcohol 120 mg%; ratio requirements range = 20-3,139 responses). Growth curve modeling was used to predict breath alcohol trajectories during alcohol self-administration. K-means clustering was used to identify motivated (n=41) and unmotivated (n=26) self-administration trajectories. The data were analyzed using two approaches: a theory-driven test of a priori predictors and a data-driven machine learning model. In both approaches, steeper delay discounting, indicating a preference for smaller, sooner rewards, predicted motivated alcohol seeking. The data-driven approach further identified phasic alcohol craving as a predictor of motivated alcohol self-administration. Additional application of this model to AUD translational science and treatment development appears warranted.
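The phrase "steeper delay discounting" can be made concrete with a small worked example. The sketch below assumes the widely used hyperbolic discounting form V = A / (1 + kD); the abstract does not state which discounting model the study fitted, so the formula, the function names, and all amounts and delays here are illustrative assumptions only.

```python
def discounted_value(amount, delay_days, k):
    """Subjective present value under hyperbolic discounting (assumed model)."""
    return amount / (1.0 + k * delay_days)


def prefers_sooner(k, small=50.0, large=100.0, delay_days=30.0):
    """True if an immediate `small` reward beats `large` delivered after a delay."""
    return small > discounted_value(large, delay_days, k)


# Hypothetical discounting rates: a larger k means steeper discounting.
for k in (0.01, 0.1):
    value = discounted_value(100.0, 30.0, k)
    print(f"k={k}: delayed reward worth {value:.1f}, prefers sooner: {prefers_sooner(k)}")
```

With these assumed numbers the choice flips at k = 1/30: below that rate the delayed reward keeps more than half its value, and above it the smaller immediate reward wins.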
