Epileptic Seizure Prediction Through Machine Learning and Spatio-Temporal Features Based Time Series Analysis of Intracranial Electroencephalogram Data

Epilepsy is a group of neurological disorders characterized by infrequent but recurrent seizures. Seizure prediction is widely recognized as a significant problem in the neuroscience domain. A Brain-Computer Interface (BCI) for seizure prediction could alert the patient, providing buffer time to take the necessary emergency medication or at least call for help, thus improving patients' quality of life. A considerable number of clinical studies have presented evidence of symptoms (patterns) preceding seizure episodes, and seizure prediction is accordingly an active research area; however, very little existing literature illustrates the use of structured machine learning processes for predicting seizures. Limited training data and class imbalance are among the challenges that must be addressed: EEG segments corresponding to the preictal phase (the period just before a seizure, up to about an hour prior to the episode) are usually in a tiny minority. In this paper, we present a comparative study of machine learning approaches for classifying EEG signals into preictal and interictal (the time between seizures) classes using features extracted from intracranial EEG. Publicly available data from both human and canine subjects were used for this purpose. After data pre-processing and extensive feature extraction, different models are trained and used to analyze the temporal dynamics of the brain (interictal and preictal) in affected subjects. We present improved results for various classification algorithms, with the best models achieving AUROC values of 0.99.
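
A minimal sketch of the classification stage described above, assuming scikit-learn; the features are simulated stand-ins for the spatio-temporal features the paper extracts from intracranial EEG, and the choice of a random forest with balanced class weights is illustrative, not the paper's prescribed model:

```python
# Hedged sketch: preictal vs. interictal classification with scikit-learn.
# Features are simulated; the paper extracts spatio-temporal features
# from intracranial EEG segments instead.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# 1000 EEG segments x 40 features; preictal (label 1) is a small minority.
X = rng.normal(size=(1000, 40))
y = (rng.random(1000) < 0.1).astype(int)
X[y == 1] += 0.5  # give the minority class a weak, learnable signal

# Balanced class weights are one common remedy for the imbalance noted above.
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"AUROC: {scores.mean():.3f} +/- {scores.std():.3f}")
```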

2019 · Vol. 11(3) · pp. 284
Author(s): Linglin Zeng, Shun Hu, Daxiang Xiang, Xiang Zhang, Deren Li, et al.

Soil moisture mapping at a regional scale is commonplace, since these data are required in many applications, such as hydrological and agricultural analyses. The use of remotely sensed data for estimating deep soil moisture at a regional scale has received far less emphasis. The objective of this study was to map 500-m, 8-day average and daily soil moisture at different soil depths in Oklahoma from remotely sensed and ground-measured data using the random forest (RF) method, a machine learning approach. To investigate the estimation accuracy of the RF method at both spatial and temporal scales, two independent soil moisture estimation experiments were conducted using data from 2010 to 2014: a year-to-year experiment (with a root mean square error (RMSE) ranging from 0.038 to 0.050 m3/m3) and a station-to-station experiment (with an RMSE ranging from 0.044 to 0.057 m3/m3). The data requirements, factor importance, and spatial and temporal variations in estimation accuracy are then discussed, based on results obtained with training data selected by iterated random sampling. The highly accurate estimates of both surface and deep soil moisture for the study area reveal the potential of RF methods for mapping soil moisture at a regional scale, especially considering the high heterogeneity of land-cover types and topography in the study area.
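
A sketch of how the station-to-station experiment could look in scikit-learn, with simulated predictors standing in for the remotely sensed and ground-measured inputs; holding out whole stations is the point being illustrated, not the study's exact pipeline:

```python
# Hedged sketch of a station-to-station experiment: hold out whole
# stations so the test set is spatially independent of training.
# Predictors and soil moisture values are simulated.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

n_stations, n_obs, n_feat = 30, 200, 8
station = np.repeat(np.arange(n_stations), n_obs)
X = rng.normal(size=(n_stations * n_obs, n_feat))
y = 0.25 + 0.05 * X[:, 0] + 0.01 * rng.normal(size=len(X))  # m3/m3

test = np.isin(station, rng.choice(n_stations, size=6, replace=False))

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[~test], y[~test])
rmse = np.sqrt(np.mean((rf.predict(X[test]) - y[test]) ** 2))
print(f"station-to-station RMSE: {rmse:.3f} m3/m3")
```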


2020
Author(s): Paul Francoeur, Tomohide Masuda, David R. Koes

One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and there does not exist a standard dataset of sufficient size to compare performance between models. We present a new dataset for structure-based machine learning, the CrossDocked2020 set, with 22.5 million poses of ligands docked into multiple similar binding pockets across the Protein Data Bank, and perform a comprehensive evaluation of grid-based convolutional neural network models on this dataset. We also demonstrate how the partitioning of the training data and test data can impact the results of models trained with the PDBbind dataset, how performance improves by adding more, lower-quality training data, and how training with docked poses imparts pose sensitivity to the predicted affinity of a complex. Our best performing model, an ensemble of five densely connected convolutional networks, achieves a root mean squared error of 1.42 and Pearson R of 0.612 on the affinity prediction task, an AUC of 0.956 for binding pose classification, and 68.4% accuracy for pose selection on the CrossDocked2020 set. By providing data splits for clustered cross-validation and the raw data for the CrossDocked2020 set, we establish the first standardized dataset for training machine learning models to recognize ligands in non-cognate target structures while also greatly expanding the number of poses available for training. In order to facilitate community adoption of this dataset for benchmarking protein-ligand binding affinity prediction, we provide our models, weights, and the CrossDocked2020 set at https://github.com/gnina/models.
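
The reported metrics (affinity RMSE, Pearson R, pose-classification AUC) can be reproduced on any set of predictions; below is a hedged sketch on simulated outputs, since the actual grid-based CNN models live in the gnina repository linked above:

```python
# Hedged sketch: computing the three reported metrics on simulated
# predictions (the real models are 3D CNNs from the gnina repository).
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

true_aff = rng.uniform(2, 12, size=500)             # binding affinities
pred_aff = true_aff + rng.normal(0, 1.4, 500)       # noisy predictions

rmse = np.sqrt(np.mean((pred_aff - true_aff) ** 2))
r, _ = pearsonr(true_aff, pred_aff)

good_pose = (rng.random(500) < 0.3).astype(int)     # e.g. RMSD < 2 A
pose_score = good_pose + rng.normal(0, 0.5, 500)    # model's pose score
auc = roc_auc_score(good_pose, pose_score)

print(f"RMSE={rmse:.2f}  Pearson R={r:.3f}  pose AUC={auc:.3f}")
```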


2020
Author(s): Abdur Rahman M. A. Basher, Steven J. Hallam

Machine learning methods show great promise in predicting metabolic pathways at different levels of biological organization. However, several complications remain that can degrade prediction performance, including inadequately labeled training data, missing feature information, and inherent imbalances in the distribution of enzymes and pathways within a dataset. This class imbalance problem is commonly encountered by the machine learning community when the proportion of instances over class labels within a dataset is uneven, resulting in poor predictive performance for underrepresented classes. Here, we present leADS (multi-label learning based on active dataset subsampling), which leverages the idea of subsampling points from a pool of data to reduce the negative impact of training loss due to class imbalance. Specifically, leADS performs an iterative process to: (i) construct an acquisition model in an ensemble framework; (ii) select informative points using an appropriate acquisition function; and (iii) train on the selected samples. Multiple base learners are implemented in parallel, each assigned a portion of the labeled training data to learn pathways. We benchmark leADS using a corpus of 10 experimental datasets manifesting diverse multi-label properties used in previous pathway prediction studies, including manually curated organismal genomes, synthetic microbial communities, and low-complexity microbial communities. The resulting performance metrics equaled or exceeded previously reported machine learning methods for both organismal and multi-organismal genomes, while establishing an extensible framework for navigating class imbalances across diverse real-world datasets. Availability and implementation: The software package and installation instructions are published on github.com/hallamlab. Contact: [email protected]
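
A schematic sketch of the acquire-subsample-train loop (i)-(iii), not the leADS implementation itself: a generic scikit-learn multi-label model serves as the acquisition model, and per-label predictive entropy is assumed as the acquisition function:

```python
# Schematic sketch of steps (i)-(iii), not the leADS implementation.
# A generic multi-label model acts as the acquisition model, with mean
# per-label predictive entropy as the acquisition function.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(7)

X = rng.normal(size=(2000, 50))
Y = (rng.random((2000, 20)) < 0.1).astype(int)     # sparse pathway labels

idx = rng.choice(len(X), size=200, replace=False)  # initial subsample
model = OneVsRestClassifier(LogisticRegression(max_iter=1000))

for _ in range(5):
    model.fit(X[idx], Y[idx])                      # (i) acquisition model
    p = np.clip(model.predict_proba(X), 1e-9, 1 - 1e-9)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p)).mean(axis=1)
    idx = np.union1d(idx, np.argsort(entropy)[-100:])  # (ii)+(iii)

print(f"final training subsample: {len(idx)} points")
```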


Author(s): Ana Clara Gomes da Silva, Clarisse Lins de Lima, Cecilia Cordeiro da Silva, Giselle Machado Magalhães Moreno, Eduardo Luiz Silva, et al.

Science · 2021 · Vol. 371(6535) · pp. eabe8628
Author(s): Marshall Burke, Anne Driscoll, David B. Lobell, Stefano Ermon

Accurate and comprehensive measurements of a range of sustainable development outcomes are fundamental inputs into both research and policy. We synthesize the growing literature that uses satellite imagery to understand these outcomes, with a focus on approaches that combine imagery with machine learning. We quantify the paucity of ground data on key human-related outcomes and the growing abundance and improving resolution (spatial, temporal, and spectral) of satellite imagery. We then review recent machine learning approaches to model-building in the context of scarce and noisy training data, highlighting how this noise often leads to incorrect assessment of model performance. We quantify recent model performance across multiple sustainable development domains, discuss research and policy applications, explore constraints to future progress, and highlight research directions for the field.


Author(s): Gebreab K. Zewdie, David J. Lary, Estelle Levetin, Gemechu F. Garuma

Allergies to airborne pollen are a significant issue affecting millions of Americans. Consequently, accurately predicting the daily concentration of airborne pollen is of significant public benefit in providing timely alerts. This study presents a method for the robust estimation of the concentration of airborne Ambrosia pollen using a suite of machine learning approaches, including deep learning and ensemble learners, each of which utilizes data from the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric weather and land surface reanalysis. The approaches used to develop this suite of empirical predictive models are deep neural networks, extreme gradient boosting, random forests, and Bayesian ridge regression. The training data comprised twenty-four years of daily pollen concentration measurements together with ECMWF weather and land surface reanalysis data from 1987 to 2011; the last six years of the dataset, from 2012 to 2017, were used to independently test the performance of the machine learning models. The correlation coefficients between the estimated and actual pollen abundance on the independent validation datasets for the deep neural networks, random forest, extreme gradient boosting, and Bayesian ridge models were 0.82, 0.81, 0.81, and 0.75 respectively, showing that machine learning can be used to effectively forecast the concentrations of airborne pollen.
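
A sketch of the temporal train/test protocol above, on simulated data; scikit-learn's GradientBoostingRegressor stands in for extreme gradient boosting, and the deep network is omitted for brevity:

```python
# Hedged sketch of the 1987-2011 train / 2012-2017 test protocol on
# simulated data; GradientBoostingRegressor stands in for XGBoost.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(3)

years = rng.integers(1987, 2018, size=3000)            # simulated daily records
X = rng.normal(size=(3000, 12))                        # weather/land features
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0, 1, 3000)   # pollen proxy

train, test = years <= 2011, years >= 2012
models = [("random forest", RandomForestRegressor(random_state=0)),
          ("gradient boosting", GradientBoostingRegressor(random_state=0)),
          ("Bayesian ridge", BayesianRidge())]
for name, m in models:
    m.fit(X[train], y[train])
    r = np.corrcoef(y[test], m.predict(X[test]))[0, 1]
    print(f"{name}: r = {r:.2f}")
```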


2020 · Vol. 2020(14) · pp. 341-1-341-10
Author(s): Han Hu, Yang Lei, Daisy Xin, Viktor Shkolnikov, Steven Barcelo, et al.

Separation and isolation of living cells play an important role in the fields of medicine and biology, with label-free imaging often used for isolating cells. The analysis of label-free cell images presents many challenges when examining the behavior of cells. This paper presents methods to analyze label-free cells; many of the tools we describe are based on machine learning approaches. We also investigate ways of augmenting the limited training data available. Our results demonstrate that the proposed methods successfully segment and classify label-free cells.
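
The augmentation step mentioned above might look like the following minimal sketch, using simple geometric and noise transforms; the paper's exact augmentation pipeline is not specified here:

```python
# Hedged sketch of simple training-data augmentation for grayscale
# label-free cell images; the paper's exact pipeline may differ.
import numpy as np

rng = np.random.default_rng(5)

def augment(img: np.ndarray) -> list:
    """Return flipped, rotated, and noised variants of one image."""
    variants = [img, np.fliplr(img), np.flipud(img)]
    variants += [np.rot90(img, k) for k in (1, 2, 3)]
    noisy = img + rng.normal(0, 0.01, img.shape)   # mild sensor noise
    variants.append(np.clip(noisy, 0.0, 1.0))
    return variants

cell = rng.random((64, 64))        # stand-in for a label-free image
print(f"1 image -> {len(augment(cell))} training examples")
```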


Author(s): Kai Hu, Zhaodi Zhou, Liguo Weng, Jia Liu, Lihua Wang, et al.

Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous experience. Among the many machine learning algorithms, the Weighted Extreme Learning Machine (WELM) has recently become prominent. It not only retains the Extreme Learning Machine (ELM)'s extremely fast training speed and better generalization performance than a traditional Neural Network (NN), but also handles imbalanced data well by assigning more weight to the minority class and less weight to the majority class. However, it has the limitation that its weights are generated according to the class distribution of the training data, creating a dependency on the input data [R. Sharma and A. S. Bist, Genetic algorithm based weighted extreme learning machine for binary imbalance learning, 2015 Int. Conf. Cognitive Computing and Information Processing (CCIP) (IEEE, 2015), pp. 1–6; N. Koutsouleris, Classification/machine learning approaches, Annu. Rev. Clin. Psychol. 13(1) (2016); G. Dudek, Extreme learning machine for function approximation–interval problem of input weights and biases, 2015 IEEE 2nd Int. Conf. Cybernetics (CYBCONF) (IEEE, 2015), pp. 62–67; N. Zhang, Y. Qu and A. Deng, Evolutionary extreme learning machine based weighted nearest-neighbor equality classification, 2015 7th Int. Conf. Intelligent Human-Machine Systems and Cybernetics (IHMSC), Vol. 2 (IEEE, 2015), pp. 274–279]. This dependency can prevent the weight from reaching the value at which good generalization performance is achieved [Sharma and Bist, 2015; Koutsouleris, 2016; Dudek, 2015; Zhang et al., 2015]. To solve this, we propose a hybrid algorithm composed of WELM and Particle Swarm Optimization (PSO). First, it distributes the weight according to the number of samples in each class, determining the weighting method; then, it combines the ELM model with the weighting method to establish the WELM model; finally, it uses PSO to optimize WELM's three parameters (the input weights, the biases, and the weight of the imbalanced training data). Experiments on both prediction and recognition tasks show that the hybrid algorithm performs better than classical WELM algorithms.
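
A simplified sketch of the WELM + PSO idea, not the authors' code: here the swarm searches over scalar scales for the random input weights and biases plus the minority-class weight, rather than the full parameter vectors, and balanced accuracy is assumed as the fitness:

```python
# Simplified sketch of WELM + PSO: the swarm tunes scalar scales for the
# random input weights and biases plus the minority-class weight; the
# authors optimize the parameters directly. Fitness: balanced accuracy.
import numpy as np

rng = np.random.default_rng(9)

# Imbalanced binary data; class 1 is the minority.
X = rng.normal(size=(600, 10))
y = (rng.random(600) < 0.15).astype(int)
X[y == 1] += 1.0

def welm_fitness(params, n_hidden=50):
    w_scale, b_scale, w_min = np.abs(params)
    W = rng.normal(0, w_scale + 1e-6, (X.shape[1], n_hidden))
    b = rng.normal(0, b_scale + 1e-6, n_hidden)
    H = np.tanh(X @ W + b)                        # random hidden layer
    lam = np.where(y == 1, w_min, 1.0)            # per-sample weights
    # Weighted regularized least squares for the output weights beta.
    A = H.T @ (lam[:, None] * H) + np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ (lam * (2.0 * y - 1.0)))
    pred = (H @ beta > 0).astype(int)
    return 0.5 * ((pred[y == 1] == 1).mean() + (pred[y == 0] == 0).mean())

# Minimal PSO over (input-weight scale, bias scale, minority weight).
pos = rng.uniform(0.1, 5.0, (20, 3))
vel = np.zeros_like(pos)
best_p = pos.copy()
best_f = np.array([welm_fitness(p) for p in pos])
g = best_p[best_f.argmax()]
for _ in range(30):
    r1, r2 = rng.random((2, 20, 3))
    vel = 0.7 * vel + 1.5 * r1 * (best_p - pos) + 1.5 * r2 * (g - pos)
    pos = pos + vel
    f = np.array([welm_fitness(p) for p in pos])
    better = f > best_f
    best_p[better], best_f[better] = pos[better], f[better]
    g = best_p[best_f.argmax()]

print(f"best balanced accuracy: {best_f.max():.3f}")
```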

