A LightGBM-Based EEG Analysis Method for Driver Mental States Classification

Fatigue driving can easily lead to road traffic accidents and cause great harm to individuals and families. Recently, electroencephalography (EEG)-based analysis of physiological and brain activity for fatigue detection has been increasingly investigated. However, finding an effective method or model that detects the mental states of drivers in a timely and efficient manner remains a challenge. In this paper, we combine common spatial pattern (CSP) with a proposed lightweight classifier, LightFD, which is based on a gradient boosting framework, for EEG mental state identification. Comparisons with traditional classifiers, such as support vector machine (SVM), convolutional neural network (CNN), gated recurrent unit (GRU), and large margin nearest neighbor (LMNN), show that the proposed model achieves better classification performance as well as better decision efficiency. Furthermore, we test and validate that LightFD has better transfer learning performance in EEG classification of driver mental states. In summary, the proposed LightFD classifier performs better in real-time EEG mental state prediction and is expected to have broad application prospects in practical brain-computer interaction (BCI).

Identification of Anisomerous Motor Imagery EEG Signals Based on Complex Algorithms

Computational Intelligence and Neuroscience ◽

10.1155/2017/2727856 ◽

2017 ◽

Vol 2017 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Rensong Liu ◽

Zhiwen Zhang ◽

Feng Duan ◽

Xin Zhou ◽

Zixuan Meng

Keyword(s):

Motor Imagery ◽

Classification Accuracy ◽

Nearest Neighbor ◽

Recognition System ◽

Classification Performance ◽

Support Vector ◽

Eeg Signals ◽

Accuracy Rate ◽

Common Spatial Pattern ◽

Identification Rate

Motor imagery (MI) electroencephalograph (EEG) signals are widely applied in brain-computer interface (BCI). However, classified MI states are limited, and their classification accuracy rates are low because of the characteristics of nonlinearity and nonstationarity. This study proposes a novel MI pattern recognition system that is based on complex algorithms for classifying MI EEG signals. In electrooculogram (EOG) artifact preprocessing, band-pass filtering is performed to obtain the frequency band of MI-related signals, and then, canonical correlation analysis (CCA) combined with wavelet threshold denoising (WTD) is used for EOG artifact preprocessing. We propose a regularized common spatial pattern (R-CSP) algorithm for EEG feature extraction by incorporating the principle of generic learning. A new classifier combining the K-nearest neighbor (KNN) and support vector machine (SVM) approaches is used to classify four anisomerous states, namely, imaginary movements with the left hand, right foot, and right shoulder and the resting state. The highest classification accuracy rate is 92.5%, and the average classification accuracy rate is 87%. The proposed complex algorithm identification method can significantly improve the identification rate of the minority samples and the overall classification performance.
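The wavelet threshold denoising step mentioned in the abstract can be illustrated with a minimal sketch. The abstract names neither the wavelet nor the threshold rule, so the one-level Haar transform, the soft-threshold rule, and the function name `haar_soft_denoise` below are assumptions chosen for brevity, and the CCA stage is not shown.

```python
import math


def haar_soft_denoise(signal, threshold):
    """One-level Haar transform, soft-threshold the details, then invert.

    Assumption: Haar wavelet and soft thresholding; the abstract does not
    state which wavelet or threshold rule the authors used.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    denoised = []
    for i in range(0, len(signal), 2):
        # Forward Haar step on one pair of samples.
        approx = (signal[i] + signal[i + 1]) / root2
        detail = (signal[i] - signal[i + 1]) / root2
        # Soft thresholding: shrink the detail coefficient toward zero.
        shrunk = math.copysign(max(abs(detail) - threshold, 0.0), detail)
        # Inverse Haar step using the shrunken detail coefficient.
        denoised.append((approx + shrunk) / root2)
        denoised.append((approx - shrunk) / root2)
    return denoised
```

With a threshold of zero the function returns the input unchanged (up to rounding). With a large threshold each pair of samples collapses to its mean, which is the smoothing effect that thresholding is meant to provide.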

Comparison of Prediction Models for Mortality Related to Injuries from Road Traffic Accidents after Correcting for Undersampling

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18115604 ◽

2021 ◽

Vol 18 (11) ◽

pp. 5604

Author(s):

Yookyung Boo ◽

Youngjin Choi

Keyword(s):

Traffic Accidents ◽

Road Traffic ◽

National Level ◽

Brier Score ◽

Support Vector ◽

Main Diagnosis ◽

Road Traffic Accidents ◽

Multiple Variables ◽

Type Of Injury ◽

Kernel Model

In this study, four models, namely logistic regression (LR), random forest (RF), linear support vector machine (SVM), and radial basis function (RBF)-SVM, were compared for their accuracy in determining mortality caused by road traffic injuries. They were tested using five years of national-level data from the Korea Disease Control and Prevention Agency’s (KDCA) National Hospital Discharge In-Depth Survey (2013 through 2017). Model performance was measured using accuracy, precision, recall, F1 score, and Brier score metrics in a classification analysis that included characteristics of patients, accidents, injuries, and illnesses. Due to the number of variables and differing units, the rates of survival and mortality related to road traffic accidents were imbalanced, so the data were corrected and standardized before the classification models’ performances were compared. Using the importance analysis, the main diagnosis, the type of injury, the site of the injury, the operation status, the type of accident, the role at the time of the accident, and the sex were selected as the analysis factors. The biggest contributing factor was the role in the accident, specifically being the driver, and the major sites of the injuries were head injuries and deep injuries. Using the selected factors, comparisons of the classification performance of each model indicated that the RBF-SVM and RF models were superior to the others. Of the SVM models, the RBF kernel model was superior to the linear kernel model; it can be inferred that the performance of the high-dimensional transformed RBF model is superior when the dimension is complex because of the use of multiple variables. The findings suggest there are limitations to analyses involving imbalanced, multidimensional original data, such as data on road traffic mortality. Thus, analyses must be performed after imbalances are corrected.
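The five metrics named in the abstract have standard definitions, which the following sketch computes for a binary outcome. The function name `classification_metrics`, the 0.5 decision threshold, and the toy inputs are illustrative assumptions, not details taken from the study.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1 and Brier score for binary labels.

    y_true holds 0/1 labels; y_prob holds predicted probabilities of class 1.
    The threshold is a hypothetical default, not a value from the paper.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Brier score: mean squared gap between probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "brier": brier,
    }
```

Unlike the other four metrics, the Brier score is computed from the predicted probabilities rather than the thresholded labels, so it also reflects how well calibrated a model is. Lower values are better.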

Quantifying the Influence of Achievement Emotions for Student Learning in MOOCs

Journal of Educational Computing Research ◽

10.1177/0735633120967318 ◽

2020 ◽

pp. 073563312096731

Author(s):

Bowen Liu ◽

Wanli Xing ◽

Yifang Zeng ◽

Yonghe Wu

Keyword(s):

Random Forest ◽

Nearest Neighbor ◽

Online Courses ◽

Learning Performance ◽

Support Vector ◽

K Nearest Neighbor ◽

Achievement Emotions ◽

Integrative Framework ◽

Emotional Interaction ◽

Performance Results

Massive Open Online Courses (MOOCs) have become a popular tool for worldwide learners. However, a lack of emotional interaction and support is an important reason for learners to abandon their learning and eventually results in poor learning performance. This study applied an integrative framework of achievement emotions to uncover their holistic influence on students’ learning by analyzing more than 400,000 forum posts from 13 MOOCs. Six machine-learning models were first built to automatically identify achievement emotions, including K-Nearest Neighbor, Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machines. Results showed that Random Forest performed the best with a kappa of 0.83 and an ROC_AUC of 0.97. Then, multilevel modeling with the “Stepwise Build-up” strategy was used to quantify the effect of achievement emotions on students’ academic performance. Results showed that different achievement emotions influenced students’ learning differently. These findings allow MOOC platforms and instructors to provide relevant emotional feedback to students automatically or manually, thereby improving their learning in MOOCs.
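The kappa statistic reported above measures agreement beyond what chance alone would produce. A minimal sketch of Cohen's kappa follows; the function name `cohen_kappa` and the emotion labels used in the usage note are illustrative assumptions, and the abstract does not say which kappa variant was computed.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(labels_a)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Expected agreement if both sequences were labeled independently.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, comparing the hypothetical sequences `["joy", "joy", "anger", "anger"]` and `["joy", "anger", "anger", "anger"]` gives an observed agreement of 0.75 and an expected agreement of 0.5, so kappa is 0.5.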

Detection Model on Fatigue Driving Behaviors Based on the Operating Parameters of Freight Vehicles

Applied Sciences ◽

10.3390/app11157132 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7132

Author(s):

Jianfeng Xi ◽

Shiqing Wang ◽

Tongqiang Ding ◽

Jian Tian ◽

Hui Shao ◽

...

Keyword(s):

Traffic Accidents ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Operating Data ◽

Detection Model ◽

Driving Behaviors ◽

K Nearest Neighbor Algorithm ◽

Fatigue Driving ◽

Freight Vehicles

Whether in developing or developed countries, traffic accidents caused by freight vehicles are responsible for more than 10% of all traffic accident deaths. Fatigue driving is one of the main causes of freight vehicle accidents. Existing fatigue driving studies mostly use vehicle operating data from experiments or simulations, which exposes certain drawbacks in the validity and reliability of the models used. This study collected a large quantity of real driving data to extract sample data under different fatigue degrees. The parameters of vehicle operating data were selected based on significant driver fatigue degrees. The k-nearest neighbor algorithm was used to establish the detection model of fatigue driving behaviors, taking into account the influence of the number of training samples and other parameters on the accuracy of fatigue driving behavior detection. With the collected operating data of 50 freight vehicles in the past month, the fatigue driving behavior detection model based on the k-nearest neighbor algorithm proposed in this paper and the commonly used BP neural network were each tested. The analysis results showed that the accuracy of both models is 75.9%, but the fatigue driving detection model based on the k-nearest neighbor algorithm is more reliable.
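As a rough illustration of the k-nearest neighbor idea behind the detection model, the sketch below classifies a query point by majority vote among its k closest training samples. The function name `knn_predict`, the Euclidean distance, the default k, and the two-feature toy data are hypothetical; the abstract does not list the operating parameters or the distance measure that were used.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest samples."""
    # Pair each training sample's Euclidean distance with its label.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples (not from the study).
train_x = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = ["alert", "alert", "fatigued", "fatigued", "fatigued"]
```

Calling `knn_predict(train_x, train_y, (0.15, 0.15))` returns `"alert"` because two of the three nearest samples carry that label. The abstract notes that the number of training samples and other parameters affect accuracy, and k is one such parameter.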

mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation

Bioinformatics ◽

10.1093/bioinformatics/bty1047 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2757-2765 ◽

Cited By ~ 63

Author(s):

Balachandran Manavalan ◽

Shaherin Basith ◽

Tae Hwan Shin ◽

Leyi Wei ◽

Gwang Lee

Keyword(s):

Nearest Neighbor ◽

Feature Representation ◽

Superior Performance ◽

Supplementary Information ◽

Gradient Boosting ◽

Support Vector ◽

Pharmaceutical Drugs ◽

K Nearest Neighbor ◽

Feature Descriptors ◽

Predicted Probability

Motivation: Cardiovascular disease is the primary cause of death globally, accounting for approximately 17.7 million deaths per year. One of the stakes linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there is no comprehensive analysis, assessment of diverse features and implementation of various machine-learning (ML) algorithms applied for antihypertensive peptide (AHTP) model construction. Results: In this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM) using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. While ERT-based trained models performed consistently better than other algorithms regardless of various feature descriptors, we treated them as baseline predictors, whose predicted probability of AHTPs was further used as input features separately for four different ML-algorithms (ERT, GB, RF and SVM) and developed their corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance with an overall improvement of approximately 6–7% in both benchmarking and independent datasets. Availability and implementation: The user-friendly online prediction tool, mAHTPred, is freely accessible at http://thegleelab.org/mAHTPred. Supplementary information: Supplementary data are available at Bioinformatics online.

Comparative Study of Machine Learning Classifiers for Modelling Road Traffic Accidents

Applied Sciences ◽

10.3390/app12020828 ◽

2022 ◽

Vol 12 (2) ◽

pp. 828

Author(s):

Tebogo Bokaba ◽

Wesley Doorsamy ◽

Babu Sena Paul

Keyword(s):

Machine Learning ◽

Traffic Accidents ◽

Road Traffic ◽

Real Life ◽

Support Vector ◽

Road Traffic Accidents ◽

Machine Learning Classifiers ◽

Reduction Techniques ◽

Learning Classifiers ◽

Accident Data

Road traffic accidents (RTAs) are a major cause of injuries and fatalities worldwide. In recent years, there has been a growing global interest in analysing RTAs, specifically concerned with analysing and modelling accident data to better understand and assess the causes and effects of accidents. This study analysed the performance of widely used machine learning classifiers using a real-life RTA dataset from Gauteng, South Africa. The study aimed to assess prediction model designs for RTAs to assist transport authorities and policymakers. It considered classifiers such as naïve Bayes, logistic regression, k-nearest neighbour, AdaBoost, support vector machine, random forest, and five missing data methods. These classifiers were evaluated using five evaluation metrics: accuracy, root-mean-square error, precision, recall, and receiver operating characteristic curves. Furthermore, the assessment involved parameter adjustment and incorporated dimensionality reduction techniques. The empirical results and analyses show that the RF classifier, combined with multiple imputations by chained equations, yielded the best performance when compared with the other combinations.

Classification of Hot Spots using XGBoost and LightGBM Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.e9459.069520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 722-724

Keyword(s):

Computational Methods ◽

Protein Interactions ◽

Hot Spots ◽

Cell Metabolism ◽

Pearson Correlation ◽

Classification Performance ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting ◽

Hub Proteins

Protein-protein interactions (PPIs) play a significant role in biological functions such as cell metabolism, immune response, and signal transduction. Hot spots are small fractions of residues in interfaces that provide substantial binding energy in PPIs. Therefore, identification of hot spots is important for discovering and analyzing molecular medicines and diseases. The current strategy, alanine scanning, is not suitable for large-scale applications because the technique is very costly and time-consuming. The existing computational methods are poor in classification performance as well as prediction accuracy. They are concerned with the topological structure and gene expression of hub proteins. The proposed system focuses on hot spots of hub proteins by eliminating redundant as well as highly correlated features using the Pearson correlation coefficient and support vector machine-based feature elimination. Extreme gradient boosting and LightGBM algorithms are used to combine a set of weak classifiers into a strong classifier. The proposed system shows better accuracy than the existing computational methods. The model can also be used to predict accurate molecular inhibitors for specific PPIs.
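Eliminating highly correlated features with the Pearson correlation coefficient can be sketched as a greedy filter. The function names, the 0.9 cutoff, and the keep-first ordering below are assumptions made for illustration; the abstract does not give the threshold or the selection order, and the support vector machine-based elimination step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear correlation
    return cov / (norm_x * norm_y)


def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature <= threshold.

    features maps a feature name to its list of values. The threshold is a
    hypothetical default, not a value reported in the paper.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, the result depends on the order in which features are visited: of two strongly correlated columns, the one seen first is retained.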

A Novel Fatigue Driving State Recognition and Warning Method Based on EEG and EOG Signals

Journal of Healthcare Engineering ◽

10.1155/2021/7799793 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Li Liu ◽

Yunfeng Ji ◽

Yun Gao ◽

Zhenyu Ping ◽

Liang Kuang ◽

...

Keyword(s):

Real Time ◽

Early Warning ◽

Traffic Accidents ◽

Recognition Accuracy ◽

Recognition Performance ◽

Warning System ◽

Support Vector ◽

Eyelid Closure ◽

Recognition Result ◽

Fatigue Driving

Traffic accidents are easily caused by tired driving. If the fatigue state of the driver can be identified in time and a corresponding early warning can be provided, then the occurrence of traffic accidents could be avoided to a large extent. At present, the recognition of fatigue driving states is mostly based on recognition accuracy. Fatigue state is currently recognized by combining different features, such as facial expressions, electroencephalogram (EEG) signals, yawning, and the percentage of eyelid closure over the pupil over time (PERCLOS). The combination of these features increases the recognition time and lacks real-time performance. In addition, some features will increase error in the recognition result, such as frequent yawning at the onset of a cold or frequent blinking with dry eyes. On the premise of ensuring the recognition accuracy and improving the realistic feasibility and real-time recognition performance of fatigue driving states, a fast support vector machine (FSVM) algorithm based on EEGs and electrooculograms (EOGs) is proposed to recognize fatigue driving states. First, the collected EEG and EOG modal data are preprocessed. Second, multiple features are extracted from the preprocessed EEGs and EOGs. Finally, FSVM is used to classify and recognize the data features to obtain the recognition result of the fatigue state. Based on the recognition results, this paper designs a fatigue driving early warning system based on Internet of Things (IoT) technology. When the driver shows symptoms of fatigue, the system not only sends a warning signal to the driver but also informs other nearby vehicles using this system through IoT technology and manages the operation background.

A New Approach to Fall Detection Based on Improved Dual Parallel Channels Convolutional Neural Network

Sensors ◽

10.3390/s19122814 ◽

2019 ◽

Vol 19 (12) ◽

pp. 2814 ◽

Cited By ~ 2

Author(s):

Xiaoguang Liu ◽

Huanliang Li ◽

Cunguang Lou ◽

Tie Liang ◽

Xiuling Liu ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Nearest Neighbor ◽

Fall Detection ◽

Classification Performance ◽

Daily Activities ◽

Support Vector ◽

K Nearest Neighbor ◽

Linear Discriminant ◽

Parallel Channels

Falls are the major cause of fatal and non-fatal injury among people aged more than 65 years. Due to the grave consequences of falls, it is necessary to conduct thorough research on them. This paper presents a method for fall detection using surface electromyography (sEMG) based on an improved dual parallel channels convolutional neural network (IDPC-CNN). The proposed IDPC-CNN model is designed to identify falls from daily activities using the spectral features of sEMG. Firstly, the classification accuracies of time domain features and spectrograms are compared using linear discriminant analysis (LDA), k-nearest neighbor (KNN) and support vector machine (SVM). Results show that spectrograms provide a richer way to extract pattern information and better classification performance. Therefore, the spectrogram features of sEMG are selected as the input of IDPC-CNN to distinguish between daily activities and falls. Finally, the IDPC-CNN is compared with SVM and three CNNs of different structures under the same conditions. Experimental results show that the proposed IDPC-CNN achieves 92.55% accuracy, 95.71% sensitivity and 91.7% specificity. Overall, the IDPC-CNN is more effective than the compared methods in accuracy, efficiency, training and generalization.

Bacterial Immunogenicity Prediction by Machine Learning Methods

Vaccines ◽

10.3390/vaccines8040709 ◽

2020 ◽

Vol 8 (4) ◽

pp. 709

Author(s):

Ivan Dimitrov ◽

Nevena Zaharieva ◽

Irini Doytchinova

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Predictive Ability ◽

Initial Step ◽

Majority Voting ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Test Set ◽

Extreme Gradient Boosting

The identification of protective immunogens is the most important and vigorous initial step in the long-lasting and expensive process of vaccine design and development. Machine learning (ML) methods are very effective in data mining and in the analysis of big data such as microbial proteomes. They are able to significantly reduce the experimental work for discovering novel vaccine candidates. Here, we applied six supervised ML methods (partial least squares-based discriminant analysis, k-nearest neighbor (kNN), random forest (RF), support vector machine (SVM), random subspace method (RSM), and extreme gradient boosting) on a set of 317 known bacterial immunogens and 317 bacterial non-immunogens and derived models for immunogenicity prediction. The models were validated by internal cross-validation in 10 groups from the training set and by the external test set. All of them showed good predictive ability, but the xgboost model displayed the most prominent ability to identify immunogens, recognizing 84% of the known immunogens in the test set. The combined RSM-kNN model was the best in the recognition of non-immunogens, identifying 92% of them in the test set. The three best performing ML models (xgboost, RSM-kNN, and RF) were implemented in the new version of the server VaxiJen, and the prediction of bacterial immunogens is now based on majority voting.
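Majority voting over the outputs of several models can be sketched in a few lines. The function name `majority_vote` and the toy predictions are illustrative assumptions and do not reproduce how the VaxiJen server combines its three models.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one majority label per sample.

    predictions is a list with one entry per model, each entry holding that
    model's labels for the same samples in the same order.
    """
    return [
        Counter(sample_votes).most_common(1)[0][0]
        for sample_votes in zip(*predictions)
    ]


# Hypothetical binary outputs from three models (1 = immunogen).
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]
```

Here `majority_vote([model_a, model_b, model_c])` yields `[1, 1, 1, 0]`. With three voters and two classes a tie cannot occur, which is one practical reason to combine an odd number of models.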
