Fertility-LightGBM: A fertility-related protein prediction model by multi-information fusion and light gradient boosting machine

ABSTRACT: The identification of fertility-related proteins plays an essential part in understanding the embryogenesis of germ cell development. Because traditional experimental methods for identifying fertility-related proteins are expensive and time-consuming, computational methods that predict protein function from amino acid sequences have emerged. In this paper, we propose a fertility-related protein prediction model. Firstly, the model combines protein physicochemical property information, evolutionary information and sequence information to construct the initial feature space ‘ALL’. Then, the least absolute shrinkage and selection operator (LASSO) is used to remove redundant features. Finally, a light gradient boosting machine (LightGBM) is used as the classifier. The 5-fold cross-validation accuracy of the training dataset is 88.5%, and the independent accuracy of the training dataset is 91.5%. The results show that our model is more competitive for the prediction of fertility-related proteins, which is helpful for the study of fertility diseases and related drug targets.
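
The LASSO step mentioned in the abstract removes features by driving some coefficients exactly to zero. As a rough illustration only (not the authors' implementation), the sketch below uses the special case of orthonormal features, where the LASSO solution reduces to soft-thresholding the least-squares coefficients. The function names, the coefficient values and the penalty `lam` are all hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def select_features(coefs: list[float], lam: float) -> list[int]:
    """Return the indices of coefficients that survive soft-thresholding."""
    return [i for i, c in enumerate(coefs) if soft_threshold(c, lam) != 0.0]


# Hypothetical least-squares coefficients for five features (illustrative only).
coefs = [0.90, -0.05, 0.30, 0.02, -0.60]
kept = select_features(coefs, lam=0.10)
print(kept)  # indices of the features that would be passed on to the classifier
```

Features whose coefficients fall inside the threshold band are dropped, so a larger `lam` yields a smaller feature set for the downstream classifier.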

Fertility-LightGBM: A fertility-related protein prediction model by multi-information fusion and light gradient boosting machine

Biomedical Signal Processing and Control ◽

10.1016/j.bspc.2021.102630 ◽

2021 ◽

Vol 68 ◽

pp. 102630

Author(s):

Minghui Wang ◽

Lingling Yue ◽

Xinhua Yang ◽

Xiaolin Wang ◽

Yu Han ◽

...

Keyword(s):

Prediction Model ◽

Information Fusion ◽

Gradient Boosting ◽

Related Protein ◽

Light Gradient ◽

Protein Prediction ◽

Gradient Boosting Machine

A Multi-Class Automatic Sleep Staging Method Based on Photoplethysmography Signals

Entropy ◽

10.3390/e23010116 ◽

2021 ◽

Vol 23 (1) ◽

pp. 116

Author(s):

Xiangfa Zhao ◽

Guobing Sun

Keyword(s):

Time Domain ◽

Single Channel ◽

Kappa Statistic ◽

Gradient Boosting ◽

Sleep Staging ◽

Challenging Problem ◽

Sleep State ◽

Light Gradient ◽

Gradient Boosting Machine ◽

The Time Domain

Automatic sleep staging with only one channel is a challenging problem in sleep-related research. In this paper, a simple and efficient method named PPG-based multi-class automatic sleep staging (PMSS) is proposed using only a photoplethysmography (PPG) signal. Single-channel PPG data were obtained from four categories of subjects in the CAP sleep database. After the preprocessing of PPG data, feature extraction was performed from the time domain, frequency domain, and nonlinear domain, and a total of 21 features were extracted. Finally, the Light Gradient Boosting Machine (LightGBM) classifier was used for multi-class sleep staging. The accuracy of the multi-class automatic sleep staging was over 70%, and the Cohen’s kappa statistic k was over 0.6. This showed that the PMSS method can also be applied to stage the sleep state for patients with sleep disorders.
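
The abstract reports both accuracy and Cohen's kappa. Kappa corrects the observed agreement for the agreement expected by chance, using kappa = (p_o - p_e) / (1 - p_e). The sketch below computes both from a confusion matrix; the three-class matrix is a made-up example, not data from the paper.

```python
def cohens_kappa(confusion: list[list[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p_o = sum(confusion[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the marginal frequencies, summed over classes.
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical confusion matrix for three sleep stages (illustrative only).
confusion = [
    [40, 5, 5],
    [5, 30, 5],
    [5, 5, 20],
]
accuracy = sum(confusion[i][i] for i in range(3)) / sum(map(sum, confusion))
kappa = cohens_kappa(confusion)
print(f"accuracy={accuracy:.3f}, kappa={kappa:.3f}")
```

Kappa comes out lower than accuracy because part of the raw agreement would occur by chance given the class frequencies.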

A Review of Light Gradient Boosting Machine Method for Hate Speech Classification on Twitter

2020 2nd International Conference on Electrical, Control and Instrumentation Engineering (ICECIE) ◽

10.1109/icecie50279.2020.9309565 ◽

2020 ◽

Author(s):

Muhammad Hafizh Abdurrahman ◽

Budhi Irawan ◽

Casi Setianingsih

Keyword(s):

Hate Speech ◽

Gradient Boosting ◽

Machine Method ◽

Light Gradient ◽

Gradient Boosting Machine ◽

Speech Classification

A Swarm Enhanced Light Gradient Boosting Machine for Crowdfunding Project Outcome Prediction

Machine Learning for Cyber Security - Lecture Notes in Computer Science ◽

10.1007/978-3-030-62463-7_34 ◽

2020 ◽

pp. 372-382

Author(s):

Shuang Geng ◽

Miaojia Huang ◽

Zhibo Wang

Keyword(s):

Outcome Prediction ◽

Gradient Boosting ◽

Light Gradient ◽

Project Outcome ◽

Gradient Boosting Machine

Development of a Diabetes Melitus Detection and Prediction Model Using Light Gradient Boosting Machine and K-Nearest Neighbour

10.36108/ujees/1202.30.0160 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

B. A Omodunbi

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Hybrid Model ◽

Learning Model ◽

Experimental Result ◽

Gradient Boosting ◽

Light Gradient ◽

Machine Learning Model ◽

Gradient Boosting Machine ◽

Receiver Operating

Diabetes mellitus is a health disorder that occurs when the blood sugar level becomes extremely high due to body resistance in producing the required amount of insulin. The ailment is among the major causes of death in Nigeria and the world at large. This study was carried out to detect diabetes mellitus by developing a hybrid model that comprises two machine learning models, namely Light Gradient Boosting Machine (LGBM) and K-Nearest Neighbor (KNN). This research is aimed at developing a machine learning model for detecting the occurrence of diabetes in patients. The performance metrics employed in evaluating the findings of this study are the Receiver Operating Characteristic (ROC) curve, five-fold cross-validation, precision, and accuracy score. The proposed system had an accuracy of 91% and the area under the Receiver Operating Characteristic curve was 93%. The experimental result shows that the prediction accuracy of the hybrid model is better than traditional machine learning

Urban Crime Trends Analysis and Occurrence Possibility Prediction based on Light Gradient Boosting Machine

10.1109/bdai52447.2021.9515252 ◽

2021 ◽

Author(s):

Xiangzhi Tong ◽

Pin Ni ◽

Qingge Li ◽

QiAo Yuan ◽

Junru Liu ◽

...

Keyword(s):

Urban Crime ◽

Gradient Boosting ◽

Crime Trends ◽

Light Gradient ◽

Gradient Boosting Machine ◽

Trends Analysis

Multistep-Ahead Solar Radiation Forecasting Scheme Based on the Light Gradient Boosting Machine: A Case Study of Jeju Island

Remote Sensing ◽

10.3390/rs12142271 ◽

2020 ◽

Vol 12 (14) ◽

pp. 2271 ◽

Cited By ~ 2

Author(s):

Jinwoong Park ◽

Jihoon Moon ◽

Seungmin Jung ◽

Eenjun Hwang

Keyword(s):

Solar Radiation ◽

Global Solar Radiation ◽

Jeju Island ◽

Gradient Boosting ◽

Probabilistic Forecasting ◽

Training Time ◽

Light Gradient ◽

Proposed Model ◽

Gradient Boosting Machine ◽

Time Problem

Smart islands have focused on renewable energy sources, such as solar and wind, to achieve energy self-sufficiency. Because solar photovoltaic (PV) power has the advantage of less noise and easier installation than wind power, it is more flexible in selecting a location for installation. A PV power system can be operated more efficiently by predicting the amount of global solar radiation for solar power generation. Thus far, most studies have addressed day-ahead probabilistic forecasting to predict global solar radiation. However, day-ahead probabilistic forecasting has limitations in responding quickly to sudden changes in the external environment. Although multistep-ahead (MSA) forecasting can be used for this purpose, traditional machine learning models are unsuitable because of the substantial training time. In this paper, we propose an accurate MSA global solar radiation forecasting model based on the light gradient boosting machine (LightGBM), which can handle the training-time problem and provide higher prediction performance compared to other boosting methods. To demonstrate the validity of the proposed model, we conducted a global solar radiation prediction for two regions on Jeju Island, the largest island in South Korea. The experiment results demonstrated that the proposed model can achieve better predictive performance than the tree-based ensemble and deep learning methods.
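
The abstract does not spell out how the multistep-ahead (MSA) training data are arranged. One common arrangement, assumed here purely for illustration, is the direct strategy: each training example maps a window of past observations to the next several values, and a regressor can then be fit per forecast step. The function name, window sizes and series values below are hypothetical.

```python
def make_direct_msa_pairs(series, n_lags, horizon):
    """Build (inputs, targets) pairs: n_lags past values -> next `horizon` values."""
    pairs = []
    for t in range(n_lags, len(series) - horizon + 1):
        past = series[t - n_lags:t]      # model inputs
        future = series[t:t + horizon]   # one target per forecast step
        pairs.append((past, future))
    return pairs


# Hypothetical normalized radiation readings (illustrative only).
series = [0.0, 0.1, 0.4, 0.8, 0.9, 0.7, 0.3, 0.0]
pairs = make_direct_msa_pairs(series, n_lags=3, horizon=2)
for past, future in pairs:
    print(past, "->", future)
```

Under this arrangement, step `h` of the forecast would be learned from the `h`-th element of every target list, which keeps each model a plain tabular regression problem suited to a gradient boosting learner.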

Prediction of Extubation Failure for Intensive Care Unit Patients Using Light Gradient Boosting Machine

IEEE Access ◽

10.1109/access.2019.2946980 ◽

2019 ◽

Vol 7 ◽

pp. 150960-150968 ◽

Cited By ~ 3

Author(s):

Tingting Chen ◽

Jun Xu ◽

Haochao Ying ◽

Xiaojun Chen ◽

Ruiwei Feng ◽

...

Keyword(s):

Intensive Care Unit ◽

Intensive Care ◽

Extubation Failure ◽

Gradient Boosting ◽

Light Gradient ◽

Gradient Boosting Machine

SAT-LB121 Development of a Machine-Learning Method for Predicting New Onset of Diabetes Mellitus: A Retrospective Analysis of 509,153 Annual Specific Health Checkup Records

Journal of the Endocrine Society ◽

10.1210/jendso/bvaa046.2194 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

Author(s):

Akihiro Nomura ◽

Sho Yamamoto ◽

Yuta Hayakawa ◽

Kouki Taniguchi ◽

Takuya Higashitani ◽

...

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Prediction Model ◽

Performance Test ◽

Bootstrap Method ◽

Area Under The Curve ◽

Training Dataset ◽

Gradient Boosting ◽

Health Checkup ◽

Specific Health

Abstract: Diabetes mellitus (DM) is a chronic disorder, characterized by impaired glucose metabolism. It is linked to increased risks of several diseases such as atrial fibrillation, cancer, and cardiovascular diseases. Therefore, DM prevention is essential. However, the traditional regression-based DM-onset prediction methods are incapable of investigating future DM for generally healthy individuals without DM. Employing gradient-boosting decision trees, we developed a machine learning-based prediction model to identify the DM signatures, prior to the onset of DM. We employed the nationwide annual specific health checkup records, collected during the years 2008 to 2018, from Kanazawa city, Ishikawa, Japan. The data included the physical examinations, blood and urine tests, and participant questionnaires. Individuals without DM (at baseline), who underwent more than two annual health checkups during the said period, were included. The new cases of DM onset were recorded when the participants were diagnosed with DM in the annual check-ups. The dataset was divided into three subsets in a 6:2:2 ratio to constitute the training, tuning (internal validation), and testing datasets. Employing the testing dataset, the ability of our trained prediction model to calculate the area under the curve (AUC), precision, recall, F1 score, and overall accuracy was evaluated. Using a 1,000-iteration bootstrap method, every performance test resulted in a two-sided 95% confidence interval (CI). We included 509,153 annual health checkup records of 139,225 participants. Among them, 65,505 participants without DM were included, which constituted 36,303 participants in the training dataset and 13,101 participants in each of the tuning and testing datasets. We identified a total of 4,696 new DM-onset patients (7.2%) in the study period. Our trained model predicted the future incidence of DM with the AUC, precision, recall, F1 score, and overall accuracy of 0.71 (0.69-0.72 with 95% CI), 75.3% (71.6-78.8), 42.2% (39.3-45.2), 54.1% (51.2-56.7), and 94.9% (94.5-95.2), respectively. In conclusion, the machine learning-based prediction model satisfactorily identified the DM onset prior to the actual incidence.
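
Two of the quantities in this abstract can be checked or illustrated with a few lines of code. The F1 score is the harmonic mean of precision and recall, so the reported 75.3% and 42.2% can be plugged in directly. The percentile bootstrap below is only one plausible reading of "1,000-iteration bootstrap method"; the abstract does not state which variant was used, and the 0/1 outcome list is synthetic.

```python
import random


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def bootstrap_ci(values, stat, n_iter=1000, alpha=0.05, seed=0):
    """Two-sided percentile bootstrap interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_iter times and sort the resulting statistics.
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)]) for _ in range(n_iter)
    )
    lo = stats[int((alpha / 2) * n_iter)]
    hi = stats[int((1 - alpha / 2) * n_iter) - 1]
    return lo, hi


def mean(xs):
    return sum(xs) / len(xs)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.753, 0.422)
print(f"F1 = {f1:.3f}")

# Synthetic per-sample correctness flags (illustrative only, not study data).
outcomes = [1] * 90 + [0] * 10
lo, hi = bootstrap_ci(outcomes, mean)
print(f"accuracy = {mean(outcomes):.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The computed F1 rounds to 0.541, in line with the 54.1% the abstract reports.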

Predicting Important Features That Influence COVID-19 Infection Through Light Gradient Boosting Machine: Case of Toronto

American Journal of Mathematical and Computer Modelling ◽

10.11648/j.ajmcm.20210603.11 ◽

2021 ◽

Vol 6 (3) ◽

pp. 43

Author(s):

Yein Choi

Keyword(s):

Gradient Boosting ◽

Light Gradient ◽

Gradient Boosting Machine
