Identify Lysine Neddylation Sites Using Bi-profile Bayes Feature Extraction via the Chou’s 5-steps Rule and General Pseudo Components

Introduction: Neddylation is a highly dynamic and reversible post-translational modification that is involved in various biological processes, and its abnormality has been shown to be closely related to several human diseases. Accurate detection of neddylation sites is essential for elucidating the regulatory mechanisms of protein neddylation. Objective: Because detecting lysine neddylation sites with traditional experimental methods is expensive and time-consuming, computational methods for identifying neddylation sites are needed. Methods: In this study, a bioinformatics tool named NeddPred is developed to identify potential protein neddylation sites. Bi-profile Bayes feature extraction is used to encode neddylation sites, and a fuzzy support vector machine model is used to address noise and class imbalance in the prediction. Results: In 10-fold cross-validation, NeddPred achieved a Matthews correlation coefficient of 0.7082 and an area under the receiver operating characteristic curve of 0.9769. Independent tests show that NeddPred significantly outperforms the existing lysine neddylation site predictor NeddyPreddy. Conclusion: NeddPred can complement the existing tools for the prediction of neddylation sites. A user-friendly web server for NeddPred is accessible at 123.206.31.171/NeddPred/.
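The bi-profile Bayes encoding named in the abstract can be illustrated with a minimal sketch. It assumes the usual formulation, in which a peptide window is described by the frequency of each of its residues at each position in the positive training set, followed by the same frequencies in the negative set. The function names and the three-residue toy peptides are hypothetical and are not taken from NeddPred.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_profile(peptides):
    """Return per-position amino-acid frequencies for equal-length peptides."""
    length = len(peptides[0])
    profile = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        profile.append({aa: counts[aa] / len(peptides) for aa in AMINO_ACIDS})
    return profile


def bpb_encode(peptide, pos_profile, neg_profile):
    """Encode a peptide as [positive-set frequencies] + [negative-set frequencies]."""
    # Residues outside the 20 standard amino acids (e.g. padding) score 0.0.
    pos_part = [pos_profile[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    neg_part = [neg_profile[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    return pos_part + neg_part


# Hypothetical toy windows centred on lysine (K); real windows are longer.
positives = ["AKG", "AKC"]
negatives = ["GKA", "CKA"]
pos_profile = position_profile(positives)
neg_profile = position_profile(negatives)
vector = bpb_encode("AKG", pos_profile, neg_profile)
```

The resulting vector has twice the window length and could then be passed to any classifier; the fuzzy support vector machine used by the authors is not reproduced here.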


SCHOT Research Award 2020: Development and Validation of a Multivariable Model for Predicting Hospital Length of Stay in Patients Over 65 Years of Age Undergoing Elective Total Hip Arthroplasty in Chile Using Machine Learning

Revista Chilena de Ortopedia y Traumatología ◽

10.1055/s-0041-1740232 ◽

2021 ◽

Vol 62 (03) ◽

pp. e180-e192

Author(s):

Claudio Díaz-Ledezma ◽

David Díaz-Solís ◽

Raúl Muñoz-Reyes ◽

Jonathan Torres Castro

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Big Data ◽

Receiver Operating Characteristic Curve ◽

Receiver Operating Characteristic ◽

Operating Characteristic ◽

Characteristic Curve ◽

Support Vector ◽

Operating Characteristic Curve ◽

Receiver Operating

Abstract Introduction Predicting hospital length of stay after elective total hip arthroplasty (THA) is crucial in the perioperative evaluation of patients and plays a decisive role from an operational and economic standpoint. Internationally, big data and artificial intelligence have been used to carry out prognostic assessments of this kind. The aim of the present study is to develop and validate, using machine learning, a tool capable of predicting the hospital length of stay of Chilean patients older than 65 years undergoing THA for osteoarthritis. Material and Methods Using the anonymized electronic hospital discharge records of the Departamento de Estadísticas e Información de Salud (DEIS), data were obtained for 8,970 hospital discharges of patients who underwent THA for osteoarthritis between 2016 and 2018. In total, 15 variables available in the DEIS, plus the poverty rate of the patient's commune of origin, were included to predict the probability that a patient would have a short (< 3 days) or prolonged (> 3 days) stay after surgery. Using machine learning techniques, 8 prediction algorithms were trained on 80% of the sample. The remaining 20% was used to validate the predictive ability of the models built from the algorithms. The models were evaluated and ranked using the area under the receiver operating characteristic curve (AUC-ROC), which measures how well a model can distinguish between two groups. Results The XGBoost algorithm performed best, with a mean AUC-ROC of 0.86 (standard deviation [SD]: 0.0087). In second place, the linear support vector machine (SVM) algorithm obtained an AUC-ROC of 0.85 (SD: 0.0086). The relative importance of the explanatory variables showed that the region of residence, the health service, the health facility where the patient was operated on, and the modality of care are the variables that most strongly determine a patient's length of stay. Discussion The present study developed machine learning algorithms based on freely available Chilean big data, and developed and validated a tool that shows adequate discriminatory capacity for predicting the probability of short versus prolonged hospital stay in older adults undergoing THA for osteoarthritis. Conclusion The algorithms created through machine learning make it possible to predict hospital length of stay in Chilean patients undergoing elective total hip arthroplasty.
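The AUC-ROC used above to rank the models can be read as the probability that a randomly chosen case from one group receives a higher score than a randomly chosen case from the other. A minimal sketch of that pairwise definition follows. The function name, labels, and scores are hypothetical, with label 1 standing for a prolonged stay, and this is not the study's code.

```python
def auc_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of a prolonged stay for four patients.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(labels, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```

The double loop is quadratic in the number of cases, which is acceptable for illustration; library implementations typically sort the scores instead.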


Prediction of the Risk of C5 Palsy After Posterior Laminectomy and Fusion With Cervical Myelopathy Using Support Vector Machine: an Analysis of 184 Consecutive Patients

10.21203/rs.3.rs-315608/v1 ◽

2021 ◽

Author(s):

Haosheng Wang ◽

Zhi-Ri Tang ◽

Wenle Li ◽

Tingting Fan ◽

Jianwu Zhao ◽

...

Keyword(s):

Support Vector Machine ◽

Receiver Operating Characteristic Curve ◽

Cervical Myelopathy ◽

Operating Characteristic ◽

Characteristic Curve ◽

Support Vector ◽

Svm Model ◽

Operating Characteristic Curve ◽

Posterior Laminectomy ◽

Laminectomy And Fusion

Abstract Background: This study aimed to predict C5 palsy (C5P) after posterior laminectomy and fusion (PLF) with cervical myelopathy (CM) from routinely available variables using a support vector machine (SVM) method. Methods: We conducted a retrospective investigation based on 184 consecutive patients with CM after PLF, and data were collected from March 2013 to December 2019. Clinical and imaging variables were obtained and imported into univariable and multivariable logistic regression analyses to identify risk factors for C5P. According to published reports and clinical experience, a series of variables was selected to develop an SVM machine learning model to predict C5P. The accuracy (ACC), area under the receiver operating characteristic curve (AUC), and confusion matrices were used to evaluate the performance of the prediction model. Results: Among the 184 consecutive patients, C5P occurred in 26 patients (14.13%). Multivariate analyses demonstrated the following 4 independent factors associated with C5P: abnormal electromyogram (odds ratio [OR] = 7.861), JOA recovery rate (OR = 1.412), modified Pavlov ratio (OR = 0.009), and presence of C4–C5 foraminal stenosis (OR = 15.492). The SVM model achieved an AUC of 0.923 and an ACC of 0.918. In addition, the confusion matrix showed the classification results of the discriminant analysis. Conclusions: The designed SVM model showed satisfactory performance in predicting C5P from routinely available variables. However, future external validation is needed.


Prediction of the risk of C5 palsy after posterior laminectomy and fusion with cervical myelopathy using a support vector machine: an analysis of 184 consecutive patients

Journal of Orthopaedic Surgery and Research ◽

10.1186/s13018-021-02476-5 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Haosheng Wang ◽

Zhi-Ri Tang ◽

Wenle Li ◽

Tingting Fan ◽

Jianwu Zhao ◽

...

Keyword(s):

Support Vector Machine ◽

Receiver Operating Characteristic Curve ◽

Cervical Myelopathy ◽

Operating Characteristic ◽

Characteristic Curve ◽

Support Vector ◽

Svm Model ◽

Operating Characteristic Curve ◽

Posterior Laminectomy ◽

Laminectomy And Fusion

Abstract Background This study aimed to predict C5 palsy (C5P) after posterior laminectomy and fusion (PLF) with cervical myelopathy (CM) from routinely available variables using a support vector machine (SVM) method. Methods We conducted a retrospective investigation based on 184 consecutive patients with CM after PLF, and data were collected from March 2013 to December 2019. Clinical and imaging variables were obtained and imported into univariable and multivariable logistic regression analyses to identify risk factors for C5P. According to published reports and clinical experience, a series of variables was selected to develop an SVM machine learning model to predict C5P. The accuracy (ACC), area under the receiver operating characteristic curve (AUC), and confusion matrices were used to evaluate the performance of the prediction model. Results Among the 184 consecutive patients, C5P occurred in 26 patients (14.13%). Multivariate analyses demonstrated the following 4 independent factors associated with C5P: abnormal electromyogram (odds ratio [OR] = 7.861), JOA recovery rate (OR = 1.412), modified Pavlov ratio (OR = 0.009), and presence of C4–C5 foraminal stenosis (OR = 15.492). The SVM model achieved an area under the receiver operating characteristic curve (AUC) of 0.923 and an ACC of 0.918. Additionally, the confusion matrix showed the classification results of the discriminant analysis. Conclusions The designed SVM model presented satisfactory performance in predicting C5P from routinely available variables. However, future external validation is needed.


Predicting Risk of Antenatal Depression and Anxiety Using Multi-Layer Perceptrons and Support Vector Machines

Journal of Personalized Medicine ◽

10.3390/jpm11030199 ◽

2021 ◽

Vol 11 (3) ◽

pp. 199

Author(s):

Fajar Javed ◽

Syed Omer Gilani ◽

Seemab Latif ◽

Asim Waris ◽

Mohsin Jamil ◽

...

Keyword(s):

Low Income ◽

Operating Characteristic ◽

Mental Health Problems ◽

Characteristic Curve ◽

Antenatal Depression ◽

Low Income Countries ◽

Support Vector ◽

Depression And Anxiety ◽

Gynecology And Obstetrics ◽

Operating Characteristic Curve

Perinatal depression and anxiety are defined as the mental health problems a woman faces during pregnancy, around childbirth, and after delivery. Although these conditions are common and affect all family members, including the infant, they can easily go undetected and underdiagnosed. The prevalence of antenatal depression and anxiety worldwide, especially in low-income countries, is extremely high. The vast majority of affected women suffer from mild to moderate depression, which risks impairing the mother–child relationship and infant health; a few women end up taking their own lives. Owing to high costs and the non-availability of resources, it is almost impossible to assess every pregnant woman for depression and anxiety, yet under-detection can have a lasting impact on the health of mother and child. This work proposes a multi-layer perceptron based neural network (MLP-NN) classifier to predict the risk of depression and anxiety in pregnant women. We trained and evaluated our proposed system on a Pakistani dataset of 500 women in their antenatal period. ReliefF was used for feature selection before classifier training. Accuracy, sensitivity, specificity, precision, F1 score, and area under the receiver operating characteristic curve were used to evaluate the performance of the trained model. The multilayer perceptron and the support vector classifier achieved an area under the receiver operating characteristic curve of 88% and 80% for antenatal depression and 85% and 77% for antenatal anxiety, respectively. The system can be used to facilitate screening of women during their routine visits to hospital gynecology and obstetrics departments.


Machine learning for identification of surgeries with high risks of cancellation

Health Informatics Journal ◽

10.1177/1460458218813602 ◽

2018 ◽

Vol 26 (1) ◽

pp. 141-155 ◽

Cited By ~ 2

Author(s):

Li Luo ◽

Fengyi Zhang ◽

Yao Yao ◽

RenRong Gong ◽

Martina Fu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Value ◽

Operating Characteristic ◽

Sampling Methods ◽

Characteristic Curve ◽

Support Vector ◽

Chi Square ◽

Stable Performance ◽

Operating Characteristic Curve

Surgery cancellations waste scarce operative resources and hinder patients’ access to operative services. In this study, the Wilcoxon and chi-square tests were used for predictor selection, and three machine learning models (random forest, support vector machine, and XGBoost) were used to identify surgeries with high risks of cancellation. The optimal performances of the identification models were as follows: sensitivity, 0.615; specificity, 0.957; positive predictive value, 0.454; negative predictive value, 0.904; accuracy, 0.647; and area under the receiver operating characteristic curve, 0.682. Of the three models, the random forest model achieved the best performance. Thus, the effective identification of surgeries with high risks of cancellation is feasible with stable performance. The choice of model and sampling method significantly affects identification performance. This study is a new application of machine learning to the identification of surgeries with high risks of cancellation and to the facilitation of surgery resource management.
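The threshold-based metrics listed in this abstract all derive from the four cells of a binary confusion matrix. A minimal sketch under that standard definition is given below. The function name and the counts are hypothetical and do not reproduce the study's data.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cancellations that were flagged
        "specificity": tn / (tn + fp),  # share of completed surgeries left unflagged
        "ppv": tp / (tp + fp),          # share of flagged surgeries that were cancelled
        "npv": tn / (tn + fn),          # share of unflagged surgeries that were completed
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts for 100 scheduled surgeries, 10 of which were cancelled.
metrics = confusion_metrics(tp=8, fn=2, fp=4, tn=86)
```

With rare events such as cancellations, the positive predictive value can stay low even when specificity is high, because false positives are drawn from a much larger pool of completed surgeries.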


iRNAD: a computational tool for identifying D modification sites in RNA sequence

Bioinformatics ◽

10.1093/bioinformatics/btz358 ◽

2019 ◽

Vol 35 (23) ◽

pp. 4922-4929 ◽

Cited By ~ 31

Author(s):

Zhao-Chun Xu ◽

Peng-Mian Feng ◽

Hui Yang ◽

Wang-Ren Qiu ◽

Wei Chen ◽

...

Keyword(s):

Computational Models ◽

Operating Characteristic ◽

Cross Validation ◽

Characteristic Curve ◽

Support Vector ◽

Final Model ◽

Rna Sequence ◽

Functional Roles ◽

Proposed Model ◽

User Friendly

Abstract Motivation Dihydrouridine (D) is a common RNA post-transcriptional modification found in eukaryotes, bacteria and a few archaea. The modification can promote the conformational flexibility of individual nucleotide bases, and its levels are increased in cancerous tissues. Therefore, it is necessary to detect D in RNA to further understand its functional roles. Since wet-lab experimental techniques for this purpose are time-consuming and laborious, computational models for identifying D modification sites in RNA are urgently needed. Results We constructed a predictor, called iRNAD, for identifying D modification sites in RNA sequences. In this predictor, RNA samples derived from five species were encoded by nucleotide chemical property and nucleotide density. A support vector machine was used to perform the classification. The final model achieved an overall accuracy of 96.18% with an area under the receiver operating characteristic curve of 0.9839 in a jackknife cross-validation test. Furthermore, we performed a series of validations from several aspects and demonstrated the robustness and reliability of the proposed model. Availability and implementation A user-friendly web server called iRNAD is freely accessible at http://lin-group.cn/server/iRNAD, which offers a convenient resource for users studying D modification.
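The combination of nucleotide chemical property and nucleotide density mentioned above can be sketched as follows. The three-bit property code (ring structure, functional group, hydrogen-bond strength) and the cumulative density definition are the scheme commonly used in related RNA modification predictors; they are assumed here, since the abstract does not spell them out. The function name is hypothetical.

```python
# Assumed chemical-property code: (purine ring, amino group, weak hydrogen bond).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_rna(seq):
    """Encode each nucleotide as its 3 property bits plus its cumulative density."""
    features = []
    seen = {}
    for i, nt in enumerate(seq, start=1):
        seen[nt] = seen.get(nt, 0) + 1
        # Density: occurrences of this nucleotide so far divided by the prefix length.
        density = seen[nt] / i
        features.append((*NCP[nt], density))
    return features


# Hypothetical three-nucleotide fragment; real samples are longer windows.
encoded = encode_rna("AUA")
```

Each nucleotide becomes a four-value tuple, so a window of length n yields 4n features that can be flattened and passed to a classifier such as a support vector machine.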
