An integrated machine learning framework for a discriminative analysis of schizophrenia using multi-biological data

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Peng-fei Ke ◽  
Dong-sheng Xiong ◽  
Jia-hui Li ◽  
Zhi-lin Pan ◽  
Jing Zhou ◽  
...  

Abstract Finding effective and objective biomarkers to inform the diagnosis of schizophrenia is of great importance yet remains challenging. Relatively little work has been conducted on multi-biological data for the diagnosis of schizophrenia. In this cross-sectional study, we extracted multiple features from three types of biological data, including gut microbiota data, blood data, and electroencephalogram data. Then, an integrated framework of machine learning consisting of five classifiers, three feature selection algorithms, and four cross validation methods was used to discriminate patients with schizophrenia from healthy controls. Our results show that the support vector machine classifier without feature selection using the input features of multi-biological data achieved the best performance, with an accuracy of 91.7% and an AUC of 96.5% (p < 0.05). These results indicate that multi-biological data showed better discriminative capacity for patients with schizophrenia than single biological data. The top 5% discriminative features selected from the optimal model include the gut microbiota features (Lactobacillus, Haemophilus, and Prevotella), the blood features (superoxide dismutase level, monocyte-lymphocyte ratio, and neutrophil count), and the electroencephalogram features (nodal local efficiency, nodal efficiency, and nodal shortest path length in the temporal and frontal-parietal brain areas). The proposed integrated framework may be helpful for understanding the pathophysiology of schizophrenia and developing biomarkers for schizophrenia using multi-biological data.

2021 ◽  
Author(s):  
Peng-fei Ke ◽  
Dong-sheng Xiong ◽  
Jia-hui Li ◽  
Shi-jia Li ◽  
Jie Song ◽  
...  

Abstract Finding effective and objective biomarkers to inform the diagnosis of schizophrenia is of great importance yet remains challenging. However, there is relatively little work on multi-biological data for the diagnosis of schizophrenia. In this cross-sectional study, we extracted multiple features from three types of biological data: gut microbiota data, blood data, and electroencephalogram data. Then, an integrated machine learning framework consisting of five classifiers, three feature selection algorithms, and four cross-validation methods was used to discriminate patients with schizophrenia from healthy controls. Our results showed that the classifier using multi-biological data performed better than the classifiers using single biological data, with 91.7% accuracy and 96.5% AUC. The most discriminative features (top 5%) for the classification include gut microbiota features (Lactobacillus, Haemophilus, and Prevotella), blood features (superoxide dismutase, monocyte-lymphocyte ratio, and neutrophil), and electroencephalogram features (nodal local efficiency, nodal efficiency, and nodal shortest path length in the temporal and frontal-parietal areas). The proposed integrated framework may help in understanding the pathophysiology of schizophrenia and developing biomarkers for schizophrenia using multi-biological data.


2020 ◽  
Vol 10 (4) ◽  
pp. 242 ◽  
Author(s):  
Daniele Pietrucci ◽  
Adelaide Teofani ◽  
Valeria Unida ◽  
Rocco Cerroni ◽  
Silvia Biocca ◽  
...  

Several studies investigating the involvement of the gut microbiota in Parkinson’s disease (PD) have identified some common alterations of the microbial community, such as a decrease in Lachnospiraceae and an increase in Verrucomicrobiaceae families in PD patients. However, the results for other bacterial families are often contradictory. Machine learning is a promising tool for building predictive models for the classification of biological data, such as those produced in metagenomic studies. We tested three different machine learning algorithms (random forest, neural networks and support vector machines), analyzing 846 metagenomic samples (472 from PD patients and 374 from healthy controls), including our published data and those downloaded from public databases. Prediction performance was evaluated by the area under the curve, accuracy, precision, recall and F-score metrics. The random forest algorithm provided the best results. Bacterial families were sorted according to their importance in the classification, and a subset of 22 families was identified for the prediction of patient status. Although the results are promising, the algorithm needs to be trained with a larger number of samples to increase the accuracy of the procedure.
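The evaluation metrics named in this abstract (accuracy, precision, recall and F-score) can be illustrated with a minimal sketch. The function name `classification_metrics` and the toy labels are assumptions made for illustration; they are not the authors' code or data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (1 = PD patient, 0 = healthy control); hypothetical values.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

With unbalanced groups such as the 472 patients and 374 controls described above, precision and recall can diverge from accuracy, which is one reason to report all of them together.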


BMJ Open ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. e048482
Author(s):  
Liu Zhang ◽  
Ya Ru Yan ◽  
Shi Qi Li ◽  
Hong Peng Li ◽  
Ying Ni Lin ◽  
...  

Objectives: Obstructive sleep apnoea (OSA) has received much attention as a risk factor for perioperative complications, and 68.5% of OSA patients remain undiagnosed before surgery. Faciocervical characteristics may help screen for OSA in Asians because of their smaller upper airways compared with Caucasians. Thus, our study aimed to explore a machine-learning model to screen for moderate to severe OSA based on faciocervical and anthropometric measurements. Design: A cross-sectional study. Setting: Data were collected from the Shanghai Jiao Tong University School of Medicine affiliated Ruijin Hospital between February 2019 and August 2020. Participants: A total of 481 Chinese participants were included in the study. Primary and secondary outcome: (1) Identification of moderate to severe OSA with apnoea–hypopnoea index ≥15 events/hour and (2) verification of the machine-learning model. Results: The Sex-Age-Body mass index (BMI)-maximum Interincisal distance-ratio of Height to thyrosternum distance-neck Circumference-waist Circumference (SABIHC2) model was set up. The SABIHC2 model could screen for moderate to severe OSA with an area under the curve (AUC) of 0.832, a sensitivity of 0.916 and a specificity of 0.749, and performed better than the STOP-BANG (snoring, tiredness, observed apnea, high blood pressure, BMI, age, neck circumference, and male gender) questionnaire, which showed an AUC of 0.631, a sensitivity of 0.487 and a specificity of 0.772. Especially for asymptomatic patients (Epworth Sleepiness Scale <10), the SABIHC2 model demonstrated better predictive ability than the STOP-BANG questionnaire in AUC (0.824 vs 0.530) and sensitivity (0.892 vs 0.348), with somewhat lower specificity (0.755 vs 0.809). Conclusion: The SABIHC2 machine-learning model provides a simple and accurate assessment of moderate to severe OSA in the Chinese population, especially for those without significant daytime sleepiness.
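The sensitivity and specificity figures reported for a screening model depend on where the decision cut-off is placed on the model's risk score. The sketch below shows that trade-off; the function name `sens_spec`, the scores and the cut-offs are hypothetical and do not reproduce the SABIHC2 model.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold counts as positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical risk scores; label 1 = moderate to severe OSA, 0 = otherwise.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

strict = sens_spec(scores, labels, threshold=0.5)    # fewer false alarms
lenient = sens_spec(scores, labels, threshold=0.25)  # fewer missed cases
```

Lowering the cut-off raises sensitivity at the cost of specificity, which is usually the preferred direction for a screening tool where missed cases are costly.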


2017 ◽  
Vol 35 (15_suppl) ◽  
pp. 6596-6596
Author(s):  
Frank Po-Yen Lin ◽  
Chloe Martin ◽  
Simon Kocbek ◽  
Anthony M. Joshua ◽  
Rachel Fitz-Gerald Dear ◽  
...  

Background: Knowing which factors compromise quality of life (QoL) in patients undergoing cancer treatments can help oncologists provide more effective care. To identify these factors, we conducted a single-centered cross-sectional study examining the relationships between patient-reported QoL, adverse events (AE), and treatment characteristics. Methods: Consecutive patients attending an outpatient chemotherapy unit completed two questionnaires (EORTC QLQ-C30 and National Cancer Institute PRO-CTCAE) per visit to identify factors contributing to the lowest global QoL score [QLQ-C30 QL2, range 0 (worst)–100 (best)] over a 6-week period. QL2 was correlated to each PRO-CTCAE item and treatment characteristic (tumor type, drug class, number of cycles, and treatment intent) using multiple regression, adjusted for age, sex, and use of concurrent radiotherapy. To determine whether QoL can be reliably modeled by machine learning, ten algorithms were compared for performance in classifying patients into dichotomized QL2 subgroups. Results: One hundred and fifteen of 130 patients (157/244 visits) completed up to 6 sets of questionnaires (median QL2: 67, IQR: 50–83). No significant association was found between QL2 and treatment characteristics (at Bonferroni-corrected α = 5×10⁻⁴). However, QL2 was significantly associated with AE in gastrointestinal, respiratory, attention, pain, sleep/wake, and mood categories. Using AE as covariates, support vector machine with radial basis kernel was the best at classifying patients into QoL groups (mean bootstrapped area under ROC curve 0.812, 95% CI 0.700–0.925). Conclusions: Patient-reported QoL is associated with multiple AE, but not with characteristics of systemic therapy. Machine learning analysis suggests that a combined AE analysis may reliably characterize a patient’s QoL. [Table: see text]
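A bootstrapped area under the ROC curve, as reported in this abstract, can be sketched as follows. This is an illustration under stated assumptions: the AUC is computed with the rank-based (Mann-Whitney) formulation, the interval is a simple percentile interval, and the scores are toy values. The abstract does not say which bootstrap variant or interval type the authors used.

```python
import random


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, labels, n_boot=1000, seed=0):
    """Mean AUC and a percentile 95% interval over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    mean = sum(stats) / n_boot
    return mean, stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Hypothetical classifier scores; label 1 = one of the two QoL subgroups.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
point_estimate = auc(scores, labels)
mean_auc, ci_low, ci_high = bootstrap_auc(scores, labels)
```

With a sample as small as this toy one the interval is very wide, and even at the study's size (115 patients) the reported interval of 0.700–0.925 is broad.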


2021 ◽  
Vol 9 (1) ◽  
pp. 60-74
Author(s):  
Derry Pramono Adi ◽  
Lukman Junaedi ◽  
Frismanda ◽  
Agustinus Bimo Gumelar ◽  
Andreas Agung Kristanto

Advances in Machine Learning (ML) aim for faster computation and lower resource use, yet the curse of dimensionality burdens both computation time and resources. This paper describes the benefits of Feature Selection Algorithms (FSA) for speech data under workload stress. FSA reduce both data dimension and computation time while retaining the speech information. We chose the robust Evolutionary Algorithm, Harmony Search, Principal Component Analysis, Genetic Algorithm, Particle Swarm Optimization, Ant Colony Optimization, and Bee Colony Optimization, which are then evaluated using hierarchical machine learning models. These FSAs are explored with conversational workload stress data from a Customer Service hotline, which receives daily complaints that trigger stress in speaking. We employed 223 acoustic-based features. Using Random Forest, our evaluation showed that computation was 3.6 times faster than with the original 223 features. Evaluation using Support Vector Machine was the fastest, with a computation time of 0.001 seconds.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Nahla F. Omran ◽  
Sara F. Abd-el Ghany ◽  
Hager Saleh ◽  
Ayman Nabil

Twitter integrates with streaming data technologies and machine learning to add new value to healthcare. This paper presents a real-time system to predict breast cancer based on streaming patients’ health data from Twitter. The proposed system consists of two major components: an offline model-building stage and an online prediction pipeline. For the first component, we computed the correlations between features in order to reduce the number of features from the Breast Cancer Wisconsin Diagnostic dataset. Two feature selection algorithms, recursive feature elimination and univariate feature selection, were then applied to the remaining features to select the essential ones. Four classifiers (decision tree, logistic regression, support vector machine, and random forest) were used on the features after correlation filtering and feature selection. Hyperparameter tuning and cross-validation were also applied to optimize the models and enhance accuracy. Apache Spark, Apache Kafka, and the Twitter Streaming API are used to develop the second component. The model with the highest accuracy from the first component predicts breast cancer in real time from streaming tweets. The results showed that the random forest classifier achieved the best accuracy.
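The correlation step described in this abstract, which removes redundant features before further feature selection, might look like the sketch below. The greedy keep-first rule, the 0.9 threshold, the function names and the toy values are all assumptions made for illustration; the abstract does not state the authors' exact procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(features, threshold=0.9):
    """Greedily keep a feature only if |r| <= threshold with all kept ones."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy columns with hypothetical values (not taken from the real dataset).
features = {
    "radius": [1.0, 2.0, 3.0, 4.0, 5.0],
    "perimeter": [6.3, 12.5, 18.9, 25.1, 31.4],  # nearly proportional to radius
    "texture": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_correlated(features)
```

Here `perimeter` is dropped because it carries almost the same information as `radius`, while `texture` survives. The surviving features would then go to recursive feature elimination or univariate selection.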


Author(s):  
Kai Sheng Ooi ◽  
ZhiYuan Chen ◽  
Phaik Eong Poh ◽  
Jian Cui

Abstract Biological oxygen demand (BOD5) is an indicator used to monitor water quality. However, the standard process of measuring BOD5 is time-consuming and could delay crucial mitigation work in the event of pollution. To solve this problem, this study employed multiple machine learning (ML) methods, namely random forest (RF), support vector regression (SVR) and multilayer perceptron (MLP), to train a model that can accurately predict the BOD5 values in water samples based on other physical and chemical properties of the water. The training parameters were optimized using a genetic algorithm (GA), and feature selection was done using the sequential feature selection (SFS) method. The proposed machine learning framework was first tested on a public dataset (Waterbase). The MLP method produced the best model, with an R² score of 0.767 and relative MSE and relative MAE of approximately 15%. Feature importance calculations indicated that CODCr, ammonium and nitrate are the features that correlate most strongly with BOD5. In the field study, with a small private dataset of water samples collected from two different lakes in Jiangsu Province, China, the trained model was found to have a similar range of prediction error (around 15%) and similar relative MAE (around 14%), and achieved about 6% better relative MSE.
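The regression scores mentioned in this abstract can be illustrated with a short sketch. The R² formula is the standard coefficient of determination. The abstract does not define "relative MAE", so the version here (MAE divided by the mean observed value) is only one plausible reading and is an assumption, as are the function names and toy values.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_mae(y_true, y_pred):
    """MAE divided by the mean observed value (assumed definition)."""
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae / (sum(y_true) / len(y_true))


# Hypothetical BOD5 measurements and model predictions (mg/L).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
r2 = r2_score(observed, predicted)
rel_mae = relative_mae(observed, predicted)
```

A relative error of this kind makes results comparable across datasets with different BOD5 ranges, which matters when a model trained on public data is checked against a small field dataset.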


Nutrients ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 2032
Author(s):  
Judit Companys ◽  
Maria José Gosalbes ◽  
Laura Pla-Pagà ◽  
Lorena Calderón-Pérez ◽  
Elisabet Llauradó ◽  
...  

We aimed to differentiate the gut microbiota composition of overweight/obese and lean subjects and to determine its association with clinical variables and dietary intake. A cross-sectional study was performed with 96 overweight/obese subjects and 32 lean subjects. Anthropometric parameters were positively associated with Collinsella aerofaciens, Dorea formicigenerans and Dorea longicatena, which had higher abundance in the overweight/obese subjects. Moreover, different genera of Lachnospiraceae were negatively associated with body fat, LDL and total cholesterol. Saturated fatty acids (SFAs) were negatively associated with the genus Intestinimonas, a biomarker of the overweight/obese group, whereas SFAs were positively associated with Roseburia, a biomarker for the lean group. In conclusion, Dorea formicigenerans, Dorea longicatena and Collinsella aerofaciens could be considered obesity biomarkers, and Lachnospiraceae is associated with lipid cardiovascular risk factors. SFAs exhibited opposite association profiles with butyrate-producing bacteria depending on the BMI. Thus, the relationship between diet and microbiota opens new avenues for the management of obesity.

