A comparison of regularized logistic regression and random forest machine learning models for daytime diagnosis of obstructive sleep apnea

Farahnaz Hajipour; Mohammad Jafari Jozani; Zahra Moussavi

doi:10.1007/s11517-020-02206-9

414 Deep Neural Networks: A Survey Tool for Obstructive Sleep Apnea Prediction

SLEEP; 2021; Vol 44 (Supplement_2); pp. A164-A164

doi:10.1093/sleep/zsab072.413

Author(s): Pahnwat Taweesedt; JungYoon Kim; Jaehyun Park; Jangwoon Park; Munish Sharma; ...

Keyword(s): Machine Learning; Neural Networks; Obstructive Sleep Apnea; Sleep Apnea; Deep Neural Networks; Support Vector; Learning Models; Obstructive Sleep; Screening Questionnaires; Machine Learning Models

Abstract

Introduction: Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder estimated to affect one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming, expensive, and not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction, with high sensitivity and low specificity. The present study aims to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires.

Methods: Subjects who underwent full-night polysomnography at Torr sleep center, Texas, and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed using an Apnea-Hypopnea Index ≥ 5. We trained five machine learning models: Deep Neural Networks with scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), K-Nearest Neighbors classifier (KNC), and Support Vector Machine Classifier (SVMC). A training:testing subject ratio of 65:35 was used. All features, including demographic data, body measurements, and snoring and sleepiness history, were obtained from the 5 OSA screening questionnaires/scores (STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score). Performance metrics were used to compare the machine learning models.

Results: Of 180 subjects, 51.5% were male, with mean (SD) age of 53.6 (15.1). One hundred and nineteen subjects were diagnosed with OSA. The Area Under the Receiver Operating Characteristic Curve (AUROC) of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score was 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0.61, 0.58, and 0.58, respectively. DNN-PCA showed the highest AUROC, with sensitivity of 0.79, specificity of 0.67, positive predictive value of 0.93, F1 score of 0.86, and accuracy of 0.77.

Conclusion: Our results showed that DNN-PCA outperforms OSA screening questionnaires, scores, and the other machine learning models.
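The performance figures reported above (sensitivity, specificity, positive predictive value, F1 score, accuracy) all derive from a confusion matrix. The following is a minimal sketch of those definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common screening metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = classification_metrics(tp=40, fp=5, tn=10, fn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A high positive predictive value alongside a modest specificity, as in the abstract, is typical when most tested subjects have the condition.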


SAS Mobile Application for Diagnosis of Obstructive Sleep Apnea Utilizing Machine Learning Models

2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON); 2020

doi:10.1109/uemcon51285.2020.9298041

Author(s): Carl Haberfeld; Alaa Sheta; Md Shafaeat Hossain; Hamza Turabieh; Salim Surani

Keyword(s): Machine Learning; Obstructive Sleep Apnea; Sleep Apnea; Mobile Application; Learning Models; Obstructive Sleep; Machine Learning Models


Statistical and machine learning models for classification of human wear and delivery days in accelerometry data

doi:10.1101/2020.12.31.424867; 2021

Author(s): Ryan Moore; Kristin R. Archer; Leena Choi

Keyword(s): Neural Network; Machine Learning; Logistic Regression; Random Forest; Human Activity; Recurrent Neural Network; Learning Models; Learning Context; Machine Learning Models

Abstract

Purpose: Accelerometers are increasingly utilized in healthcare research to assess human activity. Accelerometry data are often collected by mailing accelerometers to participants, who wear the accelerometers to collect data on their activity. The devices are then mailed back to the laboratory for analysis. We develop models to classify days in accelerometry data as activity from actual human wear or the delivery process. These models can be used to automate the cleaning of accelerometry datasets that are adulterated with activity from delivery.

Methods: For the classification of delivery days in accelerometry data, we developed statistical and machine learning models in a supervised learning context using a large human activity and delivery labeled accelerometry dataset. We extracted several features, which were used to develop random forest, logistic regression, mixed effects regression, and multilayer perceptron models, while convolutional neural network, recurrent neural network, and hybrid convolutional recurrent neural network models were developed without feature extraction. Model performances were assessed using Monte Carlo cross-validation.

Results: We found that a hybrid convolutional recurrent neural network performed best in the classification task, with an F1 score of 0.960, but simpler models such as logistic regression and random forest also had excellent performance, with F1 scores of 0.951 and 0.957, respectively.

Conclusion: The models developed in this study can be used to classify days in accelerometry data as either human or delivery activity. An analyst can weigh the larger computational cost and greater performance of the convolutional recurrent neural network against the faster but slightly less powerful random forest or logistic regression. The best performing models for classification of delivery data are publicly available in the open source R package PhysicalActivity.
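Monte Carlo cross-validation, mentioned in the methods above, repeatedly draws a random train/test split, fits on the training part, and scores on the held-out part. The sketch below illustrates the procedure with an F1 score; the toy threshold model, the synthetic data, and all names and parameter values are assumptions made for illustration and do not reproduce the authors' models or their R package.

```python
import random


def f1_score(y_true, y_pred):
    """F1 score for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fit_threshold(xs, ys):
    """Toy model: a threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def monte_carlo_cv(xs, ys, n_splits=20, test_fraction=0.3, seed=0):
    """Score the toy model on n_splits independent random splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = int(len(xs) * test_fraction)
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)  # a fresh random partition each round
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        threshold = fit_threshold([xs[i] for i in train_idx],
                                  [ys[i] for i in train_idx])
        preds = [1 if xs[i] > threshold else 0 for i in test_idx]
        scores.append(f1_score([ys[i] for i in test_idx], preds))
    return scores


# Synthetic one-feature data: two overlapping Gaussian classes.
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(100)] + \
     [data_rng.gauss(2, 1) for _ in range(100)]
ys = [0] * 100 + [1] * 100

scores = monte_carlo_cv(xs, ys)
print(f"mean F1 over {len(scores)} splits: {sum(scores) / len(scores):.3f}")
```

Unlike k-fold cross-validation, the test sets of different rounds may overlap, and the number of rounds is chosen independently of the split ratio.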


Machine learning approach for predicting under-five mortality determinants in Ethiopia: evidence from the 2016 Ethiopian Demographic and Health Survey

Genus; 2020; Vol 76 (1)

doi:10.1186/s41118-020-00106-2

Author(s): Fikrewold H. Bitew; Samuel H. Nyarko; Lloyd Potter; Corey S. Sparks

Keyword(s): Machine Learning; Logistic Regression; Random Forest; Health Survey; Predictive Power; Demographic And Health Survey; Learning Models; Under Five; Machine Learning Model; Machine Learning Models

Abstract

There is a dearth of literature on the use of machine learning models to predict important under-five mortality risks in Ethiopia. In this study, we showed spatial variations of under-five mortality and used machine learning models to predict its important sociodemographic determinants in Ethiopia. The study data were drawn from the 2016 Ethiopian Demographic and Health Survey. We used three machine learning models (random forests, logistic regression, and K-nearest neighbors) as well as one traditional logistic regression model to predict under-five mortality determinants. For each machine learning model, measures of model accuracy and receiver operating characteristic curves were used to evaluate predictive power. The descriptive results show that there are considerable regional variations in under-five mortality rates in Ethiopia. The under-five mortality prediction ability was found to be between 46.3 and 67.2% for the models considered, with the random forest model (67.2%) showing the best performance. The best predictive model shows that household size, time to the source of water, breastfeeding status, number of births in the preceding 5 years, sex of a child, birth intervals, antenatal care, birth order, type of water source, and mother’s body mass index play an important role in under-five mortality levels in Ethiopia. The random forest machine learning model has better predictive power for estimating under-five mortality risk factors and may help to improve policy decision-making in this regard. Childhood survival chances can be improved considerably by using these important factors to inform relevant policies.


Fake News Data Exploration and Analytics

Electronics; 2021; Vol 10 (19); pp. 2326

doi:10.3390/electronics10192326

Author(s): Mazhar Javed Awan; Awais Yasin; Haitham Nobanee; Ahmed Abid Ali; Zain Shahzad; ...

Keyword(s): Machine Learning; Logistic Regression; Random Forest; Random Forest Classifier; The Internet; Fake News; Learning Models; Decision Tree Classifier; Tree Classifier; Machine Learning Models

Before the internet, people acquired their news from the radio, television, and newspapers. With the internet, the news moved online, and suddenly anyone could post information on websites such as Facebook and Twitter. The spread of fake news has also increased with social media, and it has become one of the most significant issues of this century. People use fake news to damage the reputation of a well-regarded organization for their own benefit. The main motivation for this project is to build a tool that uses machine learning to examine the language patterns that distinguish fake news from real news. This paper proposes machine learning models that can successfully detect fake news. These models identify whether news is real or fake and report the accuracy of that classification, even in a complex environment. After data preprocessing and exploration, we applied three machine learning models: random forest classifier, logistic regression, and term frequency-inverse document frequency (TF-IDF) vectorizer. The accuracy of the TF-IDF vectorizer, logistic regression, random forest classifier, and decision tree classifier models was approximately 99.52%, 98.63%, 99.63%, and 99.68%, respectively. Machine learning models can be considered a great choice for obtaining reality-based results and can be applied to other unstructured data for various sentiment analysis applications.


Machine learning-based preoperative datamining can predict the therapeutic outcome of sleep surgery in OSA subjects

Scientific Reports; 2021; Vol 11 (1)

doi:10.1038/s41598-021-94454-4

Author(s): Jin Youp Kim; Hyoun-Joong Kong; Su Hwan Kim; Sangjun Lee; Seung Heon Kang; ...

Keyword(s): Machine Learning; Logistic Regression; Sleep Apnea; Success Rate; Surgical Outcome; Successful Outcome; Gradient Boosting; Learning Models; Sleep Surgery; Machine Learning Models

Abstract

Increasing recognition of anatomical obstruction has resulted in a large variety of sleep surgeries to improve anatomic collapse in obstructive sleep apnea (OSA), and predicting whether sleep surgery will have a successful outcome is very important. The aim of this study is to assess a machine learning-based clinical model that predicts the success rate of sleep surgery in OSA subjects. The success rate predicted by machine learning and the subjective surgical outcome predicted by the physician were compared with the actual success rate in 163 male-dominated OSA subjects. The success rate of sleep surgery predicted by machine learning models based on sleep parameters and endoscopic findings of the upper airway was more accurate than the subjective prediction of the sleep surgeon. Among logistic regression and three machine learning models, the gradient boosting model showed the best performance in predicting surgical success, as evaluated by pre- and post-operative polysomnography or home sleep apnea testing, and the accuracy of the gradient boosting model (0.708) was significantly higher than that of the logistic regression model (0.542). Our data demonstrate that data mining-driven prediction such as gradient boosting exhibited higher accuracy for prediction of surgical outcome, and that machine learning models can provide OSA subjects with accurate information on surgical outcomes before surgery.


Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control; 2020; pp. 235-242

doi:10.22219/kinetik.v5i3.1066

Author(s): Farrikh Alzami; Erika Devi Udayanti; Dwi Puji Prabowo; Rama Aria Megantara

Keyword(s): Machine Learning; Feature Extraction; Random Forest; Sentiment Analysis; Classification Performance; Document Preparation; Learning Models; Polarity Classification; Negative Sentiment; Machine Learning Models

Sentiment analysis in terms of polarity classification is very important in everyday life. With polarity, people can find out whether a given document has positive or negative sentiment, which helps them choose and make decisions. Sentiment analysis is usually done manually; therefore, an automatic sentiment analysis classification process is needed. However, it is rare to find studies that discuss which feature extraction methods and learning models are suitable for unstructured sentiment analysis, using the Amazon food review case. This research explores several feature extraction methods, namely Bag of Words, TF-IDF, Word2Vector, and a combination of TF-IDF and Word2Vector, with several machine learning models (Random Forest, SVM, KNN, and Naïve Bayes) to find a combination of feature extraction and learning model that can help add variety to the analysis of polarity sentiments. With document preparation such as removing HTML tags, punctuation, and special characters, and with snowball stemming, TF-IDF combined with SVM is suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance result of 87.3 percent.
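As a rough illustration of the document preparation and TF-IDF weighting described above, the sketch below strips HTML tags and punctuation, then weights each term by its relative frequency in a document times the log inverse document frequency. This is one common TF-IDF variant and is an assumption here, since the abstract does not state which formula the authors used. Snowball stemming is omitted, and the function names and example reviews are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Strip HTML tags, lowercase, and keep alphabetic tokens only."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * ln(N / document frequency)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical reviews for illustration only.
reviews = ["<b>Good</b> food!", "Bad food.", "Good, good taste"]
tfidf_weights = tf_idf(reviews)
for review, weights in zip(reviews, tfidf_weights):
    print(review, "->", {t: round(w, 3) for t, w in weights.items()})
```

Terms that appear in every document receive a weight of zero under this variant, which is why library implementations often add smoothing to the inverse document frequency.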


Understanding the role of oxidative stress in the incidence of metabolic syndrome and obstructive sleep apnea

BMC Endocrine Disorders; 2021; Vol 21 (1)

doi:10.1186/s12902-021-00735-4

Author(s): Behnam Kargar; Zahra Zamanian; Majid Bagheri Hosseinabadi; Vahid Gharibi; Mohammad Sanyar Moradi; ...

Keyword(s): Oxidative Stress; Metabolic Syndrome; Obstructive Sleep Apnea; Logistic Regression; Sleep Apnea; International Diabetes Federation; Binary Logistic Regression; Stress Index; Obstructive Sleep

Abstract

Background: Understanding the causes and risk factors of metabolic syndrome is important for promoting population health. Oxidative stress has been associated with metabolic syndrome and also with obstructive sleep apnea. These two diseases have common prognostic characteristics for heart disease. The aim of this study was to examine the role of oxidative stress in the concurrent presence of metabolic syndrome and obstructive sleep apnea in a working population.

Methods: Participants were 163 artisan bakers in Shahroud, Iran, routinely exposed to significant heat stress and other oxidative stress indicators on a daily basis as part of their work. Using a cross-sectional design, data were collected to determine metabolic syndrome status according to International Diabetes Federation criteria and the presence of obstructive sleep apnea according to the STOP-Bang score. Analyses included hierarchical binary logistic regression to yield predictors of the two diseases.

Results: Hierarchical binary logistic regression showed that oxidative stress, alongside obesity, no regular exercise, and smoking, was an independent predictor of metabolic syndrome, but not of obstructive sleep apnea. Participants who were obese were 28 times more likely to have metabolic syndrome (OR 28.59, 95% CI 4.91–63.02) and 44 times more likely to have obstructive sleep apnea (OR 44.48, 95% CI 4.91–403.28). Participants meeting metabolic syndrome criteria had significantly higher levels of malondialdehyde (p < 0.05) than those who did not. No difference in oxidative stress index levels was found according to obstructive sleep apnea status.

Conclusions: Our findings suggest that oxidative stress contributes to the onset of metabolic syndrome, and that obstructive sleep apnea is involved in oxidative stress. Whilst obesity, exercise, and smoking remain important targets for reducing the incidence of metabolic syndrome and obstructive sleep apnea, policies to control risks of prolonged exposure to oxidative stress are also relevant in occupations where such environmental conditions exist.
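The odds ratios and 95% confidence intervals quoted above come from hierarchical logistic regression, which adjusts for other predictors. As a simpler illustration of what an odds ratio and its interval express, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 table. The counts and the function name are hypothetical and are not the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval.

    a: exposed with disease       b: exposed without disease
    c: unexposed with disease     d: unexposed without disease
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts for illustration only.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=10, c=15, d=30)
print(f"OR {or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

Small cell counts inflate the standard error of the log odds ratio, which is one reason intervals can become very wide in a sample of this size.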


Machine learning for image-based detection of patients with obstructive sleep apnea: an exploratory study

Sleep And Breathing; 2021

doi:10.1007/s11325-021-02301-7

Author(s): Satoru Tsuiki; Takuya Nagaoka; Tatsuya Fukuda; Yuki Sakamoto; Fernanda R. Almeida; ...

Keyword(s): Neural Network; Machine Learning; Obstructive Sleep Apnea; Sleep Apnea; Convolutional Neural Network; Deep Convolutional Neural Network; Apnea Hypopnea Index; Cephalometric Analysis; Obstructive Sleep; Main Region

Abstract

Purpose: In 2-dimensional lateral cephalometric radiographs, patients with severe obstructive sleep apnea (OSA) exhibit a more crowded oropharynx than individuals without OSA. We tested the hypothesis that machine learning, an application of artificial intelligence (AI), could be used to detect patients with severe OSA based on 2-dimensional images.

Methods: A deep convolutional neural network was developed (n = 1258; 90%) and tested (n = 131; 10%) using data from 1389 (100%) lateral cephalometric radiographs obtained from individuals diagnosed with severe OSA (n = 867; apnea hypopnea index > 30 events/h sleep) or non-OSA (n = 522; apnea hypopnea index < 5 events/h sleep) at a single center for sleep disorders. Three kinds of data sets were prepared by changing the area of interest using a single image: the original image without any modification (full image), an image containing a facial profile, upper airway, and craniofacial soft/hard tissues (main region), and an image containing part of the occipital region (head only). A radiologist also performed a conventional manual cephalometric analysis of the full image for comparison.

Results: The sensitivity/specificity was 0.87/0.82 for full image, 0.88/0.75 for main region, 0.71/0.63 for head only, and 0.54/0.80 for the manual analysis. The area under the receiver-operating characteristic curve was highest for main region (0.92), followed by full image (0.89), manual cephalometric analysis (0.75), and head only (0.70).

Conclusions: A deep convolutional neural network identified individuals with severe OSA with high accuracy. Future research on this concept using AI and images can be further encouraged when discussing triage of OSA.
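The area under the receiver-operating characteristic curve reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below computes it directly from that pairwise definition; the labels and scores are hypothetical and the function name is illustrative.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for five cases (1 = severe OSA, 0 = non-OSA).
example_labels = [1, 1, 1, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.2]
example_auc = auroc(example_labels, example_scores)
print(f"AUROC: {example_auc:.3f}")
```

This pairwise form is quadratic in the number of cases, so it suits small illustrations; rank-based implementations give the same value more efficiently.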


High performance logistic regression for privacy-preserving genome analysis

BMC Medical Genomics; 2021; Vol 14 (1)

doi:10.1186/s12920-020-00869-9

Author(s): Martine De Cock; Rafael Dowsley; Anderson C. A. Nascimento; Davis Railsback; Jianwei Shen; ...

Keyword(s): Machine Learning; Logistic Regression; Genome Analysis; Local Area Network; Local Area; Activation Function; Area Network; Learning Models; Data Set; Machine Learning Models

Abstract

Background: In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technology challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique named secure multi-party computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is necessary to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand.

Methods: Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient descent based algorithm for training a logistic regression-like model with a clipped ReLU activation function, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations to improve the performance.

Results: For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition.

Conclusions: In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.
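To make the "logistic regression-like model with a clipped ReLU activation" more concrete, the following plaintext sketch trains such a model by gradient descent. It contains no cryptography and does not reproduce the authors' secure protocols. The exact shape of the activation (0 below -0.5, z + 0.5 in between, 1 above 0.5), the update rule, the learning rate, and the toy data are all assumptions made for illustration.

```python
def clipped_relu(z):
    """Piecewise-linear stand-in for the sigmoid (assumed form)."""
    return min(1.0, max(0.0, z + 0.5))


def train(features, targets, learning_rate=0.5, epochs=300):
    """Batch gradient descent with the logistic-style update (output - y) * x."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, targets):
            z = sum(w * xj for w, xj in zip(weights, x)) + bias
            error = clipped_relu(z) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Classify as 1 when the clipped activation exceeds 0.5."""
    z = sum(w * xj for w, xj in zip(weights, x)) + bias
    return 1 if clipped_relu(z) > 0.5 else 0


# Toy one-feature data set for illustration only.
toy_features = [[0.0], [0.2], [0.8], [1.0]]
toy_targets = [0, 0, 1, 1]
weights, bias = train(toy_features, toy_targets)
predictions = [predict(weights, bias, x) for x in toy_features]
print(predictions)
```

A piecewise-linear activation is attractive in a secure computation setting because it avoids evaluating an exponential, leaving only additions, multiplications, and range checks to be carried out on secret-shared values.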
