logistic regression classifier
Recently Published Documents


Total documents: 51 (five years: 34) · H-index: 6 (five years: 2)

2022 · Vol 10 (1) · pp. 0-0

In today's world, machine learning has become a vital part of our lives. When applied to real-world problems, machine learning faces the difficulty of high-dimensional data, which often contain unnecessary and redundant features. These superfluous features harm the performance of the classification algorithms used for prediction, so identifying the critical features is the primary step in developing any decision support system. In this paper, the authors propose CFGA, a hybrid feature selection method that integrates CFS (Correlation feature selection) and GA (genetic algorithm). The efficiency of the proposed method is analyzed with a Logistic Regression classifier in terms of accuracy, sensitivity, specificity, precision, F-measure, and execution time. The proposed CFGA method is also compared with six other feature selection methods. The results demonstrate that the proposed method increases the performance of the classification system by removing irrelevant and redundant features.
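The abstract does not say how CFS and the GA are combined in CFGA. One plausible arrangement, sketched here purely as an assumption and not as the authors' method, is to use the CFS merit k·r_cf / sqrt(k + k(k-1)·r_ff) as the fitness of a simple genetic algorithm that searches over 0/1 feature masks. Pearson correlation stands in for the correlation measure, and the function names, population size, generation count, and mutation rate are all hypothetical.

```python
import math
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(columns, target, mask):
    """CFS merit k*r_cf / sqrt(k + k(k-1)*r_ff) of the features where mask == 1."""
    chosen = [col for col, keep in zip(columns, mask) if keep]
    k = len(chosen)
    if k == 0:
        return 0.0
    r_cf = mean(abs(pearson(col, target)) for col in chosen)  # mean feature-class correlation
    r_ff = 0.0
    if k > 1:  # mean feature-feature correlation over all pairs
        r_ff = mean(abs(pearson(chosen[i], chosen[j]))
                    for i in range(k) for j in range(i + 1, k))
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ga_select(columns, target, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Hypothetical GA over 0/1 feature masks with CFS merit as the fitness."""
    rng = random.Random(seed)
    n = len(columns)

    def fitness(mask):
        return cfs_merit(columns, target, mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]                                      # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

The merit term is what penalises redundancy: a feature perfectly correlated with the target has merit 1.0, and adding an exact copy of it leaves the merit at 1.0, so the search gains nothing from keeping duplicates. `ga_select(columns, y)` returns a 0/1 mask over the given feature columns.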


Sensors · 2021 · Vol 21 (16) · pp. 5431
Author(s): Malak Aljabri, Sara Mhd. Bachar Chrouf, Norah A. Alzahrani, Leena Alghamdi, Reem Alfehaid, ...

The COVID-19 pandemic has greatly affected people's daily lives worldwide. One of its most noticeable impacts is the enforcement of social distancing to reduce the spread of the virus. The Ministry of Education in Saudi Arabia implemented social distancing by enforcing distance learning at all educational stages. This measure brought new experiences and challenges to students, parents, and teachers. This research measures the acceptance rate of this way of learning by analysing people’s tweets about distance learning in Saudi Arabia. All the analysed tweets were written in Arabic and collected within the boundaries of Saudi Arabia, starting from the day the distance learning announcement was made. The tweets were pre-processed and labelled as positive or negative. Machine learning classifiers with different features and feature extraction techniques were then built to analyse the sentiment, and the accuracy of the resulting models was compared. The best accuracy (0.899) was achieved by the Logistic Regression classifier using unigrams with Term Frequency-Inverse Document Frequency (TF-IDF) as the feature extraction approach. This model was then applied to a new unlabelled dataset whose tweets were classified by educational stage. The results showed generally positive opinions about distance learning for the general education stages (kindergarten, intermediate, and high school) and negative opinions for the university stage. Further analysis identified the main topics related to the positive and negative sentiment. These results can be used by the Ministry of Education to further improve the distance learning system.
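The best-performing representation pairs unigrams with TF-IDF weighting. The sketch below computes such vectors in plain Python from already tokenised texts. It assumes raw term counts for TF, the smoothed IDF ln((1 + N) / (1 + df(t))) + 1, and L2 normalisation, which mirror the defaults of scikit-learn's `TfidfVectorizer`; the paper's exact weighting, tokeniser, and Arabic pre-processing are not stated, so these choices are assumptions. The resulting sparse vectors are what a logistic regression classifier would then be trained on.

```python
import math
from collections import Counter


def tfidf_unigrams(docs):
    """docs: list of token lists. Returns (sorted vocabulary, one sparse dict vector per doc)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: each term counted once per document
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}  # smoothed IDF
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw unigram counts
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm > 0:
            vec = {t: v / norm for t, v in vec.items()}  # L2-normalise each document
        vectors.append(vec)
    return vocab, vectors
```

The tokens in any example call would be placeholders; for instance, with the three documents `["learning", "online", "good"]`, `["learning", "online", "bad"]`, and `["good", "teachers"]`, the rarer term "bad" receives a higher weight than "learning" in the second vector because it appears in fewer documents.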


2021 · Vol 13 (1)
Author(s): Samantha Prins, Ahnjili Zhuparris, Ellen P. Hart, Robert-Jan Doll, Geert Jan Groeneveld

Background. In the current study, we aimed to develop an algorithm based on biomarkers obtained through non- or minimally invasive procedures to identify healthy elderly subjects who have an increased risk of abnormal cerebrospinal fluid (CSF) amyloid beta42 (Aβ) levels consistent with the presence of Alzheimer’s disease (AD) pathology. The algorithm may help identify subjects with preclinical AD who are eligible to participate in trials of disease-modifying compounds being developed for AD. Because of this pre-selection, fewer lumbar punctures will be needed, decreasing the overall burden for study subjects as well as costs. Methods. Healthy elderly subjects (n = 200; age 65–70 (N = 100) and age > 70 (N = 100)) with an MMSE > 24 were recruited. An automated central nervous system (CNS) test battery was used for cognitive profiling. CSF Aβ1-42 concentrations and plasma Aβ1-40, Aβ1-42, neurofilament light, and total Tau concentrations were measured, and the plasma Aβ1-42/1-40 ratio was calculated. The neuroinflammation biomarker YKL-40 and APOE ε4 status were determined in plasma. Different mathematical models were evaluated on their sensitivity, specificity, and positive predictive value; a logistic regression algorithm described the data best. Data were analyzed using a logistic regression classifier with 5-fold cross-validation. Results. Two hundred healthy elderly subjects were enrolled in this study. Data of 154 subjects were used for the per-protocol analysis. The average age of the 154 subjects was 72.1 (65–86) years. Twenty-four (27.3%) were Aβ positive for AD (age 65–83). The logistic regression classifier showed that the predictive features for Aβ positivity/negativity in CSF consist of sex, 7 CNS tests, and 1 plasma-based assay. The model achieved a sensitivity of 70.82% (± 4.35) and a specificity of 89.25% (± 4.35) for identifying abnormal CSF in healthy elderly subjects. The receiver operating characteristic curve showed an AUC of 65% (± 0.10). Conclusion. This algorithm would allow for a 70% reduction in the lumbar punctures needed to identify subjects with abnormal CSF Aβ levels consistent with AD. Its use can be expected to lower the overall subject burden and costs of identifying subjects with preclinical AD, and therefore total study costs. Trial registration. ISRCTN.org identifier: ISRCTN79036545 (retrospectively registered).
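The evaluation described here (a classifier scored by sensitivity, specificity, and positive predictive value under 5-fold cross-validation) can be sketched as a small harness. Everything below is an illustrative assumption: folds are stratified by dealing each class round-robin (so every fold contains both classes when each class has at least `k` members), metrics are summarised as mean and population standard deviation over folds, and `threshold_fit` / `threshold_predict` are hypothetical toy stand-ins for a real logistic regression, not the study's model.

```python
import random
from statistics import mean, pstdev


def stratified_folds(y, k=5, seed=0):
    """Shuffle each class separately and deal its indices round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and PPV for 0/1 labels (NaN when a denominator is 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {"sensitivity": ratio(tp, tp + fn),
            "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp)}


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return {metric: (mean, population std)} over k stratified folds."""
    per_fold = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        per_fold.append(binary_metrics([y[i] for i in test_idx], preds))
    return {name: (mean(f[name] for f in per_fold), pstdev([f[name] for f in per_fold]))
            for name in per_fold[0]}


# Hypothetical toy classifier on a single feature, standing in for logistic regression.
def threshold_fit(X_train, y_train):
    pos = [x[0] for x, t in zip(X_train, y_train) if t == 1]
    neg = [x[0] for x, t in zip(X_train, y_train) if t == 0]
    return (mean(pos) + mean(neg)) / 2  # cut-off halfway between the class means


def threshold_predict(cut, X_test):
    return [1 if x[0] > cut else 0 for x in X_test]
```

Any `fit(X, y) -> model` and `predict(model, X) -> labels` pair can be plugged in; the study's own fold assignment and the way its ± values were computed may differ.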


2021 · Vol 5 (1) · pp. 69-74
Author(s): Olesia Barkovska, Vladyslav Kholiev, Daria Pyvovarova, Georgiy Ivaschenko, Dmytro Rosinskiy

The paper proposes a system that serves as an electronic repository for the qualification works of students from different countries and makes it possible to identify and connect young scientists conducting research in related problem areas. The system is intended to provide opportunities for knowledge exchange and team research on common problems, and to identify scientific trends in different countries. The paper investigates how preprocessing methods influence the performance of classifiers such as Logistic Regression, LSTM, BERT, and LightGBM, in terms of classification speed and F1 score. Conclusions. Lemmatization required almost half the operating time of stemming and yielded a score higher by 5 percent on average, so the Logistic Regression classifier with lemmatization at the text preparation stage was chosen for the subsequent operation of the proposed ISKE.
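The comparison above turns on the difference between stemming and lemmatization. The toy sketch below illustrates it under loudly stated assumptions: `crude_stem` applies a hypothetical, deliberately crude suffix-stripping rule set (far simpler than real stemmers such as Porter's), and `lemmatize` looks words up in a tiny hypothetical lexicon standing in for a real morphological dictionary. `time_preprocessing` shows one way to time a preprocessing step with `time.perf_counter`; its numbers say nothing about the paper's measurements.

```python
import time

# Hypothetical, deliberately crude suffix rules (checked in this order); not Porter's algorithm.
SUFFIXES = ("ies", "ing", "ed", "es", "s")
# Hypothetical toy lexicon standing in for a real morphological dictionary.
LEMMAS = {"studies": "study", "studying": "study", "studied": "study",
          "works": "work", "worked": "work", "better": "good"}


def crude_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Return the dictionary form if the toy lexicon knows the word."""
    return LEMMAS.get(word, word)


def time_preprocessing(fn, tokens, repeats=1000):
    """Apply fn to every token `repeats` times; return the last output and elapsed seconds."""
    out = []
    start = time.perf_counter()
    for _ in range(repeats):
        out = [fn(t) for t in tokens]
    return out, time.perf_counter() - start
```

For `["studies", "studying", "studied", "works", "better"]`, `crude_stem` yields `["stud", "study", "studi", "work", "better"]`, splitting one lemma across three stems, whereas `lemmatize` yields `["study", "study", "study", "work", "good"]`, grouping them under a single feature.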


2021 · Vol 2021 · pp. 1-7
Author(s): Shenda Hong, Xinlin Hou, Jin Jing, Wendong Ge, Luxia Zhang

Background. Predicting mortality risk in intensive care units (ICUs) is an important task. Data-driven methods such as scoring systems, machine learning methods, and deep learning methods have long been investigated. However, few data-driven methods have been developed specifically for the pediatric ICU. In this paper, we aim to close this gap by building a simple yet effective linear machine learning model from a number of hand-crafted features for mortality prediction in the pediatric ICU. Methods. We use a recently released, publicly available pediatric ICU dataset named Pediatric Intensive Care (PIC) from the Children’s Hospital of Zhejiang University School of Medicine in China. Unlike previous sophisticated machine learning methods, we want our method to remain simple enough to be easily understood by clinical staff. Thus, an ensemble step-wise feature ranking and selection method is proposed to select a small subset of effective features from the entire feature set. A logistic regression classifier is built on the selected features for mortality prediction. Results. The final linear predictive model with 11 features achieves a ROC-AUC of 0.7531 on the hold-out test set, which is comparable to a logistic regression classifier using all 397 features (ROC-AUC 0.7610) and higher than the well-known existing pediatric mortality risk score PRISM III (ROC-AUC 0.6895). Conclusions. Our method improves feature ranking and selection by using an ensemble method while keeping the predictive model in a simple linear form, and therefore achieves better generalizability and performance in mortality prediction in the pediatric ICU.
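The abstract does not detail the ensemble step-wise procedure. As an illustration only, the sketch below aggregates several feature rankings by mean rank position (a Borda-style ensemble) and then walks that order with greedy forward step-wise selection, keeping a feature only if it improves a score. The rankings, the feature names `feat_a`/`feat_b`/`feat_c`, and `toy_score` are hypothetical; in a real pipeline `score` might be the hold-out ROC-AUC of a logistic regression trained on the candidate subset. None of this is the authors' implementation.

```python
from statistics import mean


def ensemble_rank(rankings):
    """rankings: lists ordering the same feature names best-first.
    Returns the features sorted by mean position (ties broken by name)."""
    features = rankings[0]
    avg_pos = {f: mean(r.index(f) for r in rankings) for f in features}
    return sorted(features, key=lambda f: (avg_pos[f], f))


def stepwise_forward(ordered_features, score, max_features=None):
    """Walk the ensemble order; keep a feature only if it raises score(subset)."""
    selected, best = [], float("-inf")
    for f in ordered_features:
        if max_features is not None and len(selected) >= max_features:
            break
        candidate = selected + [f]
        s = score(candidate)
        if s > best:
            selected, best = candidate, s
    return selected, best


# Hypothetical score: per-feature weights minus a size penalty, standing in for
# "hold-out ROC-AUC of a logistic regression trained on the subset".
TOY_WEIGHTS = {"feat_a": 0.5, "feat_b": 0.3, "feat_c": 0.0}


def toy_score(subset):
    return sum(TOY_WEIGHTS[f] for f in subset) - 0.1 * len(subset)
```

With the rankings `["feat_a", "feat_b", "feat_c"]`, `["feat_b", "feat_a", "feat_c"]`, and `["feat_a", "feat_c", "feat_b"]`, the ensemble order is `feat_a, feat_b, feat_c`, and the step-wise pass keeps `feat_a` and `feat_b` (score about 0.6) but rejects `feat_c`, which would lower the score.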


2021 · Vol 11
Author(s): Yixuan Zhai, Dixiang Song, Fengdong Yang, Yiming Wang, Xin Jia, ...

Objectives. The aim of this study was to establish and validate a radiomics nomogram for predicting meningioma consistency, which could facilitate individualized surgical planning. Methods. A total of 172 patients were enrolled in the study (training cohort: 120 cases; test cohort: 52 cases). Tumor consistency was classified as soft or firm according to Zada’s consistency grading system. Radiomics features were extracted from multiparametric MRI. Variance selection and LASSO regression were used for feature selection. Radiomics models were then constructed with five classifiers, and the area under the curve (AUC) was used to evaluate the performance of each classifier. A radiomics nomogram was developed using the best classifier, and its performance was assessed by AUC, calibration, and discrimination. Results. A total of 3840 radiomics features were extracted from each patient, of which 3719 were stable. Twenty-eight features were selected to construct the radiomics nomogram. The logistic regression classifier had the highest predictive efficacy, and the radiomics nomogram was constructed using logistic regression in the training cohort. The nomogram showed good sensitivity and specificity, with AUCs of 0.861 and 0.960 in the training and test cohorts, respectively. Moreover, its calibration graph showed favorable calibration in both the training and test cohorts. Conclusions. The presented radiomics nomogram, as a non-invasive prediction tool, could predict meningioma consistency preoperatively with favorable accuracy and facilitate the determination of individualized surgical plans.
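Variance selection and LASSO are standard filter and embedded feature-selection steps. The sketch below shows both in plain Python under simplifying assumptions: a variance threshold drops near-constant features, and LASSO is fitted by cyclic coordinate descent with soft-thresholding on standardised columns and a centred (for example, 0/1-coded) response, so that features whose coefficients shrink to exactly zero are discarded. The threshold and the penalty `lam` are hypothetical; in practice the penalty is usually tuned by cross-validation, and the study's settings are not given.

```python
from statistics import mean, pstdev, pvariance


def variance_filter(columns, threshold=1e-8):
    """Indices of feature columns whose population variance exceeds the threshold."""
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(columns, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)*||y - Zb||^2 + lam*||b||_1, where Z holds the
    standardised columns and y is centred. Columns must be non-constant (see variance_filter)."""
    n = len(y)
    Z = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        Z.append([(v - mu) / sd for v in col])
    y_mean = mean(y)
    r = [v - y_mean for v in y]  # residual while every coefficient is 0
    b = [0.0] * len(Z)
    for _ in range(n_iter):
        for j, z in enumerate(Z):
            # inner product of feature j with the partial residual (feature j added back)
            rho = sum(zi * (ri + zi * b[j]) for zi, ri in zip(z, r)) / n
            new = soft_threshold(rho, lam)  # unit-variance columns: no extra rescaling
            if new != b[j]:
                delta = new - b[j]
                r = [ri - zi * delta for zi, ri in zip(z, r)]
                b[j] = new
    return b
```

Typical use is to run `variance_filter` first, pass only the surviving columns to `lasso_cd`, and keep the features whose coefficients are non-zero.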


2021 · Vol 6 (2) · pp. 120-129
Author(s): Nadhif Ikbar Wibowo, Tri Andika Maulana, Hamzah Muhammad, Nur Aini Rakhmawati

Public responses posted on Twitter about the Tokopedia data leak incident were used as a dataset to compare the performance of three classifiers, trained with supervised learning, in classifying the sentiment of the text. All tweets were classified as positive, negative, or neutral. This study compares the performance of Random Forest, Support Vector Machine (SVM), and Logistic Regression classifiers. The data were scraped automatically and used to evaluate several models; the SVM-based model achieved the highest F1-score (0.503583), making SVM the best-performing classifier.
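The abstract reports a single F1-score for a three-class problem without saying how the per-class scores were combined. A minimal sketch, assuming macro averaging (the unweighted mean of the per-class F1 scores), is shown below; weighted or micro averaging would generally give different numbers.

```python
def per_class_f1(y_true, y_pred, label):
    """One-vs-rest F1 for a single class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # also covers the 0/0 precision or recall cases
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=("positive", "negative", "neutral")):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)
```

For true labels `["positive", "negative", "neutral", "neutral"]` and predictions `["positive", "neutral", "neutral", "negative"]`, the per-class F1 scores are 1.0, 0.0, and 0.5, so the macro F1 is 0.5.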


2021
Author(s): Tingyan Deng

Autism Spectrum Disorder (ASD) is a developmental disability that can affect communication and behavior, causing significant social, communication, and behavioral challenges. Once considered a rare childhood disorder, ASD is now found, according to the National Institutes of Health, in 1% to 2% of the population in high-income countries. Early and accurate diagnosis can not only help doctors detect the disorder early, leading to more timely treatment, but can also save patients significant healthcare costs. With the rapid growth of ASD cases, many open-source ASD-related datasets have been created for scientists and doctors to investigate this disorder. Autistic Spectrum Disorder Screening Data for Adult is a well-known dataset that contains 20 features for further analysis of the potential causes and prediction of ASD. In this paper, we develop an autism classification algorithm based on a logistic regression model. Our model starts with feature engineering to extract deeper information from the dataset and then applies a modified logistic regression classifier to the data. The model predicts ASD with an average F1 score of 0.97, which demonstrates the superiority and feasibility of the proposed model. In addition, data visualization techniques were used to display several feature distributions so that readers can better understand the data and the related feature engineering.
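The "modified" logistic regression is not described in the abstract. For reference, the sketch below implements plain binary logistic regression trained by batch gradient descent on the mean log-loss, with an optional L2 penalty on the weights. The learning rate, epoch count, and penalty are hypothetical defaults, and nothing here reproduces the author's modifications or feature engineering.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=500, l2=0.0):
    """Minimise mean log-loss + (l2/2)*||w||^2 (bias unpenalised) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # derivative of the log-loss w.r.t. the logit is (predicted probability - label)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def predict(model, X, threshold=0.5):
    return [1 if p >= threshold else 0 for p in predict_proba(model, X)]
```

The `l2` term is one common way to keep weights bounded when classes are separable; setting it to 0 gives unpenalised logistic regression.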


Entropy · 2021 · Vol 23 (2) · pp. 244
Author(s): Evangelos Kafantaris, Ian Piper, Tsz-Yan Milly Lo, Javier Escudero

Network physiology has emerged as a promising paradigm for extracting clinically relevant information from physiological signals by moving from univariate to multivariate analysis, which allows the interdependencies between organ systems to be inspected. However, for its successful implementation, the disruptive effects of artifactual outliers, which are common in physiological recordings, have to be studied, quantified, and addressed. Within the scope of this study, we utilize Dispersion Entropy (DisEn) to initially quantify the capacity of outlier samples to disrupt the values of univariate and multivariate features extracted with DisEn from physiological network segments consisting of synchronised electroencephalogram, nasal respiratory, blood pressure, and electrocardiogram signals. The DisEn algorithm is selected because of its efficient computation and good performance in detecting changes in both univariate and multivariate time series. The extracted features are then used to train and test a logistic regression classifier in univariate and multivariate configurations, in an effort to partially automate the detection of artifactual network segments. Our results indicate that outlier samples cause significant disruption in the values of the extracted features, with multivariate features displaying a certain level of robustness that depends on the number of signals forming the network segments from which they are extracted. Furthermore, the deployed classifiers achieve noteworthy performance: the percentage of correctly classified network segments surpasses 95% in a number of experimental setups, with the effectiveness of each configuration depending on the signal in which the outliers are located. Finally, given the growing number of features extracted within the framework of network physiology and the observed impact of artifactual samples on the accuracy of their values, the implementation of algorithmic steps capable of effective feature selection is highlighted as an important area for future research.
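The features in this study are computed with Dispersion Entropy. A minimal univariate sketch of the commonly used single-scale formulation follows: map each sample through the normal CDF, quantise it into `c` classes, count the dispersion patterns of embedding dimension `m` and delay `d`, and take the Shannon entropy of the pattern distribution, normalised by ln(c^m). The defaults `m=2`, `c=6`, `d=1` are common choices rather than the study's settings, and the multivariate variant used for network segments is not covered.

```python
import math
from collections import Counter
from statistics import mean, stdev


def dispersion_entropy(x, m=2, c=6, d=1, normalise=True):
    """Univariate Dispersion Entropy of a non-constant sequence x."""
    mu, sigma = mean(x), stdev(x)
    # 1) map each sample into (0, 1) with the normal CDF
    y = [0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0)))) for v in x]
    # 2) quantise into classes 1..c: round-half-up of c*y + 0.5, clipped to the valid range
    z = [min(c, max(1, math.floor(c * v + 1.0))) for v in y]
    # 3) count dispersion patterns of length m with delay d
    n_patterns = len(z) - (m - 1) * d
    if n_patterns <= 0:
        raise ValueError("sequence too short for the chosen m and d")
    counts = Counter(tuple(z[i + k * d] for k in range(m)) for i in range(n_patterns))
    # 4) Shannon entropy of the pattern distribution
    h = -sum((cnt / n_patterns) * math.log(cnt / n_patterns) for cnt in counts.values())
    return h / math.log(c ** m) if normalise else h
```

A strictly alternating signal such as `[0, 1] * 50` produces only two patterns and a normalised value close to ln 2 / ln 36 (about 0.19), while Gaussian noise spreads over many of the 36 possible patterns and scores much higher; large outliers shift the mean, standard deviation, and class assignments, which is one way they can disrupt DisEn features.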

