Polytomous Logistic Regression Based Random Forest Classifier for Diagnosing Cancer Disease

2018 ◽  
Vol 10 (8) ◽  
Author(s):  
Suganthi Jeyasingh ◽  
Malathy Veluchamy
2021 ◽  
Vol 5 (1) ◽  
pp. 22
Author(s):  
Heena Tyagi ◽  
Emma Daulton ◽  
Ayman S. Bannaga ◽  
Ramesh P. Arasaradnam ◽  
James A. Covington

This study outlines the use of an electronic nose as a method for detecting VOCs as biomarkers of bladder cancer. An AlphaMOS FOX 4000 electronic nose was used to analyse urine samples from 15 bladder cancer patients and 41 non-cancerous patients. The FOX 4000 consists of 18 MOS sensors, which were used to differentiate the two groups. The results were analysed using the MultiSens Analyzer and RStudio. The results showed a high separation, with sensitivity and specificity of 0.93 and 0.88, respectively, using Sparse Logistic Regression, and 0.93 and 0.76 using a Random Forest classifier. We conclude that the electronic nose shows potential for discriminating bladder cancer from non-cancer subjects using urine samples.
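As a rough illustration of how the sensitivity and specificity figures above are defined, the sketch below computes both from confusion-matrix counts. The function name and the counts are hypothetical: the abstract does not report a confusion matrix, and the counts are chosen only to match the stated group sizes (15 cancer, 41 non-cancer).

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of cancer cases that were detected
    specificity = tn / (tn + fp)  # share of non-cancer cases correctly cleared
    return sensitivity, specificity


# Hypothetical counts, NOT taken from the study: 14 of 15 cancer samples
# detected and 36 of 41 non-cancer samples correctly cleared.
sens, spec = sensitivity_specificity(tp=14, fn=1, tn=36, fp=5)
print(round(sens, 2), round(spec, 2))  # prints: 0.93 0.88
```

With groups this small, a single misclassified sample moves sensitivity by about 0.07, which is worth keeping in mind when comparing the two classifiers.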


Author(s):  
Sibasankar Padhy ◽  
S Sai Suryateja

The purpose of this study is to detect epileptic seizures, which are indicated by abnormal disturbances in intracranial neurons, using electroencephalogram (EEG) signals. The EEG signals are grouped into three categories: normal EEG signals (Z and O subsets), seizure-free EEG signals (N and F subsets), and seizure EEG signals (S subset). For classification in this study, the EEG signals are divided into three groups, namely NF-S, O-F-S, and ZO-NF-S. The signal length is fixed at 4096 samples. The EEG signals are decomposed using the Tunable-Q Wavelet Transform (TQWT), which produces intrinsic mode functions (IMFs) in decreasing order of frequency. These IMFs are analysed to extract features that help classify the signals into the various categories, and these features are fed as inputs to three classifiers: Random Forest (RF), Decision Table (DT), and Logistic Regression (LR). The LR classifier showed higher accuracy, specificity, and sensitivity for the NF-S and O-F-S groups than the RF and DT classifiers, whereas the RF classifier showed higher accuracy, specificity, and sensitivity for the ZO-NF-S group than the other classifiers. With the LR classifier, the suitable TQWT parameters for NF-S (seizure-free vs. seizure) are Q=6, r=3, and J=9, giving a maximum accuracy of 98%; for O-F-S (normal vs. seizure-free vs. seizure), Q=1, r=3, and J=9 attained a maximum accuracy of 94.7%. For ZO-NF-S (normal vs. seizure-free vs. seizure), Q=4, r=3, and J=9 gave a maximum accuracy of 99.8% with the RF classifier.


2020 ◽  
Vol 8 (5) ◽  
pp. 5353-5362

Background/Aim: Prostate cancer is regarded as the most prevalent cancer in the world and the main cause of deaths worldwide. Early strategies for estimating prostate cancer risk have helped inform decisions about the changes occurring in high-risk patients, which has reduced their risk. Methods: In the proposed research, we used a dataset from Kaggle and pre-processed it to handle missing values. Three missing values in the compactness attribute and two missing values in the fractal dimension attribute were replaced by the mean of their respective columns. The performance of the diagnosis model is evaluated using methods such as classification, accuracy, sensitivity, and specificity analysis. This paper proposes a prediction model to predict whether a person has prostate cancer and to support awareness or diagnosis. This is done by comparing the accuracies obtained by applying rules to the individual results of Support Vector Machine, Random Forest, Naive Bayes, and logistic regression classifiers on a dataset taken in a region, in order to present an accurate model for predicting prostate cancer. Results: The machine learning algorithms under study were able to predict prostate cancer in patients with accuracy between 70% and 90%. Conclusions: Logistic Regression and Random Forest both showed better accuracy (90%) than the other machine learning algorithms.


2021 ◽  
Author(s):  
Yasuhiro Suzuki ◽  
Hiroaki Suzuki ◽  
Tatsuya Ishikawa ◽  
Yasunori Yamada ◽  
Shigeru Yatoh ◽  
...  

Abstract We aimed to investigate the status of falls and to identify important risk factors for falls in persons with type 2 diabetes (T2D), including the non-elderly. Participants were 316 persons with T2D. They were assessed for medical history, laboratory data and physical capabilities during hospitalization and given a questionnaire on falls one year after discharge. Two different statistical models, logistic regression and a random forest classifier, were used to investigate important predictors of falls. The response rate to the survey was 72%; of the 226 respondents, there were 129 males and 97 females (median age 62 years). The fall rate during the first year after discharge was 19%. Logistic regression revealed that knee extension strength (β= -0.698, P = 0.002), fasting C-peptide (F-CPR) level (β= 0.492, P = 0.009) and dorsiflexion strength (β= -0.432, P = 0.047) were independent predictors of falls. The random forest classifier placed knee extension strength, grip strength (0.234), F-CPR level and dorsiflexion strength in the top 4 important variables for falls. Lower extremity muscle weakness, elevated F-CPR levels and reduced grip strength were shown to be important risk factors for falls in T2D.


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255683
Author(s):  
Vaijayanthee Anand ◽  
Luv Verma ◽  
Aekta Aggarwal ◽  
Priyadarshini Nanjundappa ◽  
Himanshu Rai

Purpose The COVID-19 pandemic has undoubtedly altered the routine of life and caused unanticipated changes, resulting in severe psychological responses and a mental health crisis. The study aimed to identify psycho-social factors that predicted distress among the Indian population during the spread of the novel Coronavirus. Method An online survey was conducted to assess the predictors of distress. A global logistic regression model was built by identifying significant factors from individual logistic regression models built on various groups of independent variables. The prediction capability of the model was compared with that of a random forest classifier. Results The respondents (N = 1060) who are more likely to be distressed are in the age group of 21-35 years, are females (OR = 1.425), work on site (OR = 1.592), have pre-existing medical conditions (OR = 1.682), do not have a health insurance policy covering COVID-19 (OR = 1.884), perceive COVID-19 as serious (OR = 1.239), lack trust in government (OR = 1.246), and report that the fulfillment of their basic needs is unsatisfactory (OR = 1.592). Those who are less likely to be distressed have higher social support and psychological capital. Random forest classifier correctly classified 2.3% and 17.1% of people under lower and higher distress respectively, with respect to logistic regression. Conclusions This study confirms the prevalence of high distress experienced by Indians at the time of COVID-19 and provides pragmatic implications for psychological health at macro and micro levels during an epidemiological crisis.
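To make the odds ratios above easier to read, the following minimal sketch shows the standard relationship between a logistic regression coefficient and an odds ratio (OR = exp(β)) and how an odds ratio shifts a probability. The function names and the 20% baseline probability are hypothetical and are not taken from the study; only the odds ratio 1.425 comes from the abstract.

```python
import math


def coefficient_from_odds_ratio(odds_ratio):
    """Logistic regression coefficient beta such that exp(beta) == odds_ratio."""
    return math.log(odds_ratio)


def probability_after_odds_ratio(baseline_probability, odds_ratio):
    """Probability obtained when the baseline odds are multiplied by odds_ratio."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# OR = 1.425 is the value reported for female respondents; the 20% baseline
# probability of distress is an assumption made only for illustration.
beta = coefficient_from_odds_ratio(1.425)
p = probability_after_odds_ratio(0.20, 1.425)
print(f"beta = {beta:.3f}, probability = {p:.3f}")
```

An odds ratio multiplies odds, not probabilities, so the same OR produces different probability changes at different baselines.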


Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2326
Author(s):  
Mazhar Javed Awan ◽  
Awais Yasin ◽  
Haitham Nobanee ◽  
Ahmed Abid Ali ◽  
Zain Shahzad ◽  
...  

Before the internet, people acquired their news from the radio, television, and newspapers. With the internet, the news moved online, and suddenly anyone could post information on websites such as Facebook and Twitter. The spread of fake news has also increased with social media, and it has become one of the most significant issues of this century. People use fake news to damage the reputation of well-regarded organizations for their own benefit. The main aim of this project is to build a tool that uses machine learning to examine the language patterns that characterize fake and real news. This paper proposes machine learning models that can successfully detect fake news. These models identify which news is real or fake and specify the accuracy of said news, even in a complex environment. After data preprocessing and exploration, we applied three machine learning models: random forest classifier, logistic regression, and term frequency-inverse document frequency (TF-IDF) vectorizer. The accuracy of the TF-IDF vectorizer, logistic regression, random forest classifier, and decision tree classifier models was approximately 99.52%, 98.63%, 99.63%, and 99.68%, respectively. Machine learning models can be considered a great choice for finding reality-based results and can be applied to other unstructured data for various sentiment analysis applications.
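For readers unfamiliar with TF-IDF, the sketch below computes the plain textbook weighting (term frequency times log(N / document frequency)) using only the Python standard library. It is not the authors' pipeline: the function name and the three example headlines are hypothetical, and library implementations often use smoothed variants of the formula. TF-IDF is normally a feature-extraction step whose output is fed to a classifier such as logistic regression or a random forest.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Textbook variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    document_frequency = Counter()
    for tokens in tokenized:
        document_frequency.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / document_frequency[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical headlines used only to exercise the function.
docs = [
    "breaking news shocking claim",
    "news report confirms claim",
    "weather news today",
]
vectors = tf_idf(docs)
```

A term that appears in every document (here "news") gets weight zero, while rarer terms such as "shocking" receive the largest weights, which is what makes them useful as classifier features.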


2021 ◽  
Vol 1 (1) ◽  
pp. 21-32
Author(s):  
Mawaddah Harahap ◽  
Yusniar Lubis ◽  
Zakarias Situmorang

In digital marketing, data science (DS) plays an important role in understanding the performance of the marketing industry before digital marketing techniques are applied to product marketing. This is because each customer responds differently to each offer. Customer behavior also changes over time, because customers may have different needs in different situations. This paper presents a business analysis that applies DS to explore behavior patterns and to predict how customers will respond to different offers. Exploratory data analysis is also applied to answer several business questions. The observations yield five customer groups, presented as visualizations, and the Random Forest Classifier model has the best prediction accuracy score at 91%, followed by the K neighbors Classifier and Logistic Regression.


2021 ◽  
Vol 1 (1) ◽  
pp. 8-13
Author(s):  
Amir Mahmud Husein ◽  
Mawaddah Harahap

Customer churn is the phenomenon in which a company's customers stop buying or interacting, so it is very important for companies, especially banks, to predict the likelihood of customer churn; the results can be used to support customer retention and as part of company strategy. This paper presents an analysis and prediction of customer churn using five different models: Kneighbors Classifier, Logistic Regression, Linear SVC, Random Tree Classifier, and Random Forest Classifier. In testing, the Random Forest Classifier and Kneighbors Classifier models performed better than the other models, with accuracies of 86% and 84%, respectively. Feature engineering with the ANOVA and Chi Square approaches had a significant effect on improving the performance of the prediction models.
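As an illustration of the Chi Square scoring mentioned above, the sketch below computes the Pearson chi-square statistic for a single binary feature against a binary churn label. It is a minimal sketch rather than the authors' feature-engineering code: the function name and the counts are hypothetical, and it assumes every row and column total is non-zero.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Rows: feature present / absent. Columns: churned / retained.
    Assumes all row and column totals are non-zero.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if feature and churn were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Hypothetical counts: 30 of 100 customers with the feature churned,
# compared with 10 of 100 customers without it.
score = chi_square_2x2(30, 70, 10, 90)
print(score)  # prints: 12.5
```

A larger statistic means the feature is more strongly associated with churn, so features can be ranked by this score and the lowest-scoring ones dropped before training.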


2018 ◽  
Vol 10 (5) ◽  
pp. 1-12
Author(s):  
B. Nassih ◽  
A. Amine ◽  
M. Ngadi ◽  
D. Naji ◽  
N. Hmina
