Predictive Modeling and Analysis of Logistic Regression and k-Nearest Neighbor for Personal Loan Campaign

Author(s):  
Bhavya Alankar ◽  
Iftikhar Alam
Author(s):  
Ali Pala ◽  
Jing Zhang ◽  
Jun Zhuang ◽  
Nathan Allen

Abstract Illegal fishing activities in the Gulf of Mexico threaten US national security and damage the economy. The US Coast Guard (USCG) estimates that Mexican fishermen make over 1,100 incursions into US-regulated waters in the Gulf of Mexico annually. Fishermen cross the maritime border to catch red snapper, one of the Gulf of Mexico’s signature and most valuable fish. A number of academic contributions have sought to improve the understanding of illegal fishing and to generate better solutions. In this study, we investigate the relationship between illegal fishing activities and environmental factors using one year of historical sight, weather, and moon phase data. Descriptive analysis provides insights such as how sight patterns depend on wave height, moon phase, and hour of the day. We also develop logistic regression models that show that wave height is negatively correlated with sight occurrences for all sight types. In addition, we oversample the data, develop two prediction models using logistic regression and the k-nearest neighbor algorithm, and compare their prediction accuracies. The results show that the k-nearest neighbor algorithm performs better in most cases.
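The oversampling and k-nearest neighbor steps mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling scheme, distance metric, or value of k the authors used, so the random duplication of minority rows, the squared Euclidean distance, k = 3, and the toy feature values below are all assumptions made for illustration.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training rows closest to query (squared Euclidean distance)."""
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (wave height in metres, moon illumination fraction),
# labelled 1 for a sight occurrence and 0 for none. Not taken from the study.
rows = [(0.3, 0.1), (0.4, 0.2), (1.8, 0.9), (2.0, 0.8), (2.2, 0.7), (1.9, 0.6)]
labels = [1, 1, 0, 0, 0, 0]
balanced_rows, balanced_labels = random_oversample(rows, labels)
prediction = knn_predict(balanced_rows, balanced_labels, (0.35, 0.15), k=3)
```

In practice, oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used to measure accuracy.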


2020 ◽  
Vol 6 (1) ◽  
pp. 116-125
Author(s):  
Fajar Sarasati ◽  
Lia Dwi Cahyanti ◽  
Annida Purnamawati ◽  
Riyan Latifahul Hasanah

Abstract: Building the brand of a company that is just starting out by conducting market research is intended to introduce new products and maintain existing business. However, market surveys are costly: they incur transportation costs, brochure printing costs, additional employee salaries, and so forth. Surveys conducted offline also reach a narrower market, yield less complete and less detailed results, and take more time. For these reasons, the researchers conducted a study using Facebook performance metric data on the building of a cosmetics brand, applying the K-Nearest Neighbor and Logistic Regression (SVM) algorithms to classify which posts were most and least popular with consumers, and then applying the EnBag K-LoGres method to the two algorithms to improve their performance. Bagging was chosen because it can improve measurement results and classification accuracy by combining two or more algorithms. On the Facebook metric data, the K-NN algorithm achieved an accuracy of 68.67% and Logistic Regression (SVM) achieved 72.67%; processing the two algorithms with the EnBag K-LoGres method gave an accuracy of 73.91%, an increase of 1.24 percentage points.
Keywords: Brand Development, Cosmetics, K-Nearest Neighbour, Logistic Regression (SVM), EnBag K-LoGres
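As a rough illustration of the bagging idea this abstract relies on, the sketch below shows generic bootstrap aggregating with a majority vote. It is not the EnBag K-LoGres method, whose details the abstract does not give; the 1-nearest-neighbor base learner, the number of estimators, and the toy post features are assumptions chosen only to keep the example small.

```python
import random
from collections import Counter

def nearest_neighbor_label(train, query):
    """1-NN base learner: label of the training point closest to query."""
    _point, label = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return label

def bagged_predict(train, query, n_estimators=25, seed=0):
    """Bootstrap aggregating: majority vote over base learners fit on resampled data."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_estimators):
        # Bootstrap sample: draw len(train) items with replacement.
        sample = [rng.choice(train) for _ in train]
        votes[nearest_neighbor_label(sample, query)] += 1
    return votes.most_common(1)[0][0]

# Hypothetical scaled post metrics (reach, interactions); 1 = popular post, 0 = not.
train = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.85, 0.7), 1), ((0.95, 0.9), 1),
         ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.15, 0.3), 0), ((0.05, 0.1), 0)]
label = bagged_predict(train, (0.88, 0.82))
```

Each base learner sees a slightly different resample of the data, so the vote tends to smooth out the variance of any single model; whether that yields a gain like the one reported depends on the data and the base learners.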


Sebatik ◽  
2020 ◽  
Vol 24 (2) ◽  
Author(s):  
Anifuddin Azis

Indonesia is the country with the second-largest biodiversity in the world after Brazil. Indonesia has about 25,000 plant species and 400,000 species of animals and fish. An estimated 8,500 fish species live in Indonesian waters, or 45% of the species in the world, of which around 7,000 are marine fish species. Determining the number of these species requires expertise in taxonomy. In practice, identifying a fish species is not easy, because it requires particular methods and equipment as well as taxonomic literature. Automatic processing of video or images of aquatic ecosystem data has begun to be developed. In that development, detecting and identifying fish species is more challenging than detecting and identifying other objects. Deep learning methods, which have been successful at classifying objects in images, can analyze data directly without a separate feature extraction step. Such systems have parameters, or weights, that serve both as feature extractors and as classifiers. The processed data yields an output that is expected to be as close as possible to the true output data. A CNN is a deep learning architecture that can reduce the dimensionality of the data without losing its characteristics or features. In this study, a hybrid model is developed that uses a CNN (Convolutional Neural Network) to extract features and several classification algorithms to identify fish species. The classification algorithms used in this study are Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbor (KNN), Random Forest, and Backpropagation.


2021 ◽  
Author(s):  
Michael Zhang ◽  
Elizabeth Tong ◽  
Sam Wong ◽  
Forrest Hamrick ◽  
Maryam Mohammadzadeh ◽  
...  

Abstract Background Non-invasive differentiation between schwannomas and neurofibromas is important for appropriate management, preoperative counseling, and surgical planning, but has proven difficult using conventional imaging. The objective of this study was to develop and evaluate machine learning approaches for differentiating peripheral schwannomas from neurofibromas. Methods We assembled a cohort of schwannomas and neurofibromas from 3 independent institutions and extracted high-dimensional radiomic features from gadolinium-enhanced, T1-weighted MRI using the PyRadiomics package on Quantitative Imaging Feature Pipeline. Age, sex, neurogenetic syndrome, spontaneous pain, and motor deficit were recorded. We evaluated the performance of 6 radiomics-based classifier models with and without clinical features and compared model performance against human expert evaluators. Results A total of 107 schwannomas and 59 neurofibromas were included. The primary models included both clinical and imaging data. The accuracy of the human evaluators (0.765) did not significantly exceed the no-information rate (NIR), whereas the Support Vector Machine (0.929), Logistic Regression (0.929), and Random Forest (0.905) classifiers exceeded the NIR. Using the method of DeLong, the AUCs for the Logistic Regression (AUC=0.923) and K Nearest Neighbor (AUC=0.923) classifiers were significantly greater than that of the human evaluators (AUC=0.766; p = 0.041). Conclusions The radiomics-based classifiers developed here proved to be more accurate and had a higher AUC on the ROC curve than expert human evaluators. This demonstrates that radiomics using routine MRI sequences and clinical features can aid in the differentiation of peripheral schwannomas and neurofibromas.
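The no-information rate used as the baseline above is the accuracy obtained by always predicting the most frequent class. The sketch below computes it from the full-cohort counts given in the abstract; the study's own NIR was presumably computed on its evaluation set, whose class counts the abstract does not report, so the number here only illustrates the definition. The function name is hypothetical.

```python
def no_information_rate(class_counts):
    """Accuracy of a classifier that always predicts the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total

# Cohort counts stated in the abstract (the whole cohort, not a test split).
cohort = {"schwannoma": 107, "neurofibroma": 59}
nir = no_information_rate(cohort)  # 107 / 166
```

A classifier is only informative if its accuracy is significantly above this rate, which is why a raw accuracy figure means little on an imbalanced cohort without it.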


2021 ◽  
Author(s):  
Yong Li

BACKGROUND Preventing in-hospital mortality in patients with ST-segment elevation myocardial infarction (STEMI) is a crucial step. OBJECTIVE The objective of our research was to develop and externally validate a diagnostic model of in-hospital mortality in acute STEMI patients using artificial intelligence methods. METHODS As our datasets were highly imbalanced, we evaluated the effect of down-sampling. Down-sampling was applied to the original dataset to create one balanced dataset, which ultimately yielded two datasets: the original and the down-sampled. We non-randomly divided one American population into a training set and a test set, and used another American population as the validation set. We used artificial intelligence methods to develop and externally validate the diagnostic model, including logistic regression, decision tree, extreme gradient boosting (XGBoost), a K nearest neighbor classification model, and a multi-layer perceptron. We used the confusion matrix combined with the area under the receiver operating characteristic curve (AUC) to evaluate the strengths and weaknesses of these models. RESULTS The strongest predictors of in-hospital mortality were age, female sex, cardiogenic shock, atrial fibrillation (AF), ventricular fibrillation (VF), in-hospital bleeding, and medical history such as hypertension and old myocardial infarction. The F2 scores of logistic regression in the training set, the test set, and the validation set were 0.7, 0.7, and 0.54, respectively. The F2 scores of XGBoost were 0.74, 0.52, and 0.54, respectively. The F2 scores of the decision tree were 0.72, 0.51, and 0.52, respectively. The F2 scores of the K nearest neighbor classification model were 0.64, 0.47, and 0.49, respectively. The F2 scores of the multi-layer perceptron were 0.71, 0.54, and 0.54, respectively. The AUCs of logistic regression in the training set, the test set, and the validation set were 0.72, 0.73, and 0.76, respectively. The AUCs of XGBoost were 0.75, 0.73, and 0.75, respectively. The AUCs of the decision tree were 0.75, 0.71, and 0.74, respectively. The AUCs of the K nearest neighbor classification model were 0.71, 0.69, and 0.72, respectively. The AUCs of the multi-layer perceptron were 0.73, 0.74, and 0.75, respectively. The diagnostic model built by logistic regression was the best. CONCLUSIONS The strongest predictors of in-hospital mortality were age, female sex, cardiogenic shock, AF, VF, in-hospital bleeding, and medical history such as hypertension and old myocardial infarction. We used artificial intelligence methods to develop and externally validate a diagnostic model of in-hospital mortality in acute STEMI patients. The diagnostic model built by logistic regression was the best. CLINICALTRIAL We registered this study with the WHO International Clinical Trials Registry Platform (ICTRP) (registration number: ChiCTR1900027129; registered date: 1 November 2019). http://www.chictr.org.cn/edit.aspx?pid=44888&htm=4.
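The F2 score reported above is the F-beta score with beta = 2, which weights recall more heavily than precision and so penalizes missed deaths more than false alarms. A minimal sketch of the standard formula, computed from confusion-matrix counts, follows; the counts in the example are hypothetical and are not taken from the study.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from confusion-matrix counts: (1 + b^2) * P * R / (b^2 * P + R)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: 40 true positives, 60 false positives, 10 false negatives,
# giving precision 0.4 and recall 0.8.
f2 = f_beta(40, 60, 10, beta=2.0)
f1 = f_beta(40, 60, 10, beta=1.0)
```

With these counts the F2 score is higher than F1 because recall exceeds precision, which shows how the choice of beta changes the ranking of models on imbalanced outcomes.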


MATICS ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 21-27
Author(s):  
Via Ardianto Nugroho ◽  
Derry Pramono Adi ◽  
Achmad Teguh Wibowo ◽  
MY Teguh Sulistyono ◽  
Agustinus Bimo Gumelar

In the container terminal services industry, Terminal Nilam is a customer of PT. BIMA, which specializes in heavy equipment repair and maintenance services. The terminal is the central location for loading and unloading domestic containers and has four container cranes to serve two ships. The maintenance process for heavy equipment such as the container cranes in operation so far appears to pay little attention to data that groups or classifies the type of maintenance the equipment needs. Later on, the heavy equipment may perform below its best and may even cause workplace accidents. In addition, neglected container crane maintenance can inflate the cost of follow-up maintenance. Loading and unloading production targets may fall, and delays in ship berthing schedules are very likely. Machine Learning (ML) methods can easily eliminate these possibilities. In this study, we design ML to identify and then group the appropriate type of container crane maintenance, namely light or heavy. The ML methods chosen for this study are Random Forest, Support Vector Machine, k-Nearest Neighbor, Naïve Bayes, Logistic Regression, J48, and Decision Tree. This study shows that tree-based ML models succeed in learning from container crane maintenance data (numerical and categorical), with J48 performing best, reaching an accuracy and ROC-AUC of 99.1%. Our classification considers the date of last maintenance, hour meter, breakdown, shutdown, and spare parts.


2021 ◽  
Vol 14 (1) ◽  
pp. 134-146
Author(s):  
Adi Wijaya ◽  
Teguh Adji ◽  
Noor Setiawan ◽  
...  

Electroencephalogram (EEG) based motor imagery (MI) classification requires efficient feature extraction and consistent accuracy for reliable brain-computer interface (BCI) systems. Achieving consistent accuracy in EEG MI classification is still a major challenge owing to the nature of the EEG signal, which is subject-dependent. To address this problem, we propose a feature selection scheme based on Logistic Regression (LRFS) and two-stage detection (TSD) in a channel instantiation approach. In the TSD scheme, Linear Discriminant Analysis was used in first-stage detection, while Gradient Boosted Tree and k-Nearest Neighbor were used in second-stage detection. To evaluate the proposed method, two publicly available datasets, BCI competition III-Dataset IVa and BCI competition IV-Dataset 2a, were used. Experimental results show that the proposed method yielded excellent accuracy on both datasets, 95.21% and 94.83%, respectively. These results indicate that the proposed method has consistent accuracy and is promising for reliable BCI systems.

