Predictions from algorithmic modeling result in better decisions than from data modeling for soybean iron deficiency chlorosis

In soybean variety development and genetic improvement projects, iron deficiency chlorosis (IDC) is visually assessed as an ordinal response variable. Linear Mixed Models for Genomic Prediction (GP) have been developed, compared, and used to select continuous plant traits such as yield, height, and maturity, but can be inappropriate for ordinal traits. Generalized Linear Mixed Models have been developed for GP of ordinal response variables. However, neither approach addresses the most important questions for cultivar development and genetic improvement: How frequently are the ‘wrong’ genotypes retained, and how often are the ‘correct’ genotypes discarded? The research objective reported herein was to compare outcomes from four data modeling and six algorithmic modeling GP methods applied to IDC using decision metrics appropriate for variety development and genetic improvement projects. Appropriate metrics for decision making consist of specificity, sensitivity, precision, decision accuracy, and area under the receiver operating characteristic curve. Data modeling methods for GP included ridge regression, logistic regression, penalized logistic regression, and Bayesian generalized linear regression. Algorithmic modeling methods included Random Forest, Gradient Boosting Machine, Support Vector Machine, K-Nearest Neighbors, Naïve Bayes, and Artificial Neural Network. We found that a Support Vector Machine model provided the most specific decisions, that is, it most often correctly discarded IDC-susceptible genotypes, while a Random Forest model resulted in the best decisions for retaining IDC-tolerant genotypes, as well as the best outcomes when considering all decision metrics. Overall, the predictions from algorithmic modeling resulted in better decisions than those from data modeling methods applied to soybean IDC.
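The decision metrics named in the abstract above can be illustrated with a minimal sketch. It assumes a binary retain/discard decision in which "tolerant" is the positive class (an assumption made here for illustration; the abstract treats IDC as ordinal), and the genotype calls below are invented toy data, not results from the study. The helper names `count_decisions`, `ratio`, and `decision_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # tolerant genotypes correctly retained
    fp: int  # susceptible genotypes wrongly retained
    tn: int  # susceptible genotypes correctly discarded
    fn: int  # tolerant genotypes wrongly discarded


def count_decisions(actual, predicted, positive="tolerant"):
    """Tally the four decision outcomes; `positive` is the class to retain (assumed)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return Counts(tp, fp, tn, fn)


def ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def decision_metrics(c):
    """Compute the four confusion-matrix metrics from the tallied counts."""
    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),  # tolerant genotypes retained
        "specificity": ratio(c.tn, c.tn + c.fp),  # susceptible genotypes discarded
        "precision": ratio(c.tp, c.tp + c.fp),    # retained genotypes that are tolerant
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
    }


# Toy data (hypothetical): true class versus predicted class for eight genotypes.
actual = ["tolerant", "tolerant", "tolerant", "susceptible",
          "susceptible", "susceptible", "susceptible", "tolerant"]
predicted = ["tolerant", "tolerant", "susceptible", "susceptible",
             "susceptible", "tolerant", "susceptible", "tolerant"]

counts = count_decisions(actual, predicted)
metrics = decision_metrics(counts)
print(counts)
print(metrics)
```

In this framing, low specificity means susceptible genotypes are retained by mistake, and low sensitivity means tolerant genotypes are discarded by mistake, which are the two questions the abstract raises.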


Perbandingan Algoritma Klasifikasi Sentimen Twitter Terhadap Insiden Kebocoran Data Tokopedia (Comparison of Twitter Sentiment Classification Algorithms on the Tokopedia Data Leak Incident)

JISKA (Jurnal Informatika Sunan Kalijaga) ◽

10.14421/jiska.2021.6.2.120-129 ◽

2021 ◽

Vol 6 (2) ◽

pp. 120-129

Author(s):

Nadhif Ikbar Wibowo ◽

Tri Andika Maulana ◽

Hamzah Muhammad ◽

Nur Aini Rakhmawati

Keyword(s):

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Supervised Learning ◽

Support Vector ◽

Data Set ◽

Logistic Regression Classifier

Public responses posted on Twitter in reaction to the Tokopedia data leak incident were used as a data set to compare the performance of three classifiers, trained with supervised learning, at classifying the sentiment of the text. All tweets were classified as positive, negative, or neutral. This study compares the performance of Random Forest, Support-Vector Machine (SVM), and Logistic Regression classifiers. Data were scraped automatically and used to evaluate several models; the SVM-based model had the highest F1-score, 0.503583, making SVM the best-performing classifier.
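For a three-class problem such as this one, an F1-score has to be aggregated across classes. The sketch below shows one common choice, the macro average (the unweighted mean of per-class F1). The abstract does not say which averaging the authors used, so macro averaging is an assumption here, and the labels are toy data. The function names `per_class_f1` and `macro_f1` are hypothetical.

```python
def per_class_f1(actual, predicted, label):
    """F1 for one class, treating `label` as positive and everything else as negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)
    fp = sum(1 for a, p in pairs if a != label and p == label)
    fn = sum(1 for a, p in pairs if a == label and p != label)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears.
    return 2 * tp / denom if denom else 0.0


def macro_f1(actual, predicted, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(actual, predicted, lab) for lab in labels) / len(labels)


# Toy data (hypothetical): six tweets with true and predicted sentiment.
labels = ["positive", "negative", "neutral"]
actual = ["positive", "negative", "neutral", "negative", "neutral", "positive"]
predicted = ["positive", "negative", "negative", "negative", "neutral", "neutral"]

score = macro_f1(actual, predicted, labels)
print(round(score, 4))
```

Macro averaging weights each sentiment class equally regardless of how many tweets fall in it; a weighted or micro average would give a different number on imbalanced data.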


IDENTIFIKASI JENIS IKAN MENGGUNAKAN MODEL HYBRID DEEP LEARNING DAN ALGORITMA KLASIFIKASI (Fish Species Identification Using a Hybrid Deep Learning Model and Classification Algorithms)

Sebatik ◽

10.46984/sebatik.v24i2.1057 ◽

2020 ◽

Vol 24 (2) ◽

Author(s):

Anifuddin Azis

Keyword(s):

Neural Networks ◽

Support Vector Machine ◽

Logistic Regression ◽

Deep Learning ◽

Random Forest ◽

Decision Tree ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Output

Indonesia is the country with the second-greatest biodiversity in the world, after Brazil. Indonesia has about 25,000 plant species and 400,000 species of animals and fish. An estimated 8,500 fish species live in Indonesian waters, or 45% of the species in the world, of which roughly 7,000 are marine fish species. Determining the number of these species requires expertise in taxonomy. In practice, identifying a fish species is not easy, because it requires particular methods and equipment as well as taxonomic literature. Automatic processing of video or images from aquatic ecosystem data has begun to be developed. In this development, detecting and identifying fish species is a challenge compared with detecting and identifying other objects. Deep learning methods, which have been successful at classifying objects in images, can analyze data directly without a dedicated feature extraction step. Such systems have parameters, or weights, that serve both as feature extractors and as classifiers. The processed data produce an output that is expected to be as close as possible to the true output data. CNN is a deep learning architecture that can reduce the dimensionality of data without losing the characteristics or features of the data. In this study, a hybrid model is developed that uses a CNN (Convolutional Neural Network) to extract features and several classification algorithms to identify fish species. The classification algorithms used in this study are Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbor (KNN), Random Forest, and Backpropagation.


Machine learning in the diagnosis of Myocardial Infarction with Non-Obstructive Coronary Arteries

European Heart Journal ◽

10.1093/eurheartj/ehab724.3067 ◽

2021 ◽

Vol 42 (Supplement_1) ◽

Author(s):

M J Espinosa Pascual ◽

P Vaquero Martinez ◽

V Vaquero Martinez ◽

J Lopez Pais ◽

B Izquierdo Coronel ◽

...

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Obstructive Coronary Artery Disease ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector

Introduction: Out of all patients admitted with Myocardial Infarction, 10 to 15% have Myocardial Infarction with Non-Obstructive Coronary Arteries (MINOCA). Classification algorithms based on deep learning substantially exceed traditional diagnostic algorithms. Therefore, numerous machine learning models have been proposed as useful tools for the detection of various pathologies, but to date no study has proposed a diagnostic algorithm for MINOCA. Purpose: The aim of this study was to estimate the diagnostic accuracy of several machine learning algorithms (Support-Vector Machine [SVM], Random Forest [RF] and Logistic Regression [LR]) in discriminating patients with MINOCA from those with Myocardial Infarction with Obstructive Coronary Artery Disease (MICAD) at the time of admission and before coronary angiography, whether invasive or not. Methods: A Diagnostic Test Evaluation study was carried out by applying the proposed algorithms to a database of 553 consecutive patients admitted to our Hospital with Myocardial Infarction. According to the definitions of the 2016 ESC Position Paper on MINOCA, patients were classified into two groups: MICAD and MINOCA. Out of the total 553 patients, 214 were discarded due to the lack of complete data. The set of machine learning algorithms was trained on 244 patients (training sample: 75%) and tested on 80 patients (test sample: 25%). A total of 64 variables were available for each patient, including demographic, clinical and laboratory features recorded before the angiographic procedure. Finally, the diagnostic precision of each architecture was measured.
Results: The most accurate classification model was the Random Forest algorithm (Specificity [Sp] 0.88, Sensitivity [Se] 0.57, Negative Predictive Value [NPV] 0.93, Area Under the Curve [AUC] 0.85 [CI 0.83–0.88]), followed by the standard Logistic Regression (Sp 0.76, Se 0.57, NPV 0.92, AUC 0.74) and Support-Vector Machine (Sp 0.84, Se 0.38, NPV 0.90, AUC 0.78) (see graph). The variables that contributed the most to discriminating MINOCA from MICAD were the traditional cardiovascular risk factors, biomarkers of myocardial injury, hemoglobin and gender. Results were similar when the 19 patients with Takotsubo syndrome were excluded from the analysis. Conclusion: A prediction system for diagnosing MINOCA before performing coronary angiographies was developed using machine learning algorithms. Results show higher accuracy in diagnosing MINOCA than conventional statistical methods. This study supports the potential of machine learning algorithms in clinical cardiology. However, further studies are required in order to validate our results. Funding Acknowledgement: Type of funding sources: None.

Figure: ROC curves of the different algorithms.
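The AUC values reported above summarize how well each model ranks patients. As a minimal sketch, the AUC can be computed directly as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half (the Mann-Whitney formulation, which equals the area under the ROC curve). The labels and scores below are toy data, the coding 1 = MINOCA and 0 = MICAD is an assumption for illustration, and `roc_auc` is a hypothetical helper, not the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = MINOCA, 0 = MICAD, with model scores for MINOCA.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.4]

auc = roc_auc(labels, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of patients, which is fine for a test set of 80 but would normally be replaced by a sort-based computation on large data.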


Ovarian cancer diagnosis using pretrained mask CNN-based segmentation with VGG-19 architecture

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2021-0098 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Kavitha Senthil ◽

Vidyaathulasiramam

Keyword(s):

Neural Network ◽

Ovarian Cancer ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Cancer Diagnosis ◽

Support Vector ◽

Hybrid Neural Network ◽

Cancer Prediction ◽

The Neural Network

Objectives: This paper proposes a neural network-based segmentation model using a pre-trained Mask Convolutional Neural Network (CNN) with the VGG-19 architecture. Because the ovary is a very small tissue, it needs to be segmented with high accuracy from the annotated ovary images collected in the dataset. The model is proposed to predict and suppress the illness early and to diagnose it correctly, helping the doctor save the patient's life. Methods: The paper uses neural network-based segmentation with a pre-trained Mask CNN integrated with the VGG-19 architecture to enhance ovarian cancer prediction and diagnosis. Results: The proposed segmentation using the hybrid CNN will provide higher accuracy when compared with logistic regression, Gaussian naïve Bayes, Random Forest, and Support Vector Machine (SVM) classifiers.


Image Classification of Tourist Attractions with K-Nearest Neighbor, Logistic Regression, Random Forest, and Support Vector Machine

International Journal on Advanced Science Engineering and Information Technology ◽

10.18517/ijaseit.10.6.9098 ◽

2020 ◽

Vol 10 (6) ◽

pp. 2207

Author(s):

Herry Sujaini

Keyword(s):

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Image Classification ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Tourist Attractions


Klasifikasi Jenis Pemeliharaan dan Perawatan Container Crane menggunakan Algoritma Machine Learning (Classification of Container Crane Upkeep and Maintenance Types Using Machine Learning Algorithms)

MATICS ◽

10.18860/mat.v13i1.11525 ◽

2021 ◽

Vol 13 (1) ◽

pp. 21-27

Author(s):

Via Ardianto Nugroho ◽

Derry Pramono Adi ◽

Achmad Teguh Wibowo ◽

MY Teguh Sulistyono ◽

Agustinus Bimo Gumelar

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Container Crane ◽

Model Tree

In the container service industry, Terminal Nilam is a customer of PT. BIMA, which specializes in heavy equipment repair and maintenance services. The terminal is a central site for loading and unloading domestic containers and has four container cranes to serve two ships. The maintenance process for heavy equipment such as the container cranes in operation so far appears to pay little attention to data on the grouping, or classification, of the type of maintenance the equipment needs. Over time, the equipment may perform below its potential, which can even lead to workplace accidents. Neglecting container crane maintenance can also inflate the cost of follow-up maintenance. Loading and unloading production targets may fall, and delays to ship berthing schedules become very likely. Machine Learning (ML) methods can readily eliminate these possibilities. In this study, we designed ML to identify and then group the appropriate type of container crane maintenance, namely light or heavy. The ML methods chosen for this study are Random Forest, Support Vector Machine, k-Nearest Neighbor, Naïve Bayes, Logistic Regression, J48, and Decision Tree. The study shows that tree-model ML succeeds in learning container crane maintenance data (numeric and categorical), with J48 performing best and reaching accuracy and ROC-AUC values of 99.1%. The classification was based on the date of last maintenance, hour meter, breakdown, shutdown, and spare parts.


Comparative Analysis of Intellectual Methods for Muscular Contraction Interpretation for Gesture Interface Implementation

Journal of Physics Conference Series ◽

10.1088/1742-6596/2096/1/012190 ◽

2021 ◽

Vol 2096 (1) ◽

pp. 012190

Author(s):

E V Bunyaeva ◽

I V Kuznetsov ◽

Y V Ponomarchuk ◽

P S Timosh

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Logistic Regression ◽

Comparative Analysis ◽

Random Forest ◽

Decision Tree ◽

Single Channel ◽

Muscular Contraction ◽

Support Vector ◽

Machine Learning Methods

The paper presents a comparative analysis of machine learning methods for gesture recognition based on surface single-channel electromyography (sEMG) data. The data were processed using a multilayer perceptron, a support vector machine, a decision tree ensemble (Random Forest), and logistic regression for the four chosen gesture types. Conclusions about the effectiveness of these methods were drawn using commonly recommended accuracy metrics.


Prediction of active debt in the State of Pernambuco, Brazil

Revista de Engenharia e Pesquisa Aplicada ◽

10.25286/repa.v5i1.1299 ◽

2020 ◽

Vol 5 (1) ◽

pp. 88-95

Author(s):

Álvaro Farias Pinheiro ◽

João Alberto Da Silva Amaral ◽

Geraldo Torres Galindo Neto ◽

José Nilo Martins Sampaio ◽

Wedson Lino Soares

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

The State ◽

Support Vector ◽

Data Mining Techniques ◽

Collection Process ◽

Mining Model

This work applies data mining (DM) techniques to optimize the collection process for the Active Debt (AD) of the State of Pernambuco, Brazil. We apply the following data mining techniques: Decision Tree (DT), Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), which is considered an ensemble method. We observed that the RF technique obtained better results than all the other classification techniques, reaching higher values in all metrics analyzed. We note that a data mining model that selects which debts are likely to succeed in the collection process can bring benefits to the Pernambuco government. With the RF technique, we obtained scores above 85% on the evaluation metrics.


802 An electronic health record-based approach to identify and characterize patients with immune checkpoint inhibitor-associated arthritis

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2021-sitc2021.802 ◽

2021 ◽

Vol 9 (Suppl 3) ◽

pp. A838-A839

Author(s):

Steven Tran ◽

Luke Rasmussen ◽

Jennifer Pacheco ◽

Carlos Galvez ◽

Kyle Tegtmeyer ◽

...

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Adverse Events ◽

Operating Characteristic ◽

Area Under The Curve ◽

Gradient Boosting ◽

Support Vector ◽

Health Record

Background: Immune checkpoint inhibitors (ICIs) are a pillar of cancer therapy with demonstrated efficacy in a variety of malignancies. However, they are associated with immune-related adverse events (irAEs) that affect many organ systems with varying severity, inhibiting patient quality of life and in some cases the ability to continue immunotherapy. Research into irAEs is nascent, and identifying patients with adverse events poses a critical challenge for future research efforts and patient care. This study's objective was to develop an electronic health record (EHR)-based model to identify and characterize patients with ICI-associated arthritis (checkpoint arthritis). Methods: Forty-two patients with checkpoint arthritis were chart abstracted from a cohort of all patients who received checkpoint therapy for cancer (n=2,612) in a single-center retrospective study. All EHR clinical codes (N=32,198) were extracted including International Classification of Diseases (ICD)-9 and ICD-10, Logical Observation Identifiers Names and Codes (LOINC), RxNorm, and Current Procedural Terminology (CPT). Logistic regression, random forest, gradient boosting, support vector machine, K-nearest neighbors, and neural network machine learning models were trained to identify checkpoint arthritis patients using these clinical codes. Models were evaluated using receiver operating characteristic area under the curve (ROC-AUC), and the most important variables were determined from the logistic regression model. Models were retrained on smaller fractions of the important variables to determine the minimum variable set necessary to achieve accurate identification of checkpoint arthritis. Results: Logistic regression and random forest were the highest performing models on the full variable set of 32,198 clinical codes (AUCs: 0.911, 0.894, respectively) (table 1). Retraining the models on smaller fractions of the most important variables demonstrated peak performance using the top 31 clinical codes, or 0.1% of the total variables (figure 1). The most important features included presence of ESR, CRP, rheumatoid factor lab, prednisone, joint pain, creatine kinase lab, thyroid labs, and immunization, all positively associated with checkpoint arthritis (figure 2). Conclusions: Our study demonstrates that a data-driven, EHR based approach can robustly identify checkpoint arthritis patients. The high performance of the models using only the 0.1% most important variables suggests that only a small number of clinical attributes are needed to identify these patients. The variables most important for identifying checkpoint arthritis included several unexpected clinical features, such as thyroid labs and immunization, indicating potential underlying irAE associations that warrant further exploration. Finally, the flexibility of this approach and its demonstrated effectiveness could be applied to identify and characterize other irAEs. Ethics Approval: This study was approved by the Northwestern University Institutional Review Board, ID STU00210502, with a granted waiver of consent.

Abstract 802 Table 1: Model performance metrics. AUC was calculated from the ROC curve. Sensitivity, specificity, PPV, and NPV were determined at the threshold maximizing the F1-score. AUC = area under the curve, ROC = receiver operating characteristic, PPV = positive predictive value, NPV = negative predictive value.

Abstract 802 Figure 1: Model AUC trained on decreasing fractions of the most important variables, determined by the random forest model. 100% = 32,198 clinical codes. LReg = logistic regression, RF = random forest, GB = gradient boosting, NN = neural network, KNN = K-nearest neighbor, SVM = support vector machine, SVMAnom = SVM anomaly detection.

Abstract 802 Figure 2: The 31 most important variables determined by the logistic regression (A, coefficients) and random forest (B, relative importance) models.
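The Table 1 caption states that sensitivity, specificity, PPV, and NPV were determined at the threshold maximizing the F1-score. A minimal sketch of that selection step follows, assuming each observed score is tried as a candidate threshold and a patient is predicted positive when the score is at or above it. The labels and scores are toy data, and the function names (`confusion_at`, `f1_at`, `best_f1_threshold`, `metrics_at`) are hypothetical, not taken from the study.

```python
def confusion_at(labels, scores, threshold):
    """Return (tp, fp, tn, fn) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_at(labels, scores, threshold):
    tp, fp, _, fn = confusion_at(labels, scores, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(labels, scores):
    """Try every observed score; on ties, max() keeps the lowest threshold."""
    return max(sorted(set(scores)), key=lambda t: f1_at(labels, scores, t))


def metrics_at(labels, scores, threshold):
    tp, fp, tn, fn = confusion_at(labels, scores, threshold)
    safe = lambda num, den: num / den if den else float("nan")
    return {
        "sensitivity": safe(tp, tp + fn),
        "specificity": safe(tn, tn + fp),
        "ppv": safe(tp, tp + fp),
        "npv": safe(tn, tn + fn),
    }


# Toy data (hypothetical): 1 = checkpoint arthritis, 0 = no checkpoint arthritis.
labels = [1, 0, 1, 1, 0, 0, 0, 0]
scores = [0.92, 0.80, 0.65, 0.45, 0.40, 0.35, 0.10, 0.05]

threshold = best_f1_threshold(labels, scores)
report = metrics_at(labels, scores, threshold)
print(threshold, report)
```

Choosing the threshold on the same data used to report the metrics tends to be optimistic; a held-out set would normally be used for the final numbers.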


Predicting preeclampsia and related risk factors using data mining approaches: A cross-sectional study

International Journal of Reproductive BioMedicine ◽

10.18502/ijrm.v19i11.9911 ◽

2021 ◽

Author(s):

Zohreh Manoochehri ◽

Sara Manoochehri ◽

Farzaneh Soltani ◽

Majid Sadeghifar

Keyword(s):

Risk Factors ◽

Data Mining ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Cross Sectional Study ◽

Support Vector ◽

Cross Sectional ◽

C5.0 Decision Tree

Background: Preeclampsia is a hypertensive disorder of pregnancy that has adverse effects on both the mother and the fetus. Despite recent advances in understanding the etiology of preeclampsia, no adequate clinical screening tests have been identified to diagnose the disorder. Objective: We aimed to provide a model based on data mining approaches that can be used as a screening tool to identify patients with this syndrome and also to identify the risk factors associated with it. Materials and Methods: The data used to perform this cross-sectional study were extracted from the clinical records of 726 mothers with preeclampsia and 726 mothers without preeclampsia who were referred to Fatemieh Hospital in Hamadan City during April 2005–March 2015. In this study, six data mining methods were adopted, including logistic regression, k-nearest neighbors, C5.0 decision tree, discriminant analysis, random forest, and support vector machine, and their performance was compared using the criteria of accuracy, sensitivity, and specificity. Results: Underlying condition, age, pregnancy season and the number of pregnancies were the most important risk factors for diagnosing preeclampsia. The accuracies of the models were as follows: logistic regression (0.713), k-nearest neighbors (0.742), C5.0 decision tree (0.788), discriminant analysis (0.687), random forest (0.758) and support vector machine (0.791). Conclusion: Among the data mining methods employed in this study, support vector machine was the most accurate in predicting preeclampsia. Therefore, this model can be considered as a screening tool to diagnose this disorder. Key words: Preeclampsia, Random forest, C5.0 decision tree, Support vector machine, Logistic regression.
