Predicting Malignancy with Pediatric Thyroid Nodules: Early Experience in Machine Learning for Clinical Decision Support

Abstract. Background: Papillary thyroid carcinoma is the most common endocrine malignancy. Since most nodules are benign, the challenge for the clinician is to identify those most likely to harbour malignancy while limiting exposure to surgical risks among those with benign nodules. Methods: Random Forests (augmented to select features based on our clinical measure of interest), in conjunction with interpretable rule sets, were used on demographic, ultrasound and biopsy data of thyroid nodules from children <18 years at a tertiary pediatric hospital. Accuracy, False Positive Rate (FPR), False Negative Rate (FNR) and Area Under the Receiver Operating Characteristic Curve (AUROC) are reported. Results: Our models predict non-benign cytology and malignant histology better than historical outcomes. Specifically, we expect a 68.04% improvement in the FPR, 11.90% increase in accuracy and 24.85% increase in AUROC for biopsy predictions in 67 patients (28 with benign and 39 with non-benign histology). We expect a 23.22% decrease in FPR, 32.19% increase in accuracy, and 3.84% decrease in AUROC for surgery prediction in 53 patients (42 with benign and 11 with non-benign histology). This improvement comes at the expense of the FNR: we expect that 10.27% of patients with malignancy would be discouraged from undergoing biopsy, and 11.67% from surgery. Given the small number of patients, these improvements are estimates and have not been tested on an independent test set. Conclusions: This work presents a first attempt at developing an interpretable machine-learning-based clinical tool to aid clinicians. Future work will involve sourcing more data and developing probabilistic estimates for predictions.

Diagnosis and Classification of the Diabetes Using Machine Learning Algorithms

10.21203/rs.3.rs-514771/v2 ◽

2021 ◽

Author(s):

Prasannavenkatesan Theerthagiri ◽

Usha Ruby A ◽

Vidya J

Keyword(s):

Machine Learning ◽

Multilayer Perceptron ◽

Nearest Neighbor ◽

False Positive Rate ◽

Learning Algorithms ◽

False Negative ◽

False Negative Rate ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

K Nearest Neighbor

Abstract: Diabetes mellitus is a chronic disease that may cause many complications. Machine learning algorithms are used to diagnose and predict diabetes. Learning-based algorithms play a vital role in supporting decision making in disease diagnosis and prediction. In this paper, traditional classification algorithms and neural-network-based machine learning are investigated on a diabetes dataset. Various performance measures are evaluated for the K-nearest neighbor, Naive Bayes, extra trees, decision trees, radial basis function, and multilayer perceptron (MLP) algorithms. This supports estimating which patients may suffer from diabetes in the future. The results show that the MLP algorithm gives the highest prediction accuracy, with the lowest MSE of 0.19. The MLP also gives the lowest false positive rate and false negative rate, with the highest area under the curve of 86%.

Tuning the False Positive Rate / False Negative Rate with Phishing Detection Models

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1002.1291s52019 ◽

2019 ◽

Vol 9 (1S5) ◽

pp. 7-13

Keyword(s):

Machine Learning ◽

Neural Networks ◽

False Positive Rate ◽

False Negative ◽

False Negative Rate ◽

Trade Off ◽

Detection Model ◽

Phishing Attacks ◽

Positive Rate ◽

Phishing Detection

Phishing attacks have risen by 209% in the last 10 years according to the Anti Phishing Working Group (APWG) statistics [19]. Machine learning is commonly used to detect phishing attacks. Researchers have traditionally judged phishing detection models with either accuracy or F1-scores; in this paper, however, we argue that a single metric alone will never correlate with a successful deployment of a machine learning phishing detection model. This is because every machine learning model has an inherent trade-off between its False Positive Rate (FPR) and False Negative Rate (FNR). Tuning the trade-off is important since a higher or lower FPR/FNR will impact the user acceptance rate of any deployment of a phishing detection model. Models with a high FPR tend to block users from accessing legitimate webpages, whereas a model with a high FNR will allow users to inadvertently access phishing webpages. Either of these extremes may cause a user base to complain (due to blocked pages) or fall victim to phishing attacks. Phishing detection models should be tuned according to the security needs of a deployment (secure vs relaxed setting). In this paper, we demonstrate two effective techniques to tune the trade-off between FPR and FNR: varying the class distribution of the training data and adjusting the probabilistic prediction threshold. We demonstrate both techniques on a data set of 50,000 phishing and 50,000 legitimate sites, using three common machine learning algorithms: Random Forest, Logistic Regression, and Neural Networks. Using our techniques we are able to regulate a model's FPR/FNR. We observed that among the three algorithms we used, Neural Networks performed best, resulting in a higher F1-score of 0.98 with corresponding FPR/FNR values of 0.0003 and 0.0198 respectively.
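
The second technique named in the abstract above, adjusting the probabilistic prediction threshold, can be illustrated with a minimal sketch. The scores and labels below are hypothetical values chosen for illustration, not data from the paper, and the helper name `fpr_fnr` is likewise an assumption.

```python
def fpr_fnr(scores, labels, threshold):
    """Return (FPR, FNR) when scores >= threshold are flagged as phishing (label 1)."""
    # False positives: legitimate pages (label 0) that get flagged.
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    # False negatives: phishing pages (label 1) that are let through.
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    negatives = labels.count(0)
    positives = labels.count(1)
    return fp / negatives, fn / positives


# Hypothetical model scores (probability of phishing) and true labels.
scores = [0.05, 0.20, 0.35, 0.60, 0.40, 0.55, 0.80, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Sweep the decision threshold and report both error rates.
for threshold in (0.3, 0.5, 0.7):
    fpr, fnr = fpr_fnr(scores, labels, threshold)
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

On this toy data, raising the threshold lowers the FPR and raises the FNR, which is the trade-off a deployment would tune toward a secure or a relaxed setting.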

Registration Status Prediction of Students using Machine Learning in the Context of Private University of Bangladesh

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a5292.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2594-2600 ◽

Cited By ~ 1

Keyword(s):

Higher Education ◽

Machine Learning ◽

Random Forest ◽

Performance Metrics ◽

False Positive Rate ◽

False Negative ◽

Private Universities ◽

False Negative Rate ◽

Private University ◽

Data Set

Bangladesh is a densely populated country where a large portion of citizens live in poverty. In Bangladesh, a significant portion of higher education is provided by private universities. In the twenty-first century, higher-education students are highly mobile and differ from earlier generations. Thus, retaining existing students has become a great challenge for many private universities in Bangladesh. Early prediction of the total number of registered students in a semester can help in this regard. This can have a direct impact on a private university in terms of budget, marketing strategy, and sustainability. In this paper, we predict the number of registered students in a semester in the context of a private university using several machine learning approaches. We applied seven prominent classifiers, namely SVM, Naive Bayes, Logistic, JRip, J48, Multilayer Perceptron, and Random Forest, to a data set of more than a thousand students of a private university in Bangladesh, where each record contains five attributes. First, all data are preprocessed. The preprocessed data are then split into training and testing sets, and all classifiers are trained and tested. Since a suitable classifier is required to solve the problem, the performance of all seven classifiers needs to be thoroughly assessed. We therefore computed six performance metrics, i.e. accuracy, sensitivity, specificity, precision, false positive rate (FPR) and false negative rate (FNR), for each of the seven classifiers and compared them. We found that SVM outperforms all other classifiers, achieving 85.76% accuracy, whereas Random Forest achieved the lowest accuracy, 79.65%.
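
The six metrics listed in the abstract above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function name and the counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute six standard metrics from binary confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
        "fpr": fp / (fp + tn),           # equals 1 - specificity
        "fnr": fn / (fn + tp),           # equals 1 - sensitivity
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because FPR and FNR are the complements of specificity and sensitivity, reporting all six gives four independent quantities rather than six.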

Machine Learning Based Classification Models for Financial Crisis Prediction

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8362.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 4887-4893

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Financial Crisis ◽

Large Scale ◽

False Positive Rate ◽

False Negative ◽

False Negative Rate ◽

Small Scale ◽

Early Prediction ◽

Rbf Network

Financial Crisis Prediction (FCP) is one of the most complicated problems facing corporate organizations, small- to large-scale industries, investors, banks and government agencies, so it is important to design a framework and methodology for the early prediction of financial crises. Earlier work applied various statistical techniques to the problem. However, these are not sufficient to predict results in a more intelligent and automated manner. The major objective of this paper is to enhance the early prediction of financial crisis in any organization using machine learning models such as Multilayer Perceptron (MLP), Radial Basis Function (RBF) Network, Logistic Regression (LR) and Deep Learning (DL) methods, and to conduct a comparative analysis to determine the best methods for FCP. Testing is conducted on benchmark datasets, namely the German, Weislaw and Polish datasets. Testing is performed in both the WEKA and Rapid Miner frameworks, reporting accuracy and other performance measures such as False Positive Rate (FPR), False Negative Rate (FNR), Precision, Recall, F-score and Kappa, to determine which algorithm best identifies a financial crisis before it actually occurs in an organization. The DL, MLP, LR and RBF Network algorithms achieved accuracies of 96%, 72.10%, 75.20% and 74% on the German dataset; 91.25%, 85.83%, 83.75% and 73.75% on the Weislaw dataset; and 99.70%, 96.30%, 96.21% and 96.14% on the Polish dataset, respectively. It is evident from the predictive results and the analytics in Rapid Miner that Deep Learning is the best-performing classifier among those compared. This method will enhance future predictions and provide efficient solutions for financial crisis prediction.

Diagnosis and Classification of the Diabetes Using Machine Learning Algorithms

10.21203/rs.3.rs-514771/v1 ◽

2021 ◽

Author(s):

Prasannavenkatesan Theerthagiri ◽

Usha Ruby A ◽

Vidya J

Keyword(s):

Machine Learning ◽

Multilayer Perceptron ◽

Nearest Neighbor ◽

False Positive Rate ◽

Learning Algorithms ◽

False Negative ◽

False Negative Rate ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

K Nearest Neighbor

The Sensitivity and Specificity of Fine-Needle Aspiration in Thyroid Neoplasia

Journal of Interdisciplinary Medicine ◽

10.1515/jim-2017-0047 ◽

2017 ◽

Vol 2 (2) ◽

pp. 127-131

Author(s):

Rareș Georgescu ◽

Adela Luciana Oprea ◽

Alexandra Contra ◽

Orsolya Bauer Hanko ◽

Ioana Colcer ◽

...

Keyword(s):

Fine Needle Aspiration ◽

Predictive Value ◽

Thyroid Nodules ◽

False Positive Rate ◽

False Negative ◽

False Negative Rate ◽

Needle Aspiration ◽

Fine Needle ◽

Causes Of Errors ◽

Positive Rate

Abstract. Objective: To evaluate and demonstrate the accuracy of fine-needle aspiration (FNA) in thyroid lesions in our department and to highlight probable causes of errors leading to unsatisfactory sampling, which may depend on the characteristics of the nodule. Methods: This is a retrospective study conducted on 319 diagnosed cases of thyroid nodules referred to the Surgery Unit of Puls hospital, Tîrgu Mureș in the January 2014 – December 2015 period, who underwent fine-needle aspiration. Histological examination was considered to be the gold standard; therefore we compared the cytological diagnosis with the histological one. Results: Of the 319 cases, 289 (90.6%) were female and 30 (9.4%) male patients; 210 cases (69.3%) were interpreted as benign, 46 cases (15.2%) as follicular lesion of undetermined significance, 4 cases (1.3%) as suspect for malignancy, 1 case (0.3%) as malignant sampling, and 42 cases (13.9%) as unsatisfactory. We compared the results of fine-needle aspiration cytology (FNAC) with the corresponding histopathological results (49 in total). FNAC achieved a sensitivity of 76.47%, a specificity of 83.1%, a positive predictive value of 35.1%, a negative predictive value of 96.7%, a false positive rate of 16.9%, a false negative rate of 23%, and an overall accuracy of 82.3%. Conclusions: The results of our study demonstrate the accuracy of the FNA technique in the first-line diagnosis of thyroid nodules.
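
The false positive and false negative rates reported above follow from the specificity and sensitivity by the standard complement identities. A short worked check, using only the figures quoted in the abstract:

```latex
\begin{align*}
\mathrm{FPR} &= 1 - \text{specificity} = 100\% - 83.1\% = 16.9\% \\
\mathrm{FNR} &= 1 - \text{sensitivity} = 100\% - 76.47\% = 23.53\%
\end{align*}
```

The first value matches the reported false positive rate exactly; the second agrees with the reported 23% if the abstract's figure was shortened to a whole number, which is an assumption here.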

Hybrid Machine Learning: A Tool to Detect Phishing Attacks in Communication Networks

ECTI Transactions on Computer and Information Technology (ECTI-CIT) ◽

10.37936/ecti-cit.2021153.240565 ◽

2021 ◽

Vol 15 (3) ◽

pp. 374-389

Author(s):

Ademola Philip Abidoye ◽

Boniface Kabaso

Keyword(s):

Machine Learning ◽

Communication Networks ◽

Credit Card ◽

Personal Information ◽

False Positive Rate ◽

False Negative ◽

False Negative Rate ◽

Machine Learning Techniques ◽

Sensitive Information ◽

Cyber Attack

Phishing is a cyber-attack that uses disguised email as a weapon and has been on the rise in recent times. An unsuspecting Internet user who clicks on a fraudulent link may fall victim and divulge personal information such as credit card PINs, login credentials, banking information and other sensitive information. There are many ways in which attackers can trick victims into revealing their personal information. In this article, we select important features of phishing URLs that attackers can use to trick Internet users into taking the attacker's desired action. We use two machine learning techniques to accurately classify our data sets. We compare the performance of other related techniques with our scheme. The results of the experiments show that the approach is highly effective in detecting phishing URLs and attained an accuracy of 97.8% with a 1.06% false positive rate, a 0.5% false negative rate, and an error rate of 0.3%. The proposed scheme performs better than the other selected related work. This shows that our approach can be used for real-time detection of phishing URLs.

Diagnosis and Classification of the Diabetes Using Machine Learning Algorithms

10.21203/rs.3.rs-514771/v3 ◽

2021 ◽

Author(s):

Prasannavenkatesan Theerthagiri ◽

Usha Ruby A ◽

Vidya J

Keyword(s):

Machine Learning ◽

Multilayer Perceptron ◽

Nearest Neighbor ◽

False Positive Rate ◽

Learning Algorithms ◽

False Negative ◽

False Negative Rate ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

K Nearest Neighbor

A Method of Apple Image Segmentation Based on Color-Texture Fusion Feature and Machine Learning

Agronomy ◽

10.3390/agronomy10070972 ◽

2020 ◽

Vol 10 (7) ◽

pp. 972 ◽

Cited By ~ 2

Author(s):

Chunlong Zhang ◽

Kunlin Zou ◽

Yue Pan

Keyword(s):

Machine Learning ◽

Image Segmentation ◽

False Positive Rate ◽

False Negative ◽

False Negative Rate ◽

Texture Features ◽

Apple Fruit ◽

Apple Orchard ◽

Machine Learning Algorithms ◽

Color Features

Apples are one of the most important kinds of fruit in the world, and China is the largest apple-producing country. Yield estimation, robotic harvesting, and precise spraying are important processes in precision apple planting. Image segmentation is an important step in machine vision systems for precision apple planting. In this paper, an apple fruit segmentation algorithm for use in the orchard was studied. The effect of many color features in distinguishing apple fruit pixels from other pixels was evaluated, and three color features were selected. These color features could effectively distinguish the apple fruit pixels from other pixels. The GLCM (Grey-Level Co-occurrence Matrix) was used to extract texture features, and the best distance and orientation parameters for the GLCM were found. Nine machine learning algorithms were used to develop pixel classifiers. The classifier was trained with 100 pixels and tested with 100 pixels. The accuracy of the classifier based on Random Forest reached 0.94. One hundred images of an apple orchard were manually labeled with apple fruit pixels and other pixels. A classifier was also used to segment these images. Regression analysis was performed on the results of manual labeling and classifier segmentation. The average values of Af (segmentation error), FPR (false positive rate) and FNR (false negative rate) were 0.07, 0.13 and 0.15, respectively. This result showed that the algorithm could segment apple fruit in orchard images effectively. It could provide a reference for precision apple planting management.
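
To make the role of the GLCM distance and orientation parameters concrete, here is a minimal pure-Python sketch of a non-symmetric grey-level co-occurrence matrix and one texture feature derived from it (contrast). The tiny 4x4 image, the function names, and the choice of contrast as the feature are illustrative assumptions; the abstract does not state which texture features or parameters the authors used.

```python
def glcm(image, dx, dy, levels):
    """Count co-occurrences of grey levels (i, j) at pixel offset (dx, dy).

    The offset encodes both parameters: its length is the distance and its
    direction is the orientation, e.g. (1, 0) is distance 1 at 0 degrees.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Only count pairs whose second pixel lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def contrast(counts):
    """Contrast feature: squared grey-level difference weighted by pair frequency."""
    total = sum(sum(row) for row in counts)
    weighted = sum(
        (i - j) ** 2 * n
        for i, row in enumerate(counts)
        for j, n in enumerate(row)
    )
    return weighted / total


# Hypothetical 4-level image used only for illustration.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

horizontal = glcm(image, dx=1, dy=0, levels=4)  # distance 1, 0 degrees
vertical = glcm(image, dx=0, dy=1, levels=4)    # distance 1, 90 degrees
print(contrast(horizontal), contrast(vertical))
```

Searching for the best parameters, as the abstract describes, would amount to repeating this computation over a set of candidate offsets and keeping the one whose features separate the classes best.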

Comparative study of ultrasonographic findings with the operative findings of biliary surgery

Journal of Surgical Sciences ◽

10.3329/jss.v22i1.44011 ◽

2020 ◽

Vol 22 (1) ◽

pp. 25-29

Author(s):

Zubayer Ahmad ◽

Mohammad Ali ◽

Kazi Israt Jahan ◽

ABM Khurshid Alam ◽

G M Morshed

Keyword(s):

Bile Duct ◽

Common Bile Duct ◽

Imaging Modality ◽

False Positive Rate ◽

Gallstone Disease ◽

False Negative ◽

False Negative Rate ◽

Biliary Disease ◽

Biliary Surgery ◽

Operative Findings

Background: Biliary disease is one of the most common surgical problems encountered all over the world. Ultrasound is widely accepted for the diagnosis of biliary system disease. However, it is a highly operator-dependent imaging modality and its diagnostic success is also influenced by circumstances such as a non-fasting state, obesity, and intestinal gas. Objective: To compare the ultrasonographic findings with the peroperative findings in biliary surgery. Methods: This prospective study was conducted in General Hospital, Comilla, from July 2006 to June 2008 among 300 patients with biliary diseases for which operative treatment was planned. Sonographic findings were compared with operative findings. Results: Right hypochondriac pain and jaundice were two significant symptoms (93% and 15%). Right hypochondriac tenderness, jaundice and palpable gallbladder were the most valuable physical findings (40%, 15% and 5%, respectively). Of 252 ultrasonically positive gallbladders, stones were confirmed in 249 cases preoperatively. Sensitivity of USG in the diagnosis of gallstone disease was 100%. There was, however, a 25% false positive rate, and specificity was 75% in this case. USG could demonstrate stones in the common bile duct in only 12 out of 30 cases. Sensitivity of the test in diagnosing common bile duct stones was 40%, with a false negative rate of 60%. In the series, ultrasonography sensitivity was 100% in diagnosing stones in the cystic duct. USG could detect, with relatively good but lower sensitivity, the presence of chronic cholecystitis (92.3%) and worms inside the gallbladder (50%). Conclusion: Ultrasonography is the most important investigation in the diagnosis of biliary disease and a useful test for patients undergoing operative management, for planning and anticipating technical difficulties. Journal of Surgical Sciences (2018) Vol. 22 (1): 25-29
