Classification Algorithms for Determining Handwritten Digit

Data-intensive science is a critical science paradigm that intersects with all other sciences. Data mining (DM) is a powerful and useful technology with a wide range of potential users; it focuses on important, meaningful patterns and discovers new knowledge from a collected dataset. Any predictive task in DM uses some attributes to classify an unknown class. Classification algorithms are a prominent class of mathematical techniques in DM, and constructing a model is their core aspect. However, their performance depends highly on how the algorithm behaves when manipulating data. Focusing on binarization as a preprocessing approach, this paper analyzes and evaluates different classification algorithms based on their accuracy in the classification task when constructing a model. The Modified National Institute of Standards and Technology (MNIST) handwritten digits dataset provided by Yann LeCun was used in the evaluation. The paper focuses on machine learning approaches for handwritten digit detection, using classification methods such as K-Nearest Neighbor (KNN), Decision Tree (DT), and Neural Networks (NN). Results showed that the knowledge-based method, i.e., the NN algorithm, is more accurate in determining the digits, as it reduces the error rate. This evaluation provides essential insights for computer scientists and practitioners choosing a DM technique that fits their data.
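
The binarization-plus-classification idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the threshold of 128, the Hamming distance, the function names, and the toy 3x3 "images" are all assumptions made for illustration, and the data is made up rather than taken from MNIST.

```python
from collections import Counter

def binarize(image, threshold=128):
    """Map each grayscale pixel (0-255) to 1 if it reaches the threshold, else 0."""
    return [1 if pixel >= threshold else 0 for pixel in image]

def hamming(a, b):
    """Count the positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    """Label the query by majority vote among its k nearest training vectors."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 3x3 images flattened row by row (hypothetical data, not MNIST).
raw_train = [
    ([0, 200, 0, 0, 210, 0, 0, 220, 0], 1),            # vertical stroke
    ([10, 255, 5, 0, 240, 0, 20, 230, 0], 1),          # vertical stroke
    ([250, 240, 230, 220, 0, 210, 200, 250, 240], 0),  # ring
    ([255, 255, 255, 255, 10, 255, 255, 255, 255], 0), # ring
]
train = [(binarize(img), label) for img, label in raw_train]

query = binarize([0, 180, 30, 0, 190, 0, 0, 170, 40])
prediction = knn_predict(train, query, k=3)
print(prediction)
```

After binarization the noisy query matches the two stroke examples exactly, so small intensity differences no longer affect the distance; that is the effect the preprocessing step is meant to have.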

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets, 2019, Vol 20 (5), pp. 488-500; DOI 10.2174/1389450119666180809122244; Cited By ~ 6

Author(s): Yan Hu, Yi Lu, Shuo Wang, Mengying Zhang, Xiaosheng Qu, ...

Keyword(s): Machine Learning, Drug Design, Anticancer Drugs, Nearest Neighbor, Cost Effective, Support Vector, Learning Approaches, K Nearest Neighbor, Activity Prediction, Linear Discriminant

Background: Globally, the numbers of cancer patients and cancer deaths continue to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: To study the application of machine learning in predicting anticancer drug activity, this review selects several machine learning approaches, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), and lists examples of their applications in anticancer drug design. Results: Machine learning contributes a great deal to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many successful examples of identification and prediction in anticancer drug activity prediction are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

High-Speed and Accurate Meat Composition Imaging by Mechanically-Flexible Electrical Impedance Tomography With k-Nearest Neighbor and Fuzzy k-Means Machine Learning Approaches

IEEE Access, 2021, Vol 9, pp. 38792-38801; DOI 10.1109/access.2021.3064315

Author(s): P. N. Darma, M. Takei

Keyword(s): Machine Learning, Electrical Impedance Tomography, High Speed, Electrical Impedance, Nearest Neighbor, Learning Approaches, K Nearest Neighbor, Impedance Tomography, Meat Composition

PREDICTION OF CORONARY ARTERY DISEASE BASED ON ENSEMBLE LEARNING APPROACHES AND CO-EXPRESSED OBSERVATIONS

Journal of Mechanics in Medicine and Biology, 2016, Vol 16 (01), pp. 1640010; DOI 10.1142/s0219519416400108; Cited By ~ 3

Author(s): Ying-Tsang Lo, Hamido Fujita, Tun-Wen Pai

Keyword(s): Machine Learning, Coronary Artery Disease, Coronary Artery, Nearest Neighbor, Prediction Method, Medical Decision, Learning Approaches, K Nearest Neighbor, Artery Disease, Voting Mechanism

Background: Coronary artery disease (CAD) is one of the most representative cardiovascular diseases. Early and accurate prediction of CAD based on physiological measurements can reduce the risk of heart attack through medication, a healthy diet, and regular physical activity. Methods: Four heart disease datasets from the UC Irvine Machine Learning Repository were combined and re-examined to remove incomplete entries, and a total of 822 cases were utilized in this study. Seven machine learning methods, including Naïve Bayes, artificial neural networks (ANNs), sequential minimal optimization (SMO), k-nearest neighbor (KNN), AdaBoost, J48, and random forest, were adopted to analyze the collected datasets for CAD prediction. By combining co-expressed observations and an ensemble voting mechanism, we designed and evaluated a new medical decision classifier for CAD prediction. The TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) algorithm was applied to determine the best prediction method for CAD diagnosis. Results: Systolic blood pressure, cholesterol, heart rate, and ST depression are considered to be the features that differ most significantly between patients with and without CAD. We show that the prediction capability of the seven machine learning classifiers can be enhanced by integrating combinations of observed co-expressed features. Finally, compared to the use of any single classifier, the proposed voting mechanism achieved optimal performance according to TOPSIS.
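
As a rough illustration of an ensemble voting mechanism, the sketch below combines the outputs of several base classifiers by plain majority vote. It covers only the voting step: the abstract does not specify the voting rule or how co-expressed observations are integrated, so the function names and the three prediction lists are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(labels).most_common(1)[0][0]

def ensemble_predict(per_classifier_predictions):
    """Combine row-aligned predictions from several classifiers, case by case."""
    return [majority_vote(case) for case in zip(*per_classifier_predictions)]

# Hypothetical outputs of three base classifiers on five cases
# (1 = CAD predicted, 0 = no CAD predicted); not data from the study.
clf_a = [1, 0, 1, 1, 0]
clf_b = [1, 1, 0, 1, 0]
clf_c = [0, 0, 1, 1, 1]

combined = ensemble_predict([clf_a, clf_b, clf_c])
print(combined)
```

Using an odd number of binary classifiers, as here, avoids tied votes; a weighted vote would be a natural variation when some base classifiers are known to be more reliable.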

Machine Learning Approach to Differentiation of Peripheral Schwannomas and Neurofibromas: A Multi-Center Study

Neuro-Oncology, 2021; DOI 10.1093/neuonc/noab211

Author(s): Michael Zhang, Elizabeth Tong, Sam Wong, Forrest Hamrick, Maryam Mohammadzadeh, ...

Keyword(s): Machine Learning, Logistic Regression, Clinical Features, Nearest Neighbor, Motor Deficit, Support Vector, Spontaneous Pain, Learning Approaches, Imaging Data, K Nearest Neighbor

Background: Non-invasive differentiation between schwannomas and neurofibromas is important for appropriate management, preoperative counseling, and surgical planning, but has proven difficult using conventional imaging. The objective of this study was to develop and evaluate machine learning approaches for differentiating peripheral schwannomas from neurofibromas. Methods: We assembled a cohort of schwannomas and neurofibromas from 3 independent institutions and extracted high-dimensional radiomic features from gadolinium-enhanced, T1-weighted MRI using the PyRadiomics package on Quantitative Imaging Feature Pipeline. Age, sex, neurogenetic syndrome, spontaneous pain, and motor deficit were recorded. We evaluated the performance of 6 radiomics-based classifier models with and without clinical features and compared model performance against human expert evaluators. Results: 107 schwannomas and 59 neurofibromas were included. The primary models included both clinical and imaging data. The accuracy of the human evaluators (0.765) did not significantly exceed the no-information rate (NIR), whereas the Support Vector Machine (0.929), Logistic Regression (0.929), and Random Forest (0.905) classifiers exceeded the NIR. Using the method of DeLong, the AUC for the Logistic Regression (AUC=0.923) and K Nearest Neighbor (AUC=0.923) classifiers was significantly greater than that of the human evaluators (AUC=0.766; p = 0.041). Conclusions: The radiomics-based classifiers developed here proved to be more accurate and had a higher AUC on the ROC curve than expert human evaluators. This demonstrates that radiomics using routine MRI sequences and clinical features can aid in the differentiation of peripheral schwannomas and neurofibromas.

A data-driven methodology for the classification of different liquids in artificial taste recognition applications with a pulse voltammetric electronic tongue

International Journal of Distributed Sensor Networks, 2019, Vol 15 (10), pp. 155014771988160; DOI 10.1177/1550147719881601; Cited By ~ 6

Author(s): Jersson X Leon-Medina, Leydi J Cardenas-Flechas, Diego A Tibaduiza

Keyword(s): Machine Learning, Pattern Recognition, Data Analysis, Nearest Neighbor, Electronic Tongue, Sensor Arrays, Principal Component, Machine Learning Techniques, Learning Approaches, K Nearest Neighbor

Electronic tongue-type sensor arrays are devices used to determine the quality of substances; they seek to imitate the main components of the human sense of taste. For this purpose, an electronic tongue-based system makes use of sensors, data acquisition systems, and a pattern recognition system. In the latter in particular, machine learning techniques are useful for data analysis and have been used to solve classification and regression problems. However, one of the problems in using this kind of device lies in developing reliable pattern recognition algorithms and robust data analysis. This work therefore introduces a taste recognition methodology composed of several steps: unfolding the data, normalizing the data, compressing the data with principal component analysis, and classifying with different machine learning models. The proposed methodology is tested using data from an electronic tongue with 13 different liquid substances; this electronic tongue uses multifrequency large amplitude pulse signal voltammetry. Results show that the methodology performs the classification accurately, and that the best accuracy is obtained with the K-nearest neighbor classifier compared with the other machine learning approaches. In addition, the methodology is evaluated with different classification performance measures, each of which summarizes the behavior of the process in a single number.
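
The first two steps of such a pipeline, unfolding and normalization, can be sketched as follows. This is an assumption-laden illustration rather than the authors' procedure: it assumes each sample is a sensors-by-time matrix, uses column-wise z-score scaling as the normalization (the abstract does not name the scheme), and works on made-up numbers. The function names are hypothetical.

```python
import statistics

def unfold(sample):
    """Flatten one sample's sensors-by-time matrix into a single feature row."""
    return [value for sensor_signal in sample for value in sensor_signal]

def standardize_columns(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead to avoid an error.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Two hypothetical samples, each with 2 sensors and 2 time points.
samples = [
    [[1.0, 2.0], [10.0, 20.0]],
    [[3.0, 4.0], [30.0, 40.0]],
]

unfolded = [unfold(sample) for sample in samples]
scaled = standardize_columns(unfolded)
print(unfolded)
print(scaled)
```

The scaled rows would then be passed to a dimensionality reduction step such as principal component analysis and on to a classifier; those stages are left out to keep the sketch short.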

A Survey On Missing Data in Machine Learning

2021; DOI 10.21203/rs.3.rs-535520/v1

Author(s): Tlamelo Emmanuel, Thabiso Maupong, Dimane Mpoeleng, Thabo Semong, Mphago Banyatsang, ...

Keyword(s): Machine Learning, Missing Data, Human Error, Missing Values, Nearest Neighbor, Research Direction, Machine Learning Techniques, Future Research, Learning Approaches, K Nearest Neighbor

Abstract: Machine learning has been the cornerstone of analysing and extracting information from data, and the problem of missing values is often encountered. Missing values arise under different mechanisms: missing completely at random, missing at random, or missing not at random. All of these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. Several proposals for handling missing values exist in the literature. In this paper we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of the proposed techniques, how they perform, their limitations, and the kind of data they are most suitable for. Finally, we experiment with the K nearest neighbor and random forest imputation techniques on novel power plant induced fan data and offer some possible future research directions.

Diabetes Prediction Using Machine Learning Techniques

Journal of Intelligent Systems with Applications, 2021, pp. 150-152; DOI 10.54856/10.54856/jiswa.202112183

Author(s): Seyma Kiziltas Koc, Mustafa Yeniad

Keyword(s): Machine Learning, Support Vector Machine, High Performance, Nearest Neighbor, Classification Performance, Machine Learning Techniques, Support Vector, Classification Algorithms, K Nearest Neighbor, Machine Learning Classification

Technologies used in the healthcare industry are changing rapidly because technology is constantly evolving to improve people's lifestyles. For instance, different technological devices are used for the diagnosis and treatment of diseases. With developing technology, it has been shown that diseases can be diagnosed by computer systems. Machine learning algorithms are frequently used tools because of their high performance in healthcare as well as in many other fields. The aim of this study is to investigate different machine learning classification algorithms that can be used in the diagnosis of diabetes and to make comparative analyses according to the metrics in the literature. Seven classification algorithms from the literature were used in the study: Logistic Regression, K-Nearest Neighbor, Multilayer Perceptron, Random Forest, Decision Trees, Support Vector Machine, and Naive Bayes. The classification performance of the algorithms is compared based on accuracy, sensitivity, precision, and F1-score. The results showed that the support vector machine algorithm had the highest accuracy, at 78.65%.
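
The four comparison metrics named in this abstract follow directly from the counts of true and false positives and negatives. The sketch below shows the standard definitions for a binary task; the function names and label lists are invented for illustration and do not reproduce the study's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity (recall), precision, and F1-score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}

# Hypothetical labels (1 = diabetic, 0 = non-diabetic).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

scores = classification_metrics(y_true, y_pred)
print(scores)
```

Reporting sensitivity and precision alongside accuracy matters on imbalanced medical data, where a classifier can reach high accuracy while missing most positive cases.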

Investigating Machine Learning Approaches for Bitcoin Ransomware Payment Detection Systems

International Journal of Innovative Science and Research Technology, 2020, Vol 5 (9), pp. 1216-1222; DOI 10.38124/ijisrt20sep784

Author(s): Kirat Jadhav

Keyword(s): Machine Learning, Detection Rate, Nearest Neighbor, Gradient Boosting, Learning Approaches, K Nearest Neighbor, Detection Systems, Digital World, F Measure, Machine Learning Models

Cryptocurrencies have revolutionized the process of trading in the digital world. Roughly a decade after the introduction of the first bitcoin block, thousands of cryptocurrencies have been introduced. The anonymity offered by cryptocurrencies has also attracted the perpetrators of cybercrime. This paper examines different machine learning approaches for efficiently identifying ransomware payments made to the operators through bitcoin transactions. Machine learning models may be developed based on patterns that differentiate such cybercrime operations from normal bitcoin transactions, in order to identify and report attacks. The machine learning approaches are evaluated on a bitcoin ransomware dataset. Experimental results show that the Gradient Boosting and XGBoost algorithms achieved a better detection rate with respect to precision, recall, and F-measure than the k-Nearest Neighbor, Random Forest, Naïve Bayes, and Multilayer Perceptron approaches.

Performance Evaluation of Different Machine Learning Classification Algorithms for Disease Diagnosis

International Journal of E-Health and Medical Communications, 2021, Vol 12 (6), pp. 1-28; DOI 10.4018/ijehmc.20211101.oa5

Author(s): Munder Abdulatef Al-Hashem, Ali Mohammad Alqudah, Qasem Qananwah

Keyword(s): Machine Learning, Nearest Neighbor, Performance Metrics, Confusion Matrix, Learning Algorithms, Disease Diagnosis, Machine Learning Algorithms, Classification Algorithms, K Nearest Neighbor, Machine Learning Classification

Knowledge extraction in the healthcare field is a very challenging task because of problems such as noise and imbalanced datasets, which are obtained from clinical studies where uncertainty and variability are common. Lately, a large number of machine learning algorithms have been considered and evaluated to check their validity for use in the medical field. Usually, the classification algorithms are compared against medical experts who specialize in diagnosing certain diseases, and performance metrics provide an effective methodological evaluation of the classifiers. The performance metrics include accuracy, sensitivity, and specificity, derived from the confusion matrix of each algorithm. We utilized eight well-known machine learning algorithms and evaluated their performance on six different medical datasets. Based on the experimental results, we conclude that the XGBoost and K-Nearest Neighbor classifiers were the best overall across the datasets used and can be used for diagnosing various diseases.

A survey on missing data in machine learning

Journal of Big Data, 2021, Vol 8 (1); DOI 10.1186/s40537-021-00516-9

Author(s): Tlamelo Emmanuel, Thabiso Maupong, Dimane Mpoeleng, Thabo Semong, Banyatsang Mphago, ...

Keyword(s): Machine Learning, Missing Data, Human Error, Missing Values, Nearest Neighbor, Research Direction, Machine Learning Techniques, Future Research, Learning Approaches, K Nearest Neighbor

Abstract: Machine learning has been the cornerstone of analysing and extracting information from data, and the problem of missing values is often encountered. Missing values arise under different mechanisms: missing completely at random, missing at random, or missing not at random. All of these may result from system malfunction during data collection or human error during data pre-processing. Nevertheless, it is important to deal with missing values before analysing data, since ignoring or omitting them may result in biased or misinformed analysis. Several proposals for handling missing values exist in the literature. In this paper, we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of missing value imputation techniques, how they perform, their limitations, and the kind of data they are most suitable for. We propose and evaluate two methods: the k nearest neighbor and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris data and on novel power plant fan data, with induced missing values at missingness rates of 5% to 20%. We show that both missForest and the k nearest neighbor can successfully handle missing values, and we offer some possible future research directions.
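
To make the k nearest neighbor imputation idea concrete, here is a simplified sketch. It is not the authors' implementation: it assumes missing entries are marked with None, uses only fully observed rows as donors, measures Euclidean distance over the features present in the incomplete row, and fills each gap with the donors' mean. The function names and the data are hypothetical.

```python
import math

def distance(row, donor):
    """Euclidean distance over the features that are observed in `row`."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row, donor) if a is not None))

def knn_impute(rows, k=2):
    """Fill None entries with the column mean of the k nearest complete rows.

    Assumes at least one row has no missing values.
    """
    complete = [row for row in rows if None not in row]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        donors = sorted(complete, key=lambda d: distance(row, d))[:k]
        filled.append([
            value if value is not None else sum(d[j] for d in donors) / len(donors)
            for j, value in enumerate(row)
        ])
    return filled

# Made-up measurements; the last row has one missing value.
rows = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [8.0, 9.0, 10.0],
    [1.05, None, 3.1],
]

result = knn_impute(rows, k=2)
print(result[3])
```

The two rows closest to the incomplete one supply the fill value, so the distant third row has no influence. An evaluation like the one described would mask known values at a chosen rate, impute them, and compare the imputed values against the originals.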
