Theoretical Analysis of Different Classifiers under Reduction Rough Data Set

2016 ◽  
Vol 3 (3) ◽  
pp. 1-20 ◽  
Author(s):  
Shamim H Ripon ◽  
Sarwar Kamal ◽  
Saddam Hossain ◽  
Nilanjan Dey

Rough sets play a vital role in overcoming complexity, vagueness, uncertainty, imprecision, and incomplete data during feature analysis. Classification is tested on datasets that maintain an exact class and review process, where key attributes decide the class positions. To assess efficient and automated learning, algorithms are applied to training datasets. Generally, classification is supervised learning, whereas clustering is unsupervised. Classification under mathematical models deals with rule mining and machine learning. The objective of this work is to establish a strong theoretical and manual analysis of three popular classifiers, namely K-nearest neighbor (K-NN), Naive Bayes, and the Apriori algorithm. Hybridizing these three classifiers with rough sets enables them to address larger datasets. The performance of the three classifiers has been tested in the absence and presence of rough sets. This work is in the implementation phase for DNA (deoxyribonucleic acid) datasets, and it will design an automated system to assess classifiers in a machine learning environment.
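
The rough-set idea referred to in the abstract above can be illustrated with a minimal sketch of lower and upper approximations. This is not the authors' implementation: the decision table, the attribute names (`gc`, `len`, `cls`), and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def indiscernibility_classes(rows, attributes):
    """Group row indices that agree on every attribute in `attributes`."""
    groups = defaultdict(set)
    for idx, row in enumerate(rows):
        key = tuple(row[a] for a in attributes)
        groups[key].add(idx)
    return list(groups.values())


def approximations(rows, attributes, target):
    """Return the (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(rows, attributes):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical toy decision table (not taken from the paper).
rows = [
    {"gc": "high", "len": "short", "cls": "A"},
    {"gc": "high", "len": "short", "cls": "B"},
    {"gc": "low", "len": "short", "cls": "A"},
    {"gc": "low", "len": "long", "cls": "B"},
]
target = {i for i, r in enumerate(rows) if r["cls"] == "A"}
lower, upper = approximations(rows, ["gc", "len"], target)
```

Rows 0 and 1 are indistinguishable on the chosen attributes but carry different classes, so they fall in the upper approximation only; the difference between the two approximations is the boundary region that rough sets use to model uncertainty.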

Author(s):  
M. Jeyanthi ◽  
C. Velayutham

In science and technology development, BCI plays a vital role in research. Classification is a data mining technique used to predict group membership for data instances. Analysis of BCI data is challenging because feature extraction and classification of these data are more difficult than those applied to raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, and binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on a BCI dataset using different techniques such as Naïve Bayes, the k-nearest neighbor classifier, and the SVM classifier. Finally, we propose the SVM classification algorithm for the BCI dataset.
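
The normalization and binning steps described above could look roughly like the following sketch. The abstract does not say which normalization or binning scheme was used, so min-max scaling and equal-width bins are assumptions here, as are the function names and the sample values.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant feature maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Map values in [0, 1] to bin indices 0 .. n_bins - 1."""
    # The value 1.0 would land in bin n_bins, so clamp it to the last bin.
    return [min(int(v * n_bins), n_bins - 1) for v in values]


# Hypothetical feature column (not real EEG data).
feature = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(feature)
bins = equal_width_bins(normalized, 4)
```

Binning replaces each continuous value with a coarse category, which is one way small fluctuations (noise) in a feature can be smoothed out before classification.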


Diagnostics ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 104 ◽  
Author(s):  
Ahmed ◽  
Yigit ◽  
Isik ◽  
Alpkocak

Leukemia is a fatal cancer and has two main types: acute and chronic. Each type has two more subtypes: lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which require a large training data set. Therefore, we also investigated the effects of data augmentation for synthetically increasing the number of training samples. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. In addition, we explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results of the experiments showed that our CNN model achieves 88.25% and 81.74% accuracy in leukemia versus healthy and multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model performs better than other well-known machine learning algorithms.
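
The 5-fold cross-validation mentioned above can be sketched as an index split. This is only an illustration of the general technique: the function name is hypothetical, the folds are contiguous and unshuffled for simplicity, and nothing here reflects how the authors partitioned their images.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Each sample is used for testing exactly once across the five folds.
folds = list(k_fold_indices(10, 5))
```

In practice the sample order is usually shuffled (and often stratified by class) before splitting, and the reported accuracy is the average over the k held-out folds.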


2019 ◽  
Vol 8 (4) ◽  
pp. 9155-9158

Classification is a machine learning task that consists of predicting the class membership of unclassified examples, whose labels are not known, from the properties of examples in a representation learned earlier from training examples whose labels were known. Classification tasks span a huge assortment of domains and real-world purposes: disciplines such as medical diagnosis, bioinformatics, financial engineering, and image recognition, among others, where domain experts can use the learned model to support their decisions. All the classification approaches proposed in this paper were evaluated in an appropriate experimental framework in the R programming language, with the major emphasis on the k-nearest neighbor method, support vector machines, and decision trees over a large number of data sets with varied dimensionality, comparing their performance against other state-of-the-art methods. The experimental results obtained have been verified by statistical tests, which support the better performance of the methods. In this paper we survey various classification techniques of data mining and then compare them using diverse datasets from the "University of California, Irvine (UCI) Machine Learning Repository" to obtain accurate calculations on the Iris data set.
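
As a rough illustration of the k-nearest neighbor method emphasized above, the sketch below classifies a query point by majority vote among its closest training points. The paper itself used R; Python is used here only for illustration, and the function name, the Euclidean distance, and the toy points (which are not Iris measurements) are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Sort training points by Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-cluster toy data.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(train_X, train_y, (1.1, 1.0))
near_b = knn_predict(train_X, train_y, (5.0, 5.0))
```

The method has no training phase beyond storing the data, so its cost is paid at prediction time; this is one reason its behavior on large, high-dimensional data sets is worth comparing against other classifiers.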


Author(s):  
Mahziyar Darvishi ◽  
Omid Ziaee ◽  
Arash Rahmati ◽  
Mohammad Silani

Numerous structure geometries are available for cellular structures, and selecting the suitable structure that reflects the intended characteristics is cumbersome. While testing many specimens for determining the mechanical properties of these materials could be time-consuming and expensive, finite element analysis (FEA) is considered an efficient alternative. In this study, we present a method to find the suitable geometry for the intended mechanical characteristics by implementing machine learning (ML) algorithms on FEA results of cellular structures. Different cellular structures of a given material are analyzed by FEA, and the results are validated with their corresponding analytical equations. The validated results are employed to create a data set used in the ML algorithms. Finally, by comparing the results with the correct answers, the most accurate algorithm is identified for the intended application. In our case study, the cellular structures are three widely used cellular structures as bone implants: Cube, Kelvin, and Rhombic dodecahedron, made of Ti–6Al–4V. The ML algorithms are simple Bayesian classification, K-nearest neighbor, XGBoost, random forest, and artificial neural network. By comparing the results of these algorithms, the best-performing algorithm is identified.


Author(s):  
Stephan M. Winkler ◽  
Gabriel Kronberger ◽  
Michael Affenzeller ◽  
Herbert Stekel

In this paper the authors describe the identification of variable interaction networks based on the analysis of medical data. The main goal is to generate mathematical models for medical parameters using other available parameters in this data set. For each variable the authors identify those features that are most relevant for modeling it; the relevance of a variable can in this context be defined via the frequency of its occurrence in models identified by evolutionary machine learning methods or via the decrease in modeling quality after removing it from the data set. Several data based modeling approaches implemented in HeuristicLab have been applied for identifying estimators for selected continuous as well as discrete medical variables and cancer diagnoses: Genetic programming, linear regression, k-nearest-neighbor regression, support vector machines (optimized using evolutionary algorithms), and random forests. In the empirical section of this paper the authors describe interaction networks identified for a medical data base storing data of more than 600 patients. The authors see that whatever modeling approach is used, it is possible to identify the most important influence factors and display those in interaction networks which can be interpreted without domain knowledge in machine learning or informatics in general.
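
The first relevance measure described above, the frequency with which a variable occurs in identified models, can be sketched as follows. This is a simplified illustration and not HeuristicLab code: each model is represented only by the set of variable names it uses, and the names `x1`, `x2`, `x3` are hypothetical.

```python
from collections import Counter


def variable_relevance(models):
    """Return, for each variable, the fraction of models in which it appears."""
    counts = Counter(var for model in models for var in set(model))
    return {var: count / len(models) for var, count in counts.items()}


# Hypothetical models, each given as the set of input variables it uses.
models = [{"x1", "x2"}, {"x1", "x3"}, {"x1"}, {"x2", "x3"}]
relevance = variable_relevance(models)
```

Variables with a high fraction would be drawn as strong influences on the target in an interaction network; the alternative measure in the abstract (loss of model quality after removing a variable) requires retraining and is not shown here.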


2020 ◽  
Vol 2020 ◽  
pp. 1-6 ◽  
Author(s):  
Luo GuangJun ◽  
Shah Nazir ◽  
Habib Ullah Khan ◽  
Amin Ul Haq

Spam detection is a major issue in mobile message communication because spam makes such communication insecure. To tackle this problem, an accurate and precise method is needed to detect spam in mobile message communication. We propose applying a machine learning-based spam detection method for accurate detection. In this technique, machine learning classifiers such as logistic regression (LR), K-nearest neighbor (K-NN), and decision tree (DT) are used to classify ham and spam messages in mobile device communication. The SMS spam collection data set is used for testing the method. The dataset is split into two parts, one for training and one for testing. The experimental results demonstrate that the classification performance of LR is higher than that of K-NN and DT, with LR achieving a high accuracy of 99%. Additionally, the proposed method performs well compared with existing state-of-the-art methods.


10.2196/28856 ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. e28856
Author(s):  
Zahid Ullah ◽  
Farrukh Saleem ◽  
Mona Jamjoom ◽  
Bahjat Fakieh

Background: The use of artificial intelligence has revolutionized every area of life such as business and trade, social and electronic media, education and learning, manufacturing industries, medicine and sciences, and every other sector. The new reforms and advanced technologies of artificial intelligence have enabled data analysts to transmute raw data generated by these sectors into meaningful insights for an effective decision-making process. Health care is one of the integral sectors where a large amount of data is generated daily, and making effective decisions based on these data is therefore a challenge. In this study, cases related to childbirth either by the traditional method of vaginal delivery or cesarean delivery were investigated. Cesarean delivery is performed to save both the mother and the fetus when complications related to vaginal birth arise.
Objective: The aim of this study was to develop reliable prediction models for a maternity care decision support system to predict the mode of delivery before childbirth.
Methods: This study was conducted in 2 parts for identifying the mode of childbirth: first, the existing data set was enriched and second, previous medical records about the mode of delivery were investigated using machine learning algorithms and by extracting meaningful insights from unseen cases. Several prediction models were trained to achieve this objective, such as decision tree, random forest, AdaBoostM1, bagging, and k-nearest neighbor, based on original and enriched data sets.
Results: The prediction models based on enriched data performed well in terms of accuracy, sensitivity, specificity, F-measure, and receiver operating characteristic curves in the outcomes. Specifically, the accuracy of k-nearest neighbor was 84.38%, that of bagging was 83.75%, that of random forest was 83.13%, that of decision tree was 81.25%, and that of AdaBoostM1 was 80.63%. Enrichment of the data set had a good impact on improving the accuracy of the prediction process, which supports maternity care practitioners in making decisions in critical cases.
Conclusions: Our study shows that enriching the data set improves the accuracy of the prediction process, thereby supporting maternity care practitioners in making informed decisions in critical cases. The enriched data set used in this study yields good results, but this data set can become even better if the records are increased with real clinical data.
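
The evaluation measures named in the abstract above (accuracy, sensitivity, specificity, F-measure) follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name is hypothetical and the counts are invented for illustration, not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Hypothetical counts; they do not correspond to any result in the study.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

The sketch does not guard against empty rows or columns of the confusion matrix, which would cause a division by zero. Reporting sensitivity and specificity alongside accuracy matters when one delivery mode is much more common than the other, since accuracy alone can hide poor performance on the rarer class.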


Author(s):  
Nofriani Nofriani

Various approaches have been attempted by the Government of Indonesia to eradicate poverty throughout the country, one of which is the equitable distribution of social assistance to target households according to their classification of social welfare status. This research aims to re-evaluate the prior evaluation of five well-known machine learning techniques (Naïve Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbor, and the C4.5 algorithm) on how well they predict the classifications of social welfare statuses. Afterwards, the best-performing one is implemented in an executable machine learning application that can predict the user's social welfare status. Other objectives are to analyze the reliability of the chosen algorithm in predicting new data sets and to generate a simple classification-prediction application. This research uses the Python programming language, the Scikit-Learn library, Jupyter Notebook, and PyInstaller to perform all the methodology processes. The results show that the Random Forest algorithm is the best machine learning technique for predicting a household's social welfare status, with a classification accuracy of 74.20%, and the resulting application based on it correctly predicted the social welfare status for 60.00% of 40 user entries.


Classification is a data mining (machine learning) approach that helps predict group membership for data instances: the computer program learns from the input data and then uses this learning to classify new observations. The data set may be bi-class or multi-class. A few examples of classification problems include speech identification, handwriting identification, biometric detection, and document classification. Many classification methods exist. This research work reviews the fundamental classification approaches and a few important kinds of classifiers, including decision tree induction, Bayesian networks, the k-nearest neighbor classifier, Support Vector Machines (SVM), and fuzzy learning classifiers, along with their merits, drawbacks, probable applications, and the challenges faced together with the available solutions. Different problems affect classification and prediction. The objective of this research work is to provide an extensive review of various classification approaches in machine learning. Finally, intended future work on the best classification techniques for the input data is discussed.


2021 ◽  
Vol 880 ◽  
pp. 89-94
Author(s):  
Hasan Kurban ◽  
Mustafa Kurban ◽  
Parichit Sharma ◽  
Mehmet M. Dalkilic

Machine learning (ML) has recently made a major contribution to the field of Material Science (MS). In this study, ML algorithms are used to learn atom types from structural geometrical data of anatase TiO2 nanoparticles produced at different temperature levels with the density-functional tight-binding method (DFTB). For this work, Random Forest (RF), Decision Trees (DT), K-Nearest Neighbor (KNN), and Naïve Bayes (NB), which are among the most popular ML algorithms, were run to learn titanium (Ti) and oxygen (O) atoms. RF outperforms the other algorithms, learning this skewed data set almost perfectly. Using ML algorithms with datasets compatible with their mathematical design increases their learning performance. Therefore, we find it remarkable that a certain type of ML algorithm performs almost perfectly, because it can help material scientists predict the behavior and the structural and electronic properties of atoms at different temperatures.

