Prediction of Skin Diseases Using Machine Learning

Skin disease rates have been increasing over the past few decades. These diseases have led to both fatal and non-fatal disabilities around the world, especially in areas where medical resources are inadequate. Early diagnosis of skin diseases significantly increases the chances of cure. Therefore, this work compares six machine learning algorithms, namely KNN, random forest, neural network, naïve Bayes, logistic regression, and SVM, for the prediction of skin diseases. Information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. This work comprises the introduction, literature review, and proposed methodology parts. In this research paper, a new method of analyzing skin disease is proposed in which the six data mining techniques are integrated into a single ensemble method. Applied to the dermatology dataset, the ensemble method achieves 94% accuracy, an improvement over the other classifier algorithms, and hence is more effective in this area.
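
The abstract does not say how the six classifiers are combined, so the following is only a minimal sketch of one common option, majority voting. The function name `majority_vote`, the class labels, and the per-classifier outputs are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of lists, one inner list per classifier, all of the
    same length (one label per sample). Ties go to the label that was
    seen first among the tied ones.
    """
    combined = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        # most_common sorts by count; equal counts keep first-seen order
        combined.append(counts.most_common(1)[0][0])
    return combined


# Hypothetical outputs of three base classifiers on four samples
knn = ["psoriasis", "eczema", "eczema", "lichen"]
forest = ["psoriasis", "psoriasis", "eczema", "lichen"]
svm = ["eczema", "psoriasis", "eczema", "psoriasis"]

ensemble = majority_vote([knn, forest, svm])
print(ensemble)
```

Each sample receives the label that most base classifiers agree on. A weighted vote or a stacked meta-classifier would be equally plausible readings of "ensemble" here.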


Heart disease prediction using Advanced Machine Learning Algorithms

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35495 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 2160-2163

Author(s):

Minal Shahakar

Keyword(s):

Machine Learning ◽

Data Mining ◽

Heart Disease ◽

Web Application ◽

Intelligent System ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Disease Prediction ◽

End User ◽

Data Mining Techniques

It may often happen that you or someone close to you needs a doctor's help immediately, but no doctor is available for some reason. The Heart Disease Prediction application is an online end-user support system. Here, we propose a web application that allows users to get instant guidance on their heart disease through an intelligent online system. The application is fed with various details and the heart disease associated with those details. The application allows users to share their heart-related issues. It then processes user-specific details to check for various illnesses that could be associated with them. Here we use intelligent data mining techniques to find the most accurate illness that could be associated with the patient's details. Based on the result, the system automatically shows result-specific doctors for further treatment and allows the user to view each doctor's details.


Making Use of Functional Dependencies Based on Data to Find Better Classification Trees

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2021.15.160 ◽

2021 ◽

Vol 15 ◽

pp. 1475-1485

Author(s):

Hyontai Sug

Keyword(s):

Machine Learning ◽

Data Mining ◽

Decision Trees ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Functional Dependencies ◽

Chi Square ◽

Chi Square Test ◽

Novel Method ◽

Categorical Attributes

For the classification task of machine learning algorithms, independence between conditional attributes is a precondition for the success of data mining. Decision trees, meanwhile, are among the most widely used machine learning algorithms because they are easy to understand. Because dependency between conditional attributes can produce more complex trees, it is very important to supply conditional attributes that are independent of each other: decision trees, like other machine learning algorithms, require that conditional attributes be independent of each other and dependent on decisional attributes only. The statistical method for checking independence between attributes is the Chi-square test, but the test is effective for categorical attributes only. The applicability of the Chi-square test is therefore limited, because most datasets for data mining have a mix of categorical and numerical attributes. To overcome this problem, a novel method for testing dependency between conditional attributes is suggested; it is based on functional dependencies derived from the data and can be applied to any dataset irrespective of the data types of its attributes. After removing highly dependent conditional attributes, we can generate better decision trees. Experiments were performed to show that the method is effective, and they showed very good results.
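
To make the two notions in this abstract concrete, here is a small sketch with hypothetical function names and toy data. It computes the Pearson chi-square statistic for two categorical attributes and performs a plain functional dependency check (every value of one attribute maps to exactly one value of the other). The dependency check is the textbook definition, not necessarily the measure used in the paper.

```python
from collections import Counter


def chi_square_statistic(xs, ys):
    """Return (statistic, degrees_of_freedom) for two categorical columns."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for x, x_n in x_counts.items():
        for y, y_n in y_counts.items():
            expected = x_n * y_n / n  # cell count expected under independence
            stat += (observed[(x, y)] - expected) ** 2 / expected
    dof = (len(x_counts) - 1) * (len(y_counts) - 1)
    return stat, dof


def functionally_determines(xs, ys):
    """True if each value in xs always co-occurs with one single value in ys."""
    mapping = {}
    for x, y in zip(xs, ys):
        if mapping.setdefault(x, y) != y:
            return False
    return True


# Toy columns: the first pair is fully dependent, the second is independent
print(chi_square_statistic(["a", "a", "b", "b"], ["p", "p", "q", "q"]))
print(chi_square_statistic(["a", "a", "b", "b"], ["p", "q", "p", "q"]))
# The dependency check works on numerical values as well
print(functionally_determines([1.5, 1.5, 2.0], [10, 10, 20]))
```

The chi-square statistic needs discrete categories to build its contingency table, whereas the dependency check only compares values for equality, which is why it also applies to numerical columns.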


CLASSIFICATION OF HEAD AND NECK CANCER TYPES USING MACHINE LEARNING ALGORITHM

EPRA International Journal of Research & Development (IJRD) ◽

10.36713/epra3289 ◽

2020 ◽

pp. 198-205

Author(s):

Prof O. Olabode ◽

Prof A. O. Adetunmbi ◽

Folake Akinbohun ◽

Dr Ambrose Akinbohun

Keyword(s):

Machine Learning ◽

Head And Neck Cancer ◽

Head And Neck ◽

Neck Cancer ◽

Naive Bayes ◽

Information Gain ◽

Learning Algorithm ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Chi Square

The worldwide incidence of head and neck cancer exceeds half a million cases annually. The morbidity and mortality of head and neck cancers, including thyroid, nasopharyngeal, sinonasal and laryngeal cancers, were reported to be high. The degree of facial disfigurement is unrivalled. Information Gain and Chi Square, together with Decision Trees and Naïve Bayes, were deployed for the study. The dataset was divided into training and test data. The results showed that Naïve Bayes outperformed Decision Trees. With the application of machine learning algorithms, head and neck cancer can be classified. KEYWORDS: Head and Neck, thyroid, Chi Square, Information Gain


An Extensive Text Mining Study for the Turkish Language

Advances in Business Information Systems and Analytics - Natural Language Processing for Global and Local Business ◽

10.4018/978-1-7998-4240-8.ch012 ◽

2021 ◽

pp. 272-306

Author(s):

Durmuş Özkan Şahin ◽

Erdal Kılıç

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Text Mining ◽

Language Processing ◽

Information Gain ◽

Learning Algorithms ◽

Feature Selection Method ◽

Machine Learning Algorithms ◽

Classification Algorithms ◽

Chi Square

In this study, the authors give both theoretical and experimental information about text mining, which is one of the natural language processing topics. Three text mining problems, namely news classification, sentiment analysis, and author recognition, are discussed for Turkish. The authors aim to reduce the running time and increase the performance of machine learning algorithms. Four machine learning algorithms and two feature selection metrics are used to solve these text classification problems. The classification algorithms are random forest (RF), logistic regression (LR), naive Bayes (NB), and sequential minimal optimization (SMO). The chi-square and information gain metrics are used as the feature selection methods. The highest classification performance achieved in this study is 0.895 according to the F-measure metric. This result is obtained by using the SMO classifier and the information gain metric for news classification. This study is important in terms of comparing the performances of classification algorithms and feature selection methods.
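
As an illustration of the information gain metric named in this abstract, the sketch below scores two binary term-presence features against document labels. The feature names, the labels, and the tiny corpus are hypothetical; the formula is the standard entropy-based definition.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on the feature's values."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical news corpus: does each document contain the term?
labels = ["sports", "sports", "politics", "politics"]
features = {
    "contains_goal": [1, 1, 0, 0],  # separates the two classes perfectly
    "contains_and": [1, 0, 1, 0],   # carries no class information
}

ranking = sorted(features, key=lambda f: information_gain(features[f], labels),
                 reverse=True)
print(ranking)
```

Keeping only the top-ranked terms shrinks the feature space, which is how a metric like this can reduce running time for the classifiers.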


Machine Learning Algorithms based Skin Disease Detection

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7686.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 4044-4049

Keyword(s):

Machine Learning ◽

Skin Disease ◽

Skin Diseases ◽

Confusion Matrix ◽

Learning Algorithms ◽

Well Being ◽

Machine Learning Algorithms ◽

Significant Advance ◽

Main Concern ◽

Data Set

Skin disease recognition and monitoring is a major challenge faced by the medical industry. Because of increasing pollution and consumption of unhealthy food, the number of patients experiencing skin-related issues is growing at a faster rate. Well-being is not the only concern; unhealthy skin also hurts our confidence. Regular and proper skin checking is a significant step towards early detection of any harmful or initial changes in the skin that may result in skin disease. Machine learning methods can contribute to the development of capable systems that can classify various classes of skin diseases. To identify skin diseases, it is first necessary to separate skin from non-skin. In this paper, five different machine learning algorithms have been chosen and run on a skin infection data set to predict the exact class of skin disease. Out of several machine learning algorithms, we have worked on random forest, naive Bayes, logistic regression, kernel SVM and CNN. A comparative analysis based on confusion matrix parameters and training accuracy has been performed and illustrated using graphs. It is found that CNN gives the best training accuracy for the correct prediction of skin diseases among all the selected algorithms.
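
The confusion-matrix comparison mentioned here can be sketched as follows. The class names and predictions are hypothetical toy values, and the helper names are illustrative rather than taken from the paper.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


classes = ["acne", "eczema"]
actual = ["acne", "acne", "eczema", "eczema", "eczema"]
predicted = ["acne", "eczema", "eczema", "eczema", "acne"]

matrix = confusion_matrix(actual, predicted, classes)
print(matrix)
print(accuracy(matrix))
```

Off-diagonal cells show which classes are confused with each other, which a single accuracy figure hides.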


Modified Associative Algorithm to Determine Frequent Pattern from Student Dataset

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d8321.049420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2449-2452

Keyword(s):

Machine Learning ◽

Data Mining ◽

Student Performance ◽

High Throughput ◽

Machine Learning Algorithms ◽

Frequent Pattern ◽

Huge Amount ◽

Student Records ◽

Data Mining Techniques ◽

Main Disadvantage

Phenomenal advances in the student domain produce a huge amount of data, such as MOOC data and high-throughput information, which makes Electronic Student Records (ESRs) expensive and complex. For the analysis of such a huge amount of data, AI and data mining techniques have been utilized along with student services. Today, data mining is utilized to detect performance using various datasets along with machine learning algorithms. Many techniques are available for diagnosing student performance, such as FP-growth, Apriori and associative algorithms. These techniques discover unknown patterns or relationships in large amounts of data, which are used for making decisions for preventive and suggestive medicine. The main disadvantage of these techniques is that they discover fewer patterns. In this paper we propose a modified associative algorithm that discovers patterns to detect performance accurately. The results will help in predicting performance more quickly and accurately, so that students can be made aware in time.
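
For orientation, this is a compact sketch of the classic Apriori algorithm that the abstract lists as a baseline. It is not the modified associative algorithm the paper proposes, whose details the abstract does not give. The student records and item names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


# Hypothetical student records expressed as sets of observed items
records = [
    {"low_attendance", "late_submission", "fail"},
    {"low_attendance", "fail"},
    {"low_attendance", "late_submission", "fail"},
    {"late_submission", "pass"},
]

frequent = apriori(records, min_support=2)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Raising `min_support` shrinks the result quickly, which illustrates the kind of "fewer patterns" trade-off the abstract refers to.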


Data mining techniques with machine learning algorithm to predict patients of heart disease

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1088/1/012035 ◽

2021 ◽

Vol 1088 (1) ◽

pp. 012035

Author(s):

Mulyawan ◽

Agus Bahtiar ◽

Githera Dwilestari ◽

Fadhil Muhammad Basysyar ◽

Nana Suarna

Keyword(s):

Machine Learning ◽

Data Mining ◽

Heart Disease ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Data Mining Techniques


Data Mining-based Financial Statement Fraud Detection: Systematic Literature Review and Meta-analysis to Estimate Data Sample Mapping of Fraudulent Companies Against Non-fraudulent Companies

Global Business Review ◽

10.1177/0972150920984857 ◽

2021 ◽

pp. 097215092098485

Author(s):

Sonika Gupta ◽

Sushil Kumar Mehta

Keyword(s):

Machine Learning ◽

Data Mining ◽

Literature Review ◽

Systematic Literature Review ◽

Classification Accuracy ◽

Meta Analysis ◽

Financial Statement ◽

Research Articles ◽

Financial Statement Fraud ◽

Data Mining Techniques

Data mining techniques have proven quite effective not only in detecting financial statement frauds but also in discovering other financial crimes, such as credit card frauds, loan and security frauds, corporate frauds, bank and insurance frauds, etc. Classification of data mining techniques, in recent years, has been accepted as one of the most credible methodologies for the detection of symptoms of financial statement frauds through scanning the published financial statements of companies. The retrieved literature that has used data mining classification techniques can be broadly categorized on the basis of the type of technique applied, as statistical techniques and machine learning techniques. The biggest challenge in executing the classification process using data mining techniques lies in collecting the data sample of fraudulent companies and mapping the sample of fraudulent companies against non-fraudulent companies. In this article, a systematic literature review (SLR) of studies from the area of financial statement fraud detection has been conducted. The review has considered research articles published between 1995 and 2020. Further, a meta-analysis has been performed to establish the effect of data sample mapping of fraudulent companies against non-fraudulent companies on the classification methods through comparing the overall classification accuracy reported in the literature. The retrieved literature indicates that a fraudulent sample can either be equally paired with non-fraudulent sample (1:1 data mapping) or be unequally mapped using 1:many ratio to increase the sample size proportionally. Based on the meta-analysis of the research articles, it can be concluded that machine learning approaches, in comparison to statistical approaches, can achieve better classification accuracy, particularly when the availability of sample data is low. High classification accuracy can be obtained with even a 1:1 mapping data set using machine learning classification approaches.
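
The difference between 1:1 and 1:many data mapping can be sketched as below. The pairing here is purely random, which is a simplifying assumption; the abstract does not say how the reviewed studies selected their matches. The function name and company identifiers are hypothetical.

```python
import random


def map_samples(fraud, non_fraud, ratio=1, seed=0):
    """Pair each fraudulent firm with `ratio` distinct non-fraudulent firms.

    ratio=1 gives a 1:1 mapping; ratio>1 gives a 1:many mapping.
    Matches are drawn at random (an assumption made for this sketch).
    """
    needed = len(fraud) * ratio
    if needed > len(non_fraud):
        raise ValueError("not enough non-fraudulent firms for this ratio")
    rng = random.Random(seed)
    chosen = rng.sample(non_fraud, needed)  # sampling without replacement
    return {f: chosen[i * ratio:(i + 1) * ratio] for i, f in enumerate(fraud)}


fraud_firms = ["F1", "F2"]
clean_firms = ["N1", "N2", "N3", "N4", "N5"]

one_to_one = map_samples(fraud_firms, clean_firms, ratio=1)
one_to_many = map_samples(fraud_firms, clean_firms, ratio=2)
print(one_to_one)
print(one_to_many)
```

With two fraudulent firms, the 1:1 mapping yields four training rows in total and the 1:2 mapping yields six, which is the proportional increase in sample size the abstract describes.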


Privacy Preservation using (L, D) Inference Model Based on Dependency Identification Information Gain

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1196.0986s319 ◽

2019 ◽

Vol 8 (6S3) ◽

pp. 1170-1173

Keyword(s):

Data Mining ◽

Information Gain ◽

Original Data ◽

Perturbation Approach ◽

Sensitive Information ◽

Functional Dependencies ◽

Inference Model ◽

Data Set ◽

Data Mining Techniques ◽

Original Dataset

With improvements in information processing and memory capacity, a vast amount of data is collected for various data analysis purposes. Data mining techniques are used to extract meaningful information. When data are extracted using data mining techniques, they can be disclosed publicly, which leads to breaches of private data. Privacy-preserving data mining is used to protect sensitive information from unwanted or unsanctioned disclosure. In this paper, we analyse the problem of discovering similarity checks for functional dependencies from a given dataset such that applying the (l, d) inference algorithm with generalization can anonymise the micro data without loss in utility. [8] This work presents a functional dependency based perturbation approach which hides sensitive information from the user by applying the (l, d) inference model on the dependency attributes based on Information Gain. This approach works on both categorical and numerical attributes. The perturbed data set does not affect the original dataset; it maintains the same or very comparable patterns as the original data set. Hence the utility of the application is always high when compared to other data mining techniques. The accuracy of the original and perturbed datasets is compared and analysed using tools and a data mining classification algorithm.


Predicting Student Failure in University Examination using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2643.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 956-959

Keyword(s):

Machine Learning ◽

Data Mining ◽

Performance Management ◽

Student Performance ◽

Learning Algorithms ◽

Educational Data Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Social Characteristics ◽

Student Failure

Student Performance Management is one of the key pillars of higher education institutions since it directly impacts students' career prospects and college rankings. This paper follows the path of learning analytics and educational data mining by applying machine learning techniques to student data to identify students who are more likely to fail in the university examinations, and thus to provide the interventions needed for improved student performance. The paper uses a data mining approach with 10-fold cross-validation to classify students based on predictors, which are the demographic and social characteristics of the students. This paper compares five popular machine learning algorithms, namely Rep Tree, Jrip, Random Forest, Random Tree, and Naive Bayes, based on overall classifier accuracy as well as other class-specific indicators, i.e. precision, recall, and F-measure. The results showed that the Rep Tree algorithm outperformed the other machine learning algorithms in classifying students who are more likely to fail in the examinations.
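
The evaluation protocol described above can be sketched in plain Python: a 10-fold index split plus the class-specific precision, recall, and F-measure. The helper names and toy labels are hypothetical, and the folds are contiguous and unshuffled for simplicity, whereas real experiments usually shuffle or stratify first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def precision_recall_f(actual, predicted, positive):
    """Precision, recall and F-measure for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])

actual = ["fail", "fail", "pass", "pass", "fail"]
predicted = ["fail", "pass", "pass", "fail", "fail"]
print(precision_recall_f(actual, predicted, positive="fail"))
```

Treating "fail" as the positive class matches the paper's focus on at-risk students, for whom recall matters because a missed student receives no intervention.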
