IMPROVING CLASSIFICATION PERFORMANCE OF NEURO-FUZZY CLASSIFIER BY IMPUTING MISSING DATA

2019 ◽  
pp. 495-501
Author(s):  
Balasaheb Tarle ◽  
Muddana Akkalaksmi

In medical data classification, improving classification performance is an important issue when the data sets are small and contain many missing attribute values. The foremost objective of machine learning research is to improve the classification performance of classifiers, and this requires a sufficient number of training instances. In the proposed algorithm, we substitute each missing attribute value with the values available in that attribute's domain, generating additional training tuples alongside the original training tuples. Together, the additional and original training samples provide sufficient data for learning. The neuro-fuzzy classifier is trained on this data set, and its classification performance on test data is obtained using k-fold cross-validation. The proposed method attains improvements of around 2.8% and 3.61% in classification accuracy for this classifier.
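The augmentation step can be sketched in Python under stated assumptions. The abstract does not say how an attribute's available domain values are determined, so this sketch assumes they are the distinct non-missing values observed for that attribute across the whole training set (a class-conditional domain is an equally plausible reading). It keeps complete tuples unchanged and replaces each incomplete tuple with one tuple per combination of domain values for its missing positions. The names `MISSING`, `attribute_domains` and `expand_missing` are hypothetical.

```python
from itertools import product

MISSING = None  # hypothetical marker for a missing attribute value


def attribute_domains(rows):
    """Distinct non-missing values observed for each attribute (assumed domain)."""
    n_attrs = len(rows[0][0])
    domains = [set() for _ in range(n_attrs)]
    for features, _label in rows:
        for j, value in enumerate(features):
            if value is not MISSING:
                domains[j].add(value)
    return [sorted(d) for d in domains]


def expand_missing(rows):
    """Keep complete tuples as they are; replace each incomplete tuple with one
    new tuple per combination of domain values for its missing attributes."""
    domains = attribute_domains(rows)
    expanded = []
    for features, label in rows:
        missing = [j for j, v in enumerate(features) if v is MISSING]
        if not missing:
            expanded.append((tuple(features), label))
            continue
        for combo in product(*(domains[j] for j in missing)):
            filled = list(features)
            for j, value in zip(missing, combo):
                filled[j] = value
            expanded.append((tuple(filled), label))
    return expanded


rows = [((1, "a"), 0), ((2, MISSING), 1), ((MISSING, "b"), 0)]
print(expand_missing(rows))
# [((1, 'a'), 0), ((2, 'a'), 1), ((2, 'b'), 1), ((1, 'b'), 0), ((2, 'b'), 0)]
```

The number of generated tuples grows as the product of the domain sizes of the missing attributes, which stays manageable for small data sets with categorical attributes but can grow quickly for attributes with many distinct values.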

2012 ◽  
Vol 58 (4) ◽  
pp. 425-431 ◽  
Author(s):  
D. Selvathi ◽  
N. Emimal ◽  
Henry Selvaraj

The medical imaging field has grown significantly in recent years and demands high accuracy because it deals with human life. The aim is to reduce human error as much as possible by assisting physicians and radiologists with automatic techniques, and artificial intelligence techniques have shown great potential in this field. In this paper, a neuro-fuzzy classifier is applied to the automated characterization of atheromatous plaque, identifying fibrotic, lipidic and calcified tissues in intravascular ultrasound (IVUS) images. The classifier is designed with sixteen inputs, corresponding to the sixteen pixels of the instantaneous scanning matrix, and one output that indicates whether the pixel under consideration is fibrotic, lipidic, calcified or normal. Classification performance was evaluated in terms of sensitivity, specificity and accuracy, and the results indicate that the proposed system has potential for detecting the respective plaque types, with an average accuracy of 98.9%.
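Sensitivity, specificity and accuracy for a four-class output can be computed one class at a time, treating that class as positive and the others as negative. The sketch below assumes this one-vs-rest reading and that the reported average accuracy is the mean of the per-class accuracies; the abstract does not state how the average was taken. The function names are hypothetical.

```python
CLASSES = ("Fibrotic", "Lipidic", "Calcified", "Normal")


def one_vs_rest_metrics(y_true, y_pred, positive):
    """(sensitivity, specificity, accuracy) with `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def per_class_report(y_true, y_pred, classes=CLASSES):
    """One-vs-rest metrics for every tissue class."""
    return {c: one_vs_rest_metrics(y_true, y_pred, c) for c in classes}


def mean_accuracy(report):
    """Average of the per-class one-vs-rest accuracies (assumed averaging rule)."""
    return sum(acc for _, _, acc in report.values()) / len(report)
```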


2020 ◽  
Vol 6 (2) ◽  
pp. 90-97
Author(s):  
Sagir Masanawa ◽  
Hamza Abubakar

In this paper, a hybrid intelligent system that incorporates a sparse matrix approach into a neural network learning model is presented as a decision support tool for medical data classification. The main objective of this research is to develop an effective intelligent system that medical practitioners can use to accelerate diagnosis and treatment. The sparse matrix approach is incorporated into the neural network learning algorithm to improve scalability, reduce memory usage, shorten execution time, and speed up the analysis of medical data classification problems. The hybrid intelligent system aims to exploit the advantages of its constituent models while alleviating their limitations. The proposed intelligent classification system maximizes the correct classification of medical data and minimizes the number of incorrectly identified trends. To evaluate its effectiveness, three benchmark medical data sets from the UCI Machine Learning Repository, namely Hepatitis, SPECT Heart and Cleveland Heart, are used. Performance is measured with metrics useful in medical applications, including accuracy, sensitivity and specificity. The results were analyzed and compared with those of other methods published in the literature. The experimental outcomes demonstrate that the hybrid intelligent system is effective for medical data classification tasks.
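The abstract does not describe the sparse representation used, so the following is only an illustration of the general idea: a compressed sparse row (CSR) matrix stores only its non-zero entries, and a neural-network layer's matrix-vector product then touches only those entries. The sketch assumes the sparsity lies in the layer's weight matrix; the same `csr_matvec` applies unchanged if the sparse matrix is instead the input data. All names are hypothetical.

```python
import math


def to_csr(dense):
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's slice in values/col_idx
    return values, col_idx, row_ptr


def csr_matvec(csr, x):
    """Multiply a CSR matrix by a dense vector, visiting only stored non-zeros."""
    values, col_idx, row_ptr = csr
    out = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        out.append(s)
    return out


def sparse_layer(csr_weights, bias, x):
    """Fully connected layer with a sparse weight matrix and sigmoid activation."""
    z = csr_matvec(csr_weights, x)
    return [1.0 / (1.0 + math.exp(-(zi + bi))) for zi, bi in zip(z, bias)]
```

Memory then scales with the number of non-zero weights rather than with rows times columns, which is the kind of saving the abstract attributes to the sparse matrix approach.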


Author(s):  
WASIF AFZAL ◽  
RICHARD TORKAR ◽  
ROBERT FELDT

In the presence of a number of algorithms for classification and prediction in software engineering, there is a need for a systematic way of assessing their performance. Performance assessment is typically done by some form of partitioning or resampling of the original data to alleviate biased estimation. For predictive and classification studies in software engineering, there is a lack of definitive advice on the most appropriate resampling method to use. This is seen as one of the factors that prevent general conclusions about which modeling technique or set of predictor variables is most appropriate. Furthermore, the use of a variety of resampling methods makes it impossible to perform a formal meta-analysis of the primary study results. It is therefore desirable to examine the influence of various resampling methods and to quantify possible differences. Objective and method: This study empirically compares five common resampling methods (hold-out validation, repeated random sub-sampling, 10-fold cross-validation, leave-one-out cross-validation and non-parametric bootstrapping) using 8 publicly available data sets, with genetic programming (GP) and multiple linear regression (MLR) as software quality classification approaches. The location of (PF, PD) pairs (probability of false alarm, probability of detection) in the ROC (receiver operating characteristic) space and the area under the ROC curve (AUC) are used as accuracy indicators. Results: In terms of the location of (PF, PD) pairs in the ROC space, bootstrapping results are in the preferred region for 3 of the 8 data sets for GP and for 4 of the 8 data sets for MLR. Based on the AUC measure, there are no significant differences between the resampling methods for either GP or MLR. Conclusion: Certain data set properties may be responsible for the insignificant differences between the resampling methods based on AUC. These include imbalanced data sets, insignificant predictor variables and high-dimensional data sets. With the current selection of data sets and classification techniques, bootstrapping is the preferred method based on the location of (PF, PD) pairs in the ROC space. Hold-out validation is not a good choice for comparatively smaller data sets, where leave-one-out cross-validation (LOOCV) performs better. For comparatively larger data sets, 10-fold cross-validation performs better than LOOCV.
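The five resampling schemes compared in the study differ only in how they produce (train, test) index splits. The sketch below generates those splits in Python; the hold-out fraction, number of repeats and random seeds are illustrative assumptions rather than the study's settings, and evaluating the bootstrap on out-of-bag instances is one common convention that the abstract does not confirm.

```python
import random


def hold_out(n, test_fraction=0.3, rng=None):
    """One random train/test split of the indices 0..n-1."""
    if rng is None:
        rng = random.Random(0)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * (1 - test_fraction))
    return [(idx[:cut], idx[cut:])]


def repeated_subsampling(n, repeats=10, test_fraction=0.3, seed=0):
    """Several independent hold-out splits drawn from one seeded generator."""
    rng = random.Random(seed)
    return [split for _ in range(repeats) for split in hold_out(n, test_fraction, rng)]


def k_fold(n, k=10, seed=0):
    """Shuffle once, deal indices into k folds, and use each fold once as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def leave_one_out(n):
    """k-fold with k = n: every instance is the test set exactly once."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]


def bootstrap(n, repeats=10, seed=0):
    """Draw n indices with replacement for training; test on the out-of-bag rest."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        train = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train)
        splits.append((train, [j for j in range(n) if j not in in_bag]))
    return splits
```

Because every function returns a list of (train_indices, test_indices) pairs, the same model-fitting and AUC loop can be run unchanged over any of the five schemes.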


Author(s):  
Gede Aditra Pradnyana ◽  
I Komang Agus Suryantara ◽  
I Gede Mahendra Darmawiguna

An impression can be interpreted as a psychological feeling toward a product, and it plays an important role in decision making. Understanding data in the domain of impressions is therefore very useful. The objective of this research was to determine the performance of the K-Nearest Neighbors method in classifying the impressions of endek images, using the K-Fold Cross Validation method. The images were taken from 3 locations, namely CV. Artha Dharma, Agung Bali Collection, and Pengrajin Sri Rejeki. The impression of each image was obtained by consulting an endek expert, Dr. D.A Tirta Ray, M.Si. Data mining was performed with the K-Nearest Neighbors method, which classifies new objects based on their attributes and on training samples that have already been classified. K-Fold Cross Validation testing yielded an accuracy of 91% with K values of 3, 4, 7 and 8 in K-Nearest Neighbors.
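A minimal K-Nearest Neighbors classifier evaluated with k-fold cross-validation is sketched below. The abstract does not say which features were extracted from the endek images or how folds were formed, so the sketch assumes numeric feature vectors compared by Euclidean distance, majority voting among the neighbours, and round-robin fold assignment without shuffling; the function names and the labels used in testing are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training samples closest to `query`."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(samples, k_neighbors, n_folds=5):
    """Mean k-NN accuracy over n_folds folds assigned round-robin (no shuffling)."""
    folds = [samples[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for i, test in enumerate(folds):
        train = [s for m, fold in enumerate(folds) if m != i for s in fold]
        correct = sum(knn_predict(train, x, k_neighbors) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```

Calling `k_fold_accuracy` once per candidate K is one way to reproduce the kind of K comparison reported in the abstract.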


10.2196/22555 ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. e22555
Author(s):  
Yao Lu ◽  
Tianshu Zhou ◽  
Yu Tian ◽  
Shiqiang Zhu ◽  
Jingsong Li

Background Data sharing in multicenter medical research can improve the generalizability of research, accelerate progress, enhance collaborations among institutions, and lead to new discoveries from data pooled from multiple sources. Despite these benefits, many medical institutions are unwilling to share their data, as sharing may cause sensitive information to be leaked to researchers, other institutions, and unauthorized users. Great progress has been made in the development of secure machine learning frameworks based on homomorphic encryption in recent years; however, nearly all such frameworks use a single secret key and lack a description of how to securely evaluate the trained model, which makes them impractical for multicenter medical applications. Objective The aim of this study is to provide a privacy-preserving machine learning protocol (eg, logistic regression) for multiple data providers and researchers. This protocol allows researchers to train models and then evaluate them on medical data from multiple sources while providing privacy protection for both the sensitive data and the learned model. Methods We adapted a novel threshold homomorphic encryption scheme to guarantee privacy requirements. We devised new relinearization key generation techniques for greater scalability and multiplicative depth and new model training strategies for simultaneously training multiple models through x-fold cross-validation. Results Using a client-server architecture, we evaluated the performance of our protocol. The experimental results demonstrated that, with 10-fold cross-validation, our privacy-preserving logistic regression model training and evaluation over 10 attributes in a data set of 49,152 samples took approximately 7 minutes and 20 minutes, respectively. Conclusions We present the first privacy-preserving multiparty logistic regression model training and evaluation protocol based on threshold homomorphic encryption.
Our protocol is practical for real-world use and may promote multicenter medical research to some extent.
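For orientation, here is the plaintext computation that such a protocol keeps encrypted: logistic regression trained by full-batch gradient descent on the mean log-loss. This sketch involves no encryption, key generation or multiparty steps, its learning rate and epoch count are arbitrary assumptions, and it does not reproduce the authors' protocol; the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """P(y = 1 | x), with w[0] as the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Full-batch gradient descent on the mean log-loss, starting from zero weights."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w
```

In a 10-fold setting, this training function would be called once per fold on that fold's training indices, which is the set of models the abstract describes training simultaneously.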

