A Biological Data-Driven Mining Technique by Using Hybrid Classifiers With Rough Set

2021 ◽  
Vol 12 (3) ◽  
pp. 123-139
Author(s):  
Linkon Chowdhury ◽  
Md Sarwar Kamal ◽  
Shamim H. Ripon ◽  
Sazia Parvin ◽  
Omar Khadeer Hussain ◽  
...  

Biological data classification and analysis are important for the study of living organs. Biological data classification assigns organs to a particular group based on their features and characteristics. The objective of this paper is to establish a hybrid approach that combines naïve Bayes, the apriori algorithm, and a KNN classifier to generate optimal classification rules for biological pattern matching. The authors create combined association rules by using naïve Bayes and the apriori approach with a rough set for next-sequence prediction. First, the large DNA sequence is reduced by using a k-nearest neighbor approach. The authors then apply association rules, generated with naïve Bayes and the apriori approach, to predict the next sequence pattern. The hybrid approach is more accurate than a single classifier for biological sequence prediction. The optimized hybrid process also needs less execution time to generate rules when analyzing massive biological data. The results show that the hybrid approach generally outperforms other association rule generation approaches.
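The idea of using association rules for next-sequence prediction can be illustrated with a minimal sketch. It is not the authors' hybrid method: it omits naïve Bayes, KNN reduction, and rough sets, and only mines rules of the form "k-mer, then next base" filtered by support and confidence, as apriori-style rule mining does. The function names, the parameter `k`, and the thresholds are assumptions made for this example.

```python
from collections import Counter


def next_symbol_rules(sequence, k=3, min_support=2, min_confidence=0.5):
    """Mine rules 'k-mer -> next base' that pass support and confidence thresholds.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    context_counts = Counter()  # occurrences of each k-mer that has a successor
    pair_counts = Counter()     # occurrences of each (k-mer, next base) pair
    for i in range(len(sequence) - k):
        context = sequence[i:i + k]
        nxt = sequence[i + k]
        context_counts[context] += 1
        pair_counts[(context, nxt)] += 1

    rules = {}
    for (context, nxt), count in pair_counts.items():
        confidence = count / context_counts[context]
        if count >= min_support and confidence >= min_confidence:
            best = rules.get(context)
            # Keep only the most confident consequent for each k-mer.
            if best is None or confidence > best[1]:
                rules[context] = (nxt, confidence)
    return rules


def predict_next(sequence, rules, k=3):
    """Predict the next base from the last k symbols, or None if no rule applies."""
    rule = rules.get(sequence[-k:])
    return rule[0] if rule else None


if __name__ == "__main__":
    dna = "ACGTACGTACGT"
    rules = next_symbol_rules(dna)
    print(predict_next(dna, rules))
```

Support here is the raw count of a (k-mer, next base) pair, and confidence is that count divided by the count of the k-mer. A real pipeline would tune both thresholds on held-out sequences.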

Data Mining ◽  
2013 ◽  
pp. 1019-1042
Author(s):  
Pratibha Rani ◽  
Vikram Pudi

The rapid progress of computational biology, biotechnology, and bioinformatics in the last two decades has led to the accumulation of tremendous amounts of biological data that demand in-depth analysis. Data mining methods have been applied successfully to analyze this data. An important problem in biological data analysis is to classify a newly discovered sequence, such as a protein or DNA sequence, based on its important features and functions, using the collection of available sequences. In this chapter, we study this problem and present two Bayesian classifiers, RBNBC (Rani & Pudi, 2008a) and REBMEC (Rani & Pudi, 2008c). The algorithms used in these classifiers incorporate repeated occurrences of subsequences within each sequence (Rani, 2008). Specifically, the Repeat Based Naive Bayes Classifier (RBNBC) uses a novel formulation of Naive Bayes, and the second classifier, the Repeat Based Maximum Entropy Classifier (REBMEC), uses a novel framework based on the classical Generalized Iterative Scaling (GIS) algorithm.


2018 ◽  
Vol 7 (4.38) ◽  
pp. 955
Author(s):  
M. Bakri C. Haron ◽  
Siti Z. Z. Abidin ◽  
N. Azmina M. Zamani ◽  
. .

Facebook has become a popular platform for communicating information. People can express their opinions using texts, symbols, pictures, and emoticons via Facebook posts and comments. These expressions allow sentiment analysis to be performed by collecting the data to obtain the public's opinions and emotions toward certain issues. Because a huge amount of data is obtained from Facebook, proper approaches are required to handle the texts and symbols used in the comments. There is also a limited number of dictionaries for Malay texts, which makes it more challenging to process and classify the positive and negative words used in the comments. Thus, a hybrid approach is applied during data processing to visualize the results. In this work, a combination of a lexicon-based approach and Naïve Bayes is used. This study focuses on analyzing the public's sentiments on crime news on Facebook by using word cloud visualization. The visualization displays the important words used in the form of a word cloud. Moreover, the percentage of positive and negative words that exist in the comments is also shown as part of the visualization results.
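The lexicon-based part of such a pipeline, and the reported percentage of positive and negative words, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the tiny English lexicons, the whitespace tokenizer, and the function name are all hypothetical, and the Naïve Bayes stage is left out.

```python
import string


def sentiment_word_share(comments, positive_words, negative_words):
    """Return (positive %, negative %) among the lexicon words found in the comments.

    Hypothetical helper: tokenizes on whitespace, strips surrounding
    punctuation, and lowercases each token before the lexicon lookup.
    """
    tokens = [
        word.strip(string.punctuation).lower()
        for comment in comments
        for word in comment.split()
    ]
    positive = sum(token in positive_words for token in tokens)
    negative = sum(token in negative_words for token in tokens)
    matched = positive + negative
    if matched == 0:
        return 0.0, 0.0  # no lexicon word occurred in any comment
    return 100.0 * positive / matched, 100.0 * negative / matched


if __name__ == "__main__":
    # Toy lexicons, assumed for this example only.
    pos_pct, neg_pct = sentiment_word_share(
        ["Good job police!", "This is terrible and sad"],
        positive_words={"good"},
        negative_words={"terrible", "sad"},
    )
    print(f"positive: {pos_pct:.1f}%, negative: {neg_pct:.1f}%")
```

The percentages are computed over matched lexicon words only, so words missing from both lexicons do not dilute the result. Computing them over all tokens would be an equally valid design.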


2016 ◽  
Vol 7 (1) ◽  
pp. 283 ◽  
Author(s):  
Elvira Sukma Wahyuni

The main objective of this study is to improve classification performance in breast cancer diagnosis by applying feature selection to several classification algorithms. The study uses the Wisconsin Breast Cancer Database (WBCD). The F-score and Rough Set feature selection methods are paired with several classification algorithms, namely SMO (Sequential Minimal Optimization), Naive Bayes, Multilayer Perceptron (MLP), and C4.5. The study uses 10-fold cross-validation as the evaluation method. The results show that the MLP and C4.5 classifiers improve significantly in classification performance when paired with rough set and F-score feature selection, that Naive Bayes performs best when paired with F-score feature selection alone, and that SMO shows no improvement in classification performance when paired with either feature selection method. Keywords: breast cancer, feature selection, classification.
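A minimal sketch of F-score feature ranking for a two-class problem is given below. It assumes the commonly used definition, in which a feature's score is the squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances. The abstract does not state the exact formula, so this definition, the function names, and the toy data are assumptions.

```python
from statistics import mean, variance


def f_score(values, labels):
    """F-score of one feature for binary labels (1 = positive, 0 = negative).

    Assumed definition: between-class separation over within-class spread.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    numerator = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    denominator = variance(pos) + variance(neg)  # sample variances (n - 1)
    if denominator == 0:
        return 0.0 if numerator == 0 else float("inf")
    return numerator / denominator


def rank_features(columns, labels):
    """Return feature indices sorted from most to least discriminative."""
    scores = [f_score(column, labels) for column in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    columns = [
        [1, 9, 5, 2, 8, 5],  # toy feature with identical class means
        [1, 2, 3, 7, 8, 9],  # toy feature that separates the classes
    ]
    print(rank_features(columns, labels))
```

The top-ranked features would then be passed to a classifier and evaluated with 10-fold cross-validation, as the study describes. Each class needs at least two samples for the sample variance to be defined.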

