A Bayesian Framework for Improving Clustering Accuracy of Protein Sequences Based on Association Rules

Author(s):  
Peng-Yeng Yin ◽  
Shyong-Jian Shyu ◽  
Guan-Shieng Huang ◽  
Shuang-Te Liao

With the advent of new sequencing technology for biological data, the number of sequenced proteins stored in public databases has exploded. The structural, functional, and phylogenetic analyses of proteins would benefit from exploring these databases with data mining techniques. Clustering algorithms can assign proteins to clusters such that proteins in the same cluster are more similar in homology than those in different clusters. This procedure not only simplifies the analysis task but also enhances the accuracy of the results. Most of the existing protein-clustering algorithms compute the similarity between proteins based on one-to-one pairwise sequence alignment rather than multiple sequence alignment; the latter is prohibitively expensive to compute.


2008 ◽  
pp. 1091-1102
Author(s):  
Peng-Yeng Yin ◽  
Shyong-Jian Shyu ◽  
Guan-Shieng Huang ◽  
Shuang-Te Liao



2011 ◽  
pp. 2259-2273
Author(s):  
Peng-Yeng Yin ◽  
Shyong-Jian Shyu ◽  
Guan-Shieng Huang ◽  
Shuang-Te Liao

With the advent of new sequencing technology for biological data, the number of sequenced proteins stored in public databases has exploded. The structural, functional, and phylogenetic analyses of proteins would benefit from exploring these databases with data mining techniques. Clustering algorithms can assign proteins to clusters such that proteins in the same cluster are more similar in homology than those in different clusters. This procedure not only simplifies the analysis task but also enhances the accuracy of the results. Most of the existing protein-clustering algorithms compute the similarity between proteins based on one-to-one pairwise sequence alignment rather than multiple sequence alignment; the latter is prohibitively expensive to compute. Hence the accuracy of the clustering result deteriorates. Further, the traditional clustering methods are ad hoc, and the resulting clustering often converges to a local optimum. This chapter presents a Bayesian framework for improving the clustering accuracy of protein sequences based on association rules. The experimental results show that the proposed framework can significantly improve the performance of traditional clustering methods.
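The pairwise-similarity clustering that the abstract describes as the traditional baseline can be sketched as follows. This is only an illustration, not the chapter's Bayesian framework: the `similarity` function uses Python's `difflib.SequenceMatcher` as a stand-in for a real alignment score, and the threshold, the single-linkage grouping, and the toy sequences are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    # Stand-in for a pairwise alignment score (assumption): ratio of
    # matched characters, in [0, 1]. A real pipeline would use an aligner.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster(seqs, threshold=0.7):
    # Single-linkage grouping: two sequences share a cluster when a chain
    # of pairwise similarities >= threshold connects them (union-find).
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy sequences, not real proteins.
toy = ["MKTAYIAKQR", "MKTAYIAKQK", "GAVLIPFWMG", "GAVLIPFWMA"]
clusters = cluster(toy)
```

Because every decision rests on one-to-one comparisons, a single misleading pairwise score can merge or split clusters, which is the accuracy problem the abstract attributes to this family of methods.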


Author(s):  
G. Ramadevi ◽  
Srujitha Yeruva ◽  
P. Sravanthi ◽  
P. Eknath Vamsi ◽  
S. Jaya Prakash

In a digitized world, data is growing exponentially, and it is difficult to analyze the data and report results. Data mining techniques play an important role in big data for the healthcare sector. Data mining algorithms make it possible to analyze, detect, and predict the presence of disease, which helps doctors detect disease early and supports their decision making. The objective of the data mining techniques used here is to design an automated tool that notifies doctors of a patient’s treatment history, disease, and medical data. Data mining techniques are very useful for analyzing medical data to extract meaningful and practical patterns. This project works on diabetes medical data: classification and clustering algorithms (OPTICS, Naive Bayes, and BIRCH) are implemented and their efficiency is examined.


Author(s):  
Lluís Sanmiquel ◽  
Marc Bascompta ◽  
Josep Ma. Rossell ◽  
Hernán Anticoi ◽  
Eduard Guash

Workplace accidents in the mining sector were analyzed by applying data mining techniques to the Spanish administration's database for the period 2005-2015. The data were processed with the Weka software. Two scenarios were drawn from the accident database: surface mining and underground mining. The most important variables involved in occupational accidents, and their association rules, were determined. These rules are formed by several predictor variables that lead to an accident, defining its characteristics and context. The study presents the 20 most important association rules of the sector, for either surface or underground mining, based on the statistical confidence level that Weka reports for each rule. The outcomes show the most typical immediate causes, along with the percentage of accidents on which each association rule is based. In both scenarios, the most typical immediate cause is body movement with physical effort or overexertion, and the most typical type of accident is physical effort or overexertion. The second most important immediate cause and type of accident, on the other hand, differ between the two scenarios. Data mining techniques have proved to be a very powerful tool for finding the root causes of accidents, applying corrective measures, and verifying their effectiveness, in both public and private companies.
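The confidence level used to rank association rules can be illustrated with a minimal sketch. The records below are hypothetical and are not taken from the Spanish accident database; the attribute names and the helper functions `support` and `confidence` are assumptions for the example, written with the standard definitions rather than Weka's implementation.

```python
def support(records, items):
    # Fraction of records that contain every item in `items`.
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    # confidence(A -> C) = support(A and C) / support(A)
    ante = support(records, antecedent)
    both = support(records, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0


# Hypothetical accident records, each a set of attribute=value items.
records = [
    {"site=underground", "cause=overexertion", "type=overexertion"},
    {"site=underground", "cause=overexertion", "type=overexertion"},
    {"site=underground", "cause=fall", "type=blow"},
    {"site=surface", "cause=overexertion", "type=overexertion"},
]

rule_support = support(records, {"cause=overexertion", "type=overexertion"})
rule_conf = confidence(records, {"cause=overexertion"}, {"type=overexertion"})
site_conf = confidence(records, {"site=underground"}, {"cause=overexertion"})
```

In this toy data the rule from cause to type holds in every record where the cause appears, so its confidence is 1.0, while the rule from underground site to overexertion holds in two of three underground records.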


The knowledge discovery process deals with two essential data mining techniques: association and classification. Classification produces a large set of associative classification rules for a given observation. Pruning removes unnecessary class association rules without losing classification accuracy. These processes are very significant but also very challenging. The experimental results and limitations of existing class association rule mining techniques show that more pruning parameters need to be considered so that the size of the classifier can be further optimized. In this paper we survey various strategies for class association rule pruning and study their effects, which enables us to extract an efficient, compact, and high-confidence class association rule set; we also propose a pruning methodology.
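One common pruning strategy can be sketched as follows: drop rules below a minimum confidence, then drop any rule that is made redundant by a more general rule for the same class with at least the same confidence. This is a generic illustration under those assumptions, not the methodology proposed in the paper; the `Rule` type, the `prune` function, and the sample rules are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]  # conditions that must hold
    label: str                  # predicted class
    confidence: float


def prune(rules: List[Rule], min_conf: float) -> List[Rule]:
    kept: List[Rule] = []
    # Visit general (short) rules first so specific rules are checked
    # against every more general rule that survived.
    ordered = sorted(rules, key=lambda r: (len(r.antecedent), -r.confidence))
    for r in ordered:
        if r.confidence < min_conf:
            continue  # confidence pruning
        redundant = any(
            k.label == r.label
            and k.antecedent < r.antecedent  # strictly more general
            and k.confidence >= r.confidence
            for k in kept
        )
        if not redundant:
            kept.append(r)
    return kept


# Hypothetical class association rules.
rules = [
    Rule(frozenset({"a"}), "yes", 0.90),
    Rule(frozenset({"a", "b"}), "yes", 0.85),  # redundant: {"a"} is better
    Rule(frozenset({"a", "c"}), "yes", 0.95),  # kept: beats the general rule
    Rule(frozenset({"d"}), "no", 0.40),        # below min_conf
]
pruned = prune(rules, min_conf=0.6)
```

The classifier shrinks from four rules to two here without discarding any rule that adds predictive confidence over a simpler one.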


Author(s):  
Sujata Mulik

The agriculture sector in India faces a serious challenge in maximizing crop productivity. More than 60 percent of the crop still depends on climatic factors such as rainfall, temperature, and humidity. This paper discusses the use of various data mining applications in the agriculture sector. Data mining is used to solve various problems in the sector, including yield prediction, which remains a major open problem to be solved from available data. Data mining techniques are well suited to this purpose. Different data mining techniques are used and evaluated in agriculture for estimating the coming year's crop production. In this paper we focus on predicting the crop yield of Kharif and Rabi crops.

