Rat Protein’s Enzyme Class Classification Using Machine Learning

Bioinformatics has become an active research area for classifying enzymes from unknown protein data. Its prime goal is to process protein data and develop computational techniques that classify proteins and identify appropriate features for function prediction. To this end, several machine learning and statistical techniques have been designed for data classification. Classifying protein data is a challenging task, and it has generally been carried out on human protein data. In this article, we consider rat enzyme classes for classification and prediction. We use classification techniques such as CRT, CHAID, C5.0, neural networks, SVM, and Bayesian classifiers, and measure model performance with accuracy, specificity, sensitivity, precision, recall, F-measure, and MCC. The experimental results show that some of the protein data are imbalanced, which affects performance: in this experiment, the Lyases, Isomerases, and Ligases classes are imbalanced and degrade the performance of the models. The results also show that C5.0 achieves 91.5% accuracy, takes only 4 seconds to compute, and can be used for protein classification and prediction.
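The evaluation measures named in this abstract can all be derived from the four counts of a binary (one class versus the rest) confusion matrix. The following is a minimal sketch under that assumption; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the article.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common evaluation measures from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives for a single class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity and recall are the same quantity.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
```

For a multi-class problem such as enzyme classes, one common choice is to compute these measures once per class, treating that class as positive and all others as negative.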

Author(s):  
Chhote Lal Prasad Gupta ◽  
Anand Bihari ◽  
Sudhakar Tripathi

Background: Predicting the enzyme class of an unknown protein is one of the challenging tasks in bioinformatics. The number of known proteins grows daily, which makes clinical verification and classification difficult; as a result, enzyme class prediction offers a new opportunity to bioinformatics researchers. Machine learning classification techniques help in protein classification and prediction, but it is imperative to know which technique is best suited to protein classification. This study used human protein data extracted from the UniProtKB databank. A total of 4368 protein records with 45 identified features were used for experimental analysis. Objective: The prime objective of this article is to find an appropriate classification technique for classifying reviewed as well as unreviewed human enzyme classes of protein data, and to determine the significance of different features in protein classification and prediction. Method: Ten widely used classification techniques, namely CRT, QUEST, CHAID, C5.0, ANN, SVM, Bayesian, Random Forest, XGBoost, and CatBoost, were used to classify the data and determine the importance of features. To validate the results of the different techniques, accuracy, precision, recall, F-measure, sensitivity, specificity, MCC, ROC, and AUROC were used. All experiments were carried out with SPSS Clementine and Python. Result: The classification techniques give different results, and the data were found to be imbalanced for classes C4, C5, and C6. As a result, all of the techniques give acceptable accuracy, above 60%, for these classes, but their precision is very low or negligible. The experimental results show that Random Forest gives the highest accuracy and AUROC of all, 96.84% and 0.945 respectively, along with high precision and recall.
Conclusion: The experiments conducted and analyzed in this article show that the Random Forest classification technique can be used for human enzyme classification and prediction.
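The observation that imbalanced classes can show acceptable accuracy but negligible precision can be illustrated with a small synthetic example. This is only a sketch: the labels and counts below are invented for illustration and are not the study's data, and the helper name `one_vs_rest_scores` is hypothetical.

```python
def one_vs_rest_scores(y_true, y_pred, label):
    """Return (accuracy, precision, recall) for one class versus all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Synthetic data: 95 samples of a majority class, 5 of a rare class.
y_true = ["C1"] * 95 + ["C6"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["C1"] * 100

if __name__ == "__main__":
    acc, prec, rec = one_vs_rest_scores(y_true, y_pred, "C6")
    # The rare class is never detected, yet its one-vs-rest accuracy is 0.95.
    print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Because true negatives dominate the count for a rare class, accuracy stays high even when no member of that class is found, which is why precision, recall, and MCC are reported alongside it.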


2019 ◽  
Vol 8 (3) ◽  
pp. 35-37
Author(s):  
R. Ravikumar ◽  
M. Babu Reddy

In machine learning, as the dimensionality of the data rises, the amount of data required for a reliable analysis grows exponentially. Many feature selection and feature extraction methods exist for reducing the dimensionality of high-dimensional microarray data, and they are widely used. All of these methods aim to remove redundant and irrelevant features so that new instances are classified more accurately. Analyzing microarrays can be difficult because of the size of the data they provide. In addition, the complicated relations among genes make analysis harder, and removing excess features can improve the quality of the results. Feature selection has been an active and fruitful research area in the pattern recognition, machine learning, statistics, and data mining communities. The main objective of this paper is feature selection, that is, choosing a subset of input variables by eliminating redundant or irrelevant features.
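One of the simplest filter-style feature selection methods drops features whose values barely vary across samples, since they carry little information for classification. The sketch below shows that idea only; it is an assumed illustration, not the method used in the paper, and the function names and threshold are hypothetical.

```python
from statistics import pvariance


def select_by_variance(rows, threshold=0.0):
    """Return the indices of columns whose population variance exceeds threshold.

    rows is a list of equal-length numeric feature vectors (one per sample).
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]


def keep_columns(rows, indices):
    """Project every sample onto the selected feature indices."""
    return [[row[i] for i in indices] for row in rows]


if __name__ == "__main__":
    # Toy data: feature 1 is constant, feature 2 is nearly constant.
    samples = [
        [1.0, 0.0, 5.0],
        [2.0, 0.0, 5.1],
        [3.0, 0.0, 4.9],
    ]
    selected = select_by_variance(samples, threshold=0.01)
    print("selected feature indices:", selected)
    print("reduced data:", keep_columns(samples, selected))
```

A variance filter ignores the class labels entirely; in practice it is often combined with label-aware criteria, and features should be on comparable scales before a single threshold is applied.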


2019 ◽  
Vol 8 (2) ◽  
pp. 3591-3599

In computational biology, either profile-based or sequence-based protein data are used to identify meaningful and accurate features for protein function prediction. Predicting the enzyme class of an unknown protein is a highly active research topic, and machine learning and statistical classification techniques have been applied to it. In this article, we use six machine learning and statistical classification techniques, namely CRT, QUEST, CHAID, C5.0, ANN, and SVM, to classify 4314 human protein sequences. The data were extracted from the UniProtKB databank with the help of the PROFEAT server and are categorized into seven classes. SPSS was used to handle the high-dimensional protein sequence data, which contain some missing values, and to classify the data and estimate the performance of each technique. The experimental results show that the data for classes C4, C5, C6, and C7 are imbalanced, which affects the overall performance of the classification techniques. This article provides an extensive comparative analysis of different classification techniques on sequence-based protein data. The analysis shows that SVM and C5.0 give better results than the others and can be used for protein classification and prediction.


2017 ◽  
Vol 29 (06) ◽  
pp. 1750047
Author(s):  
Amita Das ◽  
S. S. Panda ◽  
Sukanta Sabut

The paper proposes a modified approach to the delineation and classification of two types of liver cancer, Hepatocellular Carcinoma (HCC) and Metastatic Carcinoma (MET), from different slices of computed tomography (CT) scan images. It presents a combined framework of reorganization and extraction of the region of interest (ROI) and texture feature extraction, followed by texture classification using different machine learning approaches. First, adaptive thresholding is applied to segment the liver region from the CT images. A level set algorithm is then used to detect the region of cancer tissue. In the classification stage, 38 features are extracted from the delineated lesions to build the dataset. Two machine learning classifiers, support vector machine (SVM) and random forest (RF), are trained on the dataset to predict the cancer classes. Ten-fold cross-validation is used to evaluate the performance of the two classifiers. The efficiency of the proposed algorithm is tested in terms of accuracy: the RF classifier achieved a higher accuracy of 95%, compared with 87% for the SVM classifier. The experimental results indicate that the RF classifier outperforms the SVM classifier with level-set features.
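Ten-fold cross-validation, as mentioned above, splits the samples into ten disjoint folds and uses each fold once for testing while training on the other nine. The following is a minimal sketch of the index splitting only, with no classifier attached; the function name `k_fold_indices`, the shuffling seed, and the sample count are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold_no, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
        print(f"fold {fold_no}: {len(train)} train, {len(test)} test")
```

The reported accuracy is then usually the mean of the per-fold accuracies. With imbalanced classes, a stratified split that preserves class proportions in each fold is a common refinement, which this sketch does not implement.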


2020 ◽  
Author(s):  
Apiwat Sangphukieo ◽  
Teeraphan Laomettachit ◽  
Marasri Ruengjitchatchawalya

Abstract: Identification of novel photosynthetic proteins is important for understanding and improving photosynthetic efficiency. Genomic context, such as genome neighborhood, can provide additional useful information for identifying photosynthetic proteins. We therefore expected that a computational approach, particularly machine learning (ML) with genome neighborhood-based features, would facilitate photosynthetic function assignment. Our results revealed a functional relationship between photosynthetic genes and their genomic neighbors, indicating that functions can be assigned from the genome neighborhood profile. We therefore created a new method for extracting patterns based on the genome neighborhood network (GNN) and applied it to photosynthetic protein classification using ML algorithms. A random forest (RF) classifier using genome neighborhood-based features achieved the highest accuracy, up to 94%, in the classification of photosynthetic proteins and also performed better (Matthews correlation coefficient = 0.852) than other available tools, including sequence similarity search (0.497) and an ML-based method (0.512). Furthermore, we demonstrated the ability of our model to identify novel photosynthetic proteins compared with the other methods. Our classifier is available at http://bicep.kmutt.ac.th/photomod_standalone, https://bit.ly/2S0I2Ox and DockerHub: https://hub.docker.com/r/asangphukieo/photomod


Author(s):  
Padmavathi .S ◽  
M. Chidambaram

Text classification has become increasingly important for managing and organizing text data because of the tremendous growth of online information. It assigns documents to a fixed number of predefined categories. There are two approaches to text classification: rule-based and machine learning-based. In the rule-based approach, documents are classified according to manually defined rules. In the machine learning-based approach, the classification rules, or classifier, are derived automatically from example documents. The machine learning approach offers higher recall and faster processing. This paper investigates text classification using different machine learning techniques.
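To make the machine learning-based approach concrete, the sketch below derives a classifier from labeled example documents using multinomial Naive Bayes with add-one smoothing. Naive Bayes is chosen here only as a familiar illustration; the paper does not say which techniques it evaluates, and the function names, labels, and toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Learn word and label counts from (text, label) example documents."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior score for text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    examples = [
        ("cheap pills buy now", "spam"),
        ("buy cheap watches", "spam"),
        ("meeting agenda for monday", "ham"),
        ("project meeting notes", "ham"),
    ]
    model = train_naive_bayes(examples)
    print(classify("cheap watches now", model))
    print(classify("monday meeting", model))
```

In contrast to a rule-based system, no keyword rule is written by hand here: the association between words and categories comes entirely from the example documents.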


Author(s):  
Hyeuk Kim

Unsupervised learning in machine learning divides data into several groups. Observations in the same group have similar characteristics, while observations in different groups have different characteristics. In this paper, we cluster data by partitioning around medoids, which has some advantages over k-means clustering, and apply it to baseball players in the Korea Baseball League. We also apply principal component analysis to the data and plot the result using two components as axes, which lets us interpret the clustering graphically. The combination of partitioning around medoids and principal component analysis can be applied to any other data, and the approach makes it easy to identify the characteristics of each group.
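Partitioning around medoids represents each cluster by one of its own observations (a medoid) and repeatedly swaps medoids with non-medoids while the total distance to the nearest medoid decreases. The sketch below implements that swap phase with a naive initialisation (the first k points) instead of the BUILD step of the full algorithm; the function names and toy points are assumptions for illustration and are not the paper's data.

```python
import math


def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist=math.dist):
    """Greedy medoid swapping; returns (medoid indices, labels, cost)."""
    medoids = list(range(k))  # naive start; full PAM uses a BUILD phase
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                # Try replacing the i-th medoid with this candidate point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign each point to the position of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points
    ]
    return medoids, labels, best


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    medoids, labels, cost = pam(data, k=2)
    print("medoids:", [data[m] for m in medoids])
    print("labels:", labels)
    print("total cost:", cost)
```

Because medoids are actual observations and only pairwise distances are needed, the method works with any distance function and is less sensitive to outliers than a mean-based centre, which is one commonly cited advantage over k-means.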


Author(s):  
Sumit Kaur

Abstract: Deep learning is an emerging research area in machine learning and pattern recognition, introduced with the goal of moving machine learning closer to one of its original objectives, artificial intelligence. It tries to mimic the human brain, which is capable of processing and learning from complex input data and solving many kinds of complicated tasks well. Deep learning (DL) is based on a set of supervised and unsupervised algorithms that attempt to model higher-level abstractions in data and learn hierarchical representations for classification. In recent years, it has attracted much attention because of its state-of-the-art performance in diverse areas such as object perception, speech recognition, computer vision, collaborative filtering, and natural language processing. This paper presents a survey of deep learning techniques for remote sensing image classification.

