LARGE SCALE EXPERIMENTS WITH NAIVE BAYES AND DECISION TREES FOR FUNCTION TAGGING

2008 ◽  
Vol 17 (03) ◽  
pp. 483-499
Author(s):  
MIHAI LINTEAN ◽  
VASILE RUS

This paper describes the use of two machine learning techniques, naive Bayes and decision trees, to address the task of assigning function tags to nodes in a syntactic parse tree. Function tags are extra functional information, such as logical subject or predicate, that can be added to certain nodes in syntactic parse trees. We model the function tags assignment problem as a classification problem. Each function tag is regarded as a class and the task is to find what class/tag a given node in a parse tree belongs to from a set of predefined classes/tags. The paper offers the first systematic comparison of the two techniques, naive Bayes and decision trees, for the task of function tags assignment. The comparison is based on a standardized data set, the Penn Treebank, a collection of sentences annotated with syntactic information including function tags. We found out that decision trees generally outperform naive Bayes for the task of function tagging. Furthermore, this is the first large scale evaluation of decision trees based solutions to the task of functional tagging.

2019 ◽  
Vol 8 (2S11) ◽  
pp. 2684-2687 ◽  

The Web is one of the richest sources for gathering of consumer reviews and opinions. There are many websites which contains opinions of the customers in the form of reviews, blogs, discussion groups, and forums. This project focuses on customer reviews on the restaurants. It predicts whether the given comment is either a positive or negative using supervised machine learning techniques. The project makes use of a dataset from Kaggle website. The dataset consists of comment and the type of comment (i.e., either positive or negative). This project makes a study on classification algorithm and text mining approaches to identify the type of comment. Firstly, the data set which is taken is made free from duplicates. That is duplicates are removed then it is followed by text pre-processing that involves removal of punctuation marks, stop word removal and then conversion of the whole text into vector format would takes place. The conversion from text to vector is an essential step because the English cannot be directly used for the analysis as we are working with linear algebra. So, as to work with this data, it has to be converted to vector format and we are using CountVectorizer to convert the data to the vector format. And finally comes the classification part. We are using Naive Bayes algorithm for this classification. This classification makes the data set into two parts as mentioned above. Here we are taking 70 percent of the data to be train data set and 30 percent of the data to be test data set


Author(s):  
V Umarani ◽  
A Julian ◽  
J Deepa

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedbacks, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyses text patterns faster. The supervised machine learning technique is the most used mechanism for sentiment analysis. The proposed work discusses the flow of sentiment analysis process and investigates the common supervised machine learning techniques such as multinomial naive bayes, Bernoulli naive bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, decision tree, and deep learning techniques such as Long Short-Term Memory and Convolution Neural Network. The work examines such learning methods using standard data set and the experimental results of sentiment analysis demonstrate the performance of various classifiers taken in terms of the precision, recall, F1-score, RoC-Curve, accuracy, running time and k fold cross validation and helps in appreciating the novelty of the several deep learning techniques and also giving the user an overview of choosing the right technique for their application.


Author(s):  
S. Prasanthi ◽  
S.Durga Bhavani ◽  
T. Sobha Rani ◽  
Raju S. Bapi

Vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of a protein leading to the concept of druggability. A target protein is druggable if it has the potential to bind the drug-like molecules. Hence kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor in choosing a particular kinase target. In this paper we focus on human kinase drug target sequences since kinases are known to be potential drug targets. Also we do a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in future. The identification of druggable kinases is treated as a classification problem in which druggable kinases are taken as positive data set and non-druggable kinases are chosen as negative data set. The classification problem is addressed using machine learning techniques like support vector machine (SVM) and decision tree (DT) and using sequence-specific features. One of the challenges of this classification problem is due to the unbalanced data with only 48 druggable kinases available against 509 non-drugggable kinases present at Uniprot. The accuracy of the decision tree classifier obtained is 57.65 which is not satisfactory. A two-tier architecture of decision trees is carefully designed such that recognition on the non-druggable dataset also gets improved. Thus the overall model is shown to achieve a final performance accuracy of 88.37. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in literature.


2021 ◽  
Vol 23 (04) ◽  
pp. 356-372
Author(s):  
Manpreet Kaur ◽  
◽  
Dr. Dinesh Kumar ◽  

The classification techniques based on various machine learning techniques are having use for the Big data analysis. This will be useful in identifying the classification and then finally the prediction which will be useful for the decision managers for having quality decisions. There are various types of supervised and unsupervised learning techniques which are having capabilities in the terms of driving the analysis. This analysis will be useful for having identification of relationship between the various attributes which is required to device the analysis. There are various supervised learning techniques which are useful to drive the analysis. These techniques are SVM, Logistic regression, KNN, Naïve Bayes, Tree, Neural network. The relative comparison of this technique is done in the terms of various parameters for example AUC, CA, F1, Recall and precision. The accuracy in the terms of AUC, CA is highest for the Naïve Bayes. This shows the Naïve Bayes is having higher true positives, true negative ratio. The proposed technique is having higher accuracy of 81% which is far above than all the remaining techniques. The confusion matrix for the Naïve Bayes is having true positive count as 729, true negative at 103. This shows that the true positive and true negative count is far above for this technique compared to the other techniques.


Nowadays people share their views and opinions in twitter and other social media platforms, the way of recognizing sentiments and speculation in tweets is Twitter Sentiment Analysis. Determining the contradiction or sentiment of the tweets and then listing them into positive, negative and neutral tweets is the main classifying step in this process. The issue related to sentiment analysis is the naming of the correct congruous sentiment classifier algorithm to list the tweets. The foundation classifier techniques like Logistic regression, Naive Bayes classifier, Random Forest and SVMs are normally used. In this paper, the Naïve Bayes classifier and Logistic Regression has been used to perform sentiment analysis and classify based on the better accuracy of catagorizing Technique. The outcome shows that Naive Bayes classifier works better for this approach. Data pre-processing and feature extraction is realized as a portion of task.


2013 ◽  
pp. 937-947
Author(s):  
S. Prasanthi ◽  
S.Durga Bhavani ◽  
T. Sobha Rani ◽  
Raju S. Bapi

Vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of a protein leading to the concept of druggability. A target protein is druggable if it has the potential to bind the drug-like molecules. Hence kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor in choosing a particular kinase target. In this paper we focus on human kinase drug target sequences since kinases are known to be potential drug targets. Also we do a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in future. The identification of druggable kinases is treated as a classification problem in which druggable kinases are taken as positive data set and non-druggable kinases are chosen as negative data set. The classification problem is addressed using machine learning techniques like support vector machine (SVM) and decision tree (DT) and using sequence-specific features. One of the challenges of this classification problem is due to the unbalanced data with only 48 druggable kinases available against 509 non-drugggable kinases present at Uniprot. The accuracy of the decision tree classifier obtained is 57.65 which is not satisfactory. A two-tier architecture of decision trees is carefully designed such that recognition on the non-druggable dataset also gets improved. Thus the overall model is shown to achieve a final performance accuracy of 88.37. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in literature.


Author(s):  
Jothikumar R. ◽  
Vijay Anand R. ◽  
Visu P. ◽  
Kumar R. ◽  
Susi S. ◽  
...  

Sentiment evaluation alludes to separate the sentiments from the characteristic language and to perceive the mentality about the exact theme. Novel corona infection, a harmful malady ailment, is spreading out of the blue through the quarter, which thought processes respiratory tract diseases that can change from gentle to extraordinary levels. Because of its quick nature of spreading and no conceived cure, it ushered in a vibe of stress and pressure. In this chapter, a framework perusing principally based procedure is utilized to discover the musings of the tweets related to COVID and its effect lockdown. The chapter examines the tweets identified with the hash tags of crown infection and lockdown. The tweets were marked fabulous, negative, or fair, and a posting of classifiers has been utilized to investigate the precision and execution. The classifiers utilized have been under the four models which incorporate decision tree, regression, helpful asset vector framework, and naïve Bayes forms.


Interminable Kidney Disease (CKD) proposes the realm of kidney chance which may even crumble by means of time and through implying the factors. If it continues finishing all the more dreadful Dialysis is and most desperate conclusive outcomes believable it'd flash off kidney misery (End-Stage Renal Disease). Area of CKD in a starting period should help in filtering by means of the complexities and harm.In the pastwork portrayal applied are SVM and Naïve Bayes, it happened that the execution time took by methods for Naïve Bayes is irrelevant appeared differently in relation to SVM, confused events are substantially less with SVM that results in less request execution of Naïve Bayes, inferable from gentle exactness distinction. It can be corrected by methods for taking less improvements. Unsuspecting Bayes is a probabilistic classifier a fundamental count by utilizing Bayes Theorem with a prohibitive independence supposition. The artistic creations for the most segment brings around growing symptomatic exactness and decrease commitment time, this is the guideline factor. An undertaking is made to develop a form evaluating CKD data collected from a particular course of action of people. From the model data, recognizing verification should be conceivable. This work has enchanted on developing up a system relying upon gathering procedures: SVM, Naïve Bayes, glomerular filtration rate (GFR) is the best pointer of how well the kidneys are working.CKD has got no cure but it can be treated based on symptoms to reduce complicationsand


2021 ◽  
Vol 7 (1) ◽  
pp. 1
Author(s):  
Ripto Sudiyarno ◽  
Arief Setyanto ◽  
Emha Taufiq Luthfi

Intrusion detection systems (IDS) atau Sistem pendeteksian intrusi dikenal sebagai teknik yang sangat menonjol dan terkemuka untuk menemukan malicious activities pada jaringan komputer, tidak seperti firewall konvensional, IDS berbeda dalam hal pengidentifikasian serangan secara cerdas dengan pendekatan analitik seperti data mining dan teknik machine learning. Dalam beberapa dekade terakhir, ensemble learning sangat memajukan penelitian pada machine learning dan klasifikasi pola, serta menunjukan peningkatan hasil kinerja dibandingkan single classifier. Pada Penelitian ini dilakukan percobaan peningkatan nilai akurasi terhadap sistem pendeteksian anomali, pertama dilakukan klasifikasi menggunakan single classifier untuk didapati hasil nilai akurasi yang nantinya dibandingkan dengan hasil dari ensemble learning dan feature selection. Penggunaan ensemble learning bertujuan untuk mendapatkan nilai akurasi yang terbaik dari single classifier. Hasil didapatkan dari nilai confusion matrix dan akan dilakukan pengujian dengan cara membandingkan nilai kedua metode diatas. Penelitian berhasil mendapatkan nilai akurasi single classifier (naïve bayes) yaitu 77,4% dan nilai ensemble learning 96,8%. Kata Kunci— ensemble learning, nsl-kdd, naïve bayes, anomali, feature selectionIntrusion detection systems (IDS) are known as very prominent and leading techniques for finding malicious activities on computer networks, unlike conventional firewalls, IDS differs in terms of identifying attacks intelligently with analytic approaches such as machine learning techniques. In the last few decades, ensemble learning has greatly advanced research in machine learning and pattern classification it has shown an improve in performance results compared to a single classifier. In this study an attempt was made to increase the accuracy of anomalous detection systems, first by classification using a single classifier to find the results of accuracy which will be compared with the results of ensemble learning and feature selection. The use of ensemble learning aims to get the best accuracy value from a single classifier. The results are obtained from the value of the confusion matrix and will be tested by comparing the values of the two methods above. The research succeeded in getting a single classifier accuracy value of 77,4% and ensemble learning 96,8%. Keywords— ensemble learning, nsl-kdd, naïve bayes, anomali, feature selection


In today’s world there is rapid increase in the information which makes addressing of security issues more important. Malware detection is an important area for research in effective and secure functioning of computer networks. Research efforts are required to protect the systems from various security attacks. In this paper, we analyze usefulness of Soft Computing and Machine Learning Techniques for network malware detection. Hamamoto et al. [1] used combination of Genetic Algorithm and Fuzzy logic for implementation of network anomaly detection. The research work proposed in this paper extends the concepts discussed in [1]. The proposed work explores use of various Machine Learning algorithms such as K-Nearest Neighbor, Naïve Bayes and Decision Tree for network anomaly detection. The experimental observations are conducted on CIDDS (Coburg Intrusion Detection Data Set) dataset [14]. It is observed that Decision Tree approach gave better results as compared to KNN and Naïve Bayes techniques. Decision Tree technique gives 99% of accuracy and precision of 1 and recall of 1.


Sign in / Sign up

Export Citation Format

Share Document