Naive Bayes classification model for isotopologue detection in LC-HRMS data

Author(s):  
Denice van Herwerden ◽  
Jake O'Brien ◽  
Phil Choi ◽  
Kevin Thomas ◽  
Peter Schoenmakers ◽  
...  

Isotopologue identification or removal is a necessary step to reduce the number of features that need to be identified in samples analyzed with non-targeted analysis. Currently available approaches rely on either predicted isotopic patterns or an arbitrary mass tolerance, requiring information on the molecular formula or instrumental error, respectively. Therefore, a Naive Bayes isotopologue classification model was developed that does not depend on any thresholds or molecular formula information. This classification model uses elemental mass defects of six elemental ratios and can successfully identify isotopologues in both theoretical isotopic patterns and wastewater influent samples, outperforming one of the most commonly used approaches (i.e., 1.0033 Da mass difference method - CAMERA).
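For context, the fixed mass-difference baseline mentioned in the abstract can be sketched in a few lines. This is only an illustration of the general idea, not CAMERA's implementation and not the authors' Naive Bayes model; the function name, the `tol` value, and the example peak list are assumptions made for the sketch.

```python
# Mass difference quoted in the abstract (in Da).
C13_C12_DELTA = 1.0033


def find_isotopologue_pairs(mz_values, tol=0.005, charge=1):
    """Return index pairs (i, j) whose m/z spacing matches delta / charge.

    `tol` is an arbitrary mass tolerance in Da (hypothetical value); the
    need for such a threshold is the limitation the abstract points out.
    """
    order = sorted(range(len(mz_values)), key=lambda k: mz_values[k])
    target = C13_C12_DELTA / charge
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            diff = mz_values[j] - mz_values[i]
            if diff > target + tol:
                break  # values are sorted, so later peaks are even farther away
            if abs(diff - target) <= tol:
                pairs.append((i, j))
    return pairs


# Hypothetical peak list (m/z values), not taken from the study.
peaks = [195.0877, 196.0910, 210.1000, 197.0944]
pairs = find_isotopologue_pairs(peaks)
```

Here the pairs link peak 0 to peak 1 and peak 1 to peak 3, while the unrelated peak at 210.1000 is left alone. Changing `tol` changes which pairs are reported, which is why a threshold-free classifier is attractive.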

Healthcare ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 884
Author(s):  
Antonio García-Domínguez ◽  
Carlos E. Galván-Tejada ◽  
Ramón F. Brena ◽  
Antonio A. Aguileta ◽  
Jorge I. Galván-Tejada ◽  
...  

Children’s healthcare is a relevant issue, especially the prevention of domestic accidents, which has been defined as a global health problem. Children’s activity classification generally relies on sensors embedded in children’s clothing, which can yield erroneous measurements if the sensors are damaged or mishandled. A non-invasive data source for a children’s activity classification model makes the monitoring system in which it is applied more reliable. This work proposes environmental sound as a data source for building children’s activity classification models, using feature selection methods and classification techniques based on Bayesian networks. The focus is on recognizing activities that could trigger domestic accidents, with application in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Likewise, models were generated using three classifiers: naive Bayes, semi-naive Bayes and tree-augmented naive Bayes. Most of the generated models, which combine the feature selection methods with the classifiers, achieve an accuracy greater than 97%, which supports the effectiveness of the proposal for recognizing activities that could trigger domestic accidents.


2021 ◽  
Vol 5 (3) ◽  
pp. 527-533
Author(s):  
Yoga Religia ◽  
Amali Amali

The quality of an airline's services cannot be measured from the company's point of view; it must be judged from the point of view of customer satisfaction. Data mining techniques make it possible to predict airline customer satisfaction with a classification model. The Naïve Bayes algorithm has demonstrated outstanding classification accuracy, but its independence assumption is rarely discussed. Some literature suggests attribute weighting to relax the independence assumption, which can be done through feature selection using particle swarm optimization (PSO) or a genetic algorithm (GA). This study compared PSO and GA optimization of Naïve Bayes for classifying the Airline Passenger Satisfaction data taken from www.kaggle.com. In testing, the best performance was obtained by the Naïve Bayes algorithm with PSO optimization, with an accuracy of 86.13%, a precision of 87.90%, a recall of 87.29%, and an AUC of 0.923.


Author(s):  
Han-joon Kim

This chapter introduces two practical techniques for improving Naïve Bayes text classifiers, which are widely used for text classification. Naïve Bayes is regarded as a practical text classification algorithm because of its simple classification model, reasonable classification accuracy, and easily updated model. Many researchers therefore have a strong incentive to improve Naïve Bayes by combining it with meta-learning approaches such as EM (Expectation Maximization) and Boosting. The EM approach combines Naïve Bayes with the EM algorithm, and the Boosting approach uses Naïve Bayes as the base classifier in the AdaBoost algorithm. Both approaches use a special uncertainty measure suited to Naïve Bayes learning. Within the Naïve Bayes learning framework, these approaches are expected to be practical solutions to the lack of training documents in text classification systems.
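To illustrate why the base classifier is considered simple and easy to update, here is a minimal sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing. It covers only the plain base classifier, not the chapter's EM or Boosting extensions or its uncertainty measure; the class name, the `alpha` parameter, and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Count-based multinomial Naive Bayes (illustrative sketch only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_docs = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def update(self, tokens, label):
        # Training only increments counters, so adding a document is cheap.
        self.class_docs[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log(
                    (counts[tok] + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy documents (hypothetical), already tokenized.
model = TinyMultinomialNB()
model.update(["cheap", "pills", "buy"], "spam")
model.update(["meeting", "agenda", "monday"], "ham")
model.update(["buy", "cheap", "now"], "spam")

label_a = model.predict(["cheap", "buy"])
label_b = model.predict(["agenda"])
```

Because the model state is just a set of counters, a new labelled document can be folded in with one `update` call and no retraining pass.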


Energies ◽  
2020 ◽  
Vol 13 (8) ◽  
pp. 2067
Author(s):  
Nilsa Duarte da Silva Lima ◽  
Irenilza de Alencar Nääs ◽  
João Gilberto Mendes dos Reis ◽  
Raquel Baracat Tosi Rodrigues da Silva

The present study aimed to assess and classify energy-environmental efficiency levels to reduce greenhouse gas emissions in the production, commercialization, and use of biofuels certified by the Brazilian National Biofuel Policy (RenovaBio). The parameters of the level of energy-environmental efficiency were standardized and categorized according to the Energy-Environmental Efficiency Rating (E-EER). The rating scale ranged from lower efficiency (D) to the highest efficiency (A+). The J48 decision tree and naive Bayes algorithms were used to build the classification models. Classifying the E-EER scores with the J48 decision tree and with the naive Bayes classifier produced models that were efficient at estimating the efficiency level of Brazilian ethanol producers and importers certified by RenovaBio. The rules generated by the models can assess the level classes (efficiency scores) according to the scale discretized into high efficiency (Classification A), average efficiency (Classification B), and standard efficiency (Classification C). These results might be used to generate an ethanol energy-environmental efficiency label for end consumers and resellers of the product, to assist purchase decisions based on its performance. Naive Bayes was the best classification model, outperforming the J48 decision tree. The classification of the Energy Efficiency Note levels using the naive Bayes algorithm produced a model capable of estimating the efficiency level of Brazilian ethanol to create labels.


In Bangladesh, dropout at the post-graduation level, or failure to complete a post-graduation degree, is considered a serious problem in the education sector. This work can support identifying the specific individuals as well as the institutional factors that may lead to enrollment in or dropout from a post-graduation degree. A real dataset is used. Seven classification algorithms are applied: Naïve Bayes, Multilayer Perceptron, Logistic, Locally Weighted Learning (LWL), Random Forest, Random Tree, and Part. A confusion matrix is calculated for each classification model. Then, all seven performance evaluation metrics (accuracy, sensitivity, precision, specificity, F1 score, FPR, and FNR) are computed. Each classifier's performance is analyzed using these metrics. The Naïve Bayes, LWL, and Part classifiers perform better than all the others, attaining 86.36% accuracy, whereas the Random Tree classifier performs worst, at 74.24% accuracy. Further analysis of the performance evaluation metrics shows that the LWL classifier performed best among all the classifiers in this context.
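The seven metrics named above all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the seven metrics from binary confusion-matrix counts."""

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # also called recall or TPR
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "fpr": ratio(fp, fp + tn),     # false positive rate = 1 - specificity
        "fnr": ratio(fn, fn + tp),     # false negative rate = 1 - sensitivity
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

Two classifiers can share the same accuracy while differing in sensitivity or FPR, which is one reason to look beyond accuracy when ranking tied models.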


2020 ◽  
Vol 12 (3) ◽  
Author(s):  
Nanda Yonda Hutama ◽  
Kemas Muslim Lhaksmana ◽  
Isman Kurniawan

Employees' qualities affect a company's performance, and with a large number of applicants it is difficult to find suitable candidates. To help, companies carry out psychological tests to learn applicants' personalities, since personality is considered to be related to work performance. But psychological testing requires a lot of effort, cost, and human resources. A system that can classify personalities from text can therefore help reduce the effort needed. Similar studies used the Big Five personality traits as the theoretical basis but considered only one of the traits, using the k-NN method with 65% accuracy. Based on these studies, accuracy can be improved by finding the best parameters using all of the Big Five personality traits. This research is based on the Big Five personality traits and related traits, namely conscientiousness and agreeableness. The data used is text data that has been labelled, pre-processed, and feature selected. The clean text data is used to create classification models using multinomial Naive Bayes and decision trees. Six models were built based on three work cultures: decision trees with accuracies of 33%, 66%, and 80%, and multinomial naïve Bayes with accuracies of 83%, 50%, and 60%, with multinomial naïve Bayes reported as the better performer.


Author(s):  
Oman Somantri ◽  
Dyah Apriliani

Assessing consumer sentiment about culinary food from social media provides useful information for anyone who wants it, especially migrants and tourists; it is also valuable to food stall and restaurant owners as input for improving food quality. To obtain this information, a sentiment analysis classification model using the naïve Bayes (NB) algorithm was applied. The problem is that the accuracy of classifying consumer ratings of culinary food is still not optimal, because the weight values in the data preprocessing step are not optimal. This paper proposes a hybrid feature selection model that combines information gain (IG) and a genetic algorithm (GA) to address the suboptimal selection of feature attributes. The experiments showed that, compared with other algorithms, the proposed model produced the best accuracy, 93%.


Author(s):  
Alaa Khudhair Abbas ◽  
Ali Khalil Salih ◽  
Harith A. Hussein ◽  
Qasim Mohammed Hussein ◽  
Saba Alaa Abdulwahhab

Twitter social media data generally contains ambiguous text, which can make it difficult to identify positive or negative sentiments. More than one billion social media messages need to be stored in a proper database and processed correctly in order to be analyzed. In this paper, an ensemble majority vote classifier is proposed to enhance sentiment classification performance and accuracy. The proposed model combines four classifiers that use different techniques (naive Bayes, decision trees, multilayer perceptron and logistic regression) into a single ensemble classifier. In addition, the four classifiers are compared with one another to evaluate their individual performance. The results show that, among the individual classifiers, naive Bayes performs best. However, when the proposed ensemble majority vote classifier is compared with the four individual classifiers, its performance is better than that of any individual classifier.
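The majority vote step itself is simple to sketch. The example below assumes hard voting over the predicted labels of four already-trained classifiers and breaks ties in favor of the earliest classifier in the list; the abstract does not say how ties are handled, so that rule, the function name, and the example predictions are all assumptions.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists into one label per sample.

    Ties go to the label predicted by the earliest classifier in the
    input order (an assumption made for this sketch).
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        counts = Counter(votes)
        top = max(counts.values())
        # First label, in classifier order, that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical predictions for three tweets from four classifiers.
naive_bayes = ["pos", "neg", "pos"]
decision_tree = ["pos", "pos", "neg"]
mlp = ["neg", "neg", "pos"]
logistic_regression = ["pos", "neg", "neg"]

ensemble = majority_vote(
    [naive_bayes, decision_tree, mlp, logistic_regression]
)
```

With an even number of voters, ties like the one on the third tweet are possible, so the tie-breaking rule is a real design choice; listing the strongest individual classifier first is one reasonable option.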

