Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data

High-throughput analysis of biomass is necessary to ensure consistent and uniform feedstocks for agricultural and bioenergy applications and is needed to inform genomics and systems biology models. Pyrolysis followed by mass spectrometry such as molecular beam mass spectrometry (py-MBMS) analyses are becoming increasingly popular for the rapid analysis of biomass cell wall composition and typically require the use of different data analysis tools depending on the need and application. Here, the authors report the py-MBMS analysis of several types of lignocellulosic biomass to gain an understanding of spectral patterns and variation with associated biomass composition and use machine learning approaches to classify, differentiate, and predict biomass types on the basis of py-MBMS spectra. Py-MBMS spectra were also corrected for instrumental variance using generalized linear modeling (GLM) based on the use of select ions relative abundances as spike-in controls. Machine learning classification algorithms e.g., random forest, k-nearest neighbor, decision tree, Gaussian Naïve Bayes, gradient boosting, and multilayer perceptron classifiers were used. The k-nearest neighbors (k-NN) classifier generally performed the best for classifications using raw spectral data, and the decision tree classifier performed the worst. After normalization of spectra to account for instrumental variance, all the classifiers had comparable and generally acceptable performance for predicting the biomass types, although the k-NN and decision tree classifiers were not as accurate for prediction of specific sample types. Gaussian Naïve Bayes (GNB) and extreme gradient boosting (XGB) classifiers performed better than the k-NN and the decision tree classifiers for the prediction of biomass mixtures. The data analysis workflow reported here could be applied and extended for comparison of biomass samples of varying types, species, phenotypes, and/or genotypes or subjected to different treatments, environments, etc. to further elucidate the sources of spectral variance, patterns, and to infer compositional information based on spectral analysis, particularly for analysis of data without a priori knowledge of the feedstock composition or identity.

Download Full-text

Penerapan Klasifikasi Kueri untuk Meningkatkan Efektivitas Mesin Pencari

Seminar Nasional Official Statistics ◽

10.34123/semnasoffstat.v2021i1.914 ◽

2021 ◽

Vol 2021 (1) ◽

pp. 1012-1018

Author(s):

Handy Geraldy ◽

Lutfi Rahmatuti Maghfiroh

Keyword(s):

Machine Learning ◽

Random Forest ◽

Naive Bayes ◽

Naïve Bayes ◽

Gradient Boosting

Dalam menjalankan peran sebagai penyedia data, Badan Pusat Statistik (BPS) memberikan layanan akses data BPS bagi masyarakat. Salah satu layanan tersebut adalah fitur pencarian di website BPS. Namun, layanan pencarian yang diberikan belum memenuhi harapan konsumen. Untuk memenuhi harapan konsumen, salah satu upaya yang dapat dilakukan adalah meningkatkan efektivitas pencarian agar lebih relevan dengan maksud pengguna. Oleh karena itu, penelitian ini bertujuan untuk membangun fungsi klasifikasi kueri pada mesin pencari dan menguji apakah fungsi tersebut dapat meningkatkan efektivitas pencarian. Fungsi klasifikasi kueri dibangun menggunakan model machine learning. Kami membandingkan lima algoritma yaitu SVM, Random Forest, Gradient Boosting, KNN, dan Naive Bayes. Dari lima algoritma tersebut, model terbaik diperoleh pada algoritma SVM. Kemudian, fungsi tersebut diimplementasikan pada mesin pencari yang diukur efektivitasnya berdasarkan nilai precision dan recall. Hasilnya, fungsi klasifikasi kueri dapat mempersempit hasil pencarian pada kueri tertentu, sehingga meningkatkan nilai precision. Namun, fungsi klasifikasi kueri tidak memengaruhi nilai recall.

Download Full-text

Successful Case Study of Machine Learning Application to Streamline and Improve History Matching Process for Complex Gas-Condensate Reservoirs in Hai Thach Field, Offshore Vietnam

10.2118/204835-ms ◽

2021 ◽

Author(s):

Son Hoang ◽

Tung Tran ◽

Tan Nguyen ◽

Tu Truong ◽

Duy Pham ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

History Matching ◽

Dynamic Models ◽

Naive Bayes ◽

Naïve Bayes ◽

Gas Condensate ◽

Decision Tree Classifier ◽

Matching Process ◽

Tree Classifier

Abstract This paper reports a successful case study of applying machine learning to improve the history matching process, making it easier, less time-consuming, and more accurate, by determining whether Local Grid Refinement (LGR) with transmissibility multiplier is needed to history match gas-condensate wells producing from geologically complex reservoirs as well as determining the required LGR setup to history match those gas-condensate producers. History matching Hai Thach gas-condensate production wells is extremely challenging due to the combined effect of condensate banking, sub-seismic fault network, complex reservoir distribution and connectivity, uncertain HIIP, and lack of PVT data for most reservoirs. In fact, for some wells, many trial simulation runs were conducted before it became clear that LGR with transmissibility multiplier was required to obtain good history matching. In order to minimize this time-consuming trial-and-error process, machine learning was applied in this study to analyze production data using synthetic samples generated by a very large number of compositional sector models so that the need for LGR could be identified before the history matching process begins. Furthermore, machine learning application could also determine the required LGR setup. The method helped provide better models in a much shorter time, and greatly improved the efficiency and reliability of the dynamic modeling process. More than 500 synthetic samples were generated using compositional sector models and divided into separate training and test sets. Multiple classification algorithms such as logistic regression, Gaussian Naive Bayes, Bernoulli Naive Bayes, multinomial Naive Bayes, linear discriminant analysis, support vector machine, K-nearest neighbors, and Decision Tree as well as artificial neural networks were applied to predict whether LGR was used in the sector models. The best algorithm was found to be the Decision Tree classifier, with 100% accuracy on the training set and 99% accuracy on the test set. The LGR setup (size of LGR area and range of transmissibility multiplier) was also predicted best by the Decision Tree classifier with 91% accuracy on the training set and 88% accuracy on the test set. The machine learning model was validated using actual production data and the dynamic models of history-matched wells. Finally, using the machine learning prediction on wells with poor history matching results, their dynamic models were updated and significantly improved.

Download Full-text

Future Prediction of Diabetics using XG Booster Classifiers

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c5144.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2128-2132

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

The Body ◽

Machine Learning Algorithms ◽

Support Vector ◽

Common Disease ◽

Data Set ◽

Glucose Content

Diabetes is a most common disease that occurs to most of the humans now a day. The predictions for this disease are proposed through machine learning techniques. Through this method the risk factors of this disease are identified and can be prevented from increasing. Early prediction in such disease can be controlled and save human’s life. For the early predictions of this disease we collect data set having 8 attributes diabetic of 200 patients. The patients’ sugar level in the body is tested by the features of patient’s glucose content in the body and according to the age. The main Machine learning algorithms are Support vector machine (SVM), naive bayes (NB), K nearest neighbor (KNN) and Decision Tree (DT). In the exiting the Naive Bayes the accuracy levels are 66% but in the Decision tree the accuracy levels are 70 to 71%. The accuracy levels of the patients are not proper in range. But in XG boost classifiers even after the Naïve Bayes 74 Percentage and in Decision tree the accuracy levels are 89 to 90%. In the proposed system the accuracy ranges are shown properly and this is only used mostly. A dataset of 729 patients can be stored in Mongo DB and in that 129 patients repots are taken for the prediction purpose and the remaining are used for training. The training datasets are used for the prediction purposes.

Download Full-text

Finding Donors for CharityML using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.36048 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 4988-4996

Author(s):

P. Chandra Sandeep

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Naive Bayes ◽

Scaling Up ◽

Naïve Bayes ◽

Initial Values ◽

Type Form

CharityML is a fictional non-earnings company created for the only motive of the usage of for this project. Many non-earnings groups try at the donations they get hold of and specifically they need to be very choosy in whom to reach for the donations. In our project, we used numerous supervised algorithms of our concern to as it should be model the individuals' profits with the usage of records accumulated from the 1994 U.S. Census. You will then select the first-rate set of rules from the initial values and then by using the initial values optimize this set of rules for better prediction. Your purpose with this implementation is to assemble a version that asit should be predicts whether or not a man or woman makes extra than 50,000 dollars. This type form undertakings are going to help in a non-earnings company setup, wherein groups live on donations. Understanding a character's profits can assist non-earnings company higher apprehend how huge of a grant to request, or whether or not no longer they need to attain out to start with. While it is able to be hard to decide a character's standard profits bracket form the known sources, we will infer this price from different publicly to be had features. The dataset for this assignment originates from the UCI Machine Learning Repository. The dataset become donated with the aid of using Ron Kohavi and Barry Becker, after being posted withinside the article "Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid". The records we inspect right here includes few modifications to the raw dataset, which include disposing of the 'hgtre' attribute and information with lacking or ill-formatted fields.

Download Full-text

Prediction on Cardiovascular disease using Decision tree and Naïve Bayes classifiers

Journal of Physics Conference Series ◽

10.1088/1742-6596/2161/1/012015 ◽

2022 ◽

Vol 2161 (1) ◽

pp. 012015

Author(s):

V Sai Krishna Reddy ◽

P Meghana ◽

N V Subba Reddy ◽

B Ashwath Rao

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Risk Factors ◽

Heart Failure ◽

Cardiovascular Disease ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Correct Decision ◽

Medical Field

Abstract Machine Learning is an application of Artificial Intelligence where the method begins with observations on data. In the medical field, it is very important to make a correct decision within less time while treating a patient. Here ML techniques play a major role in predicting the disease by considering the vast amount of data that is produced by the healthcare field. In India, heart disease is the major cause of death. According to WHO, it can predict and prevent stroke by timely actions. In this paper, the study is useful to predict cardiovascular disease with better accuracy by applying ML techniques like Decision Tree and Naïve Bayes and also with the help of risk factors. The dataset that we considered is the Heart Failure Dataset which consists of 13 attributes. In the process of analyzing the performance of techniques, the collected data should be pre-processed. Later, it should follow by feature selection and reduction.

Download Full-text

Count Vectorized Spam and Ham Discernment of Short Message Service using Machine Learning Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7287.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 557-561

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Gradient Boosting ◽

Svm Classifier ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Machine Learning Classification ◽

Tree Classifier

With the growing volume and the amount of spam message, the demand for identifying the effective method for spam detection is in claim. The growth of mobile phone and Smartphone has led to the drastic increase in the SMS spam messages. The advancement and the clean process of mobile message servicing channel have attracted the hackers to perform their hacking through SMS messages. This leads to the fraud usage of other accounts and transaction that result in the loss of service and profit to the owners. With this background, this paper focuses on predicting the Spam SMS messages. The SMS Spam Message Detection dataset from KAGGLE machine learning Repository is used for prediction analysis. The analysis of Spam message detection is achieved in four ways. Firstly, the distribution of the target variable Spam Type the dataset is identified and represented by the graphical notations. Secondly, the top word features for the Spam and Ham messages in the SMS messages is extracted using Count Vectorizer and it is displayed using spam and Ham word cloud. Thirdly, the extracted Counter vectorized feature importance SMS Spam Message detection dataset is fitted to various classifiers like KNN classifier, Random Forest classifier, Linear SVM classifier, Ada Boost classifier, Kernel SVM classifier, Logistic Regression classifier, Gaussian Naive Bayes classifier, Decision Tree classifier, Extra Tree classifier, Gradient Boosting classifier and Multinomial Naive Bayes classifier. Performance analysis is done by analyzing the performance metrics like Accuracy, FScore, Precision and Recall. The implementation is done by python in Anaconda Spyder Navigator. Experimental Results shows that the Multinomial Naive Bayes classifier have achieved the effective prediction with the precision of 0.98, recall of 0.98, FScore of 0.98 , and Accuracy of 98.20%..

Download Full-text

Ammonoid Taxonomy with Supervised and Unsupervised Machine Learning Algorithms

10.31233/osf.io/ewkx9 ◽

2021 ◽

Author(s):

Floe Foxon

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Learning Algorithms ◽

Clustering Algorithms ◽

Measurement Data ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Unsupervised Machine Learning

Ammonoid identification is crucial to biostratigraphy, systematic palaeontology, and evolutionary biology, but may prove difficult when shell features and sutures are poorly preserved. This necessitates novel approaches to ammonoid taxonomy. This study aimed to taxonomize ammonoids by their conch geometry using supervised and unsupervised machine learning algorithms. Ammonoid measurement data (conch diameter, whorl height, whorl width, and umbilical width) were taken from the Paleobiology Database (PBDB). 11 species with ≥50 specimens each were identified providing N=781 total unique specimens. Naive Bayes, Decision Tree, Random Forest, Gradient Boosting, K-Nearest Neighbours, and Support Vector Machine classifiers were applied to the PBDB data with a 5x5 nested cross-validation approach to obtain unbiased generalization performance estimates across a grid search of algorithm parameters. All supervised classifiers achieved ≥70% accuracy in identifying ammonoid species, with Naive Bayes demonstrating the least over-fitting. The unsupervised clustering algorithms K-Means, DBSCAN, OPTICS, Mean Shift, and Affinity Propagation achieved Normalized Mutual Information scores of ≥0.6, with the centroid-based methods having most success. This presents a reasonably-accurate proof-of-concept approach to ammonoid classification which may assist identification in cases where more traditional methods are not feasible.

Download Full-text

A Machine Learning Approach for Improving the Movement of Humanoid NAO’s Gaits

Wireless Communications and Mobile Computing ◽

10.1155/2021/1496364 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Fatmah Abdulrahman Baothman

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Average Velocity ◽

Walking Speed ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Models ◽

Ann Model ◽

The Real ◽

Optimal Average

A humanoid robot’s development requires an incredible combination of interdisciplinary work from engineering to mathematics, software, and machine learning. NAO is a humanoid bipedal robot designed to participate in football competitions against humans by 2050, and speed is crucial for football sports. Therefore, the focus of the paper is on improving NAO speed. This paper is aimed at testing the hypothesis of whether the humanoid NAO walking speed can be improved without changing its physical configuration. The applied research method compares three classification techniques: artificial neural network (ANN), Naïve Bayes, and decision tree to measure and predict NAO’s best walking speed, then select the best method, and enhance it to find the optimal average velocity speed. According to Aldebaran documentation, the real NAO’s robot default walking speed is 9.52 cm/s. The proposed work was initiated by studying NAO hardware platform limitations and selecting Nao’s gait 12 parameters to measure the accuracy metrics implemented in the three classification models design. Five experiments were designed to model and trace the changes for the 12 parameters. The preliminary NAO’s walking datasets open-source available at GitHub, the NAL, and RoboCup datasheets are implemented. All generated gaits’ parameters for both legs and feet in the experiments were recorded using the Choregraphe software. This dataset was divided into 30% for training and 70% for testing each model. The recorded gaits’ parameters were then fed to the three classification models to measure and predict NAO’s walking best speed. After 500 training cycles for the Naïve Bayes, the decision tree, and ANN, the RapidMiner scored 48.20%, 49.87%, and 55.12%, walking metric speed rate, respectively. Next, the emphasis was on enhancing the ANN model to reach the optimal average velocity walking speed for the real NAO. With 12 attributes, the maximum accuracy metric rate of 65.31% was reached with only four hidden layers in 500 training cycles with a 0.5 learning rate for the best walking learning process, and the ANN model predicted the optimal average velocity speed of 51.08% without stiffness: V 1 = 22.62 cm / s , V 2 = 40 cm / s , and V = 30 cm / s . Thus, the tested hypothesis holds with the ANN model scoring the highest accuracy rate for predicting NAO’s robot walking state speed by taking both legs to gauge joint 12 parameter values.

Download Full-text

Data Driven Approach for Eye Disease Classification with Machine Learning

Applied Sciences ◽

10.3390/app9142789 ◽

2019 ◽

Vol 9 (14) ◽

pp. 2789 ◽

Cited By ~ 3

Author(s):

Sadaf Malik ◽

Nadia Kanwal ◽

Mamoona Naveed Asghar ◽

Mohammad Ali A. Sadiq ◽

Irfan Karamat ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Multiple Features ◽

Standard Format ◽

Free Data

Medical health systems have been concentrating on artificial intelligence techniques for speedy diagnosis. However, the recording of health data in a standard form still requires attention so that machine learning can be more accurate and reliable by considering multiple features. The aim of this study is to develop a general framework for recording diagnostic data in an international standard format to facilitate prediction of disease diagnosis based on symptoms using machine learning algorithms. Efforts were made to ensure error-free data entry by developing a user-friendly interface. Furthermore, multiple machine learning algorithms including Decision Tree, Random Forest, Naive Bayes and Neural Network algorithms were used to analyze patient data based on multiple features, including age, illness history and clinical observations. This data was formatted according to structured hierarchies designed by medical experts, whereas diagnosis was made as per the ICD-10 coding developed by the American Academy of Ophthalmology. Furthermore, the system is designed to evolve through self-learning by adding new classifications for both diagnosis and symptoms. The classification results from tree-based methods demonstrated that the proposed framework performs satisfactorily, given a sufficient amount of data. Owing to a structured data arrangement, the random forest and decision tree algorithms’ prediction rate is more than 90% as compared to more complex methods such as neural networks and the naïve Bayes algorithm.

Download Full-text