Classification of Musical Timbre Using Bayesian Networks

In this article, we explore the use of Bayesian networks for identifying the timbre of musical instruments. Peak spectral amplitude in ten frequency windows is extracted for each of 20 time windows to be used as features. Over a large data set of 24,000 audio examples covering the full musical range of 24 different common orchestral instruments, four different Bayesian network structures, including naive Bayes, are examined and compared with two support vector machines and a k-nearest neighbor classifier. Classification accuracy is examined by instrument, instrument family, and data set size. Bayesian networks with conditional dependencies in the time and frequency dimensions achieved 98 percent accuracy in the instrument classification task and 97 percent accuracy in the instrument family identification task. These results demonstrate a significant improvement over the previous approaches in the literature on this data set. Additionally, we tested our Bayesian approach on the widely used Iowa musical instrument data set, with similar results.

Download Full-text

An Incremental Isomap Method for Hyperspectral Dimensionality Reduction and Classification

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.7.445 ◽

2021 ◽

Vol 87 (6) ◽

pp. 445-455

Author(s):

Yi Ma ◽

Zezhong Zheng ◽

Yutang Ma ◽

Mingcang Zhu ◽

Ran Huang ◽

...

Keyword(s):

Manifold Learning ◽

Nearest Neighbor ◽

Hyperspectral Image ◽

Hyperspectral Data ◽

Training Data ◽

Support Vector ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Data Points

Many manifold learning algorithms conduct an eigen vector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N2). We pres- ent in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature varia- tion algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k–nearest-neighbor classifier and achieving the second best performance with support vector machine.

Download Full-text

Evaluating Annotated Dataset of Customer Reviews for Aspect Based Sentiment Analysis

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2122 ◽

2021 ◽

Author(s):

Dimple Chehal ◽

Parul Gupta ◽

Payal Gulati

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Supervised Machine Learning ◽

Support Vector ◽

Product Reviews ◽

K Nearest Neighbor ◽

Customer Reviews ◽

Percent Accuracy

Sentiment analysis of product reviews on e-commerce platforms aids in determining the preferences of customers. Aspect-based sentiment analysis (ABSA) assists in identifying the contributing aspects and their corresponding polarity, thereby allowing for a more detailed analysis of the customer’s inclination toward product aspects. This analysis helps in the transition from the traditional rating-based recommendation process to an improved aspect-based process. To automate ABSA, a labelled dataset is required to train a supervised machine learning model. As the availability of such dataset is limited due to the involvement of human efforts, an annotated dataset has been provided here for performing ABSA on customer reviews of mobile phones. The dataset comprising of product reviews of Apple-iPhone11 has been manually annotated with predefined aspect categories and aspect sentiments. The dataset’s accuracy has been validated using state-of-the-art machine learning techniques such as Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, K-Nearest Neighbor and Multi Layer Perceptron, a sequential model built with Keras API. The MLP model built through Keras Sequential API for classifying review text into aspect categories produced the most accurate result with 67.45 percent accuracy. K- nearest neighbor performed the worst with only 49.92 percent accuracy. The Support Vector Machine had the highest accuracy for classifying review text into aspect sentiments with an accuracy of 79.46 percent. The model built with Keras API had the lowest 76.30 percent accuracy. The contribution is beneficial as a benchmark dataset for ABSA of mobile phone reviews.

Download Full-text

Identification of Leukemia Subtypes from Microscopic Images Using Convolutional Neural Network

Diagnostics ◽

10.3390/diagnostics9030104 ◽

2019 ◽

Vol 9 (3) ◽

pp. 104 ◽

Cited By ~ 11

Author(s):

Ahmed ◽

Yigit ◽

Isik ◽

Alpkocak

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Leukemia Data

Leukemia is a fatal cancer and has two main types: Acute and chronic. Each type has two more subtypes: Lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which requires a large training data set. Therefore, we also investigated the effects of data augmentation for an increasing number of training samples synthetically. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. Besides, we also explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results we obtained from experiments showed that our CNN model performance has 88.25% and 81.74% accuracy, in leukemia versus healthy and multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model has a better performance than other wellknown machine learning algorithms.

Download Full-text

Feature Selection Algorithm for Hyperlipidemia Classification

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.701-702.110 ◽

2014 ◽

Vol 701-702 ◽

pp. 110-113

Author(s):

Qi Rui Zhang ◽

He Xian Wang ◽

Jiang Wei Qin

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Information Gain ◽

Classification Systems ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Document Frequency ◽

Selection Algorithms ◽

Term Weights

This paper reports a comparative study of feature selection algorithms on a hyperlipimedia data set. Three methods of feature selection were evaluated, including document frequency (DF), information gain (IG) and aχ2 statistic (CHI). The classification systems use a vector to represent a document and use tfidfie (term frequency, inverted document frequency, and inverted entropy) to compute term weights. In order to compare the effectives of feature selection, we used three classification methods: Naïve Bayes (NB), k Nearest Neighbor (kNN) and Support Vector Machines (SVM). The experimental results show that IG and CHI outperform significantly DF, and SVM and NB is more effective than KNN when macro-averagingF1 measure is used. DF is suitable for the task of large text classification.

Download Full-text

Comparative analysis on bayesian classification for breast cancer problem

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v8i4.1628 ◽

2019 ◽

Vol 8 (4) ◽

Author(s):

Wan Nor Liyana Wan Hassan Ibeni ◽

Mohd Zaki Mohd Salikon ◽

Aida Mustapha ◽

Saiful Adli Daud ◽

Mohd Najib Mohd Salleh

Keyword(s):

Breast Cancer ◽

Bayesian Networks ◽

Nearest Neighbor ◽

Naive Bayes ◽

Likelihood Estimation ◽

Predictive Distribution ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor

The problem of imbalanced class distribution or small datasets is quite frequent in certain fields especially in medical domain. However, the classical Naive Bayes approach in dealing with uncertainties within medical datasets face with the difficulties in selecting prior distributions, whereby parameter estimation such as the maximum likelihood estimation (MLE) and maximum a posteriori (MAP) often hurt the accuracy of predictions. This paper presents the full Bayesian approach to assess the predictive distribution of all classes using three classifiers; naïve bayes (NB), bayesian networks (BN), and tree augmented naïve bayes (TAN) with three datasets; Breast cancer, breast cancer wisconsin, and breast tissue dataset. Next, the prediction accuracies of bayesian approaches are also compared with three standard machine learning algorithms from the literature; K-nearest neighbor (K-NN), support vector machine (SVM), and decision tree (DT). The results showed that the best performance was the bayesian networks (BN) algorithm with accuracy of 97.281%. The results are hoped to provide as base comparison for further research on breast cancer detection. All experiments are conducted in WEKA data mining tool.

Download Full-text

What factors determine reviewer credibility?

Kybernetes ◽

10.1108/k-08-2019-0537 ◽

2019 ◽

Vol 49 (10) ◽

pp. 2547-2567 ◽

Cited By ~ 1

Author(s):

Himanshu Sharma ◽

Anu G. Aggarwal

Keyword(s):

Nearest Neighbor ◽

Source Credibility ◽

Personal Information ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Content Type ◽

Linear Discriminant ◽

Travel And Tourism ◽

Hotel Booking

Purpose The experiential nature of travel and tourism services has popularized the importance of electronic word-of-mouth (EWOM) among potential customers. EWOM has a significant influence on hotel booking intention of customers as they tend to trust EWOM more than the messages spread by marketers. Amid abundant reviews available online, it becomes difficult for travelers to identify the most significant ones. This questions the credibility of reviewers as various online businesses allow reviewers to post their feedback using nickname or email address rather than using real name, photo or other personal information. Therefore, this study aims to determine the factors leading to reviewer credibility. Design/methodology/approach The paper proposes an econometric model to determine the variables that affect the reviewer’s credibility in the hospitality and tourism sector. The proposed model uses quantifiable variables of reviewers and reviews to estimate reviewer credibility, defined in terms of proportion of number of helpful votes received by a reviewer to the number of total reviews written by him. This covers both aspects of source credibility i.e. trustworthiness and expertness. The authors have used the data set of TripAdvisor.com to validate the models. Findings Regression analysis significantly validated the econometric models proposed here. To check the predictive efficiency of the models, predictive modeling using five commonly used classifiers such as random forest (RF), linear discriminant analysis, k-nearest neighbor, decision tree and support vector machine is performed. RF gave the best accuracy for the overall model. Practical implications The findings of this research paper suggest various implications for hoteliers and managers to help retain credible reviewers in the online travel community. This will help them to achieve long term relationships with the clients and increase their trust in the brand. Originality/value To the best of authors’ knowledge, this study performs an econometric modeling approach to find determinants of reviewer credibility, not conducted in previous studies. Moreover, the study contracts from earlier works by considering it to be an endogenous variable, rather than an exogenous one.

Download Full-text

Variable Interaction Networks in Medical Data

International Journal of Privacy and Health Information Management ◽

10.4018/ijphim.2013070101 ◽

2013 ◽

Vol 1 (2) ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

Stephan M. Winkler ◽

Gabriel Kronberger ◽

Michael Affenzeller ◽

Herbert Stekel

Keyword(s):

Machine Learning ◽

Domain Knowledge ◽

Nearest Neighbor ◽

Influence Factors ◽

Medical Data ◽

Interaction Networks ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Variable Interaction

In this paper the authors describe the identification of variable interaction networks based on the analysis of medical data. The main goal is to generate mathematical models for medical parameters using other available parameters in this data set. For each variable the authors identify those features that are most relevant for modeling it; the relevance of a variable can in this context be defined via the frequency of its occurrence in models identified by evolutionary machine learning methods or via the decrease in modeling quality after removing it from the data set. Several data based modeling approaches implemented in HeuristicLab have been applied for identifying estimators for selected continuous as well as discrete medical variables and cancer diagnoses: Genetic programming, linear regression, k-nearest-neighbor regression, support vector machines (optimized using evolutionary algorithms), and random forests. In the empirical section of this paper the authors describe interaction networks identified for a medical data base storing data of more than 600 patients. The authors see that whatever modeling approach is used, it is possible to identify the most important influence factors and display those in interaction networks which can be interpreted without domain knowledge in machine learning or informatics in general.

Download Full-text

Identification of Cherry Leaf Disease Infected by Podosphaera Pannosa via Convolutional Neural Network

International Journal of Agricultural and Environmental Information Systems ◽

10.4018/ijaeis.2019040105 ◽

2019 ◽

Vol 10 (2) ◽

pp. 98-110 ◽

Cited By ~ 3

Author(s):

Keke Zhang ◽

Lei Zhang ◽

Qiufeng Wu

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Nearest Neighbor ◽

Early Stage ◽

Back Propagation ◽

Support Vector ◽

Automatic Identification ◽

K Nearest Neighbor ◽

Data Set ◽

Leaf Disease

The cherry leaves infected by Podosphaera pannosa will suffer powdery mildew, which is a serious disease threatening the cherry production industry. In order to identify the diseased cherry leaves in early stage, the authors formulate the cherry leaf disease infected identification as a classification problem and propose a fully automatic identification method based on convolutional neural network (CNN). The GoogLeNet is used as backbone of the CNN. Then, transferred learning techniques are applied to fine-tune the CNN from pre-trained GoogLeNet on ImageNet dataset. This article compares the proposed method against three traditional machine learning methods i.e., support vector machine (SVM), k-nearest neighbor (KNN) and back propagation (BP) neural network. Quantitative evaluations conducted on a data set of 1,200 images collected by smart phones, demonstrates that the CNN achieves best precise performance in identifying diseased cherry leaves, with the testing accuracy of 99.6%. Thus, a CNN can be used effectively in identifying the diseased cherry leaves.

Download Full-text

Big Data-oriented Wheel Position and Geometry Calculation for Cutting Tool Groove Manufacturing based on AI Algorithms

10.21203/rs.3.rs-1029477/v1 ◽

2021 ◽

Author(s):

Li Guochao ◽

Zhigang Liu ◽

Jie Lu ◽

Honggen Zhou ◽

Li Sun

Keyword(s):

Big Data ◽

High Performance ◽

Nearest Neighbor ◽

Cutting Tools ◽

Support Vector ◽

Grinding Wheel ◽

K Nearest Neighbor ◽

Data Set ◽

Data Resource ◽

Grinding Machine

Abstract Groove is a key structure of high-performance integral cutting tools. It has to be manufactured by 5-axis grinding machine due to its complex spatial geometry and hard materials. The crucial manufacturing parameters (CMP) are grinding wheel positions and geometries. However, it is a challenging problem to solve the CMP for the designed groove. The traditional trial-and-error or analytical methods have defects such as time-consuming, limited-applying and low accuracy. In this study, the problem is translated into a multiple output regression model of groove manufacture (MORGM) based on the big data technology and AI algorithms. The input are 34 groove geometry features and the output are 5 CMP. Firstly, two groove machining big data sets with different range are established, each of which is includes 46656 records. They are used as data resource for MORGM. Secondly, 7 AI algorithms, including linear regression, k nearest-neighbor regression, decision trees, random forest regression, support vector regression and ANN algorithms are discussed to build the model. Then, 28 experiments are carried out to test the big data set and algorithms. Finally, the best MORGM is built by ANN algorithm and the big data set with a larger range. The results show that CMP can be calculated accurately and conveniently by the built MORGM.

Download Full-text

Machine Learning Application for Classification Prediction of Household’s Welfare Status

Journal on Information Technology and Computer Engineering ◽

10.25077/jitce.4.02.72-82.2020 ◽

2020 ◽

Vol 4 (02) ◽

pp. 72-82

Author(s):

Nofriani Nofriani

Keyword(s):

Machine Learning ◽

Random Forest ◽

Social Welfare ◽

Nearest Neighbor ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

The Government ◽

Classification Prediction

Various approaches have been attempted by the Government of Indonesia to eradicate poverty throughout the country, one of which is equitable distribution of social assistance for target households according to their classification of social welfare status. This research aims to re-evaluate the prior evaluation of five well-known machine learning techniques; Naïve Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbor, and C4.5 Algorithm; on how well they predict the classifications of social welfare statuses. Afterwards, the best-performing one is implemented into an executable machine learning application that may predict the user’s social welfare status. Other objectives are to analyze the reliability of the chosen algorithm in predicting new data set, and generate a simple classification-prediction application. This research uses Python Programming Language, Scikit-Learn Library, Jupyter Notebook, and PyInstaller to perform all the methodology processes. The results shows that Random Forest Algorithm is the best machine learning technique for predicting household’s social welfare status with classification accuracy of 74.20% and the resulted application based on it could correctly predict 60.00% of user’s social welfare status out of 40 entries.

Download Full-text