Efficient Approach to Automatic Speech Emotion Recognition (ASR) Using Mutual Information

2021 ◽  
Vol 9 (1) ◽  
pp. 595-603
Author(s):  
Shivangi Srivastav, Rajiv Ranjan Tewari

Speech is a significant characteristic for distinguishing a person in daily human-to-human interaction and communication. Like other biometric traits, such as the face, iris, and fingerprints, the voice can therefore be used as a biometric measure for recognizing or identifying a person. Speaker recognition is a form of voice recognition in which the speaker is identified from the utterance rather than from its content. Automatic Speaker Recognition (ASR) identifies people based on features extracted from speech utterances. Speech signals are rich communication media that constantly convey useful information, such as a speaker's emotion, gender, accent, and other distinctive attributes. In any speaker-identification task, the essential step is to extract useful features and build meaningful patterns for the speaker models. A theoretical description, a categorization of emotional states, and the modalities of emotional expression are also presented. A speech emotion recognition (SER) framework, based on different classifiers and different feature extraction techniques, is developed to conduct this investigation. In this work, various machine learning algorithms are investigated to identify the decision boundary in the feature space of audio signals. Moreover, the novelty of this work lies in improving the performance of classical machine learning algorithms using information theory based feature selection methods. The highest accuracy achieved is 96 percent, using the Random Forest algorithm combined with the Joint Mutual Information feature selection method.
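As a rough illustration of the pipeline described above, and not the authors' implementation, the sketch below selects audio features by their mutual information with the emotion label and trains a Random Forest on the reduced set. scikit-learn's `mutual_info_classif` scores each feature individually, so it only approximates the Joint Mutual Information criterion reported in the paper; the feature matrix `X` (e.g. MFCC statistics per utterance) and label vector `y` are assumed to be precomputed.

```python
# Sketch: mutual-information feature selection + Random Forest for SER.
# Assumes X (n_samples x n_features audio descriptors) and y (emotion labels)
# have already been extracted from the speech corpus.
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def train_ser_model(X, y, k=40, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    # Score each feature by its mutual information with the emotion label
    # and keep the k highest-scoring features (k is illustrative).
    selector = SelectKBest(mutual_info_classif, k=min(k, X.shape[1]))
    X_tr_sel = selector.fit_transform(X_tr, y_tr)
    X_te_sel = selector.transform(X_te)

    clf = RandomForestClassifier(n_estimators=300, random_state=seed)
    clf.fit(X_tr_sel, y_tr)
    return accuracy_score(y_te, clf.predict(X_te_sel))
```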

2021 ◽  
Author(s):  
Ravi Arkalgud ◽  
Andrew McDonald ◽  
Ross Brackenridge ◽  
...  

Automation is becoming an integral part of our daily lives as technology and techniques rapidly develop, and many automation workflows are now routinely applied within the geoscience domain. The success of any automated modelling workflow fundamentally hinges on the appropriate choice of parameters and the speed of processing, and the entire process demands that the data fed into a machine learning model is of good quality. Advances in well logging technology over the decades have enabled the collection of vast amounts of data across wells and fields, which poses a major issue for automating petrophysical workflows: it must be ensured that the data being fed in is appropriate and fit for purpose. The selection of features (logging curves) and parameters for machine learning algorithms has therefore become a topic at the forefront of related research. Inappropriate feature selection can lead to erroneous results and reduced precision, and has proved to be computationally expensive. Experienced Eye (EE) is a novel methodology, derived from Domain Transfer Analysis (DTA), which seeks to identify the optimum input curves for modelling. During the EE solution process, relationships between the input variables and target variables are developed based on characteristics and attributes of the inputs rather than on statistical averages. The relationships so developed can then be ranked and the best-ranked variables selected for the modelling process. This paper focuses on three distinct petrophysical data scenarios in which inputs are ranked prior to modelling: prediction of continuous permeability from discrete core measurements, porosity from multiple logging measurements, and finally the prediction of key geomechanical properties. Each input curve is ranked against a target feature. For each case study, the best-ranked features were carried forward to the modelling stage, and the results were validated against conventional interpretation methods. Ranked features were also compared between different machine learning algorithms: DTA, neural networks, and multiple linear regression. Results are compared with the available data for the various case studies. The new feature selection approach is shown to improve the accuracy and precision of prediction results from multiple modelling algorithms.
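The Experienced Eye / DTA ranking itself is not publicly specified, but the general workflow of scoring candidate logging curves against a target property and carrying only the top-ranked curves into modelling can be sketched with a generic stand-in criterion (mutual information for regression). The curve mnemonics and the `well_df` DataFrame in the usage comment are purely illustrative.

```python
# Illustrative stand-in for ranking input curves against a target property.
# This is NOT the Experienced Eye / DTA algorithm, only a generic ranking.
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

def rank_curves(df: pd.DataFrame, candidate_curves, target_curve, top_n=5):
    """Return the top_n candidate curves most informative about the target."""
    data = df[candidate_curves + [target_curve]].dropna()
    scores = mutual_info_regression(data[candidate_curves], data[target_curve])
    ranking = pd.Series(scores, index=candidate_curves).sort_values(ascending=False)
    return ranking.head(top_n)

# Hypothetical usage: rank standard wireline curves against core permeability.
# ranked = rank_curves(well_df, ["GR", "RHOB", "NPHI", "DT", "RT"], "CORE_PERM")
```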


Author(s):  
Durmuş Özkan Şahin ◽  
Erdal Kılıç

In this study, the authors give both theoretical and experimental information about text mining, one of the topics of natural language processing. Three text mining problems are addressed for Turkish: news classification, sentiment analysis, and author recognition. The aim is to reduce the running time and increase the performance of machine learning algorithms. Four machine learning algorithms and two feature selection metrics are used to solve these text classification problems. The classification algorithms are random forest (RF), logistic regression (LR), naive Bayes (NB), and sequential minimal optimization (SMO). Chi-square and information gain metrics are used as the feature selection methods. The highest classification performance achieved in this study is 0.895 according to the F-measure metric. This result is obtained by using the SMO classifier and the information gain metric for news classification. This study is important in terms of comparing the performances of classification algorithms and feature selection methods.
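A minimal scikit-learn analogue of this pipeline might look as follows; the original study used Weka-style implementations, and `LinearSVC` stands in only roughly for SMO. The `texts` and `labels` variables, and the choice of keeping 1000 terms, are illustrative assumptions.

```python
# Sketch: chi-square feature selection for Turkish text classification.
# texts: list of documents, labels: list of class labels (assumed available).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC  # rough analogue of Weka's SMO
from sklearn.model_selection import cross_val_score

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True)),
    ("chi2", SelectKBest(chi2, k=1000)),  # keep the 1000 highest-scoring terms
    ("svm", LinearSVC()),
])

# Hypothetical usage, once texts and labels are loaded:
# scores = cross_val_score(pipeline, texts, labels, cv=10, scoring="f1_macro")
# print(scores.mean())
```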


2019 ◽  
Vol 20 (9) ◽  
pp. 2185 ◽  
Author(s):  
Xiaoyong Pan ◽  
Lei Chen ◽  
Kai-Yan Feng ◽  
Xiao-Hua Hu ◽  
Yu-Hang Zhang ◽  
...  

Small nucleolar RNAs (snoRNAs) are a class of functional small RNAs involved in the chemical modifications of rRNAs, tRNAs, and small nuclear RNAs. They are reported to play important roles in tumorigenesis via various regulatory modes. snoRNAs can both participate in the regulation of methylation and pseudouridylation and regulate the expression pattern of their host genes. This research investigated the expression pattern of snoRNAs in eight major cancer types in TCGA via several machine learning algorithms. The expression levels of snoRNAs were first analyzed by a powerful feature selection method, Monte Carlo feature selection (MCFS), yielding a ranked feature list and a set of informative features. Then, incremental feature selection (IFS) was applied to the feature list to extract the optimal features/snoRNAs that allow the support vector machine (SVM) to yield the best performance. The discriminative snoRNAs included HBII-52-14, HBII-336, SNORD123, HBII-85-29, HBII-420, U3, HBI-43, SNORD116, SNORA73B, SCARNA4, HBII-85-20, etc., on which the SVM achieved a Matthews correlation coefficient (MCC) of 0.881 for predicting these eight cancer types. In addition, the informative features were fed into the Johnson reducer and the Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithms to generate classification rules, which clearly show the different snoRNA expression patterns in different cancer types. The analysis results indicated that the extracted discriminative snoRNAs can be important for identifying cancer samples of different types, and that the expression pattern of snoRNAs in different cancer types can be partly uncovered by quantitative recognition rules.
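Incremental feature selection as described above can be sketched along these lines: given a feature list already ranked by some prior method (the MCFS ranking is assumed to be precomputed here), a classifier is evaluated on progressively longer prefixes of the list and the prefix giving the highest cross-validated MCC is retained. This is a schematic reconstruction, not the authors' code.

```python
# Sketch of incremental feature selection (IFS) over a pre-ranked feature list.
# X: expression matrix as a NumPy array (samples x snoRNAs), y: cancer-type labels,
# ranked_idx: column indices ordered by a prior ranking (e.g. MCFS) - all assumed given.
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import matthews_corrcoef

def incremental_feature_selection(X, y, ranked_idx, step=5, cv=10):
    best_k, best_mcc = 0, -1.0
    for k in range(step, len(ranked_idx) + 1, step):
        subset = X[:, ranked_idx[:k]]          # top-k features of the ranking
        pred = cross_val_predict(SVC(kernel="rbf"), subset, y, cv=cv)
        mcc = matthews_corrcoef(y, pred)
        if mcc > best_mcc:
            best_k, best_mcc = k, mcc
    return best_k, best_mcc
```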


Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional malware detection approaches such as packet content analysis are ineffective on encrypted data. In the absence of actual packet contents, we can make use of other features such as packet size, arrival time, source and destination addresses, and other such metadata to detect malware. Such information can be used to train machine learning classifiers to separate malicious and benign packets. In this paper, we present an efficient malware detection approach using machine learning classification algorithms such as support vector machines, random forests, and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning models are trained on the training set and evaluated against the testing set to assess their respective performances. We further tune the hyperparameters of the algorithms to achieve better results. The random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, yielding area under the curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
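A minimal sketch of the classification and evaluation step, under the assumption that a per-flow feature matrix `X` (packet sizes, inter-arrival times, and similar metadata) and binary labels `y` have already been prepared, and that the third-party `xgboost` package is installed; the hyperparameter values shown are illustrative rather than those tuned in the paper.

```python
# Sketch: classifying flow metadata as malicious/benign with RF and XGBoost.
# X holds per-flow features (packet sizes, inter-arrival times, etc.), y in {0, 1}.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier  # assumes the xgboost package is installed

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=42),
    "xgboost": XGBClassifier(n_estimators=500, learning_rate=0.1,
                             eval_metric="logloss"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]   # probability of the malicious class
    print(name, "AUC =", round(roc_auc_score(y_test, scores), 4))
```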


2020 ◽  
Vol 9 (9) ◽  
pp. 507
Author(s):  
Sanjiwana Arjasakusuma ◽  
Sandiaga Swahyu Kusuma ◽  
Stuart Phinn

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving data of high spatial and spectral dimensionality, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) were evaluated in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with tree (XGBtree and XGBdart) and linear (XGBlin) boosters. The results demonstrated that the combinations BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (18.4% nRMSE and 0.046 m bias) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (15.8% nRMSE and −0.244 m bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variable selection: it reduced the data by roughly 95%, selecting the 29 most important of the initial 516 variables from the lidar metrics and hyperspectral data.
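A condensed sketch of the best-performing BO-XGBdart combination, assuming the third-party `boruta` and `xgboost` packages are available and that `X` (hyperspectral bands plus lidar metrics, as a NumPy array) and `y` (reference canopy heights) have already been assembled; it is illustrative only and omits the cross-validation used in the study.

```python
# Sketch: Boruta feature selection followed by XGBoost (dart booster) regression
# for canopy height. X stacks hyperspectral bands and lidar metrics per sample;
# y is the reference forest height. Assumes the boruta and xgboost packages.
from boruta import BorutaPy
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

def fit_bo_xgb(X, y, seed=1):
    rf = RandomForestRegressor(n_jobs=-1, max_depth=5, random_state=seed)
    boruta = BorutaPy(rf, n_estimators="auto", random_state=seed)
    boruta.fit(X, y)                       # flags all-relevant predictors
    X_sel = X[:, boruta.support_]          # keep only the confirmed variables

    model = XGBRegressor(booster="dart", n_estimators=400, random_state=seed)
    model.fit(X_sel, y)
    return model, boruta.support_
```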

