An Efficient Feature Selection from Heterogeneous Data with Reduced Data Complexity

Feature selection matters because data is generated continuously and at an ever-increasing rate, and it reduces the high dimensionality of many problems. As a pre-processing step for machine learning, feature selection is effective at reducing redundancy, removing irrelevant data, increasing learning accuracy, and improving the comprehensibility of results. This work offers a comprehensive approach to feature selection within the scope of classification problems, explaining the principles, real application issues, and related matters in the context of high-dimensional data. First, we focus on the concept of feature selection and review its history and basic principles. We propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression. We first establish consistency and asymptotic normality of the estimator from a well-known subsampling algorithm, and then derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resulting estimator. An alternative minimization criterion is also proposed to further reduce the computational cost. The optimal subsampling probabilities depend on the full-data estimate, so we develop a two-step algorithm to approximate the optimal subsampling procedure. This algorithm is computationally efficient and achieves a significant reduction in computing time compared with the full-data approach. Consistency and asymptotic normality of the estimator from the two-step algorithm are also established. Synthetic and real data sets are used to evaluate the practical performance of the proposed method.
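The two-step subsampling idea for logistic regression can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pilot size `r0`, the second-stage size `r`, and the choice of sampling probabilities proportional to |y_i - p_i| * ||x_i|| (one option used in the subsampling literature) are all assumptions, since the abstract does not state the exact formulas.

```python
import numpy as np


def weighted_logistic_mle(X, y, w, n_iter=100, tol=1e-8):
    """Maximize the weighted logistic log-likelihood with Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta


def two_step_subsample_mle(X, y, r0, r, rng):
    """Hypothetical two-step subsampling estimator (illustrative only)."""
    n = X.shape[0]
    # Step 1: a uniform pilot subsample gives a rough estimate beta0.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_mle(X[idx0], y[idx0], np.ones(r0))
    # Step 2: approximate the sampling probabilities using beta0.
    # Assumed form: pi_i proportional to |y_i - p_i(beta0)| * ||x_i||.
    p = 1.0 / (1.0 + np.exp(-(X @ beta0)))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    idx1 = rng.choice(n, size=r, replace=True, p=pi)
    # Combine both subsamples; each draw is weighted by 1 / (its sampling probability).
    idx = np.concatenate([idx0, idx1])
    w = np.concatenate([np.full(r0, float(n)), 1.0 / pi[idx1]])
    w = w / w.mean()  # rescaling does not change the maximizer
    return weighted_logistic_mle(X[idx], y[idx], w)
```

Only the second step touches every row of the data, and it does so with a single matrix-vector product rather than a full iterative fit, which is where the saving in computing time would come from in a sketch like this.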




Author(s): Sarfaraz Masood, Khwaja Wisal, Om Pal, Chanchal Kumar

Parkinson’s disease (PD) is a highly common neurological disease affecting a large population worldwide. Several studies have revealed that degradation of the voice, also known as dysarthria, is one of its initial symptoms. In this work, we attempt to explore and harness the correlation between various features in the voice samples observed in PD subjects. To do so, a novel two-level ensemble-based feature selection method is proposed, whose results are combined with an MLP-based classifier using K-fold cross-validation as the re-sampling strategy. Three separate benchmark datasets of voice samples were used for the experiments. Results strongly suggest that the proposed feature selection framework helps identify an optimal set of features, which in turn supports highly accurate identification of PD patients from their voice samples using a Multi-Layer Perceptron. The proposed model achieves an overall accuracy of 98.3%, 95.1% and 100% on the three selected datasets respectively. These results are significantly better than those achieved without feature selection, and better than those of the recently proposed chi-square based feature selection option.



2021. Author(s): Mikhail Kanevski

Nowadays a wide range of methods and tools to study and forecast time series is available. An important problem in forecasting concerns the embedding of time series, i.e. the construction of a high-dimensional space in which the forecasting problem is treated as a regression task. There are several basic linear and nonlinear approaches to constructing such a space by defining an optimal delay vector using different theoretical concepts. Another way is to consider this space as an input feature space (IFS) and to apply machine learning feature selection (FS) algorithms to optimize the IFS according to the problem under study (analysis, modelling or forecasting). Such an approach is empirical: it is based on data and depends on the FS algorithms applied. In machine learning, features are generally classified as relevant, redundant and irrelevant. This gives rich possibilities for advanced multivariate time series exploration and the development of interpretable predictive models.

Therefore, in the present research different FS algorithms are used to analyze fundamental properties of time series from an empirical point of view. Linear and nonlinear simulated time series are studied in detail to understand the advantages and drawbacks of the proposed approach. Real data case studies deal with air pollution and wind speed time series. Preliminary results are quite promising and more research is in progress.
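As a rough illustration of treating a delay embedding as an input feature space, the sketch below builds lagged copies of a series and ranks the lags with a simple filter. The function names and the use of absolute Pearson correlation as the ranking criterion are assumptions made for this example; the abstract does not say which FS algorithms were applied, and a linear filter like this one would miss nonlinear dependence.

```python
import numpy as np


def delay_embedding(series, max_lag):
    """Return (features, target); column k-1 holds the series delayed by k steps."""
    x = np.asarray(series, dtype=float)
    n = len(x) - max_lag
    features = np.column_stack(
        [x[max_lag - k : max_lag - k + n] for k in range(1, max_lag + 1)]
    )
    target = x[max_lag:]
    return features, target


def rank_lags_by_correlation(features, target):
    """Rank lags (1-based) by absolute Pearson correlation with the target."""
    fc = features - features.mean(axis=0)
    tc = target - target.mean()
    corr = (fc.T @ tc) / (np.linalg.norm(fc, axis=0) * np.linalg.norm(tc))
    order = np.argsort(-np.abs(corr))  # most relevant lag first
    return order + 1, corr
```

With this layout, each row of `features` is one candidate input vector for a regression model, so any feature selection method that works on tabular data could replace the correlation filter.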





Author(s): Hui Zhang, Q. M. Jonathan Wu, Thanh Minh Nguyen

In this paper, we propose a novel algorithm for feature selection and model detection using Student's t-distribution based on the variational Bayesian (VB) approach. First, our method is based on the Student's t-mixture model (SMM), which has heavier tails than the Gaussian distribution and is therefore less sensitive to small numbers of data points and the consequent precision estimates of the number of components. Second, the number of components, the local feature saliency and the parameters of the mixture model are simultaneously estimated by Bayesian variational learning. Experimental results using synthetic and real data demonstrate the improved robustness of our approach.
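The effect of the heavier tails can be seen in a much simpler setting than the paper's. The sketch below is not the variational Bayesian mixture algorithm described above; it is a standard EM iteration for a single univariate Student's t component with a fixed degrees-of-freedom value, and the function name and the default `nu=3.0` are assumptions chosen for illustration.

```python
import numpy as np


def student_t_location_scale(x, nu=3.0, n_iter=200, tol=1e-10):
    """EM estimate of location and squared scale for one Student's t component."""
    x = np.asarray(x, dtype=float)
    mu, sigma2 = x.mean(), x.var()
    for _ in range(n_iter):
        # E-step: points far from the current location receive small weights.
        u = (nu + 1.0) / (nu + (x - mu) ** 2 / sigma2)
        # M-step: weighted updates of location and squared scale.
        mu_new = np.sum(u * x) / np.sum(u)
        sigma2 = np.sum(u * (x - mu_new) ** 2) / len(x)
        converged = abs(mu_new - mu) < tol
        mu = mu_new
        if converged:
            break
    return mu, sigma2
```

A Gaussian fit corresponds to all weights `u` being equal to one, so a handful of extreme points can pull the plain sample mean far more than they pull the t-based location estimate.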





2020, Vol 10 (6), pp. 2141. Author(s): Su Xie, Ke Li, Mingming Xiao, Le Zhang, Wanlin Li

In this paper, the prediction of over-the-top service quality is discussed, which is a promising way for mobile network engineers to tackle service deterioration as early as possible. Currently, traditional mobile network operation typically takes remedial measures only after receiving customers’ complaints about service problems. With the popularity of over-the-top services, this problem has become increasingly serious. Based on the service perception data crowd-sensed from massive numbers of smartphones in the mobile network, we first investigate the application of multi-label ReliefF, a well-known method of feature selection, in determining the feature weights of the perception data and propose a unified multi-label ReliefF (UML-ReliefF) algorithm. Then a feature-weighted multi-label k-nearest neighbor (ML-kNN) algorithm is proposed for key quality indicator (KQI) prediction, by combining UML-ReliefF and ML-kNN in the learning. The experimental results for a web browsing service show that UML-ReliefF can effectively identify the most influential features of the data and thus lead to better performance for KQI prediction. The experiments also show that the feature-weighted KQI prediction is superior to its unweighted counterpart, since the former takes full advantage of all the features in the learning. Although there is still much room for improvement in the precision of the prediction, the proposed method has high potential to help network engineers detect the deterioration of service quality promptly and take measures before it is too late.
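The combination of Relief-style feature weights with a nearest-neighbor predictor can be sketched in a simplified form. This is not UML-ReliefF or ML-kNN: it assumes a single binary label, uses the basic Relief update with one nearest hit and one nearest miss, and all function names and parameters are hypothetical.

```python
import numpy as np


def relief_weights(X, y, n_samples, rng):
    """Basic Relief for binary labels; X is assumed scaled to [0, 1] per feature."""
    n, d = X.shape
    w = np.zeros(d)
    for i in rng.choice(n, size=n_samples, replace=False):
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf  # never pick the point itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class point
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class point
        # Reward features that differ on the miss, penalize those that differ on the hit.
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_samples
    return w


def weighted_knn_predict(X_train, y_train, X_test, weights, k=5):
    """k-NN majority vote (labels 0/1) using a feature-weighted Euclidean distance."""
    w = np.clip(weights, 0.0, None)  # negative Relief weights are treated as zero
    diff = X_test[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((w * diff ** 2).sum(axis=2))
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) >= 0.5).astype(int)
```

Weighting the distance, rather than dropping low-ranked features outright, keeps every feature in play while letting the influential ones dominate the neighbor search.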


