features selection
Recently Published Documents


TOTAL DOCUMENTS: 615 (FIVE YEARS: 289)

H-INDEX: 22 (FIVE YEARS: 9)

Author(s):  
Nsiri Benayad ◽  
Zayrit Soumaya ◽  
Belhoussine Drissi Taoufiq ◽  
Ammoumou Abdelkrim

Among the several approaches to detecting Parkinson's disease, one is based on the speech signal, since speech impairment is a symptom of the disease. This paper focuses on signal analysis and uses a dataset of voice recordings in which the patients were asked to utter the vowels “a”, “o”, and “u”. The discrete wavelet transform (DWT) is applied to the speech signal to obtain a variable-resolution representation that may hide the most important information about the patients. From the approximation a3 obtained with the Daubechies wavelet at scale 2, level 3, 21 features were extracted: linear predictive coding (LPC) coefficients, energy, zero-crossing rate (ZCR), mel-frequency cepstral coefficients (MFCC), and wavelet Shannon entropy. For classification, the K-nearest neighbour (KNN) algorithm was used. KNN is a type of instance-based learning that makes decisions based on locally approximated functions, in addition to ensemble learning. However, the choice of training features can have a significant impact on the overall learning process. This is where the genetic algorithm (GA) comes in: it selects the training features that give the most accurate classification.
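Three of the features named in the abstract above (energy, zero-crossing rate, and wavelet Shannon entropy) can be illustrated with a minimal sketch. The function names are hypothetical, and the entropy definition (Shannon entropy of the normalised coefficient energies, in bits) is one common convention assumed here, not necessarily the one the authors used. The input would be the approximation coefficients a3; the wavelet decomposition itself is not shown.

```python
import math

def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)

def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

def wavelet_shannon_entropy(coeffs):
    """Shannon entropy (bits) of the normalised coefficient energies.

    Assumed convention: p_i = c_i^2 / sum_j c_j^2, H = -sum_i p_i log2 p_i.
    """
    total = energy(coeffs)
    if total == 0:
        return 0.0
    entropy = 0.0
    for c in coeffs:
        p = (c * c) / total
        if p > 0:  # zero-energy coefficients contribute nothing
            entropy -= p * math.log2(p)
    return entropy
```

Under this convention the entropy is largest when the energy is spread evenly over the coefficients and zero when it is concentrated in a single one.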


Author(s):  
Suraya Masrom ◽  
Norhayati Baharun ◽  
Nor Faezah Mohamad Razi ◽  
Rahayu Abdul Rahman ◽  
...  

Particle Swarm Optimization (PSO) is a metaheuristic algorithm widely used for optimization problems. This paper presents the research design and implementation of using PSO to automate feature selection in machine learning models for Airbnb price prediction. Today, Airbnb is changing the business models of the hospitality industry globally. Although the Airbnb community has had a considerable impact on the local economic development of each country, very little work has investigated the Airbnb pricing issue with machine learning techniques. Focusing on Airbnb Singapore, the main problem with the dataset is the low correlation between the independent variables and the hospitality price. Choosing the best combination of independent variables is therefore essential, and it can be achieved through feature selection optimization. PSO is used to search for the best combination of variables, automating feature selection in the machine learning models. Comparing the magnitude of change in the R squared values before and after the use of PSO feature selection showed that automated feature selection improved the results of all the machine learning algorithms, mainly the linear-based ones (Linear Regression, Lasso, Ridge).
Keywords: Machine Learning, Automated Features Selection, Particle Swarm Optimization, Airbnb
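The idea of searching over feature subsets with PSO can be sketched as a binary PSO, in which each particle is a 0/1 mask over the features and a sigmoid of the velocity gives the probability of selecting each one. This is a generic sketch, not the authors' implementation: `binary_pso`, `toy_fitness`, and all parameter values are assumptions, and the toy fitness stands in for the model's R squared on the selected variables.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=10, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise fitness(mask) over 0/1 feature masks with a binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp the velocity
                # The sigmoid turns the velocity into a selection probability.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

def toy_fitness(mask):
    """Hypothetical stand-in for R squared: features 0 and 2 help, others hurt."""
    useful = {0, 2}
    return sum(1.0 if i in useful else -0.5
               for i, bit in enumerate(mask) if bit)

mask, score = binary_pso(toy_fitness, n_features=5)
```

In a real pipeline the fitness function would fit the regression model on the columns selected by `mask` and return a validation score.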


Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 149
Author(s):  
Waqar Khan ◽  
Lingfu Kong ◽  
Brekhna Brekhna ◽  
Ling Wang ◽  
Huigui Yan

Streaming feature selection has always been an excellent method for selecting the relevant subset of features from high-dimensional data and overcoming learning complexity. However, little attention has been paid to online feature selection through the Markov Blanket (MB). Several studies based on traditional MB learning reported low prediction accuracy and used few datasets, because the number of conditional independence tests is high and time-consuming. This paper presents a novel algorithm called Online Feature Selection Via Markov Blanket (OFSVMB), based on a statistical conditional independence test, that offers high accuracy and lower computation time. It reduces the number of conditional independence tests and incorporates online relevance and redundancy analysis to check the relevance between each incoming feature and the target variable T, discard redundant features from the Parents-Child (PC) and Spouses (SP) sets online, and find PC and SP simultaneously. The performance of OFSVMB is compared with traditional MB learning algorithms, including IAMB, STMB, HITON-MB, BAMB, and EEMB, and with streaming feature selection algorithms, including OSFS, Alpha-investing, and SAOLA, on 9 benchmark Bayesian Network (BN) datasets and 14 real-world datasets. For the performance evaluation, F1, precision, and recall measures are used with significance levels of 0.01 and 0.05 on the benchmark BN and real-world datasets, including 12 classifiers at a significance level of 0.01. On benchmark BN datasets with sample sizes of 500 and 5000, OFSVMB achieved significantly higher accuracy than IAMB, STMB, HITON-MB, BAMB, and EEMB in terms of F1, precision, and recall, and ran faster. It finds a more accurate MB regardless of the size of the feature set. On real-world datasets with small and large sample sizes, OFSVMB offers substantial improvements over OSFS, Alpha-investing, and SAOLA in mean prediction accuracy across the 12 classifiers, but it is slower than these algorithms because they find only the PC set and not SP. Furthermore, the sensitivity analysis shows that OFSVMB is more accurate in selecting the optimal features.
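The F1, precision, and recall measures used to compare a discovered Markov blanket with the true one on a benchmark network can be sketched as set comparisons. The function name and the example variable names are hypothetical; this shows only the standard definitions, not the OFSVMB algorithm itself.

```python
def mb_scores(found, true):
    """Precision, recall and F1 of a discovered Markov blanket.

    found: variables returned by the algorithm
    true:  variables in the true Markov blanket of the target
    """
    found, true = set(found), set(true)
    hits = len(found & true)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(true) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: two of the three returned variables are correct,
# and two of the four true members were found.
precision, recall, f1 = mb_scores({"A", "B", "C"}, {"B", "C", "D", "E"})
```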


2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Belal Alsinglawi ◽  
Osama Alshari ◽  
Mohammed Alorjani ◽  
Omar Mubin ◽  
Fady Alnajjar ◽  
...  

This work introduces a predictive Length of Stay (LOS) framework for lung cancer patients using machine learning (ML) models. The framework is proposed to deal with imbalanced datasets for classification-based approaches using electronic healthcare records (EHR). We utilized supervised ML methods to predict lung cancer inpatients' LOS during ICU hospitalization using the MIMIC-III dataset. The Random Forest (RF) model outperformed the other models across the three framework phases. With feature selection based on clinical significance, the over-sampling methods (SMOTE and ADASYN) achieved the highest AUC results (98%, 95% CI: 95.3–100%, and 100%, respectively). The combination of over-sampling and under-sampling achieved the second-highest AUC results (98%, 95% CI: 95.3–100%, and 97%, 95% CI: 93.7–100%, for SMOTE-Tomek and SMOTE-ENN, respectively). The under-sampling methods reported the lowest AUC results (50%, 95% CI: 40.2–59.8%) for both ENN and Tomek Links. Using an explainable ML technique called SHAP, we explained the outcome of the predictive RF model with the SMOTE class balancing technique to understand the most significant clinical features that contributed to predicting lung cancer LOS. Our promising framework allows ML techniques to be employed in hospital clinical information systems to predict lung cancer admissions into ICU.
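The core idea of SMOTE over-sampling, mentioned in the abstract above, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified sketch of that idea only; the function name, parameters, and toy data are assumptions, and it is not the implementation used in the study or in any particular library.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Needs at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (squared distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-feature minority class
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote_sketch(minority, n_new=6)
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies.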


Author(s):  
Razana Alwee ◽  
Siti Mariyam Hj Shamsuddin ◽  
Roselina Sallehuddin

Features selection is very important in multivariate models because the accuracy of the forecasting results produced by the model is highly dependent on the selected features. The purpose of this study is to propose grey relational analysis and support vector regression for features selection. The features are economic indicators that are used to forecast the property crime rate. Grey relational analysis selects the best data series to represent each economic indicator and ranks the economic indicators according to their importance to the property crime rate. Next, support vector regression is used to select the significant economic indicators, with particle swarm optimization estimating the parameters of the support vector regression. In this study, we use the unemployment rate, consumer price index, gross domestic product, and consumer sentiment index as the economic indicators, as well as the property crime rate for the United States. From our experiments, we found that the gross domestic product, unemployment rate, and consumer price index are the most influential economic indicators. The proposed method is also found to produce better forecasting accuracy than multiple linear regression.
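The ranking step of grey relational analysis can be sketched with the commonly used grey relational coefficient and a distinguishing coefficient of 0.5. The function name, the example series, and the assumption that all series are already normalised to a common scale are hypothetical; the abstract does not state which variant or preprocessing the authors applied.

```python
def grey_relational_grades(reference, candidates, zeta=0.5):
    """Grey relational grade of each candidate series against a reference.

    reference:  list of (already normalised) values, e.g. the target series
    candidates: dict mapping a name to a series of the same length
    zeta:       distinguishing coefficient, conventionally 0.5
    """
    deltas = {
        name: [abs(r - v) for r, v in zip(reference, series)]
        for name, series in candidates.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every candidate equals the reference
        return {name: 1.0 for name in candidates}
    grades = {}
    for name, ds in deltas.items():
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades

# Hypothetical normalised series: a higher grade means a closer relation.
grades = grey_relational_grades(
    [0.0, 0.5, 1.0],
    {"indicator_a": [0.0, 0.5, 1.0], "indicator_b": [1.0, 0.5, 0.0]},
)
ranking = sorted(grades, key=grades.get, reverse=True)
```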


Water ◽  
2022 ◽  
Vol 14 (2) ◽  
pp. 191
Author(s):  
Shen Chiang ◽  
Chih-Hsin Chang ◽  
Wei-Bo Chen

To better understand the effect and constraint of different data lengths on the data-driven model training for the rainfall-runoff simulation, the support vector regression (SVR) approach was applied to the data-driven model as the core algorithm in the present study. Various features selection strategies and different data lengths were employed in the training phase of the model. The validated results of the SVR were compared with the rainfall-runoff simulation derived from a physically based hydrologic model, the Hydrologic Modeling System (HEC-HMS). The HEC-HMS was considered a conventional approach and was also calibrated with a dataset period identical to the SVR. Our results showed that the SVR and HEC-HMS models could be adopted for short and long periods of rainfall-runoff simulation. However, the SVR model estimated the rainfall-runoff relationship reasonably well even if the observational data of one year or one typhoon event was used. In contrast, the HEC-HMS model needed more parameter optimization and inference processes to achieve the same performance level as the SVR model. Overall, the SVR model was superior to the HEC-HMS model in the performance of the rainfall-runoff simulation.


2022 ◽  
Vol 8 (1) ◽  
pp. 50
Author(s):  
Rifki Indra Perwira ◽  
Bambang Yuwono ◽  
Risya Ines Putri Siswoyo ◽  
Febri Liantoni ◽  
Hidayatulah Himawan

State universities have a library as a facility to support students’ education and science, which contains various books, journals, and final assignments. An intelligent system for classifying documents is needed to make things easier for library visitors in higher education, as a form of service to students. The documents in the library are generally the result of research. The main reasons a classification system is needed are the various complaints about imbalanced text data and categories, which stem from irrelevant document titles and from words with ambiguous meanings when searching for documents. This research uses k-Nearest Neighbor (k-NN) to categorize documents based on study interests, with information gain features selection to handle unbalanced data and cosine similarity to measure the distance between test and training data. In tests conducted with 276 training documents, the highest result with information gain feature selection was an accuracy of 87.5%, obtained with 80% training data, 20% test data, and k=5. The highest accuracy overall, 92.9%, was achieved without information gain feature selection, with 90% training data, 10% test data, and k=5, 7, and 9. This paper concludes that the system has better accuracy without information gain feature selection than with it, because every word in the document title is considered to have an essential role in forming the classification.
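Classifying document titles with k-NN and cosine similarity, as described in the abstract above, can be sketched on bag-of-words vectors. The function names, tokenisation by whitespace, and the toy training titles and labels are assumptions for illustration; the information gain step is not shown.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(title, training, k=3):
    """Label a title by majority vote among its k most similar training titles."""
    query = Counter(title.lower().split())
    scored = sorted(
        ((cosine_similarity(query, Counter(text.lower().split())), label)
         for text, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training titles with study-interest labels
training = [
    ("neural network image classification", "AI"),
    ("deep learning for image recognition", "AI"),
    ("image segmentation with neural network", "AI"),
    ("database query optimization", "DB"),
    ("relational database index design", "DB"),
]
label = knn_classify("neural network for image analysis", training)
```

Here every word of the title contributes to the similarity, which mirrors the setting without feature selection that the paper found more accurate.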


2022 ◽  
Vol 14 (1) ◽  
pp. 524
Author(s):  
Rezzy Eko Caraka ◽  
Maengseok Noh ◽  
Youngjo Lee ◽  
Toni Toharudin ◽  
Yusra ◽  
...  

Background: In this paper, we examine how social media influencers can influence visit intention, especially in the case of Raffi Ahmad and Nagita Slavina, top influencers who by 2 September 2021 had reached 21.3 M subscribers on YouTube and 54.9 M followers on Instagram with an engagement rate of 0.42%. The focus of this study is Generation Y or Millennials (born 1981–1996) and Generation Z (born 1997–2012). Design/methodology/approach: Snowball sampling was performed to arrive at a representative group of Millennials. Data analysis was performed using hierarchical likelihood via structural equation modeling. Findings: The study results are helpful for a comprehensive understanding of the factors affecting visit intention. In summary, tourists from Generations Y and Z are thriving within the internet of things and the digital age, an era in which information can be accessed via various forms of technology across multiple platforms. Practical implications: We discuss and identify the relative importance of each factor through the use of logistics with variational approximation and structural equation models using hierarchical likelihood. Originality: The technique we use is an integrated and extended version of the structural equation model with hierarchical likelihood estimation and features selection using logistics variational approximation.


2022 ◽  
Vol 70 (1) ◽  
pp. 1875-1891
Author(s):  
Talha Imran ◽  
Muhammad Attique Khan ◽  
Muhammad Sharif ◽  
Usman Tariq ◽  
Yu-Dong Zhang ◽  
...  
