feature selection method
Recently Published Documents


TOTAL DOCUMENTS

1671
(FIVE YEARS 784)

H-INDEX

46
(FIVE YEARS 13)

2022 ◽  
Vol 2022 ◽  
pp. 1-10
Author(s):  
Ruizhong Du ◽  
Jingze Wang ◽  
Shuang Li

Internet of Things (IoT) device identification is a key step in the management of IoT devices. The devices connected to the network must be controlled by the manager. For this purpose, many schemes have been proposed to identify IoT devices, especially schemes that work on the gateway. However, few researchers have paid close attention to cost. Thus, considering the gateway’s limited storage and computational resources, a new lightweight IoT device identification scheme is proposed. First, DFI (deep/dynamic flow inspection) technology is utilized to efficiently extract flow-related statistical features based on in-depth studies. Then, combining symmetric uncertainty and the correlation coefficient, we propose a novel filter feature selection method based on NSGA-III to select effective features for IoT device identification. We evaluate the proposed method using a real smart home IoT data set and three different ML algorithms. The experimental results show that the proposed method is lightweight and that the feature selection algorithm is effective: using only 6 features, it achieves 99.5% accuracy with a 3-minute time interval.
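As a rough illustration of the symmetric uncertainty measure mentioned in this abstract, the sketch below computes SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) for discrete variables. It is not the authors' NSGA-III method: the toy feature names and values are hypothetical, and continuous flow statistics are assumed to have been discretized beforehand.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy
    return 2.0 * mutual_info / (h_x + h_y)


# Hypothetical toy data: discretized flow features and device labels.
labels = ["cam", "cam", "plug", "plug"]
packet_size_bin = ["big", "big", "small", "small"]  # tracks the label exactly
port_parity = ["even", "odd", "even", "odd"]        # unrelated to the label

print(symmetric_uncertainty(packet_size_bin, labels))
print(symmetric_uncertainty(port_parity, labels))
```

A feature that determines the label scores at the top of the range, while one that is statistically independent of it scores at the bottom, which is what makes SU usable as a filter criterion.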


2022 ◽  
Vol 12 ◽  
Author(s):  
Qingxia Yang ◽  
Yaguo Gong

Thyroid nodules are present in up to 50% of the population worldwide, and thyroid malignancy occurs in only 5–15% of nodules. Fine-needle biopsy with cytologic evaluation remains the diagnostic method of choice for determining the risk of malignancy, yet it fails to discriminate between benign and malignant nodules in one-third of cases. To improve diagnostic accuracy and reliability, molecular testing based on transcriptomic data has developed rapidly. However, the gene signatures of thyroid nodules identified in many transcriptomic studies are highly inconsistent and extremely difficult to apply clinically. Therefore, it is necessary to identify consistent signatures that discriminate benign from malignant thyroid nodules. In this study, five independent transcriptomic studies were combined to discover the gene signature that distinguishes benign from malignant thyroid nodules. The combined dataset comprises 150 malignant and 93 benign thyroid samples. A feature selection method (Student’s t test and fold change) identified 279 differentially expressed genes (DEGs). Weighted gene co-expression network analysis (WGCNA) was then performed to identify modules of highly co-expressed genes, and 454 genes in the gray module were identified as hub genes. The intersection of the DEGs from the feature selection method and the hub genes from the WGCNA model was taken as the set of key genes for thyroid nodules. Finally, four key genes (ST3GAL5, NRCAM, MT1F, and PROS1) that participate in the pathogenesis of malignant thyroid nodules were validated using an independent dataset. Moreover, a high-performance classification model for discriminating thyroid nodules was constructed using these key genes. Overall, this study might provide new insight into the key differentiation of benign and malignant thyroid nodules.
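The filter-then-intersect step described in this abstract can be sketched as follows. This is a minimal illustration, not the study's pipeline: the gene names, expression values, cutoffs, and hub-gene set are hypothetical, expression is assumed to be positive and on a linear scale, and a cutoff on the pooled-variance t statistic stands in for a p-value threshold (which would need a t-distribution CDF).

```python
from math import log2, sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def select_degs(expr, is_malignant, t_cut=2.0, log2_fc_cut=1.0):
    """Return genes passing both the |t| and the |log2 fold change| cutoffs."""
    selected = set()
    for gene, values in expr.items():
        mal = [v for v, m in zip(values, is_malignant) if m]
        ben = [v for v, m in zip(values, is_malignant) if not m]
        log2_fc = log2(mean(mal) / mean(ben))
        if abs(t_statistic(mal, ben)) >= t_cut and abs(log2_fc) >= log2_fc_cut:
            selected.add(gene)
    return selected


# Hypothetical toy data: three malignant samples followed by three benign ones.
is_malignant = [True, True, True, False, False, False]
expr = {
    "GENE_A": [8.0, 9.0, 10.0, 2.0, 3.0, 4.0],  # large, consistent change
    "GENE_B": [5.0, 6.0, 7.0, 5.0, 6.0, 7.5],   # no real difference
    "GENE_C": [4.0, 4.1, 4.2, 3.0, 3.1, 3.2],   # consistent but small change
}
hub_genes = {"GENE_A", "GENE_C"}  # assumed output of a co-expression analysis

degs = select_degs(expr, is_malignant)
key_genes = degs & hub_genes  # intersection of DEGs and hub genes
print(sorted(key_genes))
```

Requiring both criteria is the point of the design: GENE_C has a large t statistic but a small fold change, so it is filtered out even though it is a hub gene.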


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Samane Khoshbakht ◽  
Majid Mokhtari ◽  
Sayyed Sajjad Moravveji ◽  
Sadegh Azimzadeh Jamalkandi ◽  
Ali Masoudi-Nejad

Background. Elucidating the dynamic topological changes across different stages of breast cancer, called stage re-wiring, could lead to identifying key latent regulatory signatures involved in cancer progression. Such dynamic regulators and their functions are mostly unknown. Here, we reconstructed differential co-expression networks for four stages of breast cancer to assess the dynamic patterns of cancer progression. A new computational approach was applied to identify stage-specific subnetworks for each stage. Next, prognostic traits of genes and the efficiency of stage-related groups were evaluated and validated using the Log-Rank test, an SVM classifier, and sample clustering. Furthermore, by conducting the stepwise VIF-feature selection method, a Cox-PH model was developed to predict patients’ risk. Finally, the re-wiring network for prognostic signatures was reconstructed and assessed across stages to detect gain/loss and positive/negative interactions as well as rewired-hub nodes contributing to dynamic cancer progression. Results. Using our new approach, we identified four stage-specific core biological pathways. We also detected an essential non-coding RNA, AC025034.1, which is not only antisense to ATP2B1 (a cell proliferation regulator) but also revealed a statistically significant stage-descending pattern. Moreover, AC025034.1 revealed both a dynamic topological pattern across stages and a prognostic trait. We also identified a high-performance Overall-Survival-Risk model, including 12 re-wired genes, to predict patients’ risk (c-index = 0.89). Finally, the breast cancer-specific prognostic biomarkers LINC01612, AC092142.1, and AC008969.1 were identified. Conclusions. In summary, the new scoring method highlighted stage-specific core pathways for early-to-late progression. Moreover, detecting the significant re-wired hub nodes indicated stage-associated traits, which reflects the importance of such regulators from different perspectives.


2022 ◽  
Vol 2022 ◽  
pp. 1-11
Author(s):  
Lei Chen ◽  
ZhanDong Li ◽  
ShiQi Zhang ◽  
Yu-Hang Zhang ◽  
Tao Huang ◽  
...  

Methylation is one of the most common and important modifications in biological systems and is mediated by multiple enzymes. Recent studies have shown that methylation occurs widely in different RNA molecules. RNA methylation modifications come in various kinds, such as 5-methylcytosine (m5C). However, the functions of individual methylation sites remain to be elucidated. Testing all methylation sites relies heavily on high-throughput sequencing technology, which is expensive and labor-intensive. Thus, computational prediction approaches could serve as a substitute. In this study, multiple machine learning models were used to predict possible RNA m5C sites on the basis of mRNA sequences in human and mouse. Each site was represented by several features derived from k-mers of an RNA subsequence centered on that site. The max-relevance and min-redundancy (mRMR) feature selection method was employed to analyse these features. The resulting feature list was fed into the incremental feature selection method, incorporating four classification algorithms, to build efficient models. Furthermore, the sites related to the features used in the models were also investigated.
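The mRMR ranking named in this abstract can be sketched as a greedy loop that repeatedly picks the feature with the highest relevance to the target minus its mean redundancy with the features already chosen. The version below is a simplified illustration for discrete features using the difference form of the criterion; the feature names and toy values are hypothetical, and the subsequent incremental feature selection step is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mrmr_rank(features, target):
    """Greedy mRMR: order features by relevance minus mean redundancy."""
    remaining = list(features)
    ranked = []
    while remaining:
        def score(name):
            relevance = mutual_info(features[name], target)
            if not ranked:
                return relevance  # first pick: relevance only
            redundancy = sum(
                mutual_info(features[name], features[s]) for s in ranked
            ) / len(ranked)
            return relevance - redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Hypothetical toy data: binary indicators for k-mer presence around a site.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "kmer_a": [0, 0, 0, 1, 1, 1, 1, 1],       # strongly related to the target
    "kmer_b": [0, 0, 1, 1, 0, 1, 1, 1],       # weakly related, little overlap
    "kmer_a_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of kmer_a
}

print(mrmr_rank(features, target))
```

The duplicate feature is as relevant as the original, but its redundancy penalty pushes it to the end of the ranking, behind a weaker feature that contributes different information.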


2022 ◽  
Vol 4 (1) ◽  
Author(s):  
Linyang Zhu ◽  
Weiwei Zhang ◽  
Guohua Tu

Abstract. Feature selection aims to select relevant and useful features, and it is a vital challenge in turbulence modeling by machine learning methods. In this paper, a new posterior feature selection method based on a validation dataset is proposed, which is an efficient and universal method for complex systems including turbulence. Unlike the a priori feature importance ranking of the filter method and the exhaustive feature-subset search of the wrapper method, the proposed method ranks the features according to the model performance on the validation dataset and generates the feature subsets in order of feature importance. Using the features from the proposed method, a black-box model is built with an artificial neural network (ANN) to reproduce the behavior of the Spalart-Allmaras (S-A) turbulence model for high Reynolds number (Re) airfoil flows in aeronautical engineering. The results show that, compared with the model without feature selection, the generalization ability of the model after feature selection is significantly improved. To some extent, it is also demonstrated that although feature importance can be reflected by the model parameters during the training process, artificial feature selection is still very necessary.


Author(s):  
Jiucheng Xu ◽  
Kaili Shen ◽  
Lin Sun

Abstract. Multi-label feature selection, a crucial preprocessing step for multi-label classification, has been widely applied in data mining, artificial intelligence, and other fields. However, most existing multi-label feature selection methods for mixed data have the following problems: (1) they rarely consider the importance of features from multiple perspectives, so their analysis of features is not comprehensive enough; (2) they select feature subsets according to the positive region while ignoring the uncertainty implied by the upper approximation. To address these problems, a multi-label feature selection method based on the fuzzy neighborhood rough set is developed in this article. First, the fuzzy neighborhood approximation accuracy and fuzzy decision are defined in the fuzzy neighborhood rough set model, and a new multi-label fuzzy neighborhood conditional entropy is designed. Second, a mixed measure is proposed that combines the fuzzy neighborhood conditional entropy from the information view with the fuzzy neighborhood approximation accuracy from the algebra view, in order to evaluate the importance of features from different views. Finally, a forward multi-label feature selection algorithm is proposed to remove redundant features and decrease the complexity of multi-label classification. The experimental results illustrate the validity and stability of the proposed algorithm in multi-label fuzzy neighborhood decision systems when compared with related methods on ten multi-label datasets.


2022 ◽  
Vol 2022 ◽  
pp. 1-11
Author(s):  
Hamid Nasiri ◽  
Seyed Ali Alavi

Background and Objective. The new coronavirus disease (known as COVID-19) was first identified in Wuhan and quickly spread worldwide, wreaking havoc on the economy and people’s everyday lives. As the number of COVID-19 cases is rapidly increasing, a reliable detection technique is needed to identify affected individuals and care for them in the early stages of COVID-19 and reduce the virus’s transmission. The most accessible method for COVID-19 identification is Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR); however, it is time-consuming and has false-negative results. These limitations encouraged us to propose a novel framework based on deep learning that can aid radiologists in diagnosing COVID-19 cases from chest X-ray images. Methods. In this paper, a pretrained network, DenseNet169, was employed to extract features from X-ray images. Features were chosen by a feature selection method, i.e., analysis of variance (ANOVA), to reduce computations and time complexity while overcoming the curse of dimensionality to improve accuracy. Finally, selected features were classified by the eXtreme Gradient Boosting (XGBoost). The ChestX-ray8 dataset was employed to train and evaluate the proposed method. Results and Conclusion. The proposed method reached 98.72% accuracy for two-class classification (COVID-19, No-findings) and 92% accuracy for multiclass classification (COVID-19, No-findings, and Pneumonia). The proposed method’s precision, recall, and specificity rates on two-class classification were 99.21%, 93.33%, and 100%, respectively. Also, the proposed method achieved 94.07% precision, 88.46% recall, and 100% specificity for multiclass classification. The experimental results show that the proposed framework outperforms other methods and can be helpful for radiologists in the diagnosis of COVID-19 cases.
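The ANOVA-based selection step in this abstract amounts to scoring each feature by a one-way F statistic across classes and keeping the highest-scoring ones. The sketch below shows that scoring on plain Python lists. It assumes the deep features have already been extracted, uses hypothetical feature names and values, and does not cover the DenseNet169 or XGBoost stages.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    n, k = len(values), len(groups)
    # Between-group and within-group sums of squares.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_top_k(columns, labels, k):
    """Names of the k features with the largest F statistic."""
    scores = {name: anova_f(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: three extracted features for six images, two classes.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "f_noise": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],  # same mean in both classes
    "f_weak": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],   # overlapping classes
    "f_sep": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],    # well-separated classes
}

print(select_top_k(columns, labels, 2))
```

Because each feature is scored independently of the others, the cost grows linearly with the number of features, which is one reason such filters suit high-dimensional deep features.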


2022 ◽  
Vol 2022 ◽  
pp. 1-12
Author(s):  
Yuan Tang ◽  
Zining Zhao ◽  
Shaorong Zhang ◽  
Zhi Li ◽  
Yun Mo ◽  
...  

Feature extraction and selection are important parts of motor imagery electroencephalogram (EEG) decoding and have long been both a focus and a difficulty of brain-computer interface (BCI) system research. To improve the accuracy of EEG decoding and reduce model training time, new feature extraction and selection methods are proposed in this paper. First, a new spatial-frequency feature extraction method is proposed. The original EEG signal is preprocessed, and then the common spatial pattern (CSP) is used for spatial filtering and dimensionality reduction. Finally, the filter bank method is used to decompose the spatially filtered signals into multiple frequency subbands, and the logarithmic band power feature of each frequency subband is extracted. Second, to select the subject-specific spatial-frequency features, a hybrid feature selection method based on the Fisher score and support vector machine (SVM) is proposed. The Fisher score of each feature is calculated, then a series of threshold parameters are set to generate different feature subsets, and finally, SVM and cross-validation are used to select the optimal feature subset. The effectiveness of the proposed method is validated using two sets of publicly available BCI competition data and a set of self-collected data. The total average accuracy achieved by the proposed method on the three data sets is 82.39%, which is 2.99% higher than that of the CSP method. The experimental results show that the proposed method classifies better than the existing methods and also requires less time for feature extraction and feature selection.
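The hybrid selection procedure described in this abstract (score each feature, sweep a set of thresholds to form candidate subsets, then keep the subset with the best cross-validated accuracy) can be sketched as below. This is an illustration under stated assumptions: a nearest-centroid classifier with leave-one-out cross-validation stands in for the SVM, and the data and thresholds are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    grand = mean(values)
    num = den = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        num += len(group) * (mean(group) - grand) ** 2
        den += len(group) * pvariance(group)
    return num / den if den > 0 else float("inf")


def loo_accuracy(rows, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    hits = 0
    for i in range(len(rows)):
        train = [(r, y) for j, (r, y) in enumerate(zip(rows, labels)) if j != i]
        centroids = {}
        for c in set(y for _, y in train):
            members = [r for r, y in train if y == c]
            centroids[c] = [mean(col) for col in zip(*members)]
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(rows[i], centroids[c])),
        )
        hits += pred == labels[i]
    return hits / len(rows)


def select_by_threshold(rows, labels, thresholds):
    """Try each Fisher-score threshold; keep the subset with the best accuracy."""
    scores = [fisher_score(col, labels) for col in zip(*rows)]
    best_acc, best_subset = -1.0, []
    for t in thresholds:
        subset = [j for j, s in enumerate(scores) if s >= t]
        if not subset:
            continue  # threshold too strict, no feature survives
        acc = loo_accuracy([[r[j] for j in subset] for r in rows], labels)
        if acc > best_acc:
            best_acc, best_subset = acc, subset
    return best_subset, best_acc


# Hypothetical toy data: column 0 is informative, column 1 is large-scale noise.
labels = [0, 0, 0, 1, 1, 1]
rows = [
    [1.0, 10.0], [1.2, -10.0], [0.8, 9.0],
    [3.0, -9.0], [3.2, 10.0], [2.8, -10.0],
]

subset, acc = select_by_threshold(rows, labels, thresholds=[0.05, 1.0])
print(subset, acc)
```

The filter score keeps the search cheap, since only one candidate subset per threshold is evaluated, while the cross-validated classifier decides which threshold actually helps.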


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Deepti Sisodia ◽  
Dilip Singh Sisodia

Purpose. The problem of choosing the most useful features from hundreds of features in time-series user click data arises in online advertising when classifying fraudulent publishers. Selecting feature subsets is a key issue in such classification tasks. In practice, filter approaches are common; however, they neglect the correlations among features. Conversely, wrapper approaches cannot be applied because of their complexity. Moreover, existing feature selection methods cannot handle such data, which is one of the major causes of instability in feature selection. Design/methodology/approach. To overcome these issues, a majority voting-based hybrid feature selection method, namely feature distillation and accumulated selection (FDAS), is proposed to investigate the optimal subset of relevant features for analyzing the publisher's fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained using majority voting; (2) accumulated selection, where an accumulated evaluation of the relevant feature subset is carried out to search for an optimal feature subset using effective machine learning (ML) models. Findings. Empirical results show enhanced classification performance with the proposed features in average precision, recall, f1-score and AUC in publisher identification and classification. Originality/value. FDAS is evaluated on FDMA2012 user-click data and nine other benchmark datasets to gauge its generalizing characteristics: first, with the original features; second, with relevant feature subsets selected by feature selection (FS) methods; third, with the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.
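The majority-voting idea behind the feature distillation phase can be sketched in a few lines: each base selector proposes a feature subset, and a feature is kept only if more than half of the selectors chose it. This is a simplified illustration, not the FDAS implementation. The selector outputs and feature names are hypothetical, and the accumulated selection phase is not shown.

```python
from collections import Counter


def majority_vote(selections):
    """Keep the features chosen by more than half of the selectors."""
    # set() ensures one selector cannot vote twice for the same feature.
    votes = Counter(f for chosen in selections for f in set(chosen))
    needed = len(selections) / 2
    return sorted(f for f, v in votes.items() if v > needed)


# Hypothetical outputs of three base selectors (e.g. two filters, one wrapper).
selections = [
    ["clicks_per_hour", "unique_ips", "night_ratio"],
    ["clicks_per_hour", "unique_ips", "referrer_count"],
    ["clicks_per_hour", "night_ratio", "device_count"],
]

print(majority_vote(selections))
# prints ['clicks_per_hour', 'night_ratio', 'unique_ips']
```

Voting across selectors is one way to reduce the instability of any single method, since a feature picked by only one selector is dropped.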

