feature importance
Recently Published Documents


Total documents: 323 (five years: 253)
H-index: 14 (five years: 5)

2022, Vol 135, pp. 108529
Author(s): Yifan Zhao, Weiwei Zhu, Panpan Wei, Peng Fang, Xiwang Zhang, ...

2022, Vol 31 (1), pp. 1-38
Author(s): Yingzhe Lyu, Gopi Krishnan Rajbahadur, Dayi Lin, Boyuan Chen, Zhen Ming (Jack) Jiang

Artificial Intelligence for IT Operations (AIOps) has been adopted in organizations in various tasks, including interpreting models to identify indicators of service failures. To avoid misleading practitioners, AIOps model interpretations should be consistent (i.e., different AIOps models on the same task agree with one another on feature importance). However, many AIOps studies violate established practices in the machine learning community when deriving interpretations, such as interpreting models with suboptimal performance, though the impact of such violations on the interpretation consistency has not been studied. In this article, we investigate the consistency of AIOps model interpretation along three dimensions: internal consistency, external consistency, and time consistency. We conduct a case study on two AIOps tasks: predicting Google cluster job failures and Backblaze hard drive failures. We find that the randomness from learners, hyperparameter tuning, and data sampling should be controlled to generate consistent interpretations. AIOps models with AUCs greater than 0.75 yield more consistent interpretation compared to low-performing models. Finally, AIOps models that are constructed with the Sliding Window or Full History approaches have the most consistent interpretation with the trends presented in the entire datasets. Our study provides valuable guidelines for practitioners to derive consistent AIOps model interpretation.
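
As a rough illustration of what "agreeing on feature importance" could mean, the sketch below compares the importance scores of two models with a Spearman rank correlation and a top-k overlap. This is only one plausible way to quantify agreement and is not necessarily the measure used in the study; the feature names and scores are hypothetical, and the rank formula assumes no tied scores.

```python
def ranks(importance):
    """Map each feature to its rank (1 = most important). Assumes distinct scores."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def spearman(imp_a, imp_b):
    """Spearman rank correlation between two importance dicts over the same features."""
    rank_a, rank_b = ranks(imp_a), ranks(imp_b)
    n = len(rank_a)
    squared_diffs = sum((rank_a[f] - rank_b[f]) ** 2 for f in rank_a)
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


def top_k_overlap(imp_a, imp_b, k):
    """Jaccard overlap between the k most important features of each model."""
    top_a = set(sorted(imp_a, key=imp_a.get, reverse=True)[:k])
    top_b = set(sorted(imp_b, key=imp_b.get, reverse=True)[:k])
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical importance scores from two models trained on the same task.
model_a = {"cpu": 0.50, "mem": 0.30, "disk": 0.15, "net": 0.05}
model_b = {"cpu": 0.45, "disk": 0.30, "mem": 0.20, "net": 0.05}

print(spearman(model_a, model_b))
print(top_k_overlap(model_a, model_b, k=2))
```

A correlation near 1 and a high overlap would indicate that the two models tell practitioners roughly the same story about which features matter.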


2022, Vol 4 (1)
Author(s): Linyang Zhu, Weiwei Zhang, Guohua Tu

Feature selection aims to identify relevant and useful features and is a key challenge in machine-learning-based turbulence modeling. This paper proposes a new posterior feature selection method based on a validation dataset, intended as an efficient and general method for complex systems, including turbulence. Unlike the a priori feature importance ranking of filter methods and the exhaustive feature-subset search of wrapper methods, the proposed method ranks features by model performance on the validation dataset and generates feature subsets in order of feature importance. Using the selected features, a black-box artificial neural network (ANN) model is built to reproduce the behavior of the Spalart-Allmaras (S-A) turbulence model for high-Reynolds-number (Re) airfoil flows in aeronautical engineering. The results show that feature selection significantly improves the generalization ability of the model compared with the model without feature selection. To some extent, they also demonstrate that, although feature importance can be reflected in the model parameters during training, artificial feature selection remains necessary.
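
The abstract does not say how validation performance is turned into a ranking, so the following is only a sketch under a stated assumption: a feature's importance is taken to be the increase in validation mean squared error when that feature is replaced by its validation mean while the trained model is held fixed. The toy linear model and data are hypothetical stand-ins for the ANN and the flow features.

```python
def mse(model, rows, targets):
    """Mean squared error of model predictions over a dataset."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def validation_importance(model, rows, targets):
    """Assumed scoring rule: error increase when one feature is set to its mean."""
    baseline = mse(model, rows, targets)
    scores = []
    for j in range(len(rows[0])):
        mean_j = sum(r[j] for r in rows) / len(rows)
        ablated = [r[:j] + [mean_j] + r[j + 1:] for r in rows]
        scores.append(mse(model, ablated, targets) - baseline)
    return scores


def nested_subsets(scores):
    """Feature-index subsets built in descending order of importance."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return [order[:k] for k in range(1, len(order) + 1)]


def toy_model(row):
    # Hypothetical trained model: feature 2 has no influence on the output.
    return 3.0 * row[0] + 0.5 * row[1] + 0.0 * row[2]


# Hypothetical validation set; targets match the model exactly, so baseline error is 0.
val_rows = [[1.0, 2.0, 5.0], [2.0, 1.0, 3.0], [3.0, 4.0, 1.0], [4.0, 3.0, 2.0]]
val_targets = [toy_model(r) for r in val_rows]

scores = validation_importance(toy_model, val_rows, val_targets)
print(scores)
print(nested_subsets(scores))
```

Each nested subset could then be used to retrain the model, keeping the smallest subset whose validation error is acceptable.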


Energies, 2022, Vol 15 (2), pp. 549
Author(s): Giuliano Armano, Paolo Attilio Pegoraro

The design of new monitoring systems for intelligent distribution networks often requires both real-time measurements and pseudomeasurements to be processed. The former are obtained from smart meters, phasor measurement units and smart electronic devices, whereas the latter are predicted using appropriate algorithms, typically with the objective of forecasting the behaviour of power loads and generators. However, depending on the technique used for data encoding, attempting to make predictions over a period of several days may cause problems related to the high number of features. To counter this issue, feature importance analysis becomes an essential tool. This article illustrates a technique devised to investigate the importance of features on data deemed relevant for predicting the next-hour demand of aggregated, medium-voltage electrical loads. The same technique allows us to inspect the hidden layers of the multilayer perceptrons entrusted with making the predictions, since, ultimately, the content of any hidden layer can be seen as an alternative encoding of the input data. The possibility of inspecting hidden layers can give wide support to researchers in a number of relevant tasks, including the appraisal of the generalisation capability reached by a multilayer perceptron and the identification of neurons not relevant for the prediction task.


PLoS ONE, 2022, Vol 17 (1), pp. e0262131
Author(s): Adil Aslam Mir, Kimberlee Jane Kearfott, Fatih Vehbi Çelebi, Muhammad Rafique

A new methodology, imputation by feature importance (IBFI), is studied; it can be applied to any machine learning method to efficiently fill in missing or irregularly sampled data. It applies to data missing completely at random (MCAR), missing not at random (MNAR), and missing at random (MAR). IBFI utilizes the feature importance and iteratively imputes missing values using any base learning algorithm. For this work, IBFI is tested on soil radon gas concentration (SRGC) data. XGBoost is used as the learning algorithm and missing data are simulated using R for different missingness scenarios. IBFI is based on the physically meaningful assumption that SRGC depends upon environmental parameters such as temperature and relative humidity. This assumption leads to a model obtained from the complete multivariate series where the controls are available by taking the attribute of interest as a response variable. IBFI is tested against other frequently used imputation methods, namely mean, median, mode, predictive mean matching (PMM), and hot-deck procedures. The performance of the different imputation methods was assessed using root mean squared error (RMSE), mean squared log error (MSLE), mean absolute percentage error (MAPE), percent bias (PB), and mean squared error (MSE) statistics. The imputation process requires more attention when multiple variables are missing in different samples, which challenges machine learning methods because some controls are missing. IBFI appears to have an advantage in such circumstances. IBFI was tested on Radon Time Series (RTS) data collected from 1 March 2017 to 11 May 2018, a period that included four seismic events.


2022, pp. 1-11
Author(s): Joshua E. Curtiss, Emily E. Bernstein, Sabine Wilhelm, Katharine A. Phillips

Background: Serotonin-reuptake inhibitors (SRIs) are first-line pharmacotherapy for the treatment of body dysmorphic disorder (BDD), a common and severe disorder. However, prior research has not focused on or identified definitive predictors of SRI treatment outcomes. Leveraging precision medicine techniques such as machine learning can facilitate the prediction of treatment outcomes.
Methods: The study used 10-fold cross-validation support vector machine (SVM) learning models to predict three treatment outcomes (i.e., response, partial remission, and full remission) for 97 patients with BDD receiving up to 14 weeks of open-label treatment with the SRI escitalopram. SVM models used baseline clinical and demographic variables as predictors. Feature importance analyses complemented traditional SVM modeling to identify which variables most successfully predicted treatment response.
Results: SVM models indicated acceptable classification performance for predicting treatment response with an area under the curve (AUC) of 0.77 (sensitivity = 0.77 and specificity = 0.63), partial remission with an AUC of 0.75 (sensitivity = 0.67 and specificity = 0.73), and full remission with an AUC of 0.79 (sensitivity = 0.70 and specificity = 0.79). Feature importance analyses supported constructs such as better quality of life and less severe depression, general psychopathology symptoms, and hopelessness as more predictive of better treatment outcome; demographic variables were least predictive.
Conclusions: The current study is the first to demonstrate that machine learning algorithms can successfully predict treatment outcomes for pharmacotherapy for BDD. Consistent with precision medicine initiatives in psychiatry, the current study provides a foundation for personalized pharmacotherapy strategies for patients with BDD.
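
For readers less familiar with the reported metrics, the sketch below shows how AUC, sensitivity, and specificity are computed from a classifier's outputs. The labels and scores are made-up illustrative values, not data from the study, and the 0.5 decision threshold is an assumption.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, predictions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative values only: 1 = responder, 0 = non-responder.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(auc(labels, scores))
print(sensitivity_specificity(labels, predictions))
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why a study reports all three.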


2022, Vol 12 (1), pp. 453
Author(s): Cheng-Lin Wu, Hsun-Ping Hsieh, Jiawei Jiang, Yi-Chieh Yang, Chris Shei, ...

To alleviate the impact of fake news on our society, predicting the popularity of fake news posts on social media is a crucial problem worthy of study. However, most related studies on fake news emphasize detection only. In this paper, we focus on the issue of fake news influence prediction, i.e., inferring how popular a fake news post might become on social platforms. To achieve our goal, we propose a comprehensive framework, MUFFLE, which captures multi-modal dynamics by encoding the representation of news-related social networks, user characteristics, and content in text. The attention mechanism developed in the model can provide explainability for social or psychological analysis. To examine the effectiveness of MUFFLE, we conducted extensive experiments on real-world datasets. The experimental results show that our proposed method outperforms both state-of-the-art methods of popularity prediction and machine-based baselines in top-k NDCG and hit rate. Through the experiments, we also analyze the feature importance for predicting fake news influence via the explainability provided by MUFFLE.


2022, Vol 23 (1), pp. 95-115
Author(s): Wan Nurhidayah Ibrahim, Mohd Syahid Anuar, Ali Selamat, Ondrej Krejcar

Botnets are a significant cyber threat that continues to evolve. Botmasters keep improving the security strategies of their botnets so that they go undetected. New botnet source code is detected every second, and each attack demonstrates how difficult botnets are to monitor and how robust they are. In the conventional network botnet detection model, which uses signature analysis, botnet concealment strategies such as encryption and polymorphism, together with the shift from a centralized to a decentralized peer-to-peer (P2P) structure, create challenges. Behavior analysis seems to be a promising approach to these problems because it does not rely on analyzing the network traffic payload. In addition, a detection model should be developed to predict novel types of botnet. This study focuses on using flow-based behavior analysis to detect novel botnets, which is necessary because existing patterns are hard to detect in botnets that keep modifying their signatures as part of their concealment strategy. The study also recommends introducing Independent Component Analysis (ICA) and standardization as data pre-processing steps to increase data quality before classification. We compared the percentage of significant features with and without ICA. The experiments show that ICA yields significant improvements. The highest F-score was 83%, for the Neris bot, and the average F-score for a novel botnet sample was 74%. In the feature importance test, feature importance increased from 22% to 27%, and the training model false positive rate decreased from 1.8% to 1.7%.
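
The standardization step mentioned in this abstract can be sketched as a per-feature z-score transform, shown below. The abstract does not specify the exact procedure, so this is an assumed, common form of standardization; the flow-feature values are hypothetical, and the ICA step itself is not shown.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Rescale each feature column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    scaled_columns = []
    for column in columns:
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            # A constant feature carries no information; map it to zeros.
            scaled_columns.append([0.0] * len(column))
        else:
            scaled_columns.append([(value - mu) / sigma for value in column])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical flow features: duration (s), bytes sent, a constant flag.
flows = [[1.0, 200.0, 7.0], [2.0, 400.0, 7.0], [3.0, 600.0, 7.0]]
scaled = standardize_columns(flows)
print(scaled)
```

Putting features on a common scale before a decomposition such as ICA prevents features with large raw magnitudes from dominating the result.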

