Feature Importances: A Tool to Explain Radio Propagation and Reduce Model Complexity

Telecom ◽

10.3390/telecom1020009 ◽

2020 ◽

Vol 1 (2) ◽

pp. 114-125

Author(s):

Sotirios P. Sotiroudis ◽

Sotirios K. Goudos ◽

Katherine Siakavara

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Response Times ◽

Path Loss ◽

Radio Propagation ◽

Model Complexity ◽

Learning Models ◽

Reduced Complexity ◽

Machine Learning Models

Machine learning models have been widely deployed to tackle the problem of radio propagation. In addition to helping in the estimation of path loss, they can also be used to better understand the details of various propagation scenarios. Our current work exploits the inherent ranking of feature importances provided by XGBoost and Random Forest as a means of indicating the contribution of the underlying propagation mechanisms. A comparison between two different transmitter antenna heights, revealing the associated propagation profiles, is made. Feature selection is then implemented, leading to models with reduced complexity, and consequently reduced training and response times, based on the previously calculated importances.
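
The abstract above does not state the exact selection rule, so the following is only a minimal sketch of importance-based feature selection under stated assumptions. It ranks features by an importance score (of the kind exposed by the `feature_importances_` attribute of scikit-learn's forest estimators) and keeps the top-ranked ones until a chosen share of the total importance is covered. The feature names, importance values, and the `coverage` parameter are hypothetical and not taken from the paper.

```python
from typing import Dict, List


def rank_features(importances: Dict[str, float]) -> List[str]:
    """Return feature names sorted from most to least important."""
    return sorted(importances, key=importances.get, reverse=True)


def select_features(importances: Dict[str, float], coverage: float = 0.85) -> List[str]:
    """Keep top-ranked features until their summed importance
    reaches `coverage` times the total importance."""
    total = sum(importances.values())
    selected: List[str] = []
    running = 0.0
    for name in rank_features(importances):
        selected.append(name)
        running += importances[name]
        if running >= coverage * total:
            break
    return selected


# Hypothetical importances: names and values are illustrative only.
example_importances = {
    "distance": 0.55,
    "frequency": 0.05,
    "tx_height": 0.25,
    "building_density": 0.10,
    "street_width": 0.05,
}

reduced = select_features(example_importances, coverage=0.85)
print(reduced)
```

A model retrained on the reduced feature list has fewer inputs, which is the mechanism by which the abstract reports shorter training and response times.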

Neural Feature Selection for Learning to Rank

Lecture Notes in Computer Science - Advances in Information Retrieval ◽

10.1007/978-3-030-72240-1_34 ◽

2021 ◽

pp. 342-349

Author(s):

Alberto Purpura ◽

Karolina Buchner ◽

Gianmaria Silvello ◽

Gian Antonio Susto

Keyword(s):

Machine Learning ◽

Feature Selection ◽

System Performance ◽

Large Scale ◽

Learning To Rank ◽

Research Area ◽

Model Complexity ◽

Learning Models ◽

Model Size ◽

Machine Learning Models

LEarning TO Rank (LETOR) is a research area in the field of Information Retrieval (IR) where machine learning models are employed to rank a set of items. In the past few years, neural LETOR approaches have become a competitive alternative to traditional ones like LambdaMART. However, the performance of neural architectures has grown in proportion to their complexity and size. This can be an obstacle to their adoption in large-scale search systems, where model size impacts latency and update time. For this reason, we propose an architecture-agnostic approach based on a neural LETOR model to reduce the size of its input by up to 60% without affecting system performance. This approach also reduces a LETOR model's complexity and, therefore, its training and inference time by up to 50%.

Detection of Osteosarcoma on Bone Radiographs Using Convolutional Neural Networks

10.21528/cbic2021-16 ◽

2021 ◽

Author(s):

Larissa Asito ◽

Hélcio Pereira ◽

Marcello Nogueira-Barbosa ◽

Renato Tinós

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Feature Selection ◽

Random Forest ◽

Decision Tree ◽

Convolutional Neural Networks ◽

Learning Models ◽

Computer Aided ◽

Aided Diagnosis ◽

Machine Learning Models

We propose a computer-aided diagnosis system based on convolutional neural networks (CNNs) for the identification of osteosarcoma on bone radiographs. The CNN should indicate regions of the image that may contain tumors. To indicate these regions, we propose splitting the image into windows and classifying each window individually with a CNN. Pre-processing techniques, such as window exclusion and labeling, are proposed. Two CNNs are compared in the proposed system. The first one is trained from scratch, while the second one is a pre-trained CNN (VGG16). The CNNs are compared to four machine learning models that use features extracted from the image windows as inputs: multilayer perceptron (MLP), decision tree, random forest, and MLP with feature selection. In the experiments, the best performance was obtained by the pre-trained CNN.

Machine Learning Models for the Prediction of Postpartum Depression: Application and Comparison Based on a Cohort Study (Preprint)

10.2196/preprints.15516 ◽

2019 ◽

Author(s):

Weina Zhang ◽

Han Liu ◽

Vincent Michael Bernard Silenzio ◽

Peiyuan Qiu ◽

Wenjie Gong

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Postpartum Depression ◽

Support Vector ◽

Selection Methods ◽

Learning Models ◽

Expert Consultation ◽

Using Data ◽

Machine Learning Models

BACKGROUND Postpartum depression (PPD) is a serious public health problem. Building a predictive model for PPD using data during pregnancy can facilitate earlier identification and intervention. OBJECTIVE The aims of this study are to compare the effects of four different machine learning models using data during pregnancy to predict PPD and explore which factors in the model are the most important for PPD prediction. METHODS Information on the pregnancy period from a cohort of 508 women, including demographics, social environmental factors, and mental health, was used as predictors in the models. The Edinburgh Postnatal Depression Scale score within 42 days after delivery was used as the outcome indicator. Using two feature selection methods (expert consultation and random forest-based filter feature selection [FFS-RF]) and two algorithms (support vector machine [SVM] and random forest [RF]), we developed four different machine learning PPD prediction models and compared their prediction effects. RESULTS There was no significant difference in the effectiveness of the two feature selection methods in terms of model prediction performance, but 10 fewer factors were selected with the FFS-RF than with the expert consultation method. The model based on SVM and FFS-RF had the best prediction effects (sensitivity=0.69, area under the curve=0.78). In the feature importance ranking output by the RF algorithm, psychological elasticity, depression during the third trimester, and income level were the most important predictors. CONCLUSIONS In contrast to the expert consultation method, FFS-RF was important in dimension reduction. When the sample size is small, the SVM algorithm is suitable for predicting PPD. In the prevention of PPD, more attention should be paid to the psychological resilience of mothers.

Machine Learning Models for the Prediction of Postpartum Depression: Application and Comparison Based on a Cohort Study

JMIR Medical Informatics ◽

10.2196/15516 ◽

2020 ◽

Vol 8 (4) ◽

pp. e15516

Author(s):

Weina Zhang ◽

Han Liu ◽

Vincent Michael Bernard Silenzio ◽

Peiyuan Qiu ◽

Wenjie Gong

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Postpartum Depression ◽

Support Vector ◽

Selection Methods ◽

Learning Models ◽

Expert Consultation ◽

Using Data ◽

Machine Learning Models

Background Postpartum depression (PPD) is a serious public health problem. Building a predictive model for PPD using data during pregnancy can facilitate earlier identification and intervention. Objective The aims of this study are to compare the effects of four different machine learning models using data during pregnancy to predict PPD and explore which factors in the model are the most important for PPD prediction. Methods Information on the pregnancy period from a cohort of 508 women, including demographics, social environmental factors, and mental health, was used as predictors in the models. The Edinburgh Postnatal Depression Scale score within 42 days after delivery was used as the outcome indicator. Using two feature selection methods (expert consultation and random forest-based filter feature selection [FFS-RF]) and two algorithms (support vector machine [SVM] and random forest [RF]), we developed four different machine learning PPD prediction models and compared their prediction effects. Results There was no significant difference in the effectiveness of the two feature selection methods in terms of model prediction performance, but 10 fewer factors were selected with the FFS-RF than with the expert consultation method. The model based on SVM and FFS-RF had the best prediction effects (sensitivity=0.69, area under the curve=0.78). In the feature importance ranking output by the RF algorithm, psychological elasticity, depression during the third trimester, and income level were the most important predictors. Conclusions In contrast to the expert consultation method, FFS-RF was important in dimension reduction. When the sample size is small, the SVM algorithm is suitable for predicting PPD. In the prevention of PPD, more attention should be paid to the psychological resilience of mothers.
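
The two metrics reported above, sensitivity and area under the ROC curve (AUC), can be computed from scratch as in the sketch below. This is a generic illustration, not the study's evaluation code: the labels, scores, and the 0.5 decision threshold are made-up values. AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half.

```python
from typing import List


def sensitivity(y_true: List[int], y_pred: List[int]) -> float:
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = positive case) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(sensitivity(labels, predictions))
print(roc_auc(labels, scores))
```

Sensitivity depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.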

Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in the form of polarity classification is important in everyday life: knowing whether a given document carries positive or negative sentiment helps people choose and make decisions. Sentiment analysis is usually done manually, so an automatic classification process is needed. However, few studies discuss which feature extraction methods and learning models suit unstructured sentiment analysis for the Amazon food review case. This research explores feature extraction methods such as Bag of Words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec, paired with several machine learning models such as Random Forest, SVM, KNN, and Naïve Bayes, to find a combination of feature extraction and learning model that can help add variety to the analysis of polarity sentiments. With document preprocessing such as the removal of HTML tags, punctuation, and special characters, together with Snowball stemming, TF-IDF combined with SVM is suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
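
To make the TF-IDF step concrete, here is a minimal sketch using the textbook definition: term frequency within a document multiplied by the logarithm of the inverse document frequency. It is not the paper's pipeline. The two toy reviews are invented, tokenization is assumed to have happened already, and libraries such as scikit-learn apply a smoothed IDF by default, so their numbers will differ from this unsmoothed variant.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document,
    using tf = count / document length and idf = log(N / df)."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors: List[Dict[str, float]] = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented, already-tokenized toy reviews.
reviews = [["good", "coffee"], ["bad", "coffee"]]
weights = tf_idf(reviews)
print(weights)
```

A term that appears in every document, such as "coffee" here, receives weight zero, so the classifier sees only the terms that distinguish one review from another.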

Voice Feature Selection to Improve Performance of Machine Learning Models for Voice Production Inversion

Journal of Voice ◽

10.1016/j.jvoice.2021.03.004 ◽

2021 ◽

Author(s):

Zhaoyan Zhang

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Models ◽

Improve Performance ◽

Voice Production ◽

Machine Learning Models

Explainable AI: A Review of Machine Learning Interpretability Methods

Entropy ◽

10.3390/e23010018 ◽

2020 ◽

Vol 23 (1) ◽

pp. 18

Author(s):

Pantelis Linardatos ◽

Vasilis Papastefanopoulos ◽

Sotiris Kotsiantis

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Black Box ◽

Learning Systems ◽

Model Complexity ◽

Learning Models ◽

New Methods ◽

Industrial Adoption ◽

Machine Learning Models ◽

The Way

Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption, with machine learning systems demonstrating superhuman performance in a significant number of tasks. However, this surge in performance has often been achieved through increased model complexity, turning such systems into “black box” approaches and causing uncertainty regarding the way they operate and, ultimately, the way that they come to decisions. This ambiguity has made it problematic for machine learning systems to be adopted in sensitive yet critical domains, where their value could be immense, such as healthcare. As a result, scientific interest in the field of Explainable Artificial Intelligence (XAI), a field that is concerned with the development of new methods that explain and interpret machine learning models, has been tremendously reignited over recent years. This study focuses on machine learning interpretability methods; more specifically, a literature review and taxonomy of these methods are presented, as well as links to their programming implementations, in the hope that this survey would serve as a reference point for both theorists and practitioners.

Random forest and long short-term memory based machine learning models for classification of ion mobility spectrometry spectra

Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE) Sensing XXII ◽

10.1117/12.2585829 ◽

2021 ◽

Author(s):

Patrick C. Riley ◽

Samir V. Deshpande ◽

Brian S. Ince ◽

Brian C. Hauck ◽

Kyle P. O'Donnell ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Ion Mobility ◽

Short Term Memory ◽

Learning Models ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Machine Learning Models

A Novel Method for Performance Measurement of Public Educational Institutions Using Machine Learning Models

Applied Sciences ◽

10.3390/app11199296 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9296

Author(s):

Talha Mahboob Alam ◽

Mubbashar Mushtaq ◽

Kamran Shaukat ◽

Ibrahim A. Hameed ◽

Muhammad Umer Sarwar ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Feature Selection ◽

Artificial Neural Networks ◽

Performance Indicators ◽

Key Performance Indicators ◽

Educational Institutions ◽

Institutional Performance ◽

Learning Models ◽

Machine Learning Models

Lack of education is a major concern in underdeveloped countries because it leads to poor human and economic development. The level of education in public institutions varies across all regions around the globe. Current disparities in access to education worldwide are mostly due to systemic regional differences and the distribution of resources. Previous research focused on evaluating students’ academic performance, but less has been done to measure the performance of educational institutions. Key performance indicators for the evaluation of institutional performance differ from student performance indicators. There is a dire need to evaluate educational institutions’ performance based on their disparities and academic results on a large scale. This study proposes a model to measure institutional performance based on key performance indicators through data mining techniques. Various feature selection methods were used to extract the key performance indicators. Several machine learning models, namely, J48 decision tree, support vector machines, random forest, rotation forest, and artificial neural networks were employed to build an efficient model. The results of the study were based on different factors, i.e., the number of schools in a specific region, teachers, school locations, enrolment, and availability of necessary facilities that contribute to school performance. It was also observed that urban regions performed well compared to rural regions due to the improved availability of educational facilities and resources. The results showed that artificial neural networks outperformed other models and achieved an accuracy of 82.9% when the relief-F based feature selection method was used. This study will help support efforts in governance for performance monitoring, policy formulation, target-setting, evaluation, and reform to address the issues and challenges in education worldwide.

Comparison of Resampling Algorithms to Address Class Imbalance when Developing Machine Learning Models to Predict Foodborne Pathogen Presence in Agricultural Water

Frontiers in Environmental Science ◽

10.3389/fenvs.2021.701288 ◽

2021 ◽

Vol 9 ◽

Author(s):

Daniel Lowell Weller ◽

Tanzy M. T. Love ◽

Martin Wiedmann

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Models ◽

Training Data ◽

Agricultural Water ◽

Learning Models ◽

Safety Hazards ◽

E Coli ◽

Resampling Method ◽

Machine Learning Models

Recent studies have shown that predictive models can supplement or provide alternatives to E. coli-testing for assessing the potential presence of food safety hazards in water used for produce production. However, these studies used balanced training data and focused on enteric pathogens. As such, research is needed to determine 1) if predictive models can be used to assess Listeria contamination of agricultural water, and 2) how resampling (to deal with imbalanced data) affects performance of these models. To address these knowledge gaps, this study developed models that predict nonpathogenic Listeria spp. (excluding L. monocytogenes) and L. monocytogenes presence in agricultural water using various combinations of learner (e.g., random forest, regression), feature type, and resampling method (none, oversampling, SMOTE). Four feature types were used in model training: microbial, physicochemical, spatial, and weather. “Full models” were trained using all four feature types, while “nested models” used between one and three types. In total, 45 full (15 learners*3 resampling approaches) and 108 nested (5 learners*9 feature sets*3 resampling approaches) models were trained per outcome. Model performance was compared against baseline models where E. coli concentration was the sole predictor. Overall, the machine learning models outperformed the baseline E. coli models, with random forests outperforming models built using other learners (e.g., rule-based learners). Resampling produced more accurate models than not resampling, with SMOTE models outperforming, on average, oversampling models. Regardless of resampling method, spatial and physicochemical water quality features drove accurate predictions for the nonpathogenic Listeria spp. and L. monocytogenes models, respectively. Overall, these findings 1) illustrate the need for alternatives to existing E. coli-based monitoring programs for assessing agricultural water for the presence of potential food safety hazards, and 2) suggest that predictive models may be one such alternative. Moreover, these findings provide a conceptual framework for how such models can be developed in the future with the ultimate aim of developing models that can be integrated into on-farm risk management programs. For example, future studies should consider using random forest learners, SMOTE resampling, and spatial features to develop models to predict the presence of foodborne pathogens, such as L. monocytogenes, in agricultural water when the training data is imbalanced.
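
For readers unfamiliar with SMOTE, the core idea can be sketched in a few lines: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified illustration, not the study's implementation. The sample points, the neighbour count `k`, and the fixed random seed are all assumptions, and it expects at least two minority samples.

```python
import random
from math import dist
from typing import List, Tuple

Point = Tuple[float, ...]


def smote(minority: List[Point], n_new: int, k: int = 2, seed: int = 0) -> List[Point]:
    """Generate `n_new` synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, neighbour)))
    return synthetic


# Made-up minority-class samples in a two-feature space.
minority_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority_samples, n_new=5)
print(new_points)
```

Unlike plain oversampling, which duplicates existing minority rows, this interpolation produces new points inside the region spanned by the minority class, which is the distinction the abstract draws between the two resampling methods.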
