Information Theoretic Criteria
Recently Published Documents

TOTAL DOCUMENTS: 94 (five years: 7)
H-INDEX: 20 (five years: 1)

2021 ◽ Vol 11 (1) ◽ Author(s): Farideh Jalali-najafabadi, Michael Stadler, Nick Dand, Deepak Jadon, Mehreen Soomro, ...

Abstract

In view of the growth of clinical risk prediction models using genetic data, there is an increasing need for studies that use appropriate methods to select the optimal number of features from a large number of genetic variants with a high degree of redundancy between features due to linkage disequilibrium (LD). Filter feature selection methods based on information theoretic criteria are well suited to this challenge and identify a subset of the original variables that should result in more accurate prediction. However, data collected from cohort studies are often high-dimensional genetic data with potential confounders, which present challenges to feature selection and to risk prediction machine learning models. Patients with psoriasis are at high risk of developing a chronic arthritis known as psoriatic arthritis (PsA). The prevalence of PsA in this patient group can be up to 30%, and the identification of high-risk patients is an important clinical research goal that would allow early intervention and a reduction of disability. This also provides an ideal scenario for the development of clinical risk prediction models and an opportunity to explore the application of information theoretic criteria methods.

In this study, we developed feature selection and PsA risk prediction models and applied them to a cross-sectional genetic dataset of 1462 PsA cases and 1132 cutaneous-only psoriasis (PsC) cases, using 2-digit HLA alleles imputed with the SNP2HLA algorithm. We also developed a stratification method to mitigate the impact of potential confounding features and illustrate that such features affect feature selection. The mitigated dataset was used to train seven supervised machine learning methods: 80% of the data was randomly assigned to training with stratified nested cross-validation, and the remaining 20% was held out for internal validation. The risk prediction models were then further validated in a UK Biobank dataset containing data on 1187 participants and a set of features overlapping with the training dataset. Performance was evaluated using the area under the curve (AUC), accuracy, precision, recall, F1 score and decision curve analysis (net benefit). The best model was selected on three criteria: the smallest feature subset, the maximal average AUC over the nested cross-validation, and good generalisability to the UK Biobank dataset.

In the original dataset, across over 100 bootstraps and seven feature selection (FS) methods, HLA_C_*06 was selected as the most informative genetic variant. After mitigation, the single most important genetic feature by rank was HLA_B_*27 for all seven feature selection methods, consistent with previous analyses of these data using regression-based methods. However, the predictive accuracy of this single feature after mitigation was only moderate (AUC = 0.54 for internal cross-validation, 0.53 for the internal holdout set, and 0.55 for the external dataset). Sequentially adding further HLA features by rank improved the performance of the Random Forest classification model: 20 2-digit features selected by Interaction Capping (ICAP) achieved AUC = 0.61 for internal cross-validation, 0.57 for the internal holdout set, and 0.58 for the external dataset.
The stratification method for mitigating confounding features, combined with filter information-theoretic feature selection, can be applied to high-dimensional datasets with potential confounders.
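As a rough illustration of the filter-plus-classifier pipeline described above, the sketch below ranks features with an information-theoretic filter inside a stratified cross-validation loop and scores a Random Forest on the top-ranked subset. This is a minimal sketch, not the authors' pipeline: it uses plain mutual information where the paper uses redundancy-aware criteria such as ICAP, and a single CV loop in place of fully nested cross-validation; `X` (a 0/1/2 allele-count matrix) and `y` (PsA vs. PsC labels) are hypothetical inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def filter_cv_auc(X, y, n_features=20, n_splits=5, seed=0):
    """Rank features by mutual information on each training fold, fit a
    Random Forest on the top-ranked subset, and return the mean test AUC."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train_idx, test_idx in cv.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        # Information-theoretic filter: score each feature against the label.
        mi = mutual_info_classif(X_tr, y_tr, discrete_features=True,
                                 random_state=seed)
        top = np.argsort(mi)[::-1][:n_features]  # indices of top-ranked features
        clf = RandomForestClassifier(n_estimators=500, random_state=seed)
        clf.fit(X_tr[:, top], y_tr)
        proba = clf.predict_proba(X[test_idx][:, top])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], proba))
    return float(np.mean(aucs))
```

Sequentially growing `n_features` and re-scoring mirrors the paper's strategy of adding ranked HLA features until the AUC stops improving.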


Entropy ◽ 2020 ◽ Vol 22 (5) ◽ pp. 512 ◽ Author(s): Barouch Matzliach, Irad Ben-Gal, Evgeny Kagan

This paper considers the detection of multiple targets by a group of mobile robots operating under uncertainty. The agents are equipped with sensors that have positive, non-negligible probabilities of detecting the targets at different distances. The goal is to define agent trajectories that lead to detection of the targets in minimal time. The suggested solution follows Koopman's classical search approach applied to an occupancy grid, while decision-making and control are based on information-theoretic criteria. Sensor fusion within each agent and across agents is implemented with a general Bayesian scheme. The presented procedures follow the expected information gain approach, using the "center of view" and "center of gravity" algorithms, and are compared with a simulated learning method. Their performance is analyzed through numerical simulations.
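To make the grid update and the information-gain control step concrete, here is a minimal single-agent sketch under assumptions not taken from the paper: independent per-cell occupancy probabilities, an exponential (Koopman-type) detection law, and no false alarms. The multi-agent Bayesian fusion and the "center of view"/"center of gravity" algorithms themselves are not reproduced; all names are illustrative.

```python
import numpy as np

def p_detect(dist, lam=1.0):
    # Koopman-type law: detection probability decays exponentially with distance.
    return np.exp(-lam * dist)

def cell_entropy(p):
    q = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(q * np.log(q) + (1.0 - q) * np.log(1.0 - q))

def update_after_miss(p, pos, coords, lam=1.0):
    """Per-cell Bayesian update of occupancy probabilities when the agent
    at `pos` reports no detection (zero false-alarm rate assumed)."""
    pd = p_detect(np.linalg.norm(coords - pos, axis=1), lam)
    num = p * (1.0 - pd)
    return num / (num + (1.0 - p))

def expected_info_gain(p, pos, coords, lam=1.0):
    """Expected entropy reduction of the grid for a sensing action at `pos`:
    a detection resolves a cell exactly; a miss triggers the Bayes update."""
    pd = p_detect(np.linalg.norm(coords - pos, axis=1), lam)
    p_miss = p * (1.0 - pd) + (1.0 - p)              # P(no detection) per cell
    p_post = p * (1.0 - pd) / np.maximum(p_miss, 1e-12)  # posterior after a miss
    return (cell_entropy(p) - p_miss * cell_entropy(p_post)).sum()

def best_next_position(p, candidates, coords, lam=1.0):
    # Greedy expected-information-gain control step over candidate positions.
    gains = [expected_info_gain(p, c, coords, lam) for c in candidates]
    return candidates[int(np.argmax(gains))]
```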


Algorithms ◽ 2019 ◽ Vol 12 (9) ◽ pp. 178 ◽ Author(s): Bogdan Dumitrescu, Ciprian Doru Giurcăneanu

Finding the size of the dictionary is an open issue in dictionary learning (DL). We propose an algorithm that adapts the size during the learning process by using Information Theoretic Criteria (ITC) specialized to the DL problem. The algorithm is built on top of Approximate K-SVD (AK-SVD) and periodically removes the least used atoms or adds new random atoms, based on ITC evaluations for a small number of candidate sub-dictionaries. Numerical experiments on synthetic data show that our algorithm not only finds the true size with very good accuracy, but also improves the representation error compared with AK-SVD run with the true size known.
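The pruning half of the idea can be pictured with a BIC-style stand-in for the specialised ITC: rank atoms by how often the sparse codes use them, then score candidate sub-dictionaries that drop the least-used ones. This is a hedged sketch, not the paper's algorithm: the actual method also adds random atoms, interleaves these steps with AK-SVD iterations, and uses ITC tailored to the DL problem; `Y ≈ D @ X` (signals, dictionary, sparse codes) are assumed inputs.

```python
import numpy as np

def bic_score(Y, D, X):
    """BIC-like criterion: residual fit term plus a penalty that grows with
    the number of dictionary parameters (a stand-in for the DL-specialised
    ITC used in the paper)."""
    n = Y.size
    rss = np.linalg.norm(Y - D @ X, 'fro') ** 2
    k = D.size  # free parameters ~ number of atoms x signal dimension
    return n * np.log(rss / n + 1e-12) + k * np.log(n)

def prune_by_itc(Y, D, X, n_candidates=5):
    """Evaluate sub-dictionaries that drop the m least-used atoms
    (m = 0..n_candidates-1) and return the best-scoring one."""
    usage = np.count_nonzero(X, axis=1)  # how often each atom is used
    order = np.argsort(usage)            # least-used atoms first
    best_score, best_D, best_X = bic_score(Y, D, X), D, X
    for m in range(1, n_candidates):
        keep = np.sort(order[m:])
        Dm, Xm = D[:, keep], X[keep, :]
        # In a full implementation the sparse codes would be recomputed
        # (e.g., by OMP) after pruning; here the old rows are reused.
        s = bic_score(Y, Dm, Xm)
        if s < best_score:
            best_score, best_D, best_X = s, Dm, Xm
    return best_D, best_X
```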


Sensors ◽ 2018 ◽ Vol 18 (10) ◽ pp. 3473 ◽ Author(s): Guangpu Zhang, Ce Zheng, Sibo Sun, Guolong Liang, Yifeng Zhang

In this paper, we study the joint detection and direction-of-arrival (DOA) tracking of a single moving source that can randomly appear in, or disappear from, the surveillance volume. First, a Bernoulli random finite set (RFS) is employed to characterize the randomness of the state process, i.e., the dynamics of the source motion and the source appearance. To improve detection and DOA tracking performance in low signal-to-noise ratio (SNR) scenarios, the measurements are taken directly from an array of sensors and multiple snapshots are allowed. A track-before-detect (TBD) Bernoulli filter is proposed for tracking a single dynamic system that randomly switches on and off. Second, since the variances of the stochastic signal and the measurement noise are unknown in practical applications, these nuisance parameters are marginalized out under an uninformative prior, and the likelihood function is compensated using information theoretic criteria. Simulation results demonstrate the performance of the filter.
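The existence part of the Bernoulli recursion is compact enough to show directly. A minimal sketch under assumed inputs: a birth probability `p_b`, a survival probability `p_s`, and a precomputed likelihood ratio `lr = p(z | target present) / p(z | noise only)`; in the paper this ratio is formed from multi-snapshot array data with the unknown signal and noise variances marginalised out and the likelihood compensated via information theoretic criteria.

```python
def bernoulli_existence_update(q, lr, p_b=0.05, p_s=0.95):
    """One predict/update cycle for q = P(target exists).
    q: prior existence probability; lr: measurement likelihood ratio."""
    # Predict: the target may be born (if absent) or survive (if present).
    q_pred = p_b * (1.0 - q) + p_s * q
    # Update: Bayes' rule with the likelihood ratio of the new measurement.
    return q_pred * lr / (1.0 - q_pred + q_pred * lr)
```

The full TBD Bernoulli filter propagates the source state (here, the DOA) alongside `q`, typically with a particle approximation.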

