conditional entropy
Recently Published Documents





Jiucheng Xu ◽  
Kaili Shen ◽  
Lin Sun

Multi-label feature selection, a crucial preprocessing step for multi-label classification, has been widely applied in data mining, artificial intelligence, and other fields. However, most existing multi-label feature selection methods for mixed data have two problems: (1) they rarely consider the importance of features from multiple perspectives, so their analysis of features is not comprehensive enough; and (2) they select feature subsets according to the positive region while ignoring the uncertainty implied by the upper approximation. To address these problems, this article develops a multi-label feature selection method based on the fuzzy neighborhood rough set. First, the fuzzy neighborhood approximation accuracy and the fuzzy decision are defined in the fuzzy neighborhood rough set model, and a new multi-label fuzzy neighborhood conditional entropy is designed. Second, a mixed measure is proposed that combines the fuzzy neighborhood conditional entropy (an information view) with the fuzzy neighborhood approximation accuracy (an algebraic view), so that the importance of features is evaluated from different views. Finally, a forward multi-label feature selection algorithm is proposed to remove redundant features and decrease the complexity of multi-label classification. Compared with related methods on ten multi-label datasets, the experimental results illustrate the validity and stability of the proposed algorithm in multi-label fuzzy neighborhood decision systems.
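As a simplified illustration of forward (greedy) feature selection driven by a conditional entropy measure, the sketch below assumes crisp, discrete features and ordinary Shannon conditional entropy H(D | B) in place of the paper's fuzzy neighborhood conditional entropy, and it omits the approximation-accuracy term of the mixed measure. It also assumes a label-powerset view in which each distinct label vector is one decision class. The function names are hypothetical.

```python
from collections import Counter
from math import log2

def conditional_entropy(rows, decisions):
    """Shannon H(D | B) in bits; rows[k] holds sample k's values on the features in B."""
    n = len(rows)
    joint = Counter(zip(rows, decisions))
    marginal = Counter(rows)
    # H(D | B) = sum over (b, d) of p(b, d) * log2(p(b) / p(b, d))
    return sum(c / n * log2(marginal[r] / c) for (r, _), c in joint.items())

def forward_select(X, Y, tol=1e-12):
    """Greedily add the feature that most reduces H(D | B); stop when none helps."""
    # Label-powerset assumption: each distinct label vector is one decision class.
    decisions = [tuple(y) for y in Y]
    n_features = len(X[0])
    selected = []
    current = conditional_entropy([()] * len(X), decisions)  # H(D) with no features
    while len(selected) < n_features:
        best_f, best_h = None, current
        for f in range(n_features):
            if f in selected:
                continue
            cand = selected + [f]
            rows = [tuple(x[i] for i in cand) for x in X]
            h = conditional_entropy(rows, decisions)
            if h < best_h - tol:
                best_f, best_h = f, h
        if best_f is None:  # no remaining feature lowers the entropy
            break
        selected.append(best_f)
        current = best_h
    return selected, current

X = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(forward_select(X, Y))  # → ([0], 0.0)
```

Here feature 0 alone determines the label vectors, so the search stops after one feature; feature 2 carries the same information and is left out as redundant.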

PRX Quantum ◽  
2022 ◽  
Vol 3 (1) ◽  
Gabriel T. Landi ◽  
Mauro Paternostro ◽  
Alessio Belenchia

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Danyu Jin ◽  
Ping Zhu

The prediction of protein subcellular localization is not only important for studying protein structure and function but can also facilitate the design and development of new drugs. In recent years, feature extraction methods based on protein evolutionary information have attracted much attention and made good progress. Based on the position-specific scoring matrix (PSSM) obtained by PSI-BLAST, the PSSM-GSD method is proposed according to the characteristics of the data distribution. To capture as much protein sequence information as possible, the AAO, PSSM-AAO, and PSSM-GSD methods are fused. A conditional entropy-based classifier chain algorithm and a support vector machine are then used to predict the locations of multilabel proteins. Finally, the method is tested on the Gpos-mPLoc and Gneg-mPLoc datasets; because the data are severely imbalanced, the SMOTE algorithm is used to oversample the minority classes. The experiments show that the AAO + PSSM∗ method achieves overall accuracies of 83.1% and 86.8%, respectively. Experimental comparison with other methods shows that AAO + PSSM∗ performs well and can effectively predict protein subcellular location.
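The abstract uses SMOTE to enlarge the minority classes. The sketch below shows the core SMOTE idea under simple assumptions (numeric features, Euclidean distance, one minority class at a time): each synthetic sample lies on the segment between a minority sample and one of its k nearest minority neighbors. The function name and default values are hypothetical; the paper's settings are not given.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Needs at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                    # random minority sample
        nb = X_min[rng.choice(neighbors[i])]   # one of its k nearest neighbors
        gap = rng.random()                     # uniform in [0, 1)
        synthetic[j] = X_min[i] + gap * (nb - X_min[i])
    return synthetic

X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new = smote(X_min, 10, k=2)
```

Because every new point is a convex combination of two existing minority samples, the synthetic data stay inside the region already occupied by the minority class.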

Entropy ◽  
2021 ◽  
Vol 24 (1) ◽  
pp. 67
Xiyu Shi ◽  
Varuna De-Silva ◽  
Yusuf Aslan ◽  
Erhan Ekmekcioglu ◽  
Ahmet Kondoz

Deep learning has proven to be an important element of modern data processing technology and has found applications in many areas, such as multimodal sensor data processing and understanding, data generation, and anomaly detection. While the use of deep learning is booming in many real-world tasks, how it arrives at its results is still poorly understood. Understanding the data processing pathways within a deep neural network is important for transparency and better resource utilisation. In this paper, information-theoretic measures are used to reveal the typical learning patterns of convolutional neural networks, which are commonly used for image processing tasks. For this purpose, training samples, true labels, and estimated labels are treated as random variables, and the mutual information and conditional entropy between these variables are studied. This paper shows that adding convolutional layers improves the network's learning, but beyond a certain number, further layers bring no additional improvement. The number of convolutional layers needed to reach a desired learning level can be determined with the help of information-theoretic quantities, including entropy, inequality, and mutual information among the inputs to the network. The kernel size of the convolutional layers affects only the learning speed of the network. This study also shows that the placement of the dropout layer has no significant effect on learning in networks with lower dropout rates, whereas with higher dropout rates it is best placed immediately after the last convolutional layer.
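Treating true labels Y and estimated labels Ŷ as discrete random variables, as the abstract describes, the plug-in estimates below compute H(Y | Ŷ) = H(Y, Ŷ) − H(Ŷ) and I(Y; Ŷ) = H(Y) − H(Y | Ŷ) from paired label lists. This is a minimal sketch with hypothetical function names, not the authors' code, and it uses simple frequency counts without bias correction.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Plug-in Shannon entropy in bits of a list of hashable outcomes."""
    n = len(xs)
    return sum(c / n * log2(n / c) for c in Counter(xs).values())

def conditional_entropy(ys, zs):
    """H(Y | Z) = H(Y, Z) - H(Z), estimated from paired samples."""
    return entropy(list(zip(ys, zs))) - entropy(zs)

def mutual_information(ys, zs):
    """I(Y; Z) = H(Y) - H(Y | Z)."""
    return entropy(ys) - conditional_entropy(ys, zs)

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(conditional_entropy(y_true, y_pred), mutual_information(y_true, y_pred))
```

A perfect classifier gives H(Y | Ŷ) = 0 and I(Y; Ŷ) = H(Y); a constant prediction gives I(Y; Ŷ) = 0, so tracking these two numbers during training indicates how much label information the network has captured.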

Entropy ◽  
2021 ◽  
Vol 24 (1) ◽  
pp. 26
Hongjian Xiao ◽  
Danilo P. Mandic

Entropy-based methods have received considerable attention for quantifying the structural complexity of real-world systems. Among numerous empirical entropy algorithms, conditional entropy-based methods such as sample entropy, which rely on amplitude distance calculations, are intuitive to interpret but require excessively long data for meaningful evaluation at large scales. To address this issue, we propose the variational embedding multiscale sample entropy (veMSE) method and demonstrate that it operates robustly even with data several times shorter than those required by existing conditional entropy-based methods. The analysis reveals that veMSE also exhibits other desirable properties, such as robustness to variations in embedding dimension and resilience to noise. For rigor, unlike existing multivariate methods, veMSE assigns a different embedding dimension to every data channel, which makes its operation independent of channel permutation. veMSE is tested on both simulated and real-world signals, and its performance is evaluated against existing multivariate multiscale sample entropy methods. veMSE is also shown to have computational advantages over existing amplitude distance-based entropy methods.
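For reference, the conventional building blocks that veMSE is compared against can be sketched as follows: standard univariate sample entropy, SampEn(m, r) = −ln(A/B), where B and A count pairs of matching templates of length m and m + 1 within tolerance r (Chebyshev distance, self-matches excluded), and the coarse-graining step of multiscale sample entropy. The variational embedding of veMSE itself is not reproduced here, and the defaults m = 2 and r = 0.2 × standard deviation are common choices assumed for illustration.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) = -ln(A / B); r is a fraction of the signal's standard deviation."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    n = len(x)

    def count_matches(length):
        # Use the same n - m starting points for template lengths m and m + 1.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance to all later templates (self-matches excluded).
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(d <= tol)
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def coarse_grain(x, scale):
    """Multiscale step: average non-overlapping windows of length `scale`."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

noise = np.random.default_rng(0).normal(size=500)
mse = [sample_entropy(coarse_grain(noise, s)) for s in (1, 2, 3)]
```

At scale s, coarse_grain shortens the series by a factor of s, which is why conventional multiscale sample entropy needs long recordings at large scales; the pairwise template comparison also costs O(N²) time.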

2021 ◽  
Camilo E. Valderrama ◽  
Daniel J. Niven ◽  
Henry T. Stelfox ◽  
Joon Lee

BACKGROUND Redundancy in laboratory blood tests is common in intensive care units (ICU), affecting patients' health and increasing healthcare expenses. Medical communities have made recommendations to order laboratory tests more judiciously. Wise selection can rely on modern data-driven approaches that have been shown to help identify redundant laboratory blood tests in ICUs. However, most of these works have been developed for highly selected clinical conditions such as gastrointestinal bleeding. Moreover, features based on conditional entropy and conditional probability distribution have not been used to inform the need for performing a new test. OBJECTIVE We aimed to address the limitations of previous works by adapting conditional entropy and conditional probability to extract features to predict abnormal laboratory blood test results. METHODS We used an ICU dataset collected across Alberta, Canada which included 55,689 ICU admissions from 48,672 patients with different diagnoses. We investigated conditional entropy and conditional probability-based features by comparing the performances of two machine learning approaches to predict normal and abnormal results for 18 blood laboratory tests. Approach 1 used patients' vitals, age, sex, admission diagnosis, and other laboratory blood test results as features. Approach 2 used the same features plus the new conditional entropy and conditional probability-based features. RESULTS Across the 18 blood laboratory tests, both Approach 1 and Approach 2 achieved a median F1-score, AUC, precision-recall AUC, and Gmean above 80%. We found that the inclusion of the new features statistically significantly improved the capacity to predict abnormal laboratory blood test results in between ten and fifteen laboratory blood tests depending on the machine learning model. CONCLUSIONS Our novel approach with promising prediction results can help reduce over-testing in ICUs, as well as risks for patients and healthcare systems. 
CLINICALTRIAL N/A
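The abstract does not define its conditional entropy and conditional probability features. One hypothetical way to build such features, assumed here purely for illustration, is to treat a patient's consecutive results for one test as labels such as "N" (normal) and "A" (abnormal), and to estimate P(next | previous) and H(next | previous) from the observed transitions. The function name and label encoding are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def transition_features(sequences):
    """Estimate P(next | previous) and H(next | previous) in bits from result sequences.

    Each sequence is one patient's ordered results for a single test, e.g. ["N", "A", "A"].
    """
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))  # consecutive (previous, next) pairs
    totals = defaultdict(int)
    for (prev, _), c in pairs.items():
        totals[prev] += c
    n = sum(totals.values())
    cond_prob = {(prev, nxt): c / totals[prev] for (prev, nxt), c in pairs.items()}
    # H(next | previous) = sum of p(prev, next) * log2(p(prev) / p(prev, next))
    cond_entropy = sum(c / n * log2(totals[prev] / c) for (prev, _), c in pairs.items())
    return cond_prob, cond_entropy

probs, h = transition_features([["N", "N", "A", "A"], ["N", "A"]])
```

A low conditional entropy means the next result is largely predictable from the previous one, which is the kind of signal that could flag a repeat test as potentially redundant.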

Universe ◽  
2021 ◽  
Vol 7 (11) ◽  
pp. 429
Yang Liu ◽  
Liming Wu ◽  
Tianqi Sun ◽  
Pengfei Zhang ◽  
Xi Fang ◽  

The light-curve period of an asteroid plays an important role in determining its rotation period, collisional evolution, and the YORP effect. Many period-extraction algorithms are used to find the light-curve periods of asteroids from long-term observations; they are mainly based on the frequency, time, or time–frequency domain. This paper presents a comprehensive comparison of popular algorithms on the DAMIT (Database of Asteroid Models from Inversion Techniques) data set and reports the statistical results. Considering the quoted period, absolute magnitude, diameter, albedo, time span, and number of observations, we analyze the accuracy of five popular methods using the light-curve data of 2902 asteroids. We find that although the performance of the algorithms differs only slightly, Phase Dispersion Minimization (PDM) performs best, followed by Lomb-Scargle (LS), while Conditional Entropy (CE) is not better than the others under certain conditions. We also analyze the cases in which searching over frequencies or over periods is more suitable.
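A common formulation of the conditional entropy (CE) period search is: fold the light curve at a trial period P to get phases φ, bin the pairs (φ, normalized magnitude m) into a 2D grid, and compute H(m | φ) = Σ p(m_i, φ_j) ln[p(φ_j) / p(m_i, φ_j)] over non-empty bins; the trial period that minimizes H is the estimate. The sketch below assumes 10 phase bins, 5 magnitude bins, and a user-supplied grid of trial periods; the settings used in the comparison study are not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np

def conditional_entropy_period(t, mag, periods, phase_bins=10, mag_bins=5):
    """Return the trial period minimising H(m | phase), plus all entropies."""
    t = np.asarray(t, dtype=float)
    mag = np.asarray(mag, dtype=float)
    periods = np.asarray(periods, dtype=float)
    # Rescale magnitudes to [0, 1] so the binning does not depend on amplitude.
    m = (mag - mag.min()) / (mag.max() - mag.min())
    entropies = np.empty(len(periods))
    for k, p in enumerate(periods):
        phase = (t / p) % 1.0  # fold the light curve at trial period p
        counts, _, _ = np.histogram2d(
            phase, m, bins=[phase_bins, mag_bins], range=[[0.0, 1.0], [0.0, 1.0]]
        )
        p_joint = counts / counts.sum()  # p(phi_j, m_i)
        p_phase = np.broadcast_to(p_joint.sum(axis=1, keepdims=True), p_joint.shape)
        mask = p_joint > 0               # skip empty bins
        entropies[k] = np.sum(p_joint[mask] * np.log(p_phase[mask] / p_joint[mask]))
    return periods[np.argmin(entropies)], entropies

# Synthetic, irregularly sampled sinusoid with a 2.5-day period.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 100.0, 400))
mag = np.sin(2 * np.pi * t / 2.5)
best, ent = conditional_entropy_period(t, mag, np.linspace(1.0, 5.0, 2001))
```

At the correct period each phase bin maps to a narrow range of magnitudes, so H(m | φ) is small; at a wrong period the folded points scatter across magnitude bins and H approaches the unconditional entropy of the magnitudes.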
