Iterative Shannon Entropy - a Methodology to Quantify the Information Content of Value Range Dependent Data Distributions. Application to Descriptor and Compound Selectivity Profiling

2010 ◽  
Vol 29 (5) ◽  
pp. 432-440 ◽  
Author(s):  
Anne Mai Wassermann ◽  
Martin Vogt ◽  
Jürgen Bajorath


2012 ◽  
Vol 518-523 ◽  
pp. 1586-1591 ◽  
Author(s):  
Hao Zhang ◽  
Ze Meng Zhao ◽  
Ahmet Palazoglu ◽  
Wei Sun

Surface ozone in the atmospheric boundary layer is one of the most harmful air pollutants; it is produced by photochemical reactions between nitrogen oxides and volatile hydrocarbons and causes great damage to human health and the environment. The prediction of surface ozone levels therefore plays an important role in the control and reduction of air pollutants. As model-driven statistical prediction models, hidden Markov models (HMMs) are rich in mathematical structure and work well in many important applications. Due to the complex structure of an HMM, however, long observation sequences increase the computational load geometrically. To reduce training time, wavelet decomposition is used to compress the original observations into shorter sequences. During the compression step, observation sequences compressed with different wavelet basis functions retain different amounts of information, which may affect prediction results. In this paper, the ozone prediction performance of HMMs based on different wavelet basis functions is discussed. Shannon entropy is employed to measure how much information is kept in a compressed sequence relative to the original one. Data from the Houston metropolitan area, TX, are used. Results show that the wavelet basis function used in the data compression step can significantly affect HMM performance, and that the compressed sequence with the maximum Shannon entropy generates the best prediction result.
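A minimal sketch of the compression-and-scoring step described above, using the PyWavelets package: each candidate wavelet basis compresses the series to its approximation coefficients, and the histogram-based Shannon entropy of each compressed sequence is compared. The synthetic ozone series, the decomposition level, and the candidate bases are placeholders, not the paper's settings.

```python
import numpy as np
import pywt

def shannon_entropy(x, bins=32):
    """Histogram-based Shannon entropy (in bits) of a 1-D sequence."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
ozone = np.cumsum(rng.normal(size=1024))  # synthetic stand-in for an ozone series

for wavelet in ("haar", "db4", "sym8"):
    # Keep only the level-2 approximation coefficients: a roughly 4x
    # shorter sequence that would serve as the compressed HMM observation.
    approx = pywt.wavedec(ozone, wavelet, level=2)[0]
    print(f"{wavelet}: length {approx.size}, entropy {shannon_entropy(approx):.3f} bits")
```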


2020 ◽  
Vol 19 (04) ◽  
pp. 2050043 ◽  
Author(s):  
Hamidreza Namazi

In this paper, we employ information theory to analyze the development of the brain as the newborn ages. We compute the Shannon entropy of electroencephalography (EEG) signals recorded during sleep for 10 groups of newborns aged 36 to 45 weeks (first to last group). Based on the obtained results, the EEG signals of newborns at 36 weeks have the lowest information content, whereas the EEG signals of newborns at 45 weeks show the greatest information content. Therefore, we conclude that the information content of the EEG signal increases with the age of the newborn. Statistical analysis demonstrated that the influence of increasing newborn age on the variations in the information content of the EEG signals was significant.
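The per-group entropy computation might look like the following sketch; the signals are synthetic stand-ins and the histogram binning is an assumption, since the paper's preprocessing is not given.

```python
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(1)

def signal_entropy(sig, bins=64):
    """Shannon entropy (bits) of the amplitude distribution of a signal."""
    counts, _ = np.histogram(sig, bins=bins)
    return entropy(counts / counts.sum(), base=2)

# Synthetic stand-ins for the 10 age groups (36..45 weeks): older groups
# mix more frequency components, broadening the amplitude distribution.
t = np.linspace(0, 10, 2560)
for weeks in range(36, 46):
    freqs = rng.uniform(0.5, 30, size=weeks - 34)
    sig = sum(np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi)) for f in freqs)
    sig += rng.normal(scale=0.5, size=t.size)
    print(f"{weeks} weeks: {signal_entropy(sig):.3f} bits")
```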


2019 ◽  
Vol 1 ◽  
pp. 1-1 ◽  
Author(s):  
Peichao Gao ◽  
Hong Zhang ◽  
Zhilin Li

<p><strong>Abstract.</strong> Entropy is an important concept that originated in thermodynamics. It is the subject of the famous Second Law of Thermodynamics, which states that “the entropy of a closed system increases continuously and irrevocably toward a maximum” (Huettner 1976, 102) or “the disorder in the universe always increases” (Framer and Cook 2013, 21). Accordingly, it has been widely regarded as an ideal measure of disorder. Its computation can be theoretically performed according to the Boltzmann equation, which was proposed by the Austrian physicist Ludwig Boltzmann in 1872. In practice, however, the Boltzmann equation involves two problems that are difficult to solve, that is the definition of the macrostate of a system and the determination of the number of possible microstates in the microstate. As noted by the American sociologist Kenneth Bailey, “when the notion of entropy is extended beyond physics, researchers may not be certain how to specify and measure the macrostate/microstate relations” (Bailey 2009, 151). As a result, this entropy (also referred to as Boltzmann entropy and thermodynamic entropy) has remained largely at a conceptual level.</p><p> In practice, the widely used entropy is actually proposed by the American mathematician, electrical engineer, and cryptographer Claude Elwood Shannon in 1948, hence the term Shannon entropy. Shannon entropy was proposed to quantify the statistical disorder of telegraph messages in the area of communications. The quantification result was interpreted as the information content of a telegraph message, hence also the term information entropy. This entropy has served as the cornerstone of information theory and was introduced to various fields including chemistry, biology, and geography. It has been widely utilized to quantify the information content of geographic data (or spatial data) in either a vector format (i.e., vector data) or a raster format (i.e., raster data). However, only the statistical information of spatial data can be quantified by using Shannon entropy. The spatial information is ignored by Shannon entropy; for example, a grey image and its corresponding error image share the same Shannon entropy.</p><p> Therefore, considerable efforts have been made to improve the suitability of Shannon entropy for spatial data, and a number of improved Shannon entropies have been put forward. Rather than further improving Shannon entropy, this study introduces a novel strategy, namely shifting back from Shannon entropy to Boltzmann entropy. There are two advantages of employing Boltzmann entropy. First, as previously mentioned, Boltzmann entropy is the ideal, standard measure of disorder or information. It is theoretically capable of quantifying not only the statistical information but also the spatial information of a data set. Second, Boltzmann entropy can serve as the bridge between spatial patterns and thermodynamic interpretations. In this sense, the Boltzmann entropy of spatial data may have wider applications. In this study, Boltzmann entropy is employed to quantify the spatial information of raster data, such as images, raster maps, digital elevation models, landscape mosaics, and landscape gradients. To this end, the macrostate of raster data is defined, and the number of all possible microstates in the macrostate is determined. 
To demonstrate the usefulness of Boltzmann entropy, it is applied to satellite remote sensing image processing, and a comparison is made between its performance and that of Shannon entropy.</p>
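The spatial-information limitation noted above is easy to demonstrate: a smooth grey image and a random permutation of its pixels have identical grey-level histograms, hence identical Shannon entropy. The sketch below shows only this limitation; it does not reproduce the paper's multiscale Boltzmann entropy.

```python
import numpy as np

rng = np.random.default_rng(2)

def shannon_entropy(img):
    """Shannon entropy (bits) of an 8-bit image's grey-level histogram."""
    counts = np.bincount(img.ravel(), minlength=256)
    p = counts[counts > 0] / img.size
    return -np.sum(p * np.log2(p))

# A smooth horizontal gradient versus a random permutation of its pixels:
# spatially they could hardly differ more, yet the histograms are identical.
smooth = np.tile(np.arange(256, dtype=np.uint8), (256, 1))
scrambled = rng.permutation(smooth.ravel()).reshape(smooth.shape)

print(shannon_entropy(smooth))     # 8.0 bits
print(shannon_entropy(scrambled))  # 8.0 bits -- spatial structure is invisible
```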


2020 ◽  
pp. 2150028 ◽  
Author(s):  
Hamidreza Namazi ◽  
Ondrej Krejcar

One of the crucial areas of pregnancy research is the analysis of pregnancy development. For this purpose, scientists analyze the different conditions of fetuses to understand their development. In this paper, we conducted complexity- and information-based analyses of phonocardiogram (PCG) signals to investigate pregnancy development. We calculated the fractal dimension, approximate entropy, and sample entropy as measures of complexity, and the Shannon entropy as a measure of information content, for 24 fetuses in four ranges of gestational weeks. Based on the obtained results, increasing gestational age is reflected in increased complexity of the fetuses' PCG signals. We observed similar findings for the information content of the PCG signals. Among all calculated measures, the fractal dimension of the PCG signals showed significant variations among different gestational weeks. This method of analysis can be used to evaluate alterations in other biomedical signals of fetuses (e.g., heart rate) to investigate their development.
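Of the measures listed, sample entropy is the least standardized across libraries, so a compact reference sketch may help; the template length m and tolerance r below are common defaults, not values taken from the paper.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn(m, r) = -ln(A/B) with tolerance r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def match_count(length):
        # All overlapping templates of the given length.
        templates = np.lib.stride_tricks.sliding_window_view(x, length)
        count = 0
        for i in range(templates.shape[0] - 1):
            # Chebyshev distance to all later templates (no self-matches).
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    b, a = match_count(m), match_count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

rng = np.random.default_rng(3)
print(sample_entropy(np.sin(np.linspace(0, 20 * np.pi, 1000))))  # low: regular
print(sample_entropy(rng.normal(size=1000)))                     # high: irregular
```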


2020 ◽  
Vol 19 (04) ◽  
pp. 2050033 ◽  
Author(s):  
Hamidreza Namazi

Analysis of brain activity is a major research area in human neuroscience. Besides the many works on brain activity in healthy subjects, the investigation of brain activity in patients with different brain disorders has also attracted many researchers. An interesting category of works concerns the comparison of brain activity between healthy subjects and patients with brain disorders. In this research, for the first time, we compare brain activity between adolescents with symptoms of schizophrenia and healthy subjects through information-based analysis of their electroencephalography (EEG) signals. For this purpose, we use Shannon entropy as the indicator of information content. Based on the results of the analysis, the EEG signals of healthy subjects contain more information than those of subjects with schizophrenia. Statistical analysis showed significant variation in the Shannon entropy of the EEG signal between healthy adolescents and adolescents with symptoms of schizophrenia for the P3, O1, and O2 channels. The employed method of analysis can be further extended to investigate variations in the information content of EEG signals in subjects with other brain disorders versus healthy subjects.
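The per-channel comparison could be organized as in the following sketch; the test (Mann-Whitney U), the channel list, and the synthetic data are assumptions made for illustration, since the abstract does not name the statistical test used.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)

def shannon_entropy(sig, bins=64):
    counts, _ = np.histogram(sig, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

channels = ["F3", "F4", "P3", "P4", "O1", "O2"]  # assumed montage subset
for ch in channels:
    # One entropy value per subject per group; synthetic signals here will
    # not reproduce the paper's group difference -- this shows the workflow.
    healthy = [shannon_entropy(rng.normal(size=2048)) for _ in range(20)]
    patients = [shannon_entropy(rng.normal(size=2048)) for _ in range(20)]
    stat, p = mannwhitneyu(healthy, patients)
    print(f"{ch}: U = {stat:.0f}, p = {p:.3f}")
```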


2016 ◽  
Author(s):  
Michael Kahnert ◽  
Emma Andersson

Abstract. We theoretically and numerically investigate the problem of assimilating lidar observations of extinction and backscattering coefficients of aerosols into a chemical transport model. More specifically, we consider the inverse problem of determining the chemical composition of aerosols from these observations. The main questions are how much information the observations contain to constrain the particles' chemical composition, and how one can optimise a chemical data assimilation system to make maximum use of the available information. We first quantify the information content of the measurements by computing the singular values of the observation operator. From the singular values we can compute the number of signal degrees of freedom and the reduction in Shannon entropy. For an observation standard deviation of 10 %, it is found that simultaneous measurements of extinction and backscattering allow us to constrain twice as many model variables as extinction measurements alone. The same holds for measurements at two wavelengths compared to measurements at a single wavelength. However, when we extend the set of measurements from two to three wavelengths, we observe only a small increase in the number of signal degrees of freedom and a minor change in the Shannon entropy. The information content is strongly sensitive to the observation error; both the number of signal degrees of freedom and the reduction in Shannon entropy decrease steeply as the observation standard deviation increases in the range between 1 % and 100 %. The right singular vectors of the observation operator can be employed to transform the model variables into a new basis in which the components of the state vector can be divided into signal-related and noise-related components. We incorporate these results in a chemical data assimilation algorithm by introducing weak constraints that restrict the assimilation algorithm to acting on the signal-related model variables only. This ensures that the information contained in the measurements is fully exploited, but not over-used. Numerical experiments confirm that the constrained data assimilation algorithm solves the inverse problem in a way that automatises the choice of control variables and restricts the minimisation of the cost function to the signal-related model variables.
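The singular-value diagnostics described here follow standard retrieval theory (Rodgers, 2000), in which the observation operator is pre-whitened by the observation-error and prior covariances; the sketch below uses a random placeholder operator rather than the paper's lidar forward model.

```python
import numpy as np

rng = np.random.default_rng(5)
n_obs, n_state = 6, 12                  # e.g., 2 observed quantities x 3 wavelengths
K = rng.normal(size=(n_obs, n_state))   # placeholder observation operator
S_a = np.eye(n_state)                   # prior (background) covariance, assumed

for obs_std in (0.01, 0.10, 1.00):      # 1 %, 10 %, 100 % observation error
    # Pre-whitened operator; its singular values are signal-to-noise ratios.
    K_tilde = (np.eye(n_obs) / obs_std) @ K @ np.linalg.cholesky(S_a)
    lam = np.linalg.svd(K_tilde, compute_uv=False)
    dofs = np.sum(lam**2 / (1 + lam**2))     # signal degrees of freedom
    dH = 0.5 * np.sum(np.log(1 + lam**2))    # reduction in Shannon entropy
    print(f"sigma = {obs_std:.2f}: d_s = {dofs:.2f}, dH = {dH:.2f} nats")
```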


Fractals ◽  
2021 ◽  
Author(s):  
Janarthanan Ramadoss ◽  
Norazryana Mat Dawi ◽  
Karthikeyan Rajagopal ◽  
Hamidreza Namazi

In this paper, we analyzed the variations in brain activation between different activities. Since electroencephalogram (EEG) signals, as an indicator of brain activation, contain information and have complex structures, we employed complexity- and information-based analysis; specifically, we used fractal theory and Shannon entropy. Eight subjects performed three different activities (standing, walking, and walking with a brain–computer interface) while their EEG signals were recorded. Based on the results, the complexity and information content of the EEG signals are greatest during walking and smallest during standing. Complexity- and information-based analysis can likewise be applied to analyze the activation of other organs under different conditions.
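The abstract cites fractal theory without naming an estimator, so the Higuchi fractal dimension below is an assumed choice; it illustrates how a complexity score separates a regular signal from an irregular one.

```python
import numpy as np

def higuchi_fd(x, kmax=10):
    """Higuchi fractal dimension of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = x.size
    lengths = []
    for k in range(1, kmax + 1):
        lk = []
        for m in range(k):
            idx = np.arange(m, n, k)          # subsampled curve x[m::k]
            if idx.size < 2:
                continue
            dist = np.abs(np.diff(x[idx])).sum()
            lk.append(dist * (n - 1) / ((idx.size - 1) * k) / k)
        lengths.append(np.mean(lk))
    k_vals = np.arange(1, kmax + 1)
    slope, _ = np.polyfit(np.log(1.0 / k_vals), np.log(lengths), 1)
    return slope                               # ~1 for smooth, ~2 for noise

rng = np.random.default_rng(6)
print(higuchi_fd(np.sin(np.linspace(0, 20 * np.pi, 2000))))  # near 1
print(higuchi_fd(rng.normal(size=2000)))                     # near 2
```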

