Minimum Description Length
Recently Published Documents

TOTAL DOCUMENTS: 412 (five years: 53)
H-INDEX: 32 (five years: 3)

PLoS ONE, 2022, Vol 17 (1), pp. e0262244
Author(s): Geon Lee, Se-eun Yoon, Kijung Shin

Given a sequence of epidemic events, can a single epidemic model capture its dynamics during the entire period? How should we divide the sequence into segments to better capture the dynamics? Throughout human history, infectious diseases (e.g., the Black Death and COVID-19) have been serious threats. Consequently, understanding and forecasting the evolving patterns of epidemic events are critical for prevention and decision making. To this end, epidemic models based on ordinary differential equations (ODEs), which effectively describe dynamic systems in many fields, have been employed. However, a single epidemic model is not enough to capture the long-term dynamics of epidemic events, especially when the dynamics depend heavily on external factors (e.g., lockdowns and the capacity to perform tests). In this work, we demonstrate that properly dividing the event sequence regarding COVID-19 (specifically, the numbers of active cases, recoveries, and deaths) into multiple segments and fitting a simple epidemic model to each segment leads to a better fit with fewer parameters than fitting a complex model to the entire sequence. Moreover, we propose a methodology for balancing the number of segments and the complexity of epidemic models, based on the Minimum Description Length principle. Our methodology is (a) Automatic: not requiring any user-defined parameters, (b) Model-agnostic: applicable to any ODE-based epidemic models, and (c) Effective: effectively describing and forecasting the spread of COVID-19 in 70 countries.
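As a minimal sketch of this balance (not the authors' implementation), the snippet below fits a toy log-linear growth model per segment, standing in for an ODE-based epidemic model, and scores a candidate segmentation with a two-part code length; the function names and the Gaussian residual coding are assumptions of this sketch.

```python
import numpy as np

def fit_segment(counts):
    """Fit a toy log-linear (exponential growth/decay) model to one
    segment of case counts; return the residual sum of squares."""
    t = np.arange(len(counts))
    logs = np.log(np.maximum(counts, 1.0))
    slope, intercept = np.polyfit(t, logs, 1)
    resid = logs - (slope * t + intercept)
    return float(resid @ resid)

def mdl_score(counts, boundaries, params_per_segment=2):
    """Two-part MDL: Gaussian code length of the residuals (data cost)
    plus 0.5 * log(n) per parameter per segment (model cost)."""
    counts = np.asarray(counts, dtype=float)
    n = len(counts)
    segments = np.split(counts, boundaries)
    rss = sum(fit_segment(seg) for seg in segments if len(seg) >= 2)
    data_cost = 0.5 * n * np.log(max(rss / n, 1e-12))
    model_cost = 0.5 * params_per_segment * len(segments) * np.log(n)
    return data_cost + model_cost
```

For a daily case series, `mdl_score(cases, [30, 75])` scores a three-segment split; enumerating candidate boundary sets and keeping the minimizer mimics the automatic trade-off between segment count and model complexity.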


2022, Vol 2022, pp. 1-11
Author(s): Wenjin Xu, Shaokang Dong

With the development of wireless networks, location-based services (e.g., place-of-interest recommendation) play a crucial role in daily life. However, the acquired data is noisy and massive, making it difficult to mine with artificial-intelligence algorithms. One of the fundamental problems of trajectory knowledge discovery is trajectory segmentation. Reasonable segmentation can reduce computational cost and improve storage efficiency. In this work, we propose an unsupervised algorithm for trajectory segmentation based on multiple motion features (TS-MF). The proposed algorithm consists of two steps: segmentation and merging. The segmentation step uses the Pearson coefficient to measure the similarity of adjacent trajectory points and extracts segmentation points from a global perspective. The merging step optimizes the minimum description length (MDL) value by merging local sub-trajectories, which avoids excessive segmentation and improves the accuracy of trajectory segmentation. To demonstrate the effectiveness of the proposed algorithm, experiments are conducted on two real datasets. Comparisons with the state of the art indicate that the proposed method achieves the highest harmonic mean of purity and coverage.
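As a rough illustration of the segmentation step (the merge step and TS-MF's exact feature set are not detailed here), the sketch below cuts a trajectory wherever adjacent points' motion-feature vectors fall below a Pearson-similarity threshold; the feature construction and the threshold value are assumptions.

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation between two motion-feature vectors,
    e.g. (speed, heading change, acceleration) at one point."""
    return float(np.corrcoef(u, v)[0, 1])

def segmentation_points(features, threshold=0.8):
    """Cut wherever adjacent points' features are insufficiently
    correlated; returns the indices of the cut positions."""
    return [i + 1 for i in range(len(features) - 1)
            if pearson(features[i], features[i + 1]) < threshold]
```

A subsequent merge pass would then rejoin adjacent sub-trajectories whenever doing so lowers the total MDL value, which is what counters over-segmentation.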


Entropy, 2021, Vol 24 (1), pp. 59
Author(s): Baihan Lin

Inspired by the adaptation phenomenon of neuronal firing, we propose regularity normalization (RN) as an unsupervised attention mechanism (UAM) that computes the statistical regularity in the implicit space of neural networks under the Minimum Description Length (MDL) principle. Treating the neural network optimization process as a partially observable model selection problem, regularity normalization constrains the implicit space by a normalization factor, the universal code length. We compute this universal code incrementally across neural network layers and demonstrate the flexibility to include data priors such as top-down attention and other oracle information. Empirically, our approach outperforms existing normalization methods in tackling limited, imbalanced, and non-stationary input distributions in image classification, classic control, procedurally-generated reinforcement learning, generative modeling, handwriting generation, and question answering tasks with various neural network architectures. Lastly, the unsupervised attention mechanism is a useful probing tool for neural networks, tracking the dependency and critical learning stages across layers and recurrent time steps of deep networks.
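The toy layer below conveys the flavor of the idea (a sketch, not the paper's formulation): it maintains a running Gaussian model of all activations seen so far and rescales each batch by its plug-in code length, a cheap stand-in for the universal code length computed in the paper.

```python
import numpy as np

class RegularityNormalization:
    """Toy MDL-flavored normalization: the batch's code length under a
    running Gaussian model of past activations serves as the
    normalization factor. The paper's universal-code computation and
    exact scaling differ; this only sketches the mechanism."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        for v in x.ravel():                  # Welford's online update
            self.n += 1
            d = v - self.mean
            self.mean += d / self.n
            self.m2 += d * (v - self.mean)
        var = max(self.m2 / max(self.n - 1, 1), 1e-8)
        # Per-unit code length (nats) of the batch under the running model.
        code = 0.5 * np.mean(np.log(2 * np.pi * var)
                             + (x - self.mean) ** 2 / var)
        return x / max(code, 1e-8)  # more regular batch -> larger rescaling
```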


2021
Author(s): Mathias Sablé-Meyer, Kevin Ellis, Joshua Tenenbaum, Stanislas Dehaene

Why do geometric shapes such as lines, circles, zig-zags, or spirals appear in all human cultures, but are never produced by other animals? Here, we formalize and test the hypothesis that all humans possess a compositional language of thought that can produce line drawings as recursive combinations of a minimal set of geometric primitives. We present a programming language, similar to Logo, that combines discrete numbers and continuous integration in higher-level structures based on repetition, concatenation, and embedding, and show that the simplest programs in this language generate the fundamental geometric shapes observed in human cultures. On the perceptual side, we propose that shape perception in humans involves searching for the shortest program that correctly draws the image (program induction). A consequence of this framework is that the mental difficulty of remembering a shape should depend on its minimum description length (MDL) in the proposed language. In two experiments, we show that the encoding and processing of geometric shapes are well predicted by MDL. Furthermore, our hypotheses predict additive laws for the psychological complexity of repeated, concatenated, or embedded shapes, which are experimentally validated.
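To make the quantity concrete, here is a hypothetical cost model over a Logo-like program representation (the operator set and bit costs are illustrative, not the paper's): a shape's complexity is the description length of the shortest program that draws it, so repetition and embedding make shapes cheap.

```python
import math

# Illustrative per-operator costs in bits (not the paper's values).
COSTS = {"line": 1.0, "turn": 1.0, "repeat": 2.0, "concat": 1.0, "embed": 2.0}

def mdl(program):
    """Description length of a nested (op, *args) program: each
    operator's cost plus log2 bits for integer arguments."""
    op, *args = program
    bits = COSTS[op]
    for arg in args:
        if isinstance(arg, tuple):
            bits += mdl(arg)
        elif isinstance(arg, int):
            bits += math.log2(arg + 1)
    return bits

# A square as "repeat 4x (line, turn)": far shorter than listing four
# separate strokes, predicting low psychological complexity.
square = ("repeat", 4, ("concat", ("line",), ("turn",)))
print(mdl(square))  # ~7.3 bits under these toy costs
```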


2021, Vol 9 (1)
Author(s): Ezer Rasin, Iddo Berger, Nur Lan, Itamar Shefi, Roni Katzir

A linguistic theory reaches explanatory adequacy if it arrives at a linguistically-appropriate grammar based on the kind of input available to children. In phonology, we assume that children can succeed even when the input consists of surface evidence alone, with no corrections or explicit paradigmatic information – that is, in learning from distributional evidence. We take the grammar to include both a lexicon of underlying representations and a mapping from the lexicon to surface forms. Moreover, this mapping should be able to express optionality and opacity, among other textbook patterns. This learning challenge has not yet been addressed in the literature. We argue that the principle of Minimum Description Length (MDL) offers the right kind of guidance to the learner – favoring generalizations that are neither overly general nor overly specific – and can help the learner overcome the learning challenge. We illustrate with an implemented MDL learner that succeeds in learning various linguistically-relevant patterns from small corpora.
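The learner's objective can be sketched as a two-part code (a toy rendering; the implemented learner encodes lexicons and rule-based mappings rather than the bare probabilities used here): the cost of the grammar itself plus the cost of the corpus given the grammar. The bit figures below are hypothetical.

```python
import math

def mdl_score(grammar_bits, corpus, predict):
    """Two-part code length |G| + |D:G| in bits: the grammar's own
    encoding cost plus the corpus's cost under its predictions."""
    return grammar_bits + sum(-math.log2(predict(w)) for w in corpus)

corpus = ["pat", "pad", "bat"] * 10
# Overly general: any length-3 string over {p, b, a, t, d} is equally
# likely -> tiny grammar, expensive data.
general = mdl_score(10, corpus, lambda w: (1 / 5) ** 3)
# Overly specific: memorize the three attested forms -> big grammar,
# cheap data.
specific = mdl_score(300, corpus, lambda w: 1 / 3)
print(general, specific)  # MDL favors whichever balances the two parts
```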


2021, Vol 11 (1)
Author(s): Kenji Yamanishi, Linchuan Xu, Ryo Yuki, Shintaro Fukushima, Chuan-hao Lin

We are concerned with the issue of detecting changes and their signs from a data stream. For example, when given a time series of COVID-19 cases in a region, we may raise early warning signals of an epidemic by detecting signs of changes in the data. We propose a novel methodology to address this issue. The key idea is to employ a new information-theoretic notion, which we call the differential minimum description length change statistics (D-MDL), for measuring the scores of change signs. We first give a fundamental theory for D-MDL. We then demonstrate its effectiveness using synthetic datasets. We apply it to detecting early warning signals of the COVID-19 epidemic using time series of the cases for individual countries. We empirically demonstrate that D-MDL is able to raise early warning signals of events such as a significant increase or decrease of cases. Remarkably, for about 64% of the events of significant case increases in the studied countries, our method can detect warning signals nearly six days on average before the events, buying considerable time for making responses. We further relate the warning signals to the dynamics of the basic reproduction number R0 and the timing of social distancing. The results show that our method is a promising approach to epidemic analysis from a data science viewpoint.
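A minimal proxy for these statistics (using a Gaussian plug-in code length rather than the paper's exact codelengths): compute the code-length saving from splitting each sliding window, then take first and second differences of the resulting scores as change-sign indicators.

```python
import numpy as np

def gaussian_code_length(x):
    """Plug-in code length (nats) of a sample under a fitted Gaussian."""
    x = np.asarray(x, dtype=float)
    var = max(x.var(), 1e-12)
    return 0.5 * len(x) * (np.log(2 * np.pi * var) + 1)

def mdl_change_stat(window):
    """Code-length saving from splitting the window at its midpoint:
    large values suggest a change point near the middle."""
    h = len(window) // 2
    return gaussian_code_length(window) - (
        gaussian_code_length(window[:h]) + gaussian_code_length(window[h:]))

def d_mdl(series, w=20):
    """0th-order change statistics over a sliding window, plus their
    first and second differences (the change-sign scores)."""
    stats = np.array([mdl_change_stat(series[i:i + w])
                      for i in range(len(series) - w + 1)])
    return stats, np.diff(stats), np.diff(stats, n=2)
```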


Entropy, 2021, Vol 23 (8), pp. 997
Author(s): Pham Thuc Hung, Kenji Yamanishi

In this paper, we propose a novel information-criteria-based approach to select the dimensionality of the word2vec Skip-gram (SG) model. From the perspective of probability theory, SG is considered an implicit probability distribution estimation under the assumption that there exists a true contextual distribution among words. Therefore, we apply information criteria with the aim of selecting the best dimensionality, so that the corresponding model can be as close as possible to the true distribution. We examine the following information criteria for the dimensionality selection problem: Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Sequential Normalized Maximum Likelihood (SNML) criterion. SNML is the total codelength required for the sequential encoding of a data sequence on the basis of the minimum description length principle. The proposed approach is applied to both the original SG model and the SG Negative Sampling model to clarify the idea of using information criteria. Additionally, as the original SNML suffers from computational disadvantages, we introduce novel heuristics for its efficient computation. Moreover, we empirically demonstrate that SNML outperforms both BIC and AIC. In comparison with other evaluation methods for word embeddings, the dimensionality selected by SNML is significantly closer to the optimal dimensionality obtained by word analogy or word similarity tasks.
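The selection loop itself is simple once a criterion is fixed; below is a schematic with BIC standing in (the SNML codelength and the paper's heuristics are more involved). Here `fit` is a hypothetical callable that trains a Skip-gram at dimension d and returns its log-likelihood, parameter count, and number of training pairs.

```python
import numpy as np

def aic(log_lik, k, n):
    """Akaike's Information Criterion (n unused, kept for symmetry)."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian Information Criterion."""
    return -2.0 * log_lik + k * np.log(n)

def select_dimensionality(candidates, fit, criterion=bic):
    """Return the dimensionality minimizing the chosen criterion."""
    scores = {d: criterion(*fit(d)) for d in candidates}
    return min(scores, key=scores.get), scores
```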


2021
Author(s): Vishnu Kumar Mishra, Megha Mishra, Bhupesh Kumar Dewangan, Tanupriya Choudhury

This paper presents the Moving and Trajectory Object Cluster (MOTRACLUS) algorithm, analyzes sub-trajectory and real-trajectory algorithms for moving data, and suggests a new approach for moving elements. It evaluates hurricane-track data and the entropy of trajectory objects for moving data from the Chhattisgarh region. The paper covers prediction generation with a distance-based clustering algorithm under the minimum description length (MDL) principle and a corresponding distance-clustering (CLSTR) algorithm, and combines the k-nearest-neighbor algorithm with a longest-common-subsequence (LCSS) model and an MDL formulation over Euclidean distances. The algorithm consists of two phases, partitioning and grouping: it builds clusters of trajectory objects and calculates the actual distances of moving objects, using the CLSTR algorithm to compute the trajectory movement of each object. Finally, the entropy of moving objects is evaluated with respect to a heuristic parameter.
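The MDL partitioning referenced here follows the classic trajectory-partitioning recipe: replace a run of points by a single segment when the model cost plus the cost of encoding the perpendicular errors is cheaper than keeping the raw points. The sketch below is a generic rendering of that cost, not the paper's code.

```python
import numpy as np

def perp_dist(p, q, r):
    """Perpendicular distance from point r to the line through p, q (2-D)."""
    d, v = q - p, r - p
    return abs(d[0] * v[1] - d[1] * v[0]) / max(np.linalg.norm(d), 1e-9)

def partition_cost(points, i, j):
    """MDL cost of approximating points[i..j] by one segment:
    L(H) = log2(segment length); L(D|H) = log2(total perpendicular error)."""
    p, q = points[i], points[j]
    l_h = np.log2(max(np.linalg.norm(q - p), 1.0))
    err = sum(perp_dist(p, q, points[k]) for k in range(i + 1, j))
    return l_h + np.log2(max(err, 1.0))
```

A partitioning pass grows j greedily while this cost stays below that of encoding the points verbatim; the grouping phase then clusters the resulting sub-segments.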


2021
Author(s): Wes Bonifay

Traditional statistical model evaluation typically relies on goodness-of-fit testing and quantifying model complexity by counting parameters. Both of these practices may result in overfitting and have thereby contributed to the generalizability crisis. The information-theoretic principle of minimum description length addresses both of these concerns by filtering noise from the observed data and consequently increasing generalizability to unseen data.


2021, Vol 70, pp. 597-630
Author(s): Alex Mattenet, Ian Davidson, Siegfried Nijssen, Pierre Schaus

Block modeling has been used extensively in many domains, including social science, spatio-temporal data analysis, and even medical imaging. Original formulations modeled the problem as a mixed-integer program but were not scalable. Subsequent work relaxed the discrete optimization requirement, and showed that adding constraints is not straightforward in existing approaches. In this work, we present a new approach based on constraint programming, allowing discrete optimization of block modeling in a manner that is not only scalable but also allows the easy incorporation of constraints. We introduce a new constraint filtering algorithm that outperforms earlier approaches, in both constrained and unconstrained settings, for exhaustive search and for a type of local search called Large Neighborhood Search. We show its use in the analysis of real datasets. Finally, we show an application of the CP framework for model selection using the Minimum Description Length principle.
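For the model-selection step, an MDL objective over block models can be sketched as follows (a toy scorer under a Bernoulli block model, not the paper's CP formulation): among candidate block counts, the one minimizing the total code length is preferred.

```python
import numpy as np

def block_model_mdl(adj, assignment, k):
    """Two-part code length (bits) of a k-block model of a binary graph:
    assignment and block-density parameter costs, plus the Bernoulli
    code of the edges given their block densities."""
    n = len(adj)
    bits = n * np.log2(k) + 0.5 * k * k * np.log2(n * n)  # model cost
    for a in range(k):
        for b in range(k):
            cells = np.outer(assignment == a, assignment == b)
            m, e = cells.sum(), adj[cells].sum()
            if 0 < e < m:               # mixed block: pay Bernoulli bits
                p = e / m
                bits -= e * np.log2(p) + (m - e) * np.log2(1 - p)
    return bits
```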

