Log Likelihood Spectral Distance, Entropy Rate Power, and Mutual Information with Applications to Speech Coding

Entropy
2017
Vol 19 (9)
pp. 496
2011
Vol 84 (4)
Author(s): B. D. Wissman, L. C. McKay-Jones, P.-M. Binder

2009
Vol 14 (3)
pp. 367-392
Author(s): Sandra Mollin

While the study of idiolect is a neglected area in corpus linguistics, the present article suggests that it can be fruitful. As a tool for studying an idiolect, the three-million-word Tony Blair Corpus is introduced. The maximiser collocations occurring in this corpus are compared to those in the BNC in order to identify those which are truly typical of the individual. A quantitative analysis involving three measures of collocational strength (normalised frequency, Mutual Information and log-likelihood) provides candidate collocations which strongly diverge between the two corpora. These are then subjected to tests of synonym preference and register specificity, resulting in a small number of collocations which can count as Blairisms.
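Two of the association measures named in the abstract, Mutual Information and log-likelihood, can be illustrated for a single node–collocate pair. The sketch below uses a standard 2x2 contingency-table formulation with invented counts, not data from the Blair Corpus or the BNC:

```python
import math

def collocation_scores(o11, o12, o21, o22):
    """Score a node-collocate pair from a 2x2 contingency table:
    o11 = co-occurrences of node and collocate,
    o12 = node without collocate, o21 = collocate without node,
    o22 = neither. Returns (pointwise MI, log-likelihood G2)."""
    n = o11 + o12 + o21 + o22
    r1, c1 = o11 + o12, o11 + o21      # node total, collocate total
    r2, c2 = o21 + o22, o12 + o22
    e11 = r1 * c1 / n                  # expected joint count under independence
    pmi = math.log2(o11 / e11)
    # Log-likelihood (G2): 2 * sum over all four cells of O * ln(O/E)
    g2 = 0.0
    for o, e in [(o11, e11), (o12, r1 * c2 / n),
                 (o21, r2 * c1 / n), (o22, r2 * c2 / n)]:
        if o > 0:
            g2 += 2 * o * math.log(o / e)
    return pmi, g2
```

With observed counts equal to the expected counts, both scores are zero; the more a pair's joint frequency exceeds chance, the higher both measures climb, though G2 also grows with corpus size while pointwise MI does not.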


2014
Vol 1049-1050
pp. 1544-1549
Author(s): Wen Xiong

Machine-aided human translation (MAHT) of patent abstracts is an important step in the deep processing of patent data, where terms have significant application value. This paper investigates automatic term recognition (ATR) and proposes a new hybrid method based on two-phase analysis and statistics to generate English candidate terms. Segments containing stop words are not simply discarded; instead, a rewriting method using beginning patterns, ending patterns, and inner patterns is employed in phase two to process these segments. Meanwhile, generalized statistical measures such as generalized mutual information (MI), the log-likelihood ratio (LLR), and the C-value are used to evaluate the candidates, filtering out low-scoring candidate terms and taking the intersection of the resulting sets. Experiments on randomly sampled patent abstract texts demonstrate the effectiveness of the method.
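The C-value measure mentioned above demotes candidate strings that occur mostly inside longer candidate terms. The following is a generic C-value sketch, not the paper's exact pipeline; the candidate terms and frequencies are invented for illustration:

```python
import math
from collections import defaultdict

def c_value(candidates):
    """C-value scores for term candidates. `candidates` maps a
    multi-word term (a tuple of at least two words) to its corpus
    frequency. A term nested inside longer candidates is penalised
    by the average frequency of those longer terms."""
    nested = defaultdict(list)  # term -> freqs of longer terms containing it
    terms = list(candidates)
    for a in terms:
        for b in terms:
            if len(b) > len(a) and any(
                b[i:i + len(a)] == a for i in range(len(b) - len(a) + 1)
            ):
                nested[a].append(candidates[b])
    scores = {}
    for a, f in candidates.items():
        if nested[a]:  # subtract average frequency of containing terms
            scores[a] = math.log2(len(a)) * (f - sum(nested[a]) / len(nested[a]))
        else:
            scores[a] = math.log2(len(a)) * f
    return scores
```

A substring such as "mutual information" that appears mainly inside "mutual information estimator" is thereby scored lower than its raw frequency alone would suggest, which is the behaviour the hybrid filtering relies on.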


Author(s): Jialin Zhang, Chen Chen

Abstract: Zhang, Z. and Zheng, L. (2015): "A mutual information estimator with exponentially decaying bias," Stat. Appl. Genet. Mol. Biol., 14, 243–252, proposed a nonparametric estimator of mutual information developed from an entropic perspective, and demonstrated that it has much smaller bias than the plug-in estimator yet the same asymptotic normality under certain conditions. However, it is incorrectly suggested in their article that the asymptotic normality could be used for testing independence between two random elements on a joint alphabet. When the two random elements are independent, the asymptotic distribution of the $\sqrt{n}$-normed estimator degenerates, and therefore the claimed normality does not hold. This article complements Zhang and Zheng by establishing a new chi-square test, using the same entropic statistics, for mutual information being zero. The three examples in Zhang and Zheng are re-worked using the new test. The results turn out to be much more sensible and further illustrate the advantage of the entropic perspective in statistical inference on alphabets. More specifically, in Example 2, where a positive mutual information is known to exist, the new test detects it but the log-likelihood ratio test fails to do so.
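For context on chi-square testing of mutual information, the classical connection runs through the plug-in estimator: under independence, twice the sample size times the plug-in MI (in nats) is asymptotically chi-square distributed. The sketch below shows this standard G-test relationship; it is not the new entropic test proposed in the article:

```python
import math

def mi_chi_square_test(table):
    """Test H0: I(X;Y) = 0 (independence) on a joint alphabet given a
    contingency table (list of rows of counts). Under H0 the statistic
    2 * n * I_hat, with I_hat the plug-in mutual information in nats,
    is asymptotically chi-square with (r-1)(c-1) degrees of freedom
    (the classical G-test). Returns (statistic, degrees of freedom)."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    mi = 0.0  # plug-in estimate of I(X;Y) in nats
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            if o > 0:
                mi += (o / n) * math.log(o * n / (rows[i] * cols[j]))
    return 2 * n * mi, (len(rows) - 1) * (len(cols) - 1)
```

For a balanced 2x2 table the statistic is zero; for strongly dependent counts it far exceeds the 5% chi-square critical value, which is the kind of detection behaviour Example 2 of the article probes.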


Author(s): Antara Dasgupta, Renaud Hostache, RAAJ Ramasankaran, Guy J.‐P Schumann, Stefania Grimaldi, ...
