pointwise mutual information
Recently Published Documents


Total documents: 51 (five years: 26)
H-index: 6 (five years: 1)

2021
Author(s): Sefa Kucuk, Seniha Esen Yuksel

Sparse unmixing (SU) aims to express the observed image signatures as a linear combination of pure spectra known a priori and has become a very popular technique with promising results in analyzing hyperspectral images (HSI) over the past ten years. In SU, utilizing the spatial-contextual information allows for more realistic abundance estimation. To make full use of the spatial-spectral information, in this letter, we propose a pointwise mutual information (PMI) based graph Laplacian regularization for SU. Specifically, we construct the affinity matrices via PMI by modeling the association between neighboring image features through a statistical framework, and then we use them in the graph Laplacian regularizer. We also adopt a double reweighted $\ell_{1}$ norm minimization scheme to promote the sparsity of fractional abundances. Experimental results on simulated and real data sets prove the effectiveness of the proposed method and its superiority over competing algorithms in the literature.
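The abstract does not spell out how the PMI affinities are computed, so the following is only a minimal sketch under stated assumptions. It assumes each pixel has already been quantized into a discrete feature label (a hypothetical stand-in for the letter's statistical feature model) and that neighbours are 4-connected. It estimates PMI between labels of neighbouring pixels, keeps only positive values as affinities W, and evaluates the graph Laplacian regularizer tr(A L A^T) with L = D - W on an abundance matrix A. The helper names `pmi_affinity` and `laplacian_regularizer` are hypothetical.

```python
import numpy as np

def pmi_affinity(labels, eps=1e-12):
    """Positive-PMI affinity between 4-connected neighbouring pixels.

    labels: 2-D integer array of quantized pixel features (hypothetical
    stand-in for the letter's feature model). Returns an (n, n) symmetric
    affinity matrix, n = number of pixels, nonzero only for neighbours.
    """
    h, w = labels.shape
    n = h * w
    k = int(labels.max()) + 1
    flat = labels.ravel()
    idx = np.arange(n).reshape(h, w)
    # horizontal and vertical neighbour pairs, listed in both directions
    right = np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1)
    down = np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1)
    pairs = np.vstack([right, down])
    pairs = np.vstack([pairs, pairs[:, ::-1]])
    # empirical joint distribution of (label_i, label_j) over neighbouring pairs
    joint = np.zeros((k, k))
    np.add.at(joint, (flat[pairs[:, 0]], flat[pairs[:, 1]]), 1.0)
    joint /= joint.sum()
    marg = joint.sum(axis=1)
    # PMI(a, b) = log p(a, b) / (p(a) p(b)); negatives clipped to 0 (positive PMI)
    pmi = np.log(joint + eps) - np.log(np.outer(marg, marg) + eps)
    ppmi = np.maximum(pmi, 0.0)
    W = np.zeros((n, n))
    W[pairs[:, 0], pairs[:, 1]] = ppmi[flat[pairs[:, 0]], flat[pairs[:, 1]]]
    return W

def laplacian_regularizer(A, W):
    """tr(A L A^T) with L = D - W; A has shape (endmembers, pixels)."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(A @ L @ A.T))
```

For a symmetric W, tr(A L A^T) equals one half of the sum over i, j of W_ij ||a_i - a_j||^2, so minimizing it pulls the abundance vectors of strongly associated neighbouring pixels toward each other.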


2021
Author(s): Søren Wichmann

The present work aims (1) to develop a search engine adapted to the large DReaM corpus of linguistic descriptive literature and (2) to gain insight into how a data-driven ontology of linguistic terminology might be built. Starting from close to 20,000 text documents from the literature of language descriptions, either born digital or scanned and OCR'd, we extract keywords and pass them through a pruning pipeline in which mainly those keywords that belong to linguistic terminology survive. We then quantify relations among these terms using Normalized Pointwise Mutual Information (NPMI) and use the resulting measures, in conjunction with Google PageRank (GPR), to build networks of linguistic terms.
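As a rough illustration of the NPMI step, the sketch below scores term pairs from document-level co-occurrence. It assumes probabilities are estimated as document frequencies (the abstract does not say how counts were taken) and uses NPMI(x, y) = PMI(x, y) / -log p(x, y), which lies in [-1, 1]. Pairs that never co-occur (NPMI = -1 by convention) are simply omitted. The function name `npmi_table` is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def npmi_table(docs):
    """docs: list of collections of terms, one per document.
    Returns {(a, b): NPMI} for every pair that co-occurs in at least one
    document, with probabilities estimated as document frequencies."""
    n = len(docs)
    df, co = Counter(), Counter()
    for terms in docs:
        terms = sorted(set(terms))          # sorted so each pair has one key
        df.update(terms)
        co.update(combinations(terms, 2))
    out = {}
    for (a, b), c in co.items():
        p_ab = c / n
        p_a, p_b = df[a] / n, df[b] / n
        if p_ab == 1.0:
            out[(a, b)] = 1.0               # both terms in every document
            continue
        pmi = math.log(p_ab / (p_a * p_b))
        out[(a, b)] = pmi / -math.log(p_ab)
    return out
```

With four toy documents {ergative, case}, {ergative, case}, {tone}, {tone, vowel}, the pair (case, ergative) always co-occurs and gets NPMI 1.0, while (tone, vowel) gets 0.5. Normalizing makes scores comparable across frequent and rare terms before they are used as edge weights in a term network.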


2021
Author(s): Kun Sun

This study aims to clarify syntactic problems concerning the dangling topic construction in Mandarin Chinese by using probabilistic approaches. In studies of topic constructions in Mandarin Chinese, the relationship between the topic and the rest of the construction is often analyzed syntactically. However, the dangling topic construction, in which the topic appears to dangle, cannot be explained by syntactic approaches. Accordingly, previous studies had to rely implicitly on other approaches (such as pragmatics or semantic-pragmatics) to advance their arguments. The difficulty is that the pragmatic concepts used in these studies are vague and subjective. To tackle this problem, this paper explicitly computes the relation in dangling topic constructions in Chinese using pointwise mutual information and Bayesian models. We also collected experimental data in the form of human ratings of the acceptability of a set of dangling topic constructions. The results demonstrate that pointwise mutual information and the Bayesian model predict these human ratings well. This approach is likely to shed more light on the notion of topic construction and to help in understanding how Chinese speakers comprehend and process sentences and arrive at their meaning. More importantly, this study offers a novel, effective, and practical computational approach to sentence processing, syntactic analysis, and pragmatics studies.


Author(s): Thomas Bos, Flavius Frasincar

Financial investors make trades based on available information. Previous research has shown that microblogs are a useful source for supporting stock market decisions. However, the financial domain lacks specific sentiment lexicons that could be used to extract sentiment from these microblogs. In this research, we investigate automatic approaches for building financial sentiment lexicons. We introduce weighted versions of the Pointwise Mutual Information approaches to build sentiment lexicons automatically. Furthermore, existing approaches often neglect negation when building sentiment lexicons, so we also propose two methods (Negated Word and Flip Sentiment) that extend the lexicon-building approaches to take negation into account. We build the financial sentiment lexicons from 200,000 StockTwits messages and evaluate them in two sentiment classification tasks (unsupervised and supervised). In addition, the created financial sentiment lexicons are compared with each other and with existing sentiment lexicons. The best-performing financial sentiment lexicon is built by combining our Weighted Normalized Pointwise Mutual Information approach with the Negated Word approach. It outperforms all other sentiment lexicons in both classification tasks: in the unsupervised task it achieves an average balanced accuracy of 69.4%, and in the supervised setting a balanced accuracy of 75.1%. Moreover, the classification results confirm that sentiment lexicons can be improved by taking negation into account during construction, using either of the proposed methods.
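The weighting schemes and the exact Negated Word procedure are not described in the abstract, so the sketch below shows only a plain, unweighted PMI-based lexicon as a point of reference. A word's score is PMI(w, bullish) - PMI(w, bearish), which simplifies to log2 p(w | bullish) / p(w | bearish), estimated from message counts with add-k smoothing. The negation handling is an assumed reading: every token after a cue word gets a `_NEG` suffix, so negated uses are scored separately. The cue list, the `bullish`/`bearish` labels, and the names `tokenize` and `pmi_lexicon` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical negation cue list; the paper's actual list is not given.
NEGATION_CUES = {"not", "no", "never", "don't", "isn't", "won't"}

def tokenize(text, mark_negation=True):
    """Lowercase word/cashtag tokens. Cue words themselves are dropped; if
    mark_negation, every later token in the message gets a "_NEG" suffix
    (an assumed, simplified negation scope)."""
    out, negated = [], False
    for tok in re.findall(r"[a-z$']+", text.lower()):
        if tok in NEGATION_CUES:
            negated = mark_negation
            continue
        out.append(tok + "_NEG" if negated else tok)
    return out

def pmi_lexicon(messages, k=1.0, mark_negation=True):
    """messages: iterable of (text, label), label "bullish" or "bearish".
    score(w) = PMI(w, bullish) - PMI(w, bearish)
             = log2 p(w | bullish) / p(w | bearish),
    using message-level counts with add-k smoothing."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, label in messages:
        toks = set(tokenize(text, mark_negation))
        if label == "bullish":
            pos.update(toks)
            n_pos += 1
        else:
            neg.update(toks)
            n_neg += 1
    lexicon = {}
    for w in set(pos) | set(neg):
        p_w_pos = (pos[w] + k) / (n_pos + 2 * k)
        p_w_neg = (neg[w] + k) / (n_neg + 2 * k)
        lexicon[w] = math.log2(p_w_pos / p_w_neg)
    return lexicon
```

Positive scores lean bullish and negative scores lean bearish. Because "not buy" is counted as `buy_NEG`, a bearish message containing it no longer drags down the score of `buy` itself.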


2021
Vol 13 (1)
Author(s): I. Čmelo, M. Voršilák, D. Svozil

Pointwise mutual information (PMI) is a measure of association used in information theory. In this paper, PMI is used to characterize several publicly available databases (DrugBank, ChEMBL, PubChem and ZINC) in terms of the association strength between compound structural features, resulting in database PMI interrelation profiles. As structural features, substructure fragments obtained by encoding individual compounds as MACCS, PubChemKey and ECFP fingerprints are used. In accord with other studies, the analysis of the publicly available databases reveals unusual properties of DrugBank compounds, which further confirms the validity of the PMI profiling approach. Z-standardized relative feature tightness (ZRFT), a PMI-derived measure that quantifies how well a given compound's feature combinations fit those in a particular compound set, is applied to the analysis of compound synthetic accessibility (SA), as well as to the classification of compounds as easy (ES) or hard (HS) to synthesize. ZRFT value distributions are compared with those of SYBA and SAScore. The analysis of ZRFT values of structurally complex compounds in the SAVI database reveals oligopeptide structures that SAScore mispredicts as HS, while ZRFT and SYBA correctly predict them as ES. ZRFT predictions are less accurate than those of SAScore, SYBA and random forest, though by a narrow margin (Acc_ZRFT = 94.5%, Acc_SYBA = 98.8%, Acc_SAScore = 99.0%, Acc_RF = 97.3%). However, ZRFT's ability to distinguish between ES and HS compounds is surprisingly high, considering that SYBA, SAScore and random forest are dedicated SA models, whereas ZRFT is a generic measure that merely quantifies the strength of interrelations between structural feature pairs. The results presented in the current work indicate that structural feature co-occurrence, quantified by PMI or ZRFT, contains a significant amount of information relevant to the physico-chemical properties of organic compounds.
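The abstract does not give the ZRFT formula, so the sketch below stops at the raw ingredient it is built on: PMI between every pair of fingerprint bits over a compound set. It assumes the fingerprints (MACCS, ECFP, and so on) have already been computed elsewhere and are supplied as a 0/1 matrix with one row per compound. p(i) is the fraction of compounds with bit i set and p(i, j) the fraction with both set. Pairs that never co-occur are marked NaN rather than -inf. The function name `feature_pmi` is hypothetical.

```python
import numpy as np

def feature_pmi(X):
    """X: (n_compounds, n_features) binary fingerprint matrix.
    Returns an (n_features, n_features) matrix with
    PMI(i, j) = log p(i, j) / (p(i) p(j)), NaN where bits i and j never co-occur.
    The diagonal holds -log p(i), the self-information of each bit."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    p = X.mean(axis=0)                 # p(i): fraction of compounds with bit i set
    joint = (X.T @ X) / n              # p(i, j): fraction with both bits set
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(joint / np.outer(p, p))
    pmi[joint == 0] = np.nan           # undefined for pairs never seen together
    return pmi
```

Summarizing the off-diagonal values of this matrix (for example as a histogram) gives one simple way to compare association profiles between databases. How the paper turns them into a per-compound, z-standardized score is not specified in the abstract.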

