Statistical Methods

Mapping Intimacies ◽

10.1093/oxfordhb/9780199276349.013.0019 ◽

2012 ◽

Author(s):

Christer Samuelsson

Keyword(s):

Language Learning ◽

Maximum Entropy ◽

Statistical Methods ◽

Language Processing ◽

Mathematical Statistics ◽

Semantic Interpretation ◽

Entropy Model ◽

Heterogeneous Information ◽

Part Of Speech ◽

Maximum Entropy Modelling

Statistical methods now belong to mainstream natural language processing. They have been successfully applied to virtually all tasks within language processing and neighbouring fields, including part-of-speech tagging, syntactic parsing, semantic interpretation, lexical acquisition, machine translation, information retrieval, information extraction, and language learning. This article reviews mathematical statistics and applies it to language modelling problems, leading up to the hidden Markov model and the maximum entropy model. The real strength of maximum-entropy modelling lies in combining evidence from several rules, each of which alone might not be conclusive, but which taken together dramatically affect the probability. Maximum-entropy modelling allows heterogeneous information sources to be combined into a uniform probabilistic model in which each piece of information is formulated as a feature. The key ideas of mathematical statistics are simple and intuitive, but tend to be buried in a sea of mathematical technicalities. Finally, the article provides mathematical detail related to the topic of discussion.
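
As a rough illustration of how a maximum-entropy model combines several weak pieces of evidence, the sketch below scores each candidate tag by summing the weights of its active features and then normalises the scores. The feature names, the weights, and the helper functions `maxent_probs` and `toy_features` are hypothetical and are not taken from the article; in practice the weights would be estimated from data.

```python
import math


def maxent_probs(weights, features, context, labels):
    """Return P(label | context) under a log-linear (maximum-entropy) model."""
    # Score each label as the sum of the weights of its active features.
    scores = {
        y: sum(weights.get(name, 0.0) for name in features(context, y))
        for y in labels
    }
    # Normalise with a softmax; subtracting the max keeps exp() stable.
    top = max(scores.values())
    exp_scores = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exp_scores.values())
    return {y: v / z for y, v in exp_scores.items()}


def toy_features(context, label):
    """Hypothetical binary features for tagging a single word."""
    word, prev_tag = context
    feats = [f"prev={prev_tag}&tag={label}"]
    if word.endswith("s"):
        feats.append(f"suffix=s&tag={label}")
    return feats


# Hand-set illustrative weights (assumption: a trained model would learn these).
toy_weights = {
    "suffix=s&tag=NOUN": 0.8,
    "suffix=s&tag=VERB": 0.6,
    "prev=DET&tag=NOUN": 1.5,
    "prev=DET&tag=VERB": -1.0,
}

probs = maxent_probs(toy_weights, toy_features, ("runs", "DET"), ["NOUN", "VERB"])
print(probs)
```

Here the suffix feature alone barely separates the two tags, but together with the previous-tag feature the model clearly prefers NOUN, which is the kind of evidence combination the abstract describes.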

Generation of Cross-Lingual Word Vectors for Low-Resourced Languages Using Deep Learning and Topological Metrics in a Data-Efficient Way

Electronics ◽

10.3390/electronics10121372 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1372

Author(s):

Sanjanasri JP ◽

Vijay Krishna Menon ◽

Soman KP ◽

Rajendran S ◽

Agnieszka Wolk

Keyword(s):

Deep Learning ◽

Language Processing ◽

Semantic Space ◽

Semantic Interpretation ◽

Learning Approaches ◽

Qualitative Comparison ◽

Bilingual Dictionary ◽

Pos Tagging ◽

Part Of Speech ◽

Cross Lingual

Linguists have been focused on a qualitative comparison of the semantics from different languages. Evaluation of the semantic interpretation among disparate language pairs like English and Tamil is an even more formidable task than for Slavic languages. The concept of word embedding in Natural Language Processing (NLP) has enabled a felicitous opportunity to quantify linguistic semantics. Multi-lingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of the other. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, deploying three popular embedding algorithms: Word2Vec, GloVe and FastText. A novel evaluation paradigm was devised for the generation of embeddings to assess their effectiveness, using the original embeddings as ground truths. Transferability across other target languages of the proposed model was assessed via pre-trained Word2Vec embeddings from Hindi and Chinese languages. We empirically prove that with a bilingual dictionary of a thousand words and a corresponding small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of generated target embeddings in a few NLP use-case tasks, such as text summarization, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), bearing in mind that those are not the only possible applications.
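
The idea of projecting one language's embeddings into another's semantic space can be sketched with a plain linear map fitted on dictionary pairs. This is a deliberately simpler stand-in for the deep learning transfer functions the abstract refers to, not the authors' method; the two-dimensional vectors and the names `fit_linear_map` and `apply_map` are hypothetical.

```python
def apply_map(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def fit_linear_map(pairs, dim_src, dim_tgt, lr=0.1, steps=500):
    """Fit W minimising the mean squared error of W @ x against y.

    `pairs` holds (source_vector, target_vector) tuples, standing in for
    the embeddings of word pairs from a bilingual dictionary.
    """
    W = [[0.0] * dim_src for _ in range(dim_tgt)]
    n = len(pairs)
    for _ in range(steps):
        grad = [[0.0] * dim_src for _ in range(dim_tgt)]
        for x, y in pairs:
            pred = apply_map(W, x)
            for i in range(dim_tgt):
                err = pred[i] - y[i]
                for j in range(dim_src):
                    grad[i][j] += 2.0 * err * x[j] / n
        # Plain gradient-descent update.
        for i in range(dim_tgt):
            for j in range(dim_src):
                W[i][j] -= lr * grad[i][j]
    return W


# Toy "dictionary": target vectors are the source vectors rotated by 90 degrees.
pairs = [
    ([1.0, 0.0], [0.0, -1.0]),
    ([0.0, 1.0], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, -1.0]),
]
W = fit_linear_map(pairs, dim_src=2, dim_tgt=2)
projected = apply_map(W, [2.0, 1.0])  # a source vector not in the dictionary
print(projected)
```

With real embeddings the relation between the two spaces is only approximately linear, which is one reason non-linear transfer functions are explored.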

Part-of-speech tagger based on maximum entropy model

2009 2nd IEEE International Conference on Computer Science and Information Technology ◽

10.1109/iccsit.2009.5234787 ◽

2009 ◽

Cited By ~ 3

Author(s):

Heyan Huang ◽

Xiaofei Zhang

Keyword(s):

Maximum Entropy ◽

Maximum Entropy Model ◽

Entropy Model ◽

Part Of Speech

A Modified Markov Based Maximum-entropy Model for POS Tagging of Odia Text

International Journal of Decision Support System Technology ◽

10.4018/ijdsst.286690 ◽

2022 ◽

Vol 14 (1)

Keyword(s):

Maximum Entropy ◽

Language Processing ◽

Conditional Random Field ◽

Entropy Model ◽

Text Corpus ◽

Parts Of Speech ◽

Pos Tagging ◽

Linguistic Rules ◽

The Rich ◽

Pos Tagger

POS (Parts of Speech) tagging, a vital step in diverse Natural Language Processing (NLP) tasks, has not drawn much attention in the case of Odia, a computationally under-developed language. The proposed hybrid method yields a robust POS tagger for Odia. Given the rich morphology of the language and the lack of a sufficiently large annotated text corpus, a combination of machine learning and linguistic rules is adopted in building the tagger. The tagger is trained on a tagged text corpus from the tourism domain and obtains a perceptible improvement in results. Appreciable performance is also observed on news articles from varied domains. In experiments on Odia, the proposed algorithm outperforms existing methods such as rule-based tagging, hidden Markov models (HMM), maximum entropy (ME), and conditional random fields (CRF).

Fusion of Word Clustering Features for Tibetan Part of Speech Tagging Based on Maximum Entropy Model

International Journal of Simulation Systems Science & Technology ◽

10.5013/ijssst.a.17.08.19 ◽

2016 ◽

Author(s):

Special Issues Editor

Keyword(s):

Maximum Entropy ◽

Maximum Entropy Model ◽

Entropy Model ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Word Clustering ◽

Speech Tagging

A New Method of the Automatically Marked Chinese Part of Speech Based on Gaussian Prior Smoothing Maximum Entropy Model

Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007) ◽

10.1109/fskd.2007.86 ◽

2007 ◽

Cited By ~ 2

Author(s):

Wei Zhao ◽

Faxing Zhao ◽

Wenhui Li

Keyword(s):

Maximum Entropy ◽

New Method ◽

Maximum Entropy Model ◽

Entropy Model ◽

Part Of Speech ◽

Gaussian Prior

Part-of-Speech Tagging and PP Attachment Disambiguation Using a Boosted Maximum Entropy Model

PRICAI 2004: Trends in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-540-28633-2_99 ◽

2004 ◽

pp. 930-931

Author(s):

Seong-Bae Park ◽

Jangmin O ◽

Sang-Jo Lee

Keyword(s):

Maximum Entropy ◽

Maximum Entropy Model ◽

Entropy Model ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Eye-tracking as a proxy for coherence and complexity of texts

PLoS ONE ◽

10.1371/journal.pone.0260236 ◽

2021 ◽

Vol 16 (12) ◽

pp. e0260236

Author(s):

Débora Torres ◽

Wagner R. Sena ◽

Humberto A. Carmona ◽

André A. Moreira ◽

Hernán A. Makse ◽

...

Keyword(s):

Eye Tracking ◽

Maximum Entropy ◽

Language Processing ◽

Entropy Method ◽

Internet Survey ◽

Entropy Model ◽

The Gaze ◽

Attention Focus ◽

Average Magnetization ◽

High Level

Reading is a complex cognitive process that involves primary oculomotor function and high-level activities like attention focus and language processing. When we read, our eyes move by primary physiological functions while responding to language-processing demands. In fact, the eyes perform discontinuous twofold movements, namely, successive long jumps (saccades) interposed by small steps (fixations) in which the gaze “scans” confined locations. It is only through the fixations that information is effectively captured for brain processing. Since individuals can express similar as well as entirely different opinions about a given text, it is expected that the form, content and style of a text could induce different eye-movement patterns among people. A question that naturally arises is whether these individuals’ behaviours are correlated, so that eye-tracking while reading can be used as a proxy for the subjective properties of a text. Here we perform a set of eye-tracking experiments with a group of individuals reading different types of texts, including children's stories, texts generated from random words and excerpts from literary works. In parallel, an extensive Internet survey was conducted to categorize these texts in terms of their complexity and coherence, considering a large number of individuals selected according to age, gender and level of education. The computational analysis of the fixation maps obtained from the gaze trajectories of the subjects for a given text reveals that the average “magnetization” of the fixation configurations correlates strongly with the complexity reported in the survey. Moreover, we perform a thermodynamic analysis using the Maximum-Entropy Model and find that coherent texts were closer to their corresponding “critical points” than non-coherent ones, as computed from the Pairwise Maximum-Entropy method, suggesting that different texts may induce distinct cohesive reading activities.

Chinese Word Sense Disambiguation Based on Maximum Entropy Model with Feature Selection

Journal of Software ◽

10.3724/sp.j.1001.2010.03591 ◽

2010 ◽

Vol 21 (6) ◽

pp. 1287-1295 ◽

Cited By ~ 7

Author(s):

Jing-Zhou HE ◽

Hou-Feng WANG

Keyword(s):

Feature Selection ◽

Maximum Entropy ◽

Word Sense Disambiguation ◽

Word Sense ◽

Chinese Word ◽

Maximum Entropy Model ◽

Entropy Model ◽

Sense Disambiguation

An Automatic Question Generation System using Rule-Based Approach in Bloom’s Taxonomy

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666191113143335 ◽

2019 ◽

Vol 13 ◽

Author(s):

G Deena ◽

K Raja ◽

K Kannan

Keyword(s):

Language Processing ◽

Learning Process ◽

Question Generation ◽

Test Question ◽

Rule Based ◽

Part Of Speech ◽

Core Idea ◽

Rule Based Approach ◽

Teaching Learning ◽

Automatic Question Generation

In this competitive world, education has become part of everyday life. Imparting knowledge to the learner through education is the core idea of the Teaching-Learning Process (TLP). An assessment is one way to identify the learner’s weak spots in the area under discussion. Assessment questions are given high priority in judging the learner's skill. When questions are prepared manually, their quality and fairness in assessing the learner’s cognitive skill are not assured. Question generation is the most important part of the teaching-learning process, and generating the test questions is the hardest part. Methods: An Automatic Question Generation (AQG) system is proposed that generates assessment questions dynamically from the input file. Objective: The proposed system generates test questions that are mapped to Bloom's taxonomy to determine the learner’s cognitive level. Cloze-type questions are generated using part-of-speech tags and a random function. Rule-based approaches and Natural Language Processing (NLP) techniques are implemented to generate procedural questions at the lowest cognitive levels of Bloom's taxonomy. Analysis: The outputs are dynamic, so a different set of questions is created at each execution. Here, the input paragraph is selected from the computer science domain, and its output efficiency is measured using precision and recall.
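
A minimal sketch of cloze generation from part-of-speech tags and a random choice is given below. It assumes the sentence has already been tagged; the tag names, the function `make_cloze`, and the sample sentence are hypothetical and are not taken from the paper's system.

```python
import random


def make_cloze(tagged_tokens, target_tags=("NOUN",), rng=None):
    """Blank out one randomly chosen token whose tag is in target_tags.

    Returns (question, answer), or None if no token qualifies.
    """
    rng = rng or random.Random()
    candidates = [
        i for i, (_, tag) in enumerate(tagged_tokens) if tag in target_tags
    ]
    if not candidates:
        return None
    blank = rng.choice(candidates)
    words = [
        "_____" if i == blank else word
        for i, (word, _) in enumerate(tagged_tokens)
    ]
    return " ".join(words), tagged_tokens[blank][0]


# Hand-tagged example sentence (assumption: a real system would run a POS tagger).
sentence = [
    ("A", "DET"), ("stack", "NOUN"), ("is", "VERB"), ("a", "DET"),
    ("linear", "ADJ"), ("data", "NOUN"), ("structure", "NOUN"),
]
question, answer = make_cloze(sentence, rng=random.Random(0))
print(question, "| answer:", answer)
```

Because the blank is picked at random, repeated runs without a fixed seed can produce a different question from the same paragraph, matching the dynamic behaviour the abstract mentions.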

Pairwise maximum entropy model explains the role of white matter structure in shaping emergent co-activation states

Communications Biology ◽

10.1038/s42003-021-01700-6 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Arian Ashourvan ◽

Preya Shah ◽

Adam Pines ◽

Shi Gu ◽

Christopher W. Lynn ◽

...

Keyword(s):

White Matter ◽

Maximum Entropy ◽

Large Scale ◽

Structural Connectivity ◽

Quantitative Relationship ◽

Brain Regions ◽

Maximum Entropy Model ◽

Entropy Model ◽

Activation Patterns ◽

Co Activation

A major challenge in neuroscience is determining a quantitative relationship between the brain’s white matter structural connectivity and emergent activity. We seek to uncover the intrinsic relationship among brain regions fundamental to their functional activity by constructing a pairwise maximum entropy model (MEM) of the inter-ictal activation patterns of five patients with medically refractory epilepsy over an average of ~14 hours of band-passed intracranial EEG (iEEG) recordings per patient. We find that the pairwise MEM accurately predicts iEEG electrodes’ activation patterns’ probability and their pairwise correlations. We demonstrate that the estimated pairwise MEM’s interaction weights predict structural connectivity and its strength over several frequencies significantly beyond what is expected based solely on sampled regions’ distance in most patients. Together, the pairwise MEM offers a framework for explaining iEEG functional connectivity and provides insight into how the brain’s structural connectome gives rise to large-scale activation patterns by promoting co-activation between connected structures.
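
For readers unfamiliar with pairwise maximum entropy models, the sketch below uses one common parametrisation, in which the probability of a binary activation pattern is proportional to the exponential of per-unit biases plus pairwise interaction weights. The 0/1 state coding, the sign convention, the toy parameters, and the function `pairwise_mem` are assumptions for illustration and may differ from the paper's formulation; exhaustive enumeration is only feasible for a handful of units.

```python
import itertools
import math


def pairwise_mem(h, J):
    """Return {pattern: probability} for a pairwise maximum-entropy model.

    h[i] is the bias of unit i and J[i][j] the interaction weight between
    units i and j (only the upper triangle, i < j, is used).
    """
    n = len(h)
    states = list(itertools.product([0, 1], repeat=n))

    def log_weight(s):
        field = sum(h[i] * s[i] for i in range(n))
        coupling = sum(
            J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n)
        )
        return field + coupling

    weights = [math.exp(log_weight(s)) for s in states]
    z = sum(weights)  # partition function
    return {s: w / z for s, w in zip(states, weights)}


# Two units with no bias and a positive interaction weight (toy values).
dist = pairwise_mem(h=[0.0, 0.0], J=[[0.0, 1.0], [1.0, 0.0]])
print(dist)
```

In this toy case the positive interaction weight makes joint activation of the two units the most probable pattern.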
