Statistical Methods

Author(s):  
Christer Samuelsson

Statistical methods now belong to mainstream natural language processing. They have been successfully applied to virtually all tasks within language processing and neighbouring fields, including part-of-speech tagging, syntactic parsing, semantic interpretation, lexical acquisition, machine translation, information retrieval, and information extraction and language learning. This article reviews mathematical statistics and applies it to language modelling problems, leading up to the hidden Markov model and maximum entropy model. The real strength of maximum-entropy modelling lies in combining evidence from several rules, each one of which alone might not be conclusive, but which taken together dramatically affect the probability. Maximum-entropy modelling allows combining heterogeneous information sources to produce a uniform probabilistic model where each piece of information is formulated as a feature. The key ideas of mathematical statistics are simple and intuitive, but tend to be buried in a sea of mathematical technicalities. Finally, the article provides mathematical detail related to the topic of discussion.

Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1372
Author(s):  
Sanjanasri JP ◽  
Vijay Krishna Menon ◽  
Soman KP ◽  
Rajendran S ◽  
Agnieszka Wolk

Linguists have been focused on a qualitative comparison of the semantics from different languages. Evaluation of the semantic interpretation among disparate language pairs like English and Tamil is an even more formidable task than for Slavic languages. The concept of word embedding in Natural Language Processing (NLP) has enabled a felicitous opportunity to quantify linguistic semantics. Multi-lingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of the other. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, deploying three popular embedding algorithms: Word2Vec, GloVe and FastText. A novel evaluation paradigm was devised for the generation of embeddings to assess their effectiveness, using the original embeddings as ground truths. Transferability across other target languages of the proposed model was assessed via pre-trained Word2Vec embeddings from Hindi and Chinese languages. We empirically prove that with a bilingual dictionary of a thousand words and a corresponding small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of generated target embeddings in a few NLP use-case tasks, such as text summarization, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), bearing in mind that those are not the only possible applications.


2022 ◽  
Vol 14 (1) ◽  
pp. 0-0

POS (Parts of Speech) tagging, a vital step in diverse Natural Language Processing (NLP) tasks has not drawn much attention in case of Odia a computationally under-developed language. The proposed hybrid method suggests a robust POS tagger for Odia. Observing the rich morphology of the language and unavailability of sufficient annotated text corpus a combination of machine learning and linguistic rules is adopted in the building of the tagger. The tagger is trained on tagged text corpus from the domain of tourism and is capable of obtaining a perceptible improvement in the result. Also an appreciable performance is observed for news articles texts of varied domains. The performance of proposed algorithm experimenting on Odia language shows its manifestation in dominating over existing methods like rule based, hidden Markov model (HMM), maximum entropy (ME) and conditional random field (CRF).


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0260236
Author(s):  
Débora Torres ◽  
Wagner R. Sena ◽  
Humberto A. Carmona ◽  
André A. Moreira ◽  
Hernán A. Makse ◽  
...  

Reading is a complex cognitive process that involves primary oculomotor function and high-level activities like attention focus and language processing. When we read, our eyes move by primary physiological functions while responding to language-processing demands. In fact, the eyes perform discontinuous twofold movements, namely, successive long jumps (saccades) interposed by small steps (fixations) in which the gaze “scans” confined locations. It is only through the fixations that information is effectively captured for brain processing. Since individuals can express similar as well as entirely different opinions about a given text, it is therefore expected that the form, content and style of a text could induce different eye-movement patterns among people. A question that naturally arises is whether these individuals’ behaviours are correlated, so that eye-tracking while reading can be used as a proxy for text subjective properties. Here we perform a set of eye-tracking experiments with a group of individuals reading different types of texts, including children stories, random word generated texts and excerpts from literature work. In parallel, an extensive Internet survey was conducted for categorizing these texts in terms of their complexity and coherence, considering a large number of individuals selected according to different ages, gender and levels of education. The computational analysis of the fixation maps obtained from the gaze trajectories of the subjects for a given text reveals that the average “magnetization” of the fixation configurations correlates strongly with their complexity observed in the survey. Moreover, we perform a thermodynamic analysis using the Maximum-Entropy Model and find that coherent texts were closer to their corresponding “critical points” than non-coherent ones, as computed from the Pairwise Maximum-Entropy method, suggesting that different texts may induce distinct cohesive reading activities.


Author(s):  
G Deena ◽  
K Raja ◽  
K Kannan

: In this competing world, education has become part of everyday life. The process of imparting the knowledge to the learner through education is the core idea in the Teaching-Learning Process (TLP). An assessment is one way to identify the learner’s weak spot of the area under discussion. An assessment question has higher preferences in judging the learner's skill. In manual preparation, the questions are not assured in excellence and fairness to assess the learner’s cognitive skill. Question generation is the most important part of the teaching-learning process. It is clearly understood that generating the test question is the toughest part. Methods: Proposed an Automatic Question Generation (AQG) system which automatically generates the assessment questions dynamically from the input file. Objective: The Proposed system is to generate the test questions that are mapped with blooms taxonomy to determine the learner’s cognitive level. The cloze type questions are generated using the tag part-of-speech and random function. Rule-based approaches and Natural Language Processing (NLP) techniques are implemented to generate the procedural question of the lowest blooms cognitive levels. Analysis: The outputs are dynamic in nature to create a different set of questions at each execution. Here, input paragraph is selected from computer science domain and their output efficiency are measured using the precision and recall.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Arian Ashourvan ◽  
Preya Shah ◽  
Adam Pines ◽  
Shi Gu ◽  
Christopher W. Lynn ◽  
...  

AbstractA major challenge in neuroscience is determining a quantitative relationship between the brain’s white matter structural connectivity and emergent activity. We seek to uncover the intrinsic relationship among brain regions fundamental to their functional activity by constructing a pairwise maximum entropy model (MEM) of the inter-ictal activation patterns of five patients with medically refractory epilepsy over an average of ~14 hours of band-passed intracranial EEG (iEEG) recordings per patient. We find that the pairwise MEM accurately predicts iEEG electrodes’ activation patterns’ probability and their pairwise correlations. We demonstrate that the estimated pairwise MEM’s interaction weights predict structural connectivity and its strength over several frequencies significantly beyond what is expected based solely on sampled regions’ distance in most patients. Together, the pairwise MEM offers a framework for explaining iEEG functional connectivity and provides insight into how the brain’s structural connectome gives rise to large-scale activation patterns by promoting co-activation between connected structures.


Sign in / Sign up

Export Citation Format

Share Document