bigram frequency
Recently Published Documents


TOTAL DOCUMENTS

27
(FIVE YEARS 6)

H-INDEX

8
(FIVE YEARS 0)

2021 ◽  
Vol 6 ◽  
Author(s):  
Kyla McConnell ◽  
Alice Blumenthal-Dramé

While it is widely acknowledged that both predictive expectations and retrodictive integration influence language processing, the individual differences that affect these two processes and the best metrics for observing them have yet to be fully described. The present study aims to contribute to the debate by investigating the extent to which experience-based variables modulate the processing of word pairs (bigrams). Specifically, we investigate how age and reading experience correlate with lexical anticipation and integration, and how this effect can be captured by the metrics of forward and backward transition probability (TP). Participants read more and less strongly associated bigrams, paired in sets of four to control for known lexical covariates such as bigram frequency and semantic meaning (i.e., absolute control, total control, absolute silence, total silence) in a self-paced reading (SPR) task. They additionally completed assessments of exposure to print text (Author Recognition Test, Shipley vocabulary assessment, Words that Go Together task) and provided their age. Results show that both older age and lesser reading experience individually correlate with stronger TP effects. Moreover, TP effects differ across the spillover region (the two words following the noun in the bigram).
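
As a rough illustration of the two TP metrics named above, the sketch below estimates them from raw corpus counts. It assumes the common definitions (forward TP as the bigram count divided by the count of the first word, backward TP as the bigram count divided by the count of the second word); the abstract does not state how the authors estimated TP, and the function name and toy corpus are invented for illustration.

```python
from collections import Counter

def transition_probabilities(tokens, w1, w2):
    """Return (forward TP, backward TP) for the bigram (w1, w2).

    Assumed definitions (not taken from the abstract):
      forward  TP = count(w1 w2) / count(w1 as first word of a bigram)
      backward TP = count(w1 w2) / count(w2 as second word of a bigram)
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Count w1 only where it can start a bigram, and w2 only where it can end one.
    firsts = Counter(tokens[:-1])
    seconds = Counter(tokens[1:])
    pair = bigrams[(w1, w2)]
    forward = pair / firsts[w1] if firsts[w1] else 0.0
    backward = pair / seconds[w2] if seconds[w2] else 0.0
    return forward, backward

# Toy corpus, invented for illustration only.
toy = "the vast majority of the vast plains".split()
fwd, bwd = transition_probabilities(toy, "vast", "majority")
```

In this toy corpus "vast" is followed by "majority" in one of its two occurrences, while "majority" is always preceded by "vast", so the two metrics diverge even for the same word pair.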


2021 ◽  
pp. 088626052110014
Author(s):  
Jason M. Baik ◽  
Thet H. Nyein ◽  
Sepideh Modrek

Online social media movements are now common and support cultural discussions on difficult health and social topics. The #MeToo movement, focusing on the pervasiveness of sexual assault and harassment, has been one of the largest and most influential online movements. Our study examines topics of conversation on Twitter by supporters of the #MeToo movement and by Twitter users who were uninvolved in the movement to explore the extent to which tweet topics for these two groups converge over time. We identify and collect one year’s worth of tweets for supporters of the #MeToo movement (N = 168 users; N = 105,538 tweets) and users not involved in the movement (N = 147 users; N = 112,301 tweets, referred to as the Neutral Sample). We conduct topic frequency analysis and implement an unsupervised machine learning topic modeling algorithm, latent Dirichlet allocation, to explore topics of discussion on Twitter for these two groups of users before and after the initial #MeToo movement. Our results suggest that supporters of #MeToo discussed different topics compared to the Neutral Sample of Twitter users before #MeToo, with some overlap on politics. The supporters were already discussing sexual assault and harassment issues six months before #MeToo, and discussion on this topic increased 13.7-fold in the six months after. For the Neutral Sample, sexual assault and harassment was not a key topic of discussion on Twitter before #MeToo, but there was some limited increase afterward. Results of bigram frequency analysis and topic modeling showed a clear increase in topics related to gender for the supporters of #MeToo but gave mixed results for the Neutral Sample comparison group. Our results suggest limited shifts in the conversation on Twitter for the Neutral Sample. Our methods and results have implications for measuring the extent to which online social media movements, like #MeToo, reach a broad audience.


2020 ◽  
pp. 174702182096906
Author(s):  
Todd A Kahan ◽  
Louisa M Slowiaczek ◽  
Ned Scott ◽  
Brian T Pfohl

Whether attention is allocated to an entire word or can be confined to part of a word was examined in an experiment using a visual composite task. Participants saw a study word, a cue to attend to either the right or left half, and a test word, and indicated if the cued half of the words (e.g., left) was the same (e.g., TOLD-TONE) or different (e.g., TOLD-WINE). Prior research using this task reports a larger congruency effect for low-frequency words relative to high-frequency words, but extraneous variables were not equated. In this study (N = 33), lexical (orthographic neighbourhood density) and sublexical (bigram frequency) variables were controlled, and word frequency was manipulated. Results indicate that word frequency does not moderate the degree to which parts of a word can be selectively attended/ignored. Response times to high-frequency words were faster than response times to low-frequency words, but the congruency effect was equivalent. The data support a capacity model where attention is equally distributed across low-frequency and high-frequency words but low-frequency words require additional processing resources.


2020 ◽  
Author(s):  
Efthymia C Kapnoula ◽  
Athanassios Protopapas ◽  
Steven J. Saunders ◽  
Max Coltheart

We evaluated the dual route cascaded (DRC) model of visual word recognition using Greek behavioural data on word and nonword naming and lexical decision, focusing on the effects of syllable and bigram frequency. DRC was modified to process polysyllabic Greek words and nonwords. The Greek DRC and native speakers of Greek were presented with the same sets of word and nonword stimuli, spanning a wide range on several psycholinguistic variables, and the sensitivity of the model to lexical and sublexical variables was compared to the effects of these factors on the behavioural data. DRC correctly pronounced all the stimuli and successfully simulated the effects of frequency in words, and of length and bigram frequency in nonwords. However, unlike native speakers of Greek, DRC failed to demonstrate sensitivity to word length and syllabic frequency. We discuss the significance of these findings in constraining models of visual word recognition.


Author(s):  
Robin Crockett ◽  
Kirstie Best

We report a stylometric investigation of a portfolio of 20 assignments submitted by an individual student over two consecutive academic years. This investigation followed a formal disciplinary investigation which had identified that eight of the assignments had been ghost-written, with seven of those showing explicit ghost-writer ID information and three of those showing ID information from the same commercial provider. The stylometric investigation involved a conventional word and bigram frequency analysis and a prototype word complexity analysis. The word and bigram analysis identified four consistent groups of assignments, which associate other assignments with the eight known to have been ghost-written, indicating that those were probably also ghost-written. One of those groups comprises the three assignments from the same provider, plus another assignment, implying that the provider has a ‘house style’ and that the other assignment also came from that provider. The prototype analysis clearly categorised the core members of two of those same groups, including the group from the identified provider, adding further weight to those associations. More generally, this investigation shows that it is possible to categorise assignments according to aspects of writing style: we would have obtained the same groups even if we had not possessed the ghost-writer ID information. Where such consistent groups are identified, it implies, on balance of probabilities, multiple authorship of assignments: the student concerned cannot have written all the submitted assignments, and some were ghost-written.


2019 ◽  
Vol 0 (0) ◽  
Author(s):  
Kyla McConnell ◽  
Alice Blumenthal-Dramé

In the following self-paced reading study, we assess the cognitive realism of six widely used corpus-derived measures of association strength between words (collocated modifier–noun combinations like vast majority): MI, MI3, Dice coefficient, T-score, Z-score, and log-likelihood. The ability of these collocation metrics to predict reading times is tested against predictors of lexical processing cost that are widely established in the psycholinguistic and usage-based literature, respectively: forward/backward transition probability and bigram frequency. In addition, the experiment includes the treatment variable of task: it is split into two blocks which only differ in the format of interleaved comprehension questions (multiple choice vs. typed free response). Results show that the traditional corpus-linguistic metrics are outperformed by both backward transition probability and bigram frequency. Moreover, the multiple-choice condition elicits faster overall reading times than the typed condition, and the two winning metrics show stronger facilitation on the critical word (i.e. the noun in the bigrams) in the multiple-choice condition. In the typed condition, we find an effect that is weaker and, in the case of bigram frequency, longer lasting, continuing into the first spillover word. We argue that insufficient attention to task effects might have obscured the cognitive correlates of association scores in earlier research.
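
The six association measures listed above can be sketched from a 2x2 contingency table of observed and expected bigram counts. The formulas below are the textbook corpus-linguistic definitions and are an assumption here, since the abstract does not give the exact variants the authors used; the function name, parameter names, and example counts are hypothetical.

```python
import math

def association_scores(o11, f1, f2, n):
    """Association measures for one bigram from a 2x2 contingency table.

    o11: bigram count (must be > 0); f1: count of word 1 in first position;
    f2: count of word 2 in second position; n: total number of bigrams.
    Textbook definitions, assumed rather than taken from the study.
    """
    # Observed cells: (w1, w2), (w1, not w2), (not w1, w2), (not w1, not w2).
    observed = [o11, f1 - o11, f2 - o11, n - f1 - f2 + o11]
    # Expected cells under independence, in the same order.
    expected = [
        f1 * f2 / n,
        f1 * (n - f2) / n,
        (n - f1) * f2 / n,
        (n - f1) * (n - f2) / n,
    ]
    e11 = expected[0]
    # Log-likelihood ratio; empty cells contribute zero.
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return {
        "MI": math.log2(o11 / e11),
        "MI3": math.log2(o11 ** 3 / e11),
        "Dice": 2 * o11 / (f1 + f2),
        "t": (o11 - e11) / math.sqrt(o11),
        "z": (o11 - e11) / math.sqrt(e11),
        "G2": g2,
    }

# Invented counts, for illustration only.
scores = association_scores(o11=30, f1=100, f2=60, n=10000)
```

All six scores are derived from the same four counts, which is one reason they tend to correlate and why a study like this one compares them against transition probability and raw bigram frequency.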


2018 ◽  
Author(s):  
Muhammad Moinuddin ◽  
Wasim Aftab ◽  
Adnan Memic

PDZ domains represent one of the most common protein homology regions playing key roles in several diseases. Point mutations (PM) in the amino acid primary sequence of PDZ domains can alter domain functions by affecting, for example, downstream phosphorylation, a pivotal process in biology. Our goal in the present study was to introduce a novel approach to investigate how point mutations within the Class 1, Class 2 and Class 1–2 PDZ domains could affect the changes in binding with their partner ligands and hence affect their classification. We focused on features in PDZ domains of various species including human, rat and mouse. However, our work represents a generic computational framework that could be used to analyze PM in any given PDZ sequence. We have adopted two different approaches to investigate the impact of PM. In the first approach, we have developed a statistical model using bigram frequencies of amino acids and employed six different similarity measures to contrast the bigram frequency distribution of a wild type sequence relative to its point mutants. In the next approach, we developed a statistical method that incorporates the impact of bigram frequency history associated with each mutational site that we call history weighted conditional change in probabilities. In this PM study, we observed that the history weighted method performs best when compared to all other methods studied in terms of picking up sites in PDZ domain where a PM could flip the class. We anticipate that this method will present a step forward towards computational techniques unveiling PDZ domain point mutants that largely affect the protein-ligand binding, specificity and affinity. We hope that this and future studies could aid therapy for diseases in which PDZ mutations have been implicated as the main drivers, such as Usher syndrome.
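
The first approach described above (comparing the bigram frequency distribution of a wild type sequence with that of a point mutant) can be sketched as follows. The abstract does not name its six similarity measures, so cosine similarity is used here purely as an assumed stand-in; the sequences are hypothetical toy strings, not real PDZ domains, and the history weighted method is not reproduced.

```python
from collections import Counter
import math

def bigram_distribution(sequence):
    """Relative frequency of each adjacent residue pair in a sequence."""
    counts = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    total = sum(counts.values())
    return {bigram: c / total for bigram, c in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency distributions.

    Chosen for illustration only; the study's own measures are not
    specified in the abstract.
    """
    dot = sum(p[k] * q.get(k, 0.0) for k in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

# Hypothetical toy sequences, not real PDZ domains.
wild_type = "GLGFSIAGG"
mutant = wild_type[:4] + "A" + wild_type[5:]  # point mutation S -> A at position 5

similarity = cosine_similarity(
    bigram_distribution(wild_type), bigram_distribution(mutant)
)
```

A single substitution changes the two bigrams that overlap the mutated site, so the similarity drops below 1 by an amount that depends on how common those bigrams are elsewhere in the sequence.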


2017 ◽  
Vol 12 (2) ◽  
pp. 263-282 ◽  
Author(s):  
Xenia Schmalz ◽  
Claudio Mulatti

Psycholinguistic researchers identify linguistic variables and assess if they affect cognitive processes. One such variable is letter bigram frequency, or the frequency with which a given letter pair co-occurs in an orthography. While early studies reported that bigram frequency affects visual lexical decision, subsequent, well-controlled studies have not shown this effect. Still, researchers continue to use it as a control variable in psycholinguistic experiments. We propose two reasons for the persistence of this variable: (1) Reporting no significant effect of bigram frequency cannot provide evidence for no effect. (2) Despite empirical work, theoretical implications of bigram frequency are largely neglected. We perform Bayes Factor analyses to address the first issue. In analyses of existing large-scale databases, we find no effect of bigram frequency in lexical decision in the British Lexicon Project, and some evidence for an inhibitory effect in the English Lexicon Project. We find strong evidence for an effect in reading aloud. This suggests that, for lexical decision, the effect is unstable, and may depend on item characteristics and task demands rather than reflecting cognitive processes underlying visual word recognition. We call for more consideration of theoretical implications of the presence or absence of a bigram frequency effect.
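
To make the definition of letter bigram frequency concrete, the sketch below counts adjacent letter pairs over a word list and averages those counts for a single word. It assumes position-independent type counts (each word counted once, regardless of where the pair occurs); the abstract does not say which variant the authors analysed, and the function names and toy lexicon are invented for illustration.

```python
from collections import Counter

def letter_bigram_counts(words):
    """Count each adjacent letter pair across a word list (type counts)."""
    counts = Counter()
    for word in words:
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts

def mean_bigram_frequency(word, counts):
    """Average lexicon count of the letter bigrams in one word."""
    bigrams = [word[i:i + 2] for i in range(len(word) - 1)]
    return sum(counts[b] for b in bigrams) / len(bigrams) if bigrams else 0.0

# Toy lexicon, invented for illustration only.
lexicon = ["told", "tone", "wine", "nine"]
counts = letter_bigram_counts(lexicon)
score = mean_bigram_frequency("tone", counts)
```

Other common variants weight each word by its token frequency or count bigrams separately by position; swapping in either one only changes how `counts` is built.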


2017 ◽  
Author(s):  
Xenia Schmalz ◽  
Claudio Mulatti

Psycholinguistic researchers identify linguistic variables and assess if and how these affect cognitive processes. One such variable is letter bigram frequency, or the frequency with which a given pair of letters co-occurs in an orthography. While early studies have shown that bigram frequency affects visual word recognition, subsequent, well-controlled studies have failed to show such an effect. Still, researchers continue to use it as a control variable in psycholinguistic experiments. We propose two reasons for the persistence of this variable: (1) Studies have reported no evidence for an effect of bigram frequency, but this cannot provide evidence for no effect. (2) The theoretical implications of bigram frequency have been largely neglected. We address the first issue by performing Bayes Factor tests on both a matched item set and large-scale studies, and the second by discussing possible theoretical implications. We find no effect of bigram frequency for lexical decisions, though there is some evidence for an effect in reading aloud.


2016 ◽  
Vol 38 (2) ◽  
pp. 427-456 ◽  
Author(s):  
James Bartolotti ◽  
Viorica Marian

Many adults struggle with second language acquisition but learn new native-language words relatively easily. We investigated the role of sublexical native-language patterns on novel word acquisition. Twenty English monolinguals learned 48 novel written words in five repeated testing blocks. Half were orthographically wordlike (e.g., nish, high neighborhood density and high segment/bigram frequency), while half were not (e.g., gofp, low neighborhood density and low segment/bigram frequency). Participants were faster and more accurate at recognizing and producing wordlike items, indicating a native-language similarity benefit. Individual differences in memory and vocabulary size influenced learning, and error analyses indicated that participants extracted probabilistic information from the novel vocabulary. Results suggest that language learners benefit from both native-language overlap and regularities within the novel language.

