bigram frequency Latest Research Papers

Usage-Based Individual Differences in the Probabilistic Processing of Multi-Word Sequences

While it is widely acknowledged that both predictive expectations and retrodictive integration influence language processing, the individual differences that affect these two processes and the best metrics for observing them have yet to be fully described. The present study aims to contribute to the debate by investigating the extent to which experience-based variables modulate the processing of word pairs (bigrams). Specifically, we investigate how age and reading experience correlate with lexical anticipation and integration, and how these effects can be captured by the metrics of forward and backward transition probability (TP). In a self-paced reading (SPR) task, participants read more and less strongly associated bigrams, arranged in sets of four to control for known lexical covariates such as bigram frequency and semantic meaning (e.g., absolute control, total control, absolute silence, total silence). They additionally completed assessments of exposure to print (Author Recognition Test, Shipley vocabulary assessment, Words that Go Together task) and provided their age. Results show that both older age and less reading experience individually correlate with stronger TP effects. Moreover, TP effects differ across the spillover region (the two words following the noun in the bigram).
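
Forward and backward transition probability are commonly defined from corpus counts as the probability of the second word given the first, f(w1 w2) / f(w1), and of the first word given the second, f(w1 w2) / f(w2). The abstract does not state the corpus or counting conventions, so the following is only a minimal sketch, assuming raw unigram counts as denominators; the function name and toy corpus are hypothetical illustrations, not the study's materials.

```python
from collections import Counter


def transition_probabilities(tokens: list[str]) -> dict[tuple[str, str], tuple[float, float]]:
    """Return {(w1, w2): (forward_tp, backward_tp)} for every adjacent word pair.

    forward TP  = f(w1 w2) / f(w1)  -> how well w1 predicts w2
    backward TP = f(w1 w2) / f(w2)  -> how well w2 "recalls" w1
    Denominators are plain unigram counts (an assumption; variants exist).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (w1, w2): (count / unigrams[w1], count / unigrams[w2])
        for (w1, w2), count in bigrams.items()
    }


# Hypothetical toy corpus for illustration only.
corpus = "the vast majority agreed but a vast area and the majority".split()
tps = transition_probabilities(corpus)
print(tps[("vast", "area")])  # (0.5, 1.0)
```

In this toy corpus "vast" is only a weak cue for "area" (forward TP 0.5), while "area" is always preceded by "vast" (backward TP 1.0), which is the kind of asymmetry the two metrics are meant to separate.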

Social Media Activism and Convergence in Tweet Topics After the Initial #MeToo Movement for Two Distinct Groups of Twitter Users

Journal of Interpersonal Violence; doi:10.1177/08862605211001481; 2021; pp. 088626052110014

Author(s): Jason M. Baik, Thet H. Nyein, Sepideh Modrek

Keyword(s): Social Media; Sexual Assault; Frequency Analysis; Topic Modeling; Latent Dirichlet Allocation; Bigram Frequency; Media Activism; Online Social Media; Before And After; Twitter Users

Online social media movements are now common and support cultural discussions on difficult health and social topics. The #MeToo movement, focusing on the pervasiveness of sexual assault and harassment, has been one of the largest and most influential online movements. Our study examines topics of conversation on Twitter by supporters of the #MeToo movement and by Twitter users who were uninvolved in the movement, to explore the extent to which tweet topics for these two groups converge over time. We identify and collect one year's worth of tweets for supporters of the #MeToo movement (N = 168 users; N = 105,538 tweets) and for users not involved in the movement (N = 147 users; N = 112,301 tweets), referred to as the Neutral Sample. We conduct topic frequency analysis and implement an unsupervised machine learning topic modeling algorithm, latent Dirichlet allocation, to explore topics of discussion on Twitter for these two groups of users before and after the initial #MeToo movement. Our results suggest that, before #MeToo, supporters of #MeToo discussed different topics from the Neutral Sample of Twitter users, with some overlap on politics. The supporters were already discussing sexual assault and harassment issues six months before #MeToo, and discussion of this topic increased 13.7-fold in the six months after. For the Neutral Sample, sexual assault and harassment was not a key topic of discussion on Twitter before #MeToo, but there was some limited increase afterward. Bigram frequency analysis and topic modeling showed a clear increase in topics related to gender for the supporters of #MeToo but gave mixed results for the Neutral Sample comparison group. Our results suggest limited shifts in the conversation on Twitter for the Neutral Sample. Our methods and results have implications for measuring the extent to which online social media movements, like #MeToo, reach a broad audience.
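
A bigram frequency analysis of tweets amounts to counting adjacent word pairs and ranking them. The abstract does not describe the tokenizer or stopword list, so the sketch below assumes a simple regex tokenizer that keeps hashtags and mentions as single tokens, plus a caller-supplied stopword set; the function name and sample tweets are hypothetical.

```python
import re
from collections import Counter


def bigram_frequencies(texts: list[str], stopwords: frozenset[str] = frozenset()) -> Counter:
    """Count adjacent word pairs across a collection of short texts.

    Bigrams are formed within each text only, so a pair never spans two tweets.
    Stopwords are dropped before pairing (an assumed preprocessing choice).
    """
    counts: Counter = Counter()
    for text in texts:
        # Lowercase; an optional leading # or @ keeps hashtags/mentions whole.
        tokens = [t for t in re.findall(r"[#@]?\w+", text.lower()) if t not in stopwords]
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical example tweets, not data from the study.
tweets = [
    "Sexual assault survivors deserve to be heard #MeToo",
    "Report sexual assault and harassment #MeToo",
]
top = bigram_frequencies(tweets, stopwords=frozenset({"to", "be", "and"}))
print(top.most_common(1))  # [(('sexual', 'assault'), 2)]
```

Running the same function separately on tweets from before and after a cut-off date gives two ranked bigram lists that can be compared per user group.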

Word frequency does not moderate the degree to which people can selectively attend to parts of visually presented words

Quarterly Journal of Experimental Psychology; doi:10.1177/1747021820969069; 2020; pp. 174702182096906

Author(s): Todd A Kahan, Louisa M Slowiaczek, Ned Scott, Brian T Pfohl

Keyword(s): Word Frequency; High Frequency; Congruency Effect; Response Times; Test Word; Low Frequency; Bigram Frequency; Capacity Model; Entire Word; High Frequency Words

Whether attention is allocated to an entire word or can be confined to part of a word was examined in an experiment using a visual composite task. Participants saw a study word, a cue to attend to either the right or left half, and a test word, and indicated whether the cued half of the two words (e.g., left) was the same (e.g., TOLD-TONE) or different (e.g., TOLD-WINE). Prior research using this task reports a larger congruency effect for low-frequency words than for high-frequency words, but extraneous variables were not equated. In this study (N = 33), lexical (orthographic neighbourhood density) and sublexical (bigram frequency) variables were controlled, and word frequency was manipulated. Results indicate that word frequency does not moderate the degree to which parts of a word can be selectively attended to or ignored. Response times to high-frequency words were faster than response times to low-frequency words, but the congruency effect was equivalent. The data support a capacity model in which attention is equally distributed across low-frequency and high-frequency words but low-frequency words require additional processing resources.

Greek word recognition by Greek readers and the DRC model

doi:10.31234/osf.io/3fu4n; 2020

Author(s): Efthymia C Kapnoula, Athanassios Protopapas, Steven J. Saunders, Max Coltheart

Keyword(s): Word Recognition; Visual Word Recognition; Word Length; Native Speakers; Visual Word; Bigram Frequency; Nonword Naming; Wide Range; Psycholinguistic Variables; Dual Route

We evaluated the dual route cascaded (DRC) model of visual word recognition using Greek behavioural data on word and nonword naming and lexical decision, focusing on the effects of syllable and bigram frequency. DRC was modified to process polysyllabic Greek words and nonwords. The Greek DRC and native speakers of Greek were presented with the same sets of word and nonword stimuli, spanning a wide range on several psycholinguistic variables, and the sensitivity of the model to lexical and sublexical variables was compared to the effects of these factors on the behavioural data. DRC correctly pronounced all the stimuli and successfully simulated the effects of frequency in words, and of length and bigram frequency in nonwords. However, unlike native speakers of Greek, DRC failed to show sensitivity to word length and syllable frequency. We discuss the significance of these findings in constraining models of visual word recognition.

Stylometric Comparison of Professionally Ghost-Written and Student-Written Assignments

Integrity in Education for Future Happiness; doi:10.11118/978-80-7509-772-9-0035; 2020

Author(s): Robin Crockett, Kirstie Best

Keyword(s): Frequency Analysis; Complexity Analysis; The Other; Bigram Frequency; Individual Student; Multiple Authorship; Writing Style; The Core; Written Assignments

We report a stylometric investigation of a portfolio of 20 assignments submitted by an individual student over two consecutive academic years. This investigation followed a formal disciplinary investigation which had identified that eight of the assignments had been ghost-written, with seven of those showing explicit ghost-writer ID information and three of those showing ID information from the same commercial provider. The stylometric investigation involved a conventional word and bigram frequency analysis and a prototype word complexity analysis. The word and bigram analysis identified four consistent groups of assignments, which associate other assignments with the eight known to have been ghost-written, indicating that those were probably also ghost-written. One of those groups comprises the three assignments from the same provider plus another assignment, implying that the provider has a ‘house style’ and that the other assignment also came from that provider. The prototype analysis clearly categorised the core members of two of those same groups, including the group from the identified provider, adding further weight to those associations. More generally, this investigation shows that it is possible to categorise assignments according to aspects of writing style: we would have obtained the same groups even if we had not possessed the ghost-writer ID information. Where such consistent groups are identified, they imply, on the balance of probabilities, multiple authorship of the assignments: the student concerned cannot have written all the submitted assignments, and some were ghost-written.
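
Grouping documents by word and bigram frequency generally means turning each document into a frequency profile and measuring how similar the profiles are. The abstract does not name the distance measure or grouping procedure, so this sketch assumes relative word-bigram frequencies compared by cosine similarity; the function names and the three short documents are hypothetical.

```python
import math
from collections import Counter


def bigram_profile(text: str) -> dict[tuple[str, str], float]:
    """Relative frequencies of adjacent word pairs in one document."""
    tokens = text.lower().split()
    counts = Counter(zip(tokens, tokens[1:]))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}


def cosine_similarity(p: dict, q: dict) -> float:
    """Cosine similarity of two sparse profiles: 0 = no shared bigrams, 1 = same proportions."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Hypothetical documents; A and B share a formulaic opening ("house style").
docs = {
    "A": "in this essay it is argued that the market is efficient",
    "B": "in this essay it is argued that the state is weak",
    "C": "markets can fail when information is scarce",
}
profiles = {name: bigram_profile(text) for name, text in docs.items()}
for x, y in [("A", "B"), ("A", "C")]:
    print(x, y, round(cosine_similarity(profiles[x], profiles[y]), 3))
```

A pairwise similarity matrix built this way could then feed any clustering method; real stylometric work uses far longer texts and usually restricts the feature set to frequent items.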

Effects of task and corpus-derived association scores on the online processing of collocations

Corpus Linguistics and Linguistic Theory; doi:10.1515/cllt-2018-0030; 2019; Vol 0 (0)

Author(s): Kyla McConnell, Alice Blumenthal-Dramé

Keyword(s): Lexical Processing; Transition Probability; Critical Word; Multiple Choice; Bigram Frequency; Choice Condition; Treatment Variable; Cognitive Correlates; Online Processing; Task Effects

In the following self-paced reading study, we assess the cognitive realism of six widely used corpus-derived measures of association strength between words (collocated modifier–noun combinations like "vast majority"): MI, MI3, Dice coefficient, T-score, Z-score, and log-likelihood. The ability of these collocation metrics to predict reading times is tested against predictors of lexical processing cost that are widely established in the psycholinguistic and usage-based literature, respectively: forward/backward transition probability and bigram frequency. In addition, the experiment includes the treatment variable of task: it is split into two blocks which only differ in the format of interleaved comprehension questions (multiple choice vs. typed free response). Results show that the traditional corpus-linguistic metrics are outperformed by both backward transition probability and bigram frequency. Moreover, the multiple-choice condition elicits faster overall reading times than the typed condition, and the two winning metrics show stronger facilitation on the critical word (i.e. the noun in the bigrams) in the multiple-choice condition. In the typed condition, we find an effect that is weaker and, in the case of bigram frequency, longer lasting, continuing into the first spillover word. We argue that insufficient attention to task effects might have obscured the cognitive correlates of association scores in earlier research.
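
The six association scores named above all compare the observed co-occurrence count O = f(w1 w2) with the count expected under independence, E = f(w1) f(w2) / N. The sketch below uses common textbook formulations (MI = log2(O/E), MI3 = log2(O^3/E), Dice = 2O / (f(w1) + f(w2)), T = (O - E)/sqrt(O), Z = (O - E)/sqrt(E), and log-likelihood G2 over a 2x2 contingency table); the exact variants, corpus, and counting window used in the study are not given in the abstract, and the function name and counts are hypothetical.

```python
import math


def association_scores(f12: int, f1: int, f2: int, n: int) -> dict[str, float]:
    """Common collocation scores for a word pair (w1, w2).

    f12: co-occurrence count of the bigram "w1 w2" (must be > 0)
    f1, f2: corpus frequencies of w1 and w2
    n: corpus size (number of bigram positions)
    """
    o = f12
    e = f1 * f2 / n  # expected count if w1 and w2 were independent
    # 2x2 contingency table: rows = w1 / not w1, columns = w2 / not w2
    observed = [[f12, f1 - f12], [f2 - f12, n - f1 - f2 + f12]]
    rows = [f1, n - f1]
    cols = [f2, n - f2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o_ij = observed[i][j]
            if o_ij > 0:  # cells with 0 * log(0) contribute 0
                e_ij = rows[i] * cols[j] / n
                g2 += o_ij * math.log(o_ij / e_ij)
    return {
        "MI": math.log2(o / e),
        "MI3": math.log2(o ** 3 / e),
        "Dice": 2 * o / (f1 + f2),
        "T-score": (o - e) / math.sqrt(o),
        "Z-score": (o - e) / math.sqrt(e),
        "log-likelihood": 2 * g2,
    }


# Hypothetical counts: the pair occurs 10 times in a 10,000-token corpus.
scores = association_scores(f12=10, f1=100, f2=50, n=10_000)
for name, value in scores.items():
    print(f"{name:>14}: {value:.3f}")
```

Forward and backward transition probability can be computed from the same counts (f12 / f1 and f12 / f2), which makes it straightforward to enter all of these predictors into one reading-time model.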

A Novel Computational Framework to Predict the Impact of a Point Mutation on PDZ Domain Classification

doi:10.1101/244251; 2018

Author(s): Muhammad Moinuddin, Wasim Aftab, Adnan Memic

Keyword(s): Usher Syndrome; PDZ Domain; Point Mutations; Similarity Measures; Bigram Frequency; Computational Techniques; PDZ Domains; Computational Framework; Class 1; The Impact

PDZ domains represent one of the most common protein homology regions and play key roles in several diseases. Point mutations (PM) in the amino acid primary sequence of PDZ domains can alter domain functions by affecting, for example, downstream phosphorylation, a pivotal process in biology. Our goal in the present study was to introduce a novel approach to investigate how point mutations within Class 1, Class 2 and Class 1–2 PDZ domains could change binding with their partner ligands and hence affect their classification. We focused on features in PDZ domains of various species, including human, rat and mouse. However, our work represents a generic computational framework that could be used to analyze PM in any given PDZ sequence. We adopted two different approaches to investigate the impact of PM. In the first approach, we developed a statistical model using bigram frequencies of amino acids and employed six different similarity measures to contrast the bigram frequency distribution of a wild-type sequence with those of its point mutants. In the second approach, we developed a statistical method that incorporates the bigram frequency history associated with each mutational site, which we call history weighted conditional change in probabilities. We observed that the history weighted method performs best of all the methods studied at picking up sites in the PDZ domain where a PM could flip the class. We anticipate that this method will represent a step forward for computational techniques that identify PDZ domain point mutants that substantially affect protein-ligand binding, specificity and affinity. We hope that this and future studies could aid therapy for diseases in which PDZ mutations have been implicated as the main drivers, such as Usher syndrome.
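
The first approach contrasts the amino-acid bigram (dipeptide) frequency distribution of a wild-type sequence with that of a point mutant. The abstract does not name its six similarity measures, so this sketch substitutes a single illustrative one, the Manhattan (L1) distance between distributions; the function names and the short sequence are hypothetical and not taken from a real PDZ domain.

```python
from collections import Counter


def dipeptide_distribution(seq: str) -> dict[str, float]:
    """Relative frequency of each overlapping amino-acid bigram in seq (len(seq) >= 2)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return {pair: c / len(pairs) for pair, c in counts.items()}


def point_mutant(seq: str, position: int, residue: str) -> str:
    """Return seq with the residue at 0-based `position` replaced by `residue`."""
    return seq[:position] + residue + seq[position + 1:]


def manhattan_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Sum of absolute differences over the union of observed bigrams."""
    keys = p.keys() | q.keys()
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


wild_type = "GLGFSIAGG"  # hypothetical toy sequence
mutant = point_mutant(wild_type, 3, "A")  # F -> A at position 3
distance = manhattan_distance(dipeptide_distribution(wild_type), dipeptide_distribution(mutant))
print(mutant, distance)  # GLGASIAGG 0.5
```

A single substitution changes at most the two bigrams that overlap the mutated site, so scanning every position and residue with a function like this shows which sites shift the distribution most; the authors' history-weighted method goes beyond this simple comparison.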

Busting a myth with the Bayes Factor

The Mental Lexicon; doi:10.1075/ml.17009.sch; 2017; Vol 12 (2); pp. 263-282; Cited By ~ 4

Author(s): Xenia Schmalz, Claudio Mulatti

Keyword(s): Lexical Decision; Cognitive Processes; Large Scale; Bayes Factor; Control Variable; Empirical Work; Letter Pair; Task Demands; Frequency Effect; Bigram Frequency

Psycholinguistic researchers identify linguistic variables and assess whether they affect cognitive processes. One such variable is letter bigram frequency, or the frequency with which a given letter pair co-occurs in an orthography. While early studies reported that bigram frequency affects visual lexical decision, subsequent, well-controlled studies have not shown this effect. Still, researchers continue to use it as a control variable in psycholinguistic experiments. We propose two reasons for the persistence of this variable: (1) reporting no significant effect of bigram frequency cannot provide evidence for no effect; (2) despite empirical work, the theoretical implications of bigram frequency are largely neglected. We perform Bayes Factor analyses to address the first issue. In analyses of existing large-scale databases, we find no effect of bigram frequency in lexical decision in the British Lexicon Project, and some evidence for an inhibitory effect in the English Lexicon Project. We find strong evidence for an effect in reading aloud. This suggests that, for lexical decision, the effect is unstable and may depend on item characteristics and task demands rather than reflecting the cognitive processes underlying visual word recognition. We call for more consideration of the theoretical implications of the presence or absence of a bigram frequency effect.
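
Letter bigram frequency is usually operationalised per item as the summed or mean corpus frequency of the word's adjacent letter pairs. Definitions vary (type vs. token counts, position-specific vs. position-free), and the abstract does not say which variant the databases used, so this sketch assumes token-weighted, position-free counts over a small hypothetical word-frequency lexicon.

```python
from collections import Counter


def letter_bigram_counts(lexicon: dict[str, int]) -> Counter:
    """Token-weighted counts of adjacent letter pairs across a word -> frequency lexicon."""
    counts: Counter = Counter()
    for word, freq in lexicon.items():
        for i in range(len(word) - 1):
            counts[word[i:i + 2]] += freq
    return counts


def mean_bigram_frequency(word: str, counts: Counter) -> float:
    """Mean corpus frequency of the letter bigrams in `word` (position-free; len(word) >= 2)."""
    pairs = [word[i:i + 2] for i in range(len(word) - 1)]
    return sum(counts[p] for p in pairs) / len(pairs)


# Hypothetical toy lexicon: word -> corpus frequency.
lexicon = {"told": 50, "tone": 30, "wine": 20, "fish": 40, "dish": 10}
counts = letter_bigram_counts(lexicon)
for item in ("nish", "gofp"):
    print(item, round(mean_bigram_frequency(item, counts), 2))
```

With this lexicon, "nish" inherits the frequent pairs "is" and "sh", while none of the pairs in "gofp" occur at all, so the two items land at opposite ends of the bigram frequency scale.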

Busting a myth with the Bayes Factor: Effects of letter bigram frequency in visual lexical decision do not reflect reading processes

doi:10.31219/osf.io/3ybwd; 2017

Author(s): Xenia Schmalz, Claudio Mulatti

Keyword(s): Cognitive Processes; Large Scale; Bayes Factor; Reading Aloud; Control Variable; Bigram Frequency; Linguistic Variables; Reading Processes; Frequency Effects; Lexical Decisions

Psycholinguistic researchers identify linguistic variables and assess whether and how these affect cognitive processes. One such variable is letter bigram frequency, or the frequency with which a given pair of letters co-occurs in an orthography. While early studies showed that bigram frequency affects visual word recognition, subsequent, well-controlled studies have failed to show such an effect. Still, researchers continue to use it as a control variable in psycholinguistic experiments. We propose two reasons for the persistence of this variable: (1) studies have reported no evidence for an effect of bigram frequency, but this cannot provide evidence for no effect; (2) the theoretical implications of a bigram frequency effect have been largely neglected. We address the first issue by performing Bayes Factor tests on both a matched item set and large-scale studies, and the second by discussing possible theoretical implications. We find no effect of bigram frequency on lexical decisions, though there is some evidence for an effect in reading aloud.
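
The abstract does not state how its Bayes Factors were computed. As one illustration only, a widely used BIC-based approximation compares a regression model of response times that includes bigram frequency (H1) with one that omits it (H0), where k is the number of parameters, n the number of observations, and \hat{L} the maximised likelihood:

```latex
\mathrm{BF}_{01} \approx \exp\!\left(\frac{\mathrm{BIC}_1 - \mathrm{BIC}_0}{2}\right),
\qquad
\mathrm{BIC}_m = k_m \ln n - 2 \ln \hat{L}_m .
% Worked example with hypothetical values: if BIC_1 - BIC_0 = 4, then
% BF_{01} \approx e^{2} \approx 7.39, i.e. the data are about 7 times more
% likely under the model without bigram frequency.
```

Unlike a non-significant p-value, a BF01 well above 1 counts as positive evidence for the absence of an effect (values above about 3 are often read as moderate evidence), which is the distinction the first of the two reasons above turns on.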

Orthographic knowledge and lexical form influence vocabulary learning

Applied Psycholinguistics; doi:10.1017/s0142716416000242; 2016; Vol 38 (2); pp. 427-456; Cited By ~ 8

Author(s): James Bartolotti, Viorica Marian

Keyword(s): Second Language Acquisition; Language Learners; Native Language; Neighborhood Density; Orthographic Knowledge; Bigram Frequency; The Novel; Repeated Testing; Vocabulary Size; Word Acquisition

Many adults struggle with second language acquisition but learn new native-language words relatively easily. We investigated the role of sublexical native-language patterns on novel word acquisition. Twenty English monolinguals learned 48 novel written words in five repeated testing blocks. Half were orthographically wordlike (e.g., nish, high neighborhood density and high segment/bigram frequency), while half were not (e.g., gofp, low neighborhood density and low segment/bigram frequency). Participants were faster and more accurate at recognizing and producing wordlike items, indicating a native-language similarity benefit. Individual differences in memory and vocabulary size influenced learning, and error analyses indicated that participants extracted probabilistic information from the novel vocabulary. Results suggest that language learners benefit from both native-language overlap and regularities within the novel language.

bigram frequency
Recently Published Documents

Usage-Based Individual Differences in the Probabilistic Processing of Multi-Word Sequences

Social Media Activism and Convergence in Tweet Topics After the Initial #MeToo Movement for Two Distinct Groups of Twitter Users

Word frequency does not moderate the degree to which people can selectively attend to parts of visually presented words

Greek word recognition by Greek readers and the DRC model

Stylometric Comparison of Professionally Ghost-Written and Student-Written Assignments

Effects of task and corpus-derived association scores on the online processing of collocations

A Novel Computational Framework to Predict the Impact of a Point Mutation on PDZ Domain Classification

Busting a myth with the Bayes Factor

Busting a myth with the Bayes Factor: Effects of letter bigram frequency in visual lexical decision do not reflect reading processes

Orthographic knowledge and lexical form influence vocabulary learning
