Detecting near-duplicate documents using sentence-level features and supervised learning

Distant supervision leverages knowledge bases to automatically label instances, allowing relation extractors to be trained without human annotations. However, the generated training data typically contain substantial noise, which may lead to poor performance with vanilla supervised learning. In this paper, we propose to conduct multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training for distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while the correlation among relations is captured to improve the quality of the attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, again using a selective attention mechanism. Experiments with two types of relation extractor demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
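The abstract above does not give formulas, so the following is only a minimal sketch of generic sentence-level selective attention over one bag of sentences, not the C2SA model itself (its cross-relation and cross-bag components are omitted). The function names, the dot-product scoring, and the toy vectors are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def selective_attention(sentence_vecs, relation_query):
    # Score each sentence by its dot product with a relation query vector
    # (an assumed scoring function), then normalize the scores into weights.
    scores = [sum(x * q for x, q in zip(vec, relation_query))
              for vec in sentence_vecs]
    weights = softmax(scores)
    # The bag representation is the attention-weighted sum of sentence vectors,
    # so poorly matching sentences contribute less.
    dim = len(relation_query)
    bag_vec = [sum(w * vec[i] for w, vec in zip(weights, sentence_vecs))
               for i in range(dim)]
    return weights, bag_vec

# Toy bag: two sentences that match the relation and one that does not.
bag = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
query = [2.0, 0.0]
weights, bag_vec = selective_attention(bag, query)
```

In this toy bag the third sentence receives the smallest weight, which is the intended effect of down-weighting mismatched sentences.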

CERT: Contrastive Self-supervised Learning for Language Understanding

10.36227/techrxiv.12308378.v1 ◽

2020 ◽

Author(s):

Hongchao Fang ◽

Pengtao Xie

Keyword(s):

Supervised Learning ◽

Language Models ◽

Language Understanding ◽

Great Effectiveness ◽

Language Representation ◽

Sentence Level ◽

Back Translation

Pretrained language models such as BERT and GPT have shown great effectiveness in language understanding. The auxiliary predictive tasks in existing pretraining approaches are mostly defined on tokens, and thus may not capture sentence-level semantics very well. To address this issue, we propose CERT: Contrastive self-supervised Encoder Representations from Transformers, which pretrains language representation models using contrastive self-supervised learning at the sentence level. CERT creates augmentations of original sentences using back-translation. It then finetunes a pretrained language encoder (e.g., BERT) by predicting whether two augmented sentences originate from the same sentence. CERT is simple to use and can be flexibly plugged into any pretraining-finetuning NLP pipeline. We evaluate CERT on three language understanding tasks: CoLA, RTE, and QNLI. CERT outperforms BERT significantly.
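As a rough illustration of the pair-prediction setup described above, the sketch below builds labeled sentence pairs from augmentations that are assumed to have been produced already (for example by back-translation, which is not implemented here). The function name, the input layout, and the example sentences are hypothetical, and the encoder and training loop are left out.

```python
from itertools import combinations

def build_contrastive_pairs(augmented):
    # `augmented` maps an original-sentence id to its list of augmented
    # sentences (assumed to come from a separate back-translation step).
    items = [(sid, text) for sid, texts in augmented.items() for text in texts]
    pairs = []
    for (sid_a, a), (sid_b, b) in combinations(items, 2):
        # Label 1 if both augmentations come from the same original sentence,
        # 0 otherwise.
        pairs.append((a, b, 1 if sid_a == sid_b else 0))
    return pairs

# Hypothetical augmentations of two original sentences.
augmented = {
    "s1": ["The cat sat on the mat.", "A cat was sitting on the mat."],
    "s2": ["Prices rose sharply.", "There was a sharp rise in prices."],
}
pairs = build_contrastive_pairs(augmented)
```

Each resulting triple could then be fed to a sentence encoder trained to predict the label; how the encoder is trained is outside what the abstract specifies.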

Exploring the Potential Impact of Sentence-Level Comprehension and Sentence-Level Fluency on Deaf Students' Passage Comprehension

Journal of Speech Language and Hearing Research ◽

10.1044/2020_jslhr-19-00205 ◽

2020 ◽

Vol 63 (7) ◽

pp. 2281-2292

Author(s):

Ying Zhao ◽

Xinchun Wu ◽

Hongjun Chen ◽

Peng Sun ◽

Ruibo Xie ◽

...

Keyword(s):

Independent Predictor ◽

Vocabulary Knowledge ◽

Group Differences ◽

Regression Analyses ◽

Deaf Students ◽

Mediating Role ◽

Sentence Level ◽

Mediating Mechanisms ◽

Potential Impact ◽

The Relationship

Purpose: This exploratory study aimed to investigate the potential impact of sentence-level comprehension and sentence-level fluency on passage comprehension of deaf students in elementary school. Method: A total of 159 deaf students, 65 students (M age = 13.46 years) in Grades 3 and 4 and 94 students (M age = 14.95 years) in Grades 5 and 6, were assessed for nonverbal intelligence, vocabulary knowledge, sentence-level comprehension, sentence-level fluency, and passage comprehension. Group differences were examined using t tests, whereas the predictive and mediating mechanisms were examined using regression modeling. Results: The regression analyses showed that the effect of sentence-level comprehension on passage comprehension was not significant, whereas sentence-level fluency was an independent predictor in Grades 3–4. Sentence-level comprehension and fluency contributed significant variance to passage comprehension in Grades 5–6. Sentence-level fluency fully mediated the influence of sentence-level comprehension on passage comprehension in Grades 3–4, playing a partial mediating role in Grades 5–6. Conclusions: The relative contributions of sentence-level comprehension and fluency to deaf students' passage comprehension varied, and sentence-level fluency mediated the relationship between sentence-level comprehension and passage comprehension.

Mapping Treatment: An Approach To Treating Sentence Level Impairments In Agrammatism

Perspectives on Neurophysiology and Neurogenic Speech and Language Disorders ◽

10.1044/nnsld11.3.14 ◽

2001 ◽

Vol 11 (3) ◽

pp. 14-23 ◽

Cited By ~ 4

Author(s):

Ruth B. Fink

Keyword(s):

Sentence Level

“To Corrupt a Man in the Midst of a Verse”: Ben Jonson and the Prose of the World

Ben Jonson Journal ◽

10.3366/bjj.2017.0179 ◽

2017 ◽

Vol 24 (1) ◽

pp. 46-72

Author(s):

Jacob Tootalian

Keyword(s):

Mixed Mode ◽

Ben Jonson ◽

Digital Analysis ◽

Interactive Engagement ◽

Sentence Level ◽

Linguistic Patterns ◽

The World ◽

Selection Of ◽

Bartholomew Fair

Ben Jonson's early plays show a marked interest in prose as a counterpoint to the blank verse norm of the Renaissance stage. This essay presents a digital analysis of Jonson's early mixed-mode plays and his two later full-prose comedies. It examines this selection of the Jonsonian corpus using DocuScope, a piece of software that catalogs sentence-level features of texts according to a series of rhetorical categories, highlighting the distinctive linguistic patterns associated with Jonson's verse and prose. Verse tends to employ abstract, morally and emotionally charged language, while prose is more often characterized by expressions that are socially explicit, interrogative, and interactive. In the satirical economy of these plays, Jonson's characters usually adopt verse when they articulate censorious judgements, descending into prose when they wade into the intractable banter of the vicious world. Surprisingly, the prosaic signature that Jonson fashioned in his earlier drama persisted in the two later full-prose comedies. The essay presents readings of Every Man Out of his Humour and Bartholomew Fair, illustrating how the tension between verse and prose that motivated the satirical dynamics of the mixed-mode plays was released in the full-prose comedies. Jonson's final experiments with theatrical prose dramatize the exhaustion of the satirical impulse by submerging his characters almost entirely in the prosaic world of interactive engagement.

Vernon Lee’s Novel Construction

Nineteenth-Century Literature ◽

10.1525/ncl.2020.75.3.346 ◽

2020 ◽

Vol 75 (3) ◽

pp. 346-371

Author(s):

Irena Yamboliev

Keyword(s):

Nineteenth Century ◽

Literary Practice ◽

Prose Style ◽

Sentence Level

Irena Yamboliev, “Vernon Lee’s Novel Construction” (pp. 346–371) This essay proposes that we understand Vernon Lee’s debut novel, Miss Brown (1884), as enacting a theory of literary language’s constructive potency that Lee develops in her critical essays. Those critical essays offer a vibrant nineteenth-century formalism, suggesting how fiction constructs and formalizes our realities, shaping readers’ mental and emotional circuits as it arranges phrases, sentences, and paragraphs. In Miss Brown, Lee crafts a prose style that meticulously tracks the protagonist’s formation—the “little dramas of expectation, fulfilment and disappointment,…of tensions and relaxations”—rendering that formation as a drama of sentence-level structuration. The resulting “representation” of Anne Brown is interrupted with adjective-rich stretches conspicuously geared toward defining, formulating, and theorizing what is being represented, essay-like. By treating the protagonist as an occasion to foreground syntax’s active building and abstracting, Miss Brown’s prose partakes in the kind of literary practice that has recently been described as nonmimetic realism—realism that does more than denote and refer and reflect what is, and instead performs, meditating on form’s process, to project and inform new potentialities.

Sentence Level Alignment of Digitized Books Parallel Corpora

Informatica ◽

10.15388/informatica.2018.188 ◽

2018 ◽

Vol 29 (4) ◽

pp. 693-710

Author(s):

Algirdas Laukaitis ◽

Darius Plikynas ◽

Egidijus Ostasius

Keyword(s):

Parallel Corpora ◽

Sentence Level

The effects of motivation on processing instruction in the acquisition of Modern Standard Arabic gender agreement

Instructed Second Language Acquisition ◽

10.1558/isla.34879 ◽

2018 ◽

Vol 2 (1) ◽

pp. 61-82

Author(s):

Ayah Farhat ◽

Alessandro Benati

Keyword(s):

Academic Motivation ◽

Processing Instruction ◽

Gender Agreement ◽

Modern Standard Arabic ◽

Language Background ◽

Standard Arabic ◽

Positive Effects ◽

Sentence Level ◽

Post Test ◽

Modern Standard

The present study investigates the effects of motivation and processing instruction on the acquisition of Modern Standard Arabic gender agreement. The role of individual differences (e.g. age, gender, aptitude, language background and working memory) in the positive effects generated by processing instruction has been investigated in the last few years. However, no previous research has measured the possible effects of motivation on L2 learners exposed to processing instruction. In addition, a reasonable question to be addressed within the processing instruction research framework is whether its positive effects can be generalised to the acquisition of Modern Standard Arabic. The Academic Motivation Scale (AMS) and the Attitude Motivation Test Battery (AMTB) motivation questionnaires were used to capture different variables that influence motivation in order to create two different groups (high and low motivated). In this experimental study, forty-one native English school-age learners (aged 8–11) were assigned to two groups: the ‘high motivated group’ (n = 29) and the ‘low motivated group’ (n = 12). Both groups received processing instruction, which lasted for three hours. Sentence-level interpretation and production tasks were used in a pre-test and post-test design to measure instructional effects. The learners were required to fill in gaps in both written and spoken mode for the activities. The study also included a delayed post-test administered to the two groups four weeks later. The results indicated that both groups improved equally from pre-test to post-test in all assessment measures and that both retained the positive effects of the training in the delayed post-tests. Processing instruction proved to be the main factor in the improvement in performance, regardless of the learners’ level of motivation.
