Detecting near-duplicate documents using sentence-level features and supervised learning

2013 ◽  
Vol 40 (5) ◽  
pp. 1467-1476 ◽  
Author(s):  
Yung-Shen Lin ◽  
Ting-Yi Liao ◽  
Shie-Jue Lee
Author(s):  
Yujin Yuan ◽  
Liyuan Liu ◽  
Siliang Tang ◽  
Zhongfei Zhang ◽  
Yueting Zhuang ◽  
...  

Distant supervision leverages knowledge bases to label instances automatically, allowing relation extractors to be trained without human annotation. However, the generated training data typically contain substantial noise, which can lead to poor performance under vanilla supervised learning. In this paper, we propose multi-instance learning with a novel Cross-relation Cross-bag Selective Attention (C2SA), which leads to noise-robust training for distantly supervised relation extractors. Specifically, we employ sentence-level selective attention to reduce the effect of noisy or mismatched sentences, while capturing the correlation among relations to improve the quality of the attention weights. Moreover, instead of treating all entity pairs equally, we pay more attention to entity pairs of higher quality, again using a selective attention mechanism. Experiments with two types of relation extractor demonstrate the superiority of the proposed approach over the state of the art, while further ablation studies verify our intuitions and demonstrate the effectiveness of the two proposed techniques.
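As a rough illustration of the sentence-level selective attention mentioned in the abstract above, the sketch below weights each sentence vector in a bag by a softmax over its dot product with a relation query vector. This is a generic attention sketch, not the authors' C2SA: the cross-relation and cross-bag components are not shown, and the function name, the dot-product scoring, and the toy vectors are all assumptions made for illustration.

```python
import math

def selective_attention(sentence_vecs, relation_query):
    """Return (weights, bag_vector) for one bag of sentence vectors.

    Assumption: relevance is scored by a plain dot product with the
    relation query; the abstract does not specify the scoring function.
    """
    # Score each sentence against the relation query.
    scores = [sum(s_d * q_d for s_d, q_d in zip(s, relation_query))
              for s in sentence_vecs]
    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(sc - max_score) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The bag representation is the attention-weighted sum of the sentences.
    dim = len(relation_query)
    bag_vec = [sum(w * s[d] for w, s in zip(weights, sentence_vecs))
               for d in range(dim)]
    return weights, bag_vec

# Toy bag (hypothetical values): the third sentence points away from the
# query, so it should receive the smallest weight.
bag = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
query = [2.0, 0.0]
weights, bag_vec = selective_attention(bag, query)
```

Sentences whose vectors align poorly with the query receive small weights, which is the sense in which attention can down-weight noisy or mismatched sentences.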


2020 ◽  
Author(s):  
Hongchao Fang ◽  
Pengtao Xie

Pretrained language models such as BERT and GPT have shown great effectiveness in language understanding. The auxiliary predictive tasks in existing pretraining approaches are mostly defined on tokens, and thus may not capture sentence-level semantics very well. To address this issue, we propose CERT: Contrastive self-supervised Encoder Representations from Transformers, which pretrains language representation models using contrastive self-supervised learning at the sentence level. CERT creates augmentations of original sentences using back-translation. It then finetunes a pretrained language encoder (e.g., BERT) by predicting whether two augmented sentences originate from the same sentence. CERT is simple to use and can be flexibly plugged into any pretraining-finetuning NLP pipeline. We evaluate CERT on three language understanding tasks: CoLA, RTE, and QNLI. CERT outperforms BERT significantly.
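The pairing step described in the abstract above (deciding whether two augmented sentences originate from the same sentence) can be sketched as a data-preparation routine. This is a minimal sketch under stated assumptions, not the CERT implementation: `build_pairs` is a hypothetical helper, and the two toy augmenters merely stand in for back-translation, which would require a translation model.

```python
import itertools

def build_pairs(sentences, augmenters):
    """Return (text_a, text_b, label) triples for contrastive training.

    label is 1 when both augmented texts come from the same original
    sentence and 0 otherwise.
    """
    # Tag every augmented text with the index of its source sentence.
    augmented = [(idx, aug(sentence))
                 for idx, sentence in enumerate(sentences)
                 for aug in augmenters]
    pairs = []
    for (i, text_a), (j, text_b) in itertools.combinations(augmented, 2):
        pairs.append((text_a, text_b, 1 if i == j else 0))
    return pairs

# Placeholder augmenters (assumption): a real pipeline would use
# back-translation here instead of case changes.
toy_augmenters = [str.lower, str.upper]
sentences = ["The cat sat.", "Dogs bark loudly."]
pairs = build_pairs(sentences, toy_augmenters)
```

With two sentences and two augmenters this yields six pairs, two of which are positives; an encoder would then be trained to predict the label for each pair.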


2020 ◽  
Vol 63 (7) ◽  
pp. 2281-2292
Author(s):  
Ying Zhao ◽  
Xinchun Wu ◽  
Hongjun Chen ◽  
Peng Sun ◽  
Ruibo Xie ◽  
...  

Purpose: This exploratory study aimed to investigate the potential impact of sentence-level comprehension and sentence-level fluency on passage comprehension of deaf students in elementary school. Method: A total of 159 deaf students, 65 students (M age = 13.46 years) in Grades 3 and 4 and 94 students (M age = 14.95 years) in Grades 5 and 6, were assessed for nonverbal intelligence, vocabulary knowledge, sentence-level comprehension, sentence-level fluency, and passage comprehension. Group differences were examined using t tests, whereas the predictive and mediating mechanisms were examined using regression modeling. Results: The regression analyses showed that the effect of sentence-level comprehension on passage comprehension was not significant, whereas sentence-level fluency was an independent predictor in Grades 3–4. Sentence-level comprehension and fluency contributed significant variance to passage comprehension in Grades 5–6. Sentence-level fluency fully mediated the influence of sentence-level comprehension on passage comprehension in Grades 3–4, playing a partial mediating role in Grades 5–6. Conclusions: The relative contributions of sentence-level comprehension and fluency to deaf students' passage comprehension varied, and sentence-level fluency mediated the relationship between sentence-level comprehension and passage comprehension.


2017 ◽  
Vol 24 (1) ◽  
pp. 46-72
Author(s):  
Jacob Tootalian

Ben Jonson's early plays show a marked interest in prose as a counterpoint to the blank verse norm of the Renaissance stage. This essay presents a digital analysis of Jonson's early mixed-mode plays and his two later full-prose comedies. It examines this selection of the Jonsonian corpus using DocuScope, a piece of software that catalogs sentence-level features of texts according to a series of rhetorical categories, highlighting the distinctive linguistic patterns associated with Jonson's verse and prose. Verse tends to employ abstract, morally and emotionally charged language, while prose is more often characterized by expressions that are socially explicit, interrogative, and interactive. In the satirical economy of these plays, Jonson's characters usually adopt verse when they articulate censorious judgements, descending into prose when they wade into the intractable banter of the vicious world. Surprisingly, the prosaic signature that Jonson fashioned in his earlier drama persisted in the two later full-prose comedies. The essay presents readings of Every Man Out of his Humour and Bartholomew Fair, illustrating how the tension between verse and prose that motivated the satirical dynamics of the mixed-mode plays was released in the full-prose comedies. Jonson's final experiments with theatrical prose dramatize the exhaustion of the satirical impulse by submerging his characters almost entirely in the prosaic world of interactive engagement.


2020 ◽  
Vol 75 (3) ◽  
pp. 346-371
Author(s):  
Irena Yamboliev

Irena Yamboliev, “Vernon Lee’s Novel Construction” (pp. 346–371) This essay proposes that we understand Vernon Lee’s debut novel, Miss Brown (1884), as enacting a theory of literary language’s constructive potency that Lee develops in her critical essays. Those critical essays offer a vibrant nineteenth-century formalism, suggesting how fiction constructs and formalizes our realities, shaping readers’ mental and emotional circuits as it arranges phrases, sentences, and paragraphs. In Miss Brown, Lee crafts a prose style that meticulously tracks the protagonist’s formation—the “little dramas of expectation, fulfilment and disappointment,…of tensions and relaxations”—rendering that formation as a drama of sentence-level structuration. The resulting “representation” of Anne Brown is interrupted with adjective-rich stretches conspicuously geared toward defining, formulating, and theorizing what is being represented, essay-like. By treating the protagonist as an occasion to foreground syntax’s active building and abstracting, Miss Brown’s prose partakes in the kind of literary practice that has recently been described as nonmimetic realism—realism that does more than denote and refer and reflect what is, and instead performs, meditating on form’s process, to project and inform new potentialities.


Informatica ◽  
2018 ◽  
Vol 29 (4) ◽  
pp. 693-710
Author(s):  
Algirdas Laukaitis ◽  
Darius Plikynas ◽  
Egidijus Ostasius

2018 ◽  
Vol 2 (1) ◽  
pp. 61-82
Author(s):  
Ayah Farhat ◽  
Alessandro Benati

The present study investigates the effects of motivation and processing instruction on the acquisition of Modern Standard Arabic gender agreement. The role of individual differences (e.g. age, gender, aptitude, language background and working memory) in the positive effects generated by processing instruction has been investigated in the last few years. However, no previous research has measured the possible effects of motivation on L2 learners exposed to processing instruction. In addition, a reasonable question to address within the processing instruction research framework is whether its positive effects can be generalised to the acquisition of Modern Standard Arabic. The Academic Motivation Scale (AMS) and the Attitude Motivation Test Battery (AMTB) motivation questionnaires were used to capture different variables that influence motivation in order to create the two groups (high and low motivated). In this experimental study, forty-one native English school-age learners (aged 8–11) were assigned to two groups: the ‘high motivated group’ (n = 29) and the ‘low motivated group’ (n = 12). Both groups received processing instruction, which lasted for three hours. Sentence-level interpretation and production tasks were used in a pre-test and post-test design to measure instructional effects. The learners were required to fill in gaps in both written and spoken mode for the activities. The study also included a delayed post-test administered to the two groups four weeks later. The results indicated that both groups improved equally from pre-test to post-test in all assessment measures and that both retained the positive effects of the training in the delayed post-tests. Processing instruction proved to be the main factor in the improvement in performance, regardless of the learner’s level of motivation.


2018 ◽  
Vol 2018 (15) ◽  
pp. 132-1-1323
Author(s):  
Shijie Zhang ◽  
Zhengtian Song ◽  
G. M. Dilshan P. Godaliyadda ◽  
Dong Hye Ye ◽  
Atanu Sengupta ◽  
...  
