Word-level information extraction from science and technology announcements corpus based on CRF

Author(s):  
Yushu Cao ◽  
Jun Wang ◽  
Lei Li

2003 ◽ 
Author(s):  
Rohini K. Srihari ◽  
Wei Li ◽  
Cheng Niu ◽  
Thomas Cornell

2018 ◽  
Vol 6 ◽  
pp. 451-465 ◽  
Author(s):  
Daniela Gerz ◽  
Ivan Vulić ◽  
Edoardo Ponti ◽  
Jason Naradowsky ◽  
Roi Reichart ◽  
...  

Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary consisting of a limited word set. While subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate their ability to assist next-word prediction in language modeling. Such subword-informed models should be particularly effective for morphologically rich languages (MRLs), which exhibit high type-to-token ratios. In this work, we present a large-scale LM study on 50 typologically diverse languages covering a wide variety of morphological systems, and offer new subword-aware LM benchmarks to the community. The main technical contribution of our work is a novel method for injecting subword-level information into semantic word vectors, integrated into neural language model training, to facilitate word-level prediction. We conduct experiments in an LM setting where the number of infrequent words is large, and demonstrate strong perplexity gains across our 50 languages, especially for MRLs. Our code and data sets are publicly available.
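As a rough illustration of the kind of architecture described above (a minimal sketch, not the authors' released code; PyTorch is assumed and all module and parameter names are hypothetical), one common way to inject subword-level information into word vectors is to encode each word's character sequence and combine the result with the word embedding before the word-level LM predicts the next word:

# Illustrative sketch only: compose a character-level encoding of each word
# with its word embedding, then run a word-level LM over the result.
import torch
import torch.nn as nn

class SubwordAwareLM(nn.Module):
    def __init__(self, vocab_size, char_vocab_size,
                 word_dim=300, char_dim=50, hidden_dim=512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        # Character-level encoder: its final state summarizes the word's spelling.
        self.char_rnn = nn.LSTM(char_dim, word_dim, batch_first=True)
        # Word-level LM over the subword-informed word vectors.
        self.word_rnn = nn.LSTM(word_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
        b, t, c = char_ids.shape
        chars = self.char_emb(char_ids.view(b * t, c))    # (b*t, len, char_dim)
        _, (h, _) = self.char_rnn(chars)                  # h: (1, b*t, word_dim)
        subword_vec = h.squeeze(0).view(b, t, -1)
        # Inject subword-level information by adding it to the word embedding.
        word_vec = self.word_emb(word_ids) + subword_vec
        hidden, _ = self.word_rnn(word_vec)
        return self.out(hidden)                           # next-word logits

Adding the character-derived vector to the word embedding is only one of several plausible choices (concatenation or gating would also work); the point is that rare and unseen inflected forms receive informative representations through their spelling.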


2006 ◽  
Vol 14 (01) ◽  
Author(s):  
ROHINI K. SRIHARI ◽  
WEI LI ◽  
THOMAS CORNELL ◽  
CHENG NIU

Cognition ◽  
2013 ◽  
Vol 127 (3) ◽  
pp. 427-438 ◽  
Author(s):  
Naomi H. Feldman ◽  
Emily B. Myers ◽  
Katherine S. White ◽  
Thomas L. Griffiths ◽  
James L. Morgan

Author(s):  
D. G. Anastasyev

In this paper, we build a joint morpho-syntactic parser for Russian. We describe a method to train a joint model that is significantly faster than, and as accurate as, a traditional pipeline of models. We explore various ways to encode word-level information and how they affect the parser’s performance. To this end, we use character-level word embeddings learned from scratch and grammeme embeddings, which have shown state-of-the-art results for similar tasks for Russian in the past. We compare them with pretrained contextualized word embeddings, such as ELMo and BERT, which have led to breakthroughs on a wide range of tasks in English. As a result, we show that their use can significantly improve parsing quality.
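As a sketch of how such word-level inputs might be composed (an assumption for illustration, not the paper's implementation; PyTorch is assumed and class and parameter names are hypothetical), a character-level encoding learned from scratch can be concatenated with an embedding of the word's grammemes before being passed to the parser:

# Illustrative sketch only: build a word representation from characters and
# grammemes (multi-hot morphological feature values) for a downstream parser.
import torch
import torch.nn as nn

class WordEncoder(nn.Module):
    def __init__(self, n_chars, n_grammemes,
                 char_dim=32, gram_dim=64, out_dim=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # Convolution over characters plus max-pooling gives a fixed-size
        # character-level word vector learned from scratch.
        self.char_cnn = nn.Conv1d(char_dim, out_dim // 2, kernel_size=3, padding=1)
        # Grammemes enter as a multi-hot vector (e.g. Case=Nom, Number=Sing).
        self.gram_proj = nn.Linear(n_grammemes, gram_dim)
        self.mix = nn.Linear(out_dim // 2 + gram_dim, out_dim)

    def forward(self, char_ids, grammemes):
        # char_ids: (batch, max_word_len); grammemes: (batch, n_grammemes)
        chars = self.char_emb(char_ids).transpose(1, 2)   # (batch, char_dim, len)
        char_vec = torch.relu(self.char_cnn(chars)).max(dim=2).values
        gram_vec = torch.relu(self.gram_proj(grammemes.float()))
        return self.mix(torch.cat([char_vec, gram_vec], dim=-1))

Concatenation keeps the two signals separable; the resulting vector could equally be replaced or complemented by pretrained contextualized embeddings such as ELMo or BERT, which is the comparison the abstract describes.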


2003 ◽  
Vol 60 (1) ◽  
pp. 256-257
Author(s):  
M.A. Niznikiewicz ◽  
S.D. Hun ◽  
P.G. Nestor ◽  
C. Dodd ◽  
M.E. Shenton ◽  
...  

Author(s):  
Robert Wille ◽  
Görschwin Fey ◽ 
Daniel Große ◽ 
Stephan Eggersglüß ◽ 
Rolf Drechsler
