Word associations and the distance properties of context-aware word embeddings

Author(s):  
Maria A. Rodriguez ◽  
Paola Merlo

Author(s):  
Janus Wawrzinek ◽  
Said Ahmad Ratib Hussaini ◽  
Oliver Wiehr ◽  
José María González Pinto ◽  
Wolf-Tilo Balke

Author(s):  
Maria Antoniak ◽  
David Mimno

Word embeddings are increasingly being used as a tool to study word associations in specific corpora. However, it is unclear whether such embeddings reflect enduring properties of language or if they are sensitive to inconsequential variations in the source documents. We find that nearest-neighbor distances are highly sensitive to small changes in the training corpus for a variety of algorithms. For all methods, including specific documents in the training set can result in substantial variations. We show that these effects are more prominent for smaller training corpora. We recommend that users never rely on single embedding models for distance calculations, but rather average over multiple bootstrap samples, especially for small corpora.
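The recommendation to average over bootstrap samples can be illustrated with a minimal sketch. It is not the authors' code: `train_embedding` is a hypothetical toy stand-in (document-level co-occurrence counts) for whichever embedding algorithm is actually used, and the corpus, the function names, and the `n_samples` and `seed` parameters are assumptions made for illustration only. The idea shown is to resample the documents with replacement, retrain on each resample, and report the mean and spread of a word-pair similarity rather than a single value.

```python
import numpy as np


def train_embedding(docs, vocab):
    """Toy stand-in for an embedding algorithm (an assumption, not a method
    from the abstract): each word is represented by its document-level
    co-occurrence counts with every other vocabulary word."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        words = [w for w in set(doc.split()) if w in index]
        for a in words:
            for b in words:
                if a != b:
                    counts[index[a], index[b]] += 1.0
    return counts


def cosine(u, v):
    """Cosine similarity; NaN if either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return float("nan")
    return float(u @ v / (nu * nv))


def bootstrap_similarity(docs, vocab, w1, w2, n_samples=50, seed=0):
    """Mean and standard deviation of cosine(w1, w2) across models trained
    on bootstrap resamples (documents drawn with replacement)."""
    rng = np.random.default_rng(seed)
    index = {w: i for i, w in enumerate(vocab)}
    sims = []
    for _ in range(n_samples):
        ids = rng.integers(0, len(docs), size=len(docs))
        vectors = train_embedding([docs[i] for i in ids], vocab)
        sims.append(cosine(vectors[index[w1]], vectors[index[w2]]))
    sims = np.array(sims)
    # Drop resamples in which one of the two words never appeared.
    valid = sims[~np.isnan(sims)]
    if valid.size == 0:
        return float("nan"), float("nan"), 0
    return float(valid.mean()), float(valid.std()), int(valid.size)


# Hypothetical toy corpus, for illustration only.
docs = [
    "cat dog pet", "dog pet food", "cat pet food", "cat dog food",
    "stock market price", "market price trade",
    "stock trade price", "stock market trade",
]
vocab = sorted({w for d in docs for w in d.split()})

mean, std, n_valid = bootstrap_similarity(docs, vocab, "cat", "dog")
print(f"cat~dog: mean={mean:.3f}, std={std:.3f} over {n_valid} resamples")
```

A large standard deviation relative to the mean would suggest that the similarity depends heavily on which documents happen to be in the training set, which is the situation the abstract warns about for small corpora.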


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Vlad-Iulian Ilie ◽  
Ciprian-Octavian Truica ◽  
Elena-Simona Apostol ◽  
Adrian Paschke

2021 ◽  
Author(s):  
Wataru Nakata ◽  
Tomoki Koriyama ◽  
Shinnosuke Takamichi ◽  
Naoko Tanji ◽  
Yusuke Ijima ◽  
...  

1968 ◽  
Author(s):  
Lorand B. Szalay ◽  
Jack E. Brent ◽  
Dale A. Lysne
