Evaluating the Stability of Embedding-based Word Similarities

Author(s):  
Maria Antoniak ◽  
David Mimno

Word embeddings are increasingly being used as a tool to study word associations in specific corpora. However, it is unclear whether such embeddings reflect enduring properties of language or if they are sensitive to inconsequential variations in the source documents. We find that nearest-neighbor distances are highly sensitive to small changes in the training corpus for a variety of algorithms. For all methods, including specific documents in the training set can result in substantial variations. We show that these effects are more prominent for smaller training corpora. We recommend that users never rely on single embedding models for distance calculations, but rather average over multiple bootstrap samples, especially for small corpora.
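The recommendation to average over bootstrap samples can be illustrated with a minimal sketch. It assumes documents are resampled with replacement, one model is trained per resample, and the cosine similarity of a word pair is averaged across models. The function names (`train_embedding`, `bootstrap_similarity`) are hypothetical, and the toy co-occurrence "embedding" is only a stand-in for whichever real algorithm is being evaluated; it is not the authors' implementation.

```python
import numpy as np


def train_embedding(docs, vocab):
    """Toy stand-in for a real embedding trainer (an assumption for this sketch).

    Returns a document-level co-occurrence count matrix with one row per
    vocabulary word, plus the word-to-row index.
    """
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        words = [w for w in set(doc.split()) if w in index]
        for a in words:
            for b in words:
                if a != b:
                    counts[index[a], index[b]] += 1
    return counts, index


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def bootstrap_similarity(docs, vocab, w1, w2, n_samples=20, seed=0):
    """Mean and standard deviation of cosine(w1, w2) over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    sims = []
    for _ in range(n_samples):
        # Resample documents with replacement, keeping the corpus size fixed.
        sample = rng.choice(len(docs), size=len(docs), replace=True)
        emb, index = train_embedding([docs[i] for i in sample], vocab)
        sims.append(cosine(emb[index[w1]], emb[index[w2]]))
    return float(np.mean(sims)), float(np.std(sims))
```

Reporting the standard deviation alongside the mean gives a rough indication of how much a single model's similarity estimate could vary with the composition of the corpus.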

Author(s):  
S. R. Herd ◽  
P. Chaudhari

Electron diffraction and direct transmission have been used extensively to study the local atomic arrangement in amorphous solids, and in particular Ge. Nearest-neighbor distances have been calculated from E.D. profiles, and the results have been interpreted in terms of the microcrystalline or the random network models. Direct transmission electron microscopy appears to be the most direct and accurate method to resolve this issue, since the spatial resolution of the better instruments is of the order of 3 Å. In particular, the tilted beam interference method is used regularly to show fringes corresponding to 1.5 to 3 Å lattice planes in crystals as resolution tests.


2011 ◽  
Vol 25 (12n13) ◽  
pp. 1041-1051 ◽  
Author(s):  
HO KHAC HIEU ◽  
VU VAN HUNG

Using the statistical moment method (SMM), the temperature and pressure dependences of thermodynamic quantities of zinc-blende-type semiconductors have been investigated. Analytical expressions for the nearest-neighbor distances, the change of volume, and the mean-square atomic displacements (MSDs) have been derived. Numerical calculations have been performed for a series of zinc-blende-type semiconductors: GaAs, GaP, GaSb, InAs, InP, and InSb. The agreement between our calculations and both earlier theoretical results and experimental data supports our new theory for investigating the temperature and pressure dependences of thermodynamic quantities of semiconductors.


Author(s):  
Søren Ager Meldgaard ◽  
Jonas Köhler ◽  
Henrik Lund Mortensen ◽  
Mads-Peter Verner Christiansen ◽  
Frank Noé ◽  
...  

Chemical space is routinely explored by machine learning methods to discover interesting molecules before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring the 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in Cartesian coordinates, allowing for quantum chemical prediction of the stability. To improve sample efficiency, we learn basic chemical rules from imitation learning on the GDB-11 database to create an initial model applicable to all stoichiometries. We then deploy multiple copies of the model, each conditioned on a specific stoichiometry, in a reinforcement learning setting. The models correctly identify low-energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.


2019 ◽  
Vol 5 (2) ◽  
pp. eaav0693 ◽  
Author(s):  
Christopher J. Bartel ◽  
Christopher Sutton ◽  
Bryan R. Goldsmith ◽  
Runhai Ouyang ◽  
Charles B. Musgrave ◽  
...  

Predicting the stability of the perovskite structure remains a long-standing challenge for the discovery of new functional materials for many applications including photovoltaics and electrocatalysts. We developed an accurate, physically interpretable, and one-dimensional tolerance factor, τ, that correctly predicts 92% of compounds as perovskite or nonperovskite for an experimental dataset of 576 ABX₃ materials (X = O²⁻, F⁻, Cl⁻, Br⁻, I⁻) using a novel data analytics approach based on SISSO (sure independence screening and sparsifying operator). τ is shown to generalize outside the training set for 1034 experimentally realized single and double perovskites (91% accuracy) and is applied to identify 23,314 new double perovskites (A₂BB′X₆) ranked by their probability of being stable as perovskite. This work guides experimentalists and theorists toward which perovskites are most likely to be successfully synthesized and demonstrates an approach to descriptor identification that can be extended to arbitrary applications beyond perovskite stability predictions.


2017 ◽  
Vol 43 (3) ◽  
pp. 593-617 ◽  
Author(s):  
Sascha Rothe ◽  
Hinrich Schütze

We present AutoExtend, a system that combines word embeddings with semantic resources by learning embeddings for non-word objects like synsets and entities and learning word embeddings that incorporate the semantic information from the resource. The method is based on encoding and decoding the word embeddings and is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The obtained embeddings live in the same vector space as the input word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet, GermaNet, and Freebase as semantic resources. AutoExtend achieves state-of-the-art performance on Word-in-Context Similarity and Word Sense Disambiguation tasks.


2001 ◽  
Vol 669 ◽  
Author(s):  
M. A. Sahiner ◽  
S. W. Novak ◽  
J. C. Woicik ◽  
J. Liu ◽  
V. Krishnamoorty

Doping silicon with arsenic by ion implantation above the solid solubility level leads to As clusters and/or precipitates in the form of monoclinic SiAs, causing electrical deactivation of the dopant. Information on the local structure around the As atom and on the As concentration depth profiles is important for the implantation and annealing process in order to reduce the precipitated As and maximize the electrically activated As. In this study, we determined the local As structure and the precipitated versus substituted As for As implants in CZ (001) Si wafers, with implant energies between 20 keV and 100 keV and implant doses ranging from 1 × 10¹⁵/cm² to 1 × 10¹⁸/cm². The samples were subjected to different thermal annealing conditions. We used secondary ion mass spectrometry (SIMS) and UT-MARLOWE simulations to determine the region where the As concentration is above the solid solubility level. By x-ray absorption fine structure spectroscopy (XAFS), we probed the structure of the local environment around As. XAFS, being capable of probing the short-range order in crystalline and amorphous materials, provides information on the number, distance, and chemical identity of the neighbors of the main absorbing atom. Using Fourier analysis, the coordination numbers (N) and the nearest-neighbor distances (R) to As atoms in the first shell were extracted from the XAFS data. When As precipitates as monoclinic SiAs, the nearest-neighbor distances and coordination numbers are ∼2.37 Å and ∼3, as opposed to ∼2.40 Å and ∼4 when As is substitutional. Based on this information, the critical implant dose at which the precipitation/clustering of As starts and the ratio of substitutional to clustered/precipitated As in the samples were determined.


2013 ◽  
Vol 85 (20) ◽  
pp. 9449-9458 ◽  
Author(s):  
Witold Nowik ◽  
Sylvie Héron ◽  
Myriam Bonose ◽  
Mateusz Nowik ◽  
Alain Tchapla
