Wide range screening of algorithmic bias in word embedding models using large sentiment lexicons reveals underreported bias types

PLoS ONE ◽  
2020 ◽  
Vol 15 (4) ◽  
pp. e0231189
Author(s):  
David Rozado


2018 ◽  
Author(s):  
Emeric Dynomant ◽  
Romain Lelong ◽  
Badisse Dahamna ◽  
Clément Massonnaud ◽  
Gaétan Kerdelhué ◽  
...  

BACKGROUND Word embedding technologies are now used in a wide range of applications. However, no formal evaluation and comparison have been made of the models produced by the three most famous implementations (Word2Vec, GloVe and FastText). OBJECTIVE The goal of this study is to compare embedding implementations on a corpus of documents produced in a working context by health professionals. METHODS Models have been trained on documents coming from the Rouen University Hospital. These data are not structured and cover a wide range of documents produced in a clinical setting (discharge summaries, prescriptions, ...). Four evaluation tasks have been defined (cosine similarity, odd one, mathematical operations and human formal evaluation) and applied to each model. RESULTS Word2Vec had the highest score for three of the four tasks (mathematical operations, odd one similarity and human validation), particularly regarding the Skip-Gram architecture. CONCLUSIONS Although this implementation had the best rate, each model has its own qualities and defects, such as the very short training time of GloVe or the morphosyntactic similarity conservation observed with FastText. Models and test sets produced by this study will be the first to be made publicly available through a graphical interface, to help advance French biomedical research.
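The evaluation tasks named in the Methods (cosine similarity, odd one out, and analogy-style operations) can all be reproduced with off-the-shelf tooling. Below is a minimal sketch, not the authors' code, using gensim; a toy corpus stands in for the private hospital documents, and all words and parameter values are illustrative.

```python
# Minimal sketch of training skip-gram models and running the rated tasks described
# above; the clinical corpus is private, so a toy corpus stands in for it.
from gensim.models import Word2Vec, FastText

# Toy stand-in for the tokenized hospital documents (one list of tokens per document).
corpus = [
    ["patient", "admitted", "for", "pneumonia", "treated", "with", "antibiotics"],
    ["discharge", "summary", "patient", "recovered", "after", "antibiotics"],
    ["prescription", "paracetamol", "for", "fever", "and", "pain"],
]

# Skip-gram Word2Vec (sg=1) and FastText trained on the same data for comparison.
w2v = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1, epochs=50)
ft = FastText(corpus, vector_size=100, window=5, min_count=1, sg=1, epochs=50)

# Task 1: cosine similarity between word pairs.
print(w2v.wv.similarity("pneumonia", "antibiotics"))

# Task 2: odd one out among a small set of words.
print(w2v.wv.doesnt_match(["pneumonia", "fever", "prescription"]))

# Task 3: analogy-style vector arithmetic (a - b + c ≈ ?).
print(w2v.wv.most_similar(positive=["antibiotics", "fever"],
                          negative=["pneumonia"], topn=3))
```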


2020 ◽  
Vol 10 (19) ◽  
pp. 6893
Author(s):  
Yerai Doval ◽  
Jesús Vilares ◽  
Carlos Gómez-Rodríguez

Research on word embeddings has mainly focused on improving their performance on standard corpora, disregarding the difficulties posed by noisy texts in the form of tweets and other types of non-standard writing from social media. In this work, we propose a simple extension to the skip-gram model in which we introduce the concept of bridge-words, which are artificial words added to the model to strengthen the similarity between standard words and their noisy variants. Our new embeddings outperform baseline models on noisy texts on a wide range of evaluation tasks, both intrinsic and extrinsic, while retaining good performance on standard texts. To the best of our knowledge, this is the first explicit approach to dealing with these types of noisy texts at the word embedding level that goes beyond support for out-of-vocabulary words.
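At its core, the bridge-word idea is a corpus-level intervention on skip-gram training: artificial tokens are made to co-occur with a standard word and its noisy variant so that the training objective pulls their vectors together. The sketch below illustrates only that general idea, not the paper's exact construction; the normalization lexicon, bridge-token naming scheme, and parameters are assumptions.

```python
# Hedged sketch of the general bridge-word idea: inject an artificial token that
# co-occurs with both a standard word and a known noisy variant, so that skip-gram
# training pulls their vectors together.
from gensim.models import Word2Vec

# Hypothetical normalization lexicon mapping noisy variants to standard forms.
variants = {"tmrw": "tomorrow", "gr8": "great"}

def with_bridge_words(sentences, variants):
    """Append short artificial sentences linking each variant to its standard form
    through a shared bridge token (naming scheme here is illustrative only)."""
    augmented = list(sentences)
    for noisy, standard in variants.items():
        bridge = f"__bridge_{standard}__"
        augmented.append([noisy, bridge, standard])
        augmented.append([standard, bridge, noisy])
    return augmented

corpus = [
    ["see", "you", "tmrw", "at", "the", "party"],
    ["see", "you", "tomorrow", "at", "work"],
    ["that", "was", "gr8", "fun"],
    ["that", "was", "great", "fun"],
]

model = Word2Vec(with_bridge_words(corpus, variants),
                 vector_size=50, window=2, min_count=1, sg=1, epochs=100)
print(model.wv.similarity("tmrw", "tomorrow"))  # should rise relative to no bridges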


2021 ◽  
Vol 15 (02) ◽  
pp. 263-290
Author(s):  
Renjith P. Ravindran ◽  
Kavi Narayana Murthy

Word embeddings have recently become a vital part of many Natural Language Processing (NLP) systems. Word embeddings are a suite of techniques that represent words in a language as vectors in an n-dimensional real space and have been shown to encode a significant amount of syntactic and semantic information. When used in NLP systems, these representations have resulted in improved performance across a wide range of NLP tasks. However, it is not clear how syntactic properties interact with the more widely studied semantic properties of words, or which factors in the modeling formulation encourage embedding spaces to capture more of the syntactic rather than the semantic behavior of words. We investigate several aspects of word embedding spaces and modeling assumptions that maximize syntactic coherence, the degree to which words with similar syntactic properties form distinct neighborhoods in the embedding space. We do so in order to understand which of the existing models maximizes syntactic coherence, making it a more reliable source for extracting syntactic category (POS) information. Our analysis shows that the syntactic coherence of S-CODE is superior to that of other more popular and more recent embedding techniques such as Word2vec, fastText, GloVe and LexVec, when measured under compatible parameter settings. Our investigation also gives deeper insights into the geometry of the embedding space with respect to syntactic coherence, and how this is influenced by context size, frequency of words, and dimensionality of the embedding space.
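Syntactic coherence, as described above, lends itself to a simple nearest-neighbour measurement. The sketch below shows one plausible way to compute such a score, not the paper's exact metric: the average fraction of a word's k nearest neighbours (by cosine similarity) that share its POS tag. Vectors and tags here are toy placeholders.

```python
# Sketch of a nearest-neighbour POS-purity score as a proxy for syntactic coherence.
import numpy as np

def syntactic_coherence(vectors, pos_tags, k=10):
    """vectors: (n_words, dim) array; pos_tags: list of POS labels, one per word."""
    X = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)           # exclude the word itself
    scores = []
    for i in range(len(pos_tags)):
        nn = np.argsort(-sims[i])[:k]         # indices of the k most similar words
        scores.append(np.mean([pos_tags[j] == pos_tags[i] for j in nn]))
    return float(np.mean(scores))

# Toy usage with random vectors and made-up tags.
rng = np.random.default_rng(0)
vecs = rng.normal(size=(6, 50))
tags = ["NOUN", "NOUN", "VERB", "VERB", "ADJ", "ADJ"]
print(syntactic_coherence(vecs, tags, k=2))
```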


10.2196/12310 ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. e12310 ◽  
Author(s):  
Emeric Dynomant ◽  
Romain Lelong ◽  
Badisse Dahamna ◽  
Clément Massonnaud ◽  
Gaétan Kerdelhué ◽  
...  

Background Word embedding technologies, a set of language modeling and feature learning techniques in natural language processing (NLP), are now used in a wide range of applications. However, no formal evaluation and comparison have been made on the ability of each of the 3 current most famous unsupervised implementations (Word2Vec, GloVe, and FastText) to keep track of the semantic similarities existing between words, when trained on the same dataset. Objective The aim of this study was to compare embedding methods trained on a corpus of French health-related documents produced in a professional context. The best method will then help us develop a new semantic annotator. Methods Unsupervised embedding models have been trained on 641,279 documents originating from the Rouen University Hospital. These data are not structured and cover a wide range of documents produced in a clinical setting (discharge summaries, procedure reports, and prescriptions). In total, 4 rated evaluation tasks were defined (cosine similarity, odd one, analogy-based operations, and human formal evaluation) and applied to each model, as well as embedding visualization. Results Word2Vec had the highest score on 3 out of 4 rated tasks (analogy-based operations, odd one similarity, and human validation), particularly regarding the skip-gram architecture. Conclusions Although this implementation had the best rate for semantic properties conservation, each model has its own qualities and defects, such as the training time, which is very short for GloVe, or morphological similarity conservation observed with FastText. Models and test sets produced by this study will be the first to be publicly available through a graphical interface to help advance French biomedical research.
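Of the rated tasks, the first three are covered by the earlier gensim sketch; the embedding visualization mentioned alongside them is sketched below as a t-SNE projection. This is not the authors' code: a toy corpus again stands in for the hospital documents, and file names and parameters are illustrative.

```python
# Hedged sketch of an embedding-visualization step: project trained word vectors
# to 2-D with t-SNE and plot them with their labels.
import matplotlib.pyplot as plt
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

corpus = [
    ["discharge", "summary", "patient", "treated", "with", "antibiotics"],
    ["procedure", "report", "patient", "scanner", "abdomen"],
    ["prescription", "paracetamol", "fever", "pain"],
] * 20                                              # repeat to give the toy model some data

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1, epochs=20)

words = model.wv.index_to_key                       # the whole (toy) vocabulary
coords = TSNE(n_components=2, perplexity=5, init="pca",
              random_state=0).fit_transform(model.wv[words])

plt.figure(figsize=(6, 6))
plt.scatter(coords[:, 0], coords[:, 1], s=10)
for (cx, cy), w in zip(coords, words):
    plt.annotate(w, (cx, cy), fontsize=8)
plt.title("t-SNE projection of word vectors")
plt.savefig("embedding_tsne.png", dpi=150)          # output file name is illustrative
```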


Author(s):  
R.W. Horne

The technique of surrounding virus particles with a neutralised electron dense stain was described at the Fourth International Congress on Electron Microscopy, Berlin 1958 (see Horne & Brenner, 1960, p. 625). For many years the negative staining technique, in one form or another, has been applied to a wide range of biological materials. However, the full potential of the method has only recently been explored following the development and application of optical diffraction and computer image analysis techniques to electron micrographs (cf. DeRosier & Klug, 1968; Markham, 1968; Crowther et al., 1970; Horne & Markham, 1973; Klug & Berger, 1974; Crowther & Klug, 1975). These image processing procedures have allowed a more precise and quantitative approach to be made concerning the interpretation, measurement and reconstruction of repeating features in certain biological systems.
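A minimal computational analogue of the optical diffraction analysis mentioned above is the Fourier power spectrum of a digitized micrograph, in which repeating features appear as discrete peaks whose positions give the repeat spacing. The sketch below uses a synthetic lattice image in place of a real micrograph; all values are illustrative.

```python
# Power spectrum of a synthetic "micrograph": the computational counterpart of an
# optical diffraction pattern used to quantify repeating features.
import numpy as np

# Toy micrograph: a noisy 2-D lattice standing in for a negatively stained array.
y, x = np.mgrid[0:256, 0:256]
micrograph = np.cos(2 * np.pi * x / 16) + np.cos(2 * np.pi * y / 16)
micrograph += 0.5 * np.random.default_rng(0).normal(size=micrograph.shape)

# "Diffraction pattern": squared modulus of the 2-D Fourier transform.
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(micrograph - micrograph.mean()))) ** 2

# Peaks away from the centre sit at spatial frequency 1/16 px^-1, i.e. the
# 16-pixel repeat of the lattice; real analyses read repeat spacings the same way.
peak = np.unravel_index(np.argmax(spectrum), spectrum.shape)
print("strongest peak at (row, col):", peak)
```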


Author(s):  
E.D. Wolf

Most microelectronics devices and circuits operate faster, consume less power, execute more functions and cost less per circuit function when the feature-sizes internal to the devices and circuits are made smaller. This is part of the stimulus for the Very High-Speed Integrated Circuits (VHSIC) program. There is also a need for smaller, more sensitive sensors in a wide range of disciplines that includes electrochemistry, neurophysiology and ultra-high pressure solid state research. There is often fundamental new science (and sometimes new technology) to be revealed (and used) when a basic parameter such as size is extended to new dimensions, as is evident at the two extremes of smallness and largeness, high energy particle physics and cosmology, respectively. However, there is also a very important intermediate domain of size that spans from the diameter of a small cluster of atoms up to near one micrometer which may also have just as profound effects on society as “big” physics.


Author(s):  
B. J. Hockey

Ceramics, such as Al2O3 and SiC, have numerous current and potential uses in applications where high temperature strength, hardness, and wear resistance are required, often in corrosive environments. These materials are, however, highly anisotropic and brittle, so that their mechanical behavior is often unpredictable. The further development of these materials will require a better understanding of the basic mechanisms controlling deformation, wear, and fracture. The purpose of this talk is to describe applications of TEM to the study of the deformation, wear, and fracture of Al2O3. Similar studies are currently being conducted on SiC, and the techniques involved should be applicable to a wide range of hard, brittle materials.


Author(s):  
H. Todokoro ◽  
S. Nomura ◽  
T. Komoda

It is interesting to observe polymers at atomic-scale resolution. Some works on thorium pyromellitate have been reported using a STEM (1) or a CTEM (2,3). The results showed that this polymer forms a chain in which thorium atoms are arranged. However, the distance between adjacent thorium atoms varies over a wide range (0.4-1.3 nm) according to the different authors. The present authors have also observed thorium pyromellitate specimens by means of a field emission STEM, described in reference 4. The specimen was prepared by placing a drop of thorium pyromellitate in 10⁻³ CH3OH solution onto an amorphous carbon film about 2 nm thick. The dark field image is shown in Fig. 1A. Thorium atoms are clearly observed as regular atom rows with a spacing of 0.85 nm. This lattice gradually deteriorated with successive observations. The image changed to granular structures, as shown in Fig. 1B, which was taken after four scanning frames.


Author(s):  
T. Miyokawa ◽  
S. Norioka ◽  
S. Goto

Field emission SEMs (FE-SEMs) are becoming popular because of the need for high resolution. In the field of semiconductor production, low accelerating voltage FE-SEMs are demanded in order to avoid electron irradiation damage and charging of the samples. However, the accelerating voltage of the usual SEM with an FE gun is limited to 1 kV, which is not small enough for the present demands, because the virtual source moves far from the tip at lower accelerating voltages. This virtual source position depends on the shape of the electrostatic lens. We therefore investigated several types of electrostatic lenses applicable to lower accelerating voltages. As a result, it was found that a field emission gun with a conical anode can be applied effectively over a wide range of low accelerating voltages. A field emission gun usually consists of a field emission tip (cold cathode) and a Butler-type electrostatic lens.


Author(s):  
David A. Ansley

The coherence of the electron flux of a transmission electron microscope (TEM) limits the direct application of deconvolution techniques which have been used successfully on unmanned spacecraft programs. The theory assumes noncoherent illumination. Deconvolution of a TEM micrograph will, therefore, in general produce spurious detail rather than improved resolution. A primary goal of our research is to study the performance of several types of linear spatial filters as a function of specimen contrast, phase, and coherence. We have, therefore, developed a one-dimensional analysis and plotting program to simulate a wide range of operating conditions of the TEM, including adjustment of: (1) specimen amplitude, phase, and separation; (2) illumination wavelength, half-angle, and tilt; (3) objective lens focal length and aperture width; (4) spherical aberration, defocus, and chromatic aberration focus shift; (5) detector gamma, additive, and multiplicative noise constants; and (6) the type of spatial filter: linear cosine, linear sine, or deterministic.
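As one illustration of what such a one-dimensional simulation involves, the sketch below (not the author's program) builds a phase-contrast transfer function from defocus and spherical aberration, "images" a simple weak-phase specimen with it, and then applies a regularized linear sine filter. All parameter values and the CTF sign convention are assumptions.

```python
# Hedged 1-D sketch: contrast transfer function (CTF) imaging plus a regularized
# sine-filter restoration; parameters and sign convention are illustrative only.
import numpy as np

n, dx = 2048, 0.05                     # number of samples and sampling step (nm)
wavelength = 2.51e-3                   # nm, roughly 200 kV electrons
defocus = 500.0                        # nm of underfocus
cs = 1.0e6                             # nm (1 mm spherical aberration)

k = np.fft.fftfreq(n, d=dx)            # spatial frequencies (1/nm)
chi = np.pi * wavelength * defocus * k**2 - 0.5 * np.pi * cs * wavelength**3 * k**4
ctf = np.sin(chi)                      # weak-phase (sine) contrast transfer

# A simple specimen: two narrow phase features 2 nm apart.
x = (np.arange(n) - n // 2) * dx
specimen = np.exp(-((x - 1.0) ** 2) / 0.02) + np.exp(-((x + 1.0) ** 2) / 0.02)

image = np.fft.ifft(np.fft.fft(specimen) * ctf).real   # simulated micrograph trace

# One plausible reading of a "linear sine" restoration filter: a Wiener-style
# regularized inverse of the sine CTF, which avoids dividing by values near zero.
eps = 0.1
restored = np.fft.ifft(np.fft.fft(image) * ctf / (ctf**2 + eps)).real
```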

