ALaSca: an Automated approach for Large-Scale Lexical Substitution

Author(s):  
Caterina Lacerra ◽  
Tommaso Pasini ◽  
Rocco Tripodi ◽  
Roberto Navigli

The lexical substitution task aims at finding suitable replacements for words in context. It has proved to be useful in several areas, such as word sense induction and text simplification, as well as in more practical applications such as writing-assistant tools. However, the paucity of annotated data has forced researchers to apply mainly unsupervised approaches, limiting the applicability of large pre-trained models and thus hampering the potential benefits of supervised approaches to the task. In this paper, we mitigate this issue by proposing ALaSca, a novel approach to automatically creating large-scale datasets for English lexical substitution. ALaSca allows examples to be produced for potentially any word in a language vocabulary and to cover most of the meanings it lists. Thanks to this, we can unleash the full potential of neural architectures and finetune them on the lexical substitution task. Indeed, when using our data, a transformer-based model performs substantially better than when using manually annotated data only. We release ALaSca at https://sapienzanlp.github.io/alasca/.

Author(s):  
Reinald Kim Amplayo ◽  
Seung-won Hwang ◽  
Min Song

Word sense induction (WSI), the task of automatically discovering the multiple senses or meanings of a word, has three main challenges: domain adaptability, novel sense detection, and sense granularity flexibility. While current latent variable models are known to solve the first two challenges, they are not flexible with respect to word sense granularity, which differs widely among words, from aardvark with one sense to play with over 50 senses. Current models require either hyperparameter tuning or nonparametric induction of the number of senses, both of which we find to be ineffective. Thus, we aim to eliminate these requirements and solve the sense granularity problem by proposing AutoSense, a latent variable model based on two observations: (1) senses are represented as a distribution over topics, and (2) senses generate pairings between the target word and its neighboring word. These observations alleviate the problem by (a) discarding garbage senses and (b) additionally inducing fine-grained word senses. Results show large improvements over the state-of-the-art models on popular WSI datasets. We also show that AutoSense is able to learn the appropriate sense granularity of a word. Finally, we apply AutoSense to the unsupervised author name disambiguation task, where the sense granularity problem is more evident, and show that AutoSense clearly outperforms competing models. We share our data and code here: https://github.com/rktamplayo/AutoSense.


2013 ◽  
Vol 39 (3) ◽  
pp. 709-754 ◽  
Author(s):  
Antonio Di Marco ◽  
Roberto Navigli

Web search result clustering aims to facilitate information search on the Web. Rather than the results of a query being presented as a flat list, they are grouped on the basis of their similarity and subsequently shown to the user as a list of clusters. Each cluster is intended to represent a different meaning of the input query, thus taking into account the lexical ambiguity (i.e., polysemy) issue. Existing Web clustering methods typically rely on some shallow notion of textual similarity between search result snippets, however. As a result, text snippets with no word in common tend to be clustered separately even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query. In this article we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction. Key to our approach is to first acquire the various senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on data sets of ambiguous queries, show that our approach outperforms both Web clustering and search engines.
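The two-step idea described above (first obtain the senses of the query, then group results by their similarity to each sense) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the senses are hand-written word sets rather than induced from raw text, the similarity is a plain word-overlap count standing in for the semantic similarity the article refers to, and the names `tokenize` and `cluster_by_sense` are hypothetical.

```python
def tokenize(text):
    """Lowercase a snippet and split it into a set of words."""
    return set(text.lower().split())


def cluster_by_sense(snippets, senses):
    """Assign each snippet to the sense whose word set it overlaps most.

    `senses` maps a sense label to a set of words describing that sense.
    In the article these senses are induced automatically; here they are
    supplied by hand (assumption). Snippets with no overlap at all go to
    an "unassigned" cluster.
    """
    clusters = {name: [] for name in senses}
    clusters["unassigned"] = []
    for snippet in snippets:
        words = tokenize(snippet)
        best_name, best_overlap = "unassigned", 0
        for name, sense_words in senses.items():
            overlap = len(words & sense_words)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        clusters[best_name].append(snippet)
    return clusters


# Hypothetical senses for the ambiguous query "jaguar".
senses = {
    "animal": {"cat", "feline", "wild", "jungle"},
    "car": {"car", "vehicle", "luxury", "dealer"},
}
snippets = [
    "the jaguar is a wild cat",
    "jaguar luxury car dealer",
    "jaguar operating system release",
]
clusters = cluster_by_sense(snippets, senses)
```

The query word itself is deliberately left out of every sense set, since it appears in all snippets and would not help to tell the senses apart.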


Author(s):  
Michael T. Postek

The term ultimate resolution, or resolving power, refers to the very best performance that can be obtained from a scanning electron microscope (SEM) given the optimum instrumental conditions and sample. However, as it relates to SEM users, the conventional definitions of this figure are ambiguous. The numbers quoted for the resolution of an instrument are not only theoretically derived, but are also verified through the direct measurement of images on micrographs. However, the samples commonly used for this purpose are specifically optimized for the measurement of instrument resolution and are most often not typical of the samples used in practical applications.

SEM RESOLUTION. Some instruments resolve better than others, either due to engineering design or for other reasons. There is no definitively accurate definition of how to quantify instrument resolution and its measurement in the SEM.


2019 ◽  
Author(s):  
Antoine Maruani ◽  
Peter A. Szijj ◽  
Calise Bahou ◽  
João C. F. Nogueira ◽  
Stephen Caddick ◽  
...  

Diseases are multifactorial, with redundancies and synergies between various pathways. However, most of the antibody-based therapeutics in clinical trials and on the market interact with only one target thus limiting their efficacy. The targeting of multiple epitopes could improve the therapeutic index of treatment and counteract mechanisms of resistance. To this effect, a new class of therapeutics emerged: bispecific antibodies.

Bispecific formation using chemical methods is rare and low yielding and/or requires a large excess of one of the two proteins to avoid homodimerisation. In order for chemically prepared bispecifics to deliver their full potential, high-yielding, modular and reliable cross-linking technologies are required. Herein, we describe a novel approach not only for the rapid and high-yielding chemical generation of bispecific antibodies from native antibody fragments, but also for the site-specific dual functionalisation of the resulting bioconjugates. Based on orthogonal clickable functional groups, this strategy enables the assembly of functionalised bispecifics with controlled loading in a modular and convergent manner.


2019 ◽  
Author(s):  
Chem Int

This work presents a facile and green route for the synthesis of silver sulfide nanoparticles (Ag2S NPs) from silver nitrate (AgNO3) and sodium sulfide nonahydrate (Na2S·9H2O) in the presence of an aqueous extract of rosemary leaves at ambient temperature (27 °C). The structural and morphological properties of the Ag2S NPs were analyzed by X-ray diffraction (XRD) and transmission electron microscopy (TEM). The surface plasmon resonance of the Ag2S NPs was observed at around 355 nm. The Ag2S NPs were spherical, with an effective diameter of 14 nm. Our novel approach represents a promising and effective method for the large-scale synthesis of eco-friendly silver sulfide nanoparticles with antibacterial activity.


2018 ◽  
Vol 16 (1) ◽  
pp. 67-76
Author(s):  
Disyacitta Neolia Firdana ◽  
Trimurtini Trimurtini

This research aimed to determine the suitability and effectiveness of big book media for teaching equivalent fractions to fourth-grade students. The research method was Research and Development (R&D). The study was conducted in the fourth grade of SDN Karanganyar 02 Kota Semarang. The data sources were media validation, material validation, learning outcomes, and teacher and student responses to the developed media. The study used a pre-experimental, one-group pretest-posttest design. The big book consists of equivalent-fractions material, student learning activity sheets with rectangle- and circle-shaped pictures, and questions about equivalent fractions. It was developed based on student and teacher needs. The big book received a media validity score of 3.75 (criterion: very good) and a score of 3 from material experts (criterion: good). In the large-scale trial, the students' posttest results showed a learning-outcome completeness of 82.14%. The N-gain calculation gave 0.55, which corresponds to the "medium" criterion. The t-test result was 9.6320 > 2.0484, which means that the average posttest outcome was better than the average pretest outcome. Based on these data, the study has produced big book media that are suitable and effective for learning equivalent fractions in the fourth grade of elementary school.
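The abstract does not state how the N-gain was computed. A minimal sketch, assuming the widely used normalized-gain definition, g = (posttest − pretest) / (maximum score − pretest), with the usual cut-offs (low below 0.3, medium from 0.3 to below 0.7, high from 0.7), is shown here. The function names and the example scores are hypothetical and are not taken from the study.

```python
def normalized_gain(pre, post, max_score=100.0):
    """Normalized gain: the fraction of the possible improvement achieved.

    Assumes the common definition (post - pre) / (max_score - pre);
    the study's exact formula is not given in the abstract.
    """
    if pre >= max_score:
        raise ValueError("pretest score must be below the maximum score")
    return (post - pre) / (max_score - pre)


def gain_category(g):
    """Map a normalized gain to the usual low / medium / high criterion."""
    if g >= 0.7:
        return "high"
    if g >= 0.3:
        return "medium"
    return "low"


# Hypothetical class averages on a 0-100 scale (not the study's data).
g = normalized_gain(pre=40.0, post=73.0)
category = gain_category(g)
```

Under these assumed cut-offs, the reported value of 0.55 falls in the middle band, which is consistent with the "medium" criterion named in the abstract.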


Author(s):  
Ron Avi Astor ◽  
Rami Benbenishty

Since 2005, the bullying, school violence, and school safety literatures have expanded dramatically in content, disciplines, and empirical studies. However, with this massive expansion of research, there is also a surprising lack of theoretical and empirical direction to guide efforts to advance the basic science and practical applications of this growing area of scientific interest. Parallel to this surge in interest, cultural norms, media coverage, and policies to address school safety and bullying have evolved at a remarkably quick pace over the past 13 years. For example, behaviors and populations that just a decade ago were not included in the school violence, bullying, and school safety discourse are now accepted areas of inquiry. These include, for instance, cyberbullying, sexting, social media shaming, teacher–student and student–teacher bullying, sexual harassment and assault, homicide, and suicide. Populations in schools not previously explored, such as lesbian, gay, bisexual, transgender, and queer students and educators and military- and veteran-connected students, have become the foci of new research, policies, and programs. As a result, all US states and most industrialized countries now have a complex quilt of new school safety and bullying legislation and policies. Large-scale research and intervention funding programs are often linked to these policies. This book suggests an empirically driven unifying model that brings together these previously distinct literatures. It presents an ecological model of school violence, bullying, and safety in evolving contexts that integrates all we have learned in the past 13 years, and suggests ways to move forward.


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Ariel Rokem ◽  
Kendrick Kay

Background: Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates. However, efficient and appropriate selection of α can be challenging. This becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and correlations across predictors, it is also not straightforwardly interpretable.

Results: The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. In brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and compare across models and datasets.

Conclusion: Fractional ridge regression has several benefits: the solutions obtained for different γ are guaranteed to vary, guarding against wasted calculations, and they automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. These properties make fractional ridge regression particularly suitable for analysis of large complex datasets.
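The reparameterization in terms of γ can be made concrete with a short sketch. It is not the fracridge package's API or the authors' algorithm; it only illustrates the relationship between α and γ under stated assumptions. Assume the design matrix has full column rank with singular values s_i, and that the ordinary least-squares coefficients have already been rotated into the basis of the right singular vectors. In that basis, each ridge coefficient is the OLS coefficient multiplied by s_i² / (s_i² + α), so γ(α) decreases monotonically from 1 towards 0. The sketch finds the α that matches a requested γ by bisection, a simple stand-in chosen here for clarity. The function names are hypothetical.

```python
import math


def norm_ratio(alpha, singular_values, ols_rotated):
    """gamma(alpha): L2-norm of the ridge solution over that of the OLS solution.

    Both solutions are expressed in the rotated (right-singular-vector)
    basis, where ridge shrinks coefficient i by s_i^2 / (s_i^2 + alpha).
    """
    shrunk = [
        (s * s / (s * s + alpha)) * b
        for s, b in zip(singular_values, ols_rotated)
    ]
    ridge_norm = math.sqrt(sum(c * c for c in shrunk))
    ols_norm = math.sqrt(sum(b * b for b in ols_rotated))
    return ridge_norm / ols_norm


def alpha_for_fraction(gamma, singular_values, ols_rotated, tol=1e-10):
    """Find alpha >= 0 whose ridge solution has norm ratio close to gamma.

    Uses bisection (an illustrative choice, not the published algorithm).
    Requires strictly positive singular values and a nonzero OLS solution.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    if gamma == 1.0:
        return 0.0  # no regularization reproduces the OLS solution
    lo, hi = 0.0, 1.0
    # Grow the upper bound until the ratio drops to gamma or below.
    while norm_ratio(hi, singular_values, ols_rotated) > gamma:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_ratio(mid, singular_values, ols_rotated) > gamma:
            lo = mid  # not enough shrinkage yet, so alpha must be larger
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, hi):
            break
    return 0.5 * (lo + hi)


# Hypothetical two-predictor problem (values chosen for illustration only).
s = [3.0, 1.0]
b = [1.0, 2.0]
alphas = [alpha_for_fraction(g, s, b) for g in (0.25, 0.5, 0.75)]
```

Requesting γ on an evenly spaced grid, as in the last line, yields solutions that differ from one another by construction, which is the property the abstract highlights.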


Author(s):  
Silvia Huber ◽  
Lars B. Hansen ◽  
Lisbeth T. Nielsen ◽  
Mikkel L. Rasmussen ◽  
Jonas Sølvsteen ◽  
...  
