semantic representation
Recently Published Documents





Fan Xu ◽  
Yangjie Dan ◽  
Keyu Yan ◽  
Yong Ma ◽  
Mingwen Wang

Chinese dialect discrimination is a challenging natural language processing task because annotated resources are scarce. In this article, we develop a novel Chinese dialects discrimination framework with transfer learning and data augmentation (CDDTLDA) to overcome this shortage. Specifically, we first use a relatively large Chinese dialect corpus to train a source-side automatic speech recognition (ASR) model. Then, we adopt a simple but effective data augmentation method (i.e., speed, pitch, and noise disturbance) to augment the target-side low-resource Chinese dialect data, and fine-tune a target ASR model initialized from the source-side ASR model. Meanwhile, the potential common semantic features between the source-side and target-side ASR models are captured by a self-attention mechanism. Finally, we extract the hidden semantic representation in the target ASR model to conduct Chinese dialect discrimination. Our extensive experimental results demonstrate that our model significantly outperforms state-of-the-art methods on two benchmark Chinese dialect corpora.
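The speed and noise disturbances mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the speed factors (0.9, 1.0, 1.1), and the 20 dB signal-to-noise ratio are assumptions chosen for illustration. Pitch disturbance is not implemented separately, because plain resampling shifts pitch together with speed.

```python
import math
import random


def change_speed(samples, factor):
    """Resample a waveform by linear interpolation.

    factor > 1 shortens the clip (faster playback); factor < 1 lengthens it.
    Plain resampling like this shifts pitch along with speed.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    n_out = max(1, int(round(len(samples) / factor)))
    out = []
    for i in range(n_out):
        pos = min(i * factor, last)   # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    if not samples:
        return []
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


def augment(samples, speed_factors=(0.9, 1.0, 1.1), snr_db=20.0, seed=0):
    """Return one noisy, speed-perturbed copy per speed factor (assumed values)."""
    rng = random.Random(seed)
    return [add_noise(change_speed(samples, f), snr_db, rng) for f in speed_factors]
```

Calling `augment(waveform)` on a list of float samples yields three variants of the clip, which could then be added to a low-resource training set.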

2022 ◽  
Vol 31 (2) ◽  
pp. 1-34
Patrick Keller ◽  
Abdoul Kader Kaboré ◽  
Laura Plein ◽  
Jacques Klein ◽  
Yves Le Traon ◽  

Recent successes in training word embeddings for Natural Language Processing (NLP) tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is to produce code embeddings that capture as much of the program semantics as possible. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax trees, or intermediate representation tokens) to generate embeddings, which are criticized in the literature as non-robust or non-generalizable. In this work, we investigate a novel embedding approach based on the intuition that source code has visual patterns of semantics. We further use these patterns to address the outstanding challenge of identifying semantic code clones. We propose the WySiWiM ("What You See Is What It Means") approach, where visual representations of source code are fed into powerful pre-trained image classification neural networks from the field of computer vision to benefit from the practical advantages of transfer learning. We evaluate the proposed embedding approach on the task of vulnerable code prediction in source code and on two variations of the task of semantic code clone identification: code clone detection (a binary classification problem) and code classification (a multi-class classification problem). We show with experiments on BigCloneBench (Java) and Open Judge (C) that, although simple, our WySiWiM approach performs as effectively as state-of-the-art approaches such as ASTNN or TBCNN. We also show with data from NVD and SARD that the WySiWiM representation can be used to learn a vulnerable code detector with reasonable performance (accuracy ∼90%). We further explore the influence of different steps in our approach, such as the choice of visual representations or the classification algorithm, and discuss the promises and limitations of this research direction.

Information ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 40
Nemury Silega ◽  
Eliani Varén ◽  
Alfredo Varén ◽  
Yury I. Rogozov ◽  
Vyacheslav S. Lapshin ◽  

The COVID-19 pandemic has caused the deaths of millions of people around the world. The scientific community faces a tough struggle to reduce the effects of this pandemic. Several investigations dealing with different perspectives have been carried out. However, it is not easy to find studies focused on COVID-19 contagion chains. A deep analysis of contagion chains may contribute new findings that can be used to reduce the effects of COVID-19. For example, some interesting chains with specific behaviors could be identified and more in-depth analyses could be performed to investigate the reasons for such behaviors. To represent, validate and analyze the information of contagion chains, we adopted an ontological approach. Ontologies are artificial intelligence techniques that have become widely accepted solutions for the representation of knowledge and corresponding analyses. The semantic representation of information by means of ontologies enables the consistency of the information to be checked, as well as automatic reasoning to infer new knowledge. The ontology was implemented in the Web Ontology Language (OWL), which is a formal language based on description logics. This approach could have a special impact on smart cities, which are characterized by their use of information to enhance the quality of basic services for citizens. In particular, health services could take advantage of this approach to reduce the effects of COVID-19.
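The two capabilities this abstract attributes to ontologies, consistency checking and inference of new knowledge, can be illustrated with a toy sketch in plain Python. It does not use OWL or a description-logic reasoner, and the relation names `infected` and `inChainTo` are hypothetical, not taken from the paper's ontology.

```python
def infer_transitive(triples, direct, derived):
    """Infer (a, derived, c) whenever a chain of `direct` edges links a to c."""
    children = {}
    for subject, predicate, obj in triples:
        if predicate == direct:
            children.setdefault(subject, set()).add(obj)
    inferred = set()
    for start in children:
        stack = list(children[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            inferred.add((start, derived, node))
            stack.extend(children.get(node, ()))
    return inferred


def inconsistent_individuals(inferred, derived):
    """Return individuals that end up downstream of themselves (a cycle)."""
    return {s for s, p, o in inferred if p == derived and s == o}


# Hypothetical contagion facts: p1 infected p2, p2 infected p3.
facts = {("p1", "infected", "p2"), ("p2", "infected", "p3")}
chains = infer_transitive(facts, "infected", "inChainTo")
```

Here `chains` contains the fact that `p3` lies in the chain started by `p1`, which was never stated directly, and `inconsistent_individuals(chains, "inChainTo")` is empty because the recorded chain has no cycle.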

Languages ◽  
2022 ◽  
Vol 7 (1) ◽  
pp. 13
Fabrizio Macagno

The fallacy of ignoring qualifications, or secundum quid et simpliciter, is a deceptive strategy that is pervasive in argumentative dialogues, discourses, and discussions. It consists in misrepresenting an utterance so that its meaning is broadened, narrowed, or simply modified to pursue different goals, such as drawing a specific conclusion, attacking the interlocutor, or generating humorous reactions. The “secundum quid” was described by Aristotle as an interpretative manipulative strategy, based on the contrast between the “proper” sense of a statement and its meaning taken absolutely or in a certain respect. However, how can an “unqualified” statement have a proper meaning different from the qualified one, and vice versa? This “linguistic” fallacy brings to light a complex relationship between pragmatics, argumentation, and interpretation. The secundum quid is described in this paper as a manipulative argument, whose deceptive effect lies in its pragmatic dimension. This fallacy is analyzed as a strategy of decontextualization lying at the interface between pragmatics and argumentation and consisting of the unwarranted passage from an utterance to its semantic representation. By ignoring the available evidence and the presumptive interpretation of a statement, the speaker places it in a different context or suppresses textual and contextual evidence to infer a specific meaning different from the presumable one.

2022 ◽  
Vol 7 (5) ◽  
pp. 16-23
V. M. Glushak

The present paper deals with the semantic representation of attributive elements in the nominative group: relative adjectives and composites. The aim of the present study is to analyze the semantic representation of a predominant noun in an attributive group with reference to Qualia structures, as applied in J. Pustejovsky’s theory of generative lexicon, and their realization through relative adjectives and composites, which are direct explicators of the semantic structure of predominant words (nouns). The study is based on the 150 most frequent relative adjectives identified on the basis of the electronic corpus of the German language. Relative adjectives qualify as Qualia structures and can denote objective constitutive (material and origin), formal (physical parameters, colour, time, place), telic (purpose and function of an object) and agentive (information about the creator, artifact, natural genus and causal chain) properties. The system of relative adjectives demonstrates a wide range of semantic meanings that are expressed in German by other linguistic means, e.g. composites. When describing the possibility of implementing Qualia structures in the noun group, other correlative linguistic units, such as composites, are analyzed. The variation in the use of nouns in composites and relative adjectives formed from them helps to actualize the presence of a feature in a particular situation, to give the nominal groups a terminological character and to make the transition from one qualification group to another.

2021 ◽  
Taiwo Kolajo ◽  
Olawande Daramola ◽  
Ayodele A. Adebiyi

Interactions via social media platforms have made it possible for anyone, irrespective of physical location, to gain access to quick information on events taking place all over the globe. However, the semantic processing of social media data is complicated by challenges such as language complexity, unstructured data, and ambiguity. In this paper, we propose the Social Media Analysis Framework for Event Detection (SMAFED). SMAFED aims to facilitate improved semantic analysis of noisy terms in social media streams, improved representation/embedding of social media stream content, and improved summarisation of event clusters in social media streams. For this, we employed key concepts such as an integrated knowledge base, ambiguity resolution, semantic representation of social media streams, and Semantic Histogram-based Incremental Clustering based on semantic relatedness. Two evaluation experiments were conducted to validate the approach. First, we evaluated the impact of the data enrichment layer of SMAFED. We found that SMAFED outperformed other pre-processing frameworks with a lower loss function of 0.15 on the first dataset and 0.05 on the second dataset. Second, we determined the accuracy of SMAFED at detecting events from social media streams. The result of this second experiment showed that SMAFED outperformed existing event detection approaches with better Precision (0.922), Recall (0.793), and F-Measure (0.853) metric scores. The findings of the study present SMAFED as a more efficient approach to event detection in social media.
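The three reported scores can be cross-checked against each other. The sketch below assumes the abstract's F-Measure is the balanced F1 score, i.e. the harmonic mean of precision and recall; the abstract does not state which variant was used.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract above.
score = f_measure(0.922, 0.793)
print(round(score, 3))
```

Under that assumption, the computed value rounds to the reported 0.853, so the three figures are mutually consistent.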

2021 ◽  
Vol 87 ◽  
pp. 523-573 ◽  
István Mikó ◽  
Lubomir Masner ◽  
Jonah M. Ulmer ◽  
Monique Raymond ◽  
Julia Hobbie ◽  

Teleasinae are commonly collected scelionids that are the only known egg parasitoids of carabid beetles and therefore play a crucial role in shaping carabid populations in natural and agricultural ecosystems. We review the available host information of Teleasinae, report a new host record, and revise Gryonoides Dodd, 1920, a morphologically distinct teleasine genus. We review the generic concept of Gryonoides and provide diagnoses and descriptions of thirteen Gryonoides species and two varieties: G. glabriceps Dodd, 1920, G. pulchellus Dodd, 1920 (= G. doddi Ogloblin, 1967, syn. nov. and G. pulchricornis Ogloblin, 1967, syn. nov.), G. brasiliensis Masner & Mikó, sp. nov., G. flaviclavus Masner & Mikó, sp. nov., G. fuscoclavatus Masner & Mikó, sp. nov., G. garciai Masner & Mikó, sp. nov., G. mexicali Masner & Mikó, sp. nov., G. mirabilicornis Masner & Mikó, sp. nov., G. obtusus Masner & Mikó, sp. nov., G. paraguayensis Masner & Mikó, sp. nov., G. rugosus Masner & Mikó, sp. nov., G. uruguayensis Masner & Mikó, sp. nov. We treat Gryonoides scutellaris Dodd, 1920, as status uncertain. Gryonoides mirabilicornis Masner & Mikó, sp. nov. is the only known teleasine with tyloids on two consecutive flagellomeres, a well-known trait of Sparasionidae. An illustrated identification key to species of Gryonoides, a queryable semantic representation of species descriptions using PhenoScript, and a simple approach for making Darwin Core Archive files in taxonomic revisions accessible are provided.

2021 ◽  
JeYoung Jung ◽  
Stephen Williams ◽  
Faezeh Sanae Nezhad ◽  
Matthew Lambon Ralph

The effect of repetitive transcranial magnetic stimulation can vary considerably across individuals, but the reasons for this remain unclear. Here, we investigated whether the response to continuous theta-burst stimulation (cTBS), an effective protocol for decreasing cortical excitability, is related to individual differences in glutamate and GABA neurotransmission. We applied cTBS over the anterior temporal lobe (ATL), a hub for semantic representation, to explore the relationship between the baseline neurochemical profiles in this region and the response to this stimulation. Our experiments revealed that non-responders (subjects who did not show an inhibitory effect of cTBS on subsequent semantic performance) had higher excitatory-inhibitory balance (glutamate + glutamine/GABA ratio) in the ATL, which led to up-regulated task-induced regional activity as well as increased ATL-connectivity with other semantic regions compared to responders. These results indicate that the baseline neurochemical state of a cortical region can be a significant factor in predicting responses to cTBS.

Tatiana Melnichuk ◽  
Natalia Saburova ◽  

Media discourse is an effective tool for projecting and shaping the public perception of a certain idea or image. The article focuses on the linguistic and semantic representation of the concept “Black” in American media discourse, with particular attention to how the representation of the concept evolved from the 1990s to the 2010s. The study employed corpus methodology (keyness, frequency, concordances) to analyze news articles from “The New York Times” and “The Los Angeles Times”, which were arranged into three corpora according to the publication date (1990s, 2000s, 2010s). The corpus analysis established a number of changes in the representation of the concept “Black”, manifested primarily through high-relevance keywords and high-frequency collocations. Dominant semantic components were identified in the concept representation in each corpus, as well as notable shifts in core and peripheral aspects within these semantic components. The analysis showed that although the semantic components ‘racial / ethnic inequality’ and ‘economic issues’ remain at the core of the concept in each corpus, they are expressed through connections with other semantic components that vary across the three decades, such as ‘culture’ in the 1990s, ‘education’ and ‘politics’ in the 2000s and ‘police brutality and profiling’ and ‘appearance’ in the 2010s.
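Keyness compares how often a word occurs in one corpus relative to another. The abstract does not say which keyness statistic was used; the sketch below assumes the log-likelihood measure, which is one common choice in corpus linguistics. The word counts and corpus sizes in the usage line are invented for illustration.

```python
import math


def log_likelihood_keyness(freq_a, freq_b, size_a, size_b):
    """Log-likelihood keyness of a word across two corpora.

    freq_a, freq_b: occurrences of the word in corpus A and corpus B.
    size_a, size_b: total token counts of corpus A and corpus B.
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2.0 * ll


# Invented counts: 20 hits in a 1,000-token corpus versus 5 in another.
print(log_likelihood_keyness(20, 5, 1000, 1000))
```

A larger value means the word's frequency differs more between the two corpora than chance would suggest. A word with identical relative frequency in both corpora scores 0.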

2021 ◽  
Daniela Moctezuma ◽  
Víctor Muníz ◽  
Jorge García

Social media data is currently the main input to a wide variety of research in many fields of knowledge. This kind of data is generally multimodal, i.e., it contains different modalities of information, mainly text, images, video, or audio. Dealing with multimodal data to tackle a specific task can be very difficult. One of the main challenges is to find useful representations of the data, capable of capturing the subtle information provided by the users who generate it, or even the way they use it. In this paper, we analysed the usage of two modalities of data, images and text, both separately and in combination, to address two classification problems: meme classification and user profiling. For images, we use a textual semantic representation obtained from a pre-trained image captioning model. A text classifier based on optimal lexical representations is then used to build a classification model. We report findings on the usage of these two modalities of data and discuss the pros and cons of using them to solve the two classification problems.
