semantic distance Latest Research Papers

Semantic Based Weighted Web Session Clustering Using Adapted K-Means and Hierarchical Agglomerative Algorithms

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2125 ◽

2022 ◽

Author(s):

Sowmya HK ◽

R. J. Anandhi

Keyword(s):

Clustering Algorithms ◽

Threshold Value ◽

Semantic Distance ◽

Web Usage Mining ◽

Identification Algorithm ◽

Agglomerative Clustering ◽

Dissimilarity Matrix ◽

Identification Methods ◽

Web Usage ◽

Stay Time

The World Wide Web contains a vast number of pages and URLs that supply users with a great amount of content. As the volume of information grows, analysing users' browsing behaviour becomes increasingly important. Web usage mining techniques are applied to web server logs to analyse user behaviour. Identifying user sessions is one of the key and most demanding tasks in the pre-processing stage of web usage mining. This paper focuses on two important shortcomings of existing session identification methods such as time-based and referrer-based sessionization. The first concerns the comparison of the current request's referrer field with the URL of the previous request. The second concerns session creation: requests are split into new sessions or merged into a single session depending on the threshold values for page stay time and session time. The authors therefore developed an enhanced semantic-distance-based session identification algorithm that addresses these issues of traditional session identification methods. The enhanced semantic method has an accuracy of 84 percent, which is higher than that of the time-based and time-referrer-based session identification approaches. The authors also used adapted K-Means and Hierarchical Agglomerative clustering algorithms to improve the prediction of user browsing patterns. Clusters were found using a weighted dissimilarity matrix, which is calculated from two key parameters: page weight and session weight. The Dunn Index and Davies-Bouldin Index are then used to evaluate the clusters. Experimental results show that purer and more accurate session clusters are formed when the adapted clustering algorithms are applied to the weighted sessions rather than to sessions obtained from traditional sessionization algorithms. The accuracy of the semantic session clusters is higher than that of clusters built from traditionally identified sessions.
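The threshold-based behaviour that the abstract criticises can be illustrated with a minimal Python sketch of traditional time-based sessionization. This is not the authors' semantic algorithm; the function name `sessionize`, the `(timestamp, URL)` request layout, and the default limits of 600 and 1800 seconds are assumptions chosen for illustration only.

```python
from typing import List, Tuple

Request = Tuple[float, str]  # (timestamp in seconds, requested URL)


def sessionize(requests: List[Request],
               page_stay_limit: float = 600.0,
               session_limit: float = 1800.0) -> List[List[Request]]:
    """Split one user's requests into sessions using two time thresholds.

    A new session starts when the gap since the previous request exceeds
    page_stay_limit, or when the session would exceed session_limit.
    Both limits are illustrative assumptions, not values from the paper.
    """
    sessions: List[List[Request]] = []
    for ts, url in sorted(requests):  # process requests in time order
        if sessions:
            current = sessions[-1]
            gap = ts - current[-1][0]      # time since the previous request
            duration = ts - current[0][0]  # time since the session started
            if gap <= page_stay_limit and duration <= session_limit:
                current.append((ts, url))
                continue
        sessions.append([(ts, url)])  # open a new session
    return sessions
```

Because both cut-offs are fixed numbers, a single long page visit can split one logical session in two, which is the kind of threshold sensitivity the abstract describes.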

SEMANTIC AND STYLISTIC BASIS OF MORPHOLOGICAL DOUBLETS

Известия СОИГСИ ◽

10.46698/vnc.2021.81.42.011 ◽

2021 ◽

pp. 107-118

Author(s):

Э.Т. ГУТИЕВА

Keyword(s):

Semantic Distance ◽

General Development ◽

Plural Form ◽

Parallel Forms ◽

The Matrix ◽

Diachronic Analysis ◽

The Individual ◽

Morphological System ◽

Semantic Processes

Parallel plural forms in the Ossetian language are found both in borrowed words, where they may signal the degree of assimilation of a loan, and in the declension of native words. Such forms are noted mainly in the system of kinship terms; their analysis makes it possible to clarify the etymological development of individual words as well as the general morphological and semantic processes in the system of kinship terms. Extrapolation of data from other languages, together with diachronic analysis of the Ossetian material itself, makes it possible to fill in the missing elements of the matrix. A comparison of the parallel plural forms of kinship terms leads to the conclusion that there is a linguistic need to mark different meanings morphologically, to mark the collective plural (which a number of languages distinguish from the simple plural), and to indicate the stylistic affiliation of a given form. Particular attention is paid to the absence of a regular plural form for the Ossetian lexeme ус (us) "wife". The presence of parallel forms may reflect the original existence of paronyms whose declension paradigms were contaminated because of the minimal semantic distance between them and their high degree of homonymy. The paronym formed with the kinship-term suffix could have meant "wife", while the paronym without the formant could have been used in the sense of "woman". The formation of plural forms could also have been influenced by other linguistic processes. Both the emergence and the loss of doublet forms could have been conditioned by extralinguistic factors.

HAZOP Ontology Semantic Similarity Algorithm Based on ACO-GRNN

Processes ◽

10.3390/pr9122115 ◽

2021 ◽

Vol 9 (12) ◽

pp. 2115

Author(s):

Yujie Bai ◽

Dong Gao ◽

Lanfei Peng

Keyword(s):

Semantic Similarity ◽

Petrochemical Industry ◽

Expert Knowledge ◽

Safety Evaluation ◽

Semantic Distance ◽

Propagation Path ◽

Analysis Method ◽

Pearson Coefficient ◽

Knowledge Expression ◽

Similarity Algorithm

Hazard and operability (HAZOP) analysis is an important safety analysis method that is widely used in the safety evaluation of the petrochemical industry. A HAZOP analysis report contains a large amount of expert knowledge and experience. To express and reuse this knowledge effectively, a knowledge ontology is constructed to store the risk propagation path and to standardize knowledge expression. On this basis, a comprehensive ontology semantic similarity algorithm based on the ant colony optimization generalized regression neural network (ACO-GRNN) model is proposed to improve the accuracy of semantic comparison. The method combines the concept name, the semantic distance, and an improved attribute coincidence calculation, and uses ACO-GRNN to train the weight of each part, avoiding the influence of manual weighting. The results show that the Pearson coefficient of this method reaches 0.9819, which is 45.83% higher than that of the traditional method. The method can solve problems of semantic comparison and matching, and lays a good foundation for subsequent knowledge retrieval and reuse.
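The idea of combining three partial similarities and scoring the result with a Pearson coefficient can be sketched in Python as follows. This is only a sketch under stated assumptions: the function names and the default weights are hypothetical, and the fixed weights stand in for the ones the paper learns with ACO-GRNN, which is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def combined_similarity(name_sim: float,
                        distance_sim: float,
                        attribute_sim: float,
                        weights: Tuple[float, float, float] = (0.3, 0.4, 0.3)) -> float:
    """Weighted average of three partial similarity scores in [0, 1].

    The default weights are placeholders (an assumption); the paper trains
    them with ACO-GRNN instead of setting them by hand.
    """
    w_name, w_dist, w_attr = weights
    total = w_name + w_dist + w_attr
    return (w_name * name_sim + w_dist * distance_sim + w_attr * attribute_sim) / total


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between computed scores and reference scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A Pearson value close to 1 means the computed similarities rise and fall together with the reference judgements, which is how the reported 0.9819 should be read.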

An experimental analysis of inspiration for creative ideation

10.31234/osf.io/7aewd ◽

2021 ◽

Author(s):

Serena Mastria ◽

Sergio Agnoli ◽

GIOVANNI EMANUELE CORAZZA ◽

Laura Franchin

Keyword(s):

Divergent Thinking ◽

Past Research ◽

Semantic Distance ◽

Irrelevant Information ◽

Creative Performance ◽

Attentional Processing ◽

Main Determinants ◽

The Individual ◽

Creative Ideation

What inspires us during a creative act? We know from past research that information that is apparently irrelevant to the task at hand can lead to higher creative performance, especially in open-minded individuals. But what does irrelevance mean, and how can open-minded individuals be inspired by this kind of information? Through two different experimental procedures, the present work investigated which type of irrelevant information inspires (i.e., increases) creative performance during a divergent thinking (DT) task. In Experiment 1, the attentional processing of information that was either relevant or irrelevant for the execution of a verbal DT task was assessed by means of an eye-tracking methodology. In Experiment 2, creative performance was explored through a verbal priming paradigm, which forcibly introduced irrelevant information during the DT task. In both experiments, the level of irrelevance was operationalized in terms of the semantic distance between the information that is central to the task at hand and the information that is apparently irrelevant for its execution. Results from both experiments highlighted the role of irrelevant information and of the Openness trait in influencing the originality or uncommonness of the responses produced during the task, as well as the role of the semantic meaning of the irrelevant information as one of the main determinants of inspiration (i.e., enhancement) of creative performance. Inspiration therefore appeared to be related to the meaning of the inspirational (i.e., apparently irrelevant) information in a given context and to the individual disposition to process this kind of information.

Automated Scoring of Figural Creativity using a Convolutional Neural Network

10.31234/osf.io/8qe7y ◽

2021 ◽

Author(s):

David H Cropley ◽

Rebecca L Marrone

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Creative Thinking ◽

Divergent Thinking ◽

Reliability And Validity ◽

Semantic Distance ◽

Self Report ◽

Creativity Assessment ◽

Creativity Research ◽

Creative Self

One of the abiding challenges in creativity research is assessment. Objectively scored tests of creativity such as the Torrance Tests of Creative Thinking (TTCT) and the Test of Creative Thinking - Drawing Production (TCT-DP) offer high levels of reliability and validity but are slow and expensive to administer and score. As a result, many creativity researchers default to simpler and faster self-report measures of creativity and related constructs (e.g., creative self-efficacy, openness). Recent research, however, has begun to explore the use of computational approaches to address these limitations. Examples include the Divergent Association Task (DAT), which uses computational methods to rapidly assess the semantic distance between words as a proxy for divergent thinking. To date, however, no research appears to have emerged that uses methods drawn from the field of artificial intelligence to assess existing objective, figural (i.e., drawing) tests of creativity. This paper describes the application of machine learning, in the form of a convolutional neural network, to the assessment of a figural creativity test, the TCT-DP. The approach shows excellent accuracy and speed, eliminating traditional barriers to the use of these objective, figural creativity tests and opening new avenues for automated creativity assessment.
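The notion of scoring a set of words by their semantic distance can be sketched in Python as the mean pairwise cosine distance between word vectors. This is a hedged illustration, not the DAT scoring procedure itself: the function names are hypothetical, and the vectors passed in are assumed to come from some word-embedding model that is not specified here.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_distance(vectors: List[Sequence[float]]) -> float:
    """Average cosine distance over all unordered pairs of word vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)
```

Under this sketch, a set of words whose vectors point in very different directions receives a higher score than a set of near-synonyms.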

Efficient NC Process Scheme Generation Method Based on Reusable Macro and Micro Process Fusion

10.21203/rs.3.rs-922612/v1 ◽

2021 ◽

Author(s):

Bo Huang ◽

Kai He ◽

Rui Huang ◽

Feifei Zhang ◽

Xiuling Li ◽

...

Keyword(s):

Process Design ◽

Manufacturing Industry ◽

Evaluation Model ◽

Semantic Distance ◽

Process Scheme ◽

Engineering Applications ◽

User Interactions ◽

Design Efficiency ◽

Local Structures ◽

Practical Engineering

Process reuse technology has been widely studied and applied in the manufacturing industry. However, current NC process reuse generally assumes that the micro process is compatible with the macro process. In fact, reusable processes taken from similar local structures of multiple parts are usually difficult to make compatible with each other under the overall manufacturing requirements of the query part. As a result, a large amount of user interaction is still required for modification and adjustment in practical engineering applications, so the improvement in design efficiency is not significant. Therefore, an efficient NC process scheme generation method based on reusable macro and micro process fusion is proposed in this paper. First, based on the calculated semantic distance of the process design intention, the micro processes are mapped to the macro process to fuse the two, and a compatibility credibility evaluation model is established to evaluate the compatibility of the fusion results. Then, when the fusion result is credible, the machining areas corresponding to the process scheme are adjusted and optimized at the geometric level. This adjustment and optimization mainly consists of integrating machining areas and optimizing the machining sequence. Finally, the effectiveness and feasibility of the proposed method are verified by tests on actual parts.

Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding

10.21437/interspeech.2021-1929 ◽

2021 ◽

Author(s):

Suyoun Kim ◽

Abhinav Arora ◽

Duc Le ◽

Ching-Feng Yeh ◽

Christian Fuegen ◽

...

Keyword(s):

Performance Analysis ◽

Semantic Distance ◽

Spoken Language ◽

Language Understanding ◽

Spoken Language Understanding

Nymph Piss and Gravy Orgies: Local and Global Contrast Effects in Relational Humor

10.31234/osf.io/3r7fw ◽

2021 ◽

Author(s):

Cynthia S. Q. Siew ◽

Tomas Engelthaler ◽

Thomas Hills

Keyword(s):

Word Pair ◽

Local Level ◽

Semantic Distance ◽

Local Context ◽

Large Set ◽

Local Contrast ◽

Global Contrast ◽

Letter Frequency ◽

Global And Local ◽

The Impact

How does the relation between two words create humor? In this paper, we investigated the effect of global and local contrast on the humor of word pairs. We capitalized on the existence of psycholinguistic lexical norms by examining violations of expectations set up by typical patterns of English usage (global contrast) and within the local context of the words within the word pairs (local contrast). Global contrast was operationalized as lexical-semantic norms for single words, and local contrast was operationalized as the orthographic, phonological, and semantic distance between the two words in the pair. Through crowdsourced (Study 1) and best-worst (Study 2) ratings of the humor of a large set of word pairs (i.e., compounds), we find evidence of effects of both global and local contrast on compound-word humor. Specifically, we find that humor arises when there is a violation of expectations at the local level, between the individual words that make up the word pair, even after accounting for violations at the global level relative to the entire language. Semantic variables (arousal, dominance, concreteness) were stronger predictors of word pair humor, whereas form-related variables (number of letters, phonemes, letter frequency) were stronger predictors of single-word humor. Moreover, we find evidence for the specific ways in which semantic dissimilarity can increase humor: by using local contrast to defuse the impact of low-valence words by making them seem amusing, or to enhance the incongruence of highly imageable pairs of concrete words.

DIVIS: A Semantic Distance to Improve the Visualization of Incomplete Heterogeneous Phenotypic Datasets

10.21203/rs.3.rs-742853/v1 ◽

2021 ◽

Author(s):

Rayan Eid ◽

Claudine Landès ◽

Alix Pernet ◽

Emmanuel Benoît ◽

Pierre Santagostini ◽

...

Keyword(s):

Missing Values ◽

Real Life ◽

Semantic Distance ◽

Phenotypic Traits ◽

Phenotypic Data ◽

Large Numbers ◽

Underlying Principle ◽

Qualitative Variables ◽

Realistic Representation ◽

Insight Into

Background: Thanks to the wider spread of high-throughput experimental techniques, biologists are accumulating large numbers of datasets which often mix quantitative and qualitative variables and are not always complete, in particular when they concern phenotypic traits. To gain a first insight into these datasets and reduce the size of the data matrices, scientists often rely on multivariate analyses. However, such approaches are not always easily practicable, in particular when faced with mixed datasets with missing values. Moreover, displaying large numbers of individuals leads to cluttered visualizations which are difficult to interpret. Results: We introduce a new methodology to overcome these limits. The underlying principle consists in (i) grouping similar individuals, (ii) representing each group by emblematic individuals we call archetypes, and (iii) building sparse visualizations based on these archetypes. As a preliminary step to the clustering, we design a new semantic distance tailored for both quantitative and qualitative variables, which allows a realistic representation of the relationships between individuals. This semantic distance is based on ontologies which are engineered to represent real-life knowledge regarding the underlying variables. Our approach is implemented as a Python pipeline and illustrated by a rosebush dataset including passport and phenotypic data. Conclusions: The introduction of our new semantic distance and of the archetype concept allows us to build a comprehensive representation of an incomplete dataset characterized by a large proportion of qualitative data. The methodology described here could have wider use beyond information characterizing organisms or species and beyond plant science. Indeed, the same approach could be applied to any incomplete mixed dataset.
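To make the problem setting concrete, the following Python sketch computes a Gower-style distance over mixed quantitative and qualitative variables while skipping missing values. It is a swapped-in, well-known baseline and not the ontology-based DIVIS distance; the function name, the dictionary layout, and the convention that a range of `None` marks a qualitative variable are all assumptions made for illustration.

```python
from typing import Dict, Optional, Union

Value = Union[float, str, None]


def gower_distance(a: Dict[str, Value],
                   b: Dict[str, Value],
                   ranges: Dict[str, Optional[float]]) -> float:
    """Gower-style distance between two individuals with mixed variables.

    ranges maps each variable name to its numeric range, or to None when
    the variable is qualitative. Variables missing (None) in either
    individual are ignored instead of being imputed.
    """
    total = 0.0
    used = 0
    for key, value_range in ranges.items():
        x, y = a.get(key), b.get(key)
        if x is None or y is None:
            continue  # skip variables with a missing value
        if value_range is None:
            total += 0.0 if x == y else 1.0    # qualitative: simple mismatch
        else:
            total += abs(x - y) / value_range  # quantitative: scaled difference
        used += 1
    if used == 0:
        raise ValueError("no variable is observed in both individuals")
    return total / used
```

The simple mismatch rule treats every pair of different categories as equally far apart; an ontology-based distance such as the one the abstract describes could instead grade how closely two categories are related.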

Improved Deep Hashing with Scalable Interblock for Tourist Image Retrieval

Scientific Programming ◽

10.1155/2021/9937061 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Jiangfan Feng ◽

Wenzheng Sun

Keyword(s):

Image Retrieval ◽

Deep Neural Networks ◽

Feature Vector ◽

Semantic Distance ◽

Retrieval Performance ◽

Deep Hashing ◽

Binary Feature ◽

Two Samples ◽

Image Representations ◽

Hash Codes

Tourist image retrieval has attracted increasing attention from researchers. In particular, supervised deep hash methods have significantly boosted retrieval performance; they take hand-crafted features as inputs and map the high-dimensional binary feature vector to reduce feature-searching complexity. However, their performance depends on supervised labels, and little labeled temporal and discriminative information is available for tourist images. This paper proposes an improved deep hash method to learn enhanced hash codes for tourist image retrieval. It jointly determines image representations and hash functions with deep neural networks and simultaneously enhances the discriminative capability of tourist image hash codes with refined semantics of the accompanying relationship. Furthermore, the CNN is tuned to implement end-to-end training of the hash mapping, calculating the semantic distance between two samples from the obtained binary codes. Experiments on various datasets demonstrate the superiority of the proposed approach compared to state-of-the-art shallow and deep hashing techniques.
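Comparing two samples through their binary codes can be sketched in Python with the Hamming distance, a common choice for hash-based retrieval. The abstract does not state which distance the authors use, so Hamming distance is an assumption here, as are the function names and the representation of codes as lists of 0/1 integers.

```python
from typing import List, Sequence


def hamming_distance(code_a: Sequence[int], code_b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))


def rank_by_hamming(query: Sequence[int],
                    database: List[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))
```

Because the comparison only counts differing bits, retrieval over a large image collection stays cheap once the network has produced the codes.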
