Words and Idioms

Mapping Intimacies ◽

10.1093/oxfordhb/9780195396683.013.0015 ◽

2013 ◽

Cited By ~ 1

Author(s):

Stefanie Wulff

Keyword(s):

Semantic Similarity ◽

Quantitative Measure

This chapter presents a constructionist analysis of words and idioms. It provides a summary of early constructionist research that argued in favor of viewing idioms not as anomalies, and addresses the problem of how the degree of semantic and syntactic irregularity of different constructions can be measured quantitatively. The chapter proposes a quantitative measure that numerically expresses this degree of semantic similarity and which can be considered a numerical measure of compositionality.

Download Full-text

Identifying disease genes using machine learning and gene functional similarities, assessed through Gene Ontology

10.1101/472217 ◽

2018 ◽

Author(s):

Muhammad Asif ◽

Hugo F. M. C. M. Martiniano ◽

Astrid M. Vicente ◽

Francisco M. Couto

Keyword(s):

Machine Learning ◽

Gene Ontology ◽

Candidate Genes ◽

Semantic Similarity ◽

Quantitative Measure ◽

Complex Diseases ◽

Supervised Machine Learning ◽

Disease Genes ◽

Machine Learning Classifiers ◽

Learning Classifiers

AbstractIdentifying disease genes from a vast amount of genetic data is one of the most challenging tasks in the post-genomic era. Also, complex diseases present highly heterogeneous genotype, which difficult biological marker identification. Machine learning methods are widely used to identify these markers, but their performance is highly dependent upon the size and quality of available data.In this study, we demonstrated that machine learning classifiers trained on gene functional similarities, using Gene Ontology (GO), can improve the identification of genes involved in complex diseases. For this purpose, we developed a supervised machine learning methodology to predict complex disease genes. The proposed pipeline was assessed using Autism Spectrum Disorder (ASD) candidate genes. A quantitative measure of gene functional similarities was obtained by employing different semantic similarity measures. To infer the hidden functional similarities between ASD genes, various types of machine learning classifiers were built on quantitative semantic similarity matrices of ASD and non-ASD genes. The classifiers trained and tested on ASD and non-ASD gene functional similarities outperformed previously reported ASD classifiers. For example, a Random Forest (RF) classifier achieved an AUC of 0. 80 for predicting new ASD genes, which was higher than the reported classifier (0.73). Additionally, this classifier was able to predict 73 novel ASD candidate genes that were were enriched for core ASD phenotypes, such as autism and obsessive-compulsive behavior. In addition, predicted genes were also enriched for ASD co-occurring conditions, including Attention Deficit Hyperactivity Disorder (ADHD).We also developed a KNIME workflow with the proposed methodology which allows users to configure and execute it without requiring machine learning and programming skills. Machine learning is an effective and reliable technique to decipher ASD mechanism by identifying novel disease genes, but this study further demonstrated that their performance can be improved by incorporating a quantitative measure of gene functional similarities. Source code and the workflow of the proposed methodology are available at https://github.com/Muh-Asif/ASD-genes-prediction.

Download Full-text

Developing a Cross-lingual Semantic Word Similarity Corpus for English–Urdu Language Pair

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3472618 ◽

2022 ◽

Vol 21 (2) ◽

pp. 1-16

Author(s):

Ghazeefa Fatima ◽

Rao Muhammad Adeel Nawab ◽

Muhammad Salman Khan ◽

Ali Saeed

Keyword(s):

Semantic Similarity ◽

Quantitative Measure ◽

Automated Identification ◽

Digital Text ◽

Word Similarity ◽

Cross Lingual ◽

Corpus Generation ◽

Similar Evaluation ◽

Baseline Approach ◽

Language Pair

Semantic word similarity is a quantitative measure of how much two words are contextually similar. Evaluation of semantic word similarity models requires a benchmark corpus. However, despite the millions of speakers and the large digital text of the Urdu language on the Internet, there is a lack of benchmark corpus for the Cross-lingual Semantic Word Similarity task for the Urdu language. This article reports our efforts in developing such a corpus. The newly developed corpus is based on the SemEval-2017 task 2 English dataset, and it contains 1,945 cross-lingual English–Urdu word pairs. For each of these pairs of words, semantic similarity scores were assigned by 11 native Urdu speakers. In addition to corpus generation, this article also reports the evaluation results of a baseline approach, namely “Translation Plus Monolingual Analysis” for automated identification of semantic similarity between English–Urdu word pairs. The results showed that the path length similarity measure performs better for the Google and Bing translated words. The newly created corpus and evaluation results are freely available online for further research and development.

Download Full-text

English Words Connected Via Hebrew Morphology: L1-L2 Bidirectional Effects on Semantic Similarity

PsycEXTRA Dataset ◽

10.1037/e527342012-808 ◽

2007 ◽

Author(s):

Tamar Degani ◽

Anat Prior ◽

Natasha Tokowicz

Keyword(s):

Semantic Similarity ◽

Bidirectional Effects

Download Full-text

Effects of semantic similarity and associative strength on the transfer and generalization of probability learning

PsycEXTRA Dataset ◽

10.1037/e666672011-271 ◽

1969 ◽

Author(s):

Lowell Schipper ◽

Bruce L. Hanson ◽

Leonard M. Davidson

Keyword(s):

Semantic Similarity ◽

Associative Strength ◽

Probability Learning

Download Full-text

Procedures for the Quantitative Determination of Autoprothrombin C

Thrombosis and Haemostasis ◽

10.1055/s-0038-1655441 ◽

1962 ◽

Vol 08 (03) ◽

pp. 434-441 ◽

Cited By ~ 3

Author(s):

Edmond R Cole ◽

Ewa Marciniak ◽

Walter H Seegers

Keyword(s):

Quantitative Determination ◽

Plasma Sample ◽

Quantitative Measure ◽

The Other ◽

Clotting Time ◽

Bovine Plasma ◽

Autoprothrombin C

SummaryTwo quantitative procedures for autoprothrombin C are described. In one of these purified prothrombin is used as a substrate, and the activity of autoprothrombin C can be measured even if thrombin is in the preparation. In this procedure a reaction mixture is used wherein the thrombin titer which develops in 20 minutes is proportional to the autoprothrombin C in the reaction mixture. A unit is defined as the amount which will generate 70 units of thrombin in the standardized reaction mixture. In the other method thrombin interferes with the result, because a standard bovine plasma sample is recalcified and the clotting time is noted. Autoprothrombin C shortens the clotting time, and the extent of this is a quantitative measure of autoprothrombin C activity.

Download Full-text

Document- and Keyword-based Author Co-citation Analysis

Data and Information Management ◽

10.2478/dim-2018-0009 ◽

2018 ◽

Vol 2 (2) ◽

pp. 70-82 ◽

Cited By ~ 2

Author(s):

Binglu Wang ◽

Yi Bu ◽

Win-bin Huang

Keyword(s):

Citation Analysis ◽

Semantic Similarity ◽

Method Validation ◽

New Method ◽

Global Network ◽

Network Visualization ◽

Knowledge Domain ◽

Knowledge Domains ◽

Domain Mapping ◽

The Relationship

AbstractIn the field of scientometrics, the principal purpose for author co-citation analysis (ACA) is to map knowledge domains by quantifying the relationship between co-cited author pairs. However, traditional ACA has been criticized since its input is insufficiently informative by simply counting authors’ co-citation frequencies. To address this issue, this paper introduces a new method that reconstructs the raw co-citation matrices by regarding document unit counts and keywords of references, named as Document- and Keyword-Based Author Co-Citation Analysis (DKACA). Based on the traditional ACA, DKACA counted co-citation pairs by document units instead of authors from the global network perspective. Moreover, by incorporating the information of keywords from cited papers, DKACA captured their semantic similarity between co-cited papers. In the method validation part, we implemented network visualization and MDS measurement to evaluate the effectiveness of DKACA. Results suggest that the proposed DKACA method not only reveals more insights that are previously unknown but also improves the performance and accuracy of knowledge domain mapping, representing a new basis for further studies.

Download Full-text

Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽

2021 ◽

Author(s):

Norberto Sánchez-Cruz ◽

Jose L. Medina-Franco

Keyword(s):

Machine Learning ◽

Small Molecules ◽

Predictive Models ◽

Large Scale ◽

Target Prediction ◽

Quantitative Measure ◽

Learning Models ◽

Discovery Research ◽

Drug Discovery Research ◽

Machine Learning Models

<p>Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. This data represents a large amount of structure-activity relationships that has not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the herein reported models have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as freely accessible and easy-to-use web application.</p>

Download Full-text

RTM-DCU: Referential Translation Machines for Semantic Similarity

10.3115/v1/s14-2085 ◽

2014 ◽

Cited By ~ 1

Author(s):

Ergun Bicici ◽

Andy Way

Keyword(s):

Semantic Similarity

Download Full-text

NTNU: Measuring Semantic Similarity with Sublexical Feature Representations and Soft Cardinality

10.3115/v1/s14-2078 ◽

2014 ◽

Cited By ~ 1

Author(s):

André Lynum ◽

Partha Pakray ◽

Björn Gambäck ◽

Sergio Jimenez

Keyword(s):

Semantic Similarity ◽

Feature Representations

Download Full-text

The Opposition of Surprisal and Semantic Similarity in the Prediction of Language Processing: Evidence from Eye-tracking Data

10.31234/osf.io/zypk9 ◽

2020 ◽

Author(s):

Kun Sun

Keyword(s):

Eye Tracking ◽

Semantic Similarity ◽

Cognitive Processing ◽

Language Processing ◽

Language Comprehension ◽

Word Processing ◽

Reading Time ◽

Computational Models ◽

Tracking Data ◽

Dynamic Approach

Expectations or predictions about upcoming content play an important role during language comprehension and processing. One important aspect of recent studies of language comprehension and processing concerns the estimation of the upcoming words in a sentence or discourse. Many studies have used eye-tracking data to explore computational and cognitive models for contextual word predictions and word processing. Eye-tracking data has previously been widely explored with a view to investigating the factors that influence word prediction. However, these studies are problematic on several levels, including the stimuli, corpora, statistical tools they applied. Although various computational models have been proposed for simulating contextual word predictions, past studies usually preferred to use a single computational model. The disadvantage of this is that it often cannot give an adequate account of cognitive processing in language comprehension. To avoid these problems, this study draws upon a massive natural and coherent discourse as stimuli in collecting the data on reading time. This study trains two state-of-art computational models (surprisal and semantic (dis)similarity from word vectors by linear discriminative learning (LDL)), measuring knowledge of both the syntagmatic and paradigmatic structure of language. We develop a `dynamic approach' to compute semantic (dis)similarity. It is the first time that these two computational models have been merged. Models are evaluated using advanced statistical methods. Meanwhile, in order to test the efficiency of our approach, one recently developed cosine method of computing semantic (dis)similarity based on word vectors data adopted is used to compare with our `dynamic' approach. The two computational and fixed-effect statistical models can be used to cross-verify the findings, thus ensuring that the result is reliable. All results support that surprisal and semantic similarity are opposed in the prediction of the reading time of words although both can make good predictions. Additionally, our `dynamic' approach performs better than the popular cosine method. The findings of this study are therefore of significance with regard to acquiring a better understanding how humans process words in a real-world context and how they make predictions in language cognition and processing.

Download Full-text