Word pair Dataset for Semantic Similarity and Relatedness in Korean Medical Vocabulary: Reference Development and Validation (Preprint)

10.2196/29667 ◽  
2021 ◽  
Author(s):  
Yunjin Yum ◽  
Jeong Moon Lee ◽  
Moon Joung Jang ◽  
Yoojoong Kim ◽  
Jong-Ho Kim ◽  
...  

BACKGROUND Medical terms require special expertise and are becoming increasingly complex, which makes it difficult to apply natural language processing techniques in medical informatics. Several human-validated reference standards for medical terms have been developed to evaluate word embedding models using the semantic similarity and relatedness of medical word pairs. However, there are very few reference standards in non-English languages. In addition, because the existing reference standards were developed a long time ago, an updated standard is needed to represent recent findings in the medical sciences. OBJECTIVE We propose a new Korean word pair reference set to verify embedding models. METHODS From January 2010 to December 2020, 518 medical textbooks, 72,844 health information news articles, and 15,698 medical research articles were collected, and the top 10,000 medical terms were selected to develop medical word pairs. Sixteen attending physicians participated in the verification of the developed set of 607 word pairs. RESULTS The proportion of word pairs answered by all participants was 90.8% (551/607) for the similarity task and 86.5% (525/607) for the relatedness task. The similarity and relatedness of the word pairs showed a high correlation (ρ=0.70, P<.001). The intraclass correlation coefficients used to assess the interrater agreement of the word pair sets were 0.47 for the similarity task and 0.53 for the relatedness task. After excluding word pairs with outlier answers and word pairs answered by fewer than 50% of the respondents, the final reference standard comprised 604 word pairs for the similarity task and 599 word pairs for the relatedness task. When FastText models were applied to the final reference standard word pair sets, the embedding models trained on medical documents showed a higher correlation between the calculated cosine similarity scores and the human-judged similarity and relatedness scores than those trained with namu (ρ=0.12 with namu vs ρ=0.47 with medical text for the similarity task; ρ=0.02 with namu vs ρ=0.30 with medical text for the relatedness task). CONCLUSIONS Korean medical word pair reference standard sets for semantic similarity and relatedness were developed based on medical documents from the past 10 years. We expect these word pair reference sets to be actively used in the development of medical and multilingual natural language processing technology.
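The evaluation described in the abstract, correlating model cosine similarities with human ratings, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the embedding vectors, the English placeholder terms, and the ratings in `toy_embeddings` and `toy_pairs` are invented for the example, and no FastText model is involved. Spearman's ρ is computed as the Pearson correlation of average ranks.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embeddings, rated_pairs):
    """Correlate model cosine similarity with human scores.

    Pairs containing a word missing from the embeddings are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, score in rated_pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(score)
    return spearman(model_scores, human_scores)


# Hypothetical 2-dimensional vectors and ratings, for illustration only.
toy_embeddings = {
    "fever": [1.0, 0.0],
    "pyrexia": [0.9, 0.1],
    "cough": [0.7, 0.3],
    "fracture": [0.0, 1.0],
}
toy_pairs = [
    ("fever", "pyrexia", 4.0),
    ("fever", "cough", 2.5),
    ("fever", "fracture", 1.0),
]

rho = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman rho = {rho:.2f}")
```

In a real evaluation the vectors would come from a trained embedding model, and the ratings would be the averaged physician scores for each word pair.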


Relevance is one of the most active topics in information retrieval. Over the last few years, the volume of digital data has increased drastically, so there is a need to improve the accuracy of the information retrieval process. Semantic similarity measures quantify the similarity between the words of a word pair, here using WordNet as the ontology. We analyzed different categories of semantic similarity algorithms that compute the semantic closeness between word pairs and evaluated them using WordNet. We compared the algorithms on the Miller-Charles data set of 30 word pairs and ranked them category-wise.
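As a rough illustration of one such category, a path-based measure scores two concepts by the length of the shortest is-a path between them: similarity = 1 / (1 + path length). The abstract does not name the algorithms it compares, so this is only a sketch of the general idea. The `TOY_HYPERNYMS` taxonomy below is a hypothetical stand-in for WordNet, and each concept is assumed to have a single parent.

```python
# Hypothetical miniature is-a taxonomy (child -> parent); not real WordNet data.
TOY_HYPERNYMS = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "artifact",
    "fork": "artifact",
    "artifact": "entity",
    "entity": None,
}


def ancestors(node, parents):
    """Return the chain from node up to the root, starting with node itself."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parents[node]
    return chain


def path_similarity(a, b, parents):
    """1 / (1 + number of edges on the shortest is-a path between a and b)."""
    steps_from_b = {n: d for d, n in enumerate(ancestors(b, parents))}
    for steps_from_a, n in enumerate(ancestors(a, parents)):
        if n in steps_from_b:
            # n is the lowest common ancestor in a single-parent taxonomy.
            return 1.0 / (1 + steps_from_a + steps_from_b[n])
    return 0.0  # no shared ancestor


print(path_similarity("car", "bicycle", TOY_HYPERNYMS))
print(path_similarity("car", "fork", TOY_HYPERNYMS))
```

Here "car" and "bicycle" meet at "vehicle" after two edges, while "car" and "fork" only meet at "artifact" after three, so the first pair receives the higher score.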


2007 ◽  
Vol 177 (4S) ◽  
pp. 7-7
Author(s):  
Brent K. Hollenbeck ◽  
J. Stuart Wolf ◽  
Rodney L. Dunn ◽  
Martin G. Sanda ◽  
David P. Wood ◽  
...  

2018 ◽  
Vol 34 (3) ◽  
pp. 193-205 ◽  
Author(s):  
Julia Steinbach ◽  
Heidrun Stoeger

Abstract. We describe the development and validation of an instrument for measuring the affective component of primary school teachers’ attitudes towards self-regulated learning. The questionnaire assesses the affective component towards those cognitive and metacognitive strategies that are especially effective in primary school. In a first study (n = 230), the factor structure was verified via an exploratory factor analysis. A confirmatory factor analysis with data from a second study (n = 400) indicated that the theoretical factor structure is appropriate. A comparison with four alternative models identified the theoretically derived factor structure as the most appropriate. Concurrent validity was demonstrated by correlations with a scale that measures the degree to which teachers create learning environments that enable students to self-regulate their learning. Retrospective validity was demonstrated by correlations with a scale that measures teachers’ experiences with self-regulated learning. In a third study (n = 47), the scale’s concurrent validity was tested with scales measuring teachers’ evaluation of the desirability of different aspects of self-regulated learning in class. Additionally, predictive validity was demonstrated via a binary logistic regression, with teachers’ attitudes as a predictor of their registration for a workshop on self-regulated learning and their willingness to implement a seven-week training program on self-regulated learning.


2020 ◽  
Vol 36 (5) ◽  
pp. 852-863 ◽  
Author(s):  
George Gunnesch-Luca ◽  
Klaus Moser

Abstract. The current paper presents the development and validation of a unit-level Organizational Citizenship Behavior (OCB) scale based on the Referent-Shift Consensus Model (RSCM). In Study 1, with 124 individuals measured twice, both an Exploratory Factor Analysis (EFA) and a Confirmatory Factor Analysis (CFA) established and confirmed a five-factor solution (helping behavior, sportsmanship, loyalty, civic virtue, and conscientiousness). Test–retest reliabilities at a 2-month interval were high (between .59 and .79 for the subscales, .83 for the total scale). In Study 2, unit-level OCB was analyzed in a sample of 129 work teams. Both Interrater Reliability (IRR) measures and Interrater Agreement (IRA) values provided support for RSCM requirements. Finally, unit-level OCB was associated with group task interdependence and was more predictable (by job satisfaction and integrity of the supervisor) than individual-level OCB in previous research.


2018 ◽  
Vol 17 (4) ◽  
pp. 193-203 ◽  
Author(s):  
Tanja Hentschel ◽  
Lisa Kristina Horvath ◽  
Claudia Peus ◽  
Sabine Sczesny

Abstract. Entrepreneurship programs often aim at increasing women’s lower entrepreneurial activities. We investigate how advertisements for entrepreneurship programs can be designed to increase women’s application intentions. Results of an experiment with 156 women showed that women indicate (1) lower self-ascribed fit to and interest in the program after viewing a male-typed image (compared to a gender-neutral or female-typed image) in the advertisement; and (2) lower self-ascribed fit to and interest in the program as well as lower application intentions if the German masculine linguistic form of the term “entrepreneur” (compared to the gender-fair word pair “female and male entrepreneur”) is used in the recruitment advertisement. Women’s reactions are most negative when both a male-typed image and the masculine linguistic form appear in the advertisement. Self-ascribed fit and program interest mediate the relationship of advertisement characteristics on application intentions.


2008 ◽  
Vol 93 (2) ◽  
pp. 250-267 ◽  
Author(s):  
Troy V. Mumford ◽  
Chad H. Van Iddekinge ◽  
Frederick P. Morgeson ◽  
Michael A. Campion
