Semantic Analysis
Recently Published Documents


TOTAL DOCUMENTS

4035
(FIVE YEARS 1558)

H-INDEX

59
(FIVE YEARS 8)

2022 ◽  
Vol 9 (3) ◽  
pp. 0-0

This paper presents work on recommending healthcare-related journal papers by understanding the semantics of terms in the papers users have read in the past. In other words, user profiles reflecting user interests within the healthcare domain are constructed from the kinds of journal papers the users read. Multiple profiles are built for each user, one for each category of papers read. The proposed approach works at the granular level of extrinsic and intrinsic relationships between terms and clusters highly semantically related domain terms, where each cluster represents a user interest area. The semantic analysis of terms starts with co-occurrence analysis to extract the intra-couplings between terms, derives the inter-couplings from those intra-couplings, and finally forms clusters of highly related terms. Experiments showed improved precision for the proposed approach compared to the state-of-the-art technique, with a mean reciprocal rank of 0.76.
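The coupling-and-clustering step described above boils down to building a term co-occurrence profile for each term and grouping terms whose profiles are similar. A minimal sketch of that idea in Python (not the authors' implementation; the toy corpus, document-level co-occurrence window, and the use of scikit-learn's AgglomerativeClustering are assumptions for illustration):

```python
# Sketch: term co-occurrence analysis followed by clustering of related terms.
# Toy corpus, co-occurrence window, and clustering choices are illustrative assumptions.
from itertools import combinations

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "diabetes insulin glucose treatment",
    "insulin therapy glucose monitoring",
    "cancer chemotherapy tumor treatment",
    "tumor biopsy cancer diagnosis",
]
tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})
index = {t: i for i, t in enumerate(vocab)}

# Intra-couplings: document-level co-occurrence counts between term pairs.
cooc = np.zeros((len(vocab), len(vocab)))
for doc in tokenized:
    for a, b in combinations(set(doc), 2):
        cooc[index[a], index[b]] += 1
        cooc[index[b], index[a]] += 1

# Inter-couplings: similarity between the terms' co-occurrence profiles.
sim = cosine_similarity(cooc)

# Cluster terms with similar coupling profiles; each cluster approximates
# one "user interest area".
labels = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                 linkage="average").fit_predict(1 - sim)
for k in set(labels):
    print(k, [t for t, l in zip(vocab, labels) if l == k])
```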


Author(s):  
Pooja Kherwa ◽  
Poonam Bansal

The Covid-19 pandemic is the deadliest outbreak in our living memory, so it is the need of the hour to prepare the world with strategies to prevent and control the impact of such epidemics. In this paper, a novel semantic pattern detection approach for the Covid-19 literature, using contextual clustering and intelligent topic modeling, is presented. For contextual clustering, three-level weights at the term, document, and corpus levels are used with latent semantic analysis. For intelligent topic modeling, semantic collocations are selected using pointwise mutual information (PMI) and log frequency biased mutual dependency (LBMD), and latent Dirichlet allocation is applied. Contextual clustering with latent semantic analysis yields semantic spaces with highly correlated terms at the corpus level. Through intelligent topic modeling, the topics improve in the form of lower perplexity and higher coherence. This research helps in finding knowledge gaps in the area of Covid-19 research and offers directions for future research.
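The collocation-selection step rests on ranking bigrams by pointwise mutual information and feeding the enriched documents to LDA. A minimal sketch assuming NLTK and gensim (the library choices and toy corpus are illustrative; the LBMD weighting is not reproduced here):

```python
# Sketch: PMI-based collocation selection followed by LDA topic modeling.
# Library choices (NLTK, gensim) and the toy corpus are illustrative assumptions.
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    "covid transmission droplet airborne spread".split(),
    "vaccine trial immune response antibody".split(),
    "covid vaccine efficacy immune protection".split(),
]

# Rank bigram collocations by pointwise mutual information (PMI).
measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_documents(docs)
top_bigrams = set(finder.nbest(measures.pmi, 5))

# Keep unigrams and add the selected collocations as single tokens.
merged = []
for doc in docs:
    collocs = [f"{a}_{b}" for a, b in zip(doc, doc[1:]) if (a, b) in top_bigrams]
    merged.append(doc + collocs)

dictionary = Dictionary(merged)
corpus = [dictionary.doc2bow(doc) for doc in merged]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```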


Author(s):  
Sujatha Arun Kokatnoor ◽  
Balachandran Krishnan

The main focus of this research is to find the reasons behind fresh cases of COVID-19 from the public's perception, for data specific to India. The analysis is done using machine learning approaches, and the inferences are validated with medical professionals. The data processing and analysis are accomplished in three steps. First, the dimensionality of the vector space model (VSM) is reduced with an improved feature engineering (FE) process, using weighted term frequency-inverse document frequency (TF-IDF) and forward scan trigrams (FST), followed by removal of weak features using a feature hashing technique. In the second step, an enhanced K-means clustering algorithm is used for grouping, based on public posts from Twitter®. In the last step, latent Dirichlet allocation (LDA) is applied to discover the trigram topics relevant to the reasons behind the increase in fresh COVID-19 cases. The enhanced K-means clustering improved the Dunn index value by 18.11% compared with the traditional K-means method. By incorporating the improved two-step FE process, the LDA model improved by 14% in terms of coherence score, and by 19% and 15% when compared with latent semantic analysis (LSA) and the hierarchical Dirichlet process (HDP) respectively, resulting in 14 root causes for the spike in the disease.
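A minimal sketch of the kind of pipeline described above, using scikit-learn (the toy tweets, parameter values, and the plain KMeans/LatentDirichletAllocation estimators are illustrative assumptions, not the enhanced variants from the paper):

```python
# Sketch: n-gram features -> feature hashing -> TF-IDF weighting -> K-means -> LDA.
# Toy data and plain scikit-learn estimators stand in for the paper's enhanced methods.
from sklearn.feature_extraction.text import (CountVectorizer, HashingVectorizer,
                                              TfidfTransformer)
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "crowded markets no social distancing new covid cases",
    "no masks in public transport cases rising",
    "vaccine hesitancy in rural areas fresh infections",
    "large wedding gatherings spike in covid cases",
]

# Step 1: unigram-to-trigram features, hashed to a reduced space, then TF-IDF weighted.
hashed = HashingVectorizer(ngram_range=(1, 3), n_features=256,
                           alternate_sign=False, norm=None).fit_transform(tweets)
weighted = TfidfTransformer().fit_transform(hashed)

# Step 2: cluster the posts (plain K-means stands in for the enhanced variant).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(weighted)
print("cluster labels:", labels)

# Step 3: LDA over trigram counts to surface candidate "root cause" topics.
counts = CountVectorizer(ngram_range=(3, 3))
X = counts.fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = counts.get_feature_names_out()
for topic in lda.components_:
    print([terms[i] for i in topic.argsort()[-3:][::-1]])
```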




2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Michael Kend ◽  
Lan Anh Nguyen

Purpose: The purpose of this study is to explore audit procedure disclosures related to key audit risks during the prior year and the initial year of the COVID-19 outbreak, by reporting on matters published in over 3,000 Australian statutory audit reports during 2019 and 2020. Design/methodology/approach: This study partially uses latent semantic analysis methods to apply textual and readability analyses to external audit reports in Australia. The authors measure the tone of the audit reports using the Loughran and McDonald (2011) approach. Findings: The authors find that 3% of audit procedures undertaken during 2020 were designed to address audit risks associated with the COVID-19 pandemic. As a percentage of total audit procedures undertaken during 2020, smaller practitioners reported far fewer audit procedures related to COVID-19 audit risks than most larger audit firms. Finally, the textual analysis found differences in the sentiment or tone of the words used by different auditors in 2020, but no such differences were found when 2020 was compared with the prior year, 2019. Originality/value: This study provides early evidence on whether auditors designed audit procedures to deal specifically with audit risks that arose due to the COVID-19 pandemic, and on the extent and nature of those procedures. The study will help policymakers better understand whether Key Audit Matters provided informational value to investors during a time of global crisis.
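The tone measurement mentioned in the methodology is a dictionary-based count of positive and negative words. A minimal sketch (the tiny word lists are stand-ins for the full Loughran and McDonald (2011) dictionaries, and the scoring formula is one common convention, not necessarily the exact one used in the study):

```python
# Sketch: dictionary-based tone scoring of report text.
# The word lists below are illustrative stand-ins for the full
# Loughran-McDonald financial sentiment dictionaries.
import re

NEGATIVE = {"loss", "impairment", "uncertainty", "adverse", "decline"}
POSITIVE = {"improvement", "growth", "benefit", "strong", "achieve"}

def tone(text: str) -> float:
    """Return (positive - negative) word count divided by total tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

report = ("The impairment of goodwill and uncertainty arising from COVID-19 "
          "was assessed; management expects growth and improvement in 2021.")
print(round(tone(report), 4))
```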


Author(s):  
José Ángel Martínez-Huertas ◽  
Ricardo Olmos ◽  
Guillermo Jorge-Botana ◽  
José A. León

In this paper, we highlight the importance of distilling the computational assessments of constructed responses to validate the indicators/proxies of constructs/trins, using an empirical illustration in automated summary evaluation. We present the validation of the Inbuilt Rubric (IR) method, which maps rubrics into vector spaces for concept assessment. Specifically, we improved and validated its scores' performance using latent variables, a common approach in psychometrics. We also validated a new hierarchical vector space, namely a bifactor IR. A total of 205 Spanish undergraduate students produced 615 summaries of three different texts, which were evaluated by human raters and by different versions of the IR method using latent semantic analysis (LSA). The computational scores were validated using multiple linear regressions and different latent variable models such as CFAs and SEMs. Convergent and discriminant validity was found for the IR scores using human rater scores as validity criteria. While this study was conducted in Spanish, the proposed scheme is language-independent and applicable to any language. We highlight four main conclusions: (1) Accurate performance can be observed in topic-detection tasks without the hundreds or thousands of pre-scored samples required in supervised models. (2) Convergent/discriminant validity can be improved using measurement models for the computational scores, as they adjust for measurement error. (3) Nouns embedded in fragments of instructional text can be an affordable alternative for applying the IR method. (4) Hierarchical models, like the bifactor IR, can increase the validity of computational assessments evaluating general and specific knowledge in vector space models. R code is provided to apply the classic and bifactor IR methods.
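The core intuition behind rubric-mapped vector spaces is to project rubric concepts and student summaries into the same latent semantic space and score summaries by their similarity to each concept. A minimal sketch using scikit-learn's TruncatedSVD (the toy texts, SVD dimensionality, and cosine scoring are illustrative assumptions; the paper provides the actual method as R code):

```python
# Sketch: score a summary against rubric concepts in an LSA space.
# Toy texts and parameters are illustrative; this is not the authors' IR code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "photosynthesis converts light energy into chemical energy",
    "chlorophyll absorbs light in plant cells",
    "cellular respiration releases energy from glucose",
    "mitochondria produce energy during respiration",
]
rubric_concepts = ["light energy chemical conversion", "energy release from glucose"]
summary = "plants use light to make chemical energy and later release it from glucose"

vec = TfidfVectorizer().fit(corpus)
lsa = TruncatedSVD(n_components=2, random_state=0).fit(vec.transform(corpus))

def project(texts):
    """Map raw texts into the reduced LSA space."""
    return lsa.transform(vec.transform(texts))

# One score per rubric concept: cosine similarity in the latent space.
scores = cosine_similarity(project([summary]), project(rubric_concepts))[0]
print(dict(zip(rubric_concepts, scores.round(3))))
```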


Author(s):  
Алексей Николаевич Копайгородский ◽  
Елена Павловна Хайруллина

The article discusses approaches to the design and implementation of individual components of an instrumental toolkit for the semantic analysis of information on scientific and technological solutions in the energy sector extracted from open sources. The structure of a bilingual ontology is considered, which makes it possible to classify the information while taking into account its representation in different languages and synonymy. The authors describe an approach to searching for and processing information from open sources based on the semantic analysis tools they developed, implemented in Python using the Natural Language Toolkit library.
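The abstract names Python and NLTK as the implementation basis. A minimal sketch of the kind of preprocessing such a tool typically starts from (the sample sentence and the specific NLTK steps are illustrative assumptions, not the authors' pipeline):

```python
# Sketch: basic NLTK preprocessing for mining energy-related text from open sources.
# The sample sentence and the chosen steps are illustrative assumptions.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Fetch the required corpora/models if they are not already installed.
for pkg in ("punkt", "punkt_tab", "stopwords", "wordnet"):
    nltk.download(pkg, quiet=True)

text = "New solar generation technologies reduce the cost of distributed energy systems."

tokens = nltk.word_tokenize(text.lower())
stop = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

# Keep lemmatized content words as candidate terms for ontology-based classification.
content_terms = [lemmatizer.lemmatize(t) for t in tokens
                 if t.isalpha() and t not in stop]
print(content_terms)
```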


Author(s):  
Jiatong Meng ◽  
Yucheng Chen

Traditional quasi-social relationship type prediction models obtain their results by analyzing and clustering raw data directly, so the predictions are easily disturbed by noisy data, and the low processing efficiency and accuracy of these models become more apparent as the amount of user data increases. To address these problems, this research constructs a model for predicting users' quasi-social relationship types based on social media text big data. After the collected social media text data are pre-processed, the interference data that would degrade prediction accuracy are removed. The interaction information in the text data is mined based on similarity calculation, and semantic analysis and sentiment annotation are performed on the information content. A prediction model of the user's quasi-social relationship type is then built on a BP (back-propagation) neural network. Performance tests show that the average prediction accuracy of the constructed model is 89.84%, and that the model has low time complexity and higher processing efficiency, outperforming other traditional models.
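The prediction stage described above is a standard feed-forward network trained with back-propagation. A minimal sketch using scikit-learn's MLPClassifier on synthetic features (the feature names, labels, and network size are illustrative assumptions, not the paper's model):

```python
# Sketch: a back-propagation (BP) neural network classifier for relationship types.
# Synthetic features and labels are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Hypothetical per-user features: interaction frequency, reply rate, sentiment score.
X = rng.random((300, 3))
# Hypothetical relationship-type labels derived from the synthetic features.
y = (X[:, 0] * 2 + X[:, 2] > 1.2).astype(int) + (X[:, 1] > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```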


2022 ◽  
Vol 23 (4) ◽  
pp. 1041-1050
Author(s):  
N. A. Kurakina ◽  
I. S. Achinovich

Phono-stylistics is a promising research area: the expressive power of a text depends on its phonetic imagery. The research objective was to identify the pragmatic features of phonic expressive means in translations of contemporary English poetry. The methods included comparative analysis, phono-semantic and phono-stylistic interpretation of the original poems and their translations, and Tynyanov's law of versification. The method of sound counting developed by E. V. Elkina and L. S. Yudina was used to calculate the frequency of sounds in the Russian translations in the context of phono-semantic analysis, and the method of sound counting designed by Tsoi Vi Chuen Thomas was used for the original English texts. The theoretical foundation of the research was formed by the works of M. A. Balash, G. V. Vekshin, Z. S. Dotmurzieva, V. N. Elkina, A. P. Zhuravlev, L. V. Laenko, F. Miko, L. P. Prokofyeva, E. A. Titov, and others. The study featured the phonics and pragmatics of S. Dugdale's poem Zaitz and its three translations made by E. Tretyakova, A. Shchetinina, and M. Vinogradova, and of C. E. Duffy's Anne Hathaway translated by Yu. Fokina. The authors compared the pragmatics of sound imagery in the English originals and their Russian translations. The research made it possible to define the role of sound imagery in poetic discourse, as well as the relationship between the sound organization of poetic speech and its pragmatic value at the phonographic level. The results can be used in courses on translation, stylistics, and phonetics.
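The sound-counting step amounts to tallying the relative frequency of sounds in an original and its translation and comparing the two distributions. A minimal grapheme-level sketch (the pangram-style sample texts are placeholders, and letters stand in for sounds; the Elkina-Yudina and Tsoi Vi Chuen Thomas methods operate on actual phonemes):

```python
# Sketch: relative letter frequencies as a rough stand-in for sound counting.
# Real phono-semantic analysis counts sounds (phonemes), not letters.
from collections import Counter

def letter_frequencies(text: str) -> dict:
    """Return each letter's share of all letters in the text."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {ch: counts[ch] / total for ch in sorted(counts)}

original = "the quick brown fox jumps over the lazy dog"            # placeholder text
translation = "съешь ещё этих мягких французских булок да выпей чаю"  # placeholder text

for label, text in (("original", original), ("translation", translation)):
    freqs = letter_frequencies(text)
    top = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)[:5]
    print(label, top)
```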


BMC Genomics ◽  
2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Dhwani Dholakia ◽  
Ankit Kalra ◽  
Bishnu Raman Misir ◽  
Uma Kanga ◽  
Mitali Mukerji

Extreme complexity in the Human Leukocyte Antigen (HLA) system and its nomenclature makes it difficult to interpret and integrate relevant information on HLA associations with diseases, Adverse Drug Reactions (ADR) and transplantation. A PubMed search displays ~146,000 studies on HLA reported from diverse locations. Currently, the IPD-IMGT/HLA database (Robinson et al., Nucleic Acids Research 48:D948-D955, 2019) houses data on 28,320 HLA alleles. We developed an automated pipeline with a unified graphical user interface, HLA-SPREAD, that provides structured information on SNPs, Populations, REsources, ADRs and Diseases. Information on HLA was extracted from ~28 million PubMed abstracts using Natural Language Processing (NLP). Python scripts were used to mine and curate information on diseases, filter false positives, and categorize the diseases into 24 hierarchical tree groups; Named Entity Recognition (NER) algorithms followed by semantic analysis were used to infer HLA association(s). This resource, covering 109 countries and 40 ethnic groups, provides interesting insights on markers associated with allelic/haplotypic associations in autoimmune, cancer, viral and skin diseases, transplantation outcomes and ADRs for hypersensitivity. Summary information on clinically relevant biomarkers related to HLA disease associations, with mapped susceptible/risk alleles, is readily retrievable from HLA-SPREAD. The resource is available at http://hla-spread.igib.res.in/. This resource is the first of its kind and can help uncover novel patterns in HLA gene-disease associations.
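A minimal sketch of the kind of text-mining step such a pipeline begins with: a regular-expression pass that pulls HLA allele mentions out of abstract text and pairs them with disease terms found nearby (the sample sentence, the simple pattern, and the naive pairing rule are illustrative assumptions; the actual pipeline uses trained NER models and semantic analysis):

```python
# Sketch: extract HLA allele mentions and co-mentioned disease terms from an abstract.
# The sample sentence, regex, and disease list are illustrative assumptions.
import re

abstract = ("HLA-B*57:01 carriage was strongly associated with abacavir "
            "hypersensitivity, while HLA-DRB1*15:01 was linked to multiple sclerosis.")

# Matches allele names such as HLA-B*57:01 or HLA-DRB1*15:01.
ALLELE_PATTERN = re.compile(r"HLA-[A-Z]+\d*\*\d{2}(?::\d{2})*")
DISEASES = ["hypersensitivity", "multiple sclerosis", "psoriasis"]

alleles = ALLELE_PATTERN.findall(abstract)
mentioned_diseases = [d for d in DISEASES if d in abstract.lower()]

# Naive association rule: pair every allele with every disease in the same abstract.
associations = [(a, d) for a in alleles for d in mentioned_diseases]
print(associations)
```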

