Establishing Semantic Similarity of the Cluster Documents and Extracting Key Entities in the Problem of the Semantic Analysis of News Texts

2015 ◽  
Vol 9 (5) ◽  
Author(s):  
Anastasia Nikolaevna Soloshenko ◽  
Yulia Aleksandrovna Orlova ◽  
Vladimir Leonidovich Rozaliev ◽  
Alla Viktorovna Zaboleeva-Zotova
2014 ◽  
Vol 12 (01) ◽  
pp. 1450004 ◽  
Author(s):  
SLAVKA JAROMERSKA ◽  
PETR PRAUS ◽  
YOUNG-RAE CHO

Reconstruction of signaling pathways is crucial for understanding cellular mechanisms. A pathway is represented as a path of a signaling cascade involving a series of proteins that perform a particular function. Since a protein pair involved in signaling and response has a strong interaction, putative pathways can be detected from protein–protein interaction (PPI) networks. However, predicting directed pathways from undirected genome-wide PPI networks has been challenging. We present a novel computational algorithm to efficiently predict signaling pathways from PPI networks given a starting protein and an ending protein. Our approach integrates topological analysis of PPI networks and semantic analysis of PPIs using Gene Ontology data. An advanced semantic similarity measure is used to weight each interacting protein pair. Our distance-wise algorithm iteratively selects an adjacent protein from a PPI network to build a pathway based on a distance condition. On each iteration, the strength of a hypothetical path passing through a candidate edge is estimated by a local heuristic. We evaluate the performance by comparing the resultant paths to known signaling pathways in yeast. The results show that our approach has higher accuracy and efficiency than previous methods.
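The core idea of searching a semantically weighted PPI network between two endpoint proteins can be illustrated with a Dijkstra-style sketch. This is not the authors' distance-wise heuristic; it is a minimal stand-in in which each edge carries a hypothetical semantic similarity score (the protein names are from the yeast pheromone MAPK cascade, but the weights are made up):

```python
import heapq

def best_path(graph, start, end):
    """Shortest-path search over a PPI network whose edges cost
    1 - similarity, so strongly similar pairs are cheap to traverse."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == end:
            break
        for nbr, sim in graph.get(node, {}).items():
            cost = d + (1.0 - sim)
            if cost < dist.get(nbr, float("inf")):
                dist[nbr] = cost
                prev[nbr] = node
                heapq.heappush(heap, (cost, nbr))
    if end not in dist:
        return None
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return list(reversed(path))

# Toy undirected network: edge values are hypothetical similarity scores.
ppi = {
    "STE2": {"STE4": 0.9},
    "STE4": {"STE2": 0.9, "STE5": 0.8, "FUS3": 0.3},
    "STE5": {"STE4": 0.8, "STE11": 0.85},
    "STE11": {"STE5": 0.85, "STE7": 0.9},
    "STE7": {"STE11": 0.9, "FUS3": 0.9},
    "FUS3": {"STE7": 0.9, "STE4": 0.3},
}
print(best_path(ppi, "STE2", "FUS3"))
```

With these weights the search skips the weak STE4–FUS3 shortcut and recovers the longer cascade through STE5, STE11, and STE7, which is the behavior semantic weighting is meant to produce.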


2017 ◽  
Vol 7 (1) ◽  
pp. 32-48 ◽  
Author(s):  
Samar Fathy ◽  
Nahla El-Haggar ◽  
Mohamed H. Haggag

Emotions can be judged from a combination of cues such as speech, facial expressions, and actions. Emotions are also articulated in text. This paper presents a new hybrid model for detecting emotion from text which combines ontology with keyword semantic similarity. The text is labelled with one of the six basic Ekman emotion categories. The main idea is to extract an ontology from the input sentence and match it against an ontology base, which is built from simple ontologies together with the emotion of each ontology. The ontology is extracted from the input sentence using a triplet (subject, predicate, object) extraction algorithm; the ontology matching process is then applied against the ontology base. The emotion of the input sentence is the emotion of the ontology it matches with the highest matching score. If the extracted ontology does not match any ontology in the ontology base, the keyword semantic similarity approach is used. The suggested approach depends on the meaning of each sentence and on the syntactic and semantic analysis of the context.
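The triplet-matching step can be sketched minimally. The scoring here is a naive slot-overlap measure and the ontology base is invented for illustration; the paper's matching process is richer, and the keyword-similarity fallback is signalled rather than implemented:

```python
def match_score(triplet, base_triplet):
    """Fraction of (subject, predicate, object) slots that agree exactly."""
    return sum(a == b for a, b in zip(triplet, base_triplet)) / 3.0

def classify_emotion(triplet, ontology_base, threshold=0.5):
    """Return the emotion of the best-matching base triplet, or None to
    signal fallback to the keyword semantic similarity approach."""
    best_emotion, best = None, 0.0
    for base_triplet, emotion in ontology_base:
        score = match_score(triplet, base_triplet)
        if score > best:
            best, best_emotion = score, emotion
    return best_emotion if best >= threshold else None

# Hypothetical ontology base: (triplet, Ekman emotion) pairs.
base = [
    (("i", "win", "prize"), "joy"),
    (("i", "lose", "wallet"), "sadness"),
    (("dog", "bite", "me"), "fear"),
]
print(classify_emotion(("i", "win", "lottery"), base))
```

The input triplet shares two of three slots with the "joy" entry, so that emotion wins; a triplet matching nothing above the threshold returns None and would be routed to the keyword-similarity path.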


2018 ◽  
Vol 14 (1) ◽  
pp. 65-97 ◽  
Author(s):  
Florent Perek

This paper presents a corpus-based study of recent change in the English way-construction, drawing on data from the 1830s to the 2000s. Semantic change in the distribution of the construction is characterized by means of a distributional semantic model, which captures semantic similarity between verbs through their co-occurrence frequency with other words in the corpus. By plotting and comparing the semantic domain of the three senses of the construction at different points in time, it is found that they all have gained in semantic diversity. These findings are interpreted in terms of increases in schematicity, either of the verb slot or the motion component contributed by the construction.
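The distributional idea, that verbs are similar when they co-occur with similar words, can be sketched with raw count vectors and cosine similarity. This is a toy version (tiny window, no weighting or dimensionality reduction) on an invented three-sentence corpus, not the study's actual model:

```python
from collections import Counter
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Build bag-of-context count vectors for every word in the corpus."""
    vecs = {}
    for sent in sentences:
        words = sent.lower().split()
        for i, w in enumerate(words):
            context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            vecs.setdefault(w, Counter()).update(context)
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "he elbowed his way through the crowd",
    "she pushed her way through the crowd",
    "he talked his way into the club",
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["elbowed"], vecs["pushed"]))
```

"elbowed" and "pushed" share the context word "way", so their similarity is positive; scaling this up over a large corpus and many context dimensions is what lets the model place verbs in a common semantic space across time periods.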


2019 ◽  
Author(s):  
Rick Hass

Semantic search and retrieval of information plays an important role in creative idea generation. This study was designed to examine how semantic and temporal clustering varies when asking participants to generate ideas about uses for objects compared with generating members of goal-derived categories. Participants generated uses for three objects (brick, hammer, and picture frame) and also generated members of the following goal-derived categories: things to take in case of a fire, things to sell at a garage sale, and ways to spend lottery winnings. Using response-time analysis and semantic analysis, results illustrated that all six prompts generally led to exponential cumulative response-time distributions. However, the proportion of temporally clustered responses, defined using the slope-difference algorithm, was higher for goal-derived category responses compared with object uses. Despite that, overall pairwise semantic similarity was higher for object uses than for goal-derived exemplars. The effect of prompt on pairwise semantic similarity is likely the result of the context-dependency of exemplars from goal-derived categories. However, the current analysis contains a potential confound such that special instructions to give "common and uncommon" responses were provided only for the object-uses prompts. The confound is likely minimal, but future work is necessary to illustrate the robustness of the results.
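The "overall pairwise semantic similarity" measure can be illustrated by averaging a similarity score over every pair of responses. Word-overlap Jaccard similarity is used here purely as a stand-in for the study's semantic measure, on invented responses:

```python
from itertools import combinations

def jaccard(a, b):
    """Stand-in similarity: word overlap between two free-text responses."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def mean_pairwise_similarity(responses):
    """Average similarity over all unordered pairs of responses."""
    pairs = list(combinations(responses, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical alternate-uses responses for "brick".
uses = ["build a wall", "build a fireplace", "break a window"]
print(mean_pairwise_similarity(uses))
```

A prompt whose responses share more vocabulary (or, with a real semantic model, more meaning) yields a higher mean, which is how object uses and goal-derived exemplars are compared in the study.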


2021 ◽  
Vol 5 (1) ◽  
pp. 45-56
Author(s):  
Poonam Chahal ◽  
Manjeet Singh

In today's era, with the huge amount of dynamic information available on the World Wide Web (WWW), it is difficult for the user to retrieve or search for relevant information. One of the techniques used in information retrieval is clustering; the web documents are then ranked to provide the user with information matching their query. In this paper, the semantic similarity score of Semantic Web documents is computed using a semantic-based similarity feature combining latent semantic analysis (LSA) and latent relational analysis (LRA). LSA and LRA help to determine the relevant concepts and the relationships between those concepts, which in turn correspond to words and the relationships between those words. The extracted interrelated concepts are represented by a graph that captures the semantic content of the web document. From this graph representation of each document, the HCS clustering algorithm extracts the most highly connected subgraphs to construct the clusters, following an information-theoretic approach. The web documents in the resulting clusters are ranked using the text-rank method in combination with the proposed method. The experimental analysis is carried out on the benchmark OpinRank dataset. The performance of the approach on ranking web documents using semantic-based clustering has shown promising results.
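The text-rank ranking step can be sketched as power-iteration PageRank over a weighted document-similarity graph. This is a minimal generic TextRank-style ranker on an invented similarity matrix, not the paper's combined method:

```python
def textrank(similarity, d=0.85, iters=50):
    """Power-iteration PageRank over a weighted similarity graph.
    similarity[i][j] is the edge weight between documents i and j."""
    n = len(similarity)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if i == j or similarity[j][i] == 0:
                    continue
                # Each neighbour j spreads its score in proportion to edge weight.
                out = sum(similarity[j][k] for k in range(n) if k != j)
                rank += similarity[j][i] * scores[j] / out
            new.append((1 - d) / n + d * rank)
        scores = new
    return scores

# Hypothetical symmetric similarity between four documents in one cluster.
sim = [
    [0, 0.8, 0.6, 0.1],
    [0.8, 0, 0.5, 0.1],
    [0.6, 0.5, 0, 0.2],
    [0.1, 0.1, 0.2, 0],
]
scores = textrank(sim)
print(max(range(4), key=scores.__getitem__))  # index of the top-ranked document
```

Documents that are strongly similar to many other well-ranked documents accumulate score, so the weakly connected fourth document ends up ranked last.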


2015 ◽  
Vol 15 (1) ◽  
pp. 91-121 ◽  
Author(s):  
Joanne Vera Stolk

Semantic analysis of the prenominal first person singular genitive pronoun (μου) in the Greek of the documentary papyri shows that the pronoun is typically found in the position between a verbal form and an alienable possessum which functions as the patient of the predicate. When the event expressed by the predicate is patient-affecting, the possessor is indirectly also affected. Hence the semantic role of this affected alienable possessor might be interpreted as a benefactive or malefactive in genitive possession constructions. By semantic extension the meaning of the genitive case in this position is extended into goal-oriented roles, such as addressee and recipient, which are commonly denoted by the dative case in Ancient Greek. The semantic similarity of the genitive and dative cases in these constructions might have provided the basis for the merger of the cases in the Greek language.


2017 ◽  
Vol 01 (01) ◽  
pp. 1630006 ◽  
Author(s):  
Flora Amato ◽  
Vincenzo Moscato ◽  
Antonio Picariello ◽  
Giancarlo Sperlí ◽  
Antonio D’Acierno ◽  
...  

In this paper, we present a general framework for retrieving relevant information from newspapers that exploits a novel summarization algorithm based on a deep semantic analysis of texts. In particular, we extract from each Web document a set of triples (subject, predicate, object) that are then used to build a summary through an unsupervised clustering algorithm exploiting the notion of semantic similarity. Finally, we leverage the centroids of the clusters to determine the most significant summary sentences using some heuristics. Several experiments carried out using the standard DUC methodology and the ROUGE software show how the proposed method outperforms several summarizer systems in terms of recall and readability.
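The triple-clustering step can be sketched with a greedy single-pass grouping of triples by slot overlap. The similarity measure and threshold are invented for illustration; the paper's unsupervised clustering and centroid heuristics are more elaborate:

```python
def triple_sim(t1, t2):
    """Fraction of (subject, predicate, object) slots shared by two triples."""
    return sum(a == b for a, b in zip(t1, t2)) / 3.0

def cluster_triples(triples, threshold=0.34):
    """Greedy single-pass clustering: attach a triple to the first cluster
    whose seed it resembles, otherwise start a new cluster."""
    clusters = []
    for t in triples:
        for cluster in clusters:
            if triple_sim(t, cluster[0]) >= threshold:
                cluster.append(t)
                break
        else:
            clusters.append([t])
    return clusters

# Hypothetical triples extracted from two news stories.
triples = [
    ("government", "announced", "budget"),
    ("government", "announced", "reform"),
    ("team", "won", "final"),
]
print(len(cluster_triples(triples)))
```

The two budget/reform triples land in one cluster and the sports triple in another; a summary sentence would then be chosen near each cluster's centroid.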


2010 ◽  
Vol 43 (1) ◽  
pp. 193-200 ◽  
Author(s):  
Nicholas S. Holtzman ◽  
John Paul Schott ◽  
Michael N. Jones ◽  
David A. Balota ◽  
Tal Yarkoni

2018 ◽  
Author(s):  
Zhi-Hui Luo ◽  
Meng-Wei Shi ◽  
Zhuang Yang ◽  
Hong-Yu Zhang ◽  
Zhen-Xia Chen

Motivation: An increasing number of disease causal genes have been identified through different methods, but there are still no uniform biomedical named entity (bio-NE) annotations of disease phenotypes. Furthermore, semantic similarity comparison between two bio-NE annotations, such as disease descriptions, has become important for data integration and systems genetics analysis.
Methods: The package pyMeSHSim performs bio-NE recognition using MetaMap, which produces Unified Medical Language System (UMLS) concepts from natural language. To map the UMLS concepts to MeSH, pyMeSHSim embeds an in-house dataset containing the Medical Subject Headings (MeSH) main headings (MHs), supplementary concept records (SCRs), and the relations between them. Based on this dataset, pyMeSHSim implements four information content (IC)-based algorithms and one graph-based algorithm to measure the semantic similarity between two MeSH terms.
Results: To evaluate its performance, we used pyMeSHSim to parse OMIM and GWAS phenotypes. The inclusion of SCRs and the curation strategy for non-MeSH-synonymous UMLS concepts improved pyMeSHSim's recognition of OMIM phenotypes. In the curation of GWAS phenotypes, pyMeSHSim and previous manual work recognized the same MeSH terms from 276/461 GWAS phenotypes, and the correlation between the semantic similarity calculated by pyMeSHSim and by another semantic analysis tool, meshes, was as high as 0.53-0.97.
Conclusion: With an embedded dataset including both MeSH MHs and SCRs, the integrative MeSH tool pyMeSHSim performs disease recognition, normalization, and comparison in biomedical text mining.
Availability: The package's source code and test datasets are available under the GPLv3 license at https://github.com/luozhhub/pyMeSHSim
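An IC-based similarity of the kind pyMeSHSim implements can be illustrated with a Resnik-style score on a tiny hand-made hierarchy. The terms and probabilities below are a hypothetical fragment, not real MeSH data, and this is one of several IC measures, not pyMeSHSim's code:

```python
from math import log

# Hypothetical MeSH-like tree: child -> parent.
parents = {
    "neoplasms": None,
    "lung neoplasms": "neoplasms",
    "breast neoplasms": "neoplasms",
    "small cell lung carcinoma": "lung neoplasms",
}
# Annotation probability of each term (counting its descendants).
freq = {
    "neoplasms": 1.0,
    "lung neoplasms": 0.4,
    "breast neoplasms": 0.3,
    "small cell lung carcinoma": 0.1,
}

def ancestors(term):
    """A term and all of its ancestors up to the root."""
    out = set()
    while term is not None:
        out.add(term)
        term = parents[term]
    return out

def resnik(t1, t2):
    """Information content (-log p) of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(-log(freq[t]) for t in common)

print(resnik("small cell lung carcinoma", "lung neoplasms"))
```

Terms whose closest shared ancestor is rare (high IC) score as more similar; two terms meeting only at the root score 0, which is what the second assertion below checks.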


Author(s):  
Khaoula Mrhar ◽  
Mounia Abik

Explicit Semantic Analysis (ESA) is an approach to measuring the semantic relatedness between terms or documents based on their similarity to documents of a reference corpus, usually Wikipedia. ESA has received tremendous attention in the fields of natural language processing (NLP) and information retrieval. However, ESA uses a huge Wikipedia index matrix in its interpretation step, multiplying a large matrix by a term vector to produce a high-dimensional vector. Consequently, the ESA process is expensive in both the interpretation and similarity steps, and its efficiency suffers because much time is lost in unnecessary operations. This paper proposes an enhancement to ESA, called optimize-ESA, that reduces the dimension at the interpretation stage by computing the semantic similarity within a specific domain. The experimental results show clearly that our method correlates much better with human judgement than the full version of the ESA approach.
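The ESA pipeline, mapping a text onto a vector of concept weights and comparing texts by cosine, can be sketched with a tiny hand-made term-concept index. The index entries and the domain restriction below are invented for illustration; the restriction merely gestures at the dimension-reduction idea behind optimize-ESA:

```python
from math import sqrt

# Hypothetical term -> concept weights (tiny stand-in for the Wikipedia index).
index = {
    "bank":  {"Finance": 0.9, "River": 0.6},
    "loan":  {"Finance": 0.8},
    "water": {"River": 0.9},
}

def interpret(text, restrict=None):
    """Map a text to a concept vector by summing its terms' concept weights.
    'restrict' keeps only concepts of a chosen domain, shrinking the vector."""
    vec = {}
    for term in text.lower().split():
        for concept, weight in index.get(term, {}).items():
            if restrict and concept not in restrict:
                continue
            vec[concept] = vec.get(concept, 0.0) + weight
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

print(cosine(interpret("bank loan"), interpret("water bank")))
```

Restricting interpretation to one domain (e.g. passing restrict={"Finance"}) drops the concept dimensions outside that domain before the similarity step, which is where the claimed speed-up comes from.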

