SEMANTIC SEARCH OF SERVICES

2013 ◽  
Vol 07 (03) ◽  
pp. 257-290 ◽  
Author(s):  
KE HAO ◽  
PHILLIP C-Y SHEU ◽  
HIROSHI YAMAGUCHI

This paper addresses semantic search of Web services using natural language processing. We first survey existing approaches, noting that the high cost of current semantic annotation frameworks limits the use of semantic search in large-scale applications. We then propose a service search framework based on the vector space model that combines the traditional frequency-weighted term-document matrix with syntactic information extracted from a lexical database and a dependency grammar parser. In particular, instead of using terms as the rows of the term-document matrix, we propose using synsets from WordNet, which distinguish the different meanings a word takes in different contexts and cluster different words with similar meanings. Based on the characteristics of Web service descriptions, we also propose an approach to identifying semantically important terms and adjusting their weights accordingly. Our experiments show that the proposed approach is effective.
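The synset-rows idea can be sketched in a few lines. The `SYNSET_OF` table below is a hypothetical stand-in for a WordNet lookup (the synset IDs and word list are invented for illustration), showing how synonyms collapse into a single matrix row:

```python
from collections import Counter

# Hypothetical stand-in for WordNet: several surface words map to one
# synset ID, so synonyms share a single row in the term-document matrix.
SYNSET_OF = {
    "car": "auto.n.01", "automobile": "auto.n.01",
    "search": "search.v.01", "find": "search.v.01",
    "service": "service.n.01",
}

def synset_document_vector(tokens):
    """Count synset occurrences instead of raw term occurrences."""
    return Counter(SYNSET_OF.get(t, t) for t in tokens)

doc_a = synset_document_vector(["car", "search", "service"])
doc_b = synset_document_vector(["automobile", "find", "service"])

# The two documents share only one surface word ("service"),
# yet they share all three synset rows.
shared_rows = set(doc_a) & set(doc_b)
```

In a real system the word-to-synset step would require disambiguation (the same surface word can belong to several synsets), which is exactly where the paper's use of context comes in.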

2019 ◽  
Vol 11 (5) ◽  
pp. 114 ◽  
Author(s):  
Korawit Orkphol ◽  
Wu Yang

Words have different meanings (i.e., senses) depending on the context. Disambiguating the correct sense is an important and challenging task for natural language processing. An intuitive approach is to select the sense whose definition has the highest similarity to the context, with sense definitions provided by WordNet, a large lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms interlinked through conceptual-semantic and lexical relations. Traditional unsupervised approaches compute similarity by counting words that overlap between the context and the sense definitions, requiring exact matches. Similarity should instead be computed from how words are related, by representing the context and the sense definitions in a vector space model and analyzing the distributional semantic relationships among them with latent semantic analysis (LSA). However, as a corpus of text grows, LSA consumes much more memory and does not scale flexibly to training on a huge corpus. A word-embedding approach has an advantage here: Word2vec is a popular word-embedding method that represents words in a fixed-size vector space through either the skip-gram or the continuous bag-of-words (CBOW) model, and it captures semantic and syntactic word similarities from a huge corpus more effectively than LSA. Our method uses Word2vec to construct a context sentence vector and sense definition vectors, then scores each word sense by the cosine similarity between those sentence vectors. The sense definitions are also expanded with sense relations retrieved from WordNet. If a score does not exceed a specific threshold, it is combined with the probability of that sense's distribution learned from SEMCOR, a large sense-tagged corpus. The senses with the highest scores are taken as the possible answers. 
Our method achieves a result (50.9%, or 48.7% without the probability of sense distribution) higher than the baselines (i.e., the original, simplified, adapted, and LSA Lesk) and outperforms many unsupervised systems that participated in the SENSEVAL-3 English lexical sample task.
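As a rough sketch of the scoring step (not the authors' implementation): tiny hand-made 3-d vectors stand in for trained Word2vec embeddings, the gloss words, threshold, and sense priors are all invented for illustration, and the way the prior is combined with the score is one plausible choice, since the abstract does not fix it:

```python
import math

# Hand-made 3-d vectors standing in for trained Word2vec embeddings
# (all values hypothetical).
EMB = {
    "bank": [0.5, 0.5, 0.0], "river": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.0], "money": [0.1, 0.9, 0.0],
    "deposit": [0.2, 0.8, 0.0], "slope": [0.9, 0.0, 0.1],
}

def sent_vec(words):
    """Average the word vectors to get one sentence vector."""
    vs = [EMB[w] for w in words if w in EMB]
    return [sum(col) / len(vs) for col in zip(*vs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

context = sent_vec(["river", "water"])
# Gloss words per candidate sense (illustrative, not real WordNet glosses).
senses = {
    "bank.n.01": ["slope", "river"],    # sloping land beside water
    "bank.n.02": ["money", "deposit"],  # financial institution
}
scores = {s: cosine(context, sent_vec(g)) for s, g in senses.items()}

# If no score clears the threshold, fall back on the sense prior; the
# combination rule and prior values here are assumptions, not the paper's.
THRESHOLD = 0.95
PRIOR = {"bank.n.01": 0.25, "bank.n.02": 0.75}
if max(scores.values()) <= THRESHOLD:
    scores = {s: v * PRIOR[s] for s, v in scores.items()}
best = max(scores, key=scores.get)
```

With a river-related context, the "sloping land" sense wins on cosine similarity alone, so the prior fallback never fires in this toy run.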


2019 ◽  
Vol 3 (2) ◽  
pp. 257-264
Author(s):  
Bayu Sugara ◽  
Dody Dody ◽  
Donny Donny

Information is now very easy to obtain anywhere, and information technology, particularly the internet, supports very rapid exchange of information. The internet has become an information and communication medium used by many people with many interests, especially for retrieving large-scale information; unfortunately, the information presented is sometimes not relevant. The quality of information depends on its relevance, accuracy, and timeliness, yet few effective search systems are available. This study discusses the implementation of an information retrieval system for finding and matching the symptoms of autism disorders using the Vector Space Model (VSM). The VSM measures the similarity between a document and a query: both are represented as vectors in an n-dimensional space, where n is the number of distinct terms. The purpose of this study was to design information retrieval software to find and match the symptoms of autism disorders. Using the VSM, the search engine matches text in the database against given keywords and presents the results as a ranked list.
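A minimal sketch of the VSM matching described above, with hypothetical symptom snippets as the document collection; a real system would add stemming and stopword removal:

```python
import math
from collections import Counter

# Hypothetical document snippets, invented for illustration.
docs = {
    "d1": "delayed speech and language skills",
    "d2": "repetitive behaviors and restricted interests",
    "d3": "difficulty with eye contact and social interaction",
}

def tfidf_vectors(texts):
    """Build smoothed TF-IDF vectors for a dict of id -> text."""
    tokenized = {k: v.split() for k, v in texts.items()}
    n = len(tokenized)
    df = Counter(t for toks in tokenized.values() for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = {k: {t: c * idf[t] for t, c in Counter(toks).items()}
            for k, toks in tokenized.items()}
    return vecs, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs, idf = tfidf_vectors(docs)
query = "speech and language delay"
qv = {t: c * idf.get(t, 0.0) for t, c in Counter(query.split()).items()}
# Present results ranked by similarity to the query, as the paper describes.
ranked = sorted(vecs, key=lambda d: cosine(qv, vecs[d]), reverse=True)
```

Here the query about speech and language correctly ranks the speech-delay snippet first, even though terms unseen in the collection ("delay") contribute nothing.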


2020 ◽  
Author(s):  
Yuqi Kong ◽  
Fanchao Meng ◽  
Ben Carterette

Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, tools for this task remain rare, and most relevant methods are devised from statistical or vector space model perspectives, with nearly none taking a topological perspective. In this paper, we hope to strike a different note: we propose a novel algorithm based on topological persistence for comparing the semantic similarity of two documents. Our experiments are conducted on a document dataset with human judges' results, and a collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm produces highly human-consistent results and outperforms most state-of-the-art methods, tying with NLTK.


2009 ◽  
Vol 18 (02) ◽  
pp. 239-272 ◽  
Author(s):  
SUJEEVAN ASEERVATHAM

Kernels are widely used in Natural Language Processing as similarity measures within inner-product-based learning methods such as the Support Vector Machine. The Vector Space Model (VSM) is extensively used for the spatial representation of documents, but it is a purely statistical representation. In this paper, we present a Concept Vector Space Model (CVSM) representation that uses linguistic prior knowledge to capture the meanings of documents, and we propose a linear kernel and a latent kernel for this space. The linear kernel takes advantage of the linguistic concepts, whereas the latent kernel combines statistical and linguistic concepts: it uses latent concepts extracted by Latent Semantic Analysis (LSA) in the CVSM. The kernels were evaluated on a text categorization task in the biomedical domain using the Ohsumed corpus, which is well known for being difficult to categorize. The results show that the CVSM improves performance compared to the VSM.
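The linear kernel over the concept space can be sketched as follows. The `CONCEPT` table is a toy stand-in for the linguistic prior knowledge (the paper works in the biomedical domain, so a real mapping would come from a medical thesaurus), and the latent kernel would additionally apply LSA to these concept vectors:

```python
from collections import Counter

# Toy concept map standing in for linguistic prior knowledge;
# all entries are invented for illustration.
CONCEPT = {
    "heart": "C_cardiac", "cardiac": "C_cardiac",
    "tumor": "C_neoplasm", "neoplasm": "C_neoplasm", "cancer": "C_neoplasm",
}

def concept_vector(tokens):
    """Map each word to its concept and count occurrences (phi in the CVSM)."""
    return Counter(CONCEPT.get(t, t) for t in tokens)

def linear_kernel(u, v):
    """k(d1, d2) = <phi(d1), phi(d2)> computed in the concept space."""
    return sum(c * v.get(t, 0) for t, c in u.items())

d1 = concept_vector("cardiac tumor study".split())
d2 = concept_vector("heart cancer report".split())
k = linear_kernel(d1, d2)  # synonyms match through shared concepts
```

In a plain term-space VSM these two snippets would have zero inner product; mapping words to concepts first gives them a nonzero kernel value, which is the point of the CVSM.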


2018 ◽  
Vol 17 (2) ◽  
pp. 313-324 ◽  
Author(s):  
Abdul Majid ◽  
Mukhtaj Khan ◽  
Nadeem Iqbal ◽  
Mian Ahmad Jan ◽  
Mushtaq Khan ◽  
...  


Author(s):  
Anthony Anggrawan ◽  
Azhari

Information searching based on users' queries, which aims to find the documents that meet users' needs, is known as Information Retrieval. This research uses the Vector Space Model method to determine the similarity percentage of each student's assignment, implemented with PHP and a MySQL database. The result is presented by ranking documents according to their similarity to the query, with a mean average precision of 0.874, which indicates how closely the application agrees with examination by experts; this value was obtained from an evaluation with 5 queries compared against 25 sample documents. The more assignments with high similarity there are to count, the more time the similarity computation requires, depending on the number of submitted assignments.
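Mean average precision, the evaluation measure quoted above, can be computed as follows; the ranking and relevance sets here are invented for illustration (MAP is then the mean of AP over all queries, five in the paper):

```python
def average_precision(ranked, relevant):
    """AP: mean of precision@k over the ranks k where a relevant doc appears."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Hypothetical ranking for one query against five assignments.
ap = average_precision(["a3", "a1", "a5", "a2"], relevant={"a3", "a5"})
```

The relevant documents appear at ranks 1 and 3, giving precisions 1/1 and 2/3, so AP is their mean, 5/6.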


2018 ◽  
Vol 9 (2) ◽  
pp. 97-105
Author(s):  
Richard Firdaus Oeyliawan ◽  
Dennis Gunawan

A library is one of the facilities that provides information and knowledge resources, acting as an academic helper for readers seeking information. Because of the huge number of books a library holds, readers usually have difficulty finding the books they need. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it still lacks a recommendation feature to help readers find books relevant to a specific book they have chosen. The application described here was developed using the Vector Space Model to represent each document as a vector; its recommendations are based on the similarity of the books' descriptions. In a testing phase using a one-language sample of relevant books, the F-measure obtained was 55% at a cosine similarity threshold of 0.1. The books' descriptions and the variety of languages affect the F-measure obtained.

Index Terms—Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
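The F-measure reported above is the harmonic mean of precision and recall over the recommended set; a minimal sketch with hypothetical book IDs (the idea being that books whose description similarity exceeds the 0.1 threshold form the recommended set):

```python
def f_measure(recommended, relevant):
    """F1 = harmonic mean of precision and recall for a recommendation set."""
    tp = len(recommended & relevant)  # correctly recommended books
    if tp == 0:
        return 0.0
    precision = tp / len(recommended)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)

# Hypothetical sets: books passing the similarity threshold vs. the
# books judged relevant by the evaluators.
recommended = {"b1", "b2", "b4"}
relevant = {"b1", "b3", "b4", "b5"}
f1 = f_measure(recommended, relevant)
```

With 2 true positives out of 3 recommendations and 4 relevant books, precision is 2/3 and recall is 1/2, giving F1 = 4/7 (about 57%), in the same spirit as the 55% the paper reports.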

