Efficient Visualisation of the Relative Distribution of Keyword Search Results in a Corpus Data Cube

Author(s):  
Mark Sifer ◽  
Yutaka Watanobe ◽  
Subhash Bhalla
2014 ◽  
Vol 136 (11) ◽  
Author(s):  
Michael W. Glier ◽  
Daniel A. McAdams ◽  
Julie S. Linsey

Bioinspired design is the adaptation of methods, strategies, or principles found in nature to solve engineering problems. One formalized approach to bioinspired solution seeking is to abstract the engineering problem into a functional need and then seek solutions to this function using a keyword-type search on text-based biological knowledge. These function keyword search approaches have shown potential for success, but as with many text-based search methods, they produce a large number of results, many of little relevance to the problem in question. In this paper, we develop a method to train a computer to identify text passages more likely to suggest a solution to a human designer. The work presented examines the possibility of filtering biological keyword search results by using text mining algorithms to automatically identify which results are likely to be useful to a designer. The text mining algorithms are trained on a pair of surveys administered to human subjects to empirically identify a large number of sentences that are, or are not, helpful for idea generation. We develop and evaluate three text classification algorithms, namely, a Naïve Bayes (NB) classifier, a k-nearest neighbors (kNN) classifier, and a support vector machine (SVM) classifier. Of these methods, the NB classifier generally had the best performance. Based on the analysis of 60 word stems, an NB classifier's precision is 0.87, recall is 0.52, and F score is 0.65. We find that word stem features that describe a physical action or process are correlated with helpful sentences. Similarly, we find that biological jargon feature words are correlated with unhelpful sentences.
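The reported figures can be cross-checked with a short calculation. The sketch below assumes the "F score" in the abstract is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_score` is ours.

```python
def f_score(precision, recall):
    """Balanced F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the NB classifier in the abstract.
nb_f = f_score(0.87, 0.52)
print(f"{nb_f:.2f}")  # prints 0.65
```

Under that assumption, the precision of 0.87 and recall of 0.52 reproduce the stated F score of 0.65.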


Author(s):  
Weidong Yang ◽  
Hao Zhu

This chapter first analyzes and compares the LCA-based (lowest common ancestor) approaches to XML keyword search. It explores several fundamental flaws of LCA-based models, the most important being that search results are fixed once determined and cannot be adjusted. The chapter then presents AdaptiveXKS, a system for adaptive keyword search in XML that employs a novel and flexible result model to avoid these defects. Within the new model, a scoring function judges the quality of each result; the metrics it considers are weighted, and the weights can be updated as needed. Through the interface, the system administrator or the users can adjust parameters according to their search intentions, and can freely choose one of three search algorithms to suit specific query requirements. Section 1 gives the introduction and motivation. Section 2 defines the result model. Section 3 discusses the scoring function in depth. Section 4 presents the system implementation and details the keyword search algorithms. Section 5 presents the experiments. Section 6 covers related work. Section 7 concludes the chapter.
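The idea of a scoring function over weighted, adjustable metrics can be illustrated with a minimal sketch. This is not the AdaptiveXKS scoring function, which the abstract does not specify; it assumes a simple normalized weighted sum, and the metric names (`compactness`, `coverage`) are hypothetical placeholders.

```python
def score_result(metrics, weights):
    """Normalized weighted sum of per-result quality metrics.

    `metrics` and `weights` map metric names to numbers. The metric names
    are hypothetical; the abstract does not list the actual metrics.
    """
    total_weight = sum(weights[name] for name in metrics)
    if total_weight == 0:
        return 0.0
    return sum(weights[name] * value for name, value in metrics.items()) / total_weight


# Two candidate results described by the same hypothetical metrics.
results = {
    "r1": {"compactness": 0.8, "coverage": 0.5},
    "r2": {"compactness": 0.4, "coverage": 0.9},
}

# Changing the weights changes the ranking, which is the adaptive part.
favor_compactness = {"compactness": 2.0, "coverage": 1.0}
favor_coverage = {"compactness": 1.0, "coverage": 2.0}

for weights in (favor_compactness, favor_coverage):
    ranking = sorted(results, key=lambda r: score_result(results[r], weights), reverse=True)
    print(ranking)
```

With the first weighting `r1` ranks above `r2`, and with the second the order flips, which shows how adjusting weights lets the same candidate results be re-ranked for a different search intention.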


2016 ◽  
Vol 34 (4) ◽  
pp. 705-732 ◽  
Author(s):  
Young Man Ko ◽  
Min Sun Song ◽  
Seung Jun Lee

Purpose: The purpose of this paper is to construct a structural definition-based terminology ontology system that defines the meanings of academic terms on the basis of properties and links terms with properties that are structured by conceptual categories (classes). This study also aims to test the possibility of semantic searches by generating inference rules and setting up highly complex search scenarios. Design/methodology/approach: For the study, 55,236 keywords from the articles of the “Korea Citation Index” were structurally defined, and relationships among terms and properties were built. The authors then converted the RDB data into RDF and designed ontologies using the ontology development tool Protégé. The authors also tested the designed ontology with the inference engine of the Protégé editor. The generated reference rules were tested by TBox and SPARQL queries. Findings: The authors generated inference control rules targeting high-input-ratio data in the properties of classes by calculating the input ratio of real input data in the system. They then executed semantic searches via SPARQL queries under highly complex search scenarios for which it would be difficult to deduce results via a simple keyword search. The search results were confirmed to show the logical combination of semantically related term data. Practical implications: Because the proposed terminology ontology system was constructed with the author keywords from research papers, it will be useful for retrieving research papers that contain those keywords through complex combinations of semantic relations. The Structural Terminology Net database could also be utilized as an index database in retrieval services and in the mining of informal big data through the application of well-defined semantic concepts to each term. Originality/value: This paper presents a methodology for supporting IR using expanded queries based on a novel model of structural terminology-based ontology. A user who wants to access a specific topic can create a query that retrieves semantically relevant information. The search results show the logical combination of semantically related term data, which would be difficult to deduce via traditional IR systems.


Author(s):  
Ji Ke ◽  
J. S. Wallace ◽  
L. H. Shu

Biology is a good source of analogies for engineering design. One approach to retrieving biological analogies is to perform keyword searches on natural-language sources such as books and journals. A challenge of retrieving information from natural-language sources is the potential need to process a large number of search results. This paper describes a categorization method that organizes a large group of diverse biological information into meaningful categories. The benefits of the categorization functionality are demonstrated through a case study on the redesign of a fuel cell bipolar plate. In this case study, our categorization method reduced the effort to systematically identify biological phenomena by up to ∼80%.


2021 ◽  
Vol 17 (2) ◽  
pp. 1-10
Author(s):  
Hussein Mohammed ◽  
Ayad Abdulsada

Searchable encryption (SE) is a tool that enables clients to outsource their encrypted data to external cloud servers with unlimited storage and computing power while retaining the ability to search their data without decryption. Current SE solutions support only single-keyword search, making them impractical in real-world scenarios. In this paper, we design and implement a multi-keyword similarity search scheme over encrypted data by using locality-sensitive hashing functions and a Bloom filter. The proposed scheme can tolerate common spelling mistakes and offers enhanced security properties, such as hiding the access and search patterns, at the cost of higher latency. To support similarity search, we utilize an efficient bi-gram-based method for keyword transformation, which improves the accuracy of search results. Our scheme employs two non-colluding servers to break the correlation between search queries and search results. Experiments using real-world data illustrate that our scheme is practically efficient, secure, and retains high accuracy.
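The bi-gram transformation and Bloom filter encoding mentioned above can be sketched in plaintext to show why a misspelled keyword still resembles the intended one. This is an illustration only and not the authors' scheme: it omits encryption, locality-sensitive hashing, and the two-server protocol, and the padding character, filter size, hash construction, and all function names are assumptions made for the example.

```python
import hashlib


def bigrams(keyword):
    """Split a keyword into its set of character bi-grams, padded with '#'."""
    padded = f"#{keyword.lower()}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_positions(item, num_bits=256, num_hashes=3):
    """Bit positions an item sets in a Bloom filter (seeded SHA-256 as an assumed hash family)."""
    positions = set()
    for seed in range(num_hashes):
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        positions.add(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def keyword_filter(keyword, num_bits=256, num_hashes=3):
    """Represent a keyword as the set of Bloom filter bits set by its bi-grams."""
    bits = set()
    for gram in bigrams(keyword):
        bits |= bloom_positions(gram, num_bits, num_hashes)
    return bits


def jaccard(a, b):
    """Jaccard similarity of two sets; 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# A transposition typo keeps several bi-grams, an unrelated word keeps none.
print(jaccard(bigrams("search"), bigrams("serach")))
print(jaccard(bigrams("search"), bigrams("cloud")))
```

Because "search" and "serach" share the bi-grams `#s`, `se`, `ch`, and `h#`, their filters overlap on the bits those bi-grams set, whereas "search" and "cloud" share no bi-grams at all. A similarity-preserving hash over such representations is what would let a server match near-miss queries.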


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Young Man Ko ◽  
Min Sun Song ◽  
Seung Jun Lee

Purpose: This study aims to develop metadata of conceptual elements based on the text structure of research articles on Korean studies, to propose a search algorithm that reflects the combination of semantically relevant data in accordance with the search intention of research papers, and to examine whether the algorithm makes a difference in intention-based search results. Design/methodology/approach: This study constructed a metadata database of 5,007 research articles on Korean studies arranged by conceptual elements of text structure and developed the F1(w)-score, which weights conceptual elements based on the F1-score and the number of data points from each element. This study evaluated the algorithm by comparing search results of the F1(w)-score algorithm with those of the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm and simple keyword search. Findings: The authors find that the higher the F1(w)-score, the closer the semantic relevance to the search intention. Furthermore, search results generated by the F1(w)-score were more closely related to the search intention than those of TF-IDF and simple keyword search. Research limitations/implications: Although the F1(w)-score was developed in this study to evaluate the search results of a metadata database structured by conceptual elements of the text structure of Korean studies, the algorithm can be used as a tool for searching such a database, though a weight-tuning process is required. Practical implications: A metadata database based on text structure and a search method based on weights of metadata elements (the F1(w)-score) can be useful for interdisciplinary studies, especially for semantic search in regional studies. Originality/value: This paper presents a methodology for supporting IR using the F1(w)-score, a novel model for weighting metadata elements based on text structure. The F1(w)-score-based search results show the combination of semantically relevant data, which are otherwise difficult to search for using similarity of search words.

