query word
Recently Published Documents

TOTAL DOCUMENTS: 17 (FIVE YEARS: 8)
H-INDEX: 4 (FIVE YEARS: 1)

Author(s):  
Usha Yadav ◽  
Neelam Duhan

With the evolution of Web 3.0, traditional Web 2.0 search algorithms will become obsolete and underperform in retrieving precise and accurate information from the growing Semantic Web. It is reasonable to presume that common users possess no understanding of the ontology used in the knowledge base or of SPARQL queries, so providing easy access to this enormous knowledge base for users of all levels is challenging: the ability to effortlessly formulate a structured query such as SPARQL varies widely. In this paper, a Semantic Web based search methodology is proposed that converts a natural-language user query into a SPARQL query, which can be directed to a domain-ontology-based knowledge base. Each query word is mapped to the relevant concepts or relations in the ontology, and a score is assigned to each mapping to find the best possible mapping for query generation. The mappings with the highest scores, together with interrogatives and other function words, are used to formulate the final SPARQL query. If no search result is retrieved from the knowledge base, then instead of returning null to the user, the query is directed to Web 3.0: the top “k” documents are converted into RDF format using the Text2Onto tool, and a corpus of semantically structured web documents is built. In parallel, a semantic crawl agent gathers <Subject-Predicate-Object> triples from the semantic wiki. A term-frequency matrix and a co-occurrence matrix are computed over the corpus, followed by singular value decomposition (SVD), to find the results relevant to the user query. The evaluation shows that the proposed system is efficient in terms of execution time, precision, recall, and F-measure.
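A minimal sketch of the word-to-ontology mapping and scoring step described above, assuming a simple string-similarity score; the ontology labels, threshold, and SPARQL template here are illustrative assumptions, not details from the paper:

```python
from difflib import SequenceMatcher

# Illustrative ontology vocabulary (label -> (kind, URI)); not from the paper.
ONTOLOGY = {
    "author": ("property", "ex:hasAuthor"),
    "book":   ("class",    "ex:Book"),
    "title":  ("property", "ex:hasTitle"),
}

def score(word, label):
    # String similarity stands in for the paper's mapping score.
    return SequenceMatcher(None, word.lower(), label.lower()).ratio()

def best_mapping(word, threshold=0.6):
    # Keep only the highest-scoring mapping above the threshold.
    label = max(ONTOLOGY, key=lambda lab: score(word, lab))
    return ONTOLOGY[label] if score(word, label) >= threshold else None

def to_sparql(query_words):
    # Assemble a rudimentary SELECT query from the surviving mappings.
    triples = []
    for i, word in enumerate(query_words):
        mapping = best_mapping(word)
        if mapping is None:
            continue
        kind, uri = mapping
        triples.append(f"?s a {uri} ." if kind == "class"
                       else f"?s {uri} ?o{i} .")
    return "SELECT ?s WHERE { " + " ".join(triples) + " }"

print(to_sparql(["who", "wrote", "book"]))
# -> SELECT ?s WHERE { ?s a ex:Book . }
```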


Author(s):  
Hamza Ghilas ◽  
Meriem Gagaoua ◽  
Abdelkamel Tari ◽  
Mohamed Cheriet

This paper addresses the challenging task of word spotting in Arabic handwritten documents. We propose a novel feature called Spatial Distribution of Ink at Keypoints (SDIK), which captures the characteristics of Arabic handwriting concentrated at endpoints and branch points by quantizing the spatial distribution of ink pixels in the neighborhoods of keypoints. The resulting SDIK features are very fast to match; we take advantage of this to match a query word against line images rather than word images, a matching mechanism that avoids the hard task of segmenting an Arabic document into words. The proposed method is tested on historical Arabic documents with the IBN SINA dataset and on modern handwriting with the IFN/ENIT database. The obtained results are of great interest for retrieving query words in Arabic documents.
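A rough sketch of how such a descriptor might quantize ink around a keypoint on a binary ink image; the neighborhood radius and quantization grid are assumptions for illustration, not the paper's exact parameters:

```python
import numpy as np

def sdik_descriptor(ink, keypoint, radius=8, grid=4):
    # ink: binary image (1 = ink pixel); keypoint: (row, col) of an
    # endpoint or branch point. The descriptor quantizes the spatial
    # distribution of ink in the keypoint's neighborhood into a
    # grid x grid map of ink densities.
    y, x = keypoint
    patch = ink[max(0, y - radius):y + radius, max(0, x - radius):x + radius]
    h, w = patch.shape
    desc = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            cell = patch[i * h // grid:(i + 1) * h // grid,
                         j * w // grid:(j + 1) * w // grid]
            desc[i, j] = cell.mean() if cell.size else 0.0
    return desc.ravel()  # compact vector, cheap to match along a line image
```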


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4648
Author(s):  
Subhranil Kundu ◽  
Samir Malakar ◽  
Zong Woo Geem ◽  
Yoon Young Moon ◽  
Pawan Kumar Singh ◽  
...  

Handwritten keyword spotting (KWS) is of great interest to the document image research community. In this work, we propose a learning-free keyword spotting method following the query-by-example (QBE) setting for handwritten documents. It consists of four key processes: pre-processing, vertical zone division, feature extraction, and feature matching. The pre-processing step deals with the noise found in the word images and with the skew of the handwriting caused by individuals' varied writing styles. Next, vertical zone division splits the word image into several zones, with the number of zones guided by the number of letters in the query word image; during experimentation, this count is obtained from the text encoding of the query word image, which the user provides to the system. The feature extraction process uses the Hough transform. The last step, feature matching, compares the features extracted from the word images and generates a similarity score. The performance of this algorithm has been tested on three publicly available datasets: IAM, QUWI, and ICDAR KWS 2015. The proposed method outperforms the state-of-the-art learning-free KWS methods considered for comparison on these datasets. We also evaluate the present KWS model using state-of-the-art deep features and find that the features used in this work perform better than deep features extracted with the InceptionV3, VGG19, and DenseNet121 models.
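A brief sketch of the zone-division and matching steps, under stated assumptions: equal-width zones and a Euclidean distance placeholder stand in for the paper's actual boundary placement and matching rule:

```python
import numpy as np

def vertical_zones(word_img, n_letters):
    # One vertical zone per letter of the query word; equal-width
    # boundaries are an assumption, the paper may place them differently.
    h, w = word_img.shape
    bounds = np.linspace(0, w, n_letters + 1, dtype=int)
    return [word_img[:, bounds[i]:bounds[i + 1]] for i in range(n_letters)]

def match_score(query_feats, cand_feats):
    # Zone-wise feature comparison aggregated into a single similarity
    # score; the per-zone features would come from the Hough transform.
    dists = [np.linalg.norm(q - c) for q, c in zip(query_feats, cand_feats)]
    return 1.0 / (1.0 + float(np.mean(dists)))
```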


Author(s):  
Wienke Wannagat ◽  
Gesine Waizenegger ◽  
Gerhild Nieding

In an experiment with 114 children aged 9–12 years, we compared the ability to establish local and global coherence of narrative texts between auditory and audiovisual (auditory text and pictures) presentation. The participants listened to a series of short narrative texts, in each of which a protagonist pursued a goal. Following each text, we collected the response time to a query word that was associated with either a near or a distant causal antecedent of the final sentence. Analysis of these response times indicated that audiovisual presentation has advantages over auditory presentation for accessing information relevant for establishing both local and global coherence, but there are indications that this effect may be slightly more pronounced for global coherence.


2020 ◽  
Vol 9 (1) ◽  
pp. 97
Author(s):  
Maula Khatami

Journal articles report research that is very useful to academics and students alike. Whenever we learn something new, we need a guide that is verified and credible, and journals provide exactly that: they give students and academics references to previous research and additional insights, so that they can conduct related research and even improve on earlier work. However, many students and academics still find it difficult to locate the right journal for their needs. The authors therefore build an information retrieval system for journal search by query words using the vector space model method. In the suffix tree clustering method and the vector space model, each document and keyword that has passed through the text-mining process is weighted per word using the Term Frequency - Inverse Document Frequency (TF-IDF) weighting algorithm.
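A compact sketch of the TF-IDF weighting and vector-space similarity described above; tokenization is assumed to be already done, and the exact weighting variant the authors use may differ:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # docs: list of tokenized documents (lists of words).
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({w: (tf[w] / len(doc)) * math.log(n / df[w])
                        for w in tf})
    return vectors

def cosine(a, b):
    # Cosine similarity between two sparse TF-IDF vectors; ranking the
    # documents by similarity to the query vector yields the search result.
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```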


Author(s):  
A. Kutuzov ◽  
◽  
V. Fomin ◽  
V. Mikhailov ◽  
J. Rodina ◽  
...  

We present the ShiftRy web service, which helps to analyze temporal changes in word usage in news texts from Russian mass media. For that, we employ diachronic word embedding models trained on large Russian news corpora from 2010 up to 2019. Users can explore the usage history of any given query word, or browse lists of words ranked by the degree of their semantic drift between any pair of years. Visualizations of the words' trajectories through time are provided. Importantly, users can obtain corpus examples with the query word before and after the semantic shift (if any). The aim of ShiftRy is to ease the task of studying word history over short time spans, and the influence of social and political events on word usage. The service will be updated with new data yearly.
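One plausible way to rank words by semantic drift between two diachronic models, assuming the yearly embedding spaces have already been aligned (e.g., via Procrustes); the word-to-vector dictionaries are an illustrative data layout, not ShiftRy's API:

```python
import numpy as np

def semantic_shift(word, vecs_a, vecs_b):
    # Cosine distance between a word's vectors in two yearly models.
    v, u = vecs_a[word], vecs_b[word]
    cos = v @ u / (np.linalg.norm(v) * np.linalg.norm(u))
    return 1.0 - cos  # larger value = stronger drift

def rank_by_drift(vecs_a, vecs_b, k=10):
    # Words shared by both years, ranked by degree of semantic drift.
    shared = set(vecs_a) & set(vecs_b)
    return sorted(shared, key=lambda w: semantic_shift(w, vecs_a, vecs_b),
                  reverse=True)[:k]
```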


Author(s):  
Milana Grbić

Retrieving information from large document databases has been a focus of scientific research in recent years. In this paper, a parallel algorithm for searching biomedical documents based on the MapReduce technique is presented. The algorithm consists of three phases: a preprocessing phase, a document representation phase, and a searching phase. In the first phase, lemmatization and elimination of stop words are performed. In the second phase, each document is represented as a list of (word, tf-idf index of the word) pairs. The third phase is the main searching procedure. It uses a specially designed ranking criterion based on a combination of the term frequency - inverse document frequency (tf-idf) index and an indicator function for each query word. Four different versions of the ranking criterion are proposed and analyzed. The algorithm's performance is tested on different subsets of the large and well-known PubMed biomedical document database. The experimental results indicate that the proposed parallel algorithm finds high-quality results in a reasonable time. Compared to the sequential variant, the experiments show that the parallel algorithm is more efficient, since it finds high-quality solutions in significantly less time.
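A minimal MapReduce-style sketch of the searching phase; "indicator function times tf-idf" is one plausible reading of the ranking criterion, not all four of the paper's variants:

```python
from collections import defaultdict

def map_phase(doc_id, tfidf, query_words):
    # Emit (doc_id, partial score) for each query word. The indicator
    # function keeps only words that actually occur in the document.
    for word in query_words:
        if word in tfidf:              # indicator function
            yield doc_id, tfidf[word]  # tf-idf contribution

def reduce_phase(pairs):
    # Sum the partial scores per document and rank the results.
    scores = defaultdict(float)
    for doc_id, partial in pairs:
        scores[doc_id] += partial
    return sorted(scores.items(), key=lambda kv: -kv[1])
```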


Author(s):  
Dai Dai ◽  
Xinyan Xiao ◽  
Yajuan Lyu ◽  
Shan Dou ◽  
Qiaoqiao She ◽  
...  

Joint entity and relation extraction detects entities and relations using a single model. In this paper, we present a novel unified joint extraction model that directly tags entity and relation labels according to a query word position p, i.e., it detects the entity at p and identifies entities at other positions that have a relationship with it. To this end, we first design a tagging scheme that generates n tag sequences for an n-word sentence. A position-attention mechanism is then introduced to produce a different sentence representation for every query position, in order to model these n tag sequences. In this way, our method can simultaneously extract all entities with their types, as well as all overlapping relations. Experimental results show that our framework performs significantly better at extracting overlapping relations and at detecting long-range relations, and we thus achieve state-of-the-art performance on two public datasets.
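A toy rendering of the "one tag sequence per query position" idea, assuming simple token-span entities and pre-known gold annotations; BIO prefixes and the actual neural tagger are omitted, and the data layout is illustrative:

```python
def tag_sequences(n, entities, relations):
    # entities: list of (start, end, type) token spans; relations: list of
    # (head_idx, tail_idx, label) over entity indices. For each query
    # position p, tag the entity covering p with its type and every
    # related entity with the relation label; all other tokens get "O".
    spans = {i: range(s, e) for i, (s, e, _) in enumerate(entities)}
    sequences = []
    for p in range(n):
        tags = ["O"] * n
        head = next((i for i, r in spans.items() if p in r), None)
        if head is not None:
            for t in spans[head]:
                tags[t] = entities[head][2]   # entity detected at p
            for h, tail, label in relations:
                if h == head:
                    for t in spans[tail]:
                        tags[t] = label       # related entity elsewhere
        sequences.append(tags)
    return sequences

# "Trump was born in New York": one tag sequence per query position.
ents = [(0, 1, "PER"), (4, 6, "LOC")]
rels = [(0, 1, "Birth-Place")]
for seq in tag_sequences(6, ents, rels):
    print(seq)
```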


2018 ◽  
Vol 189 ◽  
pp. 03009
Author(s):  
Angelica M. Aquino ◽  
Enrico P. Chavez

Document classification is the process of automatically categorizing documents from many mixed files [1]. In this paper, an approach to classifying documents from admin-case files of the Philippine National Police (PNP) using the Latent Semantic Indexing (LSI) method is proposed. A model representing term-to-term, document-to-document, and term-to-document relationships is applied. Regular expressions are also implemented to define search patterns based on character strings, which LSI uses to establish the semantic relevance of those strings to the search term or keyword. The aim of the study is to evaluate the performance of LSI in classifying PNP documents; experiments were run in software to test the capability of LSI for text retrieval. Indexing follows the patterns matched in the text collection and uses the singular value decomposition (SVD) model. In tests, documents were indexed based on file relationships, and the system was able to return search results as the information retrieved from PNP files. Weights are used to check the accuracy of the method; the positive query-similarity values are regarded as the most relevant among the related searches, meaning the query word matches words in a text file and a query result is returned.
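A standard LSI sketch of the SVD-based retrieval step described above; the truncation rank k and the term-document matrix layout are illustrative choices, not values from the study:

```python
import numpy as np

def lsi_rank(term_doc, query_vec, k=2):
    # term_doc: terms x documents weight matrix; query_vec: vector in
    # term space. Truncated SVD projects both into a k-dimensional
    # latent space, where cosine similarity ranks the documents.
    U, S, Vt = np.linalg.svd(term_doc, full_matrices=False)
    Uk, Sk, docs_k = U[:, :k], np.diag(S[:k]), Vt[:k].T
    q_k = np.linalg.inv(Sk) @ Uk.T @ query_vec   # fold query into the space
    sims = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1)
                           * np.linalg.norm(q_k) + 1e-12)
    return np.argsort(-sims)  # document indices, most relevant first
```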

