A spatial text query scheme based on semantic-aware

Author(s):  
Hongbo Li ◽  
Hong Zhu ◽  
Zongmin Cui
Author(s):  
Duhita Pawar ◽  
Vina M. Lomte

This paper presents a detailed survey of facet mining techniques, including their advantages and disadvantages. A facet is any word or phrase that summarizes an important aspect of a web query. Researchers have proposed a range of efficient techniques that improve the user's web search experience. Users are satisfied when they find information relevant to their query among the top results. The objectives of their research are: (1) to present an automated solution for deriving query facets by analyzing the text query; (2) to create a taxonomy of query refinement strategies for efficient results; and (3) to personalize search according to user interest.


Author(s):  
Blaž Fortuna ◽  
Nello Cristianini ◽  
John Shawe-Taylor

We present a general method using kernel canonical correlation analysis (KCCA) to learn a semantic representation of text from an aligned multilingual collection of text documents. The semantic space provides a language-independent representation of text and enables comparison between text documents written in different languages. In experiments, we apply KCCA to cross-lingual retrieval of text documents, where the text query is written in only one language, and to cross-lingual text categorization, where we train a cross-lingual classifier.


2003 ◽  
Vol 12 (02) ◽  
pp. 161-195 ◽  
Author(s):  
LIANGYOU CHEN ◽  
HASAN M. JAMIL

Similar to most scientific studies, biological analyses demand a great deal of computation and simulation involving sophisticated tools that are often geographically distributed over the Internet. A worldwide effort in genomics research has resulted in a powerful collection of publicly available sequence analysis tools. These tools often require specialized local services and domain knowledge to function correctly, rendering them unlikely candidates for integration into remote database applications. Thus, integration of heterogeneous "functions" remains an open problem. Providing a reasonable framework for seamless integration of these tools with database query engines will enable application developers to exploit and harness the power of these effective analysis tools. In this paper, we present an integration framework for such tools that enables access to them in a user-transparent way as part of database queries. In our system, such online tools are abstracted as remote user defined functions (RUDF). An extended SQL DDL language, called the Internet Function Definition Language (IFDL), is presented for the specification and definition of RUDFs. The interface between the database system and the Internet is implemented using a layer based on a language called the Hyper Text Query Language (HTQL). The separation of IFDL, DDL, HTQL and SQL DML offers several optimization opportunities and makes it possible to develop an architecture for interoperability of heterogeneous databases with RUDFs in simpler and more efficient ways.


2012 ◽  
Vol 13 (2) ◽  
pp. 101-110 ◽  
Author(s):  
Elchin S. Julfayev ◽  
Ryan J. McLaughlin ◽  
Yi-Ping Tao ◽  
William A. McLaughlin

Author(s):  
Siham Jabri ◽  
Azzeddine Dahbi ◽  
Taoufiq Gadi

Pseudo-relevance feedback is a query expansion approach whose terms are selected from a set of top-ranked documents retrieved in response to the original query. However, the selected terms will not be related to the query if the top retrieved documents are irrelevant. As a result, retrieval performance for the expanded query is not improved compared to the original one. This paper suggests using the documents selected by pseudo-relevance feedback to generate association rules. To this end, an algorithm based on dominance relations is applied. The strong correlations between the query and other terms are then detected, and an oriented and weighted graph called Pseudo-Graph Feedback is constructed. This graph serves to expand original queries with terms that are semantically related and selected by the user. Experiments on a Text Retrieval Conference (TREC) collection show significant improvements: the proposed approach achieves the best results compared to both the baseline system and an existing technique.
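
As a rough illustration of the baseline idea only, the following Python sketch expands a query with the terms that occur in the most top-ranked documents. It is plain pseudo-relevance feedback, not the association-rule or Pseudo-Graph Feedback method of the paper; the function name, parameters, and example documents are assumptions made for this sketch.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k=3, n_new_terms=2, stopwords=frozenset()):
    """Plain pseudo-relevance feedback (hypothetical helper, not the paper's method).

    Treats the first top_k documents as relevant and appends the terms that
    appear in the largest number of them.
    """
    query_set = set(query_terms)
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        # Count each term once per feedback document.
        for term in set(doc.lower().split()):
            if term not in query_set and term not in stopwords:
                counts[term] += 1
    # Highest count first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:n_new_terms]]


# Made-up example documents, already ranked by an initial retrieval run.
docs = [
    "jaguar car engine speed",
    "jaguar car dealer price",
    "jaguar engine repair car",
    "jaguar cat jungle",
]
expanded = expand_query(["jaguar"], docs, top_k=3, n_new_terms=2)
```

If the top-ranked documents were off-topic, the added terms would be off-topic as well, which is the weakness the abstract describes.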


2019 ◽  
Vol 11 (1) ◽  
Author(s):  
Adejare Atanda ◽  
Jessica C. Goodell ◽  
Sherry Adams ◽  
Veronica Black

Objective: Develop a free text query to track synthetic cannabinoid-related ED visits. Assess trends in synthetic cannabinoid use from 2013-2018 using spatial and time-series analysis.
Introduction: Maryland utilizes ESSENCE for identification of emerging public health threats, including non-fatal overdoses. Synthetic cannabinoids are heterogeneous psychoactive compounds identified as substances of abuse. [1] In March 2018, the Illinois Department of Public Health received reports of unexplained bleeding in patients who reported using these products. [2] As a result, CDC initiated coordination of national surveillance activities for possible cases of coagulopathy associated with synthetic cannabinoid use. By May 2018, state health departments had reported 202 cases, including five deaths. [3] On April 3, 2018, Maryland reported its index case: a female in her 20s who presented to an ED with nausea, blood in her stool, vaginal bleeding, bruising, an elevated international normalized ratio (> 12.2), and bleeding oral ulcers after quitting use of a synthetic cannabinoid. She was successfully treated with Vitamin K. The first reported mortality in a Maryland resident was a male in his 30s who called EMS for fever and blood in his urine but subsequently went into cardiac arrest and could not be resuscitated. The patient was known to use synthetic cannabinoids. Brodifacoum exposure was confirmed by laboratory testing. As of September 2018, the Maryland Poison Control Center had received reports of 43 cases and 3 deaths linked to the outbreak.
Methods: To support surveillance and timeliness of synthetic cannabinoid reporting, we developed a case definition by conducting keyword searches to identify terms and phrases used by providers in Maryland EDs to document synthetic cannabinoid visits. This process yielded the following terms: “synthetic marijuana”, “spice”, and “K2”. Subsequently, we created a free text query based on the case definition and variations of the terms and phrases. This query allowed us to capture data on ED visits for synthetic cannabinoid use in the chief complaint (CC), discharge diagnosis (DD), and clinical impression (CI) fields of ESSENCE data. Finally, descriptive and geographic spatial analyses were conducted of synthetic cannabinoid-related morbidity (ED visits) for 2013-2017 (data for 2018 is incomplete), and time trends were analyzed for 2013-2018.
Results: From 2013 to 2017, a total of 1,097 ED visits across Maryland were synthetic cannabinoid-related (Table 1). The overall crude synthetic cannabinoid-related ED visit rate was 20 per 100,000 population. The number of synthetic cannabinoid-related ED visits increased 8-fold, from 40 in 2013 to 353 in 2017. Females made the most synthetic cannabinoid-related ED visits (n = 861, 78%). Adults aged 15-24 and 25-34 made 349 (32%) and 367 (33%) visits respectively to an ED for a synthetic cannabinoid-related event. Whites and blacks made 466 (42%) and 498 (45%) visits respectively to an ED for a synthetic cannabinoid-related event. People who were non-Hispanic (n = 988, 90%), black (n = 498, 45%), female (n = 861, 78%), and aged 25-34 (n = 367, 33%) visited an ED for a synthetic cannabinoid-related event more than any other demographic group. Time trend analysis shows an increase from baseline in synthetic cannabinoid-related ED visits starting from July 2014 (Figure 1). Three spikes are noted thereafter, in April, July, and September 2015 respectively. Subsequently, ED visits for synthetic cannabinoid-related events dropped to a new baseline value in December 2015. Two spikes are also noted for synthetic cannabinoid-related ED visits in May and September 2017 respectively, with a new baseline established starting January 2018. Spatial analysis shows geographic clustering of synthetic cannabinoid-related morbidity in three Maryland jurisdictions: Baltimore City, Frederick County, and Washington County (Figure 2). The five Maryland counties with the highest crude synthetic cannabinoid-related ED visit rates were Allegany, Baltimore City, Frederick, St. Mary’s and Washington, with rates ranging from 87 in Washington County to 38 in St. Mary’s County. The top ten crude synthetic cannabinoid-related ED visit rates per 100,000 population from 2013 to 2017 among all Maryland ZIP codes ranged from 87 in Washington County to 38 in St. Mary’s County. Spatial analysis also shows that hospitals with the greatest burden of synthetic cannabinoid-related ED visits were close to ZIP codes of communities with high crude synthetic cannabinoid-related ED visit rates (Figure 3).
Conclusions: Data from the ESSENCE program can be considered acceptable for monitoring synthetic cannabinoid-related ED visits in Maryland. It is useful for obtaining near real-time data about synthetic cannabinoid-related events and, as we have shown in our analysis, for identifying the key groups and geographic locations most in need of targeted interventions to reduce morbidity and mortality. Finally, it also provides us with the ability to retrospectively identify outbreaks and to link data trends to ongoing interventions.
References:
[1] Riederer, Anne et al. Acute Poisonings from Synthetic Cannabinoids — 50 U.S. Toxicology Investigators Consortium Registry Sites, 2010–2015. Centers for Disease Control and Prevention. MMWR. July 2016. Retrieved from: https://www.cdc.gov/mmwr/volumes/65/wr/mm6527a2.htm
[2] Horth, Roberta. Notes from the Field: Outbreak of Severe Illness Linked to the Vitamin K Antagonist Brodifacoum and Use of Synthetic Cannabinoids — Illinois, March–April 2018.
[3] Centers for Disease Control and Prevention. Outbreak of life-threatening coagulopathy associated with synthetic cannabinoids use. May 2018. Retrieved from: https://emergency.cdc.gov/han/han00410.asp
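
The free text query described in the Methods can be approximated outside ESSENCE with a regular expression. The sketch below is only an illustration under stated assumptions: the field names are hypothetical, the pattern covers just the three terms named in the abstract, and the actual ESSENCE query syntax and the term variations the authors used are not given in the source.

```python
import re

# The three terms reported in the abstract. The authors' real query also
# covered variations of these terms, which are not listed in the source.
PATTERN = re.compile(r"\bsynthetic\s+marijuana\b|\bspice\b|\bk2\b", re.IGNORECASE)

# Hypothetical field names standing in for the CC, DD, and CI fields.
FIELDS = ("chief_complaint", "discharge_diagnosis", "clinical_impression")


def is_synthetic_cannabinoid_visit(visit):
    """Return True if any of the free text fields mentions one of the terms."""
    return any(PATTERN.search(visit.get(field) or "") for field in FIELDS)


# Made-up visit records for illustration.
visits = [
    {"chief_complaint": "Smoked K2, vomiting"},
    {"chief_complaint": "chest pain"},
    {"clinical_impression": "SYNTHETIC  MARIJUANA use"},
    {"chief_complaint": None, "discharge_diagnosis": "spiced food allergy"},
]
flags = [is_synthetic_cannabinoid_visit(v) for v in visits]
```

The word boundaries keep "spiced" from matching "spice", but a term such as "spice" can still produce false positives (for example, a food reaction), so a real case definition would normally need exclusion terms and manual review.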


2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Aaron Kite-Powell ◽  
Michael Coletta ◽  
Jamie Smimble

Objective: The objective of this work is to describe the use and performance of the NSSP ESSENCE system by analyzing the structured query language (SQL) logs generated by users of the National Syndromic Surveillance Program’s (NSSP) Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE).
Introduction: As system users develop queries within ESSENCE, they step through the user interface to select the data sources and parameters needed for their query. Then they select from the available output options (e.g., time series, table builder, data details). These activities execute a SQL query on the database, and the majority of these queries are saved in a log so that system developers can troubleshoot problems. Secondarily, these data can be used as a form of web analytics to describe user query choices, query volume, and query execution time, and to develop an understanding of ESSENCE query patterns.
Methods: ESSENCE SQL query logs were extracted from April 1, 2016 to August 23rd, 2017. Overall query volume was assessed by summarizing the volume of queries over time (e.g., by hour, day, and week) and by Site. To better understand system performance, the mean, median, and maximum query execution times were summarized over time and by Site. SQL query text was parsed so that we could isolate: 1) syndromes queried, 2) sub-syndromes queried, 3) keyword categories queried, and 4) free text query terms used. Syndromes, sub-syndromes, and keyword categories were tabulated in total and by Site. Frequencies of free text query terms were analyzed using n-grams, word clouds, and term co-occurrence relationships. Term co-occurrence network graphs were used to visualize the structure and relationships among terms.
Results: There were a total of 354,101 SQL queries generated by users of ESSENCE between April 1, 2016 and August 23rd, 2017. Over this entire time period there was a weekly mean of 4,785 SQL queries performed by users. When looking at 2017 data through August 23rd, this figure increases to a mean of 7,618 SQL queries per week, and since May 2017 the mean number of SQL queries has increased to 10,485 per week. The maximum number of user-generated SQL queries in a week was 29,173. The mean, median, and maximum query execution times for all data were 0.61 minutes, 0 minutes, and 365 minutes, respectively. When looking only at queries with a free text component, the mean query execution time increases slightly to 0.94 minutes, though the median is still 0 minutes. The peak usage period, based on the number of SQL queries performed, is between 12:00pm and 3:00pm EST.
Conclusions: The use of NSSP ESSENCE has grown since implementation. This is the first time the ESSENCE system has been used at a national level with this volume of data and number of users. Our focus to date has been on successfully on-boarding new Sites so that they can benefit from use of the available tools, providing training to new users, and optimizing ESSENCE performance. Routine analysis of the ESSENCE SQL logs can assist us in understanding how the system is being used, how well it is performing, and in evaluating our system optimization efforts.
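
The n-gram and term co-occurrence analysis mentioned in the Methods can be sketched in a few lines of Python. This is a minimal illustration, assuming the free text terms have already been extracted from the SQL logs into plain strings; the function names and the sample queries are invented for the example and do not come from the ESSENCE logs.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Return all consecutive n-token tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def summarize_terms(queries, n=2):
    """Count n-grams and unordered term pairs that share a query."""
    ngram_counts = Counter()
    cooccurrence = Counter()
    for query in queries:
        tokens = query.lower().split()
        ngram_counts.update(ngrams(tokens, n))
        # Each unordered pair of distinct terms counts once per query.
        for pair in combinations(sorted(set(tokens)), 2):
            cooccurrence[pair] += 1
    return ngram_counts, cooccurrence


# Invented free text query strings, for illustration only.
sample_queries = ["opioid overdose", "heroin overdose", "opioid overdose naloxone"]
bigram_counts, pair_counts = summarize_terms(sample_queries)
```

The pair counts can serve as edge weights in a co-occurrence network graph, with terms as nodes.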


This paper presents news video retrieval using text queries for Gujarati-language news videos. Because broadcast video in India lacks metadata such as closed captioning and transcriptions, retrieving videos based on text data is a non-trivial task for most Indian-language video. The key idea behind our approach is to retrieve a specific story based on a text query in a regional language. Broadcast video is segmented into shots representing small news stories. To represent each shot efficiently, key frame extraction using singular value decomposition and the rank of a matrix is proposed. Text is extracted from the key frames for indexing. The text is then processed with natural language processing steps such as tokenization, removal of punctuation and extra symbols, and stemming of words to their roots. Because stemming and other text preprocessing methods are unavailable for the Gujarati language, we propose a basic stemming technique that reduces dictionary size for efficient indexing of text data. The proposed system achieves 82.5 percent accuracy on the Gujarati news video dataset ETV.
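
The text-processing and indexing steps can be illustrated with a small inverted index. The sketch below covers only tokenization, punctuation and symbol removal, suffix stripping, and lookup; it does not attempt shot segmentation, key frame extraction, or text extraction from frames. The suffix list is a hypothetical parameter (the paper's Gujarati stemming rules are not given in the abstract), and English placeholder text is used in the example.

```python
import unicodedata
from collections import defaultdict


def tokenize(text):
    """Replace punctuation (P*) and symbol (S*) characters with spaces, then split."""
    cleaned = "".join(
        " " if unicodedata.category(ch)[0] in "PS" else ch for ch in text
    )
    return cleaned.lower().split()


def strip_suffix(token, suffixes):
    """Remove the longest matching suffix; the suffix list is an assumption."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def build_index(shot_texts, suffixes=()):
    """Map each stemmed token to the set of shot ids whose text contains it."""
    index = defaultdict(set)
    for shot_id, text in shot_texts.items():
        for token in tokenize(text):
            index[strip_suffix(token, suffixes)].add(shot_id)
    return index


def search(index, query, suffixes=()):
    """Return the shots that contain every stemmed query term."""
    stems = [strip_suffix(token, suffixes) for token in tokenize(query)]
    if not stems:
        return set()
    result = set(index.get(stems[0], set()))
    for stem in stems[1:]:
        result &= index.get(stem, set())
    return result


# Placeholder shot texts; real input would be text extracted from key frames.
shots = {1: "Elections held, results awaited!", 2: "Cricket match results"}
index = build_index(shots, suffixes=("s",))
```

Applying the same suffix stripping to both the indexed text and the query is what lets "results" and "result" map to the same dictionary entry, which is how stemming reduces dictionary size.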

