The case for NLP-enhanced database tuning

2021 ◽  
Vol 14 (7) ◽  
pp. 1159-1165
Author(s):  
Immanuel Trummer

A large body of knowledge on database tuning is available in the form of natural language text. We propose to leverage natural language processing (NLP) to make that knowledge accessible to automated tuning tools. We describe multiple avenues to exploit NLP for database tuning, and outline associated challenges and opportunities. As a proof of concept, we describe a simple prototype system that exploits recent NLP advances to mine tuning hints from Web documents. We show that mined tuning hints improve performance of MySQL and Postgres on TPC-H, compared to the default configuration.
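To make the idea of a mined tuning hint concrete, here is a minimal sketch in Python. It is not the prototype described in the abstract: it swaps the NLP model for a plain regular expression, and the pattern, the function name `mine_hints`, and the sample sentence are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern: matches phrases such as "set work_mem to 64MB".
# A real system would use an NLP model; a regex is used here only as a stand-in.
HINT_PATTERN = re.compile(
    r"\bset\s+(?P<param>[a-z_]+)\s+to\s+(?P<value>\d+(?:\.\d+)?\s*(?:%|[kMG]B)?)",
    re.IGNORECASE,
)

def mine_hints(text):
    """Return (parameter, value) pairs found in free-form tuning advice."""
    return [
        (m.group("param"), m.group("value").strip())
        for m in HINT_PATTERN.finditer(text)
    ]

# Made-up sample text standing in for a Web document.
sample = (
    "For dedicated servers, set shared_buffers to 25% of RAM. "
    "You can also set work_mem to 64MB for sorting."
)
hints = mine_hints(sample)
print(hints)
```

Each extracted pair would still have to be validated against the target system before being applied, since free text does not guarantee that the parameter exists or that the value is sensible.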

Author(s):  
Sumali Conlon ◽  
Susan Lukose ◽  
Jason G. Hale ◽  
Anil Vinjamur

The Semantic Web will require semantic representations of information that computers can understand when they process business applications. Most Web content is currently represented in formats such as text that facilitate human understanding, rather than in more structured formats that allow automated processing and computer understanding. This chapter explores how natural language processing (NLP) principles, using linguistic analysis, can be employed to extract information from unstructured Web documents and translate it into extensible markup language (XML), the enabling currency of today’s e-business applications and the foundation for the emerging Semantic Web languages of tomorrow. Our prototype system is built and tested with online financial documents.


Author(s):  
Shishir K. Shandilya ◽  
Suresh Jain

The explosive increase in Internet usage has spurred technologies for automatically mining user-generated content (UGC) from Web documents. These UGC-rich resources raise new opportunities and challenges for opinion extraction and for mining opinion summaries. Opinion extraction technology allows users to retrieve and analyze people’s opinions scattered over Web documents. Opinion mining is concerned with the opinions that consumers express about a product; it aims at understanding, extracting, and classifying opinions scattered in the unstructured text of online resources. Search engines perform well when one wants to learn about a product before purchase, but filtering and analyzing the search results is often complex and time-consuming. This has created a need for intelligent technologies that can process these unstructured online text documents through automatic classification, concept recognition, text summarization, etc. These tools are based on traditional natural language techniques, statistical analysis, and machine learning techniques. Automatic knowledge extraction over large text collections such as the Internet has been a challenging task due to many constraints, such as the need for large annotated training data, extensive manual processing of data, and a huge number of domain-specific terms. Ambient Intelligence (AmI) in web-enabled technologies supports and promotes intelligent e-commerce services, enabling personalized, self-configurable, and intuitive applications that use UGC knowledge to build buying confidence. In this chapter, we discuss various approaches to opinion mining that combine Ambient Intelligence, Natural Language Processing, and Machine Learning methods based on textual and grammatical clues.


2021 ◽  
Author(s):  
Alaa Hussainalsaid

This thesis proposes automatic classification of the emotional content of web documents using Natural Language Processing (NLP) algorithms. We used online documents, such as general web pages and news articles, to verify the performance of the algorithm. The experiments used sentiment analysis to extract the sentiment of web documents. We used unigram and bigram approaches, which are special cases of the N-gram model with N=1 and N=2, respectively. The unigram model estimates the probability of each word in the corpus independently, whereas the bigram model estimates the probability of a word conditioned on the previous word. Our results show that the unigram model performs better than the bigram model at automatically classifying the emotional content of web documents.
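The difference between the two models can be illustrated with a short Python sketch. It uses plain maximum-likelihood counts on a made-up toy corpus; the function names and the corpus are assumptions for illustration and do not come from the thesis.

```python
from collections import Counter

def unigram_probs(tokens):
    """P(w): relative frequency of each word, ignoring context."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}

def bigram_probs(tokens):
    """P(w | prev): frequency of a word given the word immediately before it."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])  # every token that has a successor
    return {
        (prev, word): count / prev_counts[prev]
        for (prev, word), count in pair_counts.items()
    }

# Toy corpus, invented for this example.
tokens = "the movie was good the movie was bad".split()
uni = unigram_probs(tokens)
bi = bigram_probs(tokens)

print(uni["the"])            # 2 of 8 tokens
print(bi[("was", "good")])   # "was" is followed by "good" in 1 of 2 cases
```

A sentiment classifier would estimate such probabilities separately per emotion class and compare them; the bigram table grows much faster than the unigram table, so it is sparser on small corpora.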


2021 ◽  
Vol 8 (1) ◽  
pp. 421-429
Author(s):  
Yan Puspitarani

Information extraction is a part of natural language processing that aims to find, retrieve, or process information. The data source for information extraction is text, which cannot be separated from people's daily lives. Through text, a lot of confidential information can be obtained. To produce information, unstructured text is converted into structured data. Researchers take many approaches to this process, but most studies focus on English. Therefore, this paper presents current research trends, challenges, and opportunities for information extraction in Indonesian.


2012 ◽  
Vol 3 (1) ◽  
pp. 140-143
Author(s):  
Ekta Aggarwal ◽  
Shreeja Nair

Natural Language Processing (NLP) is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. The paper deals with database access, whereby data can be fetched from data resources with reduced time complexity. The retrieval techniques are based on the ideas of binary search. A natural language interface refers to words in its own dictionary as well as to the words in the standard dictionary in order to interpret a query. The main contribution of this investigation is addressing the problem of improving the accuracy of the query translation process by using the information provided by the database schema.


2019 ◽  
Vol 8 (2) ◽  
pp. 5511-5514

Machine comprehension is a broad research area in the Natural Language Processing domain, which deals with making a computerised system understand a given natural language text. A question answering system is one such variant, used to find the correct ‘answer’ for a ‘query’ using the supplied ‘context’. Using a single sentence instead of the whole context paragraph to determine the ‘answer’ is useful in terms of both computation and accuracy. Sentence selection can, therefore, be considered a first step towards getting the answer. This work devises a method for sentence selection that uses cosine similarity and common word count between each sentence of the context and the question. This removes the extensive training overhead associated with other available approaches, while still giving comparable results. The SQuAD dataset is used for accuracy-based performance comparison.
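A training-free sentence selector of this kind might look like the following Python sketch. The abstract does not say how the two signals are combined, so ranking by cosine similarity first and using the common word count as a tie-breaker is an assumption, as are the tokenizer, the sentence splitter, and the example text.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_sentence(context, question):
    """Return the context sentence that best matches the question."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    q_bag = Counter(tokenize(question))

    def score(sentence):
        s_bag = Counter(tokenize(sentence))
        common = len(s_bag.keys() & q_bag.keys())
        # Assumed combination: cosine first, common word count breaks ties.
        return (cosine(s_bag, q_bag), common)

    return max(sentences, key=score)

# Invented example, not taken from SQuAD.
context = (
    "Paris is the capital of France. "
    "Berlin is the capital of Germany. "
    "The Rhine flows through several countries."
)
best = select_sentence(context, "What is the capital of Germany?")
print(best)
```

Because both signals are computed directly from word counts, nothing has to be trained; the cost is that synonyms and paraphrases in the question go unmatched.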


Author(s):  
Hyunmin Cheong ◽  
L. H. Shu

Identifying relevant analogies from biology is a significant challenge in biomimetic design. Our natural-language approach addresses this challenge by developing techniques to search biological information in natural-language format, such as books or papers. This paper presents the application of natural-language processing techniques, such as part-of-speech tags, typed-dependency parsing, and syntactic patterns, to automatically extract and categorize causally related functions from text with biological information. Causally related functions, which specify how one action is enabled by another action, are considered important for both knowledge representation used to model biological information and analogical transfer of biological information performed by designers. An extraction algorithm was developed and scored F-measures of 0.78–0.85 in an initial development test. Because this research approach uses inexpensive and domain-independent techniques, the extraction algorithm has the potential to automatically identify patterns of causally related functions from a large amount of text that contains either biological or design information.


Author(s):  
Renbin Xiao ◽  
Ming Chang ◽  
Hongbin Zhan ◽  
Mu Su

In view of existing problems of knowledge acquisition in intelligent systems, a dynamic knowledge extraction method based on Chinese natural language sentence clustering is put forward, and a corresponding software prototype system is implemented. The paper first outlines the proposed method. To demonstrate its importance, we present a complete case study on the intelligent design of a certain machine tool. The design background of the product is presented, and the implementation steps are given in detail to show the whole design process. In this practical case, we succeed in extracting knowledge from natural language text, which verifies the effectiveness of the proposed method.

