Natural Language to SQL Query Generation

Author(s):  
Kiran Raj R

Today, almost everyone has a personal device for accessing the web, and users try to obtain the information they need through the internet. Much of that information is stored in databases, and a user with limited knowledge of databases will have difficulty retrieving it. Hence, there is a need for a system that allows users to access the information in a database directly. The proposed method develops a system that takes a natural language question as input and produces an SQL query, which is then used to access the database and retrieve the information with ease. Tokenization, parts-of-speech tagging, lemmatization, parsing and mapping are the steps involved in the process. The proposed project demonstrates the use of Natural Language Processing (NLP) to map English-language queries, matched against regular expressions, to SQL.
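The pipeline described above (tokenization, parts-of-speech tagging, lemmatization and rule-based mapping) can be illustrated with a minimal sketch. The NLTK-based steps, the single regular-expression rule, and the table and column names below are illustrative assumptions rather than the authors' actual implementation.

```python
# Minimal sketch: tokenize -> POS-tag -> lemmatize -> map to SQL with one
# regular-expression rule. Table/column names and the rule are hypothetical.
import re
import nltk
from nltk.stem import WordNetLemmatizer

# Requires: nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('wordnet')

def question_to_sql(question: str) -> str:
    tokens = nltk.word_tokenize(question.lower())               # tokenization
    tagged = nltk.pos_tag(tokens)                               # parts-of-speech tagging
    lemmatizer = WordNetLemmatizer()
    lemmas = [lemmatizer.lemmatize(tok) for tok, _ in tagged]   # lemmatization

    # Toy mapping rule: "show <column> of <table>" -> SELECT <column> FROM <table>
    match = re.search(r"show (\w+) of (\w+)", " ".join(lemmas))
    if match:
        column, table = match.groups()
        return f"SELECT {column} FROM {table};"
    return "SELECT * FROM unknown_table;"  # fallback when no rule matches

print(question_to_sql("Show name of student"))  # -> SELECT name FROM student;
```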

Author(s):  
Rahul Sharan Renu ◽  
Gregory Mocko

The objective of this research is to investigate the requirements and performance of parts-of-speech tagging of assembly work instructions. Natural Language Processing of assembly work instructions is required to perform data mining with the objective of knowledge reuse. Assembly work instructions are key process engineering elements that allow for predictable assembly quality of products and predictable assembly lead times. Authoring of assembly work instructions is a subjective process, and it has been observed that most assembly work instructions are not grammatically complete sentences. It is hypothesized that this can lead to false parts-of-speech tagging by Natural Language Processing tools. To test this hypothesis, two parts-of-speech taggers are used to tag 500 assembly work instructions obtained from the automotive industry. The first parts-of-speech tagger is obtained from the Natural Language Processing Toolkit (nltk.org) and the second is obtained from the Stanford Natural Language Processing Group (nlp.stanford.edu). For each of these taggers, two experiments are conducted. In the first experiment, the assembly work instructions are input to each tagger in raw form. In the second experiment, the assembly work instructions are preprocessed to make them grammatically complete and then input to the tagger. It is found that the Stanford Natural Language Processing tagger with the preprocessed assembly work instructions produces the fewest false parts-of-speech tags.
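The raw-versus-preprocessed comparison described above can be sketched with the NLTK tagger alone; the example instruction and its grammatically complete rewrite below are invented for illustration and are not taken from the paper's 500 automotive instructions.

```python
# Sketch of the raw vs. preprocessed experiment using only the NLTK tagger;
# the instruction text and its rewrite are illustrative, not data from the paper.
import nltk

# Requires: nltk.download('punkt'), nltk.download('averaged_perceptron_tagger')

raw_instruction = "Install bracket to frame with 4 bolts"
preprocessed = "The operator installs the bracket to the frame with four bolts."

for text in (raw_instruction, preprocessed):
    print(nltk.pos_tag(nltk.word_tokenize(text)))
# Comparing the two outputs shows how the missing subject and determiners in
# the raw instruction can shift the assigned parts of speech (for example,
# an imperative verb being tagged as a noun).
```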


SEEU Review ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. 3-16
Author(s):  
Diellza Nagavci Mati ◽  
Mentor Hamiti ◽  
Elissa Mollakuqe

Abstract An important element of Natural Language Processing is parts-of-speech tagging. With fine-grained word-class annotations, the word forms in a text can be enriched and used in downstream processes such as dependency parsing. The improved search options that tagged data offers also greatly benefit linguists and lexicographers. Natural language processing research is becoming increasingly popular and important as unsupervised learning methods are developed. Some aspects of the Albanian language make the creation of a part-of-speech tag set challenging. This research discusses those linguistic phenomena and presents a proposal for a part-of-speech tag set that can adequately represent them. The corpus contains more than 250,000 tokens, each annotated with a medium-sized tag set, and the Albanian language’s syntagmatic aspects are adequately represented. Additionally, this paper presents morphologically and part-of-speech tagged corpora for the Albanian language, as well as a lemmatizer and a neural morphological tagger trained on these corpora. On the held-out evaluation set, the model achieves 93.65% accuracy on part-of-speech tagging, 85.31% on morphological tagging, and 88.95% on lemmatization. Furthermore, the TF-IDF technique weights terms, and the resulting scores are used to highlight words that carry additional information in the Albanian corpus.
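The TF-IDF weighting step mentioned at the end of the abstract can be sketched with scikit-learn; the two toy documents below are placeholders, not text from the Albanian corpus.

```python
# Sketch of TF-IDF term weighting with scikit-learn; the two toy documents
# are placeholders, not text from the Albanian corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "gjuha shqipe ka nje morfologji te pasur",        # placeholder text
    "korpusi permban fjali te anotuara me etiketa",
]
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs)

# Highlight the highest-weighted terms of the first document.
terms = vectorizer.get_feature_names_out()
weights = matrix[0].toarray().ravel()
top = sorted(zip(terms, weights), key=lambda pair: pair[1], reverse=True)[:3]
print(top)
```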


Author(s):  
Santosh Kumar Mishra ◽  
Rijul Dhir ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Image captioning is the process of generating a textual description of an image that aims to describe the salient parts of the given image. It is an important problem, as it involves computer vision and natural language processing, where computer vision is used for understanding images and natural language processing is used for language modeling. A lot of work has been done on image captioning for the English language. In this article, we develop a model for image captioning in the Hindi language. Hindi is the official language of India and the fourth most spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in the Hindi language. A dataset is manually created by translating the well-known MSCOCO dataset from English to Hindi. Finally, different types of attention-based architectures are developed for image captioning in the Hindi language; these attention mechanisms have not previously been applied to Hindi. The results of the proposed model are compared with several baselines in terms of BLEU scores, and they show that our model performs better than the others. Manual evaluation of the obtained captions in terms of adequacy and fluency also reveals the effectiveness of our proposed approach. Availability of resources: the code for the article is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language; the dataset will be made available at http://www.iitp.ac.in/∼ai-nlp-ml/resources.html.
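The BLEU-based comparison mentioned above can be illustrated with NLTK's sentence-level BLEU; the Hindi reference and candidate captions below are invented placeholders, not examples from the translated MSCOCO dataset.

```python
# Sketch of sentence-level BLEU with NLTK; the Hindi reference and candidate
# captions are invented placeholders, not examples from the paper's dataset.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["एक", "कुत्ता", "घास", "पर", "दौड़", "रहा", "है"]]
candidate = ["एक", "कुत्ता", "घास", "में", "दौड़", "रहा", "है"]

smooth = SmoothingFunction().method1
print(f"BLEU: {sentence_bleu(reference, candidate, smoothing_function=smooth):.3f}")
```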


Designs ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 42
Author(s):  
Eric Lazarski ◽  
Mahmood Al-Khassaweneh ◽  
Cynthia Howard

In recent years, disinformation and “fake news” have been spreading throughout the internet at rates never seen before. This has led fact-checking organizations, groups that seek out claims and comment on their veracity, to spring up worldwide to stem the tide of misinformation. However, even with the many human-powered fact-checking organizations currently in operation, disinformation continues to run rampant throughout the Web, and the existing organizations are unable to keep up. This paper discusses in detail recent advances in using natural language processing to automate fact checking. It follows the entire process of automated fact checking, from detecting claims, through verifying them, to outputting results. In summary, automated fact checking works well in some cases, though generalized fact checking still needs improvement prior to widespread use.
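As a rough illustration of the first stage of such a pipeline, claim detection can be framed as sentence classification; the scikit-learn sketch below, with its invented training sentences and labels, is only a toy stand-in for the systems the paper surveys.

```python
# Toy sketch of claim detection as sentence classification with scikit-learn;
# the training sentences and labels are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "The unemployment rate fell to 3.5 percent last year.",  # check-worthy claim
    "Vaccines cause more deaths than the disease itself.",   # check-worthy claim
    "Thanks everyone for coming to the event tonight.",      # not a factual claim
    "I am very happy to be here with all of you.",           # not a factual claim
]
labels = [1, 1, 0, 0]

claim_detector = make_pipeline(CountVectorizer(), LogisticRegression())
claim_detector.fit(sentences, labels)

print(claim_detector.predict(["Crime rates doubled over the last decade."]))
```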


The software development process begins with requirement analysis. Requirements engineering proceeds from analysing the requirements to sketching the design of the program, which is critical work for programmers and software engineers. Moreover, many errors made during requirement analysis are carried over to later stages, making the process far more costly than initially planned. A major reason is that software requirement specifications are written in natural language. To minimize these errors, the software requirements can be transferred to a computerized form as UML diagrams. To this end, a tool has been designed that provides semi-automated aid for designers, producing a UML class model from software specifications using Natural Language Processing techniques. The proposed technique outlines the class diagram in a standard configuration and also points out the relationships between classes. In this research, we propose to enhance the procedure of producing UML diagrams from natural language, helping software developers analyze the requirements with fewer errors and in a more efficient way. The proposed approach uses a parser and a Part-of-Speech (POS) tagger to analyze the user requirements entered in English, and then extracts the verbs, phrases, and other elements from the user text. The obtained results showed that the proposed method performs better than other methods published in the literature, giving a better analysis of the given requirements and a better diagram presentation, which can help software engineers. Key words: Part of Speech, UML
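The noun/verb extraction step described above can be sketched with an off-the-shelf POS tagger; the requirement sentence and the simple "nouns as classes, verbs as relationships" rule below are illustrative assumptions, not the authors' full method.

```python
# Sketch of POS-based extraction: nouns as candidate classes, verbs as
# candidate relationships. The requirement sentence is made up, and a real
# system would also need parsing and attribute/association rules.
import nltk

# Requires: nltk.download('punkt'), nltk.download('averaged_perceptron_tagger')

requirement = "A customer places an order and the order contains several products."

tagged = nltk.pos_tag(nltk.word_tokenize(requirement))
candidate_classes = {tok.capitalize() for tok, tag in tagged if tag.startswith("NN")}
candidate_relations = [tok for tok, tag in tagged if tag.startswith("VB")]

print("Classes:", candidate_classes)      # e.g. {'Customer', 'Order', 'Products'}
print("Relations:", candidate_relations)  # e.g. ['places', 'contains']
```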


2020 ◽  
Vol 8 (5) ◽  
pp. 1061-1068

Nowadays, people like to spend their time on social sites, especially Twitter, posting many tweets every day. The posted tweets are used by many users to learn about particular applications, products, and other search queries. From the posted tweets, emotions and sentiments are derived, which are used to obtain opinions about a particular event. Many traditional sentiment detection systems have been developed, but they fail to analyze huge volumes of tweets, and online content with temporal patterns is also difficult to analyze. To overcome these issues, a co-ranking multi-modal natural language processing based sentiment analysis system was developed to detect emotions from posted tweets. Initially, tweets about different events are collected from social sites and processed with natural language procedures such as stemming, lemmatization, part-of-speech tagging, word segmentation, and parsing to obtain the words related to the posted tweets for deriving sentiments. From the extracted emotions, a co-ranking process is applied to obtain opinions about a particular event effectively. The efficiency of the system is then examined through experimental results and discussion. The introduced system recognizes sentiments from tweets with 98.80% accuracy.
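The preprocessing stage listed above (word segmentation, stemming, lemmatization, and part-of-speech tagging) can be sketched with NLTK; the tweet below is made up, and the co-ranking step itself is not reproduced.

```python
# Sketch of the preprocessing steps on a single made-up tweet; the co-ranking
# step itself is not reproduced here.
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Requires: nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('wordnet')

tweet = "Loving the new phone, the battery life is amazing!"

tokens = nltk.word_tokenize(tweet.lower())                    # word segmentation
tagged = nltk.pos_tag(tokens)                                 # part-of-speech tagging
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
stems = [stemmer.stem(tok) for tok in tokens]                 # stemming
lemmas = [lemmatizer.lemmatize(tok) for tok in tokens]        # lemmatization

print(tagged)
print(stems)
print(lemmas)
```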


2015 ◽  
Author(s):  
Abraham G Ayana

Natural Language Processing (NLP) refers to human-like language processing, which places it within the field of Artificial Intelligence (AI). However, the ultimate goal of research on Natural Language Processing is to parse and understand language, which has not yet been fully achieved. For this reason, much research in NLP has focused on intermediate tasks that make sense of some of the structure inherent in language without requiring complete understanding. One such task is part-of-speech tagging, or simply tagging. The lack of a standard part-of-speech tagger for Afaan Oromo is a major obstacle for researchers in machine translation, spell checking, dictionary compilation, and automatic sentence parsing and construction. Even though several works on POS tagging for Afaan Oromo exist, the performance of the tagger has not yet been sufficiently improved. Hence, the aim of this thesis is to improve the lexical and transformation rules of Brill’s tagger for Afaan Oromo POS tagging with a sufficiently large training corpus. Accordingly, Afaan Oromo literature on grammar and morphology is reviewed to understand the nature of the language and to identify possible tag sets. As a result, 26 broad tags were identified, and 17,473 words from around 1,100 sentences containing 6,750 distinct words were tagged for training and testing purposes, of which 258 sentences were taken from previous work. Since there are only a few ready-made standard corpora, the manual tagging process to prepare the corpus for this work was challenging; hence, it is recommended that a standard corpus be prepared. Transformation-based error-driven learning is adapted for Afaan Oromo part-of-speech tagging. Different experiments are conducted for the rule-based approach, taking 20% of the whole data for testing, and a comparison with the previously adapted Brill’s tagger is made. The previously adapted Brill’s tagger shows an accuracy of 80.08%, whereas the improved Brill’s tagger achieves 95.6%, an improvement of 15.52%. Hence, it is found that the size of the training corpus, the rule-generating system in the lexical-rule learner, and the use of an Afaan Oromo HMM tagger as the initial-state tagger have a significant effect on the improvement of the tagger.
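Transformation-based (Brill) tagging of the kind adapted in this thesis can be sketched with NLTK's Brill trainer; the tiny tagged sentences and tag names below are placeholders, not the actual 26-tag Afaan Oromo tag set or corpus.

```python
# Sketch of transformation-based (Brill) tagging with NLTK; the tiny tagged
# sentences and tag names are placeholders, not the actual Afaan Oromo
# tag set or the 17,473-word corpus.
from nltk.tag import UnigramTagger
from nltk.tag.brill import fntbl37
from nltk.tag.brill_trainer import BrillTaggerTrainer

# Placeholder training data in (word, tag) form.
train_sents = [
    [("inni", "PR"), ("mana", "NN"), ("deeme", "VV")],
    [("isheen", "PR"), ("kitaaba", "NN"), ("dubbifte", "VV")],
]

initial_tagger = UnigramTagger(train_sents)              # lexical (initial-state) tagger
trainer = BrillTaggerTrainer(initial_tagger, fntbl37(), trace=0)
brill_tagger = trainer.train(train_sents, max_rules=10)  # learn transformation rules

print(brill_tagger.tag(["inni", "kitaaba", "dubbifte"]))
```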


2017 ◽  
Vol 1 (2) ◽  
pp. 89 ◽  
Author(s):  
Azam Orooji ◽  
Mostafa Langarizadeh

It is estimated that each year many people, most of whom are teenagers and young adults, die by suicide worldwide. Suicide receives special attention, with many countries developing national strategies for prevention. Since much medical information is available as text, preventing the growing trend of suicide in communities requires analyzing various textual resources, such as patient records, information on the web, or questionnaires. For this purpose, this study systematically reviews recent studies on the use of natural language processing techniques in the area of the health of people who have died by suicide or are at risk. After electronically searching the PubMed and ScienceDirect databases and having the articles examined by two reviewers, 21 articles matched the inclusion criteria. This study revealed that, if a suitable data set is available, natural language processing techniques are well suited for various types of suicide-related research.


2020 ◽  
Author(s):  
Niyati Baliyan ◽  
Aarti Sharma

Abstract There is a plethora of information present on the web on any given topic, in different forms, i.e. blogs, articles, websites, etc. However, not all of this information is useful, and perusing all of it to gain an understanding of the topic is a tiresome and time-consuming task; we often end up reading content that we later realize was not important to us. Because humans cannot grasp vast quantities of information, relevant and crisp summaries are always desirable. Therefore, in this paper, we focus on generating a new blog entry containing the summary of multiple blogs on the same topic. Different approaches to clustering, modelling, content generation, and summarization are applied to reach the intended goal. The system also eliminates repetitive content, saving time and reducing volume, thereby making learning more comfortable and effective. Overall, a significant reduction in the number of words in the new blog generated by the system is observed using the proposed novel methodology.
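A cluster-then-select sketch of the multi-blog summarization idea is shown below; the blog sentences are invented, and the paper's actual clustering, modelling, and content-generation steps are richer than this TF-IDF and k-means stand-in.

```python
# Sketch of cluster-then-select summarization: TF-IDF vectors, k-means
# clusters, and the sentence closest to each cluster centre is kept.
# The blog sentences are invented for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "Transformers rely on self-attention to model long-range context.",
    "Self-attention lets every token attend to every other token.",
    "Fine-tuning adapts a pretrained model to a downstream task.",
    "Task-specific fine-tuning usually needs far less data than pretraining.",
]

vectors = TfidfVectorizer().fit_transform(sentences)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)

summary = []
for c in range(kmeans.n_clusters):
    members = np.where(kmeans.labels_ == c)[0]
    centre = kmeans.cluster_centers_[c]
    # keep the member sentence closest to the cluster centre
    best = members[np.argmin(np.linalg.norm(vectors[members].toarray() - centre, axis=1))]
    summary.append(sentences[best])

print(" ".join(summary))
```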

