Recent Advances in Intelligent Source Code Generation: A Survey on Natural Language Based Studies

Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1174
Author(s):  
Chen Yang ◽  
Yan Liu ◽  
Changqing Yin

Source Code Generation (SCG) is a prevalent research field in the automated software engineering sector that maps specific descriptions to various sorts of executable code. Alongside numerous intensive studies, diverse SCG types that integrate different scenarios and contexts continue to emerge. As the ultimate purpose of SCG, Natural Language-based Source Code Generation (NLSCG) is growing into an attractive and challenging field, owing to the expressibility and extremely high abstraction of its input end. The booming large-scale datasets generated by open-source code repositories and Q&A resources, innovations in machine learning algorithms, and the growth of computing capacity make the NLSCG field promising and give more opportunities for model implementation and refinement. Moreover, we have recently observed an increasing stream of NLSCG-related studies, representing quite diverse technical schools. However, many studies are bound to specific datasets with customization issues, producing occasionally successful solutions with tentative technical methods. There is no systematic study to explore and promote the further development of this field. We carried out a systematic literature survey and tool research to find potential improvement directions. First, we position the role of NLSCG among various SCG genres and specify the generation context empirically via software development domain knowledge and programming experience; second, we explore the selected studies collected through a carefully designed snowballing process, clarify the NLSCG field, and analyze the NLSCG problem, which lays a foundation for our subsequent investigation. Third, we model the research problems in terms of technical focus and adaptive challenges, and elaborate the insights gained from the NLSCG research backlog. Finally, we summarize the latest technology landscape over the transformation model and depict the critical tactics used in its essential components and their correlations. This research addresses the challenges of bridging the gap between natural language processing and source code analytics, outlines the different dimensions of NLSCG research concerns and technical utilities, and draws a bounded technical context of NLSCG to facilitate future studies in this promising area.
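
The transformation model at the heart of NLSCG is typically a neural encoder-decoder that maps a natural-language description to code. As a minimal, hedged sketch (not taken from the surveyed paper), the snippet below uses the Hugging Face transformers library with the publicly available Salesforce/codet5-base checkpoint purely to illustrate the NL-in, code-out pipeline; the checkpoint name and prompt are illustrative assumptions, and a practical system would be fine-tuned on paired NL/code data.

```python
# Hedged sketch: natural-language-to-code generation with a pretrained
# encoder-decoder. The checkpoint and prompt are illustrative assumptions;
# a real NLSCG system would fine-tune on paired NL/code data.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "Salesforce/codet5-base"  # assumed illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

description = "return the sum of squares of a list of integers"
inputs = tokenizer(description, return_tensors="pt")

# Decode a candidate code snippet from the natural-language description.
output_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```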

2019 ◽  
Vol 45 (3) ◽  
pp. 559-601 ◽  
Author(s):  
Edoardo Maria Ponti ◽  
Helen O’Horan ◽  
Yevgeni Berzak ◽  
Ivan Vulić ◽  
Roi Reichart ◽  
...  

Linguistic typology aims to capture structural and semantic variation across the world’s languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from a lack of human-labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that, to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due both to intrinsic limitations of the databases (in terms of coverage and feature granularity) and to under-utilization of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.
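
One concrete way to adapt discrete typological categories to the continuous representations used in modern NLP, along the lines the survey advocates, is to encode database-style categorical features as per-language vectors that can be concatenated with learned embeddings. The sketch below is a hedged toy; the feature names and values are illustrative assumptions, not entries from any real typological database.

```python
# Hedged toy: turning discrete typological categories into continuous
# per-language feature vectors. Feature names/values are illustrative
# assumptions, not real database entries.
from sklearn.preprocessing import OneHotEncoder
import numpy as np

languages = ["english", "spanish", "japanese"]
features = np.array([
    ["SVO", "prepositions", "no_case"],
    ["SVO", "prepositions", "no_case"],
    ["SOV", "postpositions", "case_marking"],
])

encoder = OneHotEncoder()
language_vectors = encoder.fit_transform(features).toarray()

# Each row is now a continuous vector that can be concatenated with
# learned word or sentence embeddings to condition a multilingual model.
for lang, vec in zip(languages, language_vectors):
    print(lang, vec)
```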


2020 ◽  
Vol 4 (2) ◽  
pp. 52-66
Author(s):  
Anas Hamid Alokla ◽  
Walaa Khaled Gad ◽  
Mustafa M. Aref ◽  
Abdel-Badeeh M. Salem ◽  
...  

Source Code Generation (SCG) is a sub-domain of Automatic Programming (AP) that helps programmers write programs using high-level abstractions. Recently, many researchers have investigated techniques for achieving SCG. The problem is to choose the appropriate technique for generating source code given its purpose and inputs. This paper introduces a review and analysis of related SCG techniques. Moreover, comparisons are presented for: technique mapping, Natural Language Processing (NLP), knowledge bases, ontologies, the Specification Configuration Template (SCT) model, and deep learning.


2012 ◽  
Vol 2012 ◽  
pp. 1-17 ◽  
Author(s):  
Anne E. Thessen ◽  
Hong Cui ◽  
Dmitry Mozzherin

Centuries of biological knowledge are contained in the massive body of scientific literature, written for human-readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science.
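
As an illustration of the kind of information extraction discussed here, the hedged sketch below uses a simple regular expression to flag candidate Latin binomial names (a capitalized genus followed by a lowercase epithet) in free text. Real taxonomic name-finding tools rely on dictionaries and machine-learned models, so this pattern is only a toy heuristic with an invented example sentence.

```python
# Hedged toy: flag candidate Latin binomial names in free text with a
# regular expression. Real name-finding tools (dictionary- and ML-based)
# are far more robust; this only illustrates the extraction step.
import re

# Capitalized genus (or abbreviated, e.g. "E.") followed by a lowercase
# specific epithet of at least three letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+|[A-Z]\.)\s([a-z]{3,})\b")

text = ("Specimens of Quercus alba were collected alongside "
        "E. coli cultures described in earlier taxonomic treatments.")

for match in BINOMIAL.finditer(text):
    print(match.group(0))
```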


2021 ◽  
pp. 1063293X2098297
Author(s):  
Ivar Örn Arnarsson ◽  
Otto Frost ◽  
Emil Gustavsson ◽  
Mats Jirstrand ◽  
Johan Malmqvist

Product development companies collect data in the form of Engineering Change Requests for logged design issues, tests, and product iterations. These documents are rich in unstructured data (e.g. free text). Previous research affirms that product developers find that current IT systems lack capabilities to accurately retrieve relevant documents with unstructured data. In this research, we demonstrate a method using Natural Language Processing and document clustering algorithms to find structurally or contextually related documents in databases containing Engineering Change Request documents. The aim is to radically decrease the time needed to effectively search for related engineering documents, organize search results, and create labeled clusters from these documents by utilizing Natural Language Processing algorithms. A domain knowledge expert at the case company evaluated the results and confirmed that the algorithms we applied managed to find relevant document clusters given the queries tested.
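
A hedged sketch of the general approach described here, assuming scikit-learn and a small invented list of Engineering Change Request texts (the documents and cluster count are stand-ins, not the case company's data): vectorize the free text with TF-IDF, cluster with k-means, and label each cluster by its highest-weighted terms.

```python
# Hedged sketch: cluster free-text engineering documents with TF-IDF +
# k-means and label clusters by their top terms. Documents and the
# cluster count are illustrative stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import numpy as np

documents = [
    "Bracket cracked during vibration test, request redesign of mount",
    "Fastener torque spec updated after vibration test failure",
    "Wiring harness routing conflicts with cooling duct in prototype",
    "Cooling duct clearance issue near harness, update routing drawing",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(documents)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tfidf)

# Label each cluster with its three highest-weighted terms.
terms = np.array(vectorizer.get_feature_names_out())
for cluster_id in range(kmeans.n_clusters):
    top = np.argsort(kmeans.cluster_centers_[cluster_id])[::-1][:3]
    print(f"cluster {cluster_id}: {', '.join(terms[top])}")
```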


2021 ◽  
Author(s):  
Xinxu Shen ◽  
Troy Houser ◽  
David Victor Smith ◽  
Vishnu P. Murty

The use of naturalistic stimuli, such as narrative movies, is gaining popularity in many fields characterizing memory, affect, and decision-making. Narrative recall paradigms are often used to capture the complexity and richness of memory for naturalistic events. However, scoring narrative recalls is time-consuming and prone to human biases. Here, we show the validity and reliability of using a natural language processing tool, the Universal Sentence Encoder (USE), to automatically score narrative recall. We compared the reliability of scoring between two independent raters (i.e., hand-scored) and between our automated algorithm and individual raters (i.e., automated) on trial-unique video clips of magic tricks. Study 1 showed that our automated segmentation approaches yielded high reliability and reflected the measures yielded by hand-scoring, and further that the results using USE outperformed another popular natural language processing tool, GloVe. In Study 2, we tested whether our automated approach remained valid when testing individuals varying on clinically relevant dimensions that influence episodic memory: age and anxiety. We found that our automated approach was equally reliable across age groups and anxiety groups, which shows the efficacy of our approach for assessing narrative recall in large-scale individual difference analyses. In sum, these findings suggest that machine learning approaches implementing USE are a promising tool for scoring large-scale narrative recalls and performing individual difference analyses in research using naturalistic stimuli.
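
As a hedged sketch of how USE-based scoring might work (the model URL is the public TensorFlow Hub release; the recall and reference sentences are invented, and the study's actual scoring pipeline may differ): embed a participant's recall and a reference description, then take the cosine similarity as a recall score.

```python
# Hedged sketch: score a narrative recall against a reference description
# with the Universal Sentence Encoder. Sentences are invented examples.
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

reference = "The magician placed the coin under the cup and it vanished."
recall = "He hid a coin under a cup and then it disappeared."

vectors = embed([reference, recall]).numpy()

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("recall score:", cosine(vectors[0], vectors[1]))
```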


10.29007/pc58 ◽  
2018 ◽  
Author(s):  
Julia Lavid ◽  
Marta Carretero ◽  
Juan Rafael Zamorano

In this paper we set forth an annotation model for dynamic modality in English and Spanish, given its relevance not only for contrastive linguistic purposes but also for practical annotation tasks in the Natural Language Processing (NLP) community. An annotation scheme is proposed which captures both the functional-semantic meanings and the language-specific realisations of dynamic meanings in both languages. The scheme is validated through a reliability study performed on a randomly selected set of one hundred and twenty sentences from the MULTINOT corpus, resulting in a high degree of inter-annotator agreement. We discuss our main findings and pay particular attention to the difficult cases, as these are currently being used to develop detailed guidelines for the large-scale annotation of dynamic modality in English and Spanish.
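
Inter-annotator agreement for a scheme like this is typically quantified with a chance-corrected statistic such as Cohen's kappa. A hedged sketch using scikit-learn follows; the two annotators' labels below are invented, not the MULTINOT annotations.

```python
# Hedged sketch: measure inter-annotator agreement on modality labels
# with Cohen's kappa. The two annotators' labels are invented examples.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["ability", "volition", "ability", "necessity", "volition", "ability"]
annotator_b = ["ability", "volition", "necessity", "necessity", "volition", "ability"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")
```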


2020 ◽  
Author(s):  
Joshua Conrad Jackson ◽  
Joseph Watts ◽  
Johann-Mattis List ◽  
Ryan Drabble ◽  
Kristen Lindquist

Humans have been using language for thousands of years, but psychologists seldom consider what natural language can tell us about the mind. Here we propose that language offers a unique window into human cognition. After briefly summarizing the legacy of language analyses in psychological science, we show how methodological advances have made these analyses more feasible and insightful than ever before. In particular, we describe how two forms of language analysis—comparative linguistics and natural language processing—are already contributing to how we understand emotion, creativity, and religion, and overcoming methodological obstacles related to statistical power and culturally diverse samples. We summarize resources for learning both of these methods, and highlight the best way to combine language analysis techniques with behavioral paradigms. Applying language analysis to large-scale and cross-cultural datasets promises to provide major breakthroughs in psychological science.


Author(s):  
Kaan Ant ◽  
Ugur Sogukpinar ◽  
Mehmet Fatih Amasyali

The use of databases containing semantic relationships between words is becoming increasingly widespread in efforts to make natural language processing more effective. Unlike the bag-of-words approach, the proposed semantic spaces give the distances between words, but they do not express the types of relations. In this study, it is shown how semantic spaces can be used to find the type of relationship, and the approach is compared with the template method. According to results obtained on a very large scale, semantic spaces are more successful for the is_a and opposite relations, while the template approach is more successful for the at_location, made_of, and non-relational types.
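
A hedged sketch of the two approaches being compared, assuming gensim's downloadable GloVe vectors and a handful of invented seed pairs (the seeds, test inputs, and templates are illustrative assumptions): the semantic-space route classifies a word pair by comparing its vector offset with the average offset of seed pairs for each relation, while the template route matches surface patterns such as "X is a Y".

```python
# Hedged sketch: infer a relation type for a word pair either from a
# semantic space (nearest average vector offset of seed pairs) or from
# surface templates. Seed pairs, test inputs, and templates are invented.
import re
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # small pretrained space

seed_pairs = {
    "is_a": [("dog", "animal"), ("rose", "flower")],
    "opposite": [("hot", "cold"), ("big", "small")],
}

def offset(pair):
    return vectors[pair[1]] - vectors[pair[0]]

# Average offset per relation acts as a prototype direction.
prototypes = {rel: np.mean([offset(p) for p in pairs], axis=0)
              for rel, pairs in seed_pairs.items()}

def classify_by_space(pair):
    o = offset(pair)
    scores = {rel: float(np.dot(o, proto) /
                         (np.linalg.norm(o) * np.linalg.norm(proto)))
              for rel, proto in prototypes.items()}
    return max(scores, key=scores.get)

def classify_by_template(sentence):
    if re.search(r"\b(\w+) is an? (\w+)\b", sentence):
        return "is_a"
    if re.search(r"\b(\w+) is located in (\w+)\b", sentence):
        return "at_location"
    return "unknown"

print(classify_by_space(("cat", "animal")))        # semantic-space route
print(classify_by_template("a cat is an animal"))  # template route
```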


Author(s):  
Rashida Ali ◽  
Ibrahim Rampurawala ◽  
Mayuri Wandhe ◽  
Ruchika Shrikhande ◽  
Arpita Bhatkar

The Internet provides a medium to connect with individuals of similar or different interests, creating a hub. Since a large number of users participate on these platforms, a user can receive a high volume of messages from different individuals, creating chaos and unwanted messages. These messages sometimes contain true information and sometimes false, which leads to confusion among users and is the first step towards spam messaging. A spam message is an irrelevant and unsolicited message sent by a known or unknown user, which may lead to a sense of insecurity among users. In this paper, different machine learning algorithms were trained and tested with natural language processing (NLP) to classify whether messages are spam or ham.
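
A hedged sketch of the kind of pipeline described here; the training messages are invented, and the specific algorithm shown (multinomial Naive Bayes over TF-IDF features) is just one of the classifiers such a study might compare.

```python
# Hedged sketch: spam/ham text classification with TF-IDF features and
# multinomial Naive Bayes. The training messages are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "Congratulations, you won a free prize, click this link now",
    "Lowest prices on medication, limited time offer",
    "Are we still meeting for lunch tomorrow?",
    "Please review the attached project report before Friday",
]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["Claim your free prize today"]))     # likely "spam"
print(model.predict(["Can you send the report today?"]))  # likely "ham"
```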

