Syntax-Guided Controlled Generation of Paraphrases

2020 ◽  
Vol 8 ◽  
pp. 330-345
Author(s):  
Ashutosh Kumar ◽  
Kabir Ahuja ◽  
Raghuram Vadapalli ◽  
Partha Talukdar

Given a sentence (e.g., “I like mangoes”) and a constraint (e.g., sentiment flip), the goal of controlled text generation is to produce a sentence that adapts the input sentence to meet the requirements of the constraint (e.g., “I hate mangoes”). Going beyond such simple constraints, recent work has started exploring complex syntactic guidance as a constraint in the task of controlled paraphrase generation. In these methods, the syntactic guidance is sourced from a separate exemplar sentence. However, prior work has used only limited syntactic information available in the parse tree of the exemplar sentence. We address this limitation in this paper and propose the Syntax Guided Controlled Paraphraser (SGCP), an end-to-end framework for syntactic paraphrase generation. We find that SGCP can generate syntax-conforming sentences without compromising on relevance. We perform extensive automated and human evaluations over multiple real-world English-language datasets to demonstrate the efficacy of SGCP over state-of-the-art baselines. To drive future research, we have made SGCP's source code available.
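To make "limited syntactic information" concrete, the sketch below (my own illustration, not the SGCP code) prunes an exemplar's constituency parse to a fixed height; prior work typically conditioned generation on such truncated templates, whereas SGCP aims to exploit more of the tree. The tuple encoding and the `prune` helper are assumptions for illustration.

```python
# Illustrative sketch: truncating an exemplar's constituency parse.
# Trees are nested tuples (label, child1, child2, ...); leaves are strings.

def prune(tree, height):
    """Return the parse tree truncated to `height` levels."""
    if isinstance(tree, str):
        return tree
    if height == 1:
        return tree[0]  # collapse the subtree to its label
    label, *children = tree
    return (label, *[prune(c, height - 1) for c in children])

# Exemplar: "She hates apples" -> (S (NP She) (VP hates (NP apples)))
exemplar = ("S", ("NP", "She"), ("VP", "hates", ("NP", "apples")))
template = prune(exemplar, 2)  # keep only the top two levels
print(template)  # ('S', 'NP', 'VP')
```

A model conditioned only on `('S', 'NP', 'VP')` sees far less structure than one that can read the full exemplar tree, which is the gap the paper targets.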

Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 81 ◽  
Author(s):  
Madeleine E. Bartlett ◽  
Cristina Costescu ◽  
Paul Baxter ◽  
Serge Thill

The last few decades have seen widespread advances in technological means to characterise observable aspects of human behaviour such as gaze or posture. Among others, these developments have also led to significant advances in social robotics. At the same time, however, social robots are still largely evaluated in idealised or laboratory conditions, and it remains unclear whether the technological progress is sufficient to let such robots move “into the wild”. In this paper, we characterise the problems that a social robot in the real world may face, and review the technological state of the art in terms of addressing these. We do this by considering what it would entail to automate the diagnosis of Autism Spectrum Disorder (ASD). Just as for social robotics, ASD diagnosis fundamentally requires the ability to characterise human behaviour from observable aspects. However, therapists provide clear criteria regarding what to look for. As such, ASD diagnosis is a situation that is both relevant to real-world social robotics and comes with clear metrics. Overall, we demonstrate that even with relatively clear therapist-provided criteria and current technological progress, the need to interpret covert behaviour cannot yet be fully addressed. Our discussions have clear implications for ASD diagnosis, but also for social robotics more generally. For ASD diagnosis, we provide a classification of criteria based on whether or not they depend on covert information and highlight present-day possibilities for supporting therapists in diagnosis through technological means. For social robotics, we highlight the fundamental role of covert behaviour, show that the current state-of-the-art is unable to characterise this, and emphasise that future research should tackle this explicitly in realistic settings.


2021 ◽  
Author(s):  
Yossi Gil ◽  
Dor Ma’ayan

Mutation score is widely accepted to be a reliable measurement for the effectiveness of software tests. Recent studies, however, show that mutation analysis is extremely costly and hard to use in practice. We present a novel direct prediction model of mutation score using neural networks. Relying solely on static code features that do not require generation of mutants or execution of the tests, we predict mutation score with an accuracy better than a quintile. When we include statement coverage as a feature, our accuracy rises to about a decile. Using a similar approach, we also improve the state-of-the-art results for binary test effectiveness prediction and introduce an intuitive, easy-to-calculate set of features superior to previously studied sets. We also publish the largest dataset of test-class level mutation score and static code features data to date, for future research. Finally, we discuss how our approach could be integrated into real-world systems, IDEs, CI tools, and testing frameworks.
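To make the setup concrete, here is a minimal sketch of the prediction task. The features, data, and model are all invented for illustration; the paper uses neural networks over a much richer static-feature set. It fits a tiny linear model mapping static features plus statement coverage to a mutation score.

```python
# Hypothetical sketch: regress mutation score on static code features
# with plain gradient descent (no external libraries).

def train(X, y, lr=0.05, epochs=2000):
    """Fit weights w and bias b minimizing mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += 2 * err * xi[j] / n
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Invented feature rows: [normalized LOC, branch density, statement coverage]
X = [[0.2, 0.1, 0.9], [0.8, 0.6, 0.4], [0.5, 0.3, 0.7], [0.9, 0.8, 0.2]]
y = [0.85, 0.30, 0.60, 0.15]  # mutation scores (made up)
w, b = train(X, y)
predict = lambda xi: sum(wj * xj for wj, xj in zip(w, xi)) + b
print(round(predict(X[0]), 2))
```

The appeal of the approach is visible even in this toy: no mutants are generated and no tests are executed, only cheap features are consumed.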


2021 ◽  
Author(s):  
Tinh Le

<p>This study examines the English language needs of mechanical engineers in Vietnam. A high demand for proficiency in English is increasing in ASEAN countries, including Vietnam. Vietnam in general and the important field of mechanical engineering, in particular, attracts many foreign investors and multinational organisations and this creates plurilingual and pluricultural workplaces where English is used as a lingua franca.  Drawing on sociolinguistic theory, this pragmatic mixed method needs analysis study examines the English language communication needs of Vietnamese mechanical engineers at four workplaces in Vietnam. It investigates the kinds of real-world English skills required by Vietnamese mechanical engineers to function effectively in the workplace, the social factors that affect the use of English and the effects of breakdowns or other issues in communication in English. It draws on needs analysis models which have evolved from English for Specific Purposes, including those devised by Munby (1978) and more recently by The Common European Framework (CEF) Professional Profiles to establish key communicative events. To answer the study’s pragmatic questions about language use for practical purposes in the lingua franca, plurilingual and pluricultural workplace it also borrows from the theoretically eclectic model of the Wellington Workplace Project, a model grounded in the first language context (L1), and other more sociological studies of the relationship of language and power in international workplaces.  The study employed questionnaire, semi-structured interview and observation for data collection. Questionnaires were completed by 22 managers of mechanical engineers and 71 professional mechanical engineers. Based on the initial questionnaire analysis, 12 participants from the two groups took part in the follow-up semi-structured interviews. Observations in four worksites provided rich data about the real-world use of English.  
The findings indicated a high frequency of English language use and the range of real-world English required by Vietnamese mechanical engineers for a range of communicative events, including ordering spare parts, interpreting technical drawings and bidding for contracts. Mechanical engineers needed plurilingual and pluricultural competence to negotiate a range of accent, intonation and idiom in the lingua franca and plurilingual context. Minimal use of functional occupational language was sometimes sufficient for communication for the purpose of ‘getting things done’, but not always. Communication issues had financial consequences for the company, sometimes disastrous ones. Looking at the findings through the lens of arising communication issues helped to reveal some of the underlying power relationships in the workplace and some negative impacts on workplace solidarity. These findings demonstrate the urgency of the need for increased English language skills for mechanical engineers in Vietnam and for the wider economy of Vietnam. English was found to function as a source of ‘expert power’ and, in a wider implication, this revealed a hidden or ‘shadow’ power structure within the workplace affected by English language proficiency. People were empowered when they possessed a good level of English, which could help them save not only their own face but also the face of the company. More positively, adaptive communicative strategies helped both mechanical engineers and their managers avoid communication issues. Adapting language for the purpose of ‘getting things done’ in turn interacted with low and high solidarity relationships. There was arguably an acceptance of a level of rudeness or abruptness in these workplace contexts. A high tolerance for the need to negotiate meaning in what could be described as not only a lingua franca but also a ‘poor English’ workplace context was sometimes observed.
This tolerance sometimes, but not always, extended to the mobility of plurilingual repertoires such as code-switching, and some code-switching into Vietnamese was also observed on the part of long-term foreign managers. Humour also emerged as a dimension of high-solidarity, longer-term workplace relationships between Vietnamese mechanical engineers and foreign managers, even when all parties had limited English. The study argues that understanding why mechanical engineers needed specific types of English, and the effect of the social dimensions of this language, could help lessen issues in communication. The consequences of miscommunication should be addressed in the English-language training process. Students should be strategically prepared to meet the high communication demands of the lingua franca and plurilingual workplace, which requires both English for technical communication and English for social communication.</p>


2009 ◽  
Vol 8 (4) ◽  
pp. 254-262 ◽  
Author(s):  
William Ribarsky ◽  
Brian Fisher ◽  
William M. Pottenger

There has been progress in the science of analytical reasoning and in meeting the recommendations for future research that were laid out when the field of visual analytics was established. Researchers have also developed a group of visual analytics tools and methods that embody visual analytics principles and attack important and challenging real-world problems. However, these efforts are only the beginning and much study remains to be done. This article examines the state of the art in visual analytics methods and reasoning and gives examples of current tools and capabilities. It shows that the science of visual analytics needs interdisciplinary efforts, indicates some of the disciplines that should be involved and presents an approach to how they might work together. Finally, the article describes some gaps, opportunities and future directions in developing new theories and models that can be enacted in methods and design principles and applied to significant and complex practical problems and data.


Author(s):  
O. H. Skurzhanskyi ◽  
A. A. Marchenko

The article is devoted to a review of conditional text generation, one of the most promising fields of natural language processing and artificial intelligence. Specifically, we explore monolingual local sequence transduction tasks: paraphrase generation, grammatical and spelling error correction, and text simplification. To give a better understanding of the considered tasks, we show examples of good rewrites. We then take a deep look at such key aspects as publicly available datasets with their splits (training, validation, and testing), quality metrics for proper evaluation, and modern solutions based primarily on neural networks. For each task, we analyze its main characteristics and how they influence the state-of-the-art models. Finally, we investigate the most significant shared features, both for the whole group of tasks in general and for the approaches that provide solutions for them.


2020 ◽  
Vol 34 (05) ◽  
pp. 8303-8310
Author(s):  
Yuan Li ◽  
Chunyuan Li ◽  
Yizhe Zhang ◽  
Xiujun Li ◽  
Guoqing Zheng ◽  
...  

Learning to generate text with a given label is a challenging task because natural language sentences are highly variable and ambiguous, which makes it difficult to trade off sentence quality against label fidelity. In this paper, we present CARA to alleviate this issue, where two auxiliary classifiers work simultaneously to ensure that (1) the encoder learns disentangled features and (2) the generator produces label-related sentences. Two practical techniques are further proposed to improve performance: annealing the learning signal from the auxiliary classifier, and enhancing the encoder with pre-trained language models. To establish a comprehensive benchmark fostering future research, we consider a suite of four datasets and systematically reproduce three representative methods. CARA shows consistent improvement over previous methods on the task of label-conditional text generation, and achieves state-of-the-art results on the task of attribute transfer.
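The annealing idea can be sketched with a toy schedule. The functional form below is my assumption (the abstract only says the auxiliary classifier's signal is annealed): the classifier's weight in the generator's loss ramps up over training, so the generator can focus on fluency first and label fidelity later.

```python
# Illustrative annealing schedule for an auxiliary classifier's loss weight.
import math

def annealed_weight(step, total_steps, w_max=1.0, k=6.0):
    """Sigmoid ramp from near 0 to w_max over the course of training."""
    x = k * (2.0 * step / total_steps - 1.0)  # maps [0, total] -> [-k, k]
    return w_max / (1.0 + math.exp(-x))

weights = [round(annealed_weight(s, 1000), 3) for s in (0, 500, 1000)]
print(weights)  # [0.002, 0.5, 0.998]
```

The generator's total loss would then be something like `reconstruction_loss + annealed_weight(step, total) * classifier_loss`, with the exact combination being a design choice of the paper.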


Author(s):  
Divesh Kubal ◽  
Hemant Palivela

Paraphrase generation is one of the most important and challenging tasks in the field of Natural Language Generation. Paraphrasing techniques help to identify or to extract/generate phrases or sentences conveying a similar meaning. The paraphrasing task can be bifurcated into two sub-tasks, namely Paraphrase Identification (PI) and Paraphrase Generation (PG). Most existing state-of-the-art systems can solve only one of these problems at a time. This paper proposes a light-weight unified model that can simultaneously classify whether a given pair of sentences are paraphrases of each other and generate multiple paraphrases for an input sentence. The Paraphrase Generation module aims to generate fluent and semantically similar paraphrases, and the Paraphrase Identification system aims to classify whether a sentence pair are paraphrases of each other or not. The proposed approach combines carefully selected data-points for data variety with a granularly fine-tuned Text-To-Text Transfer Transformer (T5) model. The highlight of this study is that the same light-weight model, trained with the objective of Paraphrase Generation, can also be used to solve the Paraphrase Identification task. Hence, the proposed system is light-weight in terms of both the model's size and the data used to train it, which facilitates quick learning without compromising the results.
The proposed system is then evaluated against popular evaluation metrics such as BLEU (BiLingual Evaluation Understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), METEOR, WER (Word Error Rate), and GLEU (Google-BLEU) for Paraphrase Generation, and classification metrics such as accuracy, precision, recall, and F1-score for Paraphrase Identification. The proposed model achieves state-of-the-art results on both tasks, Paraphrase Identification and Paraphrase Generation.
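Of the listed metrics, WER is simple enough to show in full: it is the word-level Levenshtein distance between a hypothesis and a reference, normalized by reference length. The function name and example sentences below are illustrative, not from the paper.

```python
# Word Error Rate: word-level edit distance / reference length.

def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn r[:i] into h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # delete all of r[:i]
    for j in range(len(h) + 1):
        dp[0][j] = j  # insert all of h[:j]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)

print(wer("i like ripe mangoes", "i love ripe mangoes"))  # 0.25
```

One substitution out of four reference words gives a WER of 0.25; lower is better, and values above 1.0 are possible when the hypothesis is much longer than the reference.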


2021 ◽  
Vol 7 ◽  
pp. e625
Author(s):  
Artem Kruglov ◽  
Dragos Strugar ◽  
Giancarlo Succi

Context: Tailoring mechanisms allow performance dashboards to vary their appearance in response to changing requirements (e.g., adapting to multiple users or multiple domains). Objective: We analyze existing research on tailored dashboards and investigate the different proposed approaches. Methodology: We performed a systematic literature review. Our search processes yielded a total of 1,764 papers, of which we screened 1,243 and ultimately used six for data collection. Results: Tailored dashboards, although introduced almost thirty years ago, have not received much research attention. However, the area has been expanding in recent years, and we observed common patterns in novel tailoring mechanisms. Since none of the existing solutions have been running for extended periods of time in real-world scenarios, empirical data is scarce; this is a likely cause of vaguely described research designs and of important practical issues being overlooked. Implications: Based on our findings, we propose a typology of tailoring mechanisms that takes into account the timing and nature of recommendations. This classification is grounded in empirical data and serves as a step towards a more unified view of tailoring capabilities in the context of dashboards. Finally, we outline a set of recommendations for future research, as well as a series of steps to follow to make studies more attractive to practitioners.


Author(s):  
Xing Hu ◽  
Ge Li ◽  
Xin Xia ◽  
David Lo ◽  
Shuai Lu ◽  
...  

Code summarization, which aims to generate a succinct natural language description of source code, is extremely useful for code search and code comprehension, and it has played an important role in software maintenance and evolution. Previous approaches generate summaries by retrieving them from similar code snippets. However, these approaches rely heavily on whether similar code snippets can be retrieved and on how similar the snippets are, and they fail to capture the API knowledge in the source code, which carries vital information about its functionality. In this paper, we propose a novel approach, named TL-CodeSum, which successfully uses API knowledge learned in a different but related task for code summarization. Experiments on large-scale real-world industry Java projects indicate that our approach is effective and outperforms the state-of-the-art in code summarization.
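As a toy illustration of the "API knowledge" the approach relies on, the sketch below pulls method-call sites out of a Java snippet with a regex heuristic. This is my own simplification; TL-CodeSum actually learns API usage from a related training task rather than extracting it this way.

```python
# Illustrative sketch: extracting receiver.method call sites from Java code.
import re

def api_calls(java_source):
    """Return receiver.method pairs found via a simple call-site regex."""
    pattern = re.compile(r"\b([A-Za-z_]\w*)\.([a-z]\w*)\s*\(")
    return [f"{recv}.{meth}" for recv, meth in pattern.findall(java_source)]

snippet = """
BufferedReader reader = new BufferedReader(new FileReader(path));
String line = reader.readLine();
reader.close();
"""
print(api_calls(snippet))  # ['reader.readLine', 'reader.close']
```

Even this crude extraction hints at the snippet's purpose (reading a file line by line), which is exactly the signal the paper argues retrieval-based summarizers miss.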

