Preparing languages for natural language generation using Wikidata lexicographical data

2021 ◽  
Author(s):  
Mahir Morshed

In the lead-up to the launch of Abstract Wikipedia, a sufficient body of linguistic information, on the basis of which the text for a given language can be generated, must be in place so that different sets of functions, some working with concepts and others turning these into word sequences, can work together to produce something natural in that language. Developing that body of information requires more thorough consideration of a number of linguistic aspects sooner rather than later. This session will thus discuss aspects of language planning with respect to Wikidata lexicographical data and natural language generation, including the compositionality and manipulability of lexical units, the breadth and interconnectedness of units of meaning, and the treatment of variation among a language’s lects, broadly construed. Special reference will be made to the handling of each of these aspects for Bengali and the linguistic varieties often grouped with it.
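As a rough illustration of how such function sets might hand data to one another, the sketch below pairs a concept-level request with stored lexeme forms. It is a minimal sketch under stated assumptions: the Lexeme class, the Bengali forms, and render_noun_phrase are hypothetical stand-ins, not the Abstract Wikipedia, Wikifunctions, or Wikidata lexeme APIs.

```python
# Minimal sketch: a concept-level function asks for a noun phrase, and a
# word-sequence function answers it by selecting a stored lexeme form.
# All names and data here are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    lexeme_id: str                               # e.g. a Wikidata L-id (hypothetical here)
    lemma: str
    forms: dict = field(default_factory=dict)    # frozenset of features -> surface form

    def form(self, *features):
        """Return the form matching the requested grammatical features, else the lemma."""
        return self.forms.get(frozenset(features), self.lemma)


# Hypothetical Bengali noun lexeme with a plural and a singular objective form.
boi = Lexeme(
    lexeme_id="L-boi",
    lemma="বই",                                   # 'book'
    forms={
        frozenset({"plural"}): "বইগুলো",
        frozenset({"singular", "objective"}): "বইটিকে",
    },
)


def render_noun_phrase(lexeme, number="singular", case="nominative"):
    """Concept-to-text step: pick a stored form rather than concatenating affixes."""
    features = {number} if case == "nominative" else {number, case}
    return lexeme.form(*features)


if __name__ == "__main__":
    print(render_noun_phrase(boi, number="plural"))   # বইগুলো
```

Keeping inflected forms as data rather than building them by string concatenation is one way the compositionality, manipulability, and lect-variation questions raised above become concrete engineering choices.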

2021 ◽  
Author(s):  
Sara Thomas

How do you recover after a crisis? This session will reflect on the work done by and with the sco.wiki community to recover and rebuild after the negative international press attention that surrounded the wiki in 2020. I’ll talk about on- and off-wiki community development, partnership development, the challenges that still face the project, and hopes for the future. I’ll also reflect on care in volunteer management, and why we should always remember that there are real people behind keyboards. As Scotland Programme Coordinator for Wikimedia UK, I’ve been involved in supporting the community post-crisis, and have been impressed and heartened by the volume of work which has taken place since sco.wiki hit the headlines. I’d like to take this opportunity to tell the story of a group of editors and Scots speakers who are determined that the wiki should survive, grow, and thrive.


Informatics ◽  
2021 ◽  
Vol 8 (1) ◽  
pp. 20
Author(s):  
Giovanni Bonetta ◽  
Marco Roberti ◽  
Rossella Cancelliere ◽  
Patrick Gallinari

In this paper, we analyze the problem of generating fluent English utterances from tabular data, focusing on the development of a sequence-to-sequence neural model with two major features: the ability to read and generate text character by character, and the ability to switch between generating characters and copying them from the input, an essential feature when inputs contain rare words such as proper names, telephone numbers, or foreign words. Working with characters instead of words is a challenge that can make the training phase harder and raise the error probability during inference. Nevertheless, our work shows that these issues can be solved, and the effort is repaid by a fully end-to-end system whose inputs and outputs are not constrained to a predefined vocabulary, as they are in word-based models. Furthermore, our copying technique is integrated with an innovative shift mechanism, which enhances the ability to produce outputs directly from inputs. We assess performance on the E2E dataset, the benchmark used for the E2E NLG challenge, and on a modified version of it created to highlight our model's ability to copy rare words. The results demonstrate clear improvements over the baseline and promising performance compared to recent techniques in the literature.
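The sketch below illustrates the character-level generate-versus-copy switch in the style of a pointer-generator decoder step. It is a minimal sketch under stated assumptions: the layer sizes, additive attention, and sigmoid gate are illustrative choices, and the paper's shift mechanism is not reproduced here.

```python
# Minimal sketch of one decoder step that mixes a generation distribution
# over the character vocabulary with a copy distribution over input
# characters. Illustrative only; not the authors' exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CharCopyDecoderStep(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRUCell(2 * hidden_size, hidden_size)
        self.attn = nn.Linear(2 * hidden_size, 1)       # additive attention score
        self.gen = nn.Linear(hidden_size, vocab_size)   # generation head
        self.p_gen = nn.Linear(2 * hidden_size, 1)      # generate-vs-copy gate

    def forward(self, prev_char, hidden, enc_states, src_ids):
        # prev_char: (B,), hidden: (B, H), enc_states: (B, T, H), src_ids: (B, T)
        emb = self.embed(prev_char)                                           # (B, H)
        scores = self.attn(torch.cat(
            [enc_states, hidden.unsqueeze(1).expand_as(enc_states)], dim=-1)
        ).squeeze(-1)                                                         # (B, T)
        attn = F.softmax(scores, dim=-1)                 # attention over input characters
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)         # (B, H)
        hidden = self.gru(torch.cat([emb, context], dim=-1), hidden)          # (B, H)
        gen_dist = F.softmax(self.gen(hidden), dim=-1)                        # (B, V)
        p = torch.sigmoid(self.p_gen(torch.cat([hidden, context], dim=-1)))   # (B, 1)
        # Scatter attention weights onto the vocabulary slots of the input characters,
        # then mix: generate from the vocabulary or copy an input character.
        copy_dist = torch.zeros_like(gen_dist).scatter_add_(1, src_ids, attn)
        return p * gen_dist + (1 - p) * copy_dist, hidden


# Example step: batch of 2, source length 5, hidden size 16, vocab of 40 characters.
dec = CharCopyDecoderStep(vocab_size=40, hidden_size=16)
dist, h = dec(torch.zeros(2, dtype=torch.long),
              torch.zeros(2, 16),
              torch.randn(2, 5, 16),
              torch.randint(0, 40, (2, 5)))
```

Because the copy distribution is defined over input positions, out-of-vocabulary characters in rare words can still be reproduced verbatim, which is the property the modified E2E evaluation is designed to stress.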


Author(s):  
Nilesh Ade ◽  
Noor Quddus ◽  
Trent Parker ◽  
S. Camille Peres

One of the major implications of Industry 4.0 will be the application of digital procedures in the process industries. Digital procedures are procedures accessed through a smart device such as a tablet or a phone. However, like paper-based procedures, their usability is limited by how they can be accessed. The issue of accessibility is magnified in tasks such as loading a hopper car with plastic pellets, where operators typically place the procedure at a safe distance from the worksite. For digital procedures, this drawback can be addressed with an artificial-intelligence-based, voice-enabled conversational agent (chatbot). As part of this study, we have developed a chatbot for assisting digital procedure adherence. The chatbot is trained, using deep learning, on the possible set of operator queries and the text of the digital procedures, and it provides responses using natural language generation. The chatbot is tested in a simulated conversation with an operator performing the task of loading a hopper car.
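The sketch below shows the query-to-response loop in miniature, substituting bag-of-words matching and a response template for the deep-learning intent model and NLG component described above. It is a minimal sketch under stated assumptions: the procedure steps, match_step, and respond are hypothetical, not the study's training data or system.

```python
# Minimal sketch: map an operator's spoken/typed query to a procedure step
# and return a templated response. Illustrative only.
import re

# Toy loading procedure; the real procedure text would come from the
# digitized document (assumed content, not from the study).
PROCEDURE = {
    1: "Verify the hopper car is spotted and the handbrake is set.",
    2: "Connect the loading spout and check the seal.",
    3: "Open the pellet feed valve to begin loading.",
    4: "Monitor the load weight and stop at the target value.",
}


def tokens(text):
    """Lowercased word set with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_step(query):
    """Return the step whose text shares the most words with the query, if any."""
    q = tokens(query)
    scores = {n: len(q & tokens(text)) for n, text in PROCEDURE.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def respond(query):
    """Template-based response generation for the matched step."""
    step = match_step(query)
    if step is None:
        return "I could not find that in the procedure. Can you rephrase?"
    return f"Step {step}: {PROCEDURE[step]}"


if __name__ == "__main__":
    print(respond("what do I do with the feed valve?"))
    # -> Step 3: Open the pellet feed valve to begin loading.
```

In a deployed version, the matching function would be replaced by the trained intent model and the template by a learned generation component, but the hands-free query-answer loop stays the same shape.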

