An Interactive Tutoring System for Learning Language Processing and Compiler Design

Author(s):  
Rafael del Vado Vírseda
2020
Author(s):  
Heresh Shahani,  
Harish Pallila,  
Musoke Sendaula,  
Saroj Biswas

Author(s):  
A. Evtushenko

Machine learning language models combine algorithms and neural networks to process text written in natural language (natural language processing, NLP). In 2020, the artificial-intelligence research company OpenAI released its largest language model, GPT-3, with up to 175 billion parameters. This more than hundredfold increase in parameter count improved the quality of generated text to a level that is hard to distinguish from human-written text. Notably, the model was trained on a dataset collected mainly from open sources on the Internet, with an estimated volume of 570 GB. This article discusses the problem of memorizing critical information, in particular the personal data of individuals, during the training of large language models (GPT-2/3 and their derivatives). It also describes an algorithmic approach to this problem: additional preprocessing of the training dataset and refinement of model inference so that pseudo-personal data is generated and embedded in the outputs of summarization, text generation, question answering, and other seq2seq tasks.
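The preprocessing step the abstract describes can be illustrated with a minimal sketch: scan the training corpus for personal data and substitute pseudo-personal placeholders before the model ever sees it. The patterns and placeholder values below are illustrative assumptions, not the article's actual pipeline, which would cover many more categories (names, addresses, identifiers).

```python
import re

# Hypothetical patterns for two common kinds of personal data;
# a production pipeline would use far broader PII detection.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    """Replace personal data with pseudo-data placeholders before training."""
    text = EMAIL_RE.sub("user@example.com", text)   # pseudo e-mail
    text = PHONE_RE.sub("+1-555-0100", text)        # reserved fictional number
    return text

print(scrub("Contact John at john.doe@mail.ru or +7 912 345-67-89."))
# → Contact John at user@example.com or +1-555-0100.
```

Scrubbing at the dataset level, rather than filtering at inference time alone, prevents the model from memorizing the original values in the first place.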


2008
Vol 2008
pp. 1-9
Author(s):  
Mi-Young Kim

Interactions between proteins and genes are considered essential in the description of biomolecular phenomena, and networks of interactions are applied in a systems-biology approach. Recently, many studies have sought to extract information from biomolecular text using natural language processing technology. Previous studies have asserted that linguistic information is useful for improving the detection of gene interactions; in particular, syntactic relations are good indicators. However, previous systems achieve reasonably good precision but poor recall. To improve recall without sacrificing precision, this paper proposes a three-phase method for detecting gene interactions based on syntactic relations. In the first phase, we retrieve syntactic encapsulation categories for each candidate agent and target. In the second phase, we construct a verb list that indicates the nature of the interaction between pairs of genes. In the last phase, we determine direction rules to detect which of the two genes is the agent and which is the target. Even without biomolecular knowledge, our method performs reasonably well using a small training dataset. While the first phase contributes to improving recall, the second and third phases contribute to improving precision. In experiments using data from the ICML 2005 Workshop on Learning Language in Logic (LLL05), our proposed method achieved an F-measure of 67.2% on the test data, significantly outperforming previous methods. We also describe the contribution of each phase to the performance.
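The second and third phases above (a verb list signaling an interaction, plus direction rules choosing agent vs. target) can be sketched in toy form. The verbs, the direction labels, and the naive substring matching are illustrative assumptions; the paper's actual method operates on syntactic relations, not surface strings.

```python
# Hypothetical interaction-verb list. "forward" means the syntactic subject
# is the agent; "reverse" means the subject is the target (passive voice).
INTERACTION_VERBS = {
    "activates": "forward",
    "represses": "forward",
    "is activated by": "reverse",
}

def detect(sentence: str, gene_a: str, gene_b: str):
    """Return (agent, target) if an interaction verb links the gene pair."""
    for verb, direction in INTERACTION_VERBS.items():
        if f"{gene_a} {verb} {gene_b}" in sentence:
            return (gene_a, gene_b) if direction == "forward" else (gene_b, gene_a)
    return None  # no interaction verb found between the pair

print(detect("GerE represses sigK in B. subtilis.", "GerE", "sigK"))
# → ('GerE', 'sigK')
```

The direction rules are what let the system output an ordered (agent, target) pair rather than an unordered co-occurrence, which is where much of the precision gain comes from.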


Sensors
2021
Vol 21 (8)
pp. 2712
Author(s):  
JongYoon Lim,  
Inkyu Sa,  
Ho Seok Ahn,  
Norina Gasteiger,  
Sanghyub John Lee,  
...  

Sentiment prediction remains a challenging and unresolved task in various research fields, including psychology, neuroscience, and computer science. This stems from its high degree of subjectivity and limited input sources that can effectively capture the actual sentiment. This can be even more challenging with only text-based input. Meanwhile, the rise of deep learning and an unprecedented large volume of data have paved the way for artificial intelligence to perform impressively accurate predictions or even human-level reasoning. Drawing inspiration from this, we propose a coverage-based sentiment and subsentence extraction system that estimates a span of input text and recursively feeds this information back to the networks. The predicted subsentence consists of auxiliary information expressing a sentiment. This is an important building block for enabling vivid and epic sentiment delivery (within the scope of this paper) and for other natural language processing tasks such as text summarisation and Q&A. Our approach outperforms the state-of-the-art approaches by a large margin in subsentence prediction (i.e., Average Jaccard scores from 0.72 to 0.89). For the evaluation, we designed rigorous experiments consisting of 24 ablation studies. Finally, our learned lessons are returned to the community by sharing software packages and a public dataset that can reproduce the results presented in this paper.
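The Jaccard score reported above measures overlap between a predicted subsentence span and the ground-truth span. A minimal token-set version, written here as an assumed illustration of how such span overlap is commonly scored:

```python
def jaccard(pred: str, gold: str) -> float:
    """Token-set Jaccard similarity between predicted and gold spans."""
    a, b = set(pred.lower().split()), set(gold.lower().split())
    if not a and not b:
        return 1.0  # two empty spans agree perfectly by convention
    return len(a & b) / len(a | b)

print(jaccard("absolutely loved the service", "loved the service"))
# → 0.75
```

The "Average Jaccard" of a model is then the mean of this score over all test examples, so the reported improvement from 0.72 to 0.89 means predicted subsentences overlap the annotated ones far more tightly.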


2012
Vol 45 (2)
pp. 499-515
Author(s):  
Danielle S. McNamara,  
Scott A. Crossley,  
Rod Roscoe

Author(s):  
Dara Tafazoli,  
Elena Gómez María,  
Cristina A. Huertas Abril

Intelligent computer-assisted language learning (ICALL) is a multidisciplinary area of research that combines natural language processing (NLP), intelligent tutoring systems (ITS), second language acquisition (SLA), and foreign language teaching and learning (FLTL). Intelligent tutoring systems can provide a personalized approach to learning by assuming the role of a real teacher or expert who adapts and steers the learning process according to the specific needs of each learner. This article reviews and discusses the issues surrounding the development and use of ITSs for language learning and teaching. First, the authors look at the history of ICALL and its evolution from CALL. Second, they discuss issues in ICALL research and integration. Third, they explain how artificial intelligence (AI) techniques are implemented in language education as ITSs and intelligent language tutoring systems (ITLS). Finally, they explain in detail the successful integration and development of ITLS.


2020 ◽  
Author(s):  
Saroj Biswas,  
Musoke Sendaula,  
Sesha Yeruva,  
Krishana Priya Sannidhi,  
Ravi Shankar Dwivedula
