Generating Sequence Diagrams from Arabic User Requirements using MADA+TOKAN Tool

Author(s):  
Nermeen Alami ◽  
Nabil Arman ◽  
Faisal Khamayseh

A new semi-automated approach for generating sequence diagrams from Arabic user requirements is presented. In this novel approach, the Arabic user requirements are parsed using a natural language processing tool called MADA+TOKAN to generate Part-Of-Speech (POS) tags for the parsed requirements; a set of heuristics is then applied to the resulting tags to obtain the sequence diagram components: objects, messages, and workflow transitions. The generated sequence diagram is expressed using XML Metadata Interchange (XMI) so that it can be drawn with sequence-diagram drawing tools. Our approach achieves better results than students in generating sequence diagrams, with higher accuracy in identifying the participants and lower accuracy in generating the messages exchanged between them. The proposed approach is validated using a set of experiments involving real cases evaluated by a group of software engineers and a group of graduate students who are familiar with sequence diagrams.
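A minimal sketch of the heuristic step, under the assumption that MADA+TOKAN's output has already been reduced to simple (word, tag) pairs; the tag names, the sample tokens, and the bare-bones XML skeleton standing in for real XMI are all illustrative, not the authors' implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical (word, tag) pairs standing in for MADA+TOKAN output;
# the real tool emits rich morphological analyses for Arabic tokens.
tagged = [("customer", "noun"), ("submits", "verb"), ("order", "noun")]

# Heuristic: nouns become candidate participants (lifelines),
# verbs become candidate messages between consecutive participants.
participants = [w for w, t in tagged if t == "noun"]
messages = [w for w, t in tagged if t == "verb"]

# Emit a bare-bones XML skeleton (a stand-in for XMI) that a
# sequence-diagram drawing tool could consume.
root = ET.Element("XMI")
for p in participants:
    ET.SubElement(root, "lifeline", name=p)
for i, m in enumerate(messages):
    ET.SubElement(root, "message", name=m,
                  source=participants[i], target=participants[i + 1])
print(ET.tostring(root, encoding="unicode"))
```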

2021 ◽  
pp. 1-42
Author(s):  
Maha J. Althobaiti

Abstract The wide usage of multiple spoken Arabic dialects on social networking sites stimulates increasing interest in Natural Language Processing (NLP) for dialectal Arabic (DA). Arabic dialects represent true linguistic diversity and differ from modern standard Arabic (MSA). In fact, the complexity and variety of these dialects make it insufficient to build one NLP system that is suitable for all of them. In comparison with MSA, the available datasets for various dialects are generally limited in terms of size, genre, and scope. In this article, we present a novel approach that automatically develops an annotated country-level dialectal Arabic corpus and builds lists of words that encompass 15 Arabic dialects. The algorithm uses an iterative procedure consisting of two main components: automatic creation of lists for dialectal words and automatic creation of annotated Arabic dialect identification corpus. To our knowledge, our study is the first of its kind to examine and analyse the poor performance of the MSA part-of-speech tagger on dialectal Arabic contents and to exploit that in order to extract the dialectal words. The pointwise mutual information association measure and the geographical frequency of word occurrence online are used to classify dialectal words. The annotated dialectal Arabic corpus (Twt15DA), built using our algorithm, is collected from Twitter and consists of 311,785 tweets containing 3,858,459 words in total. We randomly selected a sample of 75 tweets per country, 1,125 tweets in total, and conducted a manual dialect identification task by native speakers. The results show an average inter-annotator agreement score equal to 64%, which reflects satisfactory agreement considering the overlapping features of the 15 Arabic dialects.
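As an illustration of the word-classification step, the sketch below computes the pointwise mutual information between a word and a dialect label from toy counts; the counts and the example word are invented, and the paper's second signal, the geographical frequency of word occurrence online, is not modeled here.

```python
import math
from collections import Counter

def pmi(word, dialect, joint_counts, word_counts, dialect_counts, total):
    """Pointwise mutual information between a word and a dialect label."""
    p_joint = joint_counts[(word, dialect)] / total
    p_word = word_counts[word] / total
    p_dialect = dialect_counts[dialect] / total
    return math.log2(p_joint / (p_word * p_dialect))

# Toy counts: a word seen 8 times overall, 6 of them in Gulf-dialect tweets.
word_counts = Counter({"wayed": 8})
dialect_counts = Counter({"gulf": 50})
joint_counts = Counter({("wayed", "gulf"): 6})

# High PMI suggests the word is characteristic of that dialect.
print(pmi("wayed", "gulf", joint_counts, word_counts, dialect_counts, total=1000))
```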


2015 ◽  
Vol 24 (2) ◽  
pp. 277-286 ◽  
Author(s):  
Nabil Arman ◽  
Sari Jabbarin

Abstract Automated software engineering has attracted a large amount of research effort. The use of object-oriented methods for software systems development has made it necessary to develop approaches that automate the construction of different Unified Modeling Language (UML) models, in a semiautomated approach, from textual user requirements. UML use case models represent an essential artifact that provides a perspective of the system under analysis or development, and developing such models is crucial in an object-oriented development method. The main principles used in obtaining these models are described. A natural language processing tool is used to parse the statements of user requirements written in Arabic to obtain lists of nouns, noun phrases, verbs, verb phrases, etc., that aid in finding potential actors and use cases. A set of steps that represent our approach for constructing a use case model is presented. Finally, the proposed approach is validated using an experiment involving a group of graduate students who are familiar with use case modeling.
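A minimal sketch of the actor and use-case extraction step, assuming the parser has already produced (token, tag) pairs; English glosses stand in for the Arabic tokens, and the two heuristics shown are illustrative, not the paper's full rule set.

```python
# Hypothetical tagged requirement: "the librarian registers a new member".
tagged_requirement = [
    ("librarian", "NOUN"), ("registers", "VERB"),
    ("new", "ADJ"), ("member", "NOUN"),
]

# Heuristic 1: nouns are candidate actors.
actors = {w for w, t in tagged_requirement if t == "NOUN"}

# Heuristic 2: a verb plus the noun phrase that follows it is a
# candidate use case.
use_cases = []
for i, (word, tag) in enumerate(tagged_requirement):
    if tag == "VERB":
        rest = [w for w, t in tagged_requirement[i + 1:] if t in ("ADJ", "NOUN")]
        use_cases.append(" ".join([word] + rest))

print("actors:", actors)        # {'librarian', 'member'}
print("use cases:", use_cases)  # ['registers new member']
```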


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 56
Author(s):  
Hongwei Li ◽  
Hongyan Mao ◽  
Jingzi Wang

Part-of-Speech (POS) tagging is one of the most important tasks in the field of natural language processing (NLP). The POS tag of a word depends not only on the word itself but also on its position, its surrounding words, and their POS tags. POS tagging can be an upstream task for other NLP tasks, further improving their performance. Therefore, it is important to improve the accuracy of POS tagging. In POS tagging, bidirectional Long Short-Term Memory (Bi-LSTM) is commonly used and achieves good performance. However, Bi-LSTM is not as powerful as the Transformer in leveraging contextual information, since Bi-LSTM simply concatenates the contextual information from left-to-right and right-to-left. In this study, we propose a novel approach to improve POS tagging accuracy. For each token, all possible POS tags are obtained without considering context, and rules are then applied to prune these possible POS tags, which we call rule-based data preprocessing. In this way, the number of possible POS tags for most tokens can be reduced to one, and those tokens are considered correctly tagged. Finally, the POS tags of the remaining tokens are masked, and a Transformer-based model is used to predict only the masked POS tags, which enables it to leverage bidirectional contexts. Our experimental results show that our approach leads to better performance than other methods using Bi-LSTM.
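A minimal sketch of the rule-based preprocessing idea, assuming a toy tag dictionary and a single invented rule; the paper's rules and its Transformer tagger are far richer. Tokens whose candidate set shrinks to one tag are accepted as tagged, and the rest would be masked for the Transformer to predict from context.

```python
tag_dict = {                 # all possible tags per token, context-free
    "the": {"DET"},
    "book": {"NOUN", "VERB"},
    "flight": {"NOUN"},
}

def prune(prev_tag, candidates):
    # Example rule: a verb reading is implausible right after a determiner.
    if prev_tag == "DET" and len(candidates) > 1:
        candidates = candidates - {"VERB"}
    return candidates

sentence = ["the", "book", "flight"]
tags, prev = [], None
for tok in sentence:
    cands = prune(prev, tag_dict[tok])
    # A single surviving candidate counts as tagged; otherwise mask it
    # and leave it for the Transformer model (not shown) to predict.
    tags.append(next(iter(cands)) if len(cands) == 1 else "[MASK]")
    prev = tags[-1]
print(tags)  # ['DET', 'NOUN', 'NOUN'] -- nothing left masked in this toy case
```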


2021 ◽  
Vol 23 (07) ◽  
pp. 1247-1255
Author(s):  
Dr. R. N. Kulkarni ◽
C. K. Srinivasa

Unified Modelling Language (UML) is currently accepted as a de facto standard language for modeling software in the software industry. It allows object-oriented concepts to be used to model a software system and provides a complete pictographic representation of the software. Broadly, UML diagrams are classified into two groups: structural diagrams and behavioral diagrams. Sequence diagrams and activity diagrams belong to the second group, i.e., behavioral diagrams. A sequence diagram represents the sequence of messages flowing from one object to another, and an activity diagram represents the flow of activities one after the other in a system. In this paper, we propose an automated tool that transforms a sequence diagram, represented in table format, into an activity diagram. The sequence diagram is represented as a three-column table, called the sequence table, comprising the various components of a sequence diagram such as objects, interactions, messages, alternations, iterations, and loops. The proposed tool reads the sequence table and converts its components into an equivalent activity table; it then reads the activity table and transforms it into the equivalent activity diagram.
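A minimal sketch of the table-to-table mapping, assuming a three-column sequence table of (source, message, target) rows; the paper's tool also handles alternations, iterations, and loops, which are omitted here, and the sample rows are invented.

```python
# Hypothetical three-column sequence table: (source, message, target).
sequence_table = [
    ("Customer", "placeOrder", "OrderSystem"),
    ("OrderSystem", "checkStock", "Inventory"),
    ("Inventory", "confirm", "OrderSystem"),
]

# Each message becomes an activity; consecutive rows become transitions,
# yielding an activity table a diagram renderer could consume.
activity_table = []
for i, (src, msg, tgt) in enumerate(sequence_table):
    activity = f"{msg} ({src} -> {tgt})"
    successor = sequence_table[i + 1][1] if i + 1 < len(sequence_table) else "end"
    activity_table.append((activity, successor))

for activity, nxt in activity_table:
    print(f"{activity} --> {nxt}")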


Author(s):  
SHARANBASAPPA HONNASHETTY ◽  
MALLAMMA V REDDY ◽  
DR. M. HANUMANTHAPPA

In order to build a natural language processing system, words must first be placed into a structured form that yields a syntactically correct sentence; this syntactic analysis is performed by parsing. This paper explores a novel approach in which shift-reduce parsing is used to translate English sentences into grammatically correct Kannada sentences by reordering the English parse tree structure and by generating and implementing a phrase structure grammar (PSG) for Kannada. Recursive descent parsing is used to generate the English phrase tree structure, the terminal symbols are tagged with their Kannada equivalents, and shift-reduce parsing is then used to construct the Kannada sentence. A Part-of-Speech (POS) tagger, implemented using a supervised machine learning approach, is used to map English words to their Kannada equivalents.
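A minimal sketch of the reordering idea, assuming a toy English parse already reduced to (subject, verb, object) and a tiny invented bilingual lexicon; the recursive-descent and shift-reduce parsing machinery the paper actually uses is omitted.

```python
# Hypothetical English-to-Kannada lexicon standing in for the POS tagger's
# word alignments.
lexicon = {"rama": "ರಾಮ", "eats": "ತಿನ್ನುತ್ತಾನೆ", "fruit": "ಹಣ್ಣು"}

def english_to_kannada(subj, verb, obj):
    # English is SVO while Kannada is SOV, so the verb moves to the end
    # after each terminal is replaced by its Kannada equivalent.
    return " ".join(lexicon[w] for w in (subj, obj, verb))

print(english_to_kannada("rama", "eats", "fruit"))  # SVO reordered to SOV
```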


Author(s):  
G Deena ◽  
K Raja ◽  
K Kannan

Background: In this competing world, education has become part of everyday life. Imparting knowledge to the learner through education is the core idea of the Teaching-Learning Process (TLP). An assessment is one way to identify a learner's weak spots in the area under discussion, and assessment questions are central to judging the learner's skill. When prepared manually, questions are not assured of excellence and fairness in assessing the learner's cognitive skill. Question generation is thus a key part of the teaching-learning process, and generating test questions is clearly its toughest part. Methods: We propose an Automatic Question Generation (AQG) system that automatically and dynamically generates assessment questions from an input file. Objective: The proposed system generates test questions mapped to Bloom's taxonomy to determine the learner's cognitive level. Cloze-type questions are generated using part-of-speech tags and a random function. Rule-based approaches and Natural Language Processing (NLP) techniques are implemented to generate procedural questions at the lowest Bloom's cognitive levels. Analysis: The outputs are dynamic in nature, creating a different set of questions at each execution. The input paragraphs are selected from the computer science domain, and output efficiency is measured using precision and recall.
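A minimal sketch of the cloze-generation step, assuming pre-tagged input; the sample sentence and tags are invented, and the mapping to Bloom's taxonomy levels described in the abstract is not modeled here.

```python
import random

# Hypothetical (word, tag) pairs for one sentence from the input file.
tagged = [("A", "DET"), ("stack", "NOUN"), ("follows", "VERB"),
          ("LIFO", "NOUN"), ("order", "NOUN")]

# Pick one noun at random and blank it out to form a cloze question;
# the random choice is what makes each execution produce different questions.
nouns = [i for i, (_, t) in enumerate(tagged) if t == "NOUN"]
blank = random.choice(nouns)
question = " ".join("____" if i == blank else w
                    for i, (w, _) in enumerate(tagged))
answer = tagged[blank][0]
print(question, "| answer:", answer)
```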


2020 ◽  
Vol 11 (1) ◽  
pp. 24
Author(s):  
Jin Tao ◽  
Kelly Brayton ◽  
Shira Broschat

Advances in genome sequencing technology and computing power have brought about the explosive growth of sequenced genomes in public repositories with a concomitant increase in annotation errors. Many protein sequences are annotated using computational analysis rather than experimental verification, leading to inaccuracies in annotation. Confirmation of existing protein annotations is urgently needed before misannotation becomes even more prevalent due to error propagation. In this work we present a novel approach for automatically confirming the existence of manually curated information with experimental evidence of protein annotation. Our ensemble learning method uses a combination of recurrent convolutional neural network, logistic regression, and support vector machine models. Natural language processing in the form of word embeddings is used with journal publication titles retrieved from the UniProtKB database. Importantly, we use recall as our most significant metric to ensure the maximum number of verifications possible; results are reported to a human curator for confirmation. Our ensemble model achieves 91.25% recall, 71.26% accuracy, 65.19% precision, and an F1 score of 76.05% and outperforms the Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT) model with fine-tuning using the same data.
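A minimal sketch of the ensemble idea using scikit-learn stand-ins; the paper's recurrent convolutional neural network and word-embedding features are replaced here by a hashing vectorizer and a third linear model, and the titles and labels are invented.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import LinearSVC
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

# Toy publication titles; 1 = suggests experimental evidence, 0 = not.
titles = ["Crystal structure of protein X",
          "Genome announcement of strain Y",
          "Functional characterization of enzyme Z",
          "Draft genome of isolate W"]
labels = [1, 0, 1, 0]

# Hard majority voting across three classifiers over shared text features.
ensemble = make_pipeline(
    HashingVectorizer(n_features=2 ** 10),
    VotingClassifier([("lr", LogisticRegression()),
                      ("svm", LinearSVC()),
                      ("sgd", SGDClassifier())], voting="hard"),
)
ensemble.fit(titles, labels)
print(ensemble.predict(["Biochemical analysis of protein Q"]))
```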


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Bo Sun ◽  
Fei Zhang ◽  
Jing Li ◽  
Yicheng Yang ◽  
Xiaolin Diao ◽  
...  

Abstract Background With the development and application of medical information systems, semantic interoperability is essential for accurate and advanced health-related computing and for sharing electronic health record (EHR) information. The openEHR approach can improve semantic interoperability, and one key improvement is that it allows the use of existing archetypes. The crucial problem is how to improve precision and resolve ambiguity in archetype retrieval. Method Based on query expansion technology and the Word2Vec model in Natural Language Processing (NLP), we propose finding synonyms as substitutes for the original search terms in archetype retrieval. Test sets at different medical professional levels are used to verify feasibility. Result Applying the approach to each original search term (n = 120) in the test sets, a total of 69,348 substitutes were constructed. Precision at 5 (P@5) improved by 0.767 on average; in the best case, P@5 reached 0.975. Conclusions We introduce a novel approach that uses NLP technology and a corpus to find synonyms as substitutes for original search terms. Compared with simply mapping the elements contained in openEHR to an external dictionary, this approach can greatly improve precision and resolve ambiguity in retrieval tasks. This helps promote the application of openEHR and advances EHR information sharing.
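A minimal sketch of the query-expansion idea using gensim's Word2Vec (4.x API); the three-sentence corpus is a toy stand-in for the clinical text the paper trains on, and the nearest-neighbour cutoff is arbitrary.

```python
from gensim.models import Word2Vec

# Toy pre-tokenized corpus standing in for domain text.
sentences = [["blood", "pressure", "measurement"],
             ["arterial", "pressure", "reading"],
             ["blood", "glucose", "measurement"]]
model = Word2Vec(sentences, vector_size=50, min_count=1, window=2, seed=1)

# Expand an original search term with its nearest neighbours, which would
# then be tried as substitute terms in archetype retrieval.
term = "pressure"
substitutes = [w for w, _ in model.wv.most_similar(term, topn=3)]
print(term, "->", substitutes)
```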

