Textual analysis or natural language parsing? A software engineering perspective

2015 ◽  
Author(s):  
Sebastiano Panichella

Designing an effective methodology to summarize and analyze the volume of textual information produced by developers remains particularly challenging, especially when the goal is to help developers make better development and maintenance decisions. Moreover, contrasting results may be obtained depending on the communication channel being mined and the technique adopted for its analysis. In our work, we investigate the use of Natural Language Parsing (NLP) and Textual Analysis (TA) techniques to automatically classify development content. The results of our study highlight the superiority of NLP techniques over traditional TA techniques when used to analyze the textual data produced in software development. We also show the benefits of NLP when used to enhance software engineering recommenders.


1988 ◽  
Vol 27 (02) ◽  
pp. 67-72 ◽  
Author(s):  
W. Dorda ◽  
B. Haidl ◽  
P. Sachs

Summary: Many clinical data are in natural language form (diagnoses, therapies, etc.). There is great interest in making these data retrievable to form samples of patients for scientific investigations (statistical analyses, courses of diseases, etc.). To perform this task, “medical natural language data” have to be prepared and stored in a retrieval-oriented database. In this paper, the advantages of processing textual data are shown in contrast to coding. Accordingly, in our system WAREL, medical thesauri (like ICD 9 or SNOMED) are not used for codification; they are taken as a knowledge base during the retrieval and for testing the quality of the data during documentation. The fundamental methods (computerized textual analysis and different algorithms for comparing texts) are explained in detail, and their realization within the system WAREL is illustrated (WAREL stands for Wiener Allgemeines Relationenschema).


2017 ◽  
Vol 01 (01) ◽  
pp. 1630012
Author(s):  
Taehyung Wang ◽  
Astushi Kitazawa ◽  
Phillip Sheu

One of the most challenging tasks in software development is developing software requirements. There are two types of software requirements: user requirements (mostly described in natural language) and system requirements (also called system specifications and described by formal or semi-formal methods). There is a gap between these two types of requirements because of the inherent differences between natural language and formal or semi-formal methods. We describe a semantic software engineering methodology using the design principles of SemanticObjects for object-relational software development, illustrated with an example. We also survey other semantic approaches and methods for software and Web application development.


2021 ◽  
Vol 11 (6) ◽  
pp. 2663
Author(s):  
Zhengru Shen ◽  
Marco Spruit

The Summary of Product Characteristics from the European Medicines Agency is a reference document on medicines in the EU. It contains textual information for clinical experts on how to safely use medicines, including adverse drug reactions. Using natural language processing (NLP) techniques to automatically extract adverse drug reactions from such unstructured textual information helps clinical experts use them effectively and efficiently in daily practice. Such techniques have been developed for Structured Product Labels from the Food and Drug Administration (FDA), but there is no research focusing on extraction from the Summary of Product Characteristics. In this work, we built a natural language processing pipeline that automatically scrapes Summaries of Product Characteristics online and then extracts adverse drug reactions from them. In addition, we have made the method and its output publicly available so that it can be reused and further evaluated in clinical practice. In total, we extracted 32,797 common adverse drug reactions for 647 common medicines scraped from the Electronic Medicines Compendium. A manual review of 37 commonly used medicines indicated good performance, with a recall and precision of 0.99 and 0.934, respectively.


2021 ◽  
Vol 20 (8) ◽  
pp. 1574-1594
Author(s):  
Aleksandr R. NEVREDINOV

Subject. When evaluating enterprises, maximum accuracy and comprehensiveness of analysis are important, although the use of various indicators of an organization’s financial condition and external factors provides sufficiently high forecasting accuracy. Researchers are increasingly focusing on natural language processing to analyze various text sources. This subject is highly relevant given companies’ need to analyze their activities quickly and extensively. Objectives. The study aims to explore natural language processing methods and sources of textual information about companies that can be used in the analysis, and to develop an approach to the analysis of textual information. Methods. The study draws on methods of analysis and synthesis, systematization, formalization, and comparative analysis, as well as theoretical and methodological provisions contained in domestic and foreign scientific works on text analysis, including for the purposes of company evaluation. Results. I offer and test an approach to using non-numeric indicators for company analysis. The paper presents a unique model built on existing developments that have proven effective. I also substantiate the use of this approach to analyze a company’s condition and to include the analysis results in models for the overall assessment of the state of companies. Conclusions. The findings improve the scientific and practical understanding of techniques for company analysis and of ways to apply text analysis using machine learning. They can be used to support management decision-making and to automate the analysis of a company’s own position and that of other companies in the market with which it interacts.


Author(s):  
Hsun-Ping Hsieh ◽  
JiaWei Jiang ◽  
Tzu-Hsin Yang ◽  
Renfen Hu

The success of mediation is affected by many factors, such as the context of the quarrel, the personalities of both parties, and the negotiation skill of the mediator, all of which make the outcome difficult to predict. This paper takes a different approach from previous legal prediction research. It analyzes and predicts whether two parties in a dispute can reach an agreement peacefully through mediation. With the inference result, we can determine whether mediation is a more practical and time-saving way to resolve the dispute. Existing work on legal case prediction mostly focuses on prosecution or criminal cases. In this work, we propose an LSTM-based framework, called LSTMEnsembler, to predict mediation results by assembling multiple classifiers. Among these classifiers, some are powerful for modeling the numerical and categorical features of case information, e.g., XGBoost and LightGBM, and some are effective for dealing with textual data, e.g., TextCNN and BERT. The proposed LSTMEnsembler aims not only to combine the strengths of the different classifiers intelligently, but also to capture temporal dependencies from previous cases to boost the performance of mediation prediction. Our experimental results show that the proposed LSTMEnsembler achieves an F-measure of 85.6% on real-world mediation data.

