Practical Programming for NLP

Author(s):  
Patrick Jeuniaux ◽  
Andrew Olney ◽  
Sidney D’Mello

This chapter is aimed at students and researchers who are eager to learn about practical programmatic solutions to natural language processing (NLP) problems. In addition to introducing readers to programming basics, programming tools, and complete programs, we also hope to pique their interest in actively exploring the broad and fascinating field of automatic natural language processing. Part I introduces programming basics and the Python programming language. Part II takes a step-by-step approach to illustrating the development of a program that solves an NLP problem. Part III provides some hints to help readers initiate their own NLP programming projects.

2021 ◽  
Vol 1 (193) ◽  
pp. 371-376
Author(s):  
Nataliia Lazebna

The dynamic nature of the Python programming language and its accumulation of a certain linguosemiotic basis indicate the similarity of this language to English, which is an international language and mediates human communication in both real and virtual worlds. In this study, English is positioned as the linguistic basis of the Python programming language, which is widely used in industry, research, natural language processing, textual information retrieval, textual data processing, text corpora, and more. The study further considers the English language, its lexical features, text representation, and interaction with the logical and functional basis of Python. The unity of verbal units and symbols in modern English-language digital discourse indicates both the order and the variability of its constituents. The functionality of linguosemiotic elements produces a network of relationships in which each integrated element can produce, from a word or symbol, a holistic set of units that is extrapolated into English-language digital discourse and mediates human communication with a machine. The study focuses on an overview of the basic properties of Python, such as values, types, expressions, and operations. Although users understand the responses of the Python interpreter, they still need to follow certain instructions and codes. To facilitate work with this programming language and its prescribed English-language commands, linguists need to cooperate with programmers to devise a logical and reasonable principle for how Python commands operate.


Online business has opened up several avenues for researchers and computer scientists to initiate new research models. The business activities that customers carry out produce abundant data. Analysis of this data can yield useful inferences, which may help a system improve its quality of service and understand current market requirements, business trends, the future needs of society, and so on. In this connection, the paper proposes a feature extraction technique named Business Sentiment Quotient (BSQ). BSQ uses the word2vec[1] word embedding technique from natural language processing. A number of business-related tweets are retrieved from Twitter and processed to estimate BSQ using the Python programming language. BSQ may be utilized for further machine learning activities.


2021 ◽  
pp. 56-62
Author(s):  
V. G. Ssmolnyakov

The article discusses a methodology for solving task No. 8 of the Unified State Exam in informatics and ICT in two ways: by mathematical combinatorial calculation and by writing a program in the Python programming language. The purpose of this methodology is to help graduates successfully complete task No. 8 (No. 10 until 2021) of the Unified State Exam in informatics and ICT. The article is interdisciplinary and touches upon issues at the intersection of mathematics and informatics. The work is relevant because tasks of this type appear annually in the Unified State Exam in informatics and ICT, yet the success rate on this task is too low for a task of the basic level of complexity. The use of programming tools in the Unified State Exam in informatics and ICT has been available since 2021. The scientific novelty of the work lies in the use of the Python programming language to solve tasks of this type. The peculiarity of the methodology lies in the gradual increase in the complexity of the algorithms and the "modular" application of parts of the code, which allows the "modules" of previous tasks to be reused to solve subsequent ones. Specific versions of the programs are proposed, and a comparative analysis of methods for various prototypes of the corresponding tasks is given. As a result, it was determined that task No. 8 can be effectively solved by the programming method.
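
The two approaches contrasted in this abstract can be illustrated on a small counting problem. The problem below (words of length 5 over the letters A, B, C in which A occurs exactly once) is a hypothetical stand-in chosen for illustration, not a task taken from the article or from the exam, and the function names are likewise assumptions. The sketch computes the answer once with a combinatorial formula and once by brute-force enumeration, so that each result checks the other.

```python
from itertools import product
from math import comb

ALPHABET = "ABC"  # hypothetical alphabet, not from the article
LENGTH = 5        # hypothetical word length, not from the article


def count_by_formula():
    """Combinatorial count: pick the position of the single 'A',
    then fill every other position with one of the remaining letters."""
    other_letters = len(ALPHABET) - 1
    return comb(LENGTH, 1) * other_letters ** (LENGTH - 1)


def count_by_enumeration():
    """Programmatic count: generate every word and keep those
    that contain exactly one 'A'."""
    return sum(
        1
        for word in product(ALPHABET, repeat=LENGTH)
        if word.count("A") == 1
    )


print(count_by_formula(), count_by_enumeration())  # prints "80 80"
```

Enumeration is slower but makes the constraint explicit in code, which is convenient when a condition is awkward to express as a closed formula.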


Author(s):  
PASCUAL JULIÁN-IRANZO ◽  
FERNANDO SÁENZ-PÉREZ

Abstract This paper introduces techniques to integrate WordNet into a Fuzzy Logic Programming system. Since WordNet relates words but does not give graded information on the relation between them, we have implemented standard similarity measures and new directives allowing the proximity equations linking two words to be generated with an approximation degree. Proximity equations are the key syntactic structures which, in addition to a weak unification algorithm, make a flexible query-answering process possible in this kind of programming language. This addition widens the scope of Fuzzy Logic Programming, allowing certain forms of lexical reasoning, and reinforcing Natural Language Processing (NLP) applications.


2021 ◽  
Vol 6 (1) ◽  
pp. 77-85
Author(s):  
Bohdan Tsebryk ◽  
Alexey Botchkaryov

The problem of developing a software service with a plug-in architecture for assessing the readability of text has been considered. The problem of text readability assessment has been analyzed. Approaches to the development of a software service for text readability assessment have been considered. The structure of the service for text readability assessment has been proposed. The structure of the service has been implemented using the Python programming language and the library Natural Language Toolkit (NLTK). The results of testing the service for text readability assessment have been presented.
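
A plug-in structure for readability metrics might look roughly like the sketch below. It is not the authors' implementation: the abstract does not say which metrics the service computes or how plug-ins are registered, so the registry, the decorator, and the two metrics here (the Automated Readability Index and average sentence length) are all assumptions made for illustration. For self-containment the sketch uses simple regular expressions for tokenization instead of NLTK.

```python
import re
from typing import Callable, Dict

# Registry mapping a metric name to the function that computes it.
# Each registered function acts as one "plug-in" (hypothetical design).
METRICS: Dict[str, Callable[[str], float]] = {}


def register(name: str):
    """Decorator that adds a metric function to the registry."""
    def decorator(func: Callable[[str], float]):
        METRICS[name] = func
        return func
    return decorator


def _words(text: str):
    # Naive tokenization: runs of ASCII letters count as words.
    return re.findall(r"[A-Za-z]+", text)


def _sentences(text: str):
    # Naive sentence splitting on ., ! and ?.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


@register("ari")
def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43."""
    words, sentences = _words(text), _sentences(text)
    if not words or not sentences:
        return 0.0
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


@register("avg_sentence_length")
def average_sentence_length(text: str) -> float:
    """Mean number of words per sentence."""
    words, sentences = _words(text), _sentences(text)
    return len(words) / len(sentences) if sentences else 0.0


def assess(text: str) -> Dict[str, float]:
    """Run every registered metric on the text."""
    return {name: metric(text) for name, metric in METRICS.items()}


print(assess("The cat sat. The dog ran fast."))
```

With this layout, adding a metric means writing one decorated function; `assess` picks it up without further changes.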


2016 ◽  
Vol 20 (2) ◽  
Author(s):  
Grigori Sidorov ◽  
Martín Ibarra Romero ◽  
Ilia Markov ◽  
Rafael Guzman Cabrera ◽  
Liliana Chanona-Hernández ◽  
...  

2021 ◽  
Vol 28 (1) ◽  
Author(s):  
C.I. Ejiofor ◽  
L.C. Ochei

Spam mail has become a global dilemma due to its coevolutionary nature. It has resulted in the loss of organizational resources, including possible financial costs and the time spent addressing spam-related issues. This has pushed organizations and researchers to intensify research aimed at identifying the needed solutions. This research paper explores the capabilities of the Convolutional Neural Network (CNN) for predicting spam mail, taking cognizance of natural language capabilities. Spam mail prediction was simulated using a simulator built in the Python programming language to capture the fundamentals of CNN. The CNN was trained for 10 epochs. The 1st epoch had a training time of 4 min 39 s with a loss of 1.7578, accuracy of 0.3508, value loss of 1.2130, and value accuracy of 0.5719, while the 10th epoch had a training time of 4 min 6 s with a loss of 0.5896, accuracy of 0.7936, value loss of 0.8941, and value accuracy of 0.6986.


2019 ◽  
Vol 29 (1) ◽  
pp. 1388-1407
Author(s):  
Ayad Tareq Imam ◽  
Ayman Jameel Alnsour

Abstract Although current computer-aided software engineering tools support developers in composing a program, there is no doubt that more flexible supportive tools are needed to address the increases in the complexity of programs. This need can be met by automating the intellectual activities that are carried out by humans when composing a program. This paper aims to automate the composition of a programming language code from pseudocode, which is viewed here as a translation process for a natural language text, as pseudocode is a formatted text in natural English language. Based on this view, a new automatic code generator is developed that can convert pseudocode to C# programming language code. This new automatic code generator (ACG), which is called CodeComposer, uses natural language processing (NLP) techniques such as verb classification, thematic roles, and semantic role labeling (SRL) to analyze the pseudocode. The resulting analysis of linguistic information from these techniques is used by a semantic rule-based mapping machine to perform the composition process. CodeComposer can be viewed as an intelligent computer-aided software engineering (I_CASE) tool. An evaluation of the accuracy of CodeComposer using a binomial technique shows that it has a precision of 88%, a recall of 91%, and an F-measure of 89%.
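
The reported precision, recall, and F-measure can be checked against one another. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name is introduced here for illustration only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Values reported in the abstract: precision 88%, recall 91%.
score = f_measure(0.88, 0.91)
print(round(score, 4))  # prints 0.8947
```

Under that assumption the result rounds to 89%, which agrees with the figure given in the abstract.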

