Extraction of Construction Quality Requirements from Textual Specifications via Natural Language Processing

Author(s):  
JungHo Jeon ◽  
Xin Xu ◽  
Yuxi Zhang ◽  
Liu Yang ◽  
Hubo Cai

Construction inspection is an essential component of the quality assurance programs of state transportation agencies (STAs), and the guidelines for this process reside in lengthy textual specifications. In the current practice, engineers and inspectors must manually go through these documents to plan, conduct, and document their inspections, which is time-consuming, very subjective, inconsistent, and prone to error. A promising alternative to this manual process is the application of natural language processing (NLP) techniques (e.g., text parsing, sentence classification, and syntactic analysis) to automatically extract construction inspection requirements from textual documents and present them as straightforward check questions. This paper introduces an NLP-based method that: 1) extracts individual sentences from the construction specification; 2) preprocesses the resulting sentences; 3) applies Word2Vec and GloVe algorithms to extract vector features; 4) uses a convolutional neural network (CNN) and recurrent neural network to classify sentences; and 5) converts the requirement sentences into check questions via syntactic analysis. The overall methodology was assessed using the Indiana Department of Transportation (DOT) specification as a test case. Our results revealed that the CNN + GloVe combination led to the highest accuracy, at 91.9%, and the lowest loss, at 11.7%. To further validate its use across STAs nationwide, we applied it to the construction specification of the South Carolina DOT as a test case, and our average accuracy was 92.6%.

2022 ◽  
Vol 2022 ◽  
pp. 1-8
Author(s):  
YiTao Zhou

As one of the core tasks in the field of natural language processing, syntactic analysis has always been a hot topic for researchers, including tasks such as Questions and Answer (Q&A), Search String Comprehension, Semantic Analysis, and Knowledge Base Construction. This paper aims to study the application of deep learning and neural network in natural language syntax analysis, which has significant research and application value. This paper first studies a transfer-based dependent syntax analyzer using a feed-forward neural network as a classifier. By analyzing the model, we have made meticulous parameters of the model to improve its performance. This paper proposes a dependent syntactic analysis model based on a long-term memory neural network. This model is based on the feed-forward neural network model described above and will be used as a feature extractor. After the feature extractor is pretrained, we use a long short-term memory neural network as a classifier of the transfer action, and the characteristics extracted by the syntactic analyzer as its input to train a recursive neural network classifier optimized by sentences. The classifier can not only classify the current pattern feature but also multirich information such as analysis of state history. Therefore, the model is modeled in the analysis process of the entire sentence in syntactic analysis, replacing the method of modeling independent analysis. The experimental results show that the model has achieved greater performance improvement than baseline methods.


2020 ◽  
Author(s):  
Vadim V. Korolev ◽  
Artem Mitrofanov ◽  
Kirill Karpov ◽  
Valery Tkachenko

The main advantage of modern natural language processing methods is a possibility to turn an amorphous human-readable task into a strict mathematic form. That allows to extract chemical data and insights from articles and to find new semantic relations. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use-cases and applied to a case of searching a therapeutic agent for a COVID-19 disease by analyzing PubMed archive.


2018 ◽  
pp. 35-38
Author(s):  
O. Hyryn

The article deals with natural language processing, namely that of an English sentence. The article describes the problems, which might arise during the process and which are connected with graphic, semantic, and syntactic ambiguity. The article provides the description of how the problems had been solved before the automatic syntactic analysis was applied and the way, such analysis methods could be helpful in developing new analysis algorithms. The analysis focuses on the issues, blocking the basis for the natural language processing — parsing — the process of sentence analysis according to their structure, content and meaning, which aims to analyze the grammatical structure of the sentence, the division of sentences into constituent components and defining links between them.


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Fridah Katushemererwe ◽  
Andrew Caines ◽  
Paula Buttery

AbstractThis paper describes an endeavour to build natural language processing (NLP) tools for Runyakitara, a group of four closely related Bantu languages spoken in western Uganda. In contrast with major world languages such as English, for which corpora are comparatively abundant and NLP tools are well developed, computational linguistic resources for Runyakitara are in short supply. First therefore, we need to collect corpora for these languages, before we can proceed to the design of a spell-checker, grammar-checker and applications for computer-assisted language learning (CALL). We explain how we are collecting primary data for a new Runya Corpus of speech and writing, we outline the design of a morphological analyser, and discuss how we can use these new resources to build NLP tools. We are initially working with Runyankore–Rukiga, a closely-related pair of Runyakitara languages, and we frame our project in the context of NLP for low-resource languages, as well as CALL for the preservation of endangered languages. We put our project forward as a test case for the revitalization of endangered languages through education and technology.


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Siyuan Zhao ◽  
Zhiwei Xu ◽  
Limin Liu ◽  
Mengjie Guo ◽  
Jing Yun

Convolutional neural network (CNN) has revolutionized the field of natural language processing, which is considerably efficient at semantics analysis that underlies difficult natural language processing problems in a variety of domains. The deceptive opinion detection is an important application of the existing CNN models. The detection mechanism based on CNN models has better self-adaptability and can effectively identify all kinds of deceptive opinions. Online opinions are quite short, varying in their types and content. In order to effectively identify deceptive opinions, we need to comprehensively study the characteristics of deceptive opinions and explore novel characteristics besides the textual semantics and emotional polarity that have been widely used in text analysis. In this paper, we optimize the convolutional neural network model by embedding the word order characteristics in its convolution layer and pooling layer, which makes convolutional neural network more suitable for short text classification and deceptive opinions detection. The TensorFlow-based experiments demonstrate that the proposed detection mechanism achieves more accurate deceptive opinion detection results.


2017 ◽  
Vol 56 (05) ◽  
pp. 377-389 ◽  
Author(s):  
Xingyu Zhang ◽  
Joyce Kim ◽  
Rachel E. Patzer ◽  
Stephen R. Pitts ◽  
Aaron Patzer ◽  
...  

SummaryObjective: To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements.Methods: Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN) which included natural language processing (NLP) and principal component analysis from the patient’s reason for visit. Ten-fold cross validation was used to test the predictive capacity of each model and receiver operating curves (AUC) were then calculated for each model.Results: Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated AUC of 0.742 (95% CI 0.7310.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.Conclusions: The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient’s reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.


2021 ◽  
Author(s):  
Carolinne Roque e Faria ◽  
Cinthyan Renata Sachs Camerlengo de Barb

Technology is becoming expressively popular among agribusiness producers and is progressing in all agricultural area. One of the difficulties in this context is to handle data in natural language to solve problems in the field of agriculture. In order to build up dialogs and provide rich researchers, the present work uses Natural Language Processing (NLP) techniques to develop an automatic and effective computer system to interact with the user and assist in the identification of pests and diseases in the soybean farming, stored in a database repository to provide accurate diagnoses to simplify the work of the agricultural professional and also for those who deal with a lot of information in this area. Information on 108 pests and 19 diseases that damage Brazilian soybean was collected from Brazilian bibliographic manuals with the purpose to optimize the data and improve production, using the spaCy library for syntactic analysis of NLP, which allowed the pre-process the texts, recognize the named entities, calculate the similarity between the words, verify dependency parsing and also provided the support for the development requirements of the CAROLINA tool (Robotized Agronomic Conversation in Natural Language) using the language belonging to the agricultural area.


10.2196/23230 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e23230
Author(s):  
Pei-Fu Chen ◽  
Ssu-Ming Wang ◽  
Wei-Chih Liao ◽  
Lu-Cheng Kuo ◽  
Kuan-Chih Chen ◽  
...  

Background The International Classification of Diseases (ICD) code is widely used as the reference in medical system and billing purposes. However, classifying diseases into ICD codes still mainly relies on humans reading a large amount of written material as the basis for coding. Coding is both laborious and time-consuming. Since the conversion of ICD-9 to ICD-10, the coding task became much more complicated, and deep learning– and natural language processing–related approaches have been studied to assist disease coders. Objective This paper aims at constructing a deep learning model for ICD-10 coding, where the model is meant to automatically determine the corresponding diagnosis and procedure codes based solely on free-text medical notes to improve accuracy and reduce human effort. Methods We used diagnosis records of the National Taiwan University Hospital as resources and apply natural language processing techniques, including global vectors, word to vectors, embeddings from language models, bidirectional encoder representations from transformers, and single head attention recurrent neural network, on the deep neural network architecture to implement ICD-10 auto-coding. Besides, we introduced the attention mechanism into the classification model to extract the keywords from diagnoses and visualize the coding reference for training freshmen in ICD-10. Sixty discharge notes were randomly selected to examine the change in the F1-score and the coding time by coders before and after using our model. Results In experiments on the medical data set of National Taiwan University Hospital, our prediction results revealed F1-scores of 0.715 and 0.618 for the ICD-10 Clinical Modification code and Procedure Coding System code, respectively, with a bidirectional encoder representations from transformers embedding approach in the Gated Recurrent Unit classification model. The well-trained models were applied on the ICD-10 web service for coding and training to ICD-10 users. With this service, coders can code with the F1-score significantly increased from a median of 0.832 to 0.922 (P<.05), but not in a reduced interval. Conclusions The proposed model significantly improved the F1-score but did not decrease the time consumed in coding by disease coders.


Sign in / Sign up

Export Citation Format

Share Document