Transition based neural network dependency parsing of Tibetan

In order to improve the performance of Tibetan natural language processing applications such as machine translation, sentiment analysis and other tasks, this article proposes a neural network-based method for syntactic analysis of Tibetan language dependence. Part of the corpus of Qinghai Normal University’s part-of-speech tag set is marked by the corresponding mapping relationship is transformed into the corpus annotated by the national standard part-of-speech tag set. At the same time, the CoNLL format Tibetan language dependency syntax tree library is constructed, and the method of shift-reduce plus neural network is adopted to systematically study and analyze the Tibetan language dependency syntax. Thereby improving the quality of Tibetan dependency syntactic analysis, and its accuracy rate reaches UAS:94.59%

Download Full-text

Extraction of Construction Quality Requirements from Textual Specifications via Natural Language Processing

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211001385 ◽

2021 ◽

pp. 036119812110013

Author(s):

JungHo Jeon ◽

Xin Xu ◽

Yuxi Zhang ◽

Liu Yang ◽

Hubo Cai

Keyword(s):

Neural Network ◽

South Carolina ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Syntactic Analysis ◽

Test Case ◽

Promising Alternative ◽

Construction Inspection ◽

Construction Specification

Construction inspection is an essential component of the quality assurance programs of state transportation agencies (STAs), and the guidelines for this process reside in lengthy textual specifications. In the current practice, engineers and inspectors must manually go through these documents to plan, conduct, and document their inspections, which is time-consuming, very subjective, inconsistent, and prone to error. A promising alternative to this manual process is the application of natural language processing (NLP) techniques (e.g., text parsing, sentence classification, and syntactic analysis) to automatically extract construction inspection requirements from textual documents and present them as straightforward check questions. This paper introduces an NLP-based method that: 1) extracts individual sentences from the construction specification; 2) preprocesses the resulting sentences; 3) applies Word2Vec and GloVe algorithms to extract vector features; 4) uses a convolutional neural network (CNN) and recurrent neural network to classify sentences; and 5) converts the requirement sentences into check questions via syntactic analysis. The overall methodology was assessed using the Indiana Department of Transportation (DOT) specification as a test case. Our results revealed that the CNN + GloVe combination led to the highest accuracy, at 91.9%, and the lowest loss, at 11.7%. To further validate its use across STAs nationwide, we applied it to the construction specification of the South Carolina DOT as a test case, and our average accuracy was 92.6%.

Download Full-text

Sistem Deteksi Struktur Kalimat Bahasa Arab Menggunakan Algoritma Light Stemming

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v19i1.509 ◽

2019 ◽

Vol 19 (1) ◽

pp. 109-118

Author(s):

Mudafiq Riyan Pratama ◽

Muhammad Yunus

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Failure Rate ◽

Language Processing ◽

Syntactic Analysis ◽

Human Language ◽

Accuracy Rate

To understand Arabic, it is necessary to study nahwu theory. Nahwu theory is the study of structure of Arabic sentences. By learning nahwu, being able to distinguish subjects, predicates and objects in sentences. One of the fields in the computer that studies about human language processing is NLP (Natural Language Processing), which is natural human language processing through syntactic analysis of sentences structure. One method to analyzing syntactic of sentences is stemming. One variant of the stemming algorithm is Light Stemming algorithm. Light Stemming algorithm is an algorithm that only removes prefix and sufix. Based on the tests conducted, the light stemming algorithm is able to detect nahwu with an accuracy rate of 82.22%. The 17.78% failure rate occurs because words that do not have an affix will be detected automatically the type is fi’il (verb), even though the fact may be that the type is isim (noun), and the failure of the detection results is also due to not being able to stemming the words that have affix in the middle (infix), because indeed the process of light stemming only eliminates the sufix and prefix.

Download Full-text

Natural Language Processing with Improved Deep Learning Neural Networks

Scientific Programming ◽

10.1155/2022/6028693 ◽

2022 ◽

Vol 2022 ◽

pp. 1-8

Author(s):

YiTao Zhou

Keyword(s):

Neural Network ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Syntactic Analysis ◽

Feed Forward Neural Network ◽

Term Memory ◽

Feed Forward ◽

Feature Extractor

As one of the core tasks in the field of natural language processing, syntactic analysis has always been a hot topic for researchers, including tasks such as Questions and Answer (Q&A), Search String Comprehension, Semantic Analysis, and Knowledge Base Construction. This paper aims to study the application of deep learning and neural network in natural language syntax analysis, which has significant research and application value. This paper first studies a transfer-based dependent syntax analyzer using a feed-forward neural network as a classifier. By analyzing the model, we have made meticulous parameters of the model to improve its performance. This paper proposes a dependent syntactic analysis model based on a long-term memory neural network. This model is based on the feed-forward neural network model described above and will be used as a feature extractor. After the feature extractor is pretrained, we use a long short-term memory neural network as a classifier of the transfer action, and the characteristics extracted by the syntactic analyzer as its input to train a recursive neural network classifier optimized by sentences. The classifier can not only classify the current pattern feature but also multirich information such as analysis of state history. Therefore, the model is modeled in the analysis process of the entire sentence in syntactic analysis, replacing the method of modeling independent analysis. The experimental results show that the model has achieved greater performance improvement than baseline methods.

Download Full-text

Arabic Part-of-Speech Tagger, an Approach Based on Neural Network Modelling

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.29.14009 ◽

2018 ◽

Vol 7 (2.29) ◽

pp. 742

Author(s):

Rabab Ali Abumalloh ◽

Hasan Muaidi Al-Serhan ◽

Othman Bin Ibrahim ◽

Waheeb Abu-Ulbeh

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Artificial Neural ◽

Speech Tagging

POS-tagging gained the interest of researchers in computational linguistics sciences in the recent years. Part-of-speech tagging systems assign the proper grammatical tag or morpho-syntactical category labels automatically to every word in the corpus per its appearance on the text. POS-tagging serves as a fundamental and preliminary step in linguistic analysis which can help in developing many natural language processing applications such as: word processing systems, spell checking systems, building dictionaries and in parsing systems. Arabic language gained the interest of researchers which led to increasing demand for Arabic natural language processing systems. Artiﬁcial neural networks has been applied in many applications such as speech recognition and part of speech prediction, but it is considered as a new approach in Part-of-speech tagging. In this research, we developed an Arabic POS-tagger using artificial neural network. A corpus of 20,620 words, which were manually assigned to the appropriate tags was developed and used to train the artificial neural network and to test the part of speech tagger systems’ overall performance. The accuracy of the developed tagger reaches 89.04% using the testing dataset. While, it reaches 98.94% using the training dataset. By combining the two datasets, the accuracy rate for the whole system is 96.96%.

Download Full-text

Extracting Parallel Sentences from Nonparallel Corpora Using Parallel Hierarchical Attention Network

Computational Intelligence and Neuroscience ◽

10.1155/2020/8823906 ◽

2020 ◽

Vol 2020 ◽

pp. 1-9

Author(s):

Shaolin Zhu ◽

Yong Yang ◽

Chun Xu

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

State Of The Art ◽

Research Problem ◽

Shared Task ◽

Translation Systems

Collecting parallel sentences from nonparallel data is a long-standing natural language processing research problem. In particular, parallel training sentences are very important for the quality of machine translation systems. While many existing methods have shown encouraging results, they cannot learn various alignment weights in parallel sentences. To address this issue, we propose a novel parallel hierarchical attention neural network which encodes monolingual sentences versus bilingual sentences and construct a classifier to extract parallel sentences. In particular, our attention mechanism structure can learn different alignment weights of words in parallel sentences. Experimental results show that our model can obtain state-of-the-art performance on the English-French, English-German, and English-Chinese dataset of BUCC 2017 shared task about parallel sentences’ extraction.

Download Full-text

Fast Neural Network Engine for Natural Science Language Processing: A Drug-Search Case.

10.26434/chemrxiv.12800348 ◽

2020 ◽

Author(s):

Vadim V. Korolev ◽

Artem Mitrofanov ◽

Kirill Karpov ◽

Valery Tkachenko

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Natural Science ◽

Therapeutic Agent ◽

Semantic Relations ◽

Chemical Data ◽

Processing Methods ◽

Modern Natural

The main advantage of modern natural language processing methods is a possibility to turn an amorphous human-readable task into a strict mathematic form. That allows to extract chemical data and insights from articles and to find new semantic relations. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use-cases and applied to a case of searching a therapeutic agent for a COVID-19 disease by analyzing PubMed archive.

Download Full-text

PRINCIPAL PROBLEMS OF NATURAL LANGUAGE PROCESSING SYSTEMS

Studia Philologica ◽

10.28925/2311-2425.2018.11.5 ◽

2018 ◽

pp. 35-38

Author(s):

O. Hyryn

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Syntactic Analysis ◽

Syntactic Ambiguity ◽

Grammatical Structure ◽

English Sentence ◽

Analysis Methods ◽

The Way

The article deals with natural language processing, namely that of an English sentence. The article describes the problems, which might arise during the process and which are connected with graphic, semantic, and syntactic ambiguity. The article provides the description of how the problems had been solved before the automatic syntactic analysis was applied and the way, such analysis methods could be helpful in developing new analysis algorithms. The analysis focuses on the issues, blocking the basis for the natural language processing — parsing — the process of sentence analysis according to their structure, content and meaning, which aims to analyze the grammatical structure of the sentence, the division of sentences into constituent components and defining links between them.

Download Full-text

What's my App?

ACM SIGMETRICS Performance Evaluation Review ◽

10.1145/3466826.3466841 ◽

2021 ◽

Vol 48 (4) ◽

pp. 41-44

Author(s):

Dena Markudova ◽

Martino Trevisan ◽

Paolo Garza ◽

Michela Meo ◽

Maurizio M. Munafo ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Real Time ◽

Language Processing ◽

Traffic Management ◽

Quality Of Experience ◽

Broadband Internet ◽

Management Policies ◽

Control Traffic

With the spread of broadband Internet, Real-Time Communication (RTC) platforms have become increasingly popular and have transformed the way people communicate. Thus, it is fundamental that the network adopts traffic management policies that ensure appropriate Quality of Experience to users of RTC applications. A key step for this is the identification of the applications behind RTC traffic, which in turn allows to allocate adequate resources and make decisions based on the specific application's requirements. In this paper, we introduce a machine learning-based system for identifying the traffic of RTC applications. It builds on the domains contacted before starting a call and leverages techniques from Natural Language Processing (NLP) to build meaningful features. Our system works in real-time and is robust to the peculiarities of the RTP implementations of different applications, since it uses only control traffic. Experimental results show that our approach classifies 5 well-known meeting applications with an F1 score of 0.89.

Download Full-text

Towards Accurate Deceptive Opinions Detection Based on Word Order-Preserving CNN

Mathematical Problems in Engineering ◽

10.1155/2018/2410206 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Siyuan Zhao ◽

Zhiwei Xu ◽

Limin Liu ◽

Mengjie Guo ◽

Jing Yun

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Convolutional Neural Network ◽

Language Processing ◽

Word Order ◽

Text Analysis ◽

Important Application ◽

Detection Mechanism ◽

Short Text

Convolutional neural network (CNN) has revolutionized the field of natural language processing, which is considerably efficient at semantics analysis that underlies difficult natural language processing problems in a variety of domains. The deceptive opinion detection is an important application of the existing CNN models. The detection mechanism based on CNN models has better self-adaptability and can effectively identify all kinds of deceptive opinions. Online opinions are quite short, varying in their types and content. In order to effectively identify deceptive opinions, we need to comprehensively study the characteristics of deceptive opinions and explore novel characteristics besides the textual semantics and emotional polarity that have been widely used in text analysis. In this paper, we optimize the convolutional neural network model by embedding the word order characteristics in its convolution layer and pooling layer, which makes convolutional neural network more suitable for short text classification and deceptive opinions detection. The TensorFlow-based experiments demonstrate that the proposed detection mechanism achieves more accurate deceptive opinion detection results.

Download Full-text

Prediction of Emergency Department Hospital Admission Based on Natural Language Processing and Neural Networks

Methods of Information in Medicine ◽

10.3414/me17-01-0024 ◽

2017 ◽

Vol 56 (05) ◽

pp. 377-389 ◽

Cited By ~ 21

Author(s):

Xingyu Zhang ◽

Joyce Kim ◽

Rachel E. Patzer ◽

Stephen R. Pitts ◽

Aaron Patzer ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Emergency Department ◽

Logistic Regression ◽

Natural Language Processing ◽

Natural Language ◽

Hospital Admission ◽

Language Processing ◽

Predictive Accuracy ◽

Free Text

SummaryObjective: To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements.Methods: Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN) which included natural language processing (NLP) and principal component analysis from the patient’s reason for visit. Ten-fold cross validation was used to test the predictive capacity of each model and receiver operating curves (AUC) were then calculated for each model.Results: Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated AUC of 0.742 (95% CI 0.7310.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.Conclusions: The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient’s reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.

Download Full-text