AUTOMATED FUNCTIONAL ANALYSIS OF PATENTS FOR PRODUCING DESIGN INSIGHT

Abstract

Patent analysis is a popular topic of research. However, designers do not engage with patents in the early design stage, as patents are time-consuming to read and understand due to their intricate structure and legal terminology. Previous research has demonstrated manually produced graphical representations of patent working principles for improving designers' awareness of prior art. In this paper, an automated approach is presented that uses Natural Language Processing (NLP) techniques to identify the invention's working principle from the patent's independent claims and produce a visualisation. The outcomes of this automated approach are compared with previous manually produced examples. The results indicate a match of over 40% between the automatic and manual approaches, which is a good basis for further development. The comparison suggests that the automated approach works well for features and relationships that are expressed explicitly and consistently but begins to lose accuracy when applied to complex sentences. The comparison also suggests that the accuracy of the proposed automated approach can be improved by using a trained part-of-speech (POS) tagger, an improved parsing grammar and an ontology.

Part-of-Speech Tagging via Deep Neural Networks for Northern-Ethiopic Languages

Information Technology And Control ◽

10.5755/j01.itc.49.4.26808 ◽

2020 ◽

Vol 49 (4) ◽

pp. 482-494

Author(s):

Jurgita Kapočiūtė-Dzikienė ◽

Senait Gebremichael Tesfagergish

Keyword(s):

Neural Network ◽

Neural Networks ◽

Language Processing ◽

Deep Neural Networks ◽

Short Term Memory ◽

Parameter Tuning ◽

Feed Forward Neural Network ◽

Pos Tagging ◽

Part Of Speech ◽

Pos Tagger

Deep Neural Networks (DNNs) have proven to be especially successful in Natural Language Processing (NLP) and in Part-Of-Speech (POS) tagging, the process of mapping words to their corresponding POS labels depending on the context. Despite the recent development of language technologies, low-resourced languages (such as Tigrinya, an East African language) have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We have selected Tigrinya as the testbed example and have tested state-of-the-art DL approaches, seeking to build the most accurate POS tagger. We have evaluated DNN classifiers (Feed Forward Neural Network (FFNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and Convolutional Neural Network (CNN)) on top of neural word2vec word embeddings with a small training corpus known as the Nagaoka Tigrinya Corpus. To determine the best DNN classifier type, its architecture and its hyper-parameter set, both manual and automatic hyper-parameter tuning have been performed. The BiLSTM method proved to be the most suitable for our task: it achieved the highest accuracy, 92%, which is 65% above the random baseline.

Aspect-Based Sentiment Analysis of Online Product Reviews

Advances in Business Information Systems and Analytics - Handbook of Research on Advanced Data Mining Techniques and Applications for Business Intelligence ◽

10.4018/978-1-5225-2031-3.ch010 ◽

2017 ◽

pp. 175-191 ◽

Cited By ~ 1

Author(s):

Vinod Kumar Mishra ◽

Himanshu Tiruwa

Keyword(s):

Sentiment Analysis ◽

Computational Linguistics ◽

Language Processing ◽

Future Trend ◽

Product Reviews ◽

Customer Reviews ◽

Part Of Speech ◽

Lexical Approach ◽

Sentiment Score ◽

Pos Tagger

Sentiment analysis is a part of computational linguistics concerned with extracting sentiment and emotion from text. It is also considered a task of natural language processing and data mining. Sentiment analysis mainly concentrates on identifying whether a given text is subjective or objective and, if it is subjective, whether it is negative, positive or neutral. This chapter provides an overview of aspect-based sentiment analysis, with current and future trends of research in the area. It also provides an aspect-based sentiment analysis of online customer reviews of the Nokia 6600. To perform aspect-based classification, we use a lexical approach on the Eclipse platform, which classifies each review as positive, negative or neutral on the basis of the features of the product. SentiWordNet is used as a lexical resource to calculate the overall sentiment score of each sentence, a POS tagger is used for part-of-speech tagging, a frequency-based method is used for extraction of the aspects/features, and negation handling is used to improve the accuracy of the system.
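The lexical approach with negation handling described in this abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the polarity scores are made-up stand-ins for SentiWordNet values, the aspect list is assumed to have been produced by an earlier frequency-based extraction step, and all names (`LEXICON`, `NEGATIONS`, `ASPECTS`, `score_sentence`, `label`) are hypothetical.

```python
# Toy polarity lexicon: hypothetical scores, NOT real SentiWordNet values.
LEXICON = {"good": 0.6, "great": 0.8, "poor": -0.6, "bad": -0.7, "slow": -0.4}
NEGATIONS = {"not", "no", "never"}
# Assumed output of a frequency-based aspect extraction step.
ASPECTS = {"battery", "camera", "screen"}


def score_sentence(sentence):
    """Return (aspects mentioned, summed polarity) for one sentence."""
    tokens = sentence.lower().replace(".", " ").replace(",", " ").split()
    total = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True  # flip the polarity of the next sentiment word
            continue
        if tok in LEXICON:
            value = LEXICON[tok]
            total += -value if negate else value
            negate = False
    aspects = [t for t in tokens if t in ASPECTS]
    return aspects, total


def label(score):
    """Map a numeric score to a positive / negative / neutral label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


aspects, score = score_sentence("The camera is not good.")
print(aspects, label(score))
```

In this sketch a negation word inverts only the next sentiment-bearing word; a real system would need a more careful negation scope.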

Developing Core Technologies for Resource-Scarce Nguni Languages

Information ◽

10.3390/info12120520 ◽

2021 ◽

Vol 12 (12) ◽

pp. 520

Author(s):

Jakobus S. du Toit ◽

Martin J. Puttkammer

Keyword(s):

South African ◽

Language Processing ◽

Rule Based ◽

Local Resource ◽

African Languages ◽

Linguistic Resources ◽

Part Of Speech ◽

Parallel Data ◽

Further Development

The creation of linguistic resources is crucial to the continued growth of research and development efforts in the field of natural language processing, especially for resource-scarce languages. In this paper, we describe the curation and annotation of corpora and the development of multiple linguistic technologies for four official South African languages, namely isiNdebele, Siswati, isiXhosa, and isiZulu. Development efforts included sourcing parallel data for these languages and annotating each on token, orthographic, morphological, and morphosyntactic levels. These sets were in turn used to create and evaluate three core technologies, viz. a lemmatizer, a part-of-speech tagger, and a morphological analyzer for each of the languages. We report on the quality of these technologies, which improve on the rule-based technologies previously developed as part of a similar initiative in 2013. These resources are made publicly accessible through a local resource agency with the intention of fostering further development of both resources and technologies that may benefit the NLP industry in South Africa.

Development of Part of Speech Tagger for Assamese Using HMM

International Journal of Synthetic Emotions ◽

10.4018/ijse.2018010102 ◽

2018 ◽

Vol 9 (1) ◽

pp. 23-32

Author(s):

Surjya Kanta Daimary ◽

Vishal Goyal ◽

Madhumita Barbora ◽

Umrinderpal Singh

Keyword(s):

South Asian ◽

Language Processing ◽

Word Order ◽

Good Accuracy ◽

Stochastic Approach ◽

Point Of View ◽

Part Of Speech ◽

Pos Tagger ◽

Asian Languages ◽

Free Word

This article presents work on a Part-of-Speech (POS) tagger for Assamese based on the Hidden Markov Model (HMM). Over the years, many language processing tasks have been addressed for Western and South-Asian languages. However, very little work has been done for the Assamese language. With this in view, a POS tagger for Assamese using a stochastic approach is being developed. Assamese is a free word-order, highly agglutinative and morphologically rich language, so developing a POS tagger with good accuracy will help in the development of other NLP tasks for Assamese. For this work, an annotated corpus of 271,890 words with a BIS tagset consisting of 38 tag labels is used. The model is trained on 256,690 words and the remaining words are used for testing. The system obtained an accuracy of 89.21%, and it is compared with other existing stochastic models.
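As an illustration of the stochastic approach named in this abstract, the following Python sketch trains a bigram HMM from tagged sentences and decodes with the Viterbi algorithm. It is a minimal sketch, not the authors' system: the three-sentence English corpus, the simplified tags, the add-one smoothing, and the function names (`train`, `prob`, `viterbi`) are all assumptions made for illustration. The article itself uses a 38-label BIS tagset, with 256,690 of 271,890 words for training, which leaves 15,200 words for testing.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus (hypothetical English data, illustration only).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train(corpus):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sent in corpus:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def prob(counter, key, n_outcomes):
    """Add-one smoothed relative frequency (an assumed smoothing choice)."""
    return (counter[key] + 1) / (sum(counter.values()) + n_outcomes)


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for a non-empty word list."""
    tags = list(emit)
    vocab = {w for counts in emit.values() for w in counts}
    n_words = len(vocab) + 1  # one extra slot for unknown words
    n_tags = len(tags)
    # best[tag] = (probability of best path ending in tag, that path)
    best = {
        t: (prob(trans["<s>"], t, n_tags) * prob(emit[t], words[0], n_words), [t])
        for t in tags
    }
    for word in words[1:]:
        new_best = {}
        for t in tags:
            p, path = max(
                ((best[s][0] * prob(trans[s], t, n_tags), best[s][1]) for s in tags),
                key=lambda item: item[0],
            )
            new_best[t] = (p * prob(emit[t], word, n_words), path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


trans, emit = train(TRAIN)
print(viterbi(["the", "cat", "barks"], trans, emit))
```

For realistic sentence lengths, log probabilities would be used instead of raw products to avoid numerical underflow.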

Part-of-Speech (POS) Tagging Using Deep Learning-Based Approaches on the Designed Khasi POS Corpus

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3488381 ◽

2022 ◽

Vol 21 (3) ◽

pp. 1-24

Author(s):

Sunita Warjri ◽

Partha Pakray ◽

Saralin A. Lyngdoh ◽

Arnab Kumar Maji

Keyword(s):

Deep Learning ◽

Natural Language ◽

Computational Linguistics ◽

Language Processing ◽

Research Work ◽

Pos Tagging ◽

Part Of Speech ◽

Corpus Size ◽

Increase In Accuracy ◽

Pos Tagger

Part-of-speech (POS) tagging is one of the challenging research fields in natural language processing (NLP). It requires good knowledge of a particular language, together with large amounts of data or corpora for feature engineering, in order to achieve good tagger performance. Our main contribution in this research work is the designed Khasi POS corpus. To date, no Khasi corpus of any kind has been formally developed. In the present designed Khasi POS corpus, each word is tagged manually using the designed tagset. Deep learning methods have been used to experiment with our designed Khasi POS corpus. POS taggers based on BiLSTM, on combinations of BiLSTM with CRF, and on character-based embedding with BiLSTM are presented. The main challenges anticipated in understanding and handling natural language from a computational-linguistics perspective are also discussed. In the presently designed corpus, we have tried to solve the problems of word ambiguity with respect to context of usage, as well as the orthography problems that arise in the designed POS corpus. The designed Khasi corpus contains around 96,100 tokens and 6,616 distinct words. Initially, when running the first few sets of data, of around 41,000 tokens, the taggers were found to yield considerably accurate results. When the Khasi corpus size was increased to 96,100 tokens, we saw an increase in accuracy, and the analyses became more pertinent. As a result, an accuracy of 96.81% is achieved for the BiLSTM method, 96.98% for BiLSTM with CRF, and 95.86% for the character-based approach with LSTM. For comparative purposes, we also present some of the recent existing POS taggers and other NLP work on the Khasi language.

Development of Part of Speech Tagger for Assamese Using HMM

Natural Language Processing ◽

10.4018/978-1-7998-0951-7.ch054 ◽

2020 ◽

pp. 1139-1148

Author(s):

Surjya Kanta Daimary ◽

Vishal Goyal ◽

Madhumita Barbora ◽

Umrinderpal Singh

Keyword(s):

Language Processing ◽

Stochastic Models ◽

Hidden Markov ◽

Stochastic Approach ◽

Point Of View ◽

Part Of Speech ◽

Pos Tagger ◽

Asian Languages ◽

Free Word ◽

Assamese Language

Improving accuracy of Part-of-Speech (POS) tagging using hidden markov model and morphological analysis for Myanmar Language

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i2.pp2023-2030 ◽

2020 ◽

Vol 10 (2) ◽

pp. 2023

Author(s):

Dim Lam Cing ◽

Khin Mar Soe

Keyword(s):

Language Processing ◽

High Performance ◽

Markov Models ◽

Hidden Markov ◽

Morphological Structure ◽

Word Segmentation ◽

Pos Tagging ◽

Part Of Speech ◽

Improving Accuracy ◽

Pos Tagger

In Natural Language Processing (NLP), word segmentation and Part-of-Speech (POS) tagging are fundamental tasks. POS information is also necessary as a preprocessing step in NLP applications such as machine translation (MT) and information retrieval (IR). Currently, many research efforts develop word segmentation and POS tagging separately, with different methods, to achieve high performance and accuracy. For the Myanmar language, there are also separate word segmenters and POS taggers based on statistical approaches such as Neural Networks (NNs) and Hidden Markov Models (HMMs). However, because of the Myanmar language's complex morphological structure, the out-of-vocabulary (OOV) problem still exists. To avoid errors and improve segmentation by utilizing POS data, segmentation and labeling should be performed at the same time. The main goal of developing a POS tagger for any language is to improve the accuracy of tagging and remove ambiguity in sentences due to language structure. This paper focuses on developing word segmentation and a Part-of-Speech (POS) tagger for the Myanmar language. It presents a comparison of separate word segmentation and POS tagging with joint word segmentation and POS tagging.

Development of an online tense and aspect identifier for English

CALL for widening participation: short papers from EUROCALL 2020 ◽

10.14705/rpnet.2020.48.1161 ◽

2020 ◽

pp. 36-41

Author(s):

John Blake

Keyword(s):

User Interface ◽

Language Processing ◽

Teaching Approaches ◽

Online Tool ◽

Tense And Aspect ◽

Auxiliary Verbs ◽

Complex Sentences ◽

Part Of Speech ◽

Web App ◽

Positive Results

This article describes the development of a tense and aspect identifier, an online tool designed to help learners of English by harnessing a natural language processing pipeline to automatically classify verb groups into one of 12 grammatical tenses. Currently, there is no website or application that can automatically identify tense in context, and the tense and aspect identifier fills that niche. Learners can use the tool to see how grammatical tenses are used in context. Finite verb groups are automatically identified, and the relevant words in the verb group are highlighted and colorized according to the tense identified. The latest deployed system can identify tenses in simple, compound, and complex sentences. False positive results occur when there is ellipsis of auxiliary verbs or when the tagger assigns the incorrect part-of-speech tag. The user interface of the tense identifier is a web app created using the Flask framework and deployed from the Heroku platform. The tool can be used for inductive and deductive teaching approaches, or even to check for tense consistency in a thesis.
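The idea of classifying a finite verb group into one of 12 grammatical tenses from its part-of-speech tags can be sketched in Python as follows. This is only a rule-based illustration under stated assumptions, not the tool's actual pipeline: the input is assumed to be an already tagged, non-empty verb group using Penn Treebank tags, and the function name `classify_tense` and the word sets are hypothetical.

```python
# Forms of the auxiliaries that signal perfect and progressive aspect.
HAVE_FORMS = {"have", "has", "had"}
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been"}
FUTURE_MODALS = {"will", "shall"}


def classify_tense(verb_group):
    """Classify a tagged verb group, e.g. [("has", "VBZ"), ("been", "VBN")].

    Assumes Penn Treebank tags and a non-empty group that contains a main
    verb; returns a string such as "present perfect progressive".
    """
    words = [w.lower() for w, _ in verb_group]
    tags = [t for _, t in verb_group]

    # Time reference: a future modal first, otherwise the first verb's tag.
    if words[0] in FUTURE_MODALS:
        time = "future"
        words, tags = words[1:], tags[1:]
    elif tags[0] == "VBD":
        time = "past"
    else:
        time = "present"

    auxiliaries = words[:-1]  # everything before the main verb
    perfect = any(w in HAVE_FORMS for w in auxiliaries) and "VBN" in tags
    progressive = tags[-1] == "VBG" and any(w in BE_FORMS for w in auxiliaries)

    if perfect and progressive:
        aspect = "perfect progressive"
    elif perfect:
        aspect = "perfect"
    elif progressive:
        aspect = "progressive"
    else:
        aspect = "simple"
    return f"{time} {aspect}"


print(classify_tense([("has", "VBZ"), ("been", "VBN"), ("running", "VBG")]))
```

Because the rules depend entirely on the supplied tags and on the auxiliaries being present, a sketch like this shares the failure modes the abstract mentions: an incorrect POS tag or an elided auxiliary leads to a wrong label.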

Punjabi Pos Tagger: Rule Based and HMM

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i7/0106 ◽

2017 ◽

Vol 7 (7) ◽

pp. 193

Author(s):

Umrinderpal Singh ◽

Vishal Goyal

Keyword(s):

Information Retrieval ◽

Language Processing ◽

State Of The Art ◽

Input Word ◽

Rule Based ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Unseen Data ◽

Pos Tagger ◽

Speech Tagging

A Part of Speech tagger is used to assign a tag to every input word in a given sentence. The tags may include the different parts of speech of a particular language, such as noun, pronoun, verb, adjective and conjunction, and may have subcategories of all these tags. Part of Speech tagging is a basic preprocessing task for most Natural Language Processing (NLP) applications, such as Information Retrieval, Machine Translation, and Grammar Checking. The task belongs to a larger set of problems, namely sequence labeling problems. Part of Speech tagging for Punjabi is not widely explored territory. We discuss Rule Based and HMM based Part of Speech taggers for Punjabi, along with a comparison of the accuracies of both approaches. The system is developed using 35 different standard part of speech tags. We evaluate our system on unseen data, with a state-of-the-art accuracy of 93.3%.

Semantic Relation Classification via Bidirectional LSTM Networks with Entity-Aware Attention Using Latent Entity Typing

Symmetry ◽

10.3390/sym11060785 ◽

2019 ◽

Vol 11 (6) ◽

pp. 785 ◽

Cited By ~ 13

Author(s):

Joohong Lee ◽

Sangwoo Seo ◽

Yong Suk Choi

Keyword(s):

Language Processing ◽

State Of The Art ◽

Neural Model ◽

Named Entity ◽

Part Of Speech ◽

Classification Tasks ◽

Relation Classification ◽

Pos Tagger ◽

High Level ◽

Dependency Parser

Classifying semantic relations between entity pairs in sentences is an important task in natural language processing (NLP). Most previous models applied to relation classification rely on high-level lexical and syntactic features obtained by NLP tools such as WordNet, the dependency parser, part-of-speech (POS) tagger, and named entity recognizers (NER). In addition, state-of-the-art neural models based on attention mechanisms do not fully utilize information related to the entity, which may be the most crucial feature for relation classification. To address these issues, we propose a novel end-to-end recurrent neural model that incorporates an entity-aware attention mechanism with a latent entity typing (LET) method. Our model not only effectively utilizes entities and their latent types as features, but also builds word representations by applying self-attention based on symmetrical similarity of a sentence itself. Moreover, the model is interpretable by visualizing applied attention mechanisms. Experimental results obtained with the SemEval-2010 Task 8 dataset, which is one of the most popular relation classification tasks, demonstrate that our model outperforms existing state-of-the-art models without any high-level features.
