Subtractive Mountain Clustering Algorithm Applied to a Chatbot to Assist Elderly People in Medication Intake

2021 ◽  
Author(s):  
Neuza Claro ◽  
Paulo A. Salgado ◽  
T-P Azevedo Perdicoulis

Errors in medication intake among elderly people are very common. One of the main causes is their diminished ability to retain information. The large number of medicines that advanced age typically requires is another limiting factor. Hence, there is demand for an interactive aid system, preferably one using natural language, to help the older population with medication. A chatbot based on a subtractive clustering algorithm, which belongs to the family of unsupervised learning methods, is the chosen solution, since natural language processing is a necessary step in building a chatbot able to answer the questions that older people may have about a particular drug. In this work, the subtractive mountain clustering algorithm has been adapted to the problem of natural language processing. This version of the algorithm associates a set of words into clusters. After the centre of every cluster (the most relevant word) is found, all the other words are aggregated according to a defined metric adapted to the language-processing realm. Both the relevant stored information and the questions are processed by the algorithm. Correct processing of the text enables the chatbot to produce answers that relate to the posed queries. To validate the method, we use the package insert of a drug as the available information and formulate associated questions.
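As a rough illustration of the adapted algorithm, the sketch below implements plain subtractive (mountain) clustering over words, using a co-occurrence-based distance as a stand-in for the paper's metric; the tokeniser, distance function, radii, and sentences are illustrative assumptions, not the authors' actual choices.

```python
import math
from collections import Counter
from itertools import combinations

def word_distances(sentences):
    """Distance shrinks as two words co-occur more often: d = 1/(1 + count)."""
    cooc = Counter()
    vocab = set()
    for s in sentences:
        words = set(s.lower().split())
        vocab |= words
        for a, b in combinations(sorted(words), 2):
            cooc[(a, b)] += 1
    def dist(a, b):
        if a == b:
            return 0.0
        key = (a, b) if a < b else (b, a)
        return 1.0 / (1.0 + cooc[key])
    return sorted(vocab), dist

def subtractive_clusters(vocab, dist, ra=1.0, rb=1.5, eps=0.35):
    alpha, beta = 4 / ra**2, 4 / rb**2
    # Mountain (potential) of each word: Gaussian contributions from all words.
    pot = {w: sum(math.exp(-alpha * dist(w, v)**2) for v in vocab) for w in vocab}
    first = max(pot.values())
    centres = []
    while max(pot.values()) >= eps * first:
        c = max(pot, key=pot.get)
        centres.append(c)
        # Flatten the chosen centre's mountain so nearby words lose potential.
        pc = pot[c]
        for w in vocab:
            pot[w] -= pc * math.exp(-beta * dist(w, c)**2)
    return centres

sentences = ["take one tablet after breakfast",
             "do not take this drug with alcohol",
             "one tablet daily after meals"]
vocab, dist = word_distances(sentences)
print(subtractive_clusters(vocab, dist))  # cluster centres (most relevant words)
```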

2021 ◽  
Vol 10 (5) ◽  
pp. 17-36
Author(s):  
Paulo A. Salgado ◽  
T-P Azevedo Perdicoulis

In this work, the subtractive mountain clustering algorithm has been adapted to the problem of natural language processing with a view to constructing a chatbot that answers questions posed by the user. The implemented version of the algorithm allows for the association of a set of words into clusters. After the centre of every cluster (the most relevant word) is found, all the other words are aggregated according to a defined metric adapted to the language-processing realm. All the relevant stored information (necessary to answer the questions) is processed by the algorithm, as are the questions. Correct processing of the text enables the chatbot to produce answers that relate to the posed queries. Since the intended chatbot is meant to help elderly people with medication, we validate the method using the package insert of a drug as the available information and formulate associated questions. Errors in medication intake among elderly people are very common. One of the main causes is their diminished ability to retain information. The large number of medicines that advanced age typically requires is another limiting factor. Hence, there is demand for an interactive aid system, preferably one using natural language, to help the older population with medication. A chatbot based on a subtractive clustering algorithm is the chosen solution.
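The answer-selection step can then be sketched as follows; the overlap score is a hedged stand-in for the paper's metric, and the sentences and centres are invented examples.

```python
# Score each stored package-insert sentence by how many cluster centres it
# shares with the user's question, and return the best match as the answer.
def best_answer(question, sentences, centres):
    q_words = set(question.lower().replace("?", "").split())
    hits = {c for c in centres if c in q_words}
    return max(sentences, key=lambda s: len(hits & set(s.lower().split())))

insert_sentences = ["take one tablet after breakfast",
                    "do not take this drug with alcohol",
                    "keep out of reach of children"]
centres = ["tablet", "alcohol", "children"]  # e.g. the centres found above
print(best_answer("Can I drink alcohol with this?", insert_sentences, centres))
```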


2020 ◽  
pp. 1-11
Author(s):  
Yu Wang

The semantic similarity calculation task for English text has an important influence on other fields of natural language processing and has high research value and application prospects. At present, research on the similarity calculation of short texts has achieved good results, but results on long text sets remain poor. This paper proposes a similarity calculation method that combines planar features with structured features and uses support vector regression models. Moreover, this paper uses PST and PDT to represent the syntax, semantics, and other information of the text. In addition, through two structural features suitable for text similarity calculation, this paper proposes a similarity calculation method combining structural features with a Tree-LSTM model. Experiments show that this method provides a new approach for interest network extraction.
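A minimal sketch of the regression step, assuming the planar and structural similarity features for each text pair have already been computed (the feature values and gold ratings below are invented placeholders):

```python
import numpy as np
from sklearn.svm import SVR

# Each row holds the two combined features for one text pair:
# [planar_similarity, structural_similarity].
X_train = np.array([[0.9, 0.8], [0.2, 0.1], [0.7, 0.5], [0.4, 0.6]])
y_train = np.array([4.5, 0.5, 3.2, 2.8])   # gold similarity ratings (0-5 scale)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_train, y_train)
print(model.predict(np.array([[0.8, 0.7]])))  # predicted similarity, new pair
```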


Traditional encryption systems and techniques have always been vulnerable to brute-force cyber-attacks. This is because characters are encoded as bytes (UTF-8, a superset of ASCII). An opponent who intercepts a ciphertext and attempts to decrypt it by brute force with a faulty key can therefore detect failed attempts, because the decrypted output is a mixture of symbols that are not uniformly distributed and carry no meaningful significance. The honey encoding technique was suggested to curb this classical authentication weakness by producing ciphertexts that, after decryption with a false key, yield plausible and evenly distributed but untrue plaintexts. However, this technique is only suitable for passkeys and PINs. Adapting it to encode natural-language texts, such as electronic mail and human-generated records, has remained an open drawback. Prevailing schemes proposed to extend the encryption to natural-language messages expose fragments of the plaintext embedded with the coded data, and are thus more prone to ciphertext attacks. In this paper, an amended honey encoding system is proposed to support natural-language message encryption. The main aim is to create a framework that encrypts a message fully in binary form. As a result, most binary strings decode to semantically correct texts, tricking an opponent who tries to decrypt the ciphertext with an incorrect key. The security of the suggested system is assessed.
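A toy sketch of the underlying honey-encryption idea (not the paper's scheme) over a small invented message space; the point is that decryption with a wrong key still yields a plausible plaintext:

```python
import hashlib
import secrets

MESSAGES = ["transfer approved", "transfer denied",
            "meeting at noon", "meeting cancelled"]  # toy message space
SEED_BITS = 16
BAND = 2**SEED_BITS // len(MESSAGES)

def keystream(key: str) -> int:
    # A 16-bit key stream derived from the key; a real scheme uses a cipher.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:2], "big")

def encrypt(msg: str, key: str) -> int:
    # DTE step: map the message to a random seed inside its own band,
    # so that every possible seed decodes to some valid message.
    seed = MESSAGES.index(msg) * BAND + secrets.randbelow(BAND)
    return seed ^ keystream(key)

def decrypt(ct: int, key: str) -> str:
    seed = ct ^ keystream(key)
    return MESSAGES[min(seed // BAND, len(MESSAGES) - 1)]

ct = encrypt("transfer approved", "right-key")
print(decrypt(ct, "right-key"))  # the original message
print(decrypt(ct, "wrong-key"))  # some plausible message, not gibberish
```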


10.2196/20443 ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. e20443
Author(s):  
Xiaoying Li ◽  
Xin Lin ◽  
Huiling Ren ◽  
Jinjing Guo

Background: Licensed drugs may cause unexpected adverse reactions in patients, resulting in morbidity, risk of mortality, therapy disruptions, and prolonged hospital stays. Officially approved drug package inserts list the adverse reactions identified in randomized controlled clinical trials with high evidence levels and in worldwide postmarketing surveillance. Formal representation of the adverse drug reaction (ADR) information enclosed in semistructured package inserts would enable deep recognition of side effects and rational drug use, substantially reduce morbidity, and decrease societal costs.

Objective: This paper aims to present an ontological organization of traceable ADR information extracted from licensed package inserts. In addition, it provides machine-understandable knowledge for bioinformatics analysis, semantic retrieval, and intelligent clinical applications.

Methods: Based on the essential content of package inserts, a generic ADR ontology model is proposed along two dimensions (and nine subdimensions), covering ADR information and medication instructions. This is followed by a customized natural language processing method programmed in Python to retrieve the relevant information enclosed in package inserts. After biocuration and identification of the data retrieved from the package inserts, an ADR ontology is automatically built for further bioinformatic analysis.

Results: We collected 165 package inserts of quinolone drugs from the National Medical Products Administration and other drug databases in China, and built a specialized ADR ontology containing 2879 classes and 15,711 semantic relations. For each quinolone drug, the reported ADR information and medication instructions have been logically represented and formally organized in the ADR ontology. To demonstrate its usage, the source data were further bioinformatically analyzed: for example, the number of drug-ADR triples and the major ADRs associated with each active ingredient were recorded. The 10 ADRs most frequently observed among quinolones were identified and categorized based on the 18 categories defined in the proposal. The occurrence frequency, severity, and ADR mitigation methods explicitly stated in package inserts were also analyzed, as were the top 5 specific populations with contraindications for quinolone drugs.

Conclusions: Ontological representation and organization using officially approved information from drug package inserts enables the identification and bioinformatic analysis of adverse reactions caused by a specific drug with regard to predefined ADR ontology classes and semantic relations. The resulting ontology-based ADR knowledge source classifies drug-specific adverse reactions and supports a better understanding of ADRs and safer prescription of medications.
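As a rough illustration of the retrieval step, the pattern-based sketch below extracts ADR terms and frequency qualifiers from an invented (translated) package-insert snippet; the drug name, patterns, and relation label are assumptions, not the authors' pipeline:

```python
import re

insert_text = """Adverse reactions: nausea (common), headache (rare),
tendon rupture (serious). Contraindications: children under 18."""

triples = []
section = re.search(r"Adverse reactions:(.*?)(?:Contraindications:|$)",
                    insert_text, flags=re.S)
if section:
    for adr, freq in re.findall(r"([a-z ]+?)\s*\((\w+)\)", section.group(1)):
        # One drug-ADR triple plus its stated frequency/severity qualifier.
        triples.append(("levofloxacin", "hasADR", adr.strip(), freq))

for t in triples:
    print(t)  # e.g. ('levofloxacin', 'hasADR', 'nausea', 'common')
```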


2021 ◽  
Vol 75 (3) ◽  
pp. 94-99
Author(s):  
A.M. Yelenov ◽ 
A.B. Jaxylykova

This research focuses on a comparative study of the named entity recognition task for scientific article texts. Natural language processing can be considered one of the cornerstones of machine learning, devoted to problems connected with the understanding of different natural languages and linguistic analysis. It has already been shown that current deep learning techniques achieve good performance and accuracy in areas such as image recognition, pattern recognition, and computer vision, which suggests that this technology could also succeed in the natural language processing area and lead to a dramatic increase in research interest in the topic. For a long time, fairly simple algorithms were used in this area, such as support vector machines or various types of regression, together with basic encodings of the text data, which did not provide high results. The Dataset Scientific Entity Relation Core dataset was used in the experiments. The algorithms applied were long short-term memory, a random forest classifier with conditional random fields, and named entity recognition with Bidirectional Encoder Representations from Transformers. In the findings, the metric scores of all models were compared with each other. This research is devoted to the processing of scientific articles in the machine learning area, because the subject has not yet been investigated thoroughly enough. Work on this task can help machines understand natural languages better, so that they can solve other natural language processing tasks with improved scores.
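For reference, entity-level precision, recall, and F1 over predicted spans can be computed as in the sketch below; the spans and per-model predictions are placeholders, not the study's actual results:

```python
def prf1(gold, pred):
    """Entity-level precision/recall/F1 over (start, end, label) spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [(0, 2, "Method"), (5, 6, "Task")]
predictions = {"LSTM":   [(0, 2, "Method")],
               "RF+CRF": [(0, 2, "Method"), (5, 6, "Material")],
               "BERT":   [(0, 2, "Method"), (5, 6, "Task")]}

for name, pred in predictions.items():
    print(name, "P/R/F1 = %.2f %.2f %.2f" % prf1(gold, pred))
```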


2010 ◽  
Vol 1 (3) ◽  
pp. 1-19 ◽  
Author(s):  
Weisen Guo ◽  
Steven B. Kraines

To promote global knowledge sharing, one must address the problem that knowledge representation in diverse natural languages effectively restricts knowledge sharing. Traditional knowledge sharing models are based on natural language processing (NLP) technologies. The ambiguity of natural language is a problem for NLP; however, semantic web technologies can circumvent the problem by enabling human authors to specify meaning in a computer-interpretable form. In this paper, the authors propose a cross-language semantic model (SEMCL) for knowledge sharing, which uses semantic web technologies to provide a potential solution to the problem of ambiguity. The model can also match knowledge descriptions in diverse languages. First, the methods used to support searches at the semantic predicate level are given, and the authors present a cross-language approach. Finally, an implementation of the model for the general engineering domain is discussed, and a scenario describes how the model implementation handles semantic cross-language knowledge sharing.
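A minimal sketch of predicate-level matching, assuming statements from different languages have already been mapped to shared ontology identifiers (the URIs and statements below are invented):

```python
# Statements extracted from texts in different languages reduce to the same
# language-independent triples, so matching compares identifiers, not words.
kb = [("ex:Pump_1", "ex:transfers", "ex:Fluid_A"),   # from an English text
      ("ex:Pump_1", "ex:transfers", "ex:Fluid_A"),   # from a Japanese text
      ("ex:Valve_2", "ex:controls", "ex:Fluid_A")]

def semantic_match(query, statements):
    """Return statements matching the (subject, predicate, object) query;
    None acts as a wildcard."""
    return [s for s in statements
            if all(q is None or q == v for q, v in zip(query, s))]

# Predicate-level search: everything that 'transfers' something.
print(semantic_match((None, "ex:transfers", None), kb))
```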


Author(s):  
Yuejun He ◽  
Bradley Camburn ◽  
Jianxi Luo ◽  
Maria C. Yang ◽  
Kristin L. Wood

Textual idea data from online crowdsourcing contains rich information about the concepts that underlie the original ideas, and this information can be recombined to generate new ideas. However, representing such information in a way that can stimulate new ideas is not a trivial task, because crowdsourced data are often vast and written in unstructured natural language. This paper introduces a method that uses natural language processing to summarize a massive number of idea descriptions and represents the underlying concept space as word clouds with a core-periphery structure to inspire recombinations of these concepts into new ideas. We report the use of this method in a real public-sector-sponsored project to explore ideas for future transportation system design. Word clouds that represent the concept space underlying the original crowdsourced ideas were used as ideation aids and stimulated many new ideas of varied novelty, usefulness, and feasibility. The new ideas suggest that the proposed method helps expand the idea space. Our analysis of these ideas, together with a survey of the designers who generated them, sheds light on how people perceive and use word clouds as ideation aids and suggests future research directions.
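A minimal sketch of the core-periphery split, assuming plain word frequency as the centrality measure; the paper's actual concept-space analysis is richer than this, and the idea texts below are invented:

```python
import re
from collections import Counter

ideas = ["autonomous shuttle for the last mile",
         "shared autonomous bikes at stations",
         "drone delivery for the last mile"]

stop = {"the", "for", "at", "a", "an", "of"}
counts = Counter(w for idea in ideas
                 for w in re.findall(r"[a-z]+", idea.lower())
                 if w not in stop)

cutoff = 2  # words appearing at least this often form the core
core = {w for w, c in counts.items() if c >= cutoff}
periphery = set(counts) - core
print("core:", core)             # recurring concepts, e.g. 'autonomous', 'mile'
print("periphery:", periphery)   # rarer concepts that diversify recombination
```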


Author(s):  
Pankaj Kailas Bhole ◽  
A. J. Agrawal

Text summarization is an old challenge in text mining, but it is in dire need of researchers' attention in the areas of computational intelligence, machine learning, and natural language processing. We extract a set of features from each sentence that helps identify its importance in the document. Reading the full text every time is time-consuming, and a clustering approach is useful for deciding which types of content are present in a document. In this paper, we introduce the concept of k-means clustering for natural language processing of text and for word matching; in order to extract meaningful information from a large set of offline documents, document clustering algorithms from data mining are adopted.
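A short sketch of the clustering step described above, using TF-IDF vectors and k-means from scikit-learn; the documents and number of clusters are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["stock markets fell sharply on monday",
        "investors worry about interest rates",
        "the team won the championship final",
        "the striker scored twice in the match"]

# Vectorise the documents and group them into k clusters.
X = TfidfVectorizer(stop_words="english").fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for label, doc in zip(km.labels_, docs):
    print(label, doc)  # finance and sport documents should separate
```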


2020 ◽  
pp. 1-31
Author(s):  
Abdul Rafae Khan ◽  
Asim Karim ◽  
Hassan Sajjad ◽  
Faisal Kamiran ◽  
Jia Xu

Roman Urdu is an informal form of the Urdu language written in Roman script, which is widely used in South Asia for online textual content. It lacks standard spelling and hence poses several normalization challenges during automatic language processing. In this article, we present a feature-based clustering framework for the lexical normalization of Roman Urdu corpora, which includes a phonetic algorithm UrduPhone, a string matching component, a feature-based similarity function, and a clustering algorithm Lex-Var. UrduPhone encodes Roman Urdu strings to their pronunciation-based representations. The string matching component handles character-level variations that occur when writing Urdu using Roman script. The similarity function incorporates various phonetic-based, string-based, and contextual features of words. The Lex-Var algorithm is a variant of the k-medoids clustering algorithm that groups lexical variations of words. It contains a similarity threshold to balance the number of clusters and their maximum similarity. The framework allows feature learning and optimization in addition to the use of predefined features and weights. We evaluate our framework extensively on four real-world datasets and show an F-measure gain of up to 15% from baseline methods. We also demonstrate the superiority of UrduPhone and Lex-Var in comparison to respective alternate algorithms in our clustering framework for the lexical normalization of Roman Urdu.
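A toy sketch of the two ideas combined: a crude vowel-dropping code as a stand-in for UrduPhone, and grouping by shared code as a stand-in for Lex-Var's similarity-threshold clustering (the real UrduPhone and Lex-Var are considerably more sophisticated):

```python
def phonetic_code(word):
    """Keep the first letter, then drop vowels/y and collapse repeats -
    a crude approximation of pronunciation-based encoding."""
    out = [word[0]]
    for ch in word[1:]:
        if ch in "aeiouy" or ch == out[-1]:
            continue
        out.append(ch)
    return "".join(out)

def cluster_variants(words):
    # Group words whose phonetic codes agree.
    clusters = {}
    for w in words:
        clusters.setdefault(phonetic_code(w), []).append(w)
    return list(clusters.values())

# Roman Urdu spelling variants of two words:
print(cluster_variants(["mujhe", "mujhay", "mjhe", "kya", "kia", "kyaa"]))
# -> [['mujhe', 'mujhay', 'mjhe'], ['kya', 'kia', 'kyaa']]
```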

