species2vec: A novel method for species representation

2010 ◽

Cited By ~ 1

Author(s):

Iraj Mantegh ◽

Nazanin S. Darbandi

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Programming Languages ◽

Language Processing ◽

New Method ◽

Robot Programming ◽

Task Planning ◽

End User ◽

Knowledge Based ◽

Manufacturing Applications

Robotic alternatives to many manual operations fall short in application because of the difficulty of capturing the manual skill of an expert operator. One of the main problems to be solved if robots are to become flexible enough for various manufacturing needs is that of end-user programming. An end user with little or no technical expertise in robotics needs to be able to communicate a manufacturing task to the robot efficiently. This paper proposes a new method for robot task planning using concepts from Artificial Intelligence. The method is based on a hierarchical knowledge representation and propositional logic, which allows an expert user to incrementally integrate process and geometric parameters with the robot commands. The objective is to provide an intelligent and programmable agent such as a robot with a knowledge base about the attributes of human behaviors in order to facilitate the commanding process. The focus of this work is on robot programming for manufacturing applications. Industrial manipulators work with low-level programming languages. This work presents a new method based on Natural Language Processing (NLP) that allows a user to generate robot programs using a natural language lexicon and task information. This will enable a manufacturing operator (for example, a painting operator) who may be unfamiliar with robot programming to easily employ the agent for manufacturing tasks.


Improving Brill's tagger lexical and transformation rule for Afaan Oromo language

10.7287/peerj.preprints.1225v1 ◽

2015 ◽

Author(s):

Abraham G Ayana

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Transformation Rule ◽

Initial State ◽

Training Corpus ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Speech Tagging

Natural Language Processing (NLP), a discipline within the field of Artificial Intelligence (AI), refers to human-like language processing. The ultimate goal of NLP research is to parse and understand language, which has not yet been fully achieved. For this reason, much research in NLP has focused on intermediate tasks that make sense of some of the structure inherent in language without requiring complete understanding. One such task is part-of-speech (POS) tagging, or simply tagging. The lack of a standard part-of-speech tagger for Afaan Oromo is a main obstacle for researchers in machine translation, spell checking, dictionary compilation, and automatic sentence parsing and construction. Even though several works have addressed POS tagging for Afaan Oromo, the performance of the tagger has not yet improved sufficiently. Hence, the aim of this thesis is to improve the lexical and transformation rules of Brill's tagger for Afaan Oromo POS tagging with a sufficiently large training corpus. Accordingly, the Afaan Oromo literature on grammar and morphology is reviewed to understand the nature of the language and to identify possible tagsets. As a result, 26 broad tagsets were identified, and 17,473 words from around 1100 sentences containing 6750 distinct words were tagged for training and testing purposes. Of these, 258 sentences are taken from previous work. Since only a few ready-made standard corpora exist, manually tagging a corpus for this work was challenging, and it is therefore recommended that a standard corpus be prepared. Transformation-based error-driven learning is adapted for Afaan Oromo part-of-speech tagging. Different experiments are conducted for the rule-based approach, taking 20% of the whole data for testing. A comparison is made with the previously adapted Brill's tagger, which shows an accuracy of 80.08%, whereas the improved Brill's tagger shows an accuracy of 95.6%, an improvement of 15.52 percentage points. Hence, it is found that the size of the training corpus, the rule-generating system in the lexical rule learner, and, moreover, the use of an Afaan Oromo HMM tagger as the initial-state tagger have a significant effect on the improvement of the tagger.


Multi-Sense Embeddings per Word

10.31219/osf.io/udfhn ◽

2020 ◽

Author(s):

Masashi Sugiyama

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Research Area ◽

Word Embedding ◽

The Other ◽

Word Embeddings ◽

Word Similarity ◽

Better Than ◽

Non Parametric

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of a word separately. Therefore, in this project, we explore two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, called the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses per word incrementally. We evaluate all the systems on a word similarity task and show that IMSSG is better than the other models.


Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) ◽

10.1109/isvlsi.2019.00033 ◽

2019 ◽

Author(s):

Mohammed Alawad ◽

Georgia Tourassi

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Word Embeddings ◽

Computationally Efficient ◽

Efficient Learning


A New Method to Identify Short-Text Authors Using Combinations of Machine Learning and Natural Language Processing Techniques

Procedia Computer Science ◽

10.1016/j.procs.2019.09.197 ◽

2019 ◽

Vol 159 ◽

pp. 428-436

Author(s):

Biveeken Vijayakumar ◽

Muhammad Marwan Muhammad Fuad

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

New Method ◽

Short Text ◽

Processing Techniques


Learning adaptive representations for entity recognition in the biomedical domain

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00238-0 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Ivano Lauriola ◽

Fabio Aiolli ◽

Alberto Lavelli ◽

Fabio Rinaldi

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Machine Learning Algorithms ◽

Entity Recognition ◽

Machine Learning Techniques ◽

Hybrid Architecture ◽

Biomedical Domain ◽

Word Embeddings

Background: Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step of these applications is the choice of the representation which describes data. Several representations have been proposed in the literature, some of which are based on a strong knowledge of the domain, and they consist of features manually defined by domain experts. Usually, these representations describe the problem well, but they require a lot of human effort and annotated data. On the other hand, general-purpose representations like word embeddings do not require human domain knowledge, but they could be too general for a specific task.

Results: This paper investigates methods to learn the best representation from data directly, by combining several knowledge-based representations and word embeddings. Two mechanisms have been considered to perform the combination, which are neural networks and Multiple Kernel Learning. To this end, we use a hybrid architecture for biomedical entity recognition which integrates dictionary look-up (also known as gazetteers) with machine learning techniques. Results on the CRAFT corpus clearly show the benefits of the proposed algorithm in terms of F1 score.

Conclusions: Our experiments show that the principled combination of general, domain-specific, word-, and character-level representations improves the performance of entity recognition. We also discussed the contribution of each representation in the final solution.


Evaluation of Dimensionality Reduction and Truncation Techniques for Word Embeddings

10.5753/eniac.2018.4477 ◽

2018 ◽

Author(s):

Paulo Henrique Calado Aoun ◽

Andre C. A. Nascimento ◽

Adenilton J. Da Silva

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Dimensionality Reduction ◽

Mobile Devices ◽

Language Processing ◽

Word Embeddings ◽

Reduction Strategies ◽

Truncation Techniques

The use of word embeddings is becoming very common in many Natural Language Processing tasks. Most of the time, these require computational resources that cannot be found in most current mobile devices. In this work, we evaluate a combination of numeric truncation and dimensionality reduction strategies in order to obtain smaller vector representations without substantial losses in performance.
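As a rough illustration of the idea in this abstract, the sketch below combines dimensionality reduction with numeric truncation on a random stand-in embedding matrix. It is not the authors' pipeline: PCA via SVD is assumed as the reduction strategy, casting to 16-bit floats is assumed as the form of truncation, and the function name `reduce_and_truncate` and the matrix sizes are hypothetical.

```python
import numpy as np

def reduce_and_truncate(embeddings, n_components, dtype=np.float16):
    """Project embeddings onto their top principal components, then lower the precision.

    Assumption: PCA stands in for "dimensionality reduction" and a float16
    cast stands in for "numeric truncation"; the paper may use other choices.
    """
    # Center the data so the SVD yields principal directions.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    # Store each value in fewer bits.
    return reduced.astype(dtype)

# Hypothetical stand-in for a trained embedding table: 1000 words, 300 dimensions.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 300)).astype(np.float32)

small = reduce_and_truncate(vectors, n_components=50)
print(small.shape, small.dtype, vectors.nbytes / small.nbytes)
```

With these sizes the stored table shrinks by a factor of 12 (6x from fewer dimensions, 2x from narrower floats); whether task performance survives the compression is what the paper evaluates and is not shown here.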


Estimation of the number of clusters on d-dimensional sphere

Artificial Intelligence Research ◽

10.5430/air.v10n1p57 ◽

2021 ◽

Vol 10 (1) ◽

pp. 57

Author(s):

Kazuhisa Fujita

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

New Method ◽

Dimensional Sphere ◽

Number Of Clusters ◽

Spherical Data ◽

Model Based ◽

Von Mises

Spherical data is data distributed on a sphere. Such data appears in various fields such as meteorology, biology, and natural language processing. However, methods for analyzing spherical data are not yet sufficiently developed. One important issue is estimating the number of clusters in spherical data. To address this issue, I propose a new method called Spherical X-means (SX-means) that can estimate the number of clusters on a d-dimensional sphere. SX-means is a model-based method that assumes the data is generated from a mixture of von Mises-Fisher distributions. The present paper explains the proposed method and shows its performance in estimating the number of clusters.
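For reference, the standard form of the mixture model named in this abstract can be written as follows. The notation is an assumption, not taken from the paper: x is a unit vector in R^p, K is the number of clusters, and the paper's "d-dimensional sphere" may use a different dimension convention.

```latex
% von Mises-Fisher density for a unit vector x in R^p,
% with mean direction \mu (\|\mu\| = 1) and concentration \kappa \ge 0
f_p(x;\mu,\kappa) = C_p(\kappa)\,\exp\!\left(\kappa\,\mu^{\top}x\right),
\qquad
C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\,I_{p/2-1}(\kappa)}

% mixture of K von Mises-Fisher components with weights \pi_k
p(x) = \sum_{k=1}^{K} \pi_k\, f_p(x;\mu_k,\kappa_k),
\qquad
\sum_{k=1}^{K} \pi_k = 1,\quad \pi_k \ge 0
```

Here I_v denotes the modified Bessel function of the first kind of order v. Estimating the number of clusters corresponds to choosing K.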


NMT Multi-Sense Embeddings per Word

10.31219/osf.io/k623t ◽

2019 ◽

Author(s):

William Jin

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Research Area ◽

Word Embedding ◽

The Other ◽

Word Embeddings ◽

Word Similarity ◽

Better Than ◽

Non Parametric

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn vectors for all senses of a word separately. Therefore, in this project, we explore two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, called the Incremental Multi-Sense Skip-gram (IMSSG) model, which can learn the vectors of all senses per word incrementally. We evaluate all the systems on a word similarity task and show that IMSSG is better than the other models.


Domain specific word embeddings for natural language processing in radiology

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2020.103665 ◽

2021 ◽

Vol 113 ◽

pp. 103665

Author(s):

Timothy L. Chen ◽

Max Emerling ◽

Gunvant R. Chaudhari ◽

Yeshwant R. Chillakuru ◽

Youngho Seo ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Word Embeddings ◽

Domain Specific


species2vec: A novel method for species representation

Knowledge-Based Task Planning Using Natural Language Processing for Robotic Manufacturing
