New approach for Arabic named entity recognition on social media based on feature selection using genetic algorithm

Brahim Ait Benali; Soukaina Mihi; Ismail El Bazi; Nabil Laachfoubi

doi:10.11591/ijece.v11i2.pp1485-1497

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i2.pp1485-1497 ◽

2021 ◽

Vol 11 (2) ◽

pp. 1485-1497

Author(s):

Brahim Ait Benali ◽

Soukaina Mihi ◽

Ismail El Bazi ◽

Nabil Laachfoubi

Keyword(s):

Genetic Algorithm ◽

Social Media ◽

Feature Selection ◽

Named Entity Recognition ◽

Entity Recognition ◽

Support Vector ◽

Text Data ◽

Impact Performance ◽

Named Entity ◽

Feature Selection Approach

Many features can be extracted from the massive volume of data of different types now available on social media. The growing demand for multimedia applications has been an essential factor in this regard, particularly in the case of text data. Using the full feature set for every task can be time-consuming and can also hurt performance, and the large number of features makes it challenging to find a subset that is useful for a given task. In this paper, we employ a feature selection approach based on a genetic algorithm to identify an optimized feature set. The best combination of selected features is then used to identify and classify Arabic named entities (NEs) with a support vector machine (SVM). Experimental results show that our system reaches state-of-the-art performance for Arabic NER on social media and significantly outperforms previous systems.
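The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the population size, rates, tournament selection, elitism, and the function name `evolve_mask` are all assumptions made for illustration. The fitness function here is a toy stand-in; in a setting like the one described, it would instead be something like the cross-validated score of an SVM trained on the selected feature columns.

```python
import random

def evolve_mask(fitness, n_features, pop_size=20, generations=30,
                crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Search for a binary feature mask (1 = keep feature) that maximizes `fitness`."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_gen = [best[:]]  # elitism: the best mask survives unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best

# Hypothetical toy fitness: features 0, 3 and 7 are "informative", and every
# selected feature costs a small penalty. A real fitness would train and
# evaluate a classifier on the selected columns.
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)

if __name__ == "__main__":
    mask = evolve_mask(toy_fitness, n_features=10)
    print(mask, toy_fitness(mask))
```

Because the best mask is copied unchanged into each new generation, the best fitness never decreases from one generation to the next. The penalty term in the toy fitness plays the role of favoring smaller feature subsets.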

Arabic Named Entity Recognition on Social Media based on feature selection techniques using SVM-RFE

2020 Fourth International Conference On Intelligent Computing in Data Sciences (ICDS) ◽

10.1109/icds50568.2020.9268762 ◽

2020 ◽

Author(s):

Brahim AIT BEN ALI ◽

Soukaina MIHI ◽

Ismail EL BAZI ◽

Nabil LAACHFOUBI

Keyword(s):

Social Media ◽

Feature Selection ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Feature Selection Techniques

Optimizing Genetic Algorithm in Feature Selection for Named Entity Recognition

Proceedings of the Sixth International Symposium on Information and Communication Technology - SoICT 2015 ◽

10.1145/2833258.2833262 ◽

2015 ◽

Cited By ~ 1

Author(s):

Huong Thanh Le ◽

Luan Van Tran ◽

Xuan Hoai Nguyen ◽

Thi Hien Nguyen

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Selection For

Hybrid Feature Selection Approach for Arabic Named Entity Recognition

Computational Linguistics and Intelligent Text Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-75477-2_32 ◽

2018 ◽

pp. 452-464

Author(s):

Miran Shahine ◽

Mohamed Sakre

Keyword(s):

Feature Selection ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Selection Approach ◽

Feature Selection Approach

Named Entity Recognition for Code Mixed Social Media Sentences

International Journal of Software Science and Computational Intelligence ◽

10.4018/ijssci.2021040102 ◽

2021 ◽

Vol 13 (2) ◽

pp. 23-36

Author(s):

Yashvardhan Sharma ◽

Rupal Bhargava ◽

Bapiraju Vamsi Tadikonda

Keyword(s):

Social Media ◽

Language Processing ◽

Short Term Memory ◽

Named Entity Recognition ◽

Entity Recognition ◽

Machine Learning Techniques ◽

Support Vector ◽

Internet Applications ◽

Named Entity ◽

Code Mixing

With the growth of internet applications and social media platforms, informal text communication has increased. People from different regions tend to mix their regional language with English in social media text. This trend, now common in many multilingual nations, is known as code mixing: multiple languages are used within a single statement. Named entity recognition (NER) is a well-researched topic in natural language processing (NLP), but present NER systems tend to perform poorly on code-mixed text. This paper proposes three approaches to improve named entity recognizers for handling code mixing. The first approach is based on machine learning techniques such as support vector machines and tree-based classifiers. The second approach is based on neural networks, and the third uses a long short-term memory (LSTM) architecture.

Automatic feature selection for named entity recognition using genetic algorithm

Proceedings of the Fourth Symposium on Information and Communication Technology - SoICT '13 ◽

10.1145/2542050.2542056 ◽

2013 ◽

Cited By ~ 5

Author(s):

Huong Thanh Le ◽

Luan Van Tran

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Selection For ◽

Automatic Feature Selection

A Kernel-Based Approach for Biomedical Named Entity Recognition

The Scientific World Journal ◽

10.1155/2013/950796 ◽

2013 ◽

Vol 2013 ◽

pp. 1-7 ◽

Cited By ~ 8

Author(s):

Rakesh Patra ◽

Sujan Kumar Saha

Keyword(s):

Kernel Function ◽

Text Processing ◽

Named Entity Recognition ◽

Kernel Functions ◽

Entity Recognition ◽

Machine Learning Techniques ◽

Support Vector ◽

Svm Classifier ◽

Named Entity ◽

Tree Kernel

Support vector machine (SVM) is one of the popular machine learning techniques used in various text processing tasks, including named entity recognition (NER). The performance of the SVM classifier largely depends on the appropriateness of the kernel function. In the last few years, a number of task-specific kernel functions have been proposed and used in various text processing tasks, for example, the string kernel, graph kernel, and tree kernel. So far, very few efforts have been devoted to developing kernels specific to the NER task. In the literature, we found that the tree kernel has been used in the NER task only for entity boundary detection or reannotation. The conventional tree kernel is unable to execute the complete NER task on its own. In this paper, we propose a kernel function, motivated by the tree kernel, that is able to perform the complete NER task. To examine the effectiveness of the proposed kernel, we applied it to the openly available JNLPBA 2004 data. Our kernel executes the complete NER task and achieves reasonable accuracy.

ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition

BioMed Research International ◽

10.1155/2016/4248026 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 5

Author(s):

Abbas Akkasi ◽

Ekrem Varoğlu ◽

Nazife Dimililer

Keyword(s):

Conditional Random Fields ◽

Named Entity Recognition ◽

Classification Performance ◽

Entity Recognition ◽

Support Vector ◽

Learning Approaches ◽

Data Set ◽

Rule Based ◽

Named Entity ◽

Vector Machines

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of classification performance and the number of incorrectly segmented entities.
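The split-then-merge idea in this abstract can be sketched as follows. This is not ChemTok itself: the coarse tokenizer, the two hand-written rules in `MERGE_RULES`, and the function names are hypothetical, whereas the abstract states that ChemTok's rules are extracted mainly from training data. The sketch only shows how merge rules can rejoin fragments that a coarse splitter separated.

```python
import re

# Hypothetical merge rules as (left-token pattern, right-token pattern) pairs.
# A real system would derive such rules from annotated training data.
MERGE_RULES = [
    # A token ending in a digit absorbs a following "," or "-" (e.g. "1" + ",").
    (re.compile(r".*\d$"), re.compile(r"^[,\-]$")),
    # A token ending in digit + separator absorbs the next word (e.g. "1,2-" + "dichloroethane").
    (re.compile(r".*\d[,\-]$"), re.compile(r"^\w")),
]

def tokenize(text):
    """Coarse split into word runs and single symbols, keeping character offsets."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]

def merge_tokens(tokens, rules):
    """Merge adjacent tokens (no whitespace between them) when any rule matches."""
    merged = []
    for tok, start, end in tokens:
        if merged:
            prev, prev_start, prev_end = merged[-1]
            touching = prev_end == start
            if touching and any(left.match(prev) and right.match(tok)
                                for left, right in rules):
                merged[-1] = (prev + tok, prev_start, end)
                continue
        merged.append((tok, start, end))
    return [tok for tok, _, _ in merged]

if __name__ == "__main__":
    print(merge_tokens(tokenize("Add 1,2-dichloroethane slowly."), MERGE_RULES))
    # -> ['Add', '1,2-dichloroethane', 'slowly', '.']
```

These two rules are deliberately narrow and would over-merge ordinary text such as "2020," so they should be read as an illustration of the mechanism rather than a usable rule set.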

Cross-Domain and Semisupervised Named Entity Recognition in Chinese Social Media: A Unified Model

IEEE/ACM Transactions on Audio Speech and Language Processing ◽

10.1109/taslp.2018.2856625 ◽

2018 ◽

Vol 26 (11) ◽

pp. 2142-2152 ◽

Cited By ~ 8

Author(s):

Jingjing Xu ◽

Hangfeng He ◽

Xu Sun ◽

Xuancheng Ren ◽

Sujian Li

Keyword(s):

Social Media ◽

Named Entity Recognition ◽

Unified Model ◽

Entity Recognition ◽

Named Entity ◽

Cross Domain ◽

Chinese Social Media

SCIENTIFIC NAMED ENTITY RECOGNITION WITH THE HELP OF MODERN METHODS

Bulletin Series of Physics & Mathematical Sciences ◽

10.51889/2021-3.1728-7901.11 ◽

2021 ◽

Vol 75 (3) ◽

pp. 94-99

Author(s):

A.M. Yelenov ◽

A.B. Jaxylykova

Keyword(s):

Machine Learning ◽

Language Processing ◽

Named Entity Recognition ◽

Recognition Task ◽

Entity Recognition ◽

Support Vector ◽

Scientific Article ◽

Natural Languages ◽

Named Entity ◽

Learning Area

This research presents a comparative study of the named entity recognition (NER) task for scientific article texts. Natural language processing (NLP) can be considered one of the cornerstones of machine learning; it addresses problems connected with understanding different natural languages and with linguistic analysis. Current deep learning techniques have already been shown to perform well in areas such as image recognition, pattern recognition, and computer vision, which suggests that they are likely to succeed in NLP too and has led to a dramatic increase in research interest in this topic. For a long time, fairly simple algorithms were used in this area, such as support vector machines or various types of regression, together with basic encodings of text data, and these did not yield strong results. The experiments used the following dataset: Dataset Scientific Entity Relation Core. The algorithms used were long short-term memory (LSTM), a random forest classifier with conditional random fields, and NER with Bidirectional Encoder Representations from Transformers (BERT). In the findings, the metric scores of all models are compared with one another. This research is devoted to the processing of scientific articles in the machine learning area because the subject has not yet been investigated thoroughly enough. Addressing this task can help machines understand natural languages better, so that they can solve other NLP tasks better, improving scores in general.

The Complementary Nature of Different NLP Toolkits for Named Entity Recognition in Social Media

Progress in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-319-65340-2_65 ◽

2017 ◽

pp. 803-814

Author(s):

Filipe Batista ◽

Álvaro Figueira

Keyword(s):

Social Media ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Complementary Nature