A Comparative Analysis of Machine Learning Techniques for Spam Detection

The Internet provides a medium for connecting individuals with similar or different interests, creating a hub. Because a huge number of people participate on these platforms, a user can receive a high volume of messages from different individuals, many of them unwanted. These messages sometimes contain true information and sometimes false information, which confuses users and is the first step toward spam messaging. A spam message is an irrelevant, unsolicited message sent by a known or unknown user, and it may create a sense of insecurity among users. In this paper, different machine learning algorithms were trained and tested, together with natural language processing (NLP), to classify messages as spam or ham.
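
As a rough illustration of the spam/ham classification task described above, the following minimal sketch trains a multinomial Naive Bayes classifier on a handful of toy messages. The abstract does not say which algorithms the paper compared, so the choice of Naive Bayes, the tokenizer, the class name `NaiveBayesSpamFilter`, and the training messages are all assumptions made for this example only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Illustrative only: not the model used in the paper.
    """

    def fit(self, messages, labels):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        return self

    def log_score(self, text, label):
        """Return log P(label) + sum of log P(token | label)."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # tokens never seen in training are skipped
                score += math.log((counts[token] + 1) / denom)
        return score

    def predict(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


# Toy training data, invented for this sketch.
TRAIN = [
    ("win a free prize now", "spam"),
    ("claim your free cash reward", "spam"),
    ("free entry win cash today", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("see you at the meeting tomorrow", "ham"),
    ("can you call me after lunch", "ham"),
]

clf = NaiveBayesSpamFilter().fit(
    [text for text, _ in TRAIN], [label for _, label in TRAIN]
)
print(clf.predict("free cash prize"))
print(clf.predict("lunch meeting tomorrow"))
```

A real comparison would use a labelled corpus, a held-out test split, and several classifiers evaluated with the same metrics.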

Grammatical categories determination for Turkish and Kazakh languages based on machine learning algorithms and fulfilling dictionaries of link grammar parser

Eastern-European Journal of Enterprise Technologies ◽ 10.15587/1729-4061.2021.238743 ◽ 2021 ◽ Vol 5 (2 (113)) ◽ pp. 55-65

Author(s): Aigerim Yerimbetova ◽ Madina Tussupova ◽ Madina Sambetbayeva ◽ Mussa Turdalyuly ◽ Bakzhan Sakenov

Keyword(s): Machine Learning ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Machine Learning Techniques ◽ Parts Of Speech ◽ Grammatical Categories ◽ Learning Techniques

This research aims to identify parts of speech for the Kazakh and Turkish languages in an information retrieval system. The proposed algorithms are based on machine learning techniques. We consider the binary classification of words according to parts of speech, and we selected the most popular, well-known machine learning algorithms for study. For Kazakh, we defined 7 dictionaries and tagged 135 million words; for Turkish, we defined 9 dictionaries and tagged 50 million words. The main problem considered in the paper is the creation of algorithms, based on machine learning techniques, for populating the dictionaries of the Link Grammar Parser (LGP) system, in particular for the Kazakh and Turkish languages. The research focuses on reviewing and comparing machine learning algorithms and methods that have achieved results on natural language processing tasks such as determining grammatical categories. For the LGP system to operate, a dictionary is created in which each word is assigned a connector, that is, the type of connection that can be formed with that word. The authors considered methods of filling in LGP dictionaries using machine learning. The complexity of natural language processing does not rule out identifying narrower tasks that can already be solved algorithmically, for example, determining parts of speech or splitting texts into logical groups. However, some features of natural languages significantly reduce the effectiveness of these solutions. In particular, taking into account all word forms of each word in the Kazakh and Turkish languages increases the complexity of text processing by an order of magnitude.

Machine Learning Techniques for Biomedical Natural Language Processing: A Comprehensive Review

IEEE Access ◽ 10.1109/access.2021.3119621 ◽ 2021 ◽ pp. 1-1

Author(s): Essam H. Houssein ◽ Rehab E. Mohamed ◽ Abdelmgeid A. Ali

Keyword(s): Machine Learning ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Machine Learning Techniques ◽ Comprehensive Review ◽ Learning Techniques

Combining Machine Learning Techniques and Natural Language Processing to Infer Emotions Using Spanish Twitter Corpus

Communications in Computer and Information Science - Highlights on Practical Applications of Agents and Multi-Agent Systems ◽ 10.1007/978-3-642-38061-7_15 ◽ 2013 ◽ pp. 149-157 ◽ Cited By ~ 5

Author(s): Gonzalo Blázquez Gil ◽ Antonio Berlanga de Jesús ◽ José M. Molina Lopéz

Keyword(s): Machine Learning ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Machine Learning Techniques ◽ Learning Techniques

Advanced Machine Learning Techniques in Natural Language Processing for Indian Languages

Smart Techniques for a Smarter Planet - Studies in Fuzziness and Soft Computing ◽ 10.1007/978-3-030-03131-2_7 ◽ 2019 ◽ pp. 117-144 ◽ Cited By ~ 1

Author(s): Vaishali Gupta ◽ Nisheeth Joshi ◽ Iti Mathur

Keyword(s): Machine Learning ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Machine Learning Techniques ◽ Indian Languages ◽ Learning Techniques

Deep Learning Approaches for Textual Sentiment Analysis

Handbook of Research on Emerging Trends and Applications of Machine Learning - Advances in Computational Intelligence and Robotics ◽ 10.4018/978-1-5225-9643-1.ch009 ◽ 2020 ◽ pp. 171-182 ◽ Cited By ~ 1

Author(s): Tamanna Sharma ◽ Anu Bajaj ◽ Om Prakash Sangwan

Keyword(s): Neural Network ◽ Machine Learning ◽ Deep Learning ◽ Natural Language Processing ◽ Natural Language ◽ Sentiment Analysis ◽ Language Processing ◽ Machine Learning Techniques ◽ Computational Technique ◽ Learning Techniques

Sentiment analysis is the computational measurement of attitudes, opinions, and emotions (such as positive or negative) through text mining and natural language processing of words and phrases. Combining machine learning techniques with natural language processing helps to analyse and predict sentiment more precisely. Sometimes, however, machine learning techniques cannot predict sentiment because labelled data is unavailable. To overcome this problem, an advanced computational technique called deep learning comes into play. This chapter highlights recent studies on the use of deep learning techniques, such as convolutional and recurrent neural networks, in sentiment analysis.

Content Analysis of Extracted Suicide Texts From Social Media Networks by Using Natural Language Processing and Machine Learning Techniques

2021 IEEE International Conference on Smart Information Systems and Technologies (SIST) ◽ 10.1109/sist50301.2021.9466001 ◽ 2021

Author(s): Bakhtiyor Meraliyev ◽ Kurmangazy Kongratbayev ◽ Nazerke Sultanova

Keyword(s): Machine Learning ◽ Social Media ◽ Content Analysis ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Machine Learning Techniques ◽ Social Media Networks ◽ Learning Techniques

Learning adaptive representations for entity recognition in the biomedical domain

Journal of Biomedical Semantics ◽ 10.1186/s13326-021-00238-0 ◽ 2021 ◽ Vol 12 (1)

Author(s): Ivano Lauriola ◽ Fabio Aiolli ◽ Alberto Lavelli ◽ Fabio Rinaldi

Keyword(s): Machine Learning ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Machine Learning Algorithms ◽ Entity Recognition ◽ Machine Learning Techniques ◽ Hybrid Architecture ◽ Biomedical Domain ◽ Word Embeddings

Background: Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step of these applications is the choice of the representation which describes data. Several representations have been proposed in the literature, some of which are based on a strong knowledge of the domain, and they consist of features manually defined by domain experts. Usually, these representations describe the problem well, but they require a lot of human effort and annotated data. On the other hand, general-purpose representations like word embeddings do not require human domain knowledge, but they could be too general for a specific task.

Results: This paper investigates methods to learn the best representation from data directly, by combining several knowledge-based representations and word embeddings. Two mechanisms have been considered to perform the combination, which are neural networks and Multiple Kernel Learning. To this end, we use a hybrid architecture for biomedical entity recognition which integrates dictionary look-up (also known as gazetteers) with machine learning techniques. Results on the CRAFT corpus clearly show the benefits of the proposed algorithm in terms of F1 score.

Conclusions: Our experiments show that the principled combination of general, domain-specific, word-, and character-level representations improves the performance of entity recognition. We also discussed the contribution of each representation in the final solution.
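
To make the idea of combining dictionary look-up with other representations more concrete, here is a minimal sketch that builds one feature dictionary per token from a gazetteer and from character-level cues. It only concatenates features and does not reproduce the paper's neural-network or Multiple Kernel Learning combination. The gazetteer entries, the feature names, and the functions `gazetteer_features`, `character_features`, and `token_features` are hypothetical and chosen purely for illustration.

```python
# Hypothetical gazetteers: entity type -> set of known lowercase names.
GAZETTEER = {
    "gene": {"brca1", "tp53"},
    "chemical": {"aspirin", "glucose"},
}


def gazetteer_features(token):
    """Knowledge-based features: is the token listed in each gazetteer?"""
    low = token.lower()
    return {
        f"in_gazetteer_{name}": low in entries
        for name, entries in sorted(GAZETTEER.items())
    }


def character_features(token):
    """Character-level features that need no domain knowledge."""
    return {
        "has_digit": any(ch.isdigit() for ch in token),
        "is_upper": token.isupper(),
        "suffix3": token[-3:].lower(),
    }


def token_features(token):
    """Concatenate word-level, gazetteer, and character-level features."""
    features = {"lower": token.lower()}
    features.update(gazetteer_features(token))
    features.update(character_features(token))
    return features


def sentence_features(tokens):
    """Return one feature dictionary per token, ready for a sequence tagger."""
    return [token_features(token) for token in tokens]


feats = sentence_features(["BRCA1", "regulates", "glucose"])
for token_feats in feats:
    print(token_feats)
```

Feature dictionaries of this kind could be passed to any downstream learner. Word embeddings, which the abstract also mentions, would be added as a further dense representation alongside them.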
