AN APPROACH TO DETECTING VULGARISMS BASED ON MACHINE LEARNING [VULQARİZMLƏRİN MAŞIN TƏLİMİ ƏSASINDA AŞKARLANMASINA BİR YANAŞMA]

Fərqanə Abdullayeva;  ; Sabirə Ocaqverdiyeva;

doi:10.25045/jpit.v12.i2.08

Problems of Information Technology ◽

10.25045/jpit.v12.i2.08 ◽

2021 ◽

Vol 12 (2) ◽

pp. 89-98

Author(s):

Fərqanə Abdullayeva ◽

◽

Sabirə Ocaqverdiyeva ◽

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

N Gram

This article develops an approach to detecting vulgarisms in web content using machine learning. The growing volume of harmful content on web pages makes protection against such content an increasingly pressing issue. When users, especially children and adolescents, encounter vulgarisms on the Internet (unethical speech, slang expressions, swearing, insults, etc.), it has a negative effect on their psychology. Developing more reliable automatic text detection methods for identifying vulgar words, phrases and expressions, both in online media and on social media (Twitter, Facebook, etc.), is of great importance for solving this problem. The article proposes an approach to detecting vulgarisms using N-gram+TF-IDF features. An N-gram+TF-IDF feature extraction method is applied to previously known vulgar words to generate numerical vectors, which are then fed to the input of Naive Bayes algorithms. In experiments carried out with various features, classification based on unigram+TF-IDF features gave the best results. The proposed approach is important for shaping the speech culture of children and adolescents and their communication with other people. It is useful for protecting children from harmful content obtained from the Internet and can be used in child safety centers and in the education system.
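
The feature extraction step described in the abstract can be illustrated with a minimal sketch that turns short texts into TF-IDF vectors over word n-grams. The abstract does not give the exact weighting formula, so the smoothed IDF used here is an assumption, and the function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=1):
    """Build TF-IDF vectors over word n-grams.

    Assumption: term frequency is the relative count within a document and
    IDF is the smoothed form log((1 + N) / (1 + df)) + 1.
    """
    grams = [ngrams(doc.lower().split(), n) for doc in docs]
    vocab = sorted({g for doc in grams for g in doc})
    # Document frequency: number of documents that contain each n-gram.
    df = Counter(g for doc in grams for g in set(doc))
    n_docs = len(docs)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1.0 for g in vocab}
    vectors = []
    for doc in grams:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[g] / total * idf[g] for g in vocab])
    return vocab, vectors


if __name__ == "__main__":
    # Hypothetical sample texts, not taken from the article's dataset.
    vocab, vectors = tfidf_vectors(["you are kind", "you are rude"], n=1)
    print(vocab)
    for vec in vectors:
        print([round(x, 3) for x in vec])
```

N-grams that occur in every text receive the lowest weight, so the vectors emphasise the terms that distinguish one text from another. Vectors of this kind are what a Naive Bayes classifier would take as input.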

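SENTIMENT ANALYSIS OF THE ELECTED GOVERNMENT IN THE 2019 PRESIDENTIAL ELECTION ON TWITTER USING THE NAÏVE BAYES ALGORITHM [ANALISIS SENTIMEN PADA PEMERINTAHAN TERPILIH PADA PILPRES 2019 DITWITTER MENGGUNAKAN ALGORITME NAÏVEBAYES]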

JURTEKSI ◽

10.33330/jurteksi.v7i1.851 ◽

2020 ◽

Vol 7 (1) ◽

pp. 101-106

Author(s):

Febby Apri Wenando ◽

Regiolina Hayami ◽

Agung Jefrianto Anggrawan

Keyword(s):

Presidential Election ◽

Naive Bayes ◽

Vice President ◽

Naïve Bayes ◽

Weighting Method ◽

The Social ◽

Twitter Account ◽

N Gram ◽

Bayes Algorithm ◽

Modeling Data

Abstract: The 2019 presidential election became one of the most widely discussed topics on Twitter. Public debate on social media carries opinions about the candidate pairs that people support. This study predicts public sentiment toward the candidates for President and Vice President of the Republic of Indonesia. The data are tweets from the @jokowi Twitter account, retrieved with the Tweepy library in Python 2.7. Sentiment is classified into two classes, positive and negative. Models were built with the Unigram, Bigram, Trigram, N-Gram (1-2) and N-Gram (1-3) weighting methods using the Naïve Bayes algorithm in the Weka application. The modeling dataset comprises 646 sentences. The best results were obtained with Unigram weighting: 81.4% accuracy, 81.5% precision and 81.3% recall, with a run time of 0.3 s.

Keywords: classification, naïve bayes, 2019 presidential election, twitter, unigram
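
The five weighting schemes named in the abstract differ only in which n-gram lengths they include. The sketch below shows one way to read them, assuming that "N-Gram (1-2)" means all n-grams of length 1 to 2 and "N-Gram (1-3)" all of length 1 to 3. It is plain Python rather than the Weka tokenizer the study used, and the function name and sample sentence are hypothetical.

```python
def ngram_range(tokens, min_n, max_n):
    """Return all word n-grams with lengths from min_n to max_n inclusive."""
    feats = []
    for n in range(min_n, max_n + 1):
        feats.extend(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return feats


# Assumed mapping from the scheme names in the abstract to (min_n, max_n).
SCHEMES = {
    "Unigram": (1, 1),
    "Bigram": (2, 2),
    "Trigram": (3, 3),
    "N-Gram (1-2)": (1, 2),
    "N-Gram (1-3)": (1, 3),
}

if __name__ == "__main__":
    tokens = "the election went well".split()  # hypothetical example text
    for name, (lo, hi) in SCHEMES.items():
        print(name, ngram_range(tokens, lo, hi))
```

Wider ranges produce more features per sentence, which can make counts sparse on a small dataset such as the 646 sentences reported here.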

Using Social Media Big Data to Analyze Regional Election Victories [Pemanfaatan Big Data Media Sosial Dalam Menganalisa Kemenangan Pilkada]

Majalah Ilmiah Teknologi Elektro ◽

10.24843/mite.2019.v18i01.p15 ◽

2019 ◽

Vol 18 (1) ◽

pp. 101

Author(s):

Dewa Ayu Putri Wulandari ◽

Made Sudarma ◽

Nyoman Paramaita

Keyword(s):

Big Data ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

N Gram

The 2018 election for Governor and Vice Governor of Bali proceeds through several stages, from the nomination of prospective candidates through to vote counting. The public can take part directly in the voting stage, scheduled for 27 June 2018 (KPU, 2018). This can give rise to many comments or opinions, not only positive and neutral but also negative. This study aims to analyze public comments that carry good (positive) sentiment, no sentiment at all (neutral), or bad (negative) sentiment. Preprocessing uses N-gram tokenization; in this study each N-gram token consists of three words (a trigram). Stemming uses the Nazief-Adriani algorithm. Classification uses the Naïve Bayes Classifier (NBC). In tests on the gubernatorial candidate data, the highest accuracy was obtained when classifying KBS-Ace data taken from Twitter, with 89% accuracy, 91% precision and 94% recall; the lowest accuracy occurred when classifying KBS-Ace data from Facebook.

Keywords: Sentiment Analysis, 2018 Bali Gubernatorial Candidates, Naive Bayes Classifier

Classification of Javanese Language Level on Articles Using Multinomial Naive Bayes and N-Gram Methods

Journal of Physics Conference Series ◽

10.1088/1742-6596/1306/1/012049 ◽

2019 ◽

Vol 1306 ◽

pp. 012049

Author(s):

A P Ardhana ◽

D E Cahyani ◽

Winarno

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Language Level ◽

N Gram

Classification of Text Documents based on Naive Bayes using N-Gram Features

2018 International Conference on Artificial Intelligence and Data Processing (IDAP) ◽

10.1109/idap.2018.8620853 ◽

2018 ◽

Cited By ~ 1

Author(s):

Mehmet BAYGIN

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Text Documents ◽

N Gram

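The Effect of N-Grams on Book Classification Using Feature Extraction and Selection with Multinomial Naïve Bayes [Pengaruh N-Gram terhadap Klasifikasi Buku menggunakan Ekstraksi dan Seleksi Fitur pada Multinomial Naïve Bayes]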

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i1.2672 ◽

2021 ◽

Vol 5 (1) ◽

pp. 264

Author(s):

Esti Mulyani ◽

Fachrul Pralienka Bani Muhamad ◽

Kurnia Adi Cahyanto

Keyword(s):

Naive Bayes ◽

Automatic Classification ◽

Naïve Bayes ◽

Main Task ◽

Test Results ◽

Book Title ◽

Feature Extraction And Selection ◽

N Gram ◽

Bayes Algorithm

A main task of libraries is processing library materials by classifying books according to an established scheme. Dewey Decimal Classification (DDC) is the method most commonly used worldwide to determine book classification (labeling) in libraries. The DDC method has the advantage of being universal and more systematic. However, it is less efficient given the large number of books a library must classify, and labeling must follow label updates to the DDC. An automatic classification system is a natural solution to this problem, and it can be built by applying text mining. In this study, N-grams (Unigram, Bigram, Trigram) of the words in book titles are used for feature generation. The generated features then undergo feature selection. Book titles are classified using the Multinomial Naïve Bayes algorithm. The study examines the effect of Unigram, Bigram and Trigram features on book title classification using feature extraction and feature selection with Multinomial Naïve Bayes. The test results show that Unigram has the highest accuracy, at 74.4%.
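
To make the classification step concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier over title unigrams. It is not the authors' implementation: the add-one smoothing constant, the class names and the training titles are assumptions chosen for illustration, and the feature selection stage is omitted.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over feature counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing constant, not from the abstract

    def fit(self, feature_lists, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, label in zip(feature_lists, labels):
            self.feature_counts[label].update(feats)
        self.vocab = {f for feats in feature_lists for f in feats}
        return self

    def predict(self, feats):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for f in feats:
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical titles and labels, not the study's data.
    titles = [
        "introduction to algorithms",
        "algorithms and data structures",
        "history of rome",
        "medieval history of europe",
    ]
    labels = ["computing", "computing", "history", "history"]
    model = MultinomialNB().fit([t.split() for t in titles], labels)
    print(model.predict("data algorithms".split()))
```

Replacing `t.split()` with a bigram or trigram extractor changes the feature set without changing the classifier, which is the comparison the study reports.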

THE IMPLEMENTATION OF THE MACHINE LEARNING ALGORITHM FOR THE SENTIMENT ANALYSIS OF INDONESIA’S 2019 PRESIDENTIAL ELECTION

IIUM Engineering Journal ◽

10.31436/iiumej.v22i1.1532 ◽

2021 ◽

Vol 22 (1) ◽

pp. 78-92

Author(s):

GA Buntoro ◽

R Arifin ◽

GN Syaifuddiin ◽

A Selamat ◽

O Krejcar ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Presidential Election ◽

Naive Bayes ◽

Learning Algorithm ◽

Naïve Bayes ◽

Machine Learning Algorithm ◽

Presidential Candidates ◽

N Gram

In 2019, citizens of Indonesia took part in the democratic process of electing a new president, vice president, and various legislative candidates. The 2019 Indonesian presidential election was very tense in terms of the candidates' campaigns in cyberspace, especially on social media sites such as Facebook, Twitter, Instagram, Google+, Tumblr, and LinkedIn. The Indonesian people used social media platforms to express positive, neutral, and negative opinions on the respective presidential candidates. Social media users campaigned for their preferred candidates, from regents, governors, and legislators up to presidential candidates, via the Internet and online media. The aim of this paper is therefore to conduct sentiment analysis on the candidates in the 2019 Indonesian presidential election based on Twitter datasets. The study used datasets of opinions expressed by the Indonesian people on Twitter with hashtags (#) containing "Jokowi and Prabowo." Data pre-processing consisted of comment selection, data cleansing, text parsing, sentence normalization and tokenization of the Indonesian-language text, and determination of class attributes. The Twitter posts were then classified using a Naïve Bayes Classifier (NBC) and a Support Vector Machine (SVM) to achieve optimal accuracy. The study helps the community research opinions on Twitter that contain positive, neutral, or negative sentiments. Sentiment analysis of the candidates on Twitter using non-conventional processes resulted in cost, time, and effort savings. The combination of the SVM machine learning algorithm and alphabetic tokenization produced the highest accuracy, 79.02%, while the combination of the NBC machine learning algorithm and N-gram tokenization produced the lowest, 44.94%.

Authorship Attribution for Bengali Language Using the Fusion of N-Gram and Naive Bayes Algorithms

International Journal of Information Technology and Computer Science ◽

10.5815/ijitcs.2018.10.02 ◽

2018 ◽

Vol 10 (10) ◽

pp. 11-21

Author(s):

D. M. Anisuzzaman ◽

◽

Abdus Salam

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Authorship Attribution ◽

N Gram ◽

Bengali Language

Using Character N-gram Features and Multinomial Naïve Bayes for Sentiment Polarity Detection in Bengali Tweets

2018 Fifth International Conference on Emerging Applications of Information Technology (EAIT) ◽

10.1109/eait.2018.8470415 ◽

2018 ◽

Cited By ~ 6

Author(s):

Kamal Sarkar

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

N Gram

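Automatic Rating Classification of Electronic Product Review Text Documents Using N-gram and Naïve Bayes Methods [Klasifikasi Rating Otomatis pada Dokumen Teks Ulasan Produk Elektronik Menggunakan Metode N-gram dan Naïve Bayes]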

Jurnal Informatika Universitas Pamulang ◽

10.32493/informatika.v5i3.6110 ◽

2020 ◽

Vol 5 (3) ◽

pp. 295

Author(s):

Rahmawan Bagus Trianto ◽

Andri Triyono ◽

Dhika Malita Puspita Arum

Keyword(s):

Feature Extraction ◽

Naive Bayes ◽

Automatic Classification ◽

Naïve Bayes ◽

Lack Of Information ◽

N Gram ◽

Bayes Algorithm ◽

Online Product Ratings ◽

Product Description

Online product ratings usually consist of a descriptive review together with a numeric rating, and the Lazada online store follows this pattern. A descriptive review can give other potential buyers a clearer picture than a rating alone. In practice, however, the descriptive review and the rating given often do not match, which leaves both sellers and potential buyers short of information. This study proposes automatic classification of buyers' descriptive reviews so that descriptive reviews and ratings agree. The classifier uses the Naive Bayes algorithm with n-gram feature extraction and TF-IDF word weighting. The best results were obtained with Bigram feature extraction: 94.06% accuracy, 91.73% recall and 90.71% precision. With this accuracy, the model can serve as a reference for classifying product description reviews, so that the feedback process between sellers and buyers runs well.
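
The accuracy, precision and recall figures quoted in abstracts like this one follow from counts of correct and incorrect predictions. The sketch below computes them for a single positive class; how the study averaged over its rating classes is not stated, so the binary setting, the function name and the sample labels are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        # Precision: share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: share of true positives that the classifier found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels, not results from the study.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```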

N-Gram Features for Unsupervised WSD with an Underlying Naïve Bayes Model

The Naïve Bayes Model for Unsupervised Word Sense Disambiguation - SpringerBriefs in Statistics ◽

10.1007/978-3-642-33693-5_5 ◽

2012 ◽

pp. 55-68

Author(s):

Florentina T. Hristea

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Bayes Model ◽

N Gram ◽

Naïve Bayes Model
