An in-depth exploration of Bangla blog post classification

2021 ◽  
Vol 10 (2) ◽  
pp. 742-749
Author(s):  
Tanvirul Islam ◽  
Ashik Iqbal Prince ◽  
Md. Mehedee Zaman Khan ◽  
Md. Ismail Jabiullah ◽  
Md. Tarek Habib

Bangla blogs are growing rapidly in the information era, and their layouts and categorization schemes have become correspondingly diverse. In this setting, automated blog post classification is a comparatively efficient way to organize Bangla blog posts in a standard way so that users can easily find articles of interest. In this research, nine supervised learning models, namely support vector machine (SVM), multinomial naïve Bayes (MNB), multi-layer perceptron (MLP), k-nearest neighbours (k-NN), stochastic gradient descent (SGD), decision tree, perceptron, ridge classifier, and random forest, are applied and compared for the classification of Bangla blog posts. To predict blog posts against eight categories, three feature extraction techniques are applied: unigram TF-IDF (term frequency-inverse document frequency), bigram TF-IDF, and trigram TF-IDF. The majority of the classifiers achieve above 80% accuracy. Other performance evaluation metrics also show good results when the selected classifiers are compared.
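The unigram, bigram, and trigram TF-IDF features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus is a hypothetical English placeholder, tokenization is plain whitespace splitting, and the weighting assumes the textbook form tf = count / document length and idf = log(N / df), which may differ from the variant used in the paper.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, n=1):
    """Compute TF-IDF weights for the n-gram features of each document.

    Assumed weighting: tf = count / terms in document, idf = log(N / df).
    """
    term_lists = [ngrams(doc.split(), n) for doc in docs]
    num_docs = len(term_lists)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for terms in term_lists for term in set(terms))
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(num_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus; not data from the paper.
docs = [
    "the match ended in a draw",
    "the election ended in a recount",
    "the new phone ships in march",
]
unigram = tfidf(docs, n=1)
bigram = tfidf(docs, n=2)
trigram = tfidf(docs, n=3)
```

A term that occurs in every document, such as "the" here, receives weight zero under this weighting, while longer n-grams become rarer and therefore sparser features. The resulting dictionaries would then be vectorized and passed to whichever classifier is being compared.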

Author(s):  
Wahyu Adi Prabowo ◽  
Fitriani Azizah

Social media has become a new mode of communication in the digital era. Children and adults alike use social media extensively to interact with others, and it has shifted conventional communication to a digital form. This development poses a serious problem, because acts of cyberbullying have been found to be increasingly common. Cyberbullying can harm the victim's mental health, causing anything from depression to suicide, and its dangers are a cause of concern to the community. Therefore, this study analyzes the sentiment of comments on social media platforms. The comment data pass through a preprocessing stage, Term Frequency-Inverse Document Frequency (TF-IDF) weighting, and classification with the Support Vector Machine (SVM) method. The data set consists of 1,500 comments collected by crawling with Python libraries and is divided into 80% training data and 20% test data. In testing, the accuracy is 93%, the precision is 95%, and the recall is 97%. This research also presents a system model design in which the system can be integrated with a browser to open a user page showing the classification of the comments that have been input into the system.
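The 80/20 split and the three reported metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names, the random seed, and the small label lists are hypothetical, and the positive class is assumed to be encoded as 1.

```python
import random

def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle the items and split them into (train, test) by test_ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_ratio)))
    return shuffled[:cut], shuffled[cut:]

def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# 1,500 items split 80/20 gives 1,200 training and 300 test items.
train, test = train_test_split(range(1500))

# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
accuracy, precision, recall = binary_metrics(y_true, y_pred)
```

Precision counts how many predicted positives are correct, while recall counts how many actual positives are found, so the two can differ even when accuracy is high.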


Author(s):  
E. Sri Vishva ◽  
D. Aju

Fundamentally, phishing is a common cybercrime in which intruders or hackers target naive and trusting individuals and induce them to reveal unique and sensitive information through fictitious websites. The primary intention of this kind of cybercrime is to gain access to the personal or classified information of the recipients. The obtained data comprise information that can readily be used to identify an individual. The stolen personal or sensitive information is commonly sold on online dark markets, where it is subsequently bought by identity thieves. Depending on the sensitivity and importance of the stolen information, the price of a single piece can vary from a few dollars to thousands of dollars. Machine learning (ML) and deep learning (DL) are powerful methods for analysing and defending against these phishing attacks. A machine learning based phishing detection system is proposed to protect websites and users from such attacks. To improve the results, the TF-IDF (Term Frequency-Inverse Document Frequency) values of webpages are employed within the system. ML methods such as LR (Logistic Regression), RF (Random Forest), SVM (Support Vector Machine), NB (Naive Bayes), and SGD (Stochastic Gradient Descent) are applied for training and testing on the obtained dataset. As a result, a robust phishing website detection system is developed with 90.68% accuracy.


2020 ◽  
Author(s):  
vinayakumar R

Social media is a platform on which enormous amounts of text are generated every day. The data are so large that they cannot be easily understood, which has paved the way for a new field of information technology, natural language processing. In this paper, the text data used for classification are tweets, which indicate the state of a person according to three sentiments: positive, negative, and neutral. Emotions are expressions of a person's feelings and have a strong influence on decision-making tasks. Here we propose the text representations Term Frequency-Inverse Document Frequency (TF-IDF) and Keras embedding, along with machine learning and deep learning algorithms, for sentiment classification. Among these, logistic regression performs best when the number of features is limited; as the number of features increases, the Support Vector Machine (SVM), another machine learning algorithm, performs best, setting a benchmark accuracy of 75.8% for this dataset. The dataset has been made publicly available for research purposes.


2020 ◽  
Vol 4 (2) ◽  
pp. 329-335
Author(s):  
Rusydi Umar ◽  
Imam Riadi ◽  
Purwono

The failure of most startups in Indonesia is caused by teams that are not solid and competent. Programmers are an integral part of a startup team. The growth of social media means it can be used as a strategic tool for recruiting the best programmer candidates for a company. This strategic tool takes the form of an automatic classification system for the social media posts of prospective programmers. The classification results are expected to predict the performance pattern of each candidate, labeled as good or bad performance. To obtain an effective tool, the classification method with the best accuracy must be chosen, so a comparison of several methods is needed. This study compares the Support Vector Machine (SVM), Random Forest (RF), and Stochastic Gradient Descent (SGD) algorithms. With 10-fold cross-validation (k = 10), accuracy reaches 81.3% for SVM, 74.4% for RF, and 80.1% for SGD, so the SVM method is chosen as the model for classifying programmer performance from social media activity.
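The k = 10 cross-validation used for the comparison can be sketched in a few lines. This is an illustrative outline, not the study's code: `k_fold_indices`, `cross_val_accuracy`, and the `majority_baseline` stand-in classifier are hypothetical names, the toy labels are invented, and the folds are taken in order without the shuffling or stratification a real experiment would normally add.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_val_accuracy(fit_predict, X, y, k=10):
    """Average test accuracy of fit_predict(X_train, y_train, X_test) over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(1 for p, i in zip(preds, test_idx) if p == y[i])
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def majority_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: always predict the most common label."""
    most_common = max(set(y_train), key=y_train.count)
    return [most_common] * len(X_test)

# Invented toy data: 14 "good" (1) and 6 "bad" (0) performance labels.
X = list(range(20))
y = [1] * 14 + [0] * 6
score = cross_val_accuracy(majority_baseline, X, y, k=10)
```

Each candidate method would be passed in place of `majority_baseline`, and the averaged fold accuracies compared, which is how figures such as those reported for SVM, RF, and SGD are typically obtained.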

