Study of Potential Classification of Lost Students in College Based on Information Extraction on Text-Based Social Media; Case Study of Panca Budi Pembangunan University

2021 ◽  
Vol 8 (11) ◽  
pp. 325-331
Author(s):  
Eko Hariyanto ◽  
Sri Wahyuni ◽  
Supina Batubara

The main problem addressed in this study is the large number of lost students, who harm universities because monitoring them as a preventive measure is difficult. This research is therefore important so that higher-education institutions can detect early (classify) students who may not complete their studies on time or who will drop out (DO). Higher-education (PT) institutions, through related parties such as academic guidance lecturers, academic bureaus and others, can then take early preventive action by offering the best solution to the problems students face. This research aims to determine a training data model consisting of academic and non-academic factors (including information extracted from social media). This model is then used as a basis for classifying students as likely to "graduate on time", "graduate not on time", or "DO". The approach is quantitative: text mining algorithms extract knowledge/information from social media, which is then used in the training data, and data mining algorithms classify the potential completion of student studies. The mandatory output targeted in the first year is a publication in a Scopus Q4 international journal, and in the second year a publication in a Scopus Q3 international journal. Additional output targets in the first and second years are publications in international journals indexed by reputable indexers, ISBN teaching books, and copyrights. The technology readiness level (TKT) of this study reaches level 2, the formulation of technological concepts and applications to classify the potential completion of student studies using data mining. Keywords: [student lost, knowledge/information extraction, data classification, text mining, data mining].

2020 ◽  
Vol 11 (2) ◽  
pp. 66-81
Author(s):  
Badia Klouche ◽  
Sidi Mohamed Benslimane ◽  
Sakina Rim Bennabi

Sentiment analysis is an emerging research area within sentiment polarity classification and text mining, driven in particular by the considerable number of opinions available on social media. The Algerian telephone operator Ooredoo, like other operators, aims in its new strategy to win new customers by exploiting their opinions through sentiment analysis. The purpose of this work is to set up a system called “Ooredoo Rayek”, whose objective is to collect, transliterate, translate and classify the textual data expressed by the Ooredoo operator's customers. This article develops a set of rules allowing transliteration from Algerian Arabizi to Algerian dialect. Furthermore, the authors use Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers to assign polarity tags to Facebook comments from the official pages of Ooredoo, written in a multilingual and multi-dialect context. Experimental results show that the system performs well, with 83% accuracy.


2019 ◽  
Vol 9 (8) ◽  
pp. 1725
Author(s):  
Isra Nurul HABIBI ◽  
Abba Suganda GIRSANG

Text classification is one way to classify sentences. The data to be grouped are comments from social media, with training data taken from sites that provide points/scores for each review, such as tripadvisor.co.id. The word2vec method is used to convert words into numbers so that a machine learning algorithm can be applied to classify the data. Word2vec is an unsupervised method that can use unlabeled data to convert a word into its vector representation and can also find the semantic relationship between words by computing the distance between their vectors. The goal of this paper is that, for data from social media such as Twitter or Instagram, the total score/weight of a tourist place can also be found quickly from the comments given. The experiment shows that the F1 score is better, at 0.85, on data without stop-word removal and with elimination of the training data.
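The idea of reading semantic relatedness from the distance between word vectors can be sketched in a few lines. The abstract does not say which distance is used, so cosine similarity is an assumption here, and the three-dimensional vectors are invented purely for illustration (real word2vec embeddings are learned from a corpus and have far more dimensions).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, NOT produced by word2vec.
toy_vectors = {
    "beach": [0.9, 0.1, 0.3],
    "coast": [0.8, 0.2, 0.4],
    "invoice": [0.1, 0.9, 0.0],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest by cosine similarity."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )
```

With these toy vectors, `most_similar("beach", toy_vectors)` picks "coast", because its vector points in nearly the same direction as the one for "beach".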


Author(s):  
Rafly Indra Kurnia ◽  
Abba Suganda Girsang

This study classifies text based on the rating of a provider application on the Google Play Store. User comments are classified using Word2vec and a deep learning algorithm, in this case Long Short Term Memory (LSTM), based on the rating given. Two rating scales are used: a scale of 1-5, where rating 1 is the lowest and rating 5 is the highest, and a scale of 1-3, where rating 1 (negative) combines ratings 1 and 2, rating 2 (neutral) is rating 3, and rating 3 (positive) combines ratings 4 and 5. SMOTE oversampling is used to handle the imbalanced data. The data set contains 16369 records. The training and testing data are taken from user comments on the MyTelkomsel application on the play.google.com site, where each comment, written in Indonesian, has a rating. Such review data are very useful for companies making business decisions. Similar data can be obtained from social media, but social media does not provide a rating feature for every user comment. The goal of this research is that, for data from social media such as Twitter or Facebook, overall user satisfaction can also be found quickly from the comments given, based on the ratings. The best F1 score and precision obtained using 5 classes with LSTM and SMOTE were 0.62 and 0.70, and the best F1 score and precision obtained using 3 classes with LSTM and SMOTE were 0.86 and 0.87.
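The collapse from the 5-point scale to the 3-class scale described above can be written as a small lookup. This is only a sketch of the mapping stated in the abstract; the function names are hypothetical, and the counting helper is added just to show how class imbalance (the reason given for SMOTE) could be inspected.

```python
from collections import Counter


def rating_to_sentiment(rating):
    """Map a 1-5 star rating to the 3-class scale (1=negative, 2=neutral, 3=positive)."""
    if rating in (1, 2):
        return 1  # negative: ratings 1 and 2 combined
    if rating == 3:
        return 2  # neutral: rating 3
    if rating in (4, 5):
        return 3  # positive: ratings 4 and 5 combined
    raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")


def class_counts(ratings):
    """Count how many comments fall into each of the three sentiment classes."""
    return Counter(rating_to_sentiment(r) for r in ratings)
```

Calling `class_counts` on the raw ratings gives the per-class totals, which makes any imbalance visible before oversampling is applied.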


2020 ◽  
Vol 7 (3) ◽  
pp. 443
Author(s):  
Azahari Azahari ◽  
Yulindawati Yulindawati ◽  
Dewi Rosita ◽  
Syamsuddin Mallala

Graduation predictions are needed by higher-education management to set preventive policies for the early prevention of drop-out cases. The length of each student's study period can be influenced by various factors. Using the data mining algorithms naive Bayes and neural network, student graduation at STMIK Widya Cipta Dharma (WiCiDa) Samarinda can be predicted. The attributes used are age at admission, classification of the city of the student's high school, father's occupation, study program, class, number of siblings, and grade point average (GPA). Samples of students who graduated or dropped out between 2011 and 2019 were used as training data and testing data, while the 2015 to 2018 cohorts were used as target data whose study period was to be predicted. Of a total of 3229 students, 1769 were used as training data, 321 as testing data, and 1139 as target data. All data were taken from students in the undergraduate (strata 1) program and exclude diploma (D3) and transfer students. On the testing data, naive Bayes reached an accuracy of only 57.63%. The results show many weaknesses in the naive Bayes predictions, because their accuracy is not very high. The prediction accuracy of the neural network is 72.58%, so this alternative method is the better one. Evaluation and analysis were carried out to see where the study period predictions were wrong and where they were correct.
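As an illustration of how naive Bayes can predict a class from categorical attributes such as study program or class, here is a minimal sketch. It is not the authors' implementation: the class name, the Laplace smoothing parameter `alpha`, and the attribute values in the usage note are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[(feature_index, label)][value] -> number of training rows
        self.value_counts = defaultdict(Counter)
        # distinct values observed per feature, used in the smoothing denominator
        self.values_seen = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values_seen[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(count / total)
            for i, value in enumerate(row):
                numerator = self.value_counts[(i, label)][value] + self.alpha
                denominator = count + self.alpha * len(self.values_seen[i])
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

For example, after fitting on made-up rows such as `("informatics", "regular")` labelled "on time" and `("management", "evening")` labelled "drop out", `predict(("informatics", "regular"))` returns "on time". Numeric attributes such as age or GPA would need to be binned first, or handled with a Gaussian variant.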


Author(s):  
Lai Lai Yee ◽  
Myo Ma Ma

Data mining is the task of discovering interesting patterns in large amounts of data, where the data can be stored in databases, data warehouses, or other information repositories. It can be viewed as a result of the natural evolution of information technology. The key point is that data mining applies AI and statistical techniques to common business problems in a way that makes these techniques available to the skilled knowledge worker as well as the trained statistics professional. This paper presents a classification system for toxicology using C4.5. First, the input data are randomly partitioned into two independent sets, a training set and a test set: two thirds of the data are allocated to the training set and the remaining one third to the test set. Next, in the C4.5 algorithm process, the training data are used to derive the classification rules. Finally, in the classification process, the test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data.
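The evaluation protocol described above (random two-thirds/one-third partition, then accuracy on the held-out third) can be sketched as follows. The C4.5 rule induction itself is not shown; `classify` stands in for whatever rules were derived from the training set, and both function names and the `seed` parameter are assumptions for the example.

```python
import random


def split_two_thirds(records, seed=0):
    """Randomly partition records into a 2/3 training set and a 1/3 test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def accuracy(classify, test_set):
    """Fraction of (features, label) pairs for which classify(features) == label."""
    correct = sum(1 for features, label in test_set if classify(features) == label)
    return correct / len(test_set)
```

Because the test set takes no part in deriving the rules, the accuracy measured on it is an estimate of how the rules would perform on new data.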


A research paper is a rich source of academic and innovative writing on a particular topic, and research papers are unstructured in nature. Categorization of documents refers to the classification of documents into predefined classes. It is arduous for a user to categorize research papers into different domains, because extracting meaningful and relevant words from a research paper is a challenging task. To extract important information, we use certain methods and classifiers. Methods such as bag of words and TF-IDF are used for processing the data. Preprocessing the data includes string tokenizing and stop-word removal. The processed data are then classified using an SVM classifier. Since there are 4 predefined classes, a one-versus-rest (1-v-r) classifier is used for multiclass classification. The system performance is 88% with 800 training and 200 testing documents. The analysis shows that the model performs better when more training data are available. The aim of this work is to categorize the documents and assign a set of predefined tags to them. It also evaluates the performance of the model by considering different percentages for the training and testing sets of documents.
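The preprocessing and weighting steps named above (tokenizing, stop-word removal, TF-IDF) can be sketched without any library. The stop-word list is a tiny illustrative one, the whitespace tokenizer ignores punctuation, and the weighting uses one common variant, term frequency times log(N / document frequency); the paper does not say which variant it used.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "is", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that occurs in every document gets weight zero, so only terms that help distinguish one document from another contribute to the feature vector passed to the classifier.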


CCIT Journal ◽  
2017 ◽  
Vol 10 (2) ◽  
pp. 197-206
Author(s):  
Atika Rahmawati ◽  
Aris Marjuni ◽  
Junta Zeniarja

Pilkada Serentak (the simultaneous regional elections) is a very important event for the future viability of regions and the country. Through this election, people can cast their vote and elect representatives according to their choice. Public response can be expressed through the social media platform Twitter, and sentiment analysis of Twitter data can then capture the public response to the simultaneous elections. Classification is performed on the text tweeted by Twitter users, because text is easy to obtain and apply. This study classifies responses in Indonesian-language text and increases accuracy by using SVM. The categorical approach divides tweets into two basic classes: positive and negative. The data consist of 3000 Indonesian tweets. Labeling is not done manually but with a clustering method that divides the 3000 tweets into two groups: Cluster 1 as the group of positive tweets and Cluster 2 as the group of negative tweets. Of these, 2700 are used as training data and 300 as test data. The pre-processing stage includes tokenization, case normalization, stop word detection, and stemming. Classification uses a Support Vector Machine (SVM). SVM gave the highest accuracy, 91%, compared with 82% for k-means clustering.


Data mining is one of the most successful domains in research. It describes the past and predicts the future for analysis. Several techniques are used in data mining; among them, classification is one of the main techniques, based on machine learning. In classification, a data set is classified into a predefined set of groups or classes. Mathematical techniques such as decision trees, linear regression, neural networks and statistics are used for classification. Classification is the problem of identifying, using a training data set, which of a set of categories a new observation belongs to. This paper analyses data taken from social media and uses a classification algorithm to make a comparative study of social advertisement using Python.

