Doc2Vec & Naïve Bayes: Learners’ Cognitive Presence Assessment through Asynchronous Online Discussion TQ Transcripts

Hind Hayati; Abdessamad Chanaa; Mohammed Khalidi Idrissi; Samir Bennani

doi:10.3991/ijet.v14i08.9964

International Journal of Emerging Technologies in Learning (iJET) ◽

10.3991/ijet.v14i08.9964 ◽

2019 ◽

Vol 14 (08) ◽

pp. 70 ◽

Cited By ~ 3

Author(s):

Hind Hayati ◽

Abdessamad Chanaa ◽

Mohammed Khalidi Idrissi ◽

Samir Bennani

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Online Discussions ◽

Cognitive Presence ◽

Machine Learning Techniques ◽

Face To Face ◽

Learning Techniques ◽

Bayes Algorithm ◽

Context Features

Because online learning environments lack face-to-face interaction, this article aims to give tutors the opportunity to understand and analyze learners’ cognitive behavior. To that end, we propose an automatic system that assesses learners’ cognitive presence from their social interactions within asynchronous online discussions. The system combines natural language preprocessing, the Doc2Vec document embedding method, and machine learning techniques. We first transform and preprocess the given transcripts, then apply Doc2Vec to represent each message as a vector, which is concatenated with LIWC and context features. The resulting vectors are the input to the Naïve Bayes algorithm, a machine learning method that classifies the transcripts according to cognitive presence categories.
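
The final two steps described above (concatenating feature groups, then classifying with Naïve Bayes) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the embedding values stand in for Doc2Vec output, the LIWC and context numbers are invented, the phase labels are only examples, and the Gaussian variant of Naïve Bayes is an assumption, since the abstract does not name one.

```python
import math
from collections import defaultdict


def concat_features(embedding, liwc, context):
    """Join the three feature groups into one flat vector."""
    return list(embedding) + list(liwc) + list(context)


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class (assumed variant)."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.priors, self.means, self.variances = {}, {}, {}
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            columns = list(zip(*rows))
            self.means[label] = [sum(c) / len(c) for c in columns]
            # A small constant keeps every variance strictly positive.
            self.variances[label] = [
                sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                for c, m in zip(columns, self.means[label])
            ]
        return self

    def predict(self, row):
        def log_posterior(label):
            total = math.log(self.priors[label])
            for v, m, var in zip(row, self.means[label], self.variances[label]):
                total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            return total

        return max(self.priors, key=log_posterior)


# Hypothetical toy data: 2-d message embedding, 1 LIWC score, 1 context flag.
X = [
    concat_features([0.1, 0.2], [5.0], [1]),
    concat_features([0.2, 0.1], [6.0], [1]),
    concat_features([0.9, 0.8], [1.0], [0]),
    concat_features([0.8, 0.9], [2.0], [0]),
]
y = ["exploration", "exploration", "integration", "integration"]

model = GaussianNaiveBayes().fit(X, y)
new_message = concat_features([0.15, 0.15], [5.5], [1])
predicted_phase = model.predict(new_message)
```

In a real pipeline the embedding part would come from a trained document embedding model, and the three feature groups would usually be scaled before concatenation.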

Sentiment Analysis using various Machine Learning and Deep Learning Techniques

Journal of the Nigerian Society of Physical Sciences ◽

10.46481/jnsps.2021.308 ◽

2021 ◽

pp. 385-394

Author(s):

V Umarani ◽

A Julian ◽

J Deepa

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Process ◽

Learning Techniques

Sentiment analysis has gained a lot of attention from researchers in the last year because it has been widely applied to a variety of application domains such as business, government, education, sports, tourism, biomedicine, and telecommunication services. Sentiment analysis is an automated computational method for studying or evaluating sentiments, feelings, and emotions expressed as comments, feedback, or critiques. The sentiment analysis process can be automated using machine learning techniques, which analyze text patterns faster. Supervised machine learning is the most widely used mechanism for sentiment analysis. The proposed work discusses the flow of the sentiment analysis process and investigates common supervised machine learning techniques such as multinomial naive Bayes, Bernoulli naive Bayes, logistic regression, support vector machine, random forest, K-nearest neighbor, and decision tree, as well as deep learning techniques such as Long Short-Term Memory and Convolutional Neural Network. The work examines these learning methods on a standard data set. The experimental results demonstrate the performance of the various classifiers in terms of precision, recall, F1-score, ROC curve, accuracy, running time, and k-fold cross-validation. They help in appreciating the novelty of the several deep learning techniques and give the user an overview for choosing the right technique for their application.
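
Several of the evaluation measures listed above are simple to compute by hand. The sketch below shows precision, recall, F1-score, and accuracy for one positive class, plus a basic k-fold index split. The function names and the toy labels are illustrative assumptions, not taken from the paper.

```python
def classification_metrics(y_true, y_pred, positive):
    """Precision, recall, F1, and accuracy with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": (tp + tn) / len(pairs),
    }


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k interleaved folds."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


# Hypothetical labels for six reviews.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "pos", "neg", "neg"]
scores = classification_metrics(y_true, y_pred, positive="pos")
folds = list(k_fold_indices(6, 3))
```

With k-fold cross-validation, each classifier would be trained on every `train` index set and scored on the matching `test` set, and the per-fold scores would then be averaged.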

Sentiment Analysis of Tweets on the COVID-19 Pandemic Using Machine Learning Techniques

Handbook of Research on Innovations and Applications of AI, IoT, and Cognitive Technologies - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-6870-5.ch021 ◽

2021 ◽

pp. 310-320

Author(s):

Jothikumar R. ◽

Vijay Anand R. ◽

Visu P. ◽

Kumar R. ◽

Susi S. ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Respiratory Tract ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Respiratory Tract Diseases ◽

Thought Processes ◽

Learning Techniques

Sentiment analysis refers to extracting sentiments from natural language and recognizing the attitude toward a particular topic. The novel coronavirus, a dangerous disease, is spreading unexpectedly across the world and causes respiratory tract diseases that range from mild to severe. Because of its rapid spread and the lack of a known cure, it has brought a sense of stress and anxiety. In this chapter, a machine learning based approach is used to discover the sentiments of tweets related to COVID and the resulting lockdown. The chapter examines tweets with hashtags related to the coronavirus and lockdown. The tweets were labeled positive, negative, or neutral, and a list of classifiers was used to examine accuracy and performance. The classifiers used fall under four models: decision tree, regression, support vector machine, and naïve Bayes.

Peningkatan Performa Pendeteksian Anomali Menggunakan Ensemble Learning dan Feature Selection (Improving Anomaly Detection Performance Using Ensemble Learning and Feature Selection)

Creative Information Technology Journal ◽

10.24076/citec.2020v7i1.238 ◽

2021 ◽

Vol 7 (1) ◽

pp. 1

Author(s):

Ripto Sudiyarno ◽

Arief Setyanto ◽

Emha Taufiq Luthfi

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Ensemble Learning ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Detection Systems ◽

Learning Techniques ◽

Performance Results

Intrusion detection systems (IDS) are known as a prominent and leading technique for finding malicious activities on computer networks. Unlike conventional firewalls, an IDS identifies attacks intelligently using analytic approaches such as data mining and machine learning techniques. In the last few decades, ensemble learning has greatly advanced research in machine learning and pattern classification, and it has shown improved performance compared to a single classifier. This study attempts to increase the accuracy of an anomaly detection system. Classification is first performed with a single classifier to obtain an accuracy value, which is then compared with the results of ensemble learning and feature selection. The use of ensemble learning aims to obtain the best accuracy value from a single classifier. The results are obtained from the confusion matrix and are tested by comparing the values of the two methods above. The research obtained an accuracy of 77.4% for the single classifier (naïve Bayes) and 96.8% for ensemble learning. Keywords: ensemble learning, NSL-KDD, naïve Bayes, anomaly, feature selection
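
The abstract does not say which ensemble scheme was used, so the following is only a sketch of one common option, majority voting over the predictions of several base classifiers. The three prediction lists are invented, and the labels `normal` and `attack` are placeholders.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote."""
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical outputs of three base classifiers on five network records.
y_true = ["normal", "attack", "attack", "normal", "attack"]
clf_a = ["normal", "attack", "normal", "normal", "attack"]
clf_b = ["normal", "normal", "attack", "normal", "attack"]
clf_c = ["attack", "attack", "attack", "normal", "attack"]

ensemble = majority_vote([clf_a, clf_b, clf_c])
single_accuracy = accuracy(y_true, clf_a)
ensemble_accuracy = accuracy(y_true, ensemble)
```

In this toy case each base classifier makes a different mistake, so the vote corrects all of them. That is the intuition behind the gain of an ensemble over a single classifier, although real gains depend on how correlated the base errors are.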

Network Malware Detection using Soft Computing and Machine Learning Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1654.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 879-885

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Anomaly Detection ◽

Soft Computing ◽

Naive Bayes ◽

Malware Detection ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Network Anomaly Detection

In today’s world, the rapid increase in information makes addressing security issues more important. Malware detection is an important research area for the effective and secure functioning of computer networks, and research efforts are required to protect systems from various security attacks. In this paper, we analyze the usefulness of Soft Computing and Machine Learning techniques for network malware detection. Hamamoto et al. [1] used a combination of a Genetic Algorithm and Fuzzy logic to implement network anomaly detection. The research work proposed in this paper extends the concepts discussed in [1]. The proposed work explores the use of various Machine Learning algorithms, such as K-Nearest Neighbor, Naïve Bayes, and Decision Tree, for network anomaly detection. The experiments are conducted on the CIDDS (Coburg Intrusion Detection Data Set) dataset [14]. The Decision Tree approach gave better results than the KNN and Naïve Bayes techniques, with 99% accuracy, a precision of 1, and a recall of 1.

Breast Cancer Prediction Using Classification Techniques of Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2022.39743 ◽

2022 ◽

Vol 10 (1) ◽

pp. 51-57

Author(s):

Angela More

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Decision Tree Classifier ◽

Learning Techniques ◽

Tree Classifier ◽

Abstract Data

Abstract: Data analytics plays a vital role in diagnosis and treatment in the health care sector. To enable practitioner decision-making, huge volumes of data should be processed with machine learning techniques to produce tools for prediction and classification. About 1 million cases of breast cancer are reported per year. We propose a prediction model specifically designed for predicting breast cancer using the machine learning algorithms decision tree classifier, Naïve Bayes, SVM, and K-Nearest Neighbour. The model predicts the type of tumour, which can be benign (noncancerous) or malignant (cancerous). The model uses supervised learning, a machine learning concept in which dependent and independent columns are provided to the machine, and a classification technique to predict the type of tumour. Keywords: Cancer, Machine learning, Prediction, Data Visualization, SVM, Naïve Bayes, Classification.

Innovative Artificial Intelligence Approach for Hearing-Loss Symptoms Identification Model Using Machine Learning Techniques

Sustainability ◽

10.3390/su13105406 ◽

2021 ◽

Vol 13 (10) ◽

pp. 5406

Author(s):

Mohd Khanapi Abd Ghani ◽

Nasir G. Noma ◽

Mazin Abed Mohammed ◽

Karrar Hameed Abdulkareem ◽

Begonya Garcia-Zapirain ◽

...

Keyword(s):

Machine Learning ◽

Hearing Loss ◽

Error Rate ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Machine Learning Techniques ◽

Feature Transformation ◽

Learning Techniques ◽

Multivariate Bernoulli

Physicians depend on their insight and experience and on a fundamentally indicative or symptomatic approach to decide on the possible ailment of a patient. However, numerous phases of problem identification and longer strategies can prompt a longer time for consulting and can subsequently cause other patients that require attention to wait for longer. This can bring about pressure and tension concerning those patients. In this study, we focus on developing a decision-support system for diagnosing the symptoms as a result of hearing loss. The model is implemented by utilizing machine learning techniques. The Frequent Pattern Growth (FP-Growth) algorithm is used as a feature transformation method and the multivariate Bernoulli naïve Bayes classification model as the classifier. To find the correlation that exists between the hearing thresholds and symptoms of hearing loss, the FP-Growth and association rule algorithms were first used to experiment with small sample and large sample datasets. The result of these two experiments showed the existence of this relationship, and that the performance of the hybrid of the FP-Growth and naïve Bayes algorithms in identifying hearing-loss symptoms was found to be efficient, with a very small error rate. The average accuracy rate and average error rate for the multivariate Bernoulli model with FP-Growth feature transformation, using five training sets, are 98.25% and 1.73%, respectively.
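
The multivariate Bernoulli naïve Bayes model named above treats every feature as a binary present/absent indicator. The sketch below is a minimal version with Laplace smoothing. It assumes the feature transformation step has already produced 0/1 vectors, and the feature values and the labels `symptom_a` and `symptom_b` are hypothetical.

```python
import math


class BernoulliNaiveBayes:
    """Multivariate Bernoulli naive Bayes over 0/1 feature vectors."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior, self.prob = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Laplace smoothing: add 1 to the count and 2 to the total.
            self.prob[c] = [
                (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(n_features)
            ]
        return self

    def predict(self, x):
        def score(c):
            total = self.log_prior[c]
            for xj, pj in zip(x, self.prob[c]):
                # Absent features also contribute, through (1 - p).
                total += math.log(pj) if xj else math.log(1 - pj)
            return total

        return max(self.classes, key=score)


# Hypothetical binary indicators (e.g. "pattern k occurs in this record").
X = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
y = ["symptom_a"] * 3 + ["symptom_b"] * 3

clf = BernoulliNaiveBayes().fit(X, y)
prediction = clf.predict([1, 1, 0])
```

Unlike the multinomial variant, this model penalizes a class when a feature it usually exhibits is missing, which suits presence/absence data such as frequent-pattern indicators.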

Empirical Assessment of Machine Learning Techniques for Software Requirements Risk Prediction

Electronics ◽

10.3390/electronics10020168 ◽

2021 ◽

Vol 10 (2) ◽

pp. 168

Author(s):

Rashid Naseem ◽

Zain Shaukat ◽

Muhammad Irfan ◽

Muhammad Arif Shah ◽

Arshad Ahmad ◽

...

Keyword(s):

Machine Learning ◽

Risk Prediction ◽

Naive Bayes ◽

Absolute Error ◽

Naïve Bayes ◽

Error Rates ◽

Machine Learning Techniques ◽

Decision Table ◽

Software Project ◽

Learning Techniques

Software risk prediction is the most sensitive and crucial activity of the Software Development Life Cycle (SDLC). It may lead to the success or failure of a project. Risk should be predicted early to make a software project successful. A model is proposed for the prediction of software requirement risks using a requirement risk dataset and machine learning techniques. In addition, a comparison is made between multiple classifiers, namely K-Nearest Neighbour (KNN), Average One Dependency Estimator (A1DE), Naïve Bayes (NB), Composite Hypercube on Iterated Random Projection (CHIRP), Decision Table (DT), Decision Table/Naïve Bayes Hybrid Classifier (DTNB), Credal Decision Trees (CDT), Cost-Sensitive Decision Forest (CS-Forest), J48 Decision Tree (J48), and Random Forest (RF), to find the technique best suited to the model according to the nature of the dataset. These techniques are evaluated using various evaluation metrics including CCI (Correctly Classified Instances), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE), Root Relative Squared Error (RRSE), precision, recall, F-measure, Matthew’s Correlation Coefficient (MCC), Receiver Operating Characteristic Area (ROC area), Precision-Recall Curves area (PRC area), and accuracy. The overall outcome of this study shows that in terms of reducing error rates, CDT outperforms other techniques, achieving 0.013 for MAE, 0.089 for RMSE, 4.498% for RAE, and 23.741% for RRSE. However, in terms of increasing accuracy, DT, DTNB, and CDT achieve better results.
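
The four error measures reported above follow standard definitions, sketched here for numeric predictions. RAE and RRSE compare the model's error with that of a baseline that always predicts the mean of the actual values. The input numbers are invented, and how the study's tooling applies these measures to class predictions is not stated in the abstract.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, RMSE, RAE (%), and RRSE (%) for paired numeric values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean of `actual`.
    abs_base = sum(abs(a - mean_actual) for a in actual)
    sq_base = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE_percent": 100 * abs_err / abs_base,
        "RRSE_percent": 100 * math.sqrt(sq_err / sq_base),
    }


# Hypothetical values, chosen only to make the arithmetic easy to follow.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
metrics = error_metrics(actual, predicted)
```

A relative error below 100% means the model beats the mean-only baseline, which is why RAE and RRSE are easier to compare across datasets than MAE and RMSE.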

Python NLTK Sentiment Inspection using Naïve Bayes Classifier

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1328.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 2684-2687 ◽

Cited By ~ 1

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Set ◽

Customer Reviews ◽

Learning Techniques ◽

Stop Word ◽

Bayes Algorithm ◽

The Given

The Web is one of the richest sources for gathering consumer reviews and opinions. Many websites contain customer opinions in the form of reviews, blogs, discussion groups, and forums. This project focuses on customer reviews of restaurants. It predicts whether a given comment is positive or negative using supervised machine learning techniques. The project uses a dataset from the Kaggle website, consisting of each comment and its type (i.e., positive or negative). The project studies classification algorithms and text mining approaches to identify the type of comment. First, duplicates are removed from the data set. Text pre-processing follows: punctuation marks and stop words are removed, and the whole text is then converted into vector format. The conversion from text to vector is an essential step because English text cannot be used directly for the analysis, which relies on linear algebra. We use CountVectorizer to convert the data to vector format. Finally comes the classification part, for which we use the Naive Bayes algorithm. The classifier separates the data set into the two classes mentioned above. We take 70 percent of the data as the training set and 30 percent as the test set.
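
The steps described above can be sketched end to end with the Python standard library. This is an illustration under stated assumptions, not the project's code: token counting stands in for CountVectorizer, the stop-word list is a tiny invented one rather than NLTK's, the reviews are made up, and multinomial Naive Bayes with Laplace smoothing is an assumed variant.

```python
import math
import random
import string
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "was", "and", "it", "to", "of"}  # illustrative only


def preprocess(text):
    """Lowercase, strip punctuation, and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def train_test_split(rows, test_ratio=0.3, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]


class MultinomialNaiveBayes:
    """Naive Bayes over token counts with Laplace smoothing."""

    def fit(self, docs, labels):
        self.vocab = {word for doc in docs for word in doc}
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())

        def score(label):
            total_words = sum(self.word_counts[label].values())
            result = math.log(self.doc_counts[label] / total_docs)
            for word in doc:
                if word in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][word]
                    result += math.log((count + 1) / (total_words + len(self.vocab)))
            return result

        return max(self.doc_counts, key=score)


# Hypothetical reviews; the last row duplicates the first one.
reviews = [
    ("Great food and friendly staff", "positive"), ("Terrible food", "negative"),
    ("Loved the pasta", "positive"), ("Cold pasta and rude staff", "negative"),
    ("Amazing service", "positive"), ("Awful service", "negative"),
    ("Delicious dessert, great coffee", "positive"), ("Bad coffee", "negative"),
    ("Wonderful place", "positive"), ("Horrible place", "negative"),
    ("Great food and friendly staff", "positive"),
]

unique_reviews = list(dict.fromkeys(reviews))         # remove duplicates, keep order
train_rows, test_rows = train_test_split(unique_reviews)  # 70 / 30 split
model = MultinomialNaiveBayes().fit(
    [preprocess(text) for text, _ in train_rows],
    [label for _, label in train_rows],
)
test_predictions = [model.predict(preprocess(text)) for text, _ in test_rows]
```

Comparing `test_predictions` with the labels in `test_rows` gives the held-out accuracy. With only ten reviews the number is not meaningful, so the example stops at producing predictions.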

Evaluating Machine Learning Methods for Predicting Diabetes among Female Patients in Bangladesh

Information ◽

10.3390/info11080374 ◽

2020 ◽

Vol 11 (8) ◽

pp. 374

Author(s):

Badiuzzaman Pranto ◽

Sk. Maliha Mehnaz ◽

Esha Bintee Mahid ◽

Imran Mahmud Sadman ◽

Ahsanur Rahman ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Machine Learning Methods ◽

Learning Techniques

Machine Learning has a significant impact on different aspects of science and technology, including medical research and the life sciences. Diabetes Mellitus, more commonly known as diabetes, is a chronic disease that involves abnormally high levels of glucose sugar in blood cells and the usage of insulin in the human body. This article focuses on analyzing diabetes patients and detecting diabetes using different Machine Learning techniques, building a model with few dependencies based on the PIMA dataset. The model has been tested on an unseen portion of PIMA and also on a dataset collected from Kurmitola General Hospital, Dhaka, Bangladesh. The research demonstrates the performance of several classifiers trained on one country’s diabetes dataset and tested on patients from a different country. We evaluated decision tree, K-nearest neighbor, random forest, and Naïve Bayes classifiers, and the results show that both random forest and Naïve Bayes performed well on both datasets.
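
The evaluation design above, fitting on one dataset and testing on another, can be illustrated with K-nearest neighbor, one of the four evaluated classifiers. All numbers below are invented. The two features (glucose, BMI) are an assumed subset chosen for illustration, and the source/target split only mimics the train-on-PIMA, test-on-hospital-data setup.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),  # Euclidean distance
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical source dataset used for fitting: (glucose, BMI) -> diabetic?
source_X = [(85, 22.0), (90, 24.5), (95, 23.0), (150, 33.0), (160, 35.5), (170, 31.0)]
source_y = [0, 0, 0, 1, 1, 1]

# Hypothetical target dataset from a different population, used only for testing.
target_X = [(88, 23.5), (165, 34.0)]
target_y = [0, 1]

target_pred = [knn_predict(source_X, source_y, x) for x in target_X]
target_accuracy = sum(p == t for p, t in zip(target_pred, target_y)) / len(target_y)
```

In practice the features would be scaled first, since glucose values would otherwise dominate the distance, and any scaling statistics would be computed on the source data only.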

Automated analysis of cognitive presence in MOOC discussions

Pacific Journal of Technology Enhanced Learning ◽

10.24135/pjtel.v2i1.63 ◽

2020 ◽

Vol 2 (1) ◽

pp. 46-47

Author(s):

Yuanyuan Hu ◽

Claire Donald ◽

Nasser Giacaman

Keyword(s):

Machine Learning ◽

Online Discussion ◽

Learning Experience ◽

Computer Conferencing ◽

Online Discussions ◽

Cognitive Presence ◽

Machine Learning Techniques ◽

Rater Reliability ◽

Learning Techniques ◽

Research Bias

The Community of Inquiry (CoI) framework [1] has been broadly used to analyse learning experience in online discussion forums for two decades. Cognitive presence, which is a primary dimension of the CoI framework, manifests the reflection of (re)constructing knowledge and problem-solving processes in the learning experience [2]. Researchers doing text analysis using machine learning techniques are making promising contributions to analysing phases of cognitive presence automatically [3]–[5] in online discussions. However, most studies of automated cognitive analysis focus on improving the accuracy and reliability of the classifiers. They overlook that another purpose of applying machine learning techniques in educational research should be to pinpoint research bias that scholars neither intended nor could have found without computer support. This session will present an example of ‘research bias’ discovered from both manual and automated classification of cognitive phases, prompting scholars to rethink and improve the conflicting parts of the taxonomies of cognitive presence in the MOOC context. The manual-classification rubric used to label discussion messages of a target MOOC combines Garrison, Anderson and Archer’s [2] scheme with Park’s [6] revised version. The rubric describes four phases of cognitive presence (i.e. triggering event, exploration, integration and resolution), and indicators of each phase in online discussions. We report that the average inter-rater reliability between two human raters reached 95.4% agreement (N = 1002) with a Cohen’s weighted kappa of 0.96. Interestingly, the average inter-rater reliability decreased to 80.1% after increasing the size of the data sample (N = 1918) and the number of human raters to three.
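
The two agreement measures reported above can be computed as in the sketch below. The abstract does not say which weighting scheme was used for the weighted kappa, so linear weights over the ordered phases are an assumption, and the two rating lists are invented.

```python
def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters chose the same category."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear disagreement weights |i - j| (assumed scheme)."""
    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted disagreement actually observed vs. expected by chance.
    observed_dis = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k))
    expected_dis = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed_dis / expected_dis


PHASES = ["triggering", "exploration", "integration", "resolution"]

# Hypothetical codes assigned by two raters to eight messages.
rater_a = ["triggering", "exploration", "exploration", "integration",
           "resolution", "integration", "exploration", "triggering"]
rater_b = ["triggering", "exploration", "integration", "integration",
           "resolution", "integration", "exploration", "exploration"]

agreement = percent_agreement(rater_a, rater_b)
kappa = weighted_kappa(rater_a, rater_b, PHASES)
```

Both disagreements in the toy data fall between adjacent phases, so the linear weights penalize them only lightly. A disagreement between triggering event and resolution would count three times as much.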
After training the automated classifiers to predict phases of cognitive presence, the confusion matrix implies that most of the disagreements between computer raters occurred between adjacent phases of cognitive presence. The disagreements between human raters show the same pattern. We assume that additional categories may exist between cognitive phases in such MOOC discussion messages. These details will be discussed during the presentation.

References

[1] D. Garrison, T. Anderson, and W. Archer, “Critical Inquiry in a Text-Based Environment: Computer Conferencing in Higher Education,” Internet High. Educ., vol. 2, no. 2, pp. 87–105, 1999.

[2] D. Garrison, T. Anderson, and W. Archer, “Critical thinking, cognitive presence, and computer conferencing in distance education,” Am. J. Distance Educ., vol. 15, no. 1, pp. 7–23, 2001.

[3] V. Kovanović, S. Joksimović, D. Gašević, and M. Hatala, “Automated cognitive presence detection in online discussion transcripts,” in CEUR Workshop Proceedings, vol. 1137, 2014.

[4] V. Kovanović et al., “Towards automated content analysis of discussion transcripts,” Proc. Sixth Int. Conf. Learn. Anal. Knowl. - LAK ’16, pp. 15–24, 2016.

[5] E. Farrow, J. Moore, and D. Gasevic, “Analysing discussion forum data: a replication study avoiding data contamination,” 9th Int. Learn. Anal. Knowl. Conf., no. March, 2019.

[6] C. Park, “Replicating the Use of a Cognitive Presence Measurement Tool,” J. Interact. Online Learn., vol. 8, no. 2, pp. 140–155, 2009.
