International Journal of Information Technology and Computer Science
Latest Publications


TOTAL DOCUMENTS: 973 (FIVE YEARS: 129)

H-INDEX: 13 (FIVE YEARS: 2)

Published by MECS Publisher

ISSN: 2074-9015, 2074-9007

Author(s): Prajwal Kaushal, Nithin Bharadwaj B P, Pranav M S, Koushik S, ...

Twitter is one of the most sophisticated social networking platforms, with a user base growing exponentially and terabytes of data generated every day. Technology giants invest billions of dollars in drawing insights from these tweets, yet a huge amount of this data still goes underutilized. The main aim of this paper is to solve two tasks. The first is to build a sentiment analysis model using BERT (Bidirectional Encoder Representations from Transformers) that analyses tweets and predicts the sentiments of their authors. The second is to build a personality prediction model using various machine learning classifiers under the umbrella of the Myers-Briggs Type Indicator (MBTI), one of the most widely used psychological instruments in the world, to predict the traits and qualities of people based on their posts and interactions on Twitter. The model succeeds in predicting the personality traits and qualities of Twitter users. In future work, we intend to apply the analyzed results in applications such as market research, recruitment, psychological testing, and consulting.
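As a rough illustration of the first task, here is a minimal sketch of BERT-based tweet sentiment scoring using the Hugging Face transformers library. The pipeline's default checkpoint (a distilled BERT variant) and the example tweets are stand-ins; the abstract does not specify the authors' exact model or training setup.

```python
# Minimal sketch: score tweet sentiment with a BERT-family model via the
# Hugging Face pipeline API. The default checkpoint is illustrative only.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default checkpoint

tweets = [
    "Loving the new update, great work!",
    "This service keeps crashing, very disappointing.",
]
for tweet, result in zip(tweets, classifier(tweets)):
    print(f"{result['label']} ({result['score']:.2f}): {tweet}")
```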


Author(s): Isaac Kofi Nti, Owusu Nyarko-Boateng, Justice Aning

The numerical value of k in a k-fold cross-validation training technique of machine learning predictive models is an essential element that impacts the model’s performance. A right choice of k results in better accuracy, while a poorly chosen value for k might affect the model’s performance. In the literature, the most commonly used values of k are five (5) or ten (10), as these two values are believed to give test error rate estimates that suffer neither from extremely high bias nor very high variance. However, there is no formal rule. To the best of our knowledge, few experimental studies have attempted to investigate the effect of diverse k values in training different machine learning models. This paper empirically analyses the prevalence and effect of distinct k values (3, 5, 7, 10, 15 and 20) on the validation performance of four well-known machine learning algorithms: Gradient Boosting Machine (GBM), Logistic Regression (LR), Decision Tree (DT) and K-Nearest Neighbours (KNN). It was observed that the value of k and model validation performance differ from one machine learning algorithm to another for the same classification task. However, our empirical results suggest that k = 7 offers a slight increase in validation accuracy and area under the curve, with lower computational cost than k = 10, across most of the algorithms tested. We discuss the study outcomes in detail and outline some guidelines for beginners in the machine learning field in selecting the best k value and machine learning algorithm for a given task.
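A sketch of the kind of experiment the abstract describes follows: comparing mean cross-validation accuracy across the same k values for the four named classifiers. The dataset is an illustrative stand-in, not the authors' benchmark.

```python
# Sketch: vary k in k-fold cross-validation for GBM, LR, DT and KNN.
# load_breast_cancer is a placeholder dataset for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "GBM": GradientBoostingClassifier(),
    "LR": LogisticRegression(max_iter=5000),
    "DT": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
}
for k in (3, 5, 7, 10, 15, 20):
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=k)
        print(f"k={k:2d} {name}: mean accuracy {scores.mean():.3f}")
```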


Author(s): Shagufta Faryad, Hira Batool, Muhammad Asif, Affan Yasin

The Internet of Things (IoT) adds a new dimension to how people and things can communicate and collaborate; society and the Internet are now tightly and purposely interconnected. This research analyzes how IoT, as a persuasive technology, can affect human behavior and increase the awareness and effectiveness of IoT products among users. How will IoT infrastructure help humans change their attitudes and behaviors towards specific routine tasks? Our objective is to analyze which factors influence the acceptance or rejection of particular behaviors, and the core motivators that persuade people to do, or to avoid, something. We aim to determine whether IoT will help humans change the targeted behaviors. Because of the rapid convergence of the digital and physical worlds and the advent of digital technology, the Internet and social media have opened up a new world of affordances, constraints, and information flows from a design perspective. This article discusses how digital architecture affects behavior and the ramifications for designers who want to influence behavior for social and environmental good. We give a brief introduction to persuasive technology, especially as it pertains to human adoption of IoT technology, and discuss a number of current research opportunities in IoT gadgets and their adoption [1]. Our results indicate that a persuasive IoT infrastructure can be expected to achieve a change in driving behaviour among its adopters. Furthermore, attention should be paid to an appropriate selection and implementation of persuasive strategies.


Author(s): Edward Ombui, Lawrence Muchemi, Peter Wagacha

This study examines the problem of hate speech identification in codeswitched text from social media using a natural language processing approach. It explores different features in training nine models and empirically evaluates their predictiveness in identifying hate speech in a ~50k human-annotated dataset. The study espouses a novel, hierarchical approach that employs Latent Dirichlet Allocation to generate topic models, which help build a high-level psychosocial feature set that we abbreviate as PDC. PDC groups words of similar meaning into word families, which is significant in capturing codeswitching during the preprocessing stage for supervised learning models. The high-level PDC features are based on a hate speech annotation framework [1] that is largely informed by the duplex theory of hate [2]. Results from frequency-based models using the PDC features on a dataset comprising tweets generated during the 2012 and 2017 presidential elections in Kenya indicate an f-score of 83% (precision: 81%, recall: 85%) in identifying hate speech. The study is significant in that it publicly shares a unique codeswitched hate speech dataset that is valuable for comparative studies, and it provides a methodology for building a novel PDC feature set to identify nuanced forms of hate speech, camouflaged in codeswitched data, which conventional methods could not adequately identify.
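A minimal sketch of the topic-modeling step follows: deriving per-document topic proportions with Latent Dirichlet Allocation, which can then feed a supervised classifier as high-level features. The toy corpus and topic count are illustrative assumptions, not the study's setup.

```python
# Sketch: LDA topic proportions as features for downstream classification.
# The tweets and n_components value are placeholders.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

tweets = [
    "example codeswitched tweet one",
    "another tweet mixing languages",
    "a third short tweet",
]
counts = CountVectorizer().fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)  # one topic-mixture row per tweet
print(topic_features)  # these vectors can feed a supervised hate speech model
```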


Author(s): Bassel Al-khatib, Ali Ahmad Ali

With the increased adoption of open government initiatives around the world, a huge number of governmental raw datasets has been released. However, the data is published in heterogeneous formats and vocabularies and is often of poor quality, suffering from inconsistency, messiness, and possible incorrectness stemming from how it was collected within the source organizations, which makes it inefficient to reuse and integrate for serving citizens and third-party apps. This research introduces the LDOG (Linked Data for Open Government) experimental framework, which aims to provide a modular architecture that can be integrated into the open government hierarchy, allowing huge amounts of data to be gathered in a fine-grained manner at the source and published directly as linked data following Tim Berners-Lee's five-star deployment scheme, with a SHACL validation layer that results in high-quality data. The general idea is to model the hierarchy of government and classify government organizations into two types: modeling organizations at higher levels and data-source organizations at lower levels. Linked data experts in modeling organizations are responsible for designing data templates, ontologies, SHACL shapes, and linkage specifications, whereas non-experts in data-source organizations apply their knowledge of the data to perform mapping, reconciliation, and data correction. This approach reduces the number of experts needed, which is a known obstacle to linked data adoption. To test the functionality of our framework in action, we developed the LDOG platform, which uses the framework's modules to power a set of user interfaces for publishing government datasets, and we used this platform to convert some of the UAE's government datasets into linked data. Finally, on top of the converted data, we built a proof-of-concept app to show the power of five-star linked data for integrating datasets from disparate organizations and to promote government adoption. Our work defines a clear path to integrating linked data into open governments, with solid steps to publishing and enhancing it in a fine-grained, practical manner with fewer linked data experts; it also extends SHACL to define data shapes and convert CSV to RDF.
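The publish-and-validate step can be sketched with the rdflib and pySHACL libraries: convert a CSV row to RDF triples, then validate the resulting graph against a SHACL shape. The namespace, columns, and shape here are illustrative assumptions, not LDOG's actual vocabularies.

```python
# Sketch: CSV row -> RDF triples, then SHACL validation of the graph.
# The ex namespace, columns, and shape are invented for illustration.
import csv
import io

from pyshacl import validate
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/gov/")
data = Graph()
rows = csv.DictReader(io.StringIO("id,name\n1,Department of Health\n"))
for row in rows:
    org = EX[f"org/{row['id']}"]
    data.add((org, RDF.type, EX.Organization))
    data.add((org, EX.name, Literal(row["name"])))

shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/gov/> .
ex:OrgShape a sh:NodeShape ;
    sh:targetClass ex:Organization ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ] .
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms, report)  # True when every organization has a name
```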


Author(s): Joseph Kithinji, Makau S. Mutua, Gitonga D. Mwathi

Consolidation of storage into IP SANs (Internet Protocol storage area networks) has led to a combination of multiple workloads of varying demands and importance. To ensure that users meet their service level objectives (SLOs), a technique for isolating workloads is required. Existing solutions include cache partitioning and throttling of workloads; however, all of these techniques require workloads to be classified before they can be isolated. Previous work on performance isolation overlooked the classification process as a source of overhead. It is known, however, that linear-search-based classifiers scan the rule list sequentially for rules that match packets, which causes delays, among other problems, especially when there are many rules. This paper examines the limitations of list-based classifiers and proposes a technique that combines rule sorting, rule partitioning, and a tree-rule firewall to reduce the cost of matching packets to rules during classification. Experiments evaluating the proposed solution against existing ones showed that an unoptimized linear-search-based classification process can degrade performance, and that the proposed solution considerably reduces the time required to match packets to their classes, as evident in the throughput and latency observed.
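The gap between the two matching strategies can be illustrated with a minimal sketch: a linear scan over a rule list versus a binary-search jump over sorted, non-overlapping rules. The single port field and the rules themselves are simplifications for illustration, not the paper's classifier.

```python
# Sketch: linear rule search vs. a partitioned lookup over sorted rules.
# Rules are (port_low, port_high, traffic_class) with no overlaps.
import bisect

rules = sorted([(0, 1023, "system"), (1024, 49151, "registered"),
                (49152, 65535, "ephemeral")])

def classify_linear(port):
    for low, high, cls in rules:              # O(n) scan over all rules
        if low <= port <= high:
            return cls
    return "default"

lows = [r[0] for r in rules]

def classify_partitioned(port):
    i = bisect.bisect_right(lows, port) - 1   # O(log n) jump to candidate
    if i >= 0 and rules[i][0] <= port <= rules[i][1]:
        return rules[i][2]
    return "default"

assert classify_linear(8080) == classify_partitioned(8080) == "registered"
```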


Author(s): Norah AL-Harbi, Amirrudin Bin Kamsin

Terrorist groups in the Arab world have been using social networking sites such as Twitter and Facebook to spread terror rapidly over the past few years. Detecting and suspending such accounts is one way to control the menace to some extent. This research aims at building an effective text classifier, using machine learning, to identify the polarity of tweets automatically. Five classifiers were chosen: AdB_SAMME, AdB_SAMME.R, Linear SVM, NB, and LR. These classifiers were applied to three feature sets, namely S1 (one word, unigram), S2 (word pair, bigram), and S3 (word triplet, trigram). All five classifiers evaluated S1, S2, and S3 on 346 preprocessed tweets. The feature extraction process utilized one of the most widely applied weighting schemes, tf-idf (term frequency-inverse document frequency). The results were validated by four experts in the Arabic language (three teachers and an educational supervisor in Saudi Arabia) through a questionnaire. The study found that the Linear SVM classifier yielded the best result, 99.7% classification accuracy on S3, among all the classifiers used. When both classification accuracy and time were considered, the NB classifier demonstrated the best performance on S1 with 99.4% accuracy, which was comparable with Linear SVM. The Arab world has faced massive terrorist attacks in the past, and therefore, the research is highly significant and relevant due to its specific focus on detecting terrorism messages in Arabic. The state-of-the-art methods developed so far for tweet classification are mostly focused on analyzing English text, and hence, there was a dire need for devising machine learning algorithms for detecting Arabic terrorism messages. The innovative aspect of the model presented in the current study is that the five best classifiers were selected and applied to the three language models S1, S2, and S3, and the comparative analysis based on classification accuracy and time constraints identified the best classifiers for sentiment analysis in the Arabic language.
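A minimal sketch of the described pipeline follows: tf-idf over exactly unigrams, bigrams, or trigrams (S1, S2, S3) feeding a Linear SVM. The toy tweets and labels are placeholders, not the study's Arabic corpus.

```python
# Sketch: tf-idf n-gram features (S1/S2/S3) with a Linear SVM classifier.
# The texts and labels below are invented stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = ["peaceful message text", "violent threatening text",
          "ordinary daily update", "extremist propaganda phrase"]
labels = [0, 1, 0, 1]

for n, name in [(1, "S1"), (2, "S2"), (3, "S3")]:
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(n, n)),  # exactly n-grams per feature set
        LinearSVC())
    model.fit(tweets, labels)
    print(name, model.predict(["another threatening text"]))
```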


Author(s): Serhii Zybin, Yana Bielozorova

The article is dedicated to a methodology for building a decision support system under threats and risks, developed by modifying methods of targeted evaluation of options and used to construct a scheme of the decision support system. Decision support systems help decision-makers reach correct and effective solutions under time pressure and with incomplete, uncertain, and unreliable information, while taking risks into account. When making decisions under risk, it is necessary to solve the following tasks: determination of the quantitative characteristics of risk; determination of quantitative indicators of the effectiveness of decisions in the presence of risks; and distribution of resources between means of countering threats and means aimed at improving information security. The known methods for solving the first task provide for the identification of risks (qualitative analysis) as well as the assessment of the probabilities and the extent of possible damage (quantitative analysis). At the same time, however, the task of assessing the effectiveness of decisions that take risks into account is not solved and remains at the discretion of the expert. In the suggested method, the relative efficiency of supporting measures is calculated as a function of time over a given time interval. The main idea of the proposed approach to analyzing the impact of threats and risks on decision-making is that events causing threats or risks are considered part of the decision support system; such threat or risk models are therefore included in the hierarchy of goals, and their links with other parts and goals of the system are established. The main functional modules that ensure the continuous and efficient operation of the decision support system are the following subsystems: a subsystem for analysing problems, risks, and threats; a subsystem for the formation of goals and criteria; a decision-making subsystem; and a subsystem for forming the decision rule and analysing alternatives. Structural schemes of functioning are constructed for each subsystem, and the given block diagram provides a full-fledged decision-making process.
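As an illustration of the quantitative risk task named above, here is a small sketch that scores each threat as probability times expected damage and splits a security budget in proportion to those scores. The figures and the proportional allocation rule are assumptions for illustration, not the authors' model.

```python
# Sketch: quantify risk as probability * expected damage, then allocate a
# countermeasure budget proportionally. All numbers are invented.
threats = {            # threat: (probability, expected damage)
    "phishing": (0.30, 50_000),
    "ddos":     (0.10, 200_000),
    "insider":  (0.05, 400_000),
}
budget = 100_000
risks = {name: p * d for name, (p, d) in threats.items()}
total = sum(risks.values())
for name, risk in risks.items():
    share = budget * risk / total   # allocate in proportion to risk
    print(f"{name}: risk={risk:,.0f}, allocated={share:,.0f}")
```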


Author(s): Eman Bashir, Mohamed Bouguessa

Broadly, cyberbullying is viewed as a severe social danger that affects many individuals around the globe, particularly young people and teenagers. The Arab world has embraced technology and continues to use it in different ways to communicate on social media platforms. However, Arabic text poses challenges because of its complexity and the scarcity of resources for it. This paper investigates several questions related to detecting cyberbullying/harassment in Arabic content posted on Twitter. To answer these questions, we collected an Arabic corpus covering the relevant topics through specific keywords, which we explain in detail. We devised experiments investigating several learning approaches. Our results suggest that deep learning models like LSTM achieve better performance than traditional cyberbullying classifiers, with an accuracy of 72%.
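A minimal sketch of an LSTM text classifier of the kind reported above follows, using Keras. The vocabulary size, sequence length, layer sizes, and random placeholder data are illustrative assumptions; the actual study would train on the collected Arabic Twitter corpus.

```python
# Sketch: a small LSTM binary classifier over padded token-id sequences.
# Hyperparameters and the random training data are placeholders.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM

vocab_size, max_len = 10_000, 50
model = Sequential([
    Embedding(vocab_size, 64),            # token ids -> dense vectors
    LSTM(64),                             # sequence summary
    Dense(1, activation="sigmoid"),       # bullying vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Placeholder data in the expected shape: padded integer token sequences.
X = np.random.randint(1, vocab_size, size=(32, max_len))
y = np.random.randint(0, 2, size=(32,))
model.fit(X, y, epochs=1, verbose=0)
```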


Author(s): Pankaj Bhowmik, Pulak Chandra Bhowmik, U. A. Md. Ehsan Ali, Md. Sohrawordi

A sizeable number of women face difficulties during pregnancy that can eventually lead the fetus towards serious health problems; however, early detection of these risks can save the invaluable lives of both infants and mothers. Cardiotocography (CTG) data provides sophisticated information by monitoring the heart rate signal of the fetus, and it is used to predict potential risks to fetal wellbeing and to draw clinical conclusions. This paper proposes to analyze antepartum CTG data (available in the UCI Machine Learning Repository) and develop an efficient tree-based ensemble learning (EL) classifier model to predict fetal health status. In this study, EL follows the stacking approach, of which a concise overview is discussed and developed accordingly. The study also applies distinct machine learning algorithms to the CTG dataset and measures their performance. The stacking EL technique in this paper involves four tree-based machine learning algorithms as base learners: the Random Forest, Decision Tree, Extra Trees, and Deep Forest classifiers. The CTG dataset contains 21 features, but only the 10 most important features are selected with the Chi-square method for this experiment, and the features are then normalized with Min-Max scaling. Following that, grid search is applied to tune the hyperparameters of the base algorithms, and 10-fold cross-validation is performed to select the meta learner of the EL classifier model. A comparative assessment between the individual base learning algorithms and the EL classifier model shows the EL classifier's superiority in predicting fetal health risks, achieving an accuracy of about 96.05%. The study concludes that the stacking EL approach can be a substantial paradigm in machine learning studies for improving model accuracy and reducing error rates.
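A sketch of this stacking setup with scikit-learn follows. The Deep Forest base learner comes from a separate package, so this sketch uses only the three scikit-learn tree ensembles, with logistic regression as a stand-in meta learner; the dataset is likewise a placeholder for the CTG data. Feature selection and scaling follow the steps in the abstract.

```python
# Sketch: Chi-square feature selection, Min-Max scaling, and a stacking
# ensemble of tree-based learners with 10-fold CV for the meta learner.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)   # stand-in for the CTG dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()),
                ("dt", DecisionTreeClassifier()),
                ("et", ExtraTreesClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=10)                                    # 10-fold CV for the meta learner
pipe = make_pipeline(SelectKBest(chi2, k=10),  # keep the 10 best features
                     MinMaxScaler(),           # Min-Max normalization
                     stack)
pipe.fit(X_tr, y_tr)
print(f"held-out accuracy: {pipe.score(X_te, y_te):.3f}")
```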

