Big Five Personality Prediction from Social Media Data using Machine Learning Techniques

Personality is important in many types of cooperation; it is useful in predicting job achievement, professional and emotional relationship success, and even preference for different interfaces. To accurately assess users' personalities, a personality test must be carried out. In many areas of online life, however, administering such a test is usually impractical. We used SVM classification, the Random Forest algorithm, the Naïve Bayes algorithm, and logistic regression to comparatively predict users' personalities. The main goal of the paper is to evaluate these machine learning models on four metrics (accuracy, precision, recall, and F1 score) and, based on these metrics, to select the best model for classifying the Big Five personality traits of Twitter users.
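The four evaluation metrics named in this abstract can be sketched for a single binary trait label as follows. This is a minimal illustration, not the authors' code: the function name `classification_metrics` and the toy labels are assumptions introduced here.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as the positive one (e.g. "trait present").
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: made-up labels, not data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Comparing several classifiers then amounts to calling this helper on each model's predictions over the same test labels and ranking the models by the metric of interest.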


Big five personality prediction based in Indonesian tweets using machine learning methods

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v12i2.pp1973-1981 ◽

2022 ◽

Vol 12 (2) ◽

pp. 1973

Author(s):

Warih Maharani ◽

Veronikha Effendy

Keyword(s):

Machine Learning ◽

Social Media ◽

Big Five ◽

Machine Learning Techniques ◽

Social Characteristics ◽

Support Vector ◽

Big Five Personality ◽

Semantic Features ◽

Learning Techniques ◽

Personality Prediction

The popularity of social media has drawn the attention of researchers who have conducted cross-disciplinary studies examining the relationship between personality traits and behavior on social media. Most current work focuses on personality prediction analysis of English texts, but Indonesian has received scant attention. Therefore, this research aims to predict users' personalities based on Indonesian text from social media using machine learning techniques. This paper evaluates several machine learning techniques, including naive Bayes (NB), K-nearest neighbors (KNN), and support vector machine (SVM), based on semantic features including emotion, sentiment, and publicly available Twitter profile. We predict the personality based on the Big Five personality model, the most appropriate model for predicting user personality in social media. We examine the relationships between the semantic features and the Big Five personality dimensions. The experimental results indicate that the Big Five personality traits exhibit distinct emotional, sentimental, and social characteristics and that SVM outperformed NB and KNN for Indonesian. In addition, we observe several terms in Indonesian that specifically refer to each personality type, each of which has distinct emotional, sentimental, and social features.


Sentiment Analysis in Social Media using Machine Learning Techniques

Iraqi Journal of Science ◽

10.24996/ijs.2020.61.1.22 ◽

2020 ◽

pp. 193-201 ◽

Cited By ~ 1

Author(s):

Hayder A. Alatabi ◽

Ayad R. Abbas

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Machine Learning Techniques ◽

Great Success ◽

Social Media Data ◽

Learning Techniques ◽

The World ◽

Analysis System ◽

Media Data

In recent years, social media has achieved widespread use worldwide; statistics indicate that more than three billion people are on social media, leading to large quantities of data online. To analyze these large quantities of data, a special classification method known as sentiment analysis is used. This paper presents a new sentiment analysis system based on machine learning techniques, which aims to create a process to extract the polarity from social media texts. By using machine learning techniques, sentiment analysis has achieved great success around the world. This paper investigates this topic and proposes a sentiment analysis system built on the Bayesian Rough Decision Tree (BRDT) algorithm. The experimental results show the success of this system, with an accuracy of more than 95% on social media data.


Leveraging Machine Learning in Financial Fraud Forensics in the Age of Cybersecurity

10.4018/978-1-7998-8386-9.ch010 ◽

2022 ◽

pp. 220-249

Author(s):

Md Ariful Haque ◽

Sachin Shetty

Keyword(s):

Machine Learning ◽

Financial Institutions ◽

Machine Learning Techniques ◽

Cyber Attack ◽

Financial Industry ◽

Security Breaches ◽

Financial Gain ◽

Learning Techniques ◽

Machine Learning Model ◽

Machine Learning Models

Financial sectors are lucrative cyber-attack targets because of their immediate financial gain. As a result, financial institutions face challenges in developing systems that can automatically identify security breaches and separate fraudulent transactions from legitimate transactions. Today, organizations widely use machine learning techniques to identify any fraudulent behavior in customers' transactions. However, applying machine learning techniques is often challenging because financial institutions' confidentiality policies prevent the sharing of customer transaction data. This chapter discusses some crucial challenges of handling cybersecurity and fraud in the financial industry and building machine learning-based models to address those challenges. The authors utilize an open-source e-commerce transaction dataset to illustrate the forensic processes by creating a machine learning model to classify fraudulent transactions. Overall, the chapter focuses on how the machine learning models can help detect and prevent fraudulent activities in the financial sector in the age of cybersecurity.


Machine Learning Generalisation across Different 3D Architectural Heritage

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9060379 ◽

2020 ◽

Vol 9 (6) ◽

pp. 379 ◽

Cited By ~ 4

Author(s):

Eleonora Grilli ◽

Fabio Remondino

Keyword(s):

Machine Learning ◽

Point Cloud ◽

Machine Learning Techniques ◽

Training Dataset ◽

High Complexity ◽

Architectural Heritage ◽

Learning Techniques ◽

Machine Learning Model ◽

Point Cloud Classification

The use of machine learning techniques for point cloud classification has been investigated extensively in the last decade in the geospatial community, while in the cultural heritage field it has only recently started to be explored. The high complexity and heterogeneity of 3D heritage data, the diversity of the possible scenarios, and the different classification purposes that each case study might present make it difficult to realise a large training dataset for learning purposes. An important practical issue that has not been explored yet is the application of a single machine learning model across large and different architectural datasets. This paper tackles this issue by presenting a methodology able to successfully generalise a random forest model trained on a specific dataset to unseen scenarios. This is achieved by looking for the best features suitable to identify the classes of interest (e.g., wall, windows, roof and columns).


Analysis of Naïve Bayes Algorithm for Email Spam Filtering

International Journal for Modern Trends in Science and Technology - RTT2020 ◽

10.46501/ijmtst0701002 ◽

2021 ◽

Vol 7 (01) ◽

pp. 5-9

Author(s):

RajKishore Sahni

Keyword(s):

Machine Learning ◽

Service Providers ◽

Machine Learning Techniques ◽

Research Trend ◽

Learning Approaches ◽

Spam Filtering ◽

Internet Service ◽

Learning Techniques ◽

Bayes Algorithm ◽

Email Spam

The upsurge in the volume of unwanted emails called spam has created an intense need for the development of more dependable and robust antispam filters. Machine learning methods have recently been used to successfully detect and filter spam emails. We present a systematic review of some of the popular machine learning based email spam filtering approaches. Our review covers a survey of the important concepts, attempts, efficiency, and the research trend in spam filtering. The preliminary discussion in the study background examines the applications of machine learning techniques to the email spam filtering process of the leading internet service providers (ISPs), such as the Gmail, Yahoo and Outlook spam filters. The general email spam filtering process and the various efforts by different researchers to combat spam through the use of machine learning techniques are discussed. Our review compares the strengths and drawbacks of existing machine learning approaches and the open research problems in spam filtering. We recommend deep learning and deep adversarial learning as the future techniques that can effectively handle the menace of spam emails.


Classification Framework for Healthy Hairs and Alopecia Areata: A Machine Learning (ML) Approach

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/1102083 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Choudhary Sobhan Shakeel ◽

Saad Jawaid Khan ◽

Beenish Chaudhry ◽

Syeda Fatima Aijaz ◽

Umer Hassan

Keyword(s):

Machine Learning ◽

Alopecia Areata ◽

Nearest Neighbor ◽

Autoimmune Disorder ◽

Machine Learning Techniques ◽

Support Vector ◽

Classification Framework ◽

Learning Techniques ◽

Machine Learning Model ◽

Image Set

Alopecia areata is defined as an autoimmune disorder that results in hair loss. The latest worldwide statistics have exhibited that alopecia areata has a prevalence of 1 in 1000 and has an incidence of 2%. Machine learning techniques have demonstrated potential in different areas of dermatology and may play a significant role in classifying alopecia areata for better prediction and diagnosis. We propose a framework pertaining to the classification of healthy hairs and alopecia areata. We used 200 images of healthy hairs from the Figaro1k dataset and 68 hair images of alopecia areata from the Dermnet dataset to undergo image preprocessing including enhancement and segmentation. This was followed by feature extraction including texture, shape, and color. Two classification techniques, i.e., support vector machine (SVM) and k-nearest neighbor (KNN), are then applied to train a machine learning model with 70% of the images. The remaining image set was used for the testing phase. With a 10-fold cross-validation, the reported accuracies of SVM and KNN are 91.4% and 88.9%, respectively. Paired sample t-test showed significant differences between the two accuracies with p < 0.001. SVM generated higher accuracy (91.4%) as compared to KNN (88.9%). The findings of our study demonstrate potential for better prediction in the field of dermatology.
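The evaluation protocol mentioned here, k-fold cross-validation of a k-nearest-neighbor classifier, can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the helper names `knn_predict` and `k_fold_accuracy`, the interleaved fold assignment, and the two-dimensional toy feature vectors are all hypothetical and stand in for the texture, shape, and color features used in the paper.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(X, y, folds=10, k=3):
    """Mean accuracy over `folds` folds; sample i is assigned to fold i % folds."""
    n = len(X)
    accuracies = []
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        if not test_idx:
            continue  # more folds than samples
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Toy data: two well-separated clusters (made-up values, not image features).
X = [(i * 0.1, 0.0) for i in range(10)] + [(10 + i * 0.1, 10.0) for i in range(10)]
y = [0] * 10 + [1] * 10
print(k_fold_accuracy(X, y, folds=10, k=3))
```

Running the same folds with a second classifier would give paired per-fold accuracies, which is the kind of input a paired-sample t-test compares.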


Machine learning techniques to predict daily rainfall amount

Journal Of Big Data ◽

10.1186/s40537-021-00545-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Chalachew Muluken Liyew ◽

Haileyesus Amsaya Melese

Keyword(s):

Machine Learning ◽

Pearson Correlation ◽

Daily Rainfall ◽

Learning Model ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Correlation Technique ◽

Learning Techniques ◽

Machine Learning Model ◽

Extreme Gradient Boosting

Predicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several studies have been conducted using data mining and machine learning techniques on different countries' environmental datasets. An erratic rainfall distribution in the country affects the agriculture on which the country's economy depends. Wise use of rainwater should be planned and practiced in the country to minimize the problems of drought and flooding. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables, which were used as input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia, to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error and mean absolute error were used to measure the performance of the machine learning models. The results of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than the others.
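The two technical steps named in this abstract, Pearson-correlation feature selection and evaluation by RMSE and MAE, can be sketched as follows. This is a minimal sketch, not the study's implementation: the function names, the feature names, and the 0.2 correlation threshold are assumptions chosen for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series has no defined correlation; treat as 0
    return cov / (sx * sy)


def select_features(features, target, threshold=0.2):
    """Keep features whose absolute correlation with the target meets the
    threshold. The default threshold is an assumed value, not from the paper."""
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example with made-up values (not the Bahir Dar dataset).
rainfall = [2.0, 4.0, 6.0, 8.0]
features = {"humidity": [1.0, 2.0, 3.0, 4.0], "noise": [1.0, -1.0, -1.0, 1.0]}
print(select_features(features, rainfall))
print(rmse([1, 2, 3], [1, 2, 5]), mae([1, 2, 3], [1, 2, 5]))
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a fuller picture of how a regression model's errors are distributed.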


Identification and Implementation Perspective on the Stratification Algorithms in the Prognostication of Heart Disease using Machine Learning Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8663.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 1223-1225

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Heart Diseases ◽

Classification Algorithm ◽

Machine Learning Techniques ◽

Data Set ◽

Learning Techniques ◽

Machine Learning Model ◽

Combination Of Classifiers ◽

Great Idea

Analyzing patient data with classifiers is a good way to obtain accurate results. A combination of classifiers gives a more accurate result than a single classifier, because a single classifier does not always give accurate results, only appropriate ones. The aim is to predict the outcome feature of the data set. The "outcome" can take only two values, 0 and 1: 0 means the patient does not have heart disease and 1 means the patient has heart disease. So, there is a need to build a classification algorithm that can predict the outcome feature of the test dataset with good accuracy. For this, understanding the data is important, and then various classification algorithms can be tested. The model that gives the highest accuracy among them can then be selected. The built model can then be given to the software developer to build the end-user application, which uses the selected machine learning model to predict heart disease in a patient.


Clinical Data Analysis for Prediction of Cardiovascular Disease Using Machine Learning Techniques

Computational Intelligence and Neuroscience ◽

10.1155/2022/2973324 ◽

2022 ◽

Vol 2022 ◽

pp. 1-13

Author(s):

Rajkumar Gangappa Nadakinamani ◽

A. Reyana ◽

Sandeep Kautish ◽

A. S. Vibith ◽

Yogita Gupta ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Cardiac Risk ◽

Machine Learning Algorithms ◽

Random Tree ◽

Machine Learning Techniques ◽

Learning Technology ◽

Tree Model ◽

Learning Techniques ◽

Machine Learning Model

Cardiovascular disease is difficult to detect due to several risk factors, including high blood pressure, cholesterol, and an abnormal pulse rate. Accurate decision-making and optimal treatment are required to address cardiac risk. As machine learning technology advances, the healthcare industry's clinical practice is likely to change. As a result, researchers and clinicians must recognize the importance of machine learning techniques. The main objective of this research is to recommend a machine learning-based cardiovascular disease prediction system (CDPS) that is highly accurate. Modern machine learning algorithms such as REP Tree, M5P Tree, Random Tree, Linear Regression, Naive Bayes, J48, and JRIP are used to classify popular cardiovascular datasets. The proposed CDPS's performance was evaluated using a variety of metrics to identify the most suitable machine learning model. In predicting cardiovascular disease patients, the Random Tree model performed best, with the highest accuracy of 100%, the lowest MAE of 0.0011, the lowest RMSE of 0.0231, and the fastest prediction time of 0.01 seconds.


Machine learning distributions of quantum ansatz with hierarchical structure

International Journal of Modern Physics B ◽

10.1142/s0217979220501969 ◽

2020 ◽

Vol 34 (20) ◽

pp. 2050196

Author(s):

Haozhen Situ ◽

Zhimin He

Keyword(s):

Machine Learning ◽

Quantum Systems ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Machine Learning Model ◽

Variational Autoencoder ◽

Systems Learning ◽

Near Term ◽

Learning Measurement

Machine learning techniques can help to represent and solve quantum systems. Learning the measurement outcome distribution of a quantum ansatz is useful for characterizing near-term quantum computing devices. In this work, we use the popular unsupervised machine learning model, the variational autoencoder (VAE), to reconstruct the measurement outcome distribution of a quantum ansatz. The number of parameters in the VAE is compared with the number of measurement outcomes. The numerical results show that the VAE can efficiently learn the measurement outcome distribution with few parameters. The influence of entanglement on the task is also revealed.
