Machine learning models for bank reviews classification

A corpus of reviews of banking products and services is analyzed in order to build several text classification models. The paper explores different approaches to the processing of unstructured textual information. Based on the selected approaches, the corpus of reviews of banking products and services received during the COVID-19 pandemic is analyzed. An automatic parser of Internet resources has been developed to obtain the required training sample. Software has been developed that implements basic methods for constructing classification models. The resulting models can be used to create a system for monitoring people’s attitudes to banking processes.

Multi-Class Text Classification Using Machine Learning Models for Online Drug Reviews

2021 IEEE World AI IoT Congress (AIIoT) ◽

10.1109/aiiot52608.2021.9454250 ◽

2021 ◽

Author(s):

Shreehar Joshi ◽

Eman Abdelfattah

Keyword(s):

Machine Learning ◽

Text Classification ◽

Learning Models ◽

Machine Learning Models

Online learning behavior analysis based on machine learning

Asian Association of Open Universities Journal ◽

10.1108/aaouj-08-2019-0029 ◽

2019 ◽

Vol 14 (2) ◽

pp. 97-106

Author(s):

Ning Yan ◽

Oliver Tat-Sheung Au

Keyword(s):

Machine Learning ◽

Online Learning ◽

Correlation Analysis ◽

Prediction Accuracy ◽

Classification Models ◽

Limited Data ◽

Learning Models ◽

Learning Behavior ◽

Content Type ◽

Machine Learning Models

Purpose: The purpose of this paper is to make a correlation analysis between students’ online learning behavior features and course grade, and to attempt to build an effective prediction model based on limited data.

Design/methodology/approach: The prediction label in this paper is the course grade of students, and the eigenvalues available are student age, student gender, connection time, hits count and days of access. The machine learning model used in this paper is a classical three-layer feedforward neural network, trained with the scaled conjugate gradient algorithm. The Pearson correlation analysis method is used to find the relationships between course grade and the student eigenvalues.

Findings: Days of access has the highest correlation with course grade, followed by hits count, and connection time is less relevant to students’ course grade. Student age and gender have the lowest correlation with course grade. Binary classification models have much higher prediction accuracy than multi-class classification models. Data normalization and data discretization can effectively improve the prediction accuracy of machine learning models, such as the ANN model in this paper.

Originality/value: This paper may help teachers to find clues for identifying students with learning difficulties in advance and to give timely help through the online learning behavior data. It shows that acceptable prediction models based on machine learning can be built using a small and limited data set. However, introducing external data into machine learning models to improve their prediction accuracy is still a valuable and hard problem.
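
The Pearson correlation analysis and the data normalization mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `pearson` and `min_max_normalize` and the sample values are assumptions made purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def min_max_normalize(xs):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    if lo == hi:
        raise ValueError("cannot normalize a constant sequence")
    return [(x - lo) / (hi - lo) for x in xs]


if __name__ == "__main__":
    # Hypothetical values, not data from the paper.
    days_of_access = [3, 10, 14, 21, 30]
    course_grade = [52, 61, 70, 78, 90]
    print(pearson(days_of_access, course_grade))
    print(min_max_normalize(days_of_access))
```

Min-max scaling is only one possible normalization; the abstract does not say which scheme the authors used. Because it is a linear rescaling, it leaves the Pearson coefficient unchanged, so any accuracy gain from normalization would come from the model's training behavior, not from the correlation analysis.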

The influence of training sample size on the accuracy of deep learning models for the prediction of soil properties with near-infrared spectroscopy data

SOIL ◽

10.5194/soil-6-565-2020 ◽

2020 ◽

Vol 6 (2) ◽

pp. 565-578

Author(s):

Wartini Ng ◽

Budiman Minasny ◽

Wanderson de Sousa Mendes ◽

José Alexandre Melo Demattê

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Soil Properties ◽

Sample Size ◽

Training Sample ◽

Calibration Data ◽

Learning Models ◽

Data Set ◽

Calibration Data Set ◽

Machine Learning Models

Abstract. The number of samples used in the calibration data set affects the quality of the generated predictive models using visible, near and shortwave infrared (VIS–NIR–SWIR) spectroscopy for soil attributes. Recently, the convolutional neural network (CNN) has been regarded as a highly accurate model for predicting soil properties on a large database. However, it has not yet been ascertained how large the sample size should be for a CNN model to be effective. This paper investigates the effect of the training sample size on the accuracy of deep learning and machine learning models. It aims at providing an estimate of how many calibration samples are needed to improve the model performance of soil properties predictions with CNN as compared to conventional machine learning models. In addition, this paper also looks at a way to interpret the CNN models, which are commonly labelled as a black box. It is hypothesised that the performance of machine learning models will increase with an increasing number of training samples, but it will plateau when it reaches a certain number, while the performance of CNN will keep improving. The performances of two machine learning models (partial least squares regression – PLSR; Cubist) are compared against the CNN model. A VIS–NIR–SWIR spectra library from Brazil, containing 4251 unique sites with averages of two to three samples per depth (a total of 12 044 samples), was divided into calibration (3188 sites) and validation (1063 sites) sets. A subset of the calibration data set was then created to represent a smaller calibration data set ranging from 125, 300, 500, 1000, 1500, 2000, 2500 and 2700 unique sites, which is equivalent to a sample size of approximately 350, 840, 1400, 2800, 4200, 5600, 7000 and 7650. All three models (PLSR, Cubist and CNN) were generated for each sample size of the unique sites for the prediction of five different soil properties, i.e. cation exchange capacity, organic carbon, sand, silt and clay content. These calibration subset sampling processes and modelling were repeated 10 times to provide a better representation of the model performances. Learning curves showed that the accuracy increased with an increasing number of training samples. At a lower number of samples (< 1000), PLSR and Cubist performed better than CNN. The performance of CNN outweighed the PLSR and Cubist model at a sample size of 1500 and 1800, respectively. It can be recommended that deep learning is most efficient for spectra modelling for sample sizes above 2000. The accuracy of the PLSR and Cubist model seems to reach a plateau above sample sizes of 4200 and 5000, respectively, while the accuracy of CNN has not plateaued. A sensitivity analysis of the CNN model demonstrated its ability to determine important wavelength regions that affected the predictions of various soil attributes.
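
The repeated calibration-subset sampling described in this abstract could be organized along the following lines. This is only a sketch under stated assumptions: the abstract does not say how sites were drawn, so independent random draws of unique sites are assumed here, and the names `sample_calibration_subsets` and `rows_for_sites` are hypothetical.

```python
import random


def sample_calibration_subsets(site_ids, sizes, repeats, seed=0):
    """Draw `repeats` random subsets of unique sites for each subset size.

    Sampling is done per site (an assumption), so that all samples taken
    at one site stay together in the same subset.
    """
    rng = random.Random(seed)
    unique_sites = sorted(set(site_ids))
    subsets = {}
    for size in sizes:
        if size > len(unique_sites):
            raise ValueError(f"only {len(unique_sites)} unique sites available")
        subsets[size] = [set(rng.sample(unique_sites, size)) for _ in range(repeats)]
    return subsets


def rows_for_sites(site_ids, chosen_sites):
    """Return the row indices of every sample that belongs to a chosen site."""
    return [i for i, site in enumerate(site_ids) if site in chosen_sites]


if __name__ == "__main__":
    # Hypothetical toy data: one site label per spectrum.
    site_ids = ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s5", "s5"]
    subsets = sample_calibration_subsets(site_ids, sizes=[2, 4], repeats=10)
    for size, draws in subsets.items():
        counts = [len(rows_for_sites(site_ids, draw)) for draw in draws]
        print(size, counts)
```

Each draw would then be used to fit one model per soil property, and the validation scores averaged over the repeats give one point on the learning curve for that subset size.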

News Article Text Classification and Summary for Authors and Topics

10.5121/csit.2020.101401 ◽

2020 ◽

Author(s):

Aviel J. Stein ◽

Janith Weerasinghe ◽

Spiros Mancoridis ◽

Rachel Greenstadt

Keyword(s):

Machine Learning ◽

Random Forests ◽

Text Classification ◽

Authorship Attribution ◽

News Article ◽

Support Vector ◽

The Internet ◽

Original Text ◽

Learning Models ◽

Machine Learning Models

News articles are important for providing timely, historic information. However, the Internet is replete with text that may contain irrelevant or unhelpful information, so means of processing it and distilling its content are important and useful to human readers as well as to information extraction tools. Some common questions we may want to answer are “what is this article about?” and “who wrote it?”. In this work we compare machine learning models on two common NLP tasks, topic and authorship attribution, using the 2017 Vox Media dataset. Additionally, we use the models to classify on a subsection, about 20%, of the original text, which proves to be better for classification than the provided blurbs. Because of the large number of topics, we take into account topic overlap and address it via top-n accuracy and hierarchical groupings of topics. We also consider edge cases in authorship by classifying on inter-topic and intra-topic author distributions. Our results show that both topics and authors are readily identifiable, and that neural networks consistently perform best, ahead of support vector, random forest, and naive Bayes classifiers, although the latter methods perform acceptably.
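
Top-n accuracy, which this abstract uses to handle overlapping topics, counts a prediction as correct when the true label is among the n highest-scoring labels. A minimal sketch follows; the function name `top_n_accuracy`, the label names, and the scores are illustrative assumptions and do not come from the paper.

```python
def top_n_accuracy(score_rows, true_labels, n):
    """Fraction of items whose true label is among the n best-scored labels.

    score_rows: one dict per item, mapping label -> classifier score.
    true_labels: the correct label for each item, in the same order.
    """
    if len(score_rows) != len(true_labels) or not true_labels:
        raise ValueError("need one non-empty score row per true label")
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        # Rank labels from highest to lowest score and keep the first n.
        top_labels = sorted(scores, key=scores.get, reverse=True)[:n]
        if truth in top_labels:
            hits += 1
    return hits / len(true_labels)


if __name__ == "__main__":
    # Hypothetical scores for three articles over three topics.
    score_rows = [
        {"politics": 0.6, "tech": 0.3, "culture": 0.1},
        {"politics": 0.2, "tech": 0.3, "culture": 0.5},
        {"politics": 0.5, "tech": 0.4, "culture": 0.1},
    ]
    true_labels = ["politics", "tech", "culture"]
    for n in (1, 2, 3):
        print(n, top_n_accuracy(score_rows, true_labels, n))
```

With n = 1 this reduces to ordinary accuracy; larger n gives credit when a closely related topic outranks the labelled one, which is the situation topic overlap creates.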

Classification Models for Bank Marketing Campaign: Towards Smart Bank Marketing

American Journal of Business and Operations Research ◽

10.54216/ajbor.050102 ◽

2021 ◽

pp. 21-30

Author(s):

Ahmad Freij ◽

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Linear Regression ◽

Support Vector ◽

Classification Models ◽

Learning Models ◽

Marketing Campaign ◽

Bank Marketing ◽

Machine Learning Models

In this paper, we propose two models for marketing classification, Support Vector Machine (SVM) and linear regression; these are the most popular and useful classification models. We show how the two models are used in a case study of a bank marketing campaign, with a dataset related to that campaign. The RapidMiner software was used to apply the machine learning classification models.

A comparative analysis of machine learning models for quality pillar assessment of SaaS services by multi-class text classification of users’ reviews

Future Generation Computer Systems ◽

10.1016/j.future.2019.06.022 ◽

2019 ◽

Vol 101 ◽

pp. 341-371 ◽

Cited By ~ 6

Author(s):

Muhammad Raza ◽

Farookh Khadeer Hussain ◽

Omar Khadeer Hussain ◽

Ming Zhao ◽

Zia ur Rehman

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Text Classification ◽

Learning Models ◽

Machine Learning Models

Text classification using Fuzzy TF-IDF and Machine Learning Models

Proceedings of the 4th International Conference on Big Data and Internet of Things ◽

10.1145/3372938.3372956 ◽

2019 ◽

Author(s):

Mariem Bounabi ◽

Karim El Moutaouakil ◽

Khalid Satori

Keyword(s):

Machine Learning ◽

Text Classification ◽

Learning Models ◽

Machine Learning Models

Prospects for Generative - Adversarial Networks in Network Traffic Classification Tasks

Journal of Physics Conference Series ◽

10.1088/1742-6596/2096/1/012174 ◽

2021 ◽

Vol 2096 (1) ◽

pp. 012174

Author(s):

G D Asyaev

Keyword(s):

Machine Learning ◽

Network Traffic ◽

Training Sample ◽

Generative Adversarial Networks ◽

Traffic Classification ◽

Learning Models ◽

Generative Adversarial Network ◽

Adversarial Networks ◽

Network Traffic Classification ◽

Machine Learning Models

Abstract. The paper presents an approach for enlarging the training sample and reducing class imbalance in traffic classification problems. The basic principles and architecture of generative adversarial networks are considered. The mathematical model of network traffic classification is described. The training sample taken to solve the problem has been analyzed. The data preprocessing is carried out and justified. An architecture of the generative adversarial network is constructed and an algorithm for generating new features is developed. Machine learning models for the traffic classification problem were considered and built: logistic regression, k-nearest neighbors, decision tree, and random forest. A comparative analysis of the results of machine learning models without and with the generation of new features is conducted. The obtained results can be applied both in the tasks of network traffic classification, and in general cases of multiclass classification and exclusion of unbalanced features.

Multi Faceted Text Classification using Supervised Machine Learning Models

10.31979/etd.7crd-u5pw ◽

2016 ◽

Author(s):

Abhiteja Gajjala

Keyword(s):

Machine Learning ◽

Text Classification ◽

Supervised Machine Learning ◽

Learning Models ◽

Machine Learning Models

Improving XGBoost with Imagination Sampling

Communications of the Blyth Institute ◽

10.33014/issn.2640-5652.2.1.holloway.1 ◽

2020 ◽

Vol 2 (1) ◽

pp. 3-6

Author(s):

Eric Holloway

Keyword(s):

Machine Learning ◽

General System ◽

Learning Models ◽

Starting Point ◽

Machine Learning Models

Imagination Sampling is the usage of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling for obtaining multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.
