Data Mining Applied to Transportation Mode Classification Problem

Author(s): Andrea Vassilev
Author(s): Balazs Feil, Janos Abonyi

This chapter aims to give a comprehensive view of the links between fuzzy logic and data mining. It shows that knowledge extracted from simple data sets or huge databases can be represented by fuzzy rule-based expert systems. Both the performance and the interpretability of the mined fuzzy models are of major importance, so effort is required to keep the resulting rule bases small and comprehensible. For this reason, soft-computing-based data mining algorithms have been developed in recent years for feature selection, feature extraction, model optimization, and model reduction (rule-base simplification). The application of these techniques is illustrated on the wine data classification problem. The results show that fuzzy tools can be applied synergistically throughout the nine steps of knowledge discovery.
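To make the idea of a fuzzy rule-based classifier concrete, here is a minimal sketch of one generic design: triangular membership functions, the minimum t-norm for AND, and single-winner inference. It is not the chapter's model and uses no wine data; the feature names `x1`/`x2`, the class labels, and the triangle parameters are all hypothetical.

```python
from dataclasses import dataclass


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside the open interval (a, c), 1 at the peak b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


@dataclass
class FuzzyRule:
    # Hypothetical rule "IF x1 is A1 AND x2 is A2 THEN class = label",
    # with each fuzzy set given as a triangle (a, b, c).
    antecedents: dict[str, tuple[float, float, float]]
    label: str

    def strength(self, sample: dict[str, float]) -> float:
        # Minimum t-norm: the rule fires only as strongly as its weakest condition.
        return min(triangular(sample[name], *abc) for name, abc in self.antecedents.items())


def classify(rules: list[FuzzyRule], sample: dict[str, float]) -> str:
    # Single-winner inference: the rule with the highest firing strength decides the class.
    return max(rules, key=lambda rule: rule.strength(sample)).label


# Two hypothetical rules over two hypothetical features.
rules = [
    FuzzyRule({"x1": (0.0, 2.0, 4.0), "x2": (0.0, 2.0, 4.0)}, "class_A"),
    FuzzyRule({"x1": (3.0, 5.0, 7.0), "x2": (3.0, 5.0, 7.0)}, "class_B"),
]
print(classify(rules, {"x1": 1.5, "x2": 2.5}))  # class_A (strengths 0.75 vs 0.0)
```

Each rule reads as one human-readable IF-THEN statement, so a smaller rule base directly means a shorter, more comprehensible model, which is the interpretability concern the abstract raises.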


Author(s): Hualin Wang, Xiaogang Su

This chapter presents an award-winning algorithm from the PAKDD 2007 data mining competition, in which the goal was to help a financial company predict the likelihood that its credit card customers would take up a home loan. The available data are very limited and characterized by a very low buying rate. To tackle this unbalanced classification problem, the authors apply a bagging algorithm based on ensembles of probit models. An integral element of the algorithm is a special way of resampling when forming the bootstrap samples, for which a brief justification is provided. The method offers a feasible and robust way to solve this difficult yet very common business problem.
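The abstract does not spell out the special resampling scheme, so the sketch below assumes a stratified bootstrap: each class is resampled separately with replacement, so every bootstrap sample keeps the original (very small) number of positive cases. The plain gradient-ascent probit fit, the function names, and the toy data are illustrative assumptions, not the authors' algorithm.

```python
import math
import random


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fit_probit(X, y, lr=0.1, epochs=500):
    """Fit P(y=1 | x) = Phi(b + w.x) by batch gradient ascent on the mean log-likelihood."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = min(max(norm_cdf(z), 1e-12), 1.0 - 1e-12)  # clamp to avoid division by zero
            # Derivative of y*log(p) + (1 - y)*log(1 - p) with respect to z.
            g = norm_pdf(z) * (yi - p) / (p * (1.0 - p))
            gb += g
            for j in range(d):
                gw[j] += g * xi[j]
        b += lr * gb / n
        w = [wj + lr * gj / n for wj, gj in zip(w, gw)]
    return w, b


def stratified_bootstrap(X, y, rng):
    """ASSUMED resampling scheme: draw with replacement within each class,
    so every bootstrap sample keeps the original number of rare positives."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    idx = [rng.choice(members) for members in by_class.values() for _ in members]
    return [X[i] for i in idx], [y[i] for i in idx]


def bagged_probit(X, y, n_models=25, seed=0):
    """Bagging: fit one probit model per bootstrap sample, then average their probabilities."""
    rng = random.Random(seed)
    models = [fit_probit(*stratified_bootstrap(X, y, rng)) for _ in range(n_models)]

    def predict_proba(x):
        return sum(norm_cdf(b + sum(wj * xj for wj, xj in zip(w, x))) for w, b in models) / len(models)

    return predict_proba


# Toy, made-up data: one centred feature, 4 positives out of 40 (a 10% "buying rate").
X = [[(i - 20) / 10] for i in range(40)]
y = [1 if i >= 36 else 0 for i in range(40)]
predict = bagged_probit(X, y, n_models=10, seed=1)
print(predict([1.9]), predict([-2.0]))
```

Averaging member probabilities (rather than majority voting) keeps a graded take-up score that can be used to rank customers; whether the original algorithm averages or votes is not stated in the abstract.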


2021
Author(s): Yida Zhu, Haiyong Luo, Song Guo, Fang Zhao

Author(s): Johannes Gehrke

The goal of classification and regression is to build a data mining model that can be used for prediction. To construct such a model, we are given a set of training records, each having several attributes. These attributes can be either numerical (for example, age or salary) or categorical (for example, profession or gender). One distinguished attribute is the dependent attribute; the other attributes are called predictor attributes. If the dependent attribute is categorical, the problem is a classification problem; if it is numerical, the problem is a regression problem. The constructed model predicts the value of the dependent attribute for a record in which that value is unknown (we call such a record an unlabeled record). Classification and regression have a wide range of applications, including scientific experiments, medical diagnosis, fraud detection, credit approval, and target marketing (Hand, 1997). Many classification and regression models have been proposed in the literature; among the more popular are neural networks, genetic algorithms, Bayesian methods, linear and log-linear models and other statistical methods, decision tables, and tree-structured models, which are the focus of this chapter (Breiman, Friedman, Olshen, & Stone, 1984). Tree-structured models, so-called decision trees, are easy to understand; they are non-parametric and thus do not rely on assumptions about the data distribution; and they have fast construction methods even for large training datasets (Lim, Loh, & Shih, 2000). Most data mining suites include tools for constructing classification and regression trees (Goebel & Gruenwald, 1999).
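The core step in growing a classification tree is choosing a split on a predictor attribute. A minimal sketch, assuming one numerical predictor and the Gini impurity criterion used by CART (Breiman et al., 1984), might look like this; the naive search over all thresholds, the attribute `age`, and the labels are illustrative only.

```python
from collections import Counter


def gini(labels) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold t minimising the weighted Gini impurity of the
    binary split 'value <= t' vs 'value > t' on one numerical predictor.
    Returns (t, impurity); t is None if no split improves on the parent node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = (pairs[i - 1][0] + pairs[i][0]) / 2, score
    return best_t, best_score


# Hypothetical training records: predictor "age", categorical dependent attribute.
ages = [25, 32, 47, 51, 62]
labels = ["no", "no", "yes", "yes", "yes"]
print(best_split(ages, labels))  # (39.5, 0.0)
```

A full tree learner would run this search over every predictor attribute, split on the best one, and recurse on both children. For a regression tree (numerical dependent attribute), the impurity would typically be replaced by the variance of the dependent attribute.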


2014, Vol 571-572, pp. 237-240
Author(s): Jing Ya Lu, Wan Li Zuo, Liang Zhu

Mining newsworthy events from the large volume of microblog posts is not only a primary problem that several large microblogging websites need to solve but also a new research field in the micro-information age. Much work on event recognition has been done both domestically and internationally, but relatively little of it targets short text such as microblog messages. This paper treats newsworthy event recognition in short text as a classification problem: it applies the decision tree classification algorithm from data mining, thoroughly mines the features of events in short text, and then recognizes newsworthy events in microblogs. Finally, we verify the effectiveness of the model.
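A decision tree needs each message turned into a fixed set of features. The abstract does not list the paper's features, so the sketch below assumes a handful of simple, hypothetical surface features; the keyword list and feature names are illustrative only. It also assumes space- and punctuation-delimited text: Python's `\w` does match CJK characters, but an unsegmented Chinese sentence would come out as a single token, so a word segmenter would be needed first.

```python
import re

# HYPOTHETICAL feature set: the abstract does not list the paper's actual features.
EVENT_KEYWORDS = {"breaking", "earthquake", "explosion", "fire", "flood", "accident"}
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")  # words, #hashtags and @mentions


def short_text_features(message: str) -> dict[str, float]:
    """Map one microblog message to a fixed set of numeric features."""
    urls = URL_RE.findall(message)
    text = URL_RE.sub(" ", message).lower()  # drop URLs so they do not add junk tokens
    tokens = TOKEN_RE.findall(text)
    words = [t for t in tokens if t[0] not in "#@"]
    return {
        "n_tokens": float(len(tokens)),
        "n_event_keywords": float(sum(w in EVENT_KEYWORDS for w in words)),
        "n_hashtags": float(sum(t.startswith("#") for t in tokens)),
        "n_mentions": float(sum(t.startswith("@") for t in tokens)),
        "n_urls": float(len(urls)),
        "has_number": float(any(ch.isdigit() for ch in text)),
    }


features = short_text_features("BREAKING: fire near #downtown, 3 injured @cityNews https://t.co/x1")
print(features)
```

One such vector per labeled message (newsworthy or not) is the kind of training record a decision tree learner would consume.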


2022, Vol 2022, pp. 1-17
Author(s): Rukhma Qasim, Waqas Haider Bangyal, Mohammed A. Alqarni, Abdulwahab Ali Almazroi

The text classification problem has been thoroughly studied in information retrieval and data mining. It is beneficial in many tasks, including medical diagnosis in health care, targeted marketing, the entertainment industry, and group filtering. Recent advances in both data mining and natural language processing (NLP) have drawn researchers from all over the world to develop automated systems for text classification; NLP makes it possible to categorize documents containing different kinds of text. Social media users generate a huge amount of data on social media sites. Three datasets are used in the experiments: the COVID-19 fake news dataset, the COVID-19 English tweet dataset, and the extremist/non-extremist dataset, which contain news blogs, posts, and tweets related to the coronavirus and to hate speech. Transfer learning approaches had not previously been evaluated on the COVID-19 fake news and extremist/non-extremist datasets, so this work applies transfer learning classification models to both datasets to assess their performance. The models are trained and evaluated using accuracy, precision, recall, and F1-score, and a heat map is generated for each model. Finally, future directions are proposed.
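Since the models are compared on accuracy, precision, recall, and F1-score, a small sketch of how these are computed for a binary task (for example, fake vs. real news) may help. The label encoding (1 = positive class) and the example predictions are hypothetical, and the convention of returning 0.0 when a denominator is zero is an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F1 for the chosen positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0  # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = fake news (positive class), 0 = real news.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

For multi-class data, the per-class scores would typically be averaged (macro or weighted); the abstract does not say which averaging was used.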

