Detecting E-Commerce Phishing Website By Data Mining

We will use data mining techniques and machine learning to generate a classification model from a training dataset. This model will then be applied to a testing dataset to label websites as malicious or legitimate. We will compare the accuracy and running time of the two generated models and conclude which of the two is better.
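
The comparison described above can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not name the two models, so a length-threshold rule and a 1-nearest-neighbour classifier stand in for them, and the features (URL length, number of dots, HTTPS flag) and all data values are hypothetical.

```python
import time

# Hypothetical toy data: (url_length, num_dots, has_https) -> label, 1 = phishing.
TRAIN = [
    ((75, 5, 0), 1), ((82, 6, 0), 1), ((64, 4, 0), 1), ((90, 7, 1), 1),
    ((22, 1, 1), 0), ((30, 2, 1), 0), ((18, 1, 1), 0), ((27, 2, 0), 0),
]
TEST = [((70, 5, 0), 1), ((25, 1, 1), 0), ((85, 6, 1), 1), ((20, 2, 1), 0)]


def train_threshold(rows):
    """Stand-in model A: a cut-off on URL length (midpoint of the class means)."""
    pos = [x[0] for x, y in rows if y == 1]
    neg = [x[0] for x, y in rows if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] > cut else 0


def train_nearest_neighbour(rows):
    """Stand-in model B: 1-nearest-neighbour on squared Euclidean distance."""
    def predict(x):
        nearest = min(rows, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))
        return nearest[1]
    return predict


def evaluate(train_fn, train_rows, test_rows):
    """Return (accuracy, elapsed_seconds) covering training plus prediction."""
    start = time.perf_counter()
    model = train_fn(train_rows)
    predictions = [model(x) for x, _ in test_rows]
    elapsed = time.perf_counter() - start
    correct = sum(p == y for p, (_, y) in zip(predictions, test_rows))
    return correct / len(test_rows), elapsed


results = {
    "threshold": evaluate(train_threshold, TRAIN, TEST),
    "1-NN": evaluate(train_nearest_neighbour, TRAIN, TEST),
}
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.6f}s")
```

On data this small the timings are dominated by noise; a real comparison would repeat the measurement over a much larger dataset.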

PREDICTION OF CREDIT CARD PAYMENT NEXT MONTH THROUGH TREE NET DATA MINING TECHNIQUES

International Journal of Computing ◽

10.47839/ijc.19.1.1698 ◽

2020 ◽

pp. 97-105

Author(s):

Ahmed Mohammed Hussein ◽

Hadeel Qasem Gheni ◽

Wed Kadhim Oleiwi ◽

Zahraa Yaseen Hasan

Keyword(s):

Data Mining ◽

Credit Card ◽

Confusion Matrix ◽

True Positive Rate ◽

Training Dataset ◽

Data Mining Techniques ◽

Testing Dataset ◽

Positive Rate ◽

Optimal Tree ◽

Card Payment

A number of research initiatives have recently been launched around the world regarding the conceptualization, specification, design and development principles of the future use of credit cards, including storing secret information on them, although most of the time we use them for online payment. In addition, if the card has enough money, we can pay for what we need at any time. Therefore, the goal of this proposed research is to use data mining techniques to predict credit card payment next month. Our proposed system contains five steps: (a) find a suitable database on the internet, because such a database is not available in Iraq; (b) pre-process the credit card database based on the Pearson correlation matrix to determine which features are less correlated with the others, then remove them to reduce prediction time; (c) split the pre-processed database into two parts, a training dataset and a testing dataset; (d) apply TreeNet prediction data mining techniques (TPDMT) to the training dataset to predict whether a payment is needed next month, and find the optimal tree. TreeNet is based on a boosting machine, which usually uses Decision Trees (DTs) as its predictors; (e) finally, pass the testing dataset through the optimal tree resulting from TPDMT, then use the five measures related to the confusion matrix to evaluate the results, including “Accuracy (AC), recall or true positive rate (TP), precision (P), F-measure (considers both precision and recall) and Fb”.
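
The five confusion-matrix measures in step (e) can be computed as in the sketch below. It assumes that "Fb" denotes the F-beta score; the choice beta = 2 and the label vectors are hypothetical and are not taken from the paper.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn, beta=2.0):
    """Accuracy, recall (true positive rate), precision, F-measure and F-beta."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0

    def f_score(b):
        # F-beta weights recall b times as much as precision; b = 1 gives the F-measure.
        denominator = b * b * precision + recall
        return (1 + b * b) * precision * recall / denominator if denominator else 0.0

    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_score(1.0),
        "f_beta": f_score(beta),
    }


# Hypothetical labels: 1 = payment next month, 0 = no payment.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
scores = metrics(*confusion_counts(actual, predicted))
print(scores)
```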

Data mining techniques with machine learning algorithm to predict patients of heart disease

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1088/1/012035 ◽

2021 ◽

Vol 1088 (1) ◽

pp. 012035

Author(s):

Mulyawan ◽

Agus Bahtiar ◽

Githera Dwilestari ◽

Fadhil Muhammad Basysyar ◽

Nana Suarna

Keyword(s):

Machine Learning ◽

Data Mining ◽

Heart Disease ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Data Mining Techniques

Data Mining-based Financial Statement Fraud Detection: Systematic Literature Review and Meta-analysis to Estimate Data Sample Mapping of Fraudulent Companies Against Non-fraudulent Companies

Global Business Review ◽

10.1177/0972150920984857 ◽

2021 ◽

pp. 097215092098485

Author(s):

Sonika Gupta ◽

Sushil Kumar Mehta

Keyword(s):

Machine Learning ◽

Data Mining ◽

Literature Review ◽

Systematic Literature Review ◽

Classification Accuracy ◽

Meta Analysis ◽

Financial Statement ◽

Research Articles ◽

Financial Statement Fraud ◽

Data Mining Techniques

Data mining techniques have proven quite effective not only in detecting financial statement frauds but also in discovering other financial crimes, such as credit card frauds, loan and security frauds, corporate frauds, bank and insurance frauds, etc. Classification of data mining techniques, in recent years, has been accepted as one of the most credible methodologies for the detection of symptoms of financial statement frauds through scanning the published financial statements of companies. The retrieved literature that has used data mining classification techniques can be broadly categorized on the basis of the type of technique applied, as statistical techniques and machine learning techniques. The biggest challenge in executing the classification process using data mining techniques lies in collecting the data sample of fraudulent companies and mapping the sample of fraudulent companies against non-fraudulent companies. In this article, a systematic literature review (SLR) of studies from the area of financial statement fraud detection has been conducted. The review has considered research articles published between 1995 and 2020. Further, a meta-analysis has been performed to establish the effect of data sample mapping of fraudulent companies against non-fraudulent companies on the classification methods through comparing the overall classification accuracy reported in the literature. The retrieved literature indicates that a fraudulent sample can either be equally paired with non-fraudulent sample (1:1 data mapping) or be unequally mapped using 1:many ratio to increase the sample size proportionally. Based on the meta-analysis of the research articles, it can be concluded that machine learning approaches, in comparison to statistical approaches, can achieve better classification accuracy, particularly when the availability of sample data is low. High classification accuracy can be obtained with even a 1:1 mapping data set using machine learning classification approaches.
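
The difference between 1:1 and 1:many sample mapping can be illustrated with a short sketch. It makes a simplifying assumption: non-fraudulent companies are drawn at random, whereas a real study would normally match on criteria such as industry, size or year. The function name `map_samples` and the company identifiers are hypothetical.

```python
import random


def map_samples(fraud, non_fraud, ratio=1, seed=0):
    """Pair each fraudulent company with `ratio` distinct non-fraudulent ones."""
    needed = len(fraud) * ratio
    if needed > len(non_fraud):
        raise ValueError("not enough non-fraudulent companies for this ratio")
    # Random matching is a simplification made for this sketch.
    pool = random.Random(seed).sample(non_fraud, needed)
    return {f: pool[i * ratio:(i + 1) * ratio] for i, f in enumerate(fraud)}


fraud = ["F1", "F2", "F3"]
non_fraud = [f"N{i}" for i in range(1, 13)]

one_to_one = map_samples(fraud, non_fraud, ratio=1)   # 3 fraud + 3 non-fraud = 6 rows
one_to_many = map_samples(fraud, non_fraud, ratio=3)  # 3 fraud + 9 non-fraud = 12 rows
print(one_to_one)
print(one_to_many)
```

A 1:many mapping enlarges the sample but also makes the classes imbalanced, which affects how an overall accuracy figure should be read.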

Hybrid classification model to detect advanced intrusions using data mining techniques

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.4.10031 ◽

2018 ◽

Vol 7 (2.4) ◽

pp. 10

Author(s):

V Mala ◽

K Meena

Keyword(s):

Data Mining ◽

Learning Algorithm ◽

Detection System ◽

Classification Model ◽

Detection Methods ◽

Data Mining Techniques ◽

Detection Systems ◽

Intruder Detection ◽

Hybrid Classification ◽

Using Data

The traditional signature-based approach fails to detect advanced malware such as Stuxnet, Flame and Duqu. Signature-based comparison and correlation are not up to the mark in detecting such attacks. Hence, it is crucial to detect these kinds of attacks as early as possible. In this research, a novel data mining based approach was applied to detect such attacks. The main innovation lies in misuse (signature) detection systems based on a supervised learning algorithm. In the learning phase, labeled examples of network packets or system calls are provided, from which the algorithm can learn about attacks; this is fast and reliable for known attacks. In order to detect advanced attacks, unsupervised learning methodologies were employed to detect the presence of zero-day or new attacks. The main objectives are to review different intruder detection methods and to study the role of data mining techniques used in intruder detection systems. A hybrid classification model is utilized to detect advanced attacks.

Emotion Detection using Social Media and Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.36117 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 4491-4494

Author(s):

Mr. Bhavar Shivam S.

Keyword(s):

Machine Learning ◽

Social Networking ◽

Social Networking Sites ◽

Training Dataset ◽

Emotion Detection ◽

Pos Tagging ◽

Testing Dataset ◽

Svm Algorithm ◽

The Right ◽

Negative Sentiment

Today we do a lot of things online, from shopping to sharing data on social networking sites (SNS). Social networking is good for releasing stress and depression by sharing one’s thoughts. Thus, emotion detection has become a hot trend today. But analyzing emotions on an SNS like Twitter is a problem: it generates lakhs of tweets each day, and it is impossible for a human being to read every tweet and decide the emotion behind it. So, to help understand the emotion behind texts on an SNS, we thought of designing a project that keeps track of tweets and predicts whether they carry a positive or a negative sentiment. This project can be achieved by integrating an SNS with NLP and machine learning. For the SNS we will use Twitter, as it generates a lot of data which is freely accessible through an API. First, we will enter a keyword and fetch tweets from Twitter. Then stop words will be removed from these tweets using the NLTK stop words database. Then the tweets will be passed for POS tagging, and only words of the right grammatical form will be kept and the others removed. Then we create a training dataset with two classes, positive and negative. Then the SVM algorithm will be trained using this training dataset. Then each tweet will be passed to the SVM as the testing dataset, which in turn will return a classification of each tweet as a whole into one of two classes, positive or negative. Thus, our application will be helpful in recognizing the emotion behind a tweet.
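
The stop-word removal step can be sketched in a few lines. This covers only that one stage: the small hard-coded `STOP_WORDS` set is a hypothetical stand-in for the NLTK stop words database, and tweet fetching, POS tagging and SVM training are left out. The sample tweet is invented.

```python
import re

# Hypothetical stand-in for the NLTK English stop-word list (far from complete).
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "and", "of", "this", "i", "so", "in", "on", "my"}


def clean_tweet(text):
    """Lower-case a tweet, drop URLs, mentions and hashtags, then remove stop words."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
    tokens = re.findall(r"[a-z']+", text)
    return [token for token in tokens if token not in STOP_WORDS]


sample = clean_tweet("I love this phone and the camera is great! http://t.co/x @shop #happy")
print(sample)
```

The remaining tokens would then go through POS filtering and be turned into feature vectors before being passed to the classifier.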

Performance Analysis and Prediction Student Performance to build effective student Using Data Mining Techniques

UHD Journal of Science and Technology ◽

10.21928/uhdjst.v3n2y2019.pp10-15 ◽

2019 ◽

Vol 3 (2) ◽

pp. 10

Author(s):

Ardalan Husin Awlla

Keyword(s):

Data Mining ◽

Performance Analysis ◽

Student Performance ◽

Fraud Detection ◽

Classification Model ◽

Data Mining Techniques ◽

Incipient Stage ◽

The Everyday ◽

Using Data

In this period of computerization, schooling has also remodeled itself and is no longer restricted to the old lecture technique. The everyday quest is on to discover better approaches to make it more successful and productive for students. These days, masses of data are gathered in educational databases; however, they stay unutilized. To get the required advantages from such large volumes of information, effective tools are required. Data mining is a developing and capable tool for analysis and prediction. It is effectively applied in the fields of fraud detection, marketing, promotion, forecasting and loan assessment. However, it is at an incipient stage in the area of education. In this paper, data mining techniques have been applied to construct a classification model to predict the performance of students.

Business Intelligence using Machine Learning and Data Mining techniques - An analysis

2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA) ◽

10.1109/iceca.2018.8474847 ◽

2018 ◽

Cited By ~ 2

Author(s):

Ruchi Sharma ◽

Pravin Srinath

Keyword(s):

Machine Learning ◽

Data Mining ◽

Business Intelligence ◽

Data Mining Techniques

Prediction of Skin Diseases Using Machine Learning

10.4018/978-1-7998-7888-9.ch008 ◽

2022 ◽

pp. 154-178

Author(s):

Siddhartha Kumar Arjaria ◽

Vikas Raj ◽

Sunil Kumar ◽

Priyanshu Shrivastava ◽

Monu Kumar ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Skin Disease ◽

Skin Diseases ◽

Information Gain ◽

Machine Learning Algorithms ◽

Ensemble Method ◽

Chi Square ◽

Data Mining Techniques ◽

Disease Rates

Skin disease rates have been increasing over the past few decades. It has led to both fatal and non-fatal disabilities all around the world, especially in those areas where medical resources are not good enough. Early diagnosis of skin diseases increases the chances of cure significantly. Therefore, this work is comparing six machine learning algorithms, namely KNN, random forest, neural network, naïve bayes, logistic regression, and SVM, for the prediction of the skin diseases. The information gain, gain ratio, gini decrease, chi-square, and relieff are used to rank the features. This work comprises the introduction, literature review, and proposed methodology parts. In this research paper, a new method of analyzing skin disease has been proposed in which six different data mining techniques are used to develop an ensemble method that integrates all the six data mining techniques as a single one. The ensemble method used on the dermatology dataset gives improved result with 94% accuracy in comparison to other classifier algorithms and hence is more effective in this area.
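
One common way to integrate several classifiers into a single ensemble is majority voting, sketched below. The abstract does not state how the six models are combined, so voting is an assumption made here, and the class labels "A", "B", "C" and all predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by majority vote."""
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the most frequent label; among equal counts,
        # the label encountered first wins.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical predictions of six classifiers for four patients.
votes = [
    ["A", "B", "C", "A"],  # KNN
    ["A", "B", "C", "B"],  # random forest
    ["A", "C", "C", "A"],  # neural network
    ["B", "B", "C", "A"],  # naive Bayes
    ["A", "B", "A", "A"],  # logistic regression
    ["A", "B", "C", "C"],  # SVM
]
ensemble = majority_vote(votes)
print(ensemble)
```

With an even number of voters, ties are possible; a weighted vote or a fixed tie-break rule would be needed to resolve them deliberately.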

A Survey on Building Recommendation Systems Using Data Mining Techniques

10.4018/978-1-7998-8413-2.ch002 ◽

2022 ◽

pp. 24-56

Author(s):

Rajab Ssemwogerere ◽

Wamwoyo Faruk ◽

Nambobi Mutwalibi

Keyword(s):

Machine Learning ◽

Data Mining ◽

Recommender Systems ◽

Performance Measures ◽

Data Mining Technique ◽

Data Mining Techniques ◽

Learning Hypothesis ◽

Depth Study ◽

And Performance ◽

Using Data

Classification is a data mining technique used to estimate the group membership of items on the basis of a common feature. This technique is useful for future planning and for discovering new knowledge about a specific dataset. An in-depth study of previous literature implementing data mining techniques in the design of recommender systems was performed. This chapter provides a broad study of how recommender systems are designed using various data mining classification techniques of machine learning, and examines their methodological decisions in four aspects: the recommendation approaches, data mining techniques, recommendation types, and performance measures. This study focused on some selected classification methods and can support both researchers and students in the field of computer science and machine learning in strengthening their knowledge about the machine learning hypothesis and data mining.

Analysis of flight delays in aviation system using different classification algorithms and feature selection methods

The Aeronautical Journal ◽

10.1017/aer.2019.72 ◽

2019 ◽

Vol 123 (1267) ◽

pp. 1415-1436 ◽

Cited By ~ 1

Author(s):

A. B. A. Anderson ◽

A. J. Sanjeev Kumar ◽

A. B. Arockia Christopher

Keyword(s):

Data Mining ◽

Feature Selection ◽

Classification Model ◽

System Level ◽

Support Vector ◽

Flight Delays ◽

Data Mining Techniques ◽

Mining Methods ◽

Artificial Neural Network Ann ◽

Aircraft System

Data mining is a process of finding correlations and collecting and analysing a huge amount of data in a database to discover patterns or relationships. Flight delay creates significant problems in the present aviation system. Data mining techniques are desired for analysing the performance in which micro-level causes propagate to make system-level patterns of delay. Analysing flight delays is very difficult – both when looking from a historical view as well as when estimating delays with forecast demand. This paper proposes using Decision Tree (DT), Support Vector Machine (SVM), Naive Bayesian (NB), K-nearest neighbour (KNN) and Artificial Neural Network (ANN) to study and analyse delays among aircrafts. The performance of different data mining methods is found in the different regions of the updated datasets on these classifiers. Finally, the result shows a significant variation in the performance of different data mining methods and feature selection for this problem. This paper aims to deal with how data mining techniques can be used to understand difficult aircraft system delays in aviation. Our aim is to develop a classification model for studying and reducing delay using different data mining methods and, in this manner, to show that DT has a greater classification accuracy. The different feature selectors are used in this study in order to reduce the number of initial attributes. Our results clearly demonstrate the value of DT for analysing and visualising how system-level effects happen from subsystem-level causes.
