Outlier data Mining of large Data Sets relying on fast decomposition simulated annealing algorithm

The exponential growth in the use of computers over networks, as well as the proliferation of applications that operate on different platforms, has drawn attention to network security. This paradigm takes advantage of security flaws in all operating systems that are both technically difficult and costly to fix. As a result, intrusion is used as a key to compromise the credibility, availability, and confidentiality of computer resources worldwide. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, the data mining principle is combined with IDS to efficiently and quickly identify important, hidden data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial of service attacks. We also develop a decision tree classifier that has a variety of parameters. The previous algorithm classified IDS data correctly up to 90% of the time and was not appropriate for large data sets. Our proposed algorithm is designed to accurately classify large data sets. In addition, we quantify several more decision tree classifier parameters.

High performance spatial data mining for very large data-sets

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '03 ◽

10.1145/781498.781509 ◽

2003 ◽

Author(s):

Baris Kazar

Keyword(s):

Data Mining ◽

Spatial Data ◽

High Performance ◽

Large Data ◽

Spatial Data Mining ◽

Large Data Sets ◽

Data Sets

From Visualisation to Data Mining with Large Data Sets

Proceedings of the 2005 Particle Accelerator Conference ◽

10.1109/pac.2005.1591735 ◽

2006 ◽

Author(s):

A. Adelmann ◽

R.D. Ryne ◽

J.M. Shalf ◽

C. Siegerist

Keyword(s):

Data Mining ◽

Large Data ◽

Large Data Sets ◽

Data Sets

Research of Improved Attribute Reduction Algorithm Based on Data Mining of Rough Set

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.644-650.2120 ◽

2014 ◽

Vol 644-650 ◽

pp. 2120-2123 ◽

Cited By ~ 2

Author(s):

De Zhi An ◽

Guang Li Wu ◽

Jun Lu

Keyword(s):

Data Mining ◽

Rough Set ◽

Rough Set Theory ◽

Attribute Reduction ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Reduction Algorithm ◽

The Core ◽

Rules Extraction

At present there are many data mining methods. This paper studies the application of rough set method in data mining, mainly on the application of attribute reduction algorithm based on rough set in the data mining rules extraction stage. Rough set in data mining is often used for reduction of knowledge, and thus for the rule extraction. Attribute reduction is one of the core research contents of rough set theory. In this paper, the traditional attribute reduction algorithm based on rough sets is studied and improved, and for large data sets of data mining, a new attribute reduction algorithm is proposed.

A dynamic K-means clustering for data mining

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i2.pp521-526 ◽

2019 ◽

Vol 13 (2) ◽

pp. 521

Author(s):

Md. Zakir Hossain ◽

Md.Nasim Akhtar ◽

R.B. Ahmad ◽

Mostafijur Rahman

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Large Data ◽

Threshold Value ◽

Specific Pattern ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Number Of Clusters ◽

Data Points

Data mining is the process of finding structure in data from large data sets. With this process, decision makers can make a particular decision for further development of real-world problems. Several data clustering techniques are used in data mining for finding a specific pattern in data. The K-means method is one of the familiar clustering techniques for clustering large data sets. The K-means clustering method partitions the data set based on the assumption that the number of clusters is fixed. The main problem of this method is that if the number of clusters chosen is small, there is a higher probability of adding dissimilar items to the same group. On the other hand, if the number of clusters chosen is high, there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-Means clustering algorithm. The proposed method performs data clustering dynamically. The proposed method initially calculates a threshold value as a centroid of K-Means, and based on this value the clusters are formed. At each iteration of K-Means, if the Euclidean distance between two points is less than or equal to the threshold value, then these two data points will be in the same group. Otherwise, the proposed method will create a new cluster with the dissimilar data point. The results show that the proposed method outperforms the original K-Means method.
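The threshold rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `threshold_cluster` is hypothetical, the threshold is assumed to be supplied by the caller (the abstract does not say how it is computed), and each point is compared against running cluster centroids rather than against every other point.

```python
import math

def threshold_cluster(points, threshold):
    """Join the nearest cluster if it lies within `threshold`, else open a new one.

    Assumption: `threshold` is given by the caller; the abstract does not
    specify how the paper derives it.
    """
    centroids = []  # running mean of each cluster
    members = []    # points assigned to each cluster
    for p in points:
        # Find the nearest existing centroid by Euclidean distance.
        best, best_d = None, None
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= threshold:
            # Close enough: join the cluster and refresh its centroid.
            members[best].append(p)
            n = len(members[best])
            centroids[best] = tuple(
                sum(q[k] for q in members[best]) / n for k in range(len(p))
            )
        else:
            # Too far from every cluster: start a new one with this point.
            centroids.append(tuple(p))
            members.append([p])
    return centroids, members

points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (0.0, 0.5)]
centroids, members = threshold_cluster(points, threshold=2.0)
print(len(centroids), [len(m) for m in members])
```

With this toy input the number of clusters is not fixed in advance; it follows from the threshold, which is the behaviour the abstract contrasts with standard K-means.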

Diversification-based learning simulated annealing algorithm for hub location problems

Benchmarking An International Journal ◽

10.1108/bij-04-2018-0092 ◽

2019 ◽

Vol 26 (6) ◽

pp. 1995-2016

Author(s):

Himanshu Rathore ◽

Shirsendu Nandi ◽

Peeyush Pandey ◽

Surya Prakash Singh

Keyword(s):

Simulated Annealing ◽

Simulated Annealing Algorithm ◽

Convergence Rates ◽

Location Problems ◽

Computational Time ◽

Data Sets ◽

Learning Mechanisms ◽

Hub Location ◽

Content Type ◽

Annealing Algorithm

Purpose: The purpose of this paper is to examine the efficacy of diversification-based learning (DBL) in expediting the performance of simulated annealing (SA) in hub location problems. Design/methodology/approach: This study proposes a novel diversification-based learning simulated annealing (DBLSA) algorithm for solving p-hub median problems. It is executed on MATLAB 11.0. Experiments are conducted on CAB and AP data sets. Findings: This study finds that in hub location models, DBLSA algorithm equipped with social learning operator outperforms the vanilla version of SA algorithm in terms of accuracy and convergence rates. Practical implications: Hub location problems are relevant in aviation and telecommunication industry. This study proposes a novel application of a DBLSA algorithm to solve larger instances of hub location problems effectively in reasonable computational time. Originality/value: To the best of the author’s knowledge, this is the first application of DBL in optimisation. By demonstrating its efficacy, this study steers research in the direction of learning mechanisms-based metaheuristic applications.

From data to knowledge mining

Artificial intelligence for engineering design analysis and manufacturing ◽

10.1017/s089006040900016x ◽

2009 ◽

Vol 23 (4) ◽

pp. 427-441 ◽

Cited By ~ 6

Author(s):

Ana Cristina Bicharra Garcia ◽

Inhauma Ferraz ◽

Adriana S. Vivacqua

Keyword(s):

Data Mining ◽

Association Rules ◽

Association Rule ◽

Evaluation Criteria ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Data Mining Technique ◽

Mining Technique ◽

Data Points

Most past approaches to data mining have been based on association rules. However, the simple application of association rules usually only changes the user's problem from dealing with millions of data points to dealing with thousands of rules. Although this may somewhat reduce the scale of the problem, it is not a completely satisfactory solution. This paper presents a new data mining technique, called knowledge cohesion (KC), which takes into account a domain ontology and the user's interest in exploring certain data sets to extract knowledge, in the form of semantic nets, from large data sets. The KC method has been successfully applied to mine causal relations from oil platform accident reports. In a comparison with association rule techniques for the same domain, KC has shown a significant improvement in the extraction of relevant knowledge, using processing complexity and knowledge manageability as the evaluation criteria.

DISCOVERY OF CAUSALITY POSSIBILITIES

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001404003058 ◽

2004 ◽

Vol 18 (01) ◽

pp. 63-73 ◽

Cited By ~ 1

Author(s):

LAWRENCE MAZLACK

Keyword(s):

Data Mining ◽

Association Rules ◽

Joint Probability ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Large Databases ◽

Very Large Databases ◽

Predictive Relationships ◽

Strength Of Association

Determining causality has been a tantalizing goal throughout human history. Proper sacrifices to the gods were thought to bring rewards; failure to make suitable observations was thought to lead to disaster. Today, data mining holds the promise of extracting unsuspected information from very large databases. Methods have been developed to build association rules from large data sets. Association rules indicate the strength of association of two or more data attributes. In many ways, the interest in association rules is that they offer the promise (or illusion) of causal, or at least, predictive relationships. However, association rules only calculate a joint probability; they do not express a causal relationship. If causal relationships could be discovered, it would be very useful. Our goal is to explore causality in the data mining context.
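The point that association rules rest on joint probabilities can be made concrete with a small sketch. The transactions and the helper names `support` and `confidence` are illustrative assumptions, not taken from the paper; they follow the standard definitions of rule support (estimated joint probability) and confidence (estimated conditional probability).

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item: an estimate of the joint probability."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical toy data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

joint = support(transactions, {"bread", "milk"})
forward = confidence(transactions, {"bread"}, {"milk"})
backward = confidence(transactions, {"milk"}, {"bread"})
print(joint, forward, backward)
```

Support is symmetric in its items, so it carries no direction at all, and confidence is only a conditional frequency. Neither measure distinguishes "bread causes milk" from "milk causes bread", which is the gap the abstract highlights.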

Survival Analysis of Python and R within the Job Market Trend

Journal of Information Technology and Computing ◽

10.48185/jitc.v1i1.94 ◽

2020 ◽

Vol 1 (1) ◽

pp. 31-40

Author(s):

Hina Afzal ◽

Arisha Kamran ◽

Asifa Noreen

Keyword(s):

Data Mining ◽

Survival Analysis ◽

Programming Languages ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Data Mining Techniques ◽

R Programming Language ◽

R Programming ◽

High Level

Because of rapid changes in technology, today's market requires a high level of interaction between educators and the freshers entering the market. The demand for IT-related jobs is higher than in all other fields. In this paper, we discuss the survival analysis in the market of two parallel programming languages, Python and R. Data sets are growing large and traditional methods are not capable of handling them; therefore, we used the latest data mining techniques through the Python and R programming languages. It took several months of effort to gather this amount of data and process it with data mining techniques using Python and R, and the results showed that both languages have had the same rate of growth over the past years.

Knowledge Discovery in Large Data Sets: A Primer for Data Mining Applications in Health Care

Data Mining: A Bagged Decision Tree Classifier Algorithm For Ids Intrusion Detection System Based Attacks Classification
