Knowledge Discovery in Large Data Sets: A Primer for Data Mining Applications in Health Care

Penerapan Metode Clustering Dengan Algoritma K-Means Tindak Kejahatan Pencurian di Kabupaten Asahan

J-Com (Journal of Computer) ◽

10.33330/j-com.v1i1.1065 ◽

2021 ◽

Vol 1 (1) ◽

pp. 7-14

Author(s):

Nur Afni Syahpitri Damanik ◽

Irianto Irianto ◽

Dahriansah Dahriansah

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Historical Data ◽

Large Data ◽

Large Data Sets ◽

Annual Data ◽

Data Sets ◽

Process Data ◽

Different Characteristics

Abstract: Theft is the illegal taking of another person's property or belongings without the owner's permission. Theft is the most common crime in Asahan District, and the district police (POLRES) still have difficulty determining which areas experience it most often. To address this, areas are grouped by how often theft occurs, using a data mining process. Data mining is one of the processes of Knowledge Discovery from Databases (KDD), an activity that covers collecting and using historical data to find regularities, patterns, or relationships in large data sets. Clustering is one well-known data mining technique, and K-Means is a clustering method that partitions data into groups so that records with the same characteristics fall into the same group and records with different characteristics fall into other groups. The attributes used for grouping are the annual data for 2015, 2016, 2017, 2018, and 2019. The case study covers 9 POLSEK (sector police stations) in Asahan District. Keywords: Data Mining, Clustering, K-Means Algorithm, Theft Crimes Grouping.
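
The grouping described in this abstract can be illustrated with a minimal K-Means sketch. The station names, the yearly theft counts, the choice of three clusters, and the starting centroids are all hypothetical; the abstract does not report the actual data or the value of k.

```python
import math

def kmeans(points, k, init_indices, max_iter=100):
    """Plain K-Means; init_indices selects the starting centroids."""
    centroids = [list(points[i]) for i in init_indices]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical theft counts per station for 2015..2019 (one row per POLSEK).
stations = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
theft_counts = [
    [40, 42, 45, 50, 48],  # A
    [38, 41, 44, 47, 46],  # B
    [5, 6, 4, 7, 5],       # C
    [20, 22, 19, 21, 23],  # D
    [6, 5, 7, 4, 6],       # E
    [21, 20, 22, 23, 19],  # F
    [42, 44, 43, 49, 50],  # G
    [4, 7, 5, 6, 5],       # H
    [19, 21, 20, 22, 21],  # I
]

# Assumed starting centroids: stations A, C and D.
labels, centroids = kmeans(theft_counts, k=3, init_indices=(0, 2, 3))
for name, label in zip(stations, labels):
    print(name, "-> cluster", label)
```

With these made-up counts, the stations separate into a high, a low, and a medium theft group. On real data the result depends on the choice of k and on the starting centroids.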

Outlier data Mining of large Data Sets relying on fast decomposition simulated annealing algorithm

10.1109/icris52159.2020.00170 ◽

2020 ◽

Author(s):

Wenjie Jia ◽

Zhihong He

Keyword(s):

Data Mining ◽

Simulated Annealing ◽

Simulated Annealing Algorithm ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Annealing Algorithm ◽

Outlier Data ◽

Fast Decomposition

Data Mining: A Bagged Decision Tree Classifier Algorithm For Ids Intrusion Detection System Based Attacks Classification

Design Engineering ◽

10.17762/de.v2021i04.1800 ◽

2021 ◽

pp. 1826-1839

Author(s):

Sandeep Adhikari ◽

Dr. Sunita Chaudhary

Keyword(s):

Data Mining ◽

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection System ◽

Detection System ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Decision Tree Classifier ◽

Tree Classifier

The exponential growth in the use of computers over networks, together with the proliferation of applications that run on different platforms, has drawn attention to network security. Attacks take advantage of security flaws in all operating systems, flaws that are both technically difficult and costly to fix. As a result, intrusion is used to compromise the credibility, availability, and confidentiality of computer resources. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, data mining is combined with IDS to identify important, hidden data of interest to the user efficiently and quickly. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial-of-service attacks. We also develop a decision tree classifier with a variety of parameters. The previous algorithm classified IDS data correctly up to 90% of the time and was not appropriate for large data sets. Our proposed algorithm is designed to classify large data sets accurately. In addition, we quantify a few more decision tree classifier parameters.
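
As a rough illustration of bagging in general (not the authors' algorithm, whose details the abstract does not give), the sketch below trains depth-1 decision trees on bootstrap samples and combines them by majority vote. The two features, the connection records, and the labels are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a depth-1 decision tree: the (feature, threshold) split with the fewest errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            # Each side predicts its majority label; an empty side falls back to the overall majority.
            left_label = Counter(left or y).most_common(1)[0][0]
            right_label = Counter(right or y).most_common(1)[0][0]
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical connection records: [failed_logins, duration_seconds].
X = [[0, 2], [1, 3], [0, 1], [1, 4], [6, 2], [8, 1], [7, 3], [9, 2]]
y = ["normal"] * 4 + ["attack"] * 4

models = fit_bagging(X, y)
print(predict_bagging(models, [10, 2]))
print(predict_bagging(models, [0, 2]))
```

Averaging many trees trained on resampled data reduces the variance of a single tree, which is the usual motivation for bagging. A practical IDS classifier would use deeper trees and far more features than this toy example.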

High performance spatial data mining for very large data-sets

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '03 ◽

10.1145/781498.781509 ◽

2003 ◽

Author(s):

Baris Kazar

Keyword(s):

Data Mining ◽

Spatial Data ◽

High Performance ◽

Large Data ◽

Spatial Data Mining ◽

Large Data Sets ◽

Data Sets

From Visualisation to Data Mining with Large Data Sets

Proceedings of the 2005 Particle Accelerator Conference ◽

10.1109/pac.2005.1591735 ◽

2006 ◽

Author(s):

A. Adelmann ◽

R.D. Ryne ◽

J.M. Shalf ◽

C. Siegerist

Keyword(s):

Data Mining ◽

Large Data ◽

Large Data Sets ◽

Data Sets

Research of Improved Attribute Reduction Algorithm Based on Data Mining of Rough Set

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.644-650.2120 ◽

2014 ◽

Vol 644-650 ◽

pp. 2120-2123 ◽

Cited By ~ 2

Author(s):

De Zhi An ◽

Guang Li Wu ◽

Jun Lu

Keyword(s):

Data Mining ◽

Rough Set ◽

Rough Set Theory ◽

Attribute Reduction ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Reduction Algorithm ◽

The Core ◽

Rules Extraction

Many data mining methods exist at present. This paper studies the application of the rough set method in data mining, focusing on rough-set-based attribute reduction in the rule extraction stage. In data mining, rough sets are often used for knowledge reduction and, in turn, for rule extraction. Attribute reduction is one of the core research topics of rough set theory. In this paper, the traditional rough-set-based attribute reduction algorithm is studied and improved, and a new attribute reduction algorithm is proposed for mining large data sets.
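
For readers unfamiliar with attribute reduction, here is a minimal sketch of the classic greedy approach (often called QuickReduct), which adds attributes until the positive region matches that of the full attribute set. It is not the improved algorithm proposed in the paper, and the decision table below is a hypothetical textbook-style example.

```python
from collections import defaultdict

def positive_region_size(rows, attrs, decision):
    """Count objects whose equivalence class under `attrs` has a single decision value."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    return sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)

def quick_reduct(rows, condition_attrs, decision):
    """Greedily add the attribute that enlarges the positive region the most."""
    target = positive_region_size(rows, condition_attrs, decision)
    reduct = []
    while positive_region_size(rows, reduct, decision) < target:
        best = max(
            (a for a in condition_attrs if a not in reduct),
            key=lambda a: positive_region_size(rows, reduct + [a], decision),
        )
        reduct.append(best)
    return reduct

# Hypothetical decision table: three condition attributes and the decision "flu".
rows = [
    {"headache": "yes", "muscle_pain": "yes", "temperature": "normal", "flu": "no"},
    {"headache": "yes", "muscle_pain": "yes", "temperature": "high", "flu": "yes"},
    {"headache": "yes", "muscle_pain": "yes", "temperature": "very_high", "flu": "yes"},
    {"headache": "no", "muscle_pain": "yes", "temperature": "normal", "flu": "no"},
    {"headache": "no", "muscle_pain": "no", "temperature": "high", "flu": "no"},
    {"headache": "no", "muscle_pain": "yes", "temperature": "very_high", "flu": "yes"},
]
conditions = ["headache", "muscle_pain", "temperature"]

reduct = quick_reduct(rows, conditions, "flu")
print(reduct)
```

In this table, two of the three condition attributes are enough to classify every object as well as the full set does. The greedy search returns a valid reduct-like subset but does not guarantee a minimal one in general.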

A dynamic K-means clustering for data mining

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i2.pp521-526 ◽

2019 ◽

Vol 13 (2) ◽

pp. 521-526

Author(s):

Md. Zakir Hossain ◽

Md.Nasim Akhtar ◽

R.B. Ahmad ◽

Mostafijur Rahman

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Large Data ◽

Threshold Value ◽

Specific Pattern ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Number Of Clusters ◽

Data Points

Data mining is the process of finding structure in data from large data sets. With this process, decision makers can make particular decisions for the further development of real-world problems. Several data clustering techniques are used in data mining to find specific patterns in data. The K-means method is one of the familiar clustering techniques for clustering large data sets. The K-means clustering method partitions the data set based on the assumption that the number of clusters is fixed. The main problem of this method is that if the number of clusters is chosen to be small, there is a higher probability of adding dissimilar items to the same group. On the other hand, if the number of clusters is chosen to be high, there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-Means clustering algorithm. The proposed method performs data clustering dynamically. It initially calculates a threshold value as a centroid of K-Means, and the number of clusters is formed based on this value. At each iteration of K-Means, if the Euclidean distance between two points is less than or equal to the threshold value, these two data points will be in the same group. Otherwise, the proposed method will create a new cluster with the dissimilar data point. The results show that the proposed method outperforms the original K-Means method.
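
The threshold rule in this abstract can be sketched roughly as follows. This is a simplified, single-pass reading under stated assumptions: the threshold is passed in as a parameter (the abstract does not say how it is computed), each point is compared with the running centroid of each cluster, and the sample points are made up.

```python
import math

def threshold_clustering(points, threshold):
    """Assign each point to the first cluster whose centroid is within `threshold`,
    otherwise start a new cluster, so the number of clusters is not fixed in advance."""
    clusters = []  # each cluster is a list of member points
    for p in points:
        for members in clusters:
            centroid = [sum(col) / len(members) for col in zip(*members)]
            if math.dist(p, centroid) <= threshold:
                members.append(p)
                break
        else:
            clusters.append([p])  # no cluster is close enough, so start a new one
    return clusters

# Hypothetical 2-D points and threshold.
points = [(1, 1), (1.5, 1.2), (8, 8), (8.5, 7.5), (1.2, 0.8), (15, 1)]
clusters = threshold_clustering(points, threshold=2.0)
for i, members in enumerate(clusters):
    print("cluster", i, members)
```

Unlike standard K-Means, the number of clusters here follows from the threshold: a smaller threshold yields more, tighter clusters. The result of this single-pass variant also depends on the order in which the points are visited.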

From data to knowledge mining

Artificial intelligence for engineering design analysis and manufacturing ◽

10.1017/s089006040900016x ◽

2009 ◽

Vol 23 (4) ◽

pp. 427-441 ◽

Cited By ~ 6

Author(s):

Ana Cristina Bicharra Garcia ◽

Inhauma Ferraz ◽

Adriana S. Vivacqua

Keyword(s):

Data Mining ◽

Association Rules ◽

Association Rule ◽

Evaluation Criteria ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Data Mining Technique ◽

Mining Technique ◽

Data Points

Most past approaches to data mining have been based on association rules. However, the simple application of association rules usually only changes the user's problem from dealing with millions of data points to dealing with thousands of rules. Although this may somewhat reduce the scale of the problem, it is not a completely satisfactory solution. This paper presents a new data mining technique, called knowledge cohesion (KC), which takes into account a domain ontology and the user's interest in exploring certain data sets to extract knowledge, in the form of semantic nets, from large data sets. The KC method has been successfully applied to mine causal relations from oil platform accident reports. In a comparison with association rule techniques for the same domain, KC has shown a significant improvement in the extraction of relevant knowledge, using processing complexity and knowledge manageability as the evaluation criteria.

DISCOVERY OF CAUSALITY POSSIBILITIES

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001404003058 ◽

2004 ◽

Vol 18 (01) ◽

pp. 63-73 ◽

Cited By ~ 1

Author(s):

LAWRENCE MAZLACK

Keyword(s):

Data Mining ◽

Association Rules ◽

Joint Probability ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Large Databases ◽

Very Large Databases ◽

Predictive Relationships ◽

Strength Of Association

Determining causality has been a tantalizing goal throughout human history. Proper sacrifices to the gods were thought to bring rewards; failure to make suitable observations was thought to lead to disaster. Today, data mining holds the promise of extracting unsuspected information from very large databases. Methods have been developed to build association rules from large data sets. Association rules indicate the strength of association between two or more data attributes. In many ways, the interest in association rules is that they offer the promise (or illusion) of causal, or at least predictive, relationships. However, association rules only calculate a joint probability; they do not express a causal relationship. If causal relationships could be discovered, it would be very useful. Our goal is to explore causality in the data mining context.
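
The point that association rules measure co-occurrence rather than causation can be made concrete with the two standard rule measures, support and confidence. The transactions below are hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

def support(transactions, items):
    """Fraction of transactions containing every item: an estimate of the joint probability."""
    items = set(items)
    return Fraction(sum(items <= t for t in transactions), len(transactions))

def confidence(transactions, antecedent, consequent):
    """support(A and B) / support(A): an estimate of P(B | A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "butter"},
]

joint = support(transactions, {"bread", "butter"})
forward = confidence(transactions, {"bread"}, {"butter"})
backward = confidence(transactions, {"butter"}, {"bread"})
print(joint, forward, backward)
```

Both rules, bread to butter and butter to bread, are derived from the same joint count, so neither measure says which item, if either, causes the purchase of the other.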

Survival Analysis of Python and R within the Job Market Trend

Journal of Information Technology and Computing ◽

10.48185/jitc.v1i1.94 ◽

2020 ◽

Vol 1 (1) ◽

pp. 31-40

Author(s):

Hina Afzal ◽

Arisha Kamran ◽

Asifa Noreen

Keyword(s):

Data Mining ◽

Survival Analysis ◽

Programming Languages ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Data Mining Techniques ◽

R Programming Language ◽

R Programming ◽

High Level

Because of rapid changes in technology, today's job market requires a high level of interaction between educators and the fresh graduates entering it. Demand for IT-related jobs is higher than in all other fields. In this paper, we discuss a survival analysis of two programming languages, Python and R, in the job market. Data sets are growing large, and traditional methods are not capable of handling them; therefore, we used recent data mining techniques through the Python and R programming languages. It took several months of effort to gather this amount of data and process it with data mining techniques in Python and R, but the results showed that both languages have had the same rate of growth over the past years.
