Knowledge Discovery in Large Data Sets: A Primer for Data Mining Applications in Health Care

Author(s):  
Patricia A. Abbott
2021 ◽  
Vol 1 (1) ◽  
pp. 7-14
Author(s):  
Nur Afni Syahpitri Damanik ◽  
Irianto Irianto ◽  
Dahriansah Dahriansah

Abstract: Theft is the illegal taking of another person's property or belongings without the owner's permission. Theft is the most common crime in Asahan District, and the POLRES still has trouble determining which areas experience it most often. To address this, the areas where theft frequently occurs need to be grouped, and the data mining process is used to do so. Data mining is one of the processes of Knowledge Discovery from Databases (KDD), an activity that includes collecting and using historical data to find regularities, patterns, or relationships in large data sets. One well-known data mining technique is clustering. K-Means is a clustering method that partitions data into groups so that data with the same characteristics are placed in the same group and data with different characteristics are placed in other groups. The attributes used for grouping are annual data for 2015, 2016, 2017, 2018, and 2019. The case study covers 9 POLSEK in Asahan District. Keywords: Data Mining, Clustering, K-Means Algorithm, Theft Crimes Grouping.
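
The K-Means grouping described in this abstract can be illustrated with a minimal sketch in plain Python. Everything specific here is an assumption made for illustration: the area names, the yearly counts, the choice of three clusters, and the seeding of the centroids are all hypothetical and are not taken from the paper.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd-style k-means; `centroids` are caller-chosen starting centers."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical yearly theft counts (five years per area); NOT data from the paper.
yearly_counts = {
    "area_A": [40, 42, 45, 41, 44],
    "area_B": [38, 41, 43, 40, 42],
    "area_C": [10, 12, 9, 11, 10],
    "area_D": [12, 11, 10, 13, 12],
    "area_E": [25, 24, 27, 26, 25],
    "area_F": [24, 26, 25, 27, 26],
}
names = list(yearly_counts)
points = [yearly_counts[n] for n in names]

# Assumption: k = 3, seeded with three of the areas themselves.
seeds = [points[0], points[2], points[4]]
labels, centers = kmeans(points, seeds)

for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Each area is represented as a five-dimensional vector, one value per year, so areas with similar counts across all years end up in the same cluster. The result depends on the seeds, which is why they are passed in explicitly rather than drawn at random.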


2021 ◽  
pp. 1826-1839
Author(s):  
Sandeep Adhikari ◽  
Dr. Sunita Chaudhary

The exponential growth in the use of computers over networks, as well as the proliferation of applications that operate on different platforms, has drawn attention to network security. This paradigm takes advantage of security flaws in all operating systems, which are both technically difficult and costly to fix. As a result, intrusion is used as a key to compromise a computer resource's credibility, availability, and confidentiality. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, the data mining principle is combined with IDS to efficiently and quickly identify important, secret data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial of service attacks. We are also working on a decision tree classifier that has a variety of parameters. The previous algorithm achieved classification rates of up to 90% and was not appropriate for large data sets. Our proposed algorithm was designed to accurately classify large data sets. In addition, we quantify a few more decision tree classifier parameters.


Author(s):  
A. Adelmann ◽  
R.D. Ryne ◽  
J.M. Shalf ◽  
C. Siegerist

2014 ◽  
Vol 644-650 ◽  
pp. 2120-2123 ◽  
Author(s):  
De Zhi An ◽  
Guang Li Wu ◽  
Jun Lu

At present there are many data mining methods. This paper studies the application of the rough set method in data mining, mainly the use of a rough-set-based attribute reduction algorithm in the rule extraction stage. In data mining, rough sets are often used for knowledge reduction and thus for rule extraction. Attribute reduction is one of the core research topics of rough set theory. In this paper, the traditional attribute reduction algorithm based on rough sets is studied and improved, and a new attribute reduction algorithm is proposed for mining large data sets.
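
To make attribute reduction concrete, the sketch below computes the rough-set dependency degree (the share of rows in the positive region) and uses it in a greedy forward selection, in the spirit of the classic QuickReduct heuristic. This is not the improved algorithm the paper proposes, which the abstract does not describe; the decision table and the attribute names `a`, `b`, `c`, `d` are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on `attrs`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, cond_attrs, decision):
    """Dependency degree: fraction of rows lying in decision-consistent blocks."""
    consistent = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)

def greedy_reduct(rows, cond_attrs, decision):
    """Add attributes one at a time until the full dependency degree is reached."""
    target = dependency(rows, cond_attrs, decision)
    chosen = []
    while dependency(rows, chosen, decision) < target:
        best = max(
            (a for a in cond_attrs if a not in chosen),
            key=lambda a: dependency(rows, chosen + [a], decision),
        )
        chosen.append(best)
    return chosen

# Hypothetical decision table: condition attributes a, b, c and decision d.
table = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 0},
    {"a": 1, "b": 1, "c": 1, "d": 1},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 0},
]

reduct = greedy_reduct(table, ["a", "b", "c"], "d")
print("selected attributes:", reduct)
```

In this toy table the decision is fully determined by `b`, so the greedy search stops after selecting a single attribute. A greedy search of this kind is not guaranteed to return a minimal reduct in general, and its repeated partitioning is the kind of cost that matters on large data sets.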


Author(s):  
Md. Zakir Hossain ◽  
Md.Nasim Akhtar ◽  
R.B. Ahmad ◽  
Mostafijur Rahman

Data mining is the process of finding the structure of data in large data sets. With this process, decision makers can make a particular decision for the further development of real-world problems. Several data clustering techniques are used in data mining for finding a specific pattern of data. The K-means method is one of the familiar clustering techniques for clustering large data sets. The K-means clustering method partitions the data set based on the assumption that the number of clusters is fixed. The main problem of this method is that if the number of clusters chosen is small, there is a higher probability of adding dissimilar items to the same group. On the other hand, if the number of clusters chosen is high, there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-Means clustering algorithm. The proposed method performs data clustering dynamically. The proposed method initially calculates a threshold value as a centroid of K-Means, and the number of clusters is formed based on this value. At each iteration of K-Means, if the Euclidean distance between two points is less than or equal to the threshold value, then these two data points will be in the same group. Otherwise, the proposed method will create a new cluster with the dissimilar data point. The results show that the proposed method outperforms the original K-Means method.
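
The threshold rule in this abstract (join a group when the Euclidean distance is within the threshold, otherwise open a new cluster) resembles simple leader-style clustering, which the sketch below implements. It is an illustration under assumptions, not the authors' algorithm: the abstract does not say how the threshold is computed, so it is passed in as a parameter, and the sample points are hypothetical.

```python
import math

def threshold_clusters(points, threshold):
    """Leader-style clustering: the number of clusters is not fixed in advance."""
    centroids, members = [], []
    for p in points:
        if centroids:
            # Find the nearest existing centroid.
            j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            if math.dist(p, centroids[j]) <= threshold:
                # Close enough: join that cluster and recompute its centroid.
                members[j].append(p)
                centroids[j] = tuple(
                    sum(col) / len(members[j]) for col in zip(*members[j])
                )
                continue
        # Too far from every cluster (or no cluster yet): start a new one.
        centroids.append(tuple(p))
        members.append([p])
    return members, centroids

# Hypothetical 2-D points and a hand-picked threshold (both are assumptions).
sample = [(0, 0), (1, 0), (10, 10), (11, 10), (0, 1), (20, 0)]
groups, centers = threshold_clusters(sample, threshold=2.0)

for i, group in enumerate(groups):
    print("cluster", i, "->", group)
```

With these points the procedure opens three clusters without being told the number in advance. The outcome of a single pass like this depends on the order in which points arrive and on the threshold value.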


Author(s):  
Ana Cristina Bicharra Garcia ◽  
Inhauma Ferraz ◽  
Adriana S. Vivacqua

Abstract: Most past approaches to data mining have been based on association rules. However, the simple application of association rules usually only changes the user's problem from dealing with millions of data points to dealing with thousands of rules. Although this may somewhat reduce the scale of the problem, it is not a completely satisfactory solution. This paper presents a new data mining technique, called knowledge cohesion (KC), which takes into account a domain ontology and the user's interest in exploring certain data sets to extract knowledge, in the form of semantic nets, from large data sets. The KC method has been successfully applied to mine causal relations from oil platform accident reports. In a comparison with association rule techniques for the same domain, KC has shown a significant improvement in the extraction of relevant knowledge, using processing complexity and knowledge manageability as the evaluation criteria.


Author(s):  
LAWRENCE MAZLACK

Determining causality has been a tantalizing goal throughout human history. Proper sacrifices to the gods were thought to bring rewards; failure to make suitable observations was thought to lead to disaster. Today, data mining holds the promise of extracting unsuspected information from very large databases. Methods have been developed to build association rules from large data sets. Association rules indicate the strength of association of two or more data attributes. In many ways, the interest in association rules is that they offer the promise (or illusion) of causal, or at least predictive, relationships. However, association rules only calculate a joint probability; they do not express a causal relationship. If causal relationships could be discovered, it would be very useful. Our goal is to explore causality in the data mining context.
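
The point that association rules measure co-occurrence rather than causation can be seen in a small worked example. The sketch below computes the standard support, confidence, and lift of a rule in both directions; the transactions and item names are hypothetical and chosen only for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    Assumes both sides occur at least once, so no division by zero happens.
    """
    s_both = support(transactions, set(antecedent) | set(consequent))
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    return {
        "support": s_both,            # joint probability P(A and C)
        "confidence": s_both / s_a,   # conditional probability P(C | A)
        "lift": s_both / (s_a * s_c), # ratio to what independence would give
    }

# Hypothetical observations; each set is one transaction.
baskets = [
    {"umbrella", "wet_road"},
    {"umbrella", "wet_road"},
    {"wet_road"},
    {"umbrella", "wet_road"},
    {"sunglasses"},
]

forward = rule_metrics(baskets, {"umbrella"}, {"wet_road"})
backward = rule_metrics(baskets, {"wet_road"}, {"umbrella"})
print("umbrella -> wet_road:", forward)
print("wet_road -> umbrella:", backward)
```

Support and lift come out the same in both directions because they are built from the joint probability alone, and only confidence differs. Nothing in these numbers says that umbrellas make roads wet or the reverse, which is the gap between association and causality that the abstract describes.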


2020 ◽  
Vol 1 (1) ◽  
pp. 31-40
Author(s):  
Hina Afzal ◽  
Arisha Kamran ◽  
Asifa Noreen

Because of rapid changes in technology, the market nowadays requires a high level of interaction between educators and the freshers entering the market. The demand for IT-related jobs in the market is higher than in all other fields. In this paper, we discuss a survival analysis in the market of two parallel programming languages, Python and R. Data sets are growing large, and traditional methods are not capable of handling them; therefore, we used the latest data mining techniques through the Python and R programming languages. It took several months of effort to gather this amount of data and process it with data mining techniques using Python and R, and the results showed that both languages have had the same rate of growth over the past years.

