A Survey of Methodologies and Techniques for Data Mining and Intelligent Data Discovery

Author(s):  
Ricardo Gonzalez ◽  
Ali Kamrani
2017 ◽  
Vol 8 (3) ◽  
pp. 1-18 ◽  
Author(s):  
Mohamed Elhadi Rahmani ◽  
Abdelmalek Amine ◽  
Reda Mohamed Hamou

Bio-inspired algorithms implement solutions found in nature to solve hard problems, the so-called NP problems. A seismic hazard is the probability that an earthquake will occur in a given geographic area, within a given window of time, and with ground motion intensity exceeding a given threshold. Seismic hazard prediction is one of the fields where data mining plays an important role. This paper presents a new bio-inspired algorithm, motivated by the echolocation behavior of bats, for predicting seismic hazard states in coal mines from previously recorded data. The approach is based on distance calculation. The results were satisfactory enough to encourage further work on this approach. The implementation of the algorithm touches three fields of study: data discovery (so-called data mining), bio-inspired techniques, and seismic hazard prediction.


Data Mining ◽  
2013 ◽  
pp. 50-65
Author(s):  
Frederick E. Petry

This chapter focuses on applying the discovery of association rules to vague spatial databases. The background of data mining and of uncertainty representations using rough set and fuzzy set techniques is provided. The extensions of association rule extraction for uncertain data, as represented by rough and fuzzy sets, are described. Finally, an example of rule extraction for both types of uncertainty representation is given.


Web usage mining is a part of data mining. Web mining is divided into three parts: (1) web content mining, (2) web structure mining, and (3) web usage mining. This paper discusses the log files used in web usage mining. Log files store the activity of users who access websites on a web server, so that the websites can be improved by gathering user data. Web usage mining has three sub-parts: preprocessing, data discovery, and data analysis. The paper then discusses web log files in detail, describes three algorithms used to find patterns in log files, and compares them with the help of graphs.
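As an illustration of the preprocessing step mentioned above, the following is a minimal sketch that parses web server log lines and counts page hits. The abstract does not name a log format or any algorithm, so the Common Log Format layout, the `LOG_PATTERN` regular expression, the `parse_log` helper, and the sample lines are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format (host, identity, user, time, request, status, size).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Made-up sample lines, including one malformed line that preprocessing should drop.
SAMPLE_LOG = """\
192.0.2.1 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
192.0.2.2 - - [10/Oct/2020:13:56:01 +0000] "GET /about.html HTTP/1.1" 200 1024
192.0.2.1 - - [10/Oct/2020:13:57:12 +0000] "GET /index.html HTTP/1.1" 304 0
malformed line
"""


def parse_log(text):
    """Return one dict per well-formed log line; skip lines that do not match."""
    records = []
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
    return records


records = parse_log(SAMPLE_LOG)
hits_per_page = Counter(r["path"] for r in records)   # page popularity
hits_per_host = Counter(r["host"] for r in records)   # activity per visitor
print(hits_per_page.most_common())
```

The resulting counts are the kind of cleaned input that a pattern discovery step would consume; the choice of counters here is only one simple possibility.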


Author(s):  
Arvind Singh

Health care is one of the fastest-growing areas. The health care system contains a large amount of medical data, which should be mined from the data warehouse. Data mined from the data warehouse helps in finding important information. The large amount of data in health care databases calls for tools that can access and analyze the data, discover knowledge, and make informed use of the stored knowledge. The health care system holds a lot of data about patients' details, medications, and so on. In this paper we study different data mining and warehousing techniques used in health care.


2020 ◽  
Vol 7 (2) ◽  
pp. 417
Author(s):  
Ikhsan Wisnuadji Gamadarenda ◽  
Indra Waspada

Chronic kidney disease (CKD) is a worldwide public health problem with steadily increasing incidence. According to BPJS Kesehatan, CKD care ranks second in financing cost after heart disease. Detecting CKD also requires many attributes, which makes it fairly expensive. Therefore, a web-based system built on data mining stages was developed to make CKD detection easier, so that CKD can be prevented and managed, and the chance of receiving effective therapy is greater when the disease is detected early. The research follows the Knowledge Data Discovery (KDD) data mining framework. Within this framework, the system uses the Backward Elimination algorithm to reduce the number of attributes, with the aim of reducing the types of examinations performed, and the k-Nearest Neighbor (kNN) algorithm as the classification algorithm for detecting the disease. The best model used Backward Elimination (α = 0.05) and kNN (k = 3), chosen for the reduction in examination cost and the highest sensitivity. The system recommends 10 attributes selected from the 24 initial attributes: specific gravity (sg), albumin (al), blood urea (bu), serum creatinine (sc), sodium (sod), hemoglobin (hemo), red blood cells (rbc), hypertension (htn), diabetes mellitus (dm), and appetite (appet). Using the selected attributes reduced examination costs by up to 73.36%. Disease detection with the k-Nearest Neighbor algorithm then achieved an accuracy of 99.25%, a sensitivity of 99.5%, and a specificity of 98.745%.
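To make the classification and evaluation steps concrete, the following is a minimal sketch of k-Nearest Neighbor with k = 3 together with the sensitivity and specificity measures reported above. It is not the authors' system: the Euclidean distance, the helper names `knn_predict` and `sensitivity_specificity`, the class labels, and the tiny two-feature data set are assumptions made for illustration, and the Backward Elimination step is not shown.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to query (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def sensitivity_specificity(y_true, y_pred, positive="ckd"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up records with two already-scaled features; not data from the study.
TRAIN = [
    ((0.9, 0.8), "ckd"),
    ((0.8, 0.9), "ckd"),
    ((0.7, 0.7), "ckd"),
    ((0.1, 0.2), "notckd"),
    ((0.2, 0.1), "notckd"),
    ((0.3, 0.3), "notckd"),
]
TEST = [
    ((0.85, 0.75), "ckd"),
    ((0.15, 0.25), "notckd"),
    ((0.6, 0.8), "ckd"),
    ((0.25, 0.2), "notckd"),
]

y_true = [label for _, label in TEST]
y_pred = [knn_predict(TRAIN, features, k=3) for features, _ in TEST]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(y_pred, sens, spec)
```

With an odd k and two classes the vote cannot tie, which is one common reason to pick k = 3. The sketch assumes the test set contains at least one record of each class, otherwise the ratios are undefined.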


Many road users do not properly obey traffic regulations, which every day adds to the number of accidents and traffic violations in Tasikmalaya City and shows that the public lacks an understanding of order on the road. This study applies data mining, using the clustering method, to traffic violation data from the Tasikmalaya City Police (Polres Tasikmalaya Kota). The algorithm used is K-Means clustering, a process that groups a set of data or objects into clusters so that the data within each cluster are as similar as possible to each other and different from the objects in other clusters. The traffic violation data are processed through Knowledge Data Discovery (KDD) and tested with RapidMiner, producing clusters of traffic violations. The sample is taken from the traffic violation data table after transformation. Six attributes are used: region, not wearing a helmet, seat belt, violating traffic signs, not carrying a driver's license (SIM) and vehicle registration (STNK), and overloading. The results present the clusters for each group of regions and types of traffic violation.
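The grouping step described above can be sketched in a few lines. The study itself used RapidMiner; the plain-Python `kmeans` function below, its deterministic initialization from the first k points, the region names, and the violation counts are all assumptions made for illustration, not data or code from the study.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # assumed init: first k points
    assignment = [0] * len(points)
    for step in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if step > 0 and new_assignment == assignment:
            break  # assignments are stable, so the centroids will not move again
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical regions and violation counts per region, in the order:
# no helmet, no seat belt, traffic sign violation, no SIM/STNK, overloading.
REGIONS = ["Region A", "Region B", "Region C", "Region D"]
COUNTS = [
    (50, 5, 20, 30, 2),
    (4, 1, 3, 2, 0),
    (48, 6, 22, 28, 3),
    (5, 2, 2, 3, 1),
]

assignment, centroids = kmeans(COUNTS, k=2)
for region, cluster in zip(REGIONS, assignment):
    print(region, "-> cluster", cluster)
```

In this toy data the two high-violation regions fall into one cluster and the two low-violation regions into the other. Real data would usually need the attributes scaled first, since K-Means is sensitive to differences in magnitude between columns.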


2020 ◽  
Author(s):  
Mohammed J. Zaki ◽  
Wagner Meira, Jr
Keyword(s):  

2010 ◽  
Vol 24 (2) ◽  
pp. 112-119 ◽  
Author(s):  
F. Riganello ◽  
A. Candelieri ◽  
M. Quintieri ◽  
G. Dolce

The purpose of the study was to identify significant changes in heart rate variability (an emerging descriptor of emotional conditions; HRV) concomitant to complex auditory stimuli with emotional value (music). In healthy controls, traumatic brain injured (TBI) patients, and subjects in the vegetative state (VS) the heart beat was continuously recorded while the subjects were passively listening to each of four music samples of different authorship. The heart rate (parametric and nonparametric) frequency spectra were computed and the spectra descriptors were processed by data-mining procedures. Data-mining sorted the nu_lf (normalized parameter unit of the spectrum low frequency range) as the significant descriptor by which the healthy controls, TBI patients, and VS subjects’ HRV responses to music could be clustered in classes matching those defined by the controls and TBI patients’ subjective reports. These findings promote the potential for HRV to reflect complex emotional stimuli and suggest that residual emotional reactions continue to occur in VS. HRV descriptors and data-mining appear applicable in brain function research in the absence of consciousness.

