Research of Data Mining based on clustering model

2014 ◽  
Vol 580-583 ◽  
pp. 2082-2087
Author(s):  
Hua Wei Chen ◽  
Ji Wen Huang ◽  
Bing Li ◽  
Shi Dong Fu ◽  
Xin Zhang

The data mining model is the most important technical basis for decomposing the control targets of the most stringent water resources management in Shandong province. A K-means clustering model is adopted to analyze water withdrawal per ten thousand yuan of industrial added value in 2010. Based on the yearly industrial water consumption trend from 1995 to 2010 of 17 municipal-level cities in Shandong province, an ARIMA (p, d, q) model is established through extensive fitting and optimization, and the regional industrial water demand and water utilization efficiency in 2015 are then forecast. According to the proposed principles and technical route of target decomposition, the 2015 industrial water utilization efficiency targets for the whole province and for the 17 municipal-level cities are defined respectively.
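As a rough illustration of the clustering step, the sketch below runs Lloyd's k-means on a single indicator. It is not the authors' implementation: the intensity values are invented, the choice of k = 3 is arbitrary, and the quantile-based initialisation is an assumption made here so that the result is deterministic.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: deterministic start at evenly spaced quantiles of the data.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:  # no centroid moved, so the run has converged
            break
        centroids = updated
    return centroids, labels


# Hypothetical water withdrawal intensities (m^3 per 10,000 yuan of
# industrial added value). These are NOT figures from the paper.
withdrawal = [8.1, 9.4, 10.2, 14.8, 15.5, 16.1, 22.7, 23.9, 25.0]
centroids, labels = kmeans_1d(withdrawal, k=3)
```

With real data, cities that fall in the same cluster could be given comparable efficiency targets, which is the kind of grouping the abstract describes; random restarts or k-means++ initialisation would be the more usual choice for less well-separated values.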


Author(s):  
Wahyuri Wahyuri ◽  
Umi Athiyah ◽  
Ira Puspitasari ◽  
Yunita Nita

Background: Drug sampling and testing in the context of post-marketing control is an important component of ensuring drug safety in supply chains. The results are used by the Indonesian National Agency for Drug and Food Control (NA-FDC) to issue public warnings, evaluate the implementation of Good Manufacturing Practice (GMP) and Good Distribution Practice (GDP), and enforce the law against drug violations. Objective: This study aimed to identify and analyze drug distribution patterns to provide an overview of drug sampling in the public sector. Methods: The data were collected from the database of Balai Besar Pengawas Obat dan Makanan (BBPOM) Palangka Raya. The collected data were the drug sampling data from the Integrated Information Reporting Systems (IIRS) application from 2014 to 2018. We then employed the CRISP-DM methodology to analyze the data and identify the pattern. A K-means clustering model was selected for data modeling. Results: The dataset contained five attributes, i.e., drug name, therapeutic class, district/city, sample category, and evaluation of drug surveillance. The drug distribution pattern formed three clusters. The first cluster contained 522 drug items in eight therapeutic classes spread over ten districts, the second cluster contained 1542 drug items in five therapeutic classes spread over five districts, and the third cluster contained 503 drug items in eleven therapeutic classes spread across nine districts. Conclusion: The applied data mining technique has improved decisions on drug sampling planning. It also provides in-depth information for improving drug post-marketing control performance in Central Kalimantan Province. Keywords: Clustering, CRISP-DM, Data Mining, Drug distribution patterns, Drug quality control, Drug sampling


2021 ◽  
Vol 8 (3) ◽  
pp. 1607-1614
Author(s):  
Mardiani Mardiani

Knowledge management using the SECI model supports the transfer of tacit and explicit knowledge. Because human resources have limited capacity to transfer knowledge, the process needs supporting tools. Knowledge extraction can be carried out by implementing data mining. The large volume of data mining output can be used by the education sector for strategic purposes, for example evaluating how graduate profiles are drawn up based on an analysis of graduate competencies. A study program's curriculum is built on the graduate profile, and the study program needs a mapping of needs derived from alumni data when drawing up the curriculum, while alumni need courses that support them after they finish their studies. Knowledge management captures knowledge from graduates, while data mining serves as the tool for processing the data. Transferring knowledge and processing graduate competency data make it possible for new knowledge to emerge for the institution, which can be used when drawing up the next curriculum. The model used is SECI combined with classification and clustering algorithms. The SECI model, whose supporting technology tools have already been mapped to each of its processes, is given a clearer and more specific grouping by implementing data mining in each quadrant of the SECI model. A SECI model design combined with data mining technology will remedy the shortcomings of the previous model.


2020 ◽  
Author(s):  
Min Li ◽  
Qunwei Wang ◽  
Yinzhong Shen

Background: Highly active antiretroviral therapy (ART) is still the only effective method of stopping disease progression in patients with acquired immunodeficiency syndrome (AIDS). However, poor adherence makes the therapy ineffective. In this work, we construct an adherence prediction model for AIDS patients, using the classical recency, frequency and monetary value (RFM) model from data mining-based customer relationship management to obtain adherence predictor variables. Methods: We cleaned 257305 diagnostic data elements of AIDS outpatients in Shanghai from August 2009 to December 2019 to obtain 16440 elements. We tested the RFM and RFm models (R: recent consultation month, F: consultation frequency, M/m: total/average medical costs per visit), three clustering methods (K-means, Kohonen and two-step clustering) and four decision algorithms (C5.0, the classification and regression tree, Chi-square Automatic Interaction Detector and Quick, Unbiased, Efficient, Statistical Tree) to select the optimal combination. The optimal model and clustering analysis were used to divide the patients into two groups (good and poor adherence); the optimal decision algorithm was then used to construct the adherence prediction model and obtain its predictor variables. Results: The RFm model, K-means clustering analysis and the C5.0 algorithm were optimal. After three rounds of K-means clustering analysis, the optimal RFm clustering model quality was 0.8, and 10614 elements were obtained, including 9803 and 811 from patients with good and poor adherence, respectively; five types of patients were identified. The prediction model had an accuracy of 100%, with the recent consultation month as an important adherence predictor variable. Conclusions: This work presents a prediction model for medication adherence in AIDS patients at the designated AIDS center in Shanghai, using the RFm model and the K-means and C5.0 algorithms. The model can be expanded to include patients from other centers in China and worldwide.
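To make the RFm variables concrete, here is a minimal sketch of how they might be derived from raw visit records. The record layout, the sample values and the function name `rfm_features` are hypothetical, and reading R as "whole calendar months elapsed since the last consultation" is an assumption; the abstract only names R as the recent consultation month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit records: (patient_id, visit_date, cost).
# Illustrative values only, not data from the study.
visits = [
    ("p1", date(2019, 11, 5), 120.0),
    ("p1", date(2019, 12, 3), 80.0),
    ("p2", date(2018, 6, 20), 200.0),
    ("p2", date(2019, 1, 15), 100.0),
    ("p2", date(2019, 2, 12), 150.0),
]


def rfm_features(visits, reference):
    """Return {patient_id: (R, F, m)} for the given visit records.

    R = calendar months between the last visit and `reference` (assumption),
    F = number of visits, m = average cost per visit.
    """
    by_patient = defaultdict(list)
    for pid, day, cost in visits:
        by_patient[pid].append((day, cost))

    features = {}
    for pid, rows in by_patient.items():
        last = max(day for day, _ in rows)
        recency = (reference.year - last.year) * 12 + (reference.month - last.month)
        frequency = len(rows)
        mean_cost = sum(cost for _, cost in rows) / frequency
        features[pid] = (recency, frequency, mean_cost)
    return features


features = rfm_features(visits, reference=date(2019, 12, 31))
```

The resulting (R, F, m) triples are the kind of input that a clustering step such as K-means could then split into adherence groups; in practice the three variables would normally be standardised first, since they are on very different scales.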


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yichen Chu ◽  
Xiaojian Yin

Mental health is an important basic condition for college students as they mature into adults, and educators are attaching growing importance to strengthening mental health education for college students. This paper analyzes college students' mental health in detail, reviews the development and application of clustering analysis algorithms, applies the distance formulas and clustering criterion functions commonly used in clustering analysis, and describes several classic clustering algorithms. After setting out the advantages and disadvantages of the fast-clustering and hierarchical clustering algorithms, the paper introduces the concept of the two-step clustering algorithm, discusses the algorithm flow of the clustering model in detail, and gives the algorithm flow chart. The main work of this paper is to apply clustering to a student mental health database built from the results of mental health assessment tools: a data mining model is established, the database is mined, the characteristics of different college students' mental health states are analyzed, and corresponding solutions are provided. To meet the needs of a psychological management system based on clustering analysis, the clustering algorithm is used to cluster the data. Starting from the original database, the paper establishes methods for selecting, cleaning, and transforming the data in students' psychological archives. Finally, it describes the application of data mining in the student psychological management system, summarizes the implementation of the system, and discusses its prospects.


It is becoming increasingly difficult to cluster multi-valued data in data mining because of the multiple data interval values of individual functions. Identifying a clustering model that is appropriate for these disguised multi-valued data deployments in data analysis applications is an open problem. To address this problem, this paper proposes feature selection based on the probabilistic features association mechanism (PFAM). The problem arises mainly from the difficulty of identifying the class information and from the multiple values of each individual feature. This work explores unsupervised feature selection by computing a probabilistic association score and reforming multi-value data for effective clustering in multivariate datasets. By minimizing a reformation clustering error, the approach preserves both the degree of similarity and the categorization information of the actual data contents. The proposed approach is evaluated using clustering purity and Normalized Mutual Information on multivariate document datasets. The experimental evaluation shows the improvement achieved by the proposed approach.
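For readers unfamiliar with the two evaluation measures named above, the following sketch computes clustering purity and Normalized Mutual Information from a list of true classes and a list of predicted cluster labels. It illustrates the standard definitions only, not the paper's evaluation code; the abstract does not say which NMI normalisation was used, so the arithmetic mean of the two entropies is assumed here, and the sample labels are invented.

```python
import math
from collections import Counter


def purity(truth, pred):
    """Fraction of items that belong to the majority true class of their cluster."""
    total = 0
    for cluster in set(pred):
        members = [t for t, p in zip(truth, pred) if p == cluster]
        total += Counter(members).most_common(1)[0][1]
    return total / len(truth)


def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    count_t, count_p = Counter(truth), Counter(pred)
    mi = sum((nij / n) * math.log(nij * n / (count_t[t] * count_p[p]))
             for (t, p), nij in joint.items())
    denom = (entropy(truth) + entropy(pred)) / 2  # assumed normalisation
    return mi / denom if denom > 0 else 1.0


# Invented labels: six documents, two true classes, two predicted clusters.
truth = ["a", "a", "a", "b", "b", "b"]
pred = [0, 0, 1, 1, 1, 1]
p = purity(truth, pred)   # 5 of 6 documents sit with their cluster's majority class
score = nmi(truth, pred)  # strictly between 0 and 1 for this imperfect clustering
```

Purity rewards clusters that are dominated by one class but can be inflated by using many small clusters, which is why it is commonly reported alongside NMI.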

