Analysis and Application of the K-Means Algorithm in the Campus Promotion Strategy of Akademi Maritim Suaka Bahari

2021 ◽  
Vol 3 (1) ◽  
pp. 1-7
Author(s):  
Tuti Hartati ◽  
Odi Nurdiawan ◽  
Eko Wiyandi

The annual intake of new cadet candidates at the Maritime Academy of Marine Sanctuary produces a large amount of data in the form of prospective cadet profiles. This accumulation of data makes it difficult to identify prospective cadets. This research applies data mining to generate profiles that share similar attributes. One data mining technique for identifying groups of objects with the same characteristics is cluster analysis; K-means is a clustering method that partitions data into one or more clusters whose members share similar characteristics. The authors follow the knowledge discovery in databases (KDD) process, consisting of data, data cleaning, data transformation, data mining, pattern evaluation, and knowledge. The K-means clustering process was implemented in RapidMiner. The attributes used are NIT, Level, Name, Student Status, Type of Registration, Gender, Place of Birth, Date of Birth, Religion, School Origin, School Origin Department, GPA, Subdistrict, District/City, and Province. Clustering was evaluated for up to 30 clusters (k = 30). Based on the Davies-Bouldin test, the K-means result closest to 0 was obtained at k = 29 (Davies-Bouldin index: 0.070), with the largest cluster, cluster 16, containing 115 members.
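The abstract chooses k with the Davies-Bouldin index, where values closer to 0 indicate compact, well-separated clusters. Below is a minimal Python sketch of that index, assuming Euclidean distance and the common definition (the mean, over clusters, of the worst-case ratio of within-cluster scatter to centroid separation). The function names and toy data are illustrative; this is not RapidMiner's implementation.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters i of max_{j != i} (s_i + s_j) / d(c_i, c_j).
    Lower values mean more compact, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    ids = sorted(clusters)
    cents = {i: centroid(clusters[i]) for i in ids}
    # s_i: average distance of the members of cluster i to their own centroid
    scatter = {i: sum(math.dist(p, cents[i]) for p in clusters[i]) / len(clusters[i]) for i in ids}
    total = 0.0
    for i in ids:
        # worst (largest) similarity ratio between cluster i and any other cluster
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in ids if j != i)
    return total / len(ids)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
print(davies_bouldin(points, labels))  # → 0.1
```

To mirror the selection described in the abstract, one would run K-means for each candidate k and keep the k with the lowest index. The sketch assumes no two centroids coincide, which would cause a division by zero.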

2020 ◽  
Vol 1 (4) ◽  
pp. 1-6
Author(s):  
Arjun Dutta

This paper presents a concise study of clustering: existing methods and the developments made over time. Clustering is defined as an unsupervised learning task in which objects are grouped on the basis of some inherent similarity among them. In recent times, we deal with large volumes of data, including images, video, social text, DNA, and gene information. Data clustering analysis has emerged as an efficient technique for accurately categorizing information into meaningful groups. Clustering is closely associated with research in several scientific fields. The k-means algorithm was proposed in 1957, and k-means remains the most popular partitional clustering method to date. Clustering techniques are used in many commercial and non-commercial fields; applications in areas such as image segmentation, object and role recognition, and data mining are highlighted. This paper gives a brief description of the existing types of clustering approaches, followed by a survey of their application areas.
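As a concrete reference for the partitional method the paper singles out, here is a minimal sketch of the standard Lloyd-style k-means iteration in Python. It assumes Euclidean distance and random initial seeds drawn from the data; `kmeans`, `iters`, and `seed` are illustrative names, not taken from the paper.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: alternate an assignment step and a centroid-update step
    until the assignments stop changing or `iters` rounds have run."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial seeds
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centers

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(pts, 2)
print(labels, centers)
```

Because the result depends on the initial seeds, practical implementations usually run several restarts or use a smarter initialization such as k-means++.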


2017 ◽  
Vol 13 (8) ◽  
pp. 155014771772862 ◽  
Author(s):  
Jianpeng Qi ◽  
Yanwei Yu ◽  
Lihong Wang ◽  
Jinglei Liu ◽  
Yingjie Wang

K-means plays an important role in many fields of data mining. However, k-means is often sensitive to its random seed selection. Motivated by this, this article proposes an optimized k-means clustering method, named k*-means, along with three optimization principles. First, we propose a hierarchical optimization principle initialized with k* seeds ([Formula: see text]) to reduce the risk of random seed selection, and then use the proposed “top-n nearest clusters merging” to merge the nearest clusters in each round until the number of clusters reaches [Formula: see text]. Second, we propose an “optimized update principle” that incrementally updates clusters using only the points that moved, instead of recalculating the mean and [Formula: see text] of each cluster in every k-means iteration, to minimize computation cost. Third, we propose a “cluster pruning strategy” to improve the efficiency of k-means; this strategy omits farther clusters to shrink the adjustable space in each iteration. Experiments on real UCI and synthetic datasets verify the efficiency and effectiveness of the proposed algorithm.
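The “optimized update principle” rests on the fact that a cluster mean can be updated from the moved point alone: if x leaves a cluster of size n_s with mean m_s and joins one of size n_d with mean m_d, the new means are (n_s·m_s − x)/(n_s − 1) and (n_d·m_d + x)/(n_d + 1). A minimal Python sketch follows, assuming each cluster is stored as its size and mean. It covers only the mean (not the second per-cluster quantity elided in the abstract), and `move_point` and the dict layout are hypothetical, not the authors' code.

```python
def move_point(x, src, dst):
    """Incrementally update two clusters when point x moves from src to dst.
    Each cluster is a dict with 'n' (size) and 'mean' (list of floats).
    Costs O(d) per move instead of recomputing the means from all members."""
    n_s, n_d = src["n"], dst["n"]
    if n_s <= 1:
        raise ValueError("source cluster would become empty")
    # Remove x from the source mean and add it to the destination mean
    src["mean"] = [(n_s * m - xi) / (n_s - 1) for m, xi in zip(src["mean"], x)]
    dst["mean"] = [(n_d * m + xi) / (n_d + 1) for m, xi in zip(dst["mean"], x)]
    src["n"], dst["n"] = n_s - 1, n_d + 1

a = {"n": 3, "mean": [2.0, 2.0]}    # members (1, 1), (2, 2), (3, 3)
b = {"n": 1, "mean": [10.0, 10.0]}  # member (10, 10)
move_point([3.0, 3.0], a, b)
print(a, b)  # a -> n=2, mean=[1.5, 1.5]; b -> n=2, mean=[6.5, 6.5]
```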


Author(s):  
Agung Triayudi ◽  
Wahyu Oktri Widyarto ◽  
Lia Kamelia ◽  
Iksal Iksal ◽  
Sumiati Sumiati

The application of data mining, machine learning, and statistics to data from the education domain is commonly known as educational data mining. Most school systems require a teacher to teach a number of students at one time. Exams are regularly used as a method to measure students' achievement, but examinations cannot be administered easily, which makes students' understanding difficult to track. On the other hand, in programming classes, source-code edits and UNIX commands can easily be detected and stored automatically as log data. Hence, rather than estimating students' performance from this log data, this study focuses on detecting students who experience difficulty in, or are unable to keep up with, programming classes. We propose a CLG clustering method that predicts the risk of dropping out of school by using clustered data for outlier detection.


2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important source of nutrition, consumed by both children and adults. Indonesia has many producers of fresh milk, but their output is not sufficient to meet national demand. Data mining is a field of computer science that is widely used in research, and one of its techniques is clustering, a method of grouping data. Clustering performs better when more data are used. The data used are provincial fresh milk production figures for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The results group the provinces into two clusters: high milk-producing regions and low milk-producing regions. Of the 27 provinces with fresh milk production data, two fall into the high-production cluster: West Java and East Java. The remaining 25, together with 7 provinces that were not included in the K-Means clustering calculation, fall into the low-production cluster.


Author(s):  
Shadi Aljawarneh ◽  
Aurea Anguera ◽  
John William Atwood ◽  
Juan A. Lara ◽  
David Lizcano

Nowadays, large amounts of data are generated in the medical domain. Various physiological signals generated by different organs can be recorded to extract interesting information about patients' health. The analysis of physiological signals is a hard task that requires specific approaches such as the Knowledge Discovery in Databases process. Applying such a process in the medical domain has a series of implications and difficulties, especially regarding the application of data mining techniques to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors in applying data mining techniques to real medical patient data, including time series. In this research, we carried out an exhaustive case study on data from two medical fields: stabilometry (15 professional basketball players, 18 elite ice skaters) and electroencephalography (100 healthy patients, 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification purposes, obtaining good results in terms of classification accuracy (greater than 99% in both fields). These results are the groundwork for the lessons learned and recommendations made in this position paper, which is intended as a guide for experts facing similar medical data mining projects.


2011 ◽  
Vol 403-408 ◽  
pp. 1804-1807
Author(s):  
Ning Zhao ◽  
Shao Hua Dong ◽  
Qing Tian

To optimize electric resistance welded (ERW) tube scheduling, the paper describes data cleaning, data extraction, and data transformation in detail and defines the sample attribute datasets based on an analysis of the ERW welded tube production process. A decision-tree method is then used for data mining to derive scheduling rules, which are validated with an example.
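The abstract does not say which decision-tree learner was used. As an assumption, the sketch below shows the entropy-based information-gain criterion that ID3-style learners use to pick a split attribute. The attribute names (`grade`, `size`) and target (`slot`) are hypothetical stand-ins for the ERW sample attributes, not data from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy reduction in `target` obtained by splitting `rows` (dicts) on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)  # weighted child entropy
    return base - remainder

# Hypothetical records: which scheduling slot a tube order was placed in
rows = [
    {"grade": "A", "size": "S", "slot": "early"},
    {"grade": "A", "size": "L", "slot": "early"},
    {"grade": "B", "size": "S", "slot": "late"},
    {"grade": "B", "size": "L", "slot": "late"},
]
best = max(["grade", "size"], key=lambda a: information_gain(rows, a, "slot"))
print(best)  # → grade
```

A full learner would recurse on each subset until the leaves are pure, and C4.5 uses the gain ratio instead of raw gain; either may differ from what the authors used.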


2016 ◽  
Vol 23 (1) ◽  
pp. 177-191
Author(s):  
Anderson Roges Teixeira Góes ◽  
Maria Teresinha Arns Steiner

Quality in education has been the subject of much discussion, whether in schools and among their administrators, or in the media and the literature. However, a deeper analysis of the literature does not seem to reveal techniques that explore databases in order to classify school performance, nor is there a consensus on what “educational quality” means. In this context, this article proposes a methodology within the KDD process (Knowledge Discovery in Databases) for comparatively classifying the performance of educational institutions, based on the scores obtained in Prova Brasil, one of the components of the Basic Education Development Index (IDEB) in Brazil. To illustrate the methodology, it was applied to the municipal public schools of Araucária, PR, in the metropolitan region of Curitiba, PR: a total of 17 schools that, at the time of the research, offered elementary education (Ensino Fundamental), considering the scores of all students in the initial years (1st to 5th year of elementary education) and the final years (6th to 9th year of elementary education). In the Data Mining stage, the main stage of the KDD process, three techniques were compared for Pattern Recognition: Artificial Neural Networks, Support Vector Machines, and Genetic Algorithms. These techniques produced satisfactory results in classifying the schools, represented by a “Performance Classification Label”. With this label, education managers will have a better basis for defining the measures to be adopted for each school and can define more clearly the goals to be met.


Author(s):  
Weri Sirait ◽  
Sarjon Defit ◽  
Gunadi Widi Nurcahyo

School of Information and Computer Management (STMIK) Indonesia Padang is a private university under the auspices of the Higher Education Service Institution (LLDIKTI) Region X that produces graduates competent as system analysts and database administrators. To earn an undergraduate (S1) degree, final-year students must complete a final project or thesis. Final-year students at STMIK Indonesia Padang are often unsure which final project topic to choose, because they have not yet been able to direct their abilities toward a particular topic. The researchers therefore grouped final-year students using the K-means clustering data mining technique, based on their grades in courses mapped to the system analyst and database administrator fields. Two clusters are produced: students taking a system analyst final project and students taking a database administrator final project. Using this K-means clustering method gives students direction in choosing their final project topic. Of the 40 data samples used, 20 students were grouped under the system analyst final project topic and 20 under the database administrator topic.


Author(s):  
Muhamad Alias Md. Jedi ◽  
Robiah Adnan

TCLUST is a statistical clustering method based on a modification of the trimmed k-means clustering algorithm. It is called a “crisp” clustering approach because each observation is either eliminated (trimmed) or assigned to exactly one group. TCLUST strengthens group assignment by imposing constraints on the cluster scatter matrices. This paper focuses on restricting the eigenvalues, λ, of the scatter matrices. Constraints are imposed so that the log-likelihood function of the spurious-outlier model can be maximized. A review of different robust clustering approaches is presented for comparison with TCLUST. The paper discusses the nature of the TCLUST algorithm, how to properly determine the number of clusters, and how to measure the strength of group assignments. Finally, the R package for TCLUST is used to implement the types of scatter restriction, making the algorithm more flexible in choosing the number of clusters and the trimming proportion.
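TCLUST builds on trimmed k-means, so a minimal Python sketch of plain trimmed k-means helps show the “crisp” trim-or-assign step. It deliberately omits the scatter matrices and eigenvalue restrictions that distinguish TCLUST. The function name, the caller-supplied starting centers, and the `alpha` parameter name for the trimming proportion are illustrative assumptions, not the API of the R package.

```python
import math

def trimmed_kmeans(points, init_centers, alpha=0.1, iters=50):
    """Trimmed k-means: in each round, trim the ceil(alpha * n) points farthest
    from their nearest center, then recompute centers from the kept points only.
    Returns (labels, centers); label -1 marks a trimmed (outlying) observation."""
    n, k = len(points), len(init_centers)
    n_trim = math.ceil(alpha * n)  # number of observations to discard
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(iters):
        nearest = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        dists = [math.dist(p, centers[c]) for p, c in zip(points, nearest)]
        # Keep the n - n_trim points closest to their nearest center
        order = sorted(range(n), key=lambda i: dists[i])
        kept = set(order[: n - n_trim])
        new_labels = [nearest[i] if i in kept else -1 for i in range(n)]
        if new_labels == labels:
            break  # assignments and trimming are stable
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centers

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (100, 100)]
labels, centers = trimmed_kmeans(pts, [(0, 0), (10, 10)], alpha=0.1)
print(labels)  # → [0, 0, 0, 1, 1, 1, -1]
```

Because the result depends on the starting centers, practical implementations run many random starts and keep the solution with the lowest trimmed objective.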

