Effect of Distance Metrics in Determining K-Value in K-Means Clustering Using Elbow and Silhouette Method

Author(s):  
Danny Matthew SAPUTRA ◽  
Daniel SAPUTRA ◽  
Liniyanti D. OSWARI
Author(s):  
Durmuş Özkan Şahin ◽  
Sedat Akleylek ◽  
Erdal Kılıç

Mobile device usage has increased remarkably in recent years. The Android operating system is by far the most preferred open-source mobile operating system in the world. Android is also preferred in many Internet of Things (IoT) devices, which are used in many areas of daily life. Smart cities, smart environments, health, home automation, agriculture, and livestock are some of the usage areas, and health is one of the most frequent. Because Android is both widely used and open-source, the vast majority of malware released is now designed for Android platforms. Therefore, devices using the Android operating system are under serious threat. In this study, a machine learning-based system that detects malware on Android operating systems is proposed. Feature vectors are created from permissions, which have an important place in the security of the Android operating system. These feature vectors are given as input to the k-nearest neighbor (KNN) algorithm, one of the machine learning techniques, which classifies software as malicious or benign. In the KNN algorithm, the k value and the distance metric used to find the closest sample directly affect classification performance. In addition, few permission-based studies examine the parameters of the KNN algorithm in detail. For this reason, the performance of the malware detection system is compared using five different k values and five different distance metrics on different data sets. The results show that higher classification performance is obtained when k is given values such as 1 or 3 and metrics such as Euclidean and Minkowski are chosen instead of the Chebyshev distance metric.
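The role of k and the distance metric described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the permission vectors, labels, and function names below are hypothetical, and only four of the possible metrics are shown because the abstract does not list all five.

```python
from collections import Counter

def minkowski(a, b, p=3):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def euclidean(a, b):
    return minkowski(a, b, p=2)

def manhattan(a, b):
    return minkowski(a, b, p=1)

def chebyshev(a, b):
    """Largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k, metric):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: metric(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical binary permission vectors (1 = permission requested).
train_X = [(1, 1, 1, 0), (1, 1, 0, 1), (1, 1, 1, 1),
           (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0)]
train_y = ["malware", "malware", "malware", "benign", "benign", "benign"]
query = (1, 1, 0, 0)

for metric in (euclidean, manhattan, minkowski, chebyshev):
    for k in (1, 3):
        label = knn_predict(train_X, train_y, query, k, metric)
        print(metric.__name__, k, label)
```

On binary vectors the Chebyshev distance can only be 0 or 1, so it cannot rank non-identical neighbors. This is a property of the metric itself; the abstract does not say whether it explains the reported results.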


2003 ◽  
Vol 765 ◽  
Author(s):  
S. Van Elshocht ◽  
R. Carter ◽  
M. Caymax ◽  
M. Claes ◽  
T. Conard ◽  
...  

Abstract: Because of aggressive downscaling to increase transistor performance, the physical thickness of the SiO2 gate dielectric is rapidly approaching the limit where it will only consist of a few atomic layers. As a consequence, this will result in very high leakage currents due to direct tunneling. To allow further scaling, materials with a k-value higher than SiO2 (“high-k materials”) are explored, such that the thickness of the dielectric can be increased without degrading performance. Based on our experimental results, we discuss the potential of MOCVD-deposited HfO2 to scale to (sub)-1-nm EOTs (Equivalent Oxide Thickness). A primary concern is the interfacial layer that is formed between the Si and the HfO2, during the MOCVD deposition process, for both H-passivated and SiO2-like starting surfaces. This interfacial layer will, because of its lower k-value, significantly contribute to the EOT and reduce the benefit of the high-k material. In addition, we have experienced serious issues integrating HfO2 with a polySi gate electrode at the top interface depending on the process conditions of polySi deposition and activation anneal used. Furthermore, we have determined, based on a thickness series, the k-value for HfO2 deposited at various temperatures and found that the k-value of the HfO2 depends upon the gate electrode deposited on top (polySi or TiN). Based on our observations, the combination of MOCVD HfO2 with a polySi gate electrode will not be able to scale below the 1-nm EOT marker. The use of a metal gate, however, does show promise to scale down to very low EOT values.
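The contribution of the interfacial layer to the EOT can be made concrete with the standard series-capacitor relation for a two-layer gate stack. The thicknesses and k-values in the worked line are illustrative assumptions, not measurements from this abstract.

```latex
\mathrm{EOT}
  = t_{\mathrm{IL}}\,\frac{k_{\mathrm{SiO_2}}}{k_{\mathrm{IL}}}
  + t_{\mathrm{HfO_2}}\,\frac{k_{\mathrm{SiO_2}}}{k_{\mathrm{HfO_2}}},
  \qquad k_{\mathrm{SiO_2}} \approx 3.9

% Illustrative values (assumed): t_IL = 0.5 nm with k_IL = 3.9,
% t_HfO2 = 3 nm with k_HfO2 = 20
\mathrm{EOT} = 0.5\,\mathrm{nm} + 3\,\mathrm{nm}\times\frac{3.9}{20}
             = 0.5\,\mathrm{nm} + 0.585\,\mathrm{nm}
             = 1.085\,\mathrm{nm}
```

Under these assumed numbers, a 0.5 nm SiO2-like interfacial layer accounts for nearly half of the total EOT even though it is only one seventh of the physical stack thickness.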


Author(s):  
Noorma Rosita ◽  
Dewi Haryadi ◽  
Tristiana Erawati ◽  
Rossa Nanda ◽  
Widji Soeratri

The aim of this study was to investigate the ability of NLC to increase the photostability of tomato extract in terms of antioxidant activity. Photostability testing of the antioxidant activity of the samples was conducted by an accelerated method using UVB radiation of 32.400 joule over 21 hours. Antioxidant activity was measured by the DPPH method. NLC was made by the High Shear Homogenization (HPH) method at 24000 rpm for 4 cycles, while the conventional cream was made at a low speed of 400 rpm. The products were characterized by pH, viscosity, and particle size. They differed in their characteristics and physical stability: NLC had a smaller particle size and was more homogeneous and more stable than the conventional cream. The stability of the antioxidant activity of tomato extract was higher in the NLC system than in the conventional cream. This was shown by the k value, the rate constant of the decrease in scavenging activity (antioxidant power) over time (Sigma 2-tail less than 0.005), which for NLC and conventional cream was 2.03 x 10^-2 %/hour ±0.08 (3.94) and 4.71 x 10^-2 %/hour ±0.23 (4.88), respectively.
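A rate constant in %/hour suggests a linear decline of activity with time. The sketch below estimates such a k as the negative least-squares slope. The linear model is an assumption (the abstract does not state the kinetic model), and the sample times and activity values are hypothetical.

```python
def fit_rate_constant(times, values):
    """Return k = -slope of the least-squares line through (time, value) pairs."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    variance = sum((t - mean_t) ** 2 for t in times)
    return -covariance / variance

# Hypothetical scavenging activity (%) measured during 21 hours of irradiation.
hours = [0, 7, 14, 21]
activity = [90.0, 89.86, 89.72, 89.58]

k = fit_rate_constant(hours, activity)
print(f"k = {k:.4f} %/hour")
```

A smaller k means the antioxidant activity decays more slowly, which is how the abstract compares the two formulations.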


2020 ◽  
Vol 4 (2) ◽  
pp. 377-383
Author(s):  
Eko Laksono ◽  
Achmad Basuki ◽  
Fitra Bachtiar

There are many cases of email abuse that have the potential to harm others. This email abuse is commonly known as spam, which contains advertisements, phishing scams, and even malware. This study aims to classify email as spam or ham using the KNN method in an effort to reduce the amount of spam. KNN can classify an email as spam or ham by checking it with different K values. Evaluation of the classification using a confusion matrix showed that the KNN method with K = 1 had the highest accuracy, 91.4%. The study also shows that optimizing the K value in KNN using frequency distribution clustering can produce an accuracy of 100%, while k-means clustering produces an accuracy of 99%. Based on these accuracy values, frequency distribution clustering and k-means clustering can be used to find the optimal K value for KNN in the classification of spam emails.
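Comparing K values by confusion-matrix accuracy can be sketched as follows. The labels and the per-K predictions are hypothetical placeholders for classifier output; the clustering-based optimization of K mentioned in the abstract is not reproduced here.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def best_k(predictions_by_k, y_true):
    """Score every K and return the K with the highest accuracy plus all scores."""
    scores = {k: accuracy(*confusion_counts(y_true, preds))
              for k, preds in predictions_by_k.items()}
    return max(scores, key=scores.get), scores

# Hypothetical ground truth and KNN predictions for three K values.
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
predictions_by_k = {
    1: ["spam", "spam", "ham", "ham", "spam", "spam"],
    3: ["spam", "ham", "ham", "ham", "spam", "spam"],
    5: ["ham", "ham", "ham", "ham", "spam", "spam"],
}

chosen_k, scores = best_k(predictions_by_k, y_true)
print(chosen_k, scores)
```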


2020 ◽  
Vol 16 (3) ◽  
pp. 262-269
Author(s):  
Tahere Talebi Azad Boni ◽  
Haleh Ayatollahi ◽  
Mostafa Langarizadeh

Background: One of the greatest challenges in the field of medicine is the increasing burden of chronic diseases, such as diabetes. Diabetes may cause several complications, such as kidney failure which is followed by hemodialysis and an increasing risk of cardiovascular diseases. Objective: The purpose of this research was to develop a clinical decision support system for assessing the risk of cardiovascular diseases in diabetic patients undergoing hemodialysis by using a fuzzy logic approach. Methods: This study was conducted in 2018. Initially, the views of physicians on the importance of assessment parameters were determined by using a questionnaire. The face and content validity of the questionnaire was approved by the experts in the field of medicine. The reliability of the questionnaire was calculated by using the test-retest method (r = 0.89). This system was designed and implemented by using MATLAB software. Then, it was evaluated by using the medical records of diabetic patients undergoing hemodialysis (n=208). Results: According to the physicians' point of view, the most important parameters for assessing the risk of cardiovascular diseases were glomerular filtration, duration of diabetes, age, blood pressure, type of diabetes, body mass index, smoking, and C reactive protein. The system was designed and the evaluation results showed that the values of sensitivity, accuracy, and validity were 85%, 92% and 90%, respectively. The K-value was 0.62. Conclusion: The results of the system were largely similar to the patients’ records and showed that the designed system can be used to help physicians to assess the risk of cardiovascular diseases and to improve the quality of care services for diabetic patients undergoing hemodialysis. By predicting the risk of the disease and classifying patients in different risk groups, it is possible to provide them with better care plans.


2019 ◽  
Vol 40 (1) ◽  
pp. 7
Author(s):  
Marcelo Silveira de Farias ◽  
José Fernando Schlosser ◽  
Javier Solis Estrada ◽  
Gismael Francisco Perin ◽  
Alfran Tellechea Martini

The growing global demand for energy, the decrease in petroleum reserves, and current environmental contamination problems make it imperative to study renewable energy sources for use in internal combustion engines, in order to decrease dependence on fossil fuels and reduce emissions of pollutant gases. This study aimed to evaluate the emissions of a diesel-cycle engine of an agricultural tractor that uses diesel S500 (B5) mixed with 3, 6, 9, 12 and 15% of hydrous ethanol. It determined emissions of CO2 (ppm), NOx (ppm), and opacity (k value) of gases. A standard procedure was applied considering eight operating modes (M1, M2, M3, M4, M5, M6, M7, and M8), with braking by an electric dynamometer in a laboratory. The experimental design was completely randomized, with 60 replicates and a 6 x 8 factorial design. Greater opacity and gas emissions were observed when the engine operated with 3% ethanol, while lower emissions occurred with 12 and 15%. With these fuels, the reduction of opacity, CO2, and NOx, in relation to diesel oil, was 24.49 and 26.53%, 4.96 and 5.15%, and 6.59 and 9.70%, respectively. In conclusion, the addition of 12 and 15% ethanol in diesel oil significantly reduces engine emissions.


2020 ◽  
pp. 1-12
Author(s):  
Ayla Gülcü ◽  
Sedrettin Çalişkan

The collateral mechanism in the Electricity Market ensures that payments are executed in a timely manner, thus maintaining continuous cash flow. In order to value collaterals, Takasbank, the authorized central settlement bank, creates segments of the market participants by considering their short-term and long-term debt/credit information arising from all market activities. In this study, the data regarding participants’ daily and monthly debt payment and penalty behaviors is analyzed with the aim of discovering high-risk participants that frequently fail to clear their debts on time. Different clustering techniques along with different distance metrics are considered to obtain the best clustering. Moreover, data preprocessing techniques along with Recency, Frequency, Monetary Value (RFM) scoring have been used to determine the best representation of the data. The results show that Agglomerative Clustering with cosine distance achieves the best-separated clustering when the non-normalized dataset is used; this is also acknowledged by a domain expert.
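Agglomerative clustering with cosine distance can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the study's pipeline: average linkage is assumed because the abstract does not name the linkage, and the RFM-style vectors are hypothetical.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def agglomerative(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [cosine_distance(points[a], points[b])
                         for a in clusters[i] for b in clusters[j]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]

# Hypothetical (recency, frequency, monetary) vectors, deliberately not normalized.
points = [(1, 10, 100), (2, 20, 200), (30, 1, 5), (60, 2, 10)]

groups = agglomerative(points, 2)
print(groups)
```

Cosine distance depends only on the direction of a vector, so participants with the same behavior profile at different scales fall into the same cluster without normalization.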

