eHMCOKE: an enhanced overlapping clustering algorithm for data analysis

2021 ◽  
Vol 10 (4) ◽  
pp. 2212-2222
Author(s):  
Alvincent E. Danganan ◽  
Edjie Malonzo De Los Reyes

The improved multi-cluster overlapping k-means extension (IMCOKE) uses the median absolute deviation (MAD) to detect outliers in datasets, which makes the algorithm more effective for overlapping clustering. However, the position at which MAD is applied was not analyzed. In this paper, the incorporation of MAD for outlier detection was analyzed to determine the appropriate position for identifying outliers before applying it in the clustering application. The study assumed that the size of clusters, and clusters that are close to each other, can lead to higher runtime performance in terms of overlapping clusters. Therefore, additional parameters, namely the radius of clusters and the distance between clusters, were added as measurements in the algorithm's procedures. Evaluation was done through experiments using synthetic and real datasets. The performance of eHMCOKE was evaluated via the F1-measure criterion, speed, and percentage of improvement. The results revealed that eHMCOKE takes less time to discover overlapping clusters, with an improvement rate of 22%, and achieved a best performance of 91.5% accuracy via F1-measure in identifying overlapping clusters, compared with the IMCOKE algorithm. These results show that eHMCOKE significantly outperforms the IMCOKE algorithm on most of the tests conducted.
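The abstract does not give the exact MAD rule used by IMCOKE or eHMCOKE, so the following is only a minimal sketch of a common MAD-based outlier test on one-dimensional data. The function name `mad_outliers`, the cutoff of 3.0, and the 1.4826 scaling constant are assumptions chosen for illustration, not values taken from the paper.

```python
from statistics import median

def mad_outliers(values, threshold=3.0):
    """Return one boolean per value: True if it looks like an outlier.

    Assumed rule (not from the paper): a value is an outlier when its
    distance from the median, divided by the scaled MAD, exceeds `threshold`.
    """
    med = median(values)
    # Median absolute deviation: median of the distances from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # MAD collapses to zero when most values coincide with the median;
        # this sketch then flags nothing rather than dividing by zero.
        return [False] * len(values)
    # 1.4826 makes MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    return [abs(v - med) / scale > threshold for v in values]

# Hypothetical sample: five readings near 10 and one extreme value.
data = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0]
flags = mad_outliers(data)
cleaned = [v for v, is_outlier in zip(data, flags) if not is_outlier]
print(flags)
print(cleaned)
```

Whether such a filter runs before clustering or at a later stage is the positioning question the paper studies; the sketch takes no stance on that.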

Author(s):  
Alvincent Egonia Danganan ◽  
Ariel M. Sison ◽  
Ruji P. Medina

In this paper, a new data analysis tool called Overlapping Clustering Application (OCA) is presented. It was developed to identify overlapping clusters and outliers in an unsupervised manner. The main function of OCA consists of three phases. The first phase detects abnormal values (outliers) in the datasets using the median absolute deviation. The second phase segments data objects into clusters using the k-means algorithm. The last phase identifies overlapping clusters; it uses maxdist (the maximum distance of data objects allowed in a cluster) as a predictor of data objects that can belong to multiple clusters. Experimental results revealed that OCA is capable of detecting overlapping clusters and outliers.
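The second and third phases can be sketched as follows, with the outlier-filtering phase omitted for brevity. This is an illustration under stated assumptions rather than OCA's implementation: `kmeans` and `overlapping_assign` are hypothetical helper names, the initial centroids are passed in explicitly so the run is reproducible, and maxdist is assumed to be the largest distance between any point and its nearest centroid after k-means converges.

```python
import math

def kmeans(points, centroids, iters=50):
    """Plain k-means on tuples; returns the final centroids."""
    k = len(centroids)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the column-wise mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def overlapping_assign(points, centroids):
    """Assign each point to every cluster whose centroid lies within maxdist."""
    dists = [[math.dist(p, c) for c in centroids] for p in points]
    # Assumption: maxdist is the largest point-to-nearest-centroid distance.
    maxdist = max(min(row) for row in dists)
    memberships = [[i for i, d in enumerate(row) if d <= maxdist]
                   for row in dists]
    return memberships, maxdist

# Hypothetical data: a tall cluster on the left, a tight one on the right.
points = [(0, -3), (0, 0), (0, 3), (2.5, 0), (3.5, 0), (4.5, 0)]
centroids = kmeans(points, [(0, 0), (4.5, 0)])
memberships, maxdist = overlapping_assign(points, centroids)
for p, m in zip(points, memberships):
    print(p, "-> clusters", m)
```

In this toy run the point (2.5, 0) is nearest to the right-hand centroid but also lies within maxdist of the left-hand one, so it is reported as a member of both clusters.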


2018 ◽  
Vol 3 (1) ◽  
pp. 001
Author(s):  
Zulhendra Zulhendra ◽  
Gunadi Widi Nurcahyo ◽  
Julius Santony

This study uses a data mining technique, namely K-Means Clustering. Data mining can be used to analyze fairly large datasets; here it aims to enable Indocomputer to understand and classify service data based on customer complaints, using the Weka software. The K-Means Clustering algorithm is used to predict or classify complaints about hardware damage at Indocomputer Payakumbuh. It can also identify the laptop brands most often serviced at Indocomputer Payakumbuh, as one recommendation to consumers when selecting a laptop.


2018 ◽  
Author(s):  
Arriel Benis ◽  
Nissim Harel ◽  
Refael Barak Barkan ◽  
Einav Srulovici ◽  
Calanit Key

BACKGROUND Data collected by health care organizations consist of medical information and documentation of interactions with patients through different communication channels. This enables the health care organization to measure various features of its performance such as activity, efficiency, adherence to a treatment, and different quality indicators. This information can be linked to sociodemographic, clinical, and communication data with the health care providers and administrative teams. Analyzing all these measurements together may provide insights into the different types of patient behaviors or, more accurately, into the different types of interactions patients have with the health care organizations. OBJECTIVE The primary aim of this study is to characterize usage profiles of the available communication channels with the health care organization. The main objective is to suggest new ways to encourage the usage of the most appropriate communication channel based on the patient’s profile. The first hypothesis is that the patient’s follow-up and clinical outcomes are influenced by the patient’s preferred communication channels with the health care organization. The second hypothesis is that the adoption of newly introduced communication channels between the patient and the health care organization is influenced by the patient’s sociodemographic or clinical profile. The third hypothesis is that the introduction of a new communication channel influences the usage of existing communication channels. METHODS All relevant data will be extracted from the data warehouse of Clalit Health Services, the largest health care management organization in Israel. The data analysis will follow a data mining approach, understood as a process of discovering new knowledge, and will process the extracted data with statistical methods, machine learning algorithms, and information visualization tools.
More specifically, we will mainly use the k-means clustering algorithm for discretization purposes and patients’ profile building, a hierarchical clustering algorithm, and heat maps for generating a visualization of the different communication profiles. In addition, patients’ interviews will be conducted to complement the information drawn from the data analysis phase with the aim of suggesting ways to optimize existing communication flows. RESULTS The project was funded in 2016. Data analysis is currently under way and the results are expected to be submitted for publication in 2019. Identification of patient profiles will allow the health care organization to improve its accessibility to patients and their engagement, which in turn will achieve a better treatment adherence, quality of care, and patient experience. CONCLUSIONS Defining solutions to increase patient accessibility to health care organization by matching the communication channels to the patient’s profile and to change the health care organization’s communication with the patient to a highly proactive one will increase the patient’s engagement according to his or her profile. INTERNATIONAL REGISTERED REPOR RR1-10.2196/10734


Author(s):  
E.B. Priyanka ◽  
S. Thangavel ◽  
Priyanka Prabhakaran

Oil and Gas Pipeline (OGP) projects face a wide range of wellbeing and security Risk Factors (RFs) around the world, especially in oil- and gas-producing nations with influencing climate conditions and unsampled data. A lack of data about the causes of pipeline risk and unstructured data about the security of the OGP hinder efforts to mitigate such risks. This paper therefore aims to develop a risk analysis framework based on a comprehensive approach of identifying, analyzing, and ranking the associated RFs, and of evaluating the possible pipeline characteristics and Hazard Mitigation Methods (HMMs), which are the initial steps of this approach. A new methodology has been developed to conduct failure analysis of pinhole erosion in pipelines, using the typical pipeline risk strategy and erosion environment simulations over the full life cycle of the pipeline. In the proposed work, manifold learning with a rank-based clustering algorithm is integrated with a cloud server for improved data analysis. The probability risk rate is identified from the burst pressure by clustering the normal and leak categories, to improve the accuracy of the prediction system tested on a lab-scale oil pipeline system. Numerical results such as the auto-correlation, periodogram, and Laplace-transformed P-P plot are used to assess the datasets restructured by the manifold learning approach. The experimental results show that the cloud server datasets are clustered with rank prioritization to support faster proactive decisions by distinguishing labelled and unlabelled pressure attributes.


2019 ◽  
Vol 31 (6) ◽  
pp. 525-538
Author(s):  
Rebekka Hoffmann ◽  
Anna Helga Jónsdóttir ◽  
Ebba Thora Hvannberg

Abstract Usability testing can involve multiple users and evaluators. In such cases, consolidating usability problems (UPs) constitutes an essential part of data analysis. In a between-subjects design, this study aims to re-examine a previous study by comparing the results of novice evaluators merging UPs individually vs. collaboratively and to assess the quality of the final UP lists, by computing the merging rate and the accuracy rate, respectively. Law and Hvannberg compared the results of evaluators merging UPs individually vs. collaboratively in a within-subjects design, revealing a tendency towards merging UPs in collaborative settings. In the present study, 45 novice evaluators consolidated four UP lists into a single UP master list while working alone or with a partner. The results showed no significant difference between evaluators in the two settings, suggesting that the UP consolidation process does not benefit from positive group decision effects.

