Stock Data Clustering of Food and Beverage Company

Author(s):  
Shofwatul Uyun ◽  
Subanar Subanar

Abstract: Cluster analysis can be defined as identifying groups of similar objects in order to discover the distribution of patterns and interesting correlations in large data sets. It is important in the fields of pattern recognition and pattern classification. Over the years, many methods have been developed for clustering data. In general, clustering methods fall into two categories: fuzzy clustering and hard clustering. Fuzzy C-Means is one of many clustering methods based on the fuzzy approach, while K-Means and K-Medoid are clustering methods based on the crisp approach. This study applies the Fuzzy C-Means, K-Means and K-Medoid methods to cluster stock data in a food and beverage company. The main goal is to find a clustering method that produces optimal clusters. The resulting clusters are validated using Dunn's Index (DI). It is expected that the results of this research can be used to support decision making in the food and beverage company.

Keywords: Clustering, Fuzzy C-Means, K-Means, K-Medoid, Cluster Validity, Dunn's Index (DI)
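The abstract does not spell out how Dunn's Index is computed. A minimal sketch, assuming the common definition (smallest distance between points of different clusters divided by the largest cluster diameter) and Euclidean distance, could look like the following. The function name `dunn_index` and the toy points are illustrative only and are not taken from the study.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn's Index for a hard partition.

    clusters: list of clusters, each a list of equal-length coordinate tuples.
    Assumed definition: min inter-cluster point distance / max cluster diameter.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point
    return min_separation / max_diameter


# Two partitions of the same four points: compact clusters score higher.
compact = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
stretched = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]
print(dunn_index(compact), dunn_index(stretched))
```

A larger value indicates compact, well-separated clusters, which is why the index can be used to compare the partitions produced by different methods on the same data.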

2021 ◽  
Author(s):  
Sebastiaan Valkiers ◽  
Max Van Houcke ◽  
Kris Laukens ◽  
Pieter Meysman

The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. As of yet, the rules for antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity remain limited in performance and scalability. Multiple methodologies have been developed, but all of them fail to efficiently cluster large data sets exceeding 1 million sequences. To address this limitation, we developed clusTCR, a rapid TCR clustering alternative that efficiently scales up to millions of CDR3 amino acid sequences. Benchmarking comparisons revealed similar accuracy of clusTCR with other TCR clustering methods. clusTCR offers a drastic improvement in clustering speed, which allows clustering of millions of TCR sequences in just a few minutes through efficient similarity searching and sequence hashing. clusTCR was written in Python 3. It is available as an Anaconda package (https://anaconda.org/svalkiers/clustcr) and on GitHub (https://github.com/svalkiers/clusTCR).


Author(s):  
B. K. Tripathy ◽  
Hari Seetha ◽  
M. N. Murty

Data clustering plays a very important role in data mining, machine learning and image processing. As modern databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed. These include fuzzy c-means, rough c-means and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models, such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. There are also many variants that improve these algorithms in different directions, such as their kernelised, possibilistic, and possibilistic kernelised versions. However, none of the above algorithms is effective on big data, for various reasons. Researchers have therefore been trying for the past few years to improve these algorithms so that they can be applied to cluster big data. Such algorithms are relatively few in comparison to those for datasets of reasonable size. Our aim in this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms which can be developed further.
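For readers unfamiliar with fuzzy c-means, the sketch below shows the standard membership update for a single data point: each centre receives a degree of membership between 0 and 1 that falls as its distance grows. It illustrates plain fuzzy c-means only, not the rough, intuitionistic or kernelised variants listed in the chapter. The function name `fcm_memberships` and the default fuzzifier `m=2.0` are assumptions made for this example.

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy c-means memberships of one point.

    distances: distance from the point to each cluster centre.
    m: fuzzifier (> 1); larger values give softer memberships.
    Uses u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)).
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    # A point lying exactly on one or more centres belongs fully to them.
    on_centre = [d == 0 for d in distances]
    if any(on_centre):
        share = 1.0 / sum(on_centre)
        return [share if hit else 0.0 for hit in on_centre]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# A point three times closer to the first centre than to the second.
print(fcm_memberships([1.0, 3.0]))
```

The memberships of a point always sum to 1, which is what distinguishes this scheme from the possibilistic versions mentioned above.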


Author(s):  
Terence Kwok ◽  
Kate Smith ◽  
Sebastian Lozano ◽  
David Taniar

2007 ◽  
Vol 17 (01) ◽  
pp. 71-103 ◽  
Author(s):  
NARGESS MEMARSADEGHI ◽  
DAVID M. MOUNT ◽  
NATHAN S. NETANYAHU ◽  
JACQUELINE LE MOIGNE

Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
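To give a feel for why a kd-tree speeds up nearest-centre queries, here is a minimal pure-Python sketch of a kd-tree with a pruned nearest-neighbour search. It is not the authors' algorithm: the paper stores the data points in the kd-tree, whereas this simplified example indexes a handful of cluster centres and looks up the closest one for a query point. The names `build_kdtree` and `nearest` and the sample coordinates are hypothetical.

```python
from math import dist


def build_kdtree(points, depth=0):
    """Recursively build a kd-tree from a list of equal-length tuples."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the coordinate axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2  # the median point becomes the node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return the stored point closest to query (Euclidean distance)."""
    if node is None:
        return best
    if best is None or dist(query, node["point"]) < dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (
        (node["left"], node["right"]) if diff < 0
        else (node["right"], node["left"])
    )
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < dist(query, best):
        best = nearest(far, query, best)
    return best


centres = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
tree = build_kdtree(centres)
print(nearest(tree, (6, 4)))
```

The pruning test on the splitting plane is what lets the search skip whole subtrees instead of measuring the distance to every stored point.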


JUTI UNISI ◽  
2020 ◽  
Vol 4 (1) ◽  
pp. 1-8
Author(s):  
Abdul Muni

PT. Alpa Scorpii is a private-sector company engaged in motorcycle sales. Its data are not fully utilized: sales reports are used only for reporting. A promotion strategy aims to increase the company's income and is directly related to cost. Data mining turns existing data into knowledge extracted from large data sets, a process also known as knowledge discovery or pattern recognition. Of the many methods in data mining, this study uses the K-Means clustering algorithm. Clustering the data allows the marketing division to target its motorcycle sales promotion strategy at new customers appropriately and so improve corporate earnings.


Author(s):  
Frank Klawonn ◽  
Olga Georgieva

Most clustering methods have to face the problem of characterizing good clusters among noisy data. The arbitrary noise points that simply do not belong to any class being searched for are of real concern. Outliers, or noise data points, are data that severely deviate from the pattern set by the majority of the data, while rounding and grouping errors result from the inherent inaccuracy in the collection and recording of data. In fact, a single outlier can completely spoil the least squares (LS) estimate and thus the results of most LS-based clustering techniques such as the hard C-means (HCM) and the fuzzy C-means (FCM) algorithms (Bezdek, 1999).
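The effect of a single outlier on a least squares estimate can be seen with a few numbers. In one dimension the LS estimate of a cluster centre is the arithmetic mean, so the sketch below compares the mean of a small sample before and after one extreme value is added. The median is shown purely as a familiar robust contrast; it is an addition for illustration and not a method discussed in the text. The function names and data are made up for this example.

```python
from statistics import mean, median


def ls_centre(values):
    """Least squares centre: the c that minimises sum((x - c) ** 2)."""
    return mean(values)


def median_centre(values):
    """Median, used here only as a robust point of comparison."""
    return median(values)


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
spoiled = clean + [1000.0]  # one gross outlier

print(ls_centre(clean), ls_centre(spoiled))
print(median_centre(clean), median_centre(spoiled))
```

With the outlier included, the LS centre moves far outside the range of the five original values, while the median shifts only from 3.0 to 3.5.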


Clustering is a suitable approach for classifying information when no prior information about class labels exists, and in recent years it has been combined with promising technologies such as cloud computing and big data. Research attention has gradually built up around unsupervised methods such as clustering for extracting useful information from the available data set. Time series clustering has been used in most technical domains to extract information-rich patterns that support data analysis of complicated as well as large data sets. Classification is mostly not feasible for large datasets, whereas clustering can address the problem with the aid of unsupervised techniques. The proposed methodology focuses on time series health care datasets, one of the most popular kinds of data in clustering approaches. This summary presents four major components of time series approaches.


Author(s):  
Raymond Greenlaw ◽  
Sanpawat Kantabutra

This chapter provides the reader with an introduction to clustering algorithms and applications. A number of important well-known clustering methods are surveyed. The authors present a brief history of the development of the field of clustering, discuss various types of clustering, and mention some of the current research directions in the field of clustering. Algorithms are described for top-down and bottom-up hierarchical clustering, as are algorithms for K-Means clustering and for K-Medians clustering. The technique of representative points is also presented. Given the large data sets involved with clustering, the need to apply parallel computing to clustering arises, so they discuss issues related to parallel clustering as well. Throughout the chapter references are provided to works that contain a large number of experimental results. A comparison of the various clustering methods is given in tabular format. They conclude the chapter with a summary and an extensive list of references.


Author(s):  
John A. Hunt

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets which are identical for each method. This paper is concerned with comparing methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10at%Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024-channel EELS spectra offset in energy by 1 eV were recorded and stored at each pixel in the 80x80 spectrum-image (25 Mbytes). An energy range of 39-89 eV (20 channels/eV) is represented. During processing the spectra are either subtracted to create an artifact-corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].
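The two processing paths can be illustrated with simple list arithmetic. At 20 channels/eV, a 1 eV offset corresponds to 20 channels, so removing the offset amounts to shifting one spectrum by 20 channels before adding. The sketch below assumes that channel `i + 20` of the second spectrum records the same energy as channel `i` of the first; the direction of the shift, the function names and the synthetic data are assumptions, and the actual procedure is the one described in the paper's reference [1].

```python
CHANNELS_PER_EV = 20
OFFSET_EV = 1
SHIFT = CHANNELS_PER_EV * OFFSET_EV  # 20 channels


def difference_spectrum(first, second):
    """Channel-by-channel subtraction of the two offset spectra."""
    if len(first) != len(second):
        raise ValueError("spectra must have the same number of channels")
    return [a - b for a, b in zip(first, second)]


def summed_spectrum(first, second, shift=SHIFT):
    """Undo the energy offset, then add the overlapping channels.

    Assumes second[i + shift] and first[i] correspond to the same energy.
    """
    if len(first) != len(second):
        raise ValueError("spectra must have the same number of channels")
    return [first[i] + second[i + shift] for i in range(len(first) - shift)]


# Synthetic 1024-channel ramp and a copy displaced by the assumed shift.
first = [float(i) for i in range(1024)]
second = [0.0] * SHIFT + first[:-SHIFT]

print(len(summed_spectrum(first, second)))
print(difference_spectrum(first, second)[100])
```

Only the channels covered by both spectra survive the addition, so in this sketch the summed spectrum is 20 channels shorter than the recorded ones.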

