Extended Advanced Method of Clustering Big Data to Achieve High Dimensionality

Clustering is one of the most relevant knowledge-engineering methods of data analysis, and the choice of clustering method directly affects the resulting clusters. The proposed work develops an Extended Advanced Method of Clustering (EAMC) to address the numerous issues associated with large, high-dimensional datasets. EAMC avoids repeatedly recomputing the distance between each data object and the cluster containing it, which saves execution time. At each iteration, EAMC keeps a data structure storing information that can be reused in the next iteration. The outcomes obtained with the proposed method demonstrate improvements in the effectiveness, speed, and precision of clustering, reducing computational complexity relative to older algorithms such as SOM, HAC, and K-means. This paper presents EAMC together with experimental results on academic datasets.
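The caching idea the abstract describes, storing each object's distance to its assigned cluster so the full distance scan need not be repeated every pass, can be sketched as below. This is an illustrative reconstruction under our own assumptions (function name, skip rule, and stopping criterion are ours, not the paper's EAMC code):

```python
import numpy as np

def cached_kmeans(X, k, max_iter=50, seed=0):
    """Illustrative K-means variant with per-point distance caching:
    if a point's distance to its previously assigned centroid has not
    grown after the centroid update, the scan over all k centroids is
    skipped for that point (a heuristic, not exact standard K-means)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    cache = dists[np.arange(len(X)), labels]   # distance to own centroid
    for _ in range(max_iter):
        for j in range(k):                     # standard centroid update
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        moved = 0
        for i in range(len(X)):
            d_own = np.linalg.norm(X[i] - centers[labels[i]])
            if d_own <= cache[i]:              # centroid got closer: keep label
                cache[i] = d_own
                continue
            d_all = np.linalg.norm(centers - X[i], axis=1)
            j = int(d_all.argmin())
            if j != labels[i]:
                moved += 1
            labels[i], cache[i] = j, d_all[j]
        if moved == 0:                         # no reassignments: converged
            break
    return labels, centers
```

The saving comes from the skip branch: points that stay well inside their cluster touch one centroid per iteration instead of all k.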

2019
Vol 19 (1)
pp. 1-4
Author(s):
Ivan Gavrilyuk
Boris N. Khoromskij

Abstract: The most important computational problems nowadays are those related to the processing of large data sets and to the numerical solution of high-dimensional integro-differential equations. These problems arise in numerical modeling in quantum chemistry, materials science, and multiparticle dynamics, as well as in machine learning, computer simulation of stochastic processes, and many other applications related to big data analysis. Modern tensor numerical methods enable the solution of multidimensional partial differential equations (PDEs) in $\mathbb{R}^{d}$ by reducing them to one-dimensional calculations. They thus avoid the so-called "curse of dimensionality", i.e., the exponential growth of computational complexity in the dimension d, in the course of the numerical solution of high-dimensional problems. At present, both tensor numerical methods and the multilinear algebra of big data continue to expand actively into further theoretical and applied research topics. This issue of CMAM is devoted to recent developments in the theory of tensor numerical methods and their applications in scientific computing and data analysis. Current activities in this emerging field on the effective numerical modeling of temporal and stationary multidimensional PDEs and beyond are presented in the following ten articles, and some future trends are highlighted therein.


Author(s):  
Vidadi Akhundov

In this study, attention is drawn to the under-explored area of strategic content analysis and the development of strategic vision for managers, with the supporting role of interpreting visualized big data to apply appropriate knowledge-management strategies in regional companies. The study suggests improved models that can be used to process data and apply solutions to Big Data. The paper proposes a model of business processes in the region in the context of information clusters, which become the object of analysis under conditions of active accumulation of big data about the external and internal environment. Research has shown that traditional econometric and data-collection techniques cannot be directly applied to Big Data analysis because of computational volatility or computational complexity. The paper briefly describes the essence of associative and causal data-analysis methods and the problems that complicate their application to Big Data. A scheme for the accelerated search of a set of causal relationships is described. The use of semantically structured models, cause-effect models, and the K-clustering method for decision making on big data is practical and ensures the adequacy of the results. The article explains the stages of applying these models in practice. In the course of the study, content analysis was carried out using the main methods of processing structured data, taking the countries of the world as an example and using synthetic indicators that show the trends of Industry 4.0. When assessing Industry 4.0 technologies by region, the diversity of country-grouping attributes should be considered; the countries of the world were therefore compared in two groups. For the first group, the developed countries, the results are presented in tabular form; for the second group, the results are presented in explanatory form.
In the process of assessing Industry 4.0 technologies, the following statistical indicators were used: "The share of medium and high-tech activities", "Competitiveness indicators", "Results in the field of knowledge and technology", "The share of medium and high-tech production in the total value added in the manufacturing industry", and "Industrial Competitiveness Index (CIP score)". Based on the analysis of these indicators, a rating of the countries was determined. The reasons for the computational difficulties encountered when processing Big Data are given in the concluding part of the article.
Keywords: K-clustering method, causal links, data point, Euclidean distance
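The distance computation named in the keywords can be shown with a toy example. The indicator values, country labels, and group centroids below are invented for illustration (they are not the article's data); each country is a vector of normalized Industry 4.0 indicators assigned to the group whose centroid is nearest in Euclidean distance.

```python
import numpy as np

# Hypothetical normalized indicator vectors, e.g. (high-tech share,
# competitiveness, knowledge-and-technology output); invented numbers.
countries = {
    "A": np.array([0.90, 0.80, 0.85]),
    "B": np.array([0.20, 0.30, 0.25]),
    "C": np.array([0.85, 0.75, 0.90]),
}
centroids = {
    "developed":  np.array([0.90, 0.80, 0.90]),
    "developing": np.array([0.30, 0.30, 0.30]),
}

def assign(vec):
    # nearest group centroid under the Euclidean metric
    return min(centroids, key=lambda g: np.linalg.norm(vec - centroids[g]))

groups = {name: assign(v) for name, v in countries.items()}
```

A full K-clustering run would additionally recompute the centroids from the current members and repeat the assignment until the grouping stabilizes; the single assignment step above is the core of each pass.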


2018
Author(s):
Subarna Palit
Fabian J. Theis
Christina E. Zielinski

Abstract: Recent advances in cytometry have radically altered the fate of single-cell proteomics by allowing a more accurate understanding of complex biological systems. Mass cytometry (CyTOF) provides simultaneous single-cell measurements that are crucial to understanding cellular heterogeneity and identifying novel cellular subsets. High-dimensional CyTOF data were traditionally analyzed by gating on bivariate dot plots, which is not only laborious, given the quadratic increase of complexity with dimension, but also biased by manual gating. This review aims to discuss the impact of new analysis techniques for in-depth insight into the dynamics of immune regulation obtained from static snapshot data, and to provide immunologists with tools to address the high dimensionality of their single-cell data.
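As a generic sketch of the kind of dimensionality reduction such tools apply (synthetic data, not CyTOF measurements, and plain PCA rather than any specific method from the review): projecting cells measured on many markers onto a few principal axes surfaces a shifted subset in one view, instead of requiring inspection of every bivariate dot plot.

```python
import numpy as np

rng = np.random.default_rng(0)
cells = rng.normal(size=(200, 40))   # 200 synthetic cells x 40 markers
cells[:100, :5] += 4.0               # one subset shifted in 5 markers

X = cells - cells.mean(axis=0)       # center each marker
U, S, Vt = np.linalg.svd(X, full_matrices=False)
embedding = X @ Vt[:2].T             # coordinates on the top-2 components
```

With 40 markers there are 780 bivariate plots to gate; the two-column embedding summarizes the dominant structure in one.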


Electronics
2021
Vol 10 (23)
pp. 2984
Author(s):
Masurah Mohamad
Ali Selamat
Ondrej Krejcar
Ruben Gonzalez Crespo
Enrique Herrera-Viedma
...  

This study proposes an alternative data extraction method, named CFS-DRSA, that combines three well-known feature selection methods for handling large and problematic datasets: correlation-based feature selection (CFS), best first search (BFS), and the dominance-based rough set approach (DRSA). The study aims to enhance the classifier's performance in decision analysis by eliminating uncorrelated and inconsistent data values, thereby reducing computational time complexity and increasing classification accuracy. The proposed method comprises several phases executed in sequence, the main phases incorporating two crucial feature extraction tasks. The first is data reduction, which implements the CFS method with the BFS algorithm; the second is data selection, which applies the DRSA to generate the optimized dataset. Several datasets with various characteristics and volumes were used in the experimental process to evaluate the proposed method's credibility. The method's performance was validated using standard evaluation measures and benchmarked against other established methods such as deep learning (DL). Overall, the proposed work could assist the classifier in returning a significant result, with an accuracy rate of 82.1% for the neural network (NN) classifier, compared with 66.5% for the support vector machine (SVM) and 49.96% for DL. The one-way analysis of variance (ANOVA) statistical result indicates that the proposed method is an alternative extraction tool for those who find big data analysis tools expensive to acquire and for those who are new to the data analysis field.
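The CFS step named in the first phase is commonly formulated around a merit score that favors features correlated with the class but not with each other. The sketch below is our illustration of that idea, not the authors' code: it uses Pearson correlation as the association measure and a simple greedy forward search standing in for the best first search the paper pairs with CFS.

```python
import numpy as np

def merit(X, y, subset):
    """CFS-style merit: k*r_cf / sqrt(k + k*(k-1)*r_ff), where r_cf is the
    mean feature-class correlation and r_ff the mean feature-feature
    correlation over the subset (absolute Pearson values)."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y):
    # Greedy forward search: add the feature that raises the merit most,
    # stop when no candidate improves it (a simplification of BFS).
    selected, remaining, best = [], list(range(X.shape[1])), -np.inf
    while remaining:
        score, j = max((merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected
```

On data with a redundant copy of an informative feature, the merit's denominator penalizes adding the duplicate, which is the uncorrelated-and-inconsistent-value pruning the abstract describes.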


2019
Vol 9 (1)
pp. 01-12
Author(s):
Kristy F. Tiampo
Javad Kazemian
Hadi Ghofrani
Yelena Kropivnitskaya
Gero Michel
