Text Clustering Using PSO Based Dynamic Adaptive SOM for Detecting Emergent Trends

Emergent Trend Detection (ETD), a principal application of text mining, detects and identifies new trends in a corpus. This article discusses the influence of Particle Swarm Optimization (PSO) on Dynamic Adaptive Self Organizing Maps (DASOM) in the design of an efficient ETD scheme, where PSO optimizes the neural parameters of the network. This hybrid machine learning scheme is designed to achieve maximum accuracy with minimum computational time. The efficiency and scalability of the proposed scheme are analyzed and compared with standard algorithms such as SOM, DASOM and linear regression analysis. The system is trained and tested on the DBLP database (University of Trier, Germany). The article establishes the superiority of the hybrid DASOM algorithm over these well-known algorithms in handling high-dimensional, large-scale data to detect emergent trends in the corpus.
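As a rough illustration of how PSO can tune network parameters, the following minimal sketch runs a standard global-best PSO over two hypothetical parameters (a learning rate and a neighbourhood radius). The objective `toy_error`, the parameter names, the bounds and the coefficients `w`, `c1`, `c2` are all assumptions made for this example; the abstract does not specify the DASOM parameters or the PSO settings the authors used.

```python
import random

def pso(objective, dim, n_particles=20, iters=100, lo=0.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                # Move and clamp to the search bounds.
                pos[i][k] = min(hi, max(lo, pos[i][k] + vel[i][k]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_error(params):
    """Hypothetical stand-in for a map's quantization error (assumption)."""
    learning_rate, radius = params
    return (learning_rate - 0.3) ** 2 + (radius - 2.0) ** 2

best, best_val = pso(toy_error, dim=2)
```

In a real setting, `toy_error` would be replaced by a function that trains the map with the candidate parameters and returns a validation error, which is far more expensive per evaluation than this toy objective.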

Mercator: a pipeline for multi-method, unsupervised visualization and distance generation

Bioinformatics ◽

10.1093/bioinformatics/btab037 ◽

2021 ◽

Author(s):

Zachary B Abrams ◽

Caitlin E Coombes ◽

Suli Li ◽

Kevin R Coombes

Keyword(s):

Large Scale ◽

R Package ◽

High Dimensional ◽

Vast Number ◽

Large Scale Data ◽

User Friendly ◽

Exploratory Pattern ◽

Scale Data ◽

Selection Of ◽

Publication Quality

Summary: Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with their own strengths and weaknesses. The Mercator R package facilitates selection of a biologically meaningful distance from 10 metrics, together appropriate for binary, categorical and continuous data, and visualization with 5 standard and high-dimensional graphics tools. Mercator provides a user-friendly pipeline for informaticians or biologists to perform unsupervised analyses, from exploratory pattern recognition to production of publication-quality graphics. Availability and implementation: Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).

A study on high dimensional large-scale data visualization

Korean Journal of Applied Statistics ◽

10.5351/kjas.2016.29.6.1061 ◽

2016 ◽

Vol 29 (6) ◽

pp. 1061-1075

Author(s):

Eun-Kyung Lee ◽

Nayoung Hwang ◽

Yoondong Lee

Keyword(s):

Data Visualization ◽

Large Scale ◽

High Dimensional ◽

Large Scale Data ◽

Scale Data

Trust, Religiousity, Income, Quality of Accounting Information, and Muzaki Decision to Pay Zakat

JURNAL AKUNTANSI DAN KEUANGAN ISLAM ◽

10.35836/jakis.v9i1.217 ◽

2021 ◽

Vol 9 (1) ◽

pp. 39-58

Author(s):

Efri Syamsul Bahri ◽

Ade Suhaeti ◽

Nursanita Nasution ◽

...

Keyword(s):

Large Scale ◽

Sampling Method ◽

Negative Impact ◽

Linear Regression Analysis ◽

Accounting Information ◽

Multiple Linear Regression Analysis ◽

Large Scale Data ◽

Quality Of Accounting Information ◽

Scale Data

This study tests the factors that influence the decision of muzaki in channeling zakat, namely trust, religiosity, income, and quality of accounting information. It is a survey of 40 muzaki from the Amil Zakat Institution (known as LAZ) Zakat Sukses in Depok, selected by purposive sampling. The data were analyzed in SPSS 25 using multiple linear regression. The results indicate that trust, religiosity, income, and the quality of accounting information simultaneously influence the decision of muzaki to distribute zakat through LAZ Zakat Sukses in Depok. Partially, trust, religiosity, and income positively affect this decision, while the quality of accounting information has a negative impact on it. The study's scope is limited to the muzaki at LAZ Zakat Sukses Depok, so the results may not be nationally representative; similar studies that collect larger-scale data over broader areas would be useful. The implication is that LAZ Zakat Sukses needs to demonstrate its zakat management performance to increase muzaki trust.

Subsampled Hessian Newton Methods for Supervised Learning

Neural Computation ◽

10.1162/neco_a_00751 ◽

2015 ◽

Vol 27 (8) ◽

pp. 1766-1795 ◽

Cited By ~ 8

Author(s):

Chien-Chih Wang ◽

Chun-Heng Huang ◽

Chih-Jen Lin

Keyword(s):

Supervised Learning ◽

Newton Method ◽

Large Scale ◽

Order Approximation ◽

Hessian Matrix ◽

Computational Time ◽

Learning Approaches ◽

Newton Methods ◽

Large Scale Data ◽

Scale Data

Newton methods can be applied in many supervised learning approaches. However, for large-scale data, the use of the whole Hessian matrix can be time-consuming. Recently, subsampled Newton methods have been proposed to reduce the computational time by using only a subset of data for calculating an approximation of the Hessian matrix. Unfortunately, we find that in some situations, the running speed is worse than the standard Newton method because cheaper but less accurate search directions are used. In this work, we propose some novel techniques to improve the existing subsampled Hessian Newton method. The main idea is to solve a two-dimensional subproblem per iteration to adjust the search direction to better minimize the second-order approximation of the function value. We prove the theoretical convergence of the proposed method. Experiments on logistic regression, linear SVM, maximum entropy, and deep networks indicate that our techniques significantly reduce the running time of the subsampled Hessian Newton method. The resulting algorithm becomes a compelling alternative to the standard Newton method for large-scale data classification.
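The two-dimensional subproblem mentioned above can be sketched as follows. This is one plausible reading, not the authors' implementation: it assumes the adjusted step is a combination `b1*d + b2*d_prev` of the current subsampled-Newton direction `d` and the previous direction `d_prev`, and that `H` stands for the (subsampled) Hessian. The function names and the small dense example are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(H, v):
    return [dot(row, v) for row in H]

def quad_model(g, H, s):
    """Second-order approximation of the change in f for a step s."""
    return dot(g, s) + 0.5 * dot(s, mat_vec(H, s))

def two_dim_step(g, H, d, d_prev):
    """Pick b1, b2 minimizing quad_model(g, H, b1*d + b2*d_prev)."""
    Hd, Hp = mat_vec(H, d), mat_vec(H, d_prev)
    # Setting the model's gradient w.r.t. (b1, b2) to zero gives a 2x2 system:
    # [a11 a12; a12 a22] [b1; b2] = [r1; r2]
    a11, a12, a22 = dot(d, Hd), dot(d, Hp), dot(d_prev, Hp)
    r1, r2 = -dot(g, d), -dot(g, d_prev)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        # Directions are (nearly) parallel: fall back to a 1-D step along d.
        b1, b2 = (r1 / a11 if a11 > 0 else 1.0), 0.0
    else:
        # Cramer's rule for the 2x2 system.
        b1 = (r1 * a22 - r2 * a12) / det
        b2 = (a11 * r2 - a12 * r1) / det
    step = [b1 * x + b2 * y for x, y in zip(d, d_prev)]
    return step, (b1, b2)

# Hypothetical example: a small positive-definite Hessian and two directions.
H = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]]
g = [1.0, 2.0, -1.0]
d = [-0.2, -0.5, 0.3]        # inexact direction from the current iteration
d_prev = [-0.3, 0.1, 0.2]    # direction from the previous iteration
step, (b1, b2) = two_dim_step(g, H, d, d_prev)
```

Because `b1 = 1, b2 = 0` is always a candidate, the adjusted step never has a worse model value than `d` itself when the 2x2 system is positive definite, and the extra cost is two Hessian-vector products per iteration.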

A Supervised Learning Model for High-Dimensional and Large-Scale Data

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/2972957 ◽

2017 ◽

Vol 8 (2) ◽

pp. 1-23 ◽

Cited By ~ 8

Author(s):

Chong Peng ◽

Jie Cheng ◽

Qiang Cheng

Keyword(s):

Supervised Learning ◽

Large Scale ◽

Learning Model ◽

High Dimensional ◽

Large Scale Data ◽

Scale Data

Evolutionary Computation Access on Incremental Map Reduce for Mining Large Scale Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1161.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 860-865

Keyword(s):

Evolutionary Computation ◽

Large Scale ◽

Iterative Algorithms ◽

Swarm Optimization ◽

Sequential Computation ◽

Iterative Processing ◽

Large Scale Data ◽

Computation Algorithm ◽

Pair Level ◽

Scale Data

In recent years, data updates have arrived constantly from areas such as social networks, finance, healthcare, and e-commerce. The data therefore grows large, and computation on it becomes difficult. This paper proposes a framework for mining data early and refreshing the computed result as new data arrives. The framework combines an incremental MapReduce method on Hadoop with an evolutionary computation algorithm to reduce time complexity and increase accuracy. The proposed approach applies key-pair-level incremental iterative processing to MapReduce for mining big data and uses particle swarm optimization to avoid recomputation from scratch when new data arrives. This reduces the I/O overhead of accessing predefined states. Experiments on three iterative algorithms in Hadoop showed good performance compared with traditional MapReduce with sequential computation access.
