A Supervised Learning Model for High-Dimensional and Large-Scale Data

2017 ◽  
Vol 8 (2) ◽  
pp. 1-23 ◽  
Author(s):  
Chong Peng ◽  
Jie Cheng ◽  
Qiang Cheng

2019 ◽  
Vol 15 (3) ◽  
pp. 64-78
Author(s):  
Chandrakala D ◽  
Sumathi S ◽  
Saran Kumar A ◽  
Sathish J

Emergent Trend Detection (ETD) methods, a principal application of text mining, detect and characterize new trends in a corpus. This article discusses the influence of Particle Swarm Optimization (PSO) on Dynamic Adaptive Self Organizing Maps (DASOM) in the design of an efficient ETD scheme, in which PSO optimizes the neural parameters of the network. This hybrid machine learning scheme is designed to achieve maximum accuracy with minimum computational time. The efficiency and scalability of the proposed scheme are analyzed and compared with standard algorithms such as SOM, DASOM, and linear regression analysis. The system is trained and tested on the DBLP database (University of Trier, Germany). The article establishes the superiority of the hybrid DASOM algorithm over these well-known algorithms in handling high-dimensional, large-scale data to detect emergent trends from the corpus.


Author(s):  
Zachary B Abrams ◽  
Caitlin E Coombes ◽  
Suli Li ◽  
Kevin R Coombes

Summary: Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with their own strengths and weaknesses. The Mercator R package facilitates selection of a biologically meaningful distance from 10 metrics, together appropriate for binary, categorical and continuous data, and visualization with 5 standard and high-dimensional graphics tools. Mercator provides a user-friendly pipeline for informaticians or biologists to perform unsupervised analyses, from exploratory pattern recognition to production of publication-quality graphics. Availability and implementation: Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).


2016 ◽  
Vol 29 (6) ◽  
pp. 1061-1075
Author(s):  
Eun-Kyung Lee ◽  
Nayoung Hwang ◽  
Yoondong Lee

2015 ◽  
Vol 27 (8) ◽  
pp. 1766-1795 ◽  
Author(s):  
Chien-Chih Wang ◽  
Chun-Heng Huang ◽  
Chih-Jen Lin

Newton methods can be applied in many supervised learning approaches. However, for large-scale data, the use of the whole Hessian matrix can be time-consuming. Recently, subsampled Newton methods have been proposed to reduce the computational time by using only a subset of data for calculating an approximation of the Hessian matrix. Unfortunately, we find that in some situations, the running speed is worse than the standard Newton method because cheaper but less accurate search directions are used. In this work, we propose some novel techniques to improve the existing subsampled Hessian Newton method. The main idea is to solve a two-dimensional subproblem per iteration to adjust the search direction to better minimize the second-order approximation of the function value. We prove the theoretical convergence of the proposed method. Experiments on logistic regression, linear SVM, maximum entropy, and deep networks indicate that our techniques significantly reduce the running time of the subsampled Hessian Newton method. The resulting algorithm becomes a compelling alternative to the standard Newton method for large-scale data classification.
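
The idea in the abstract above can be sketched for L2-regularized logistic regression. This is a minimal illustration, not the authors' implementation: the abstract does not say which two directions span the subproblem, so the sketch assumes the subsampled Newton direction and the previous step. The function names, the parameters lam and frac, the direct linear solve, and the step-halving safeguard are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y, lam):
    # L2-regularized logistic loss with labels y in {0, 1}
    z = X @ w
    return np.sum(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)

def grad(w, X, y, lam):
    return X.T @ (sigmoid(X @ w) - y) + lam * w

def hess_vec(w, X, v, lam):
    # Full Hessian-vector product; the Hessian itself is never formed
    p = sigmoid(X @ w)
    return X.T @ (p * (1.0 - p) * (X @ v)) + lam * v

def subsampled_newton(X, y, lam=1.0, frac=0.2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(1, int(frac * n))
    w = np.zeros(d)
    prev_step = np.zeros(d)
    for _ in range(iters):
        g = grad(w, X, y, lam)
        # Approximate the Hessian on a random subset, rescaled by n / m
        Xs = X[rng.choice(n, size=m, replace=False)]
        p = sigmoid(Xs @ w)
        Hs = (n / m) * (Xs.T * (p * (1.0 - p))) @ Xs + lam * np.eye(d)
        direction = np.linalg.solve(Hs, -g)
        # Two-dimensional subproblem (assumed form): minimize the second-order
        # model over span{direction, prev_step} with full Hessian-vector products
        D = np.column_stack([direction, prev_step])
        HD = np.column_stack([hess_vec(w, X, D[:, j], lam) for j in range(2)])
        beta = np.linalg.lstsq(D.T @ HD, -D.T @ g, rcond=None)[0]
        step = D @ beta
        # Safeguard (an addition for this sketch): halve until the loss decreases
        f0, t = loss(w, X, y, lam), 1.0
        for _halving in range(30):
            if loss(w + t * step, X, y, lam) < f0:
                w, prev_step = w + t * step, t * step
                break
            t *= 0.5
    return w
```

The subproblem costs only two full Hessian-vector products per iteration, which is what lets it correct a direction computed from a cheaper, less accurate subsampled Hessian. On the first iteration the previous step is zero, so the least-squares solve reduces to an exact one-dimensional step along the subsampled direction.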


2016 ◽  
Vol 6 (2) ◽  
pp. 76-82
Author(s):  
Antonius Rachmat ◽  
Yuan Lukito

Crowdsourced labelling is a large-scale data labelling process that solicits a large group of people to label data, usually via the Internet. This paper discusses the design and implementation of a web-based crowdsourced labelling system. Supervised classification methods need labelled training data for their training phase. Unfortunately, in many cases no labelled training data are available. Large-scale data labelling is tedious and time-consuming work. This research develops a web-based crowdsourced labelling system that solicits a large group of people as data labellers to speed up the labelling process. The system also allows multiple labellers for each data item. The final label is calculated using the Weighted Majority Voting method. We collected Facebook comments from the Facebook pages of the two candidates in the 2014 Indonesian presidential election and used them as test data. Based on the tests conducted, we conclude that the system handles all the labelling steps well and handles the collisions that occur when multiple labellers label the same data item at the same time. The system successfully produces the final labels in CSV format, which can be processed further with many sentiment analysis or machine learning tools. Index Terms: crowdsourced labelling, web-based system, supervised learning, weighted majority voting.
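
Weighted majority voting, as named in the abstract above, can be sketched in a few lines. The abstract does not say how labeller weights are obtained or how ties are resolved, so the weights dictionary, the default weight, and the alphabetical tie-break below are assumptions made for illustration only.

```python
from collections import defaultdict

def weighted_majority_vote(votes, weights, default_weight=1.0):
    """Return the label with the largest total labeller weight.

    votes:   list of (labeller_id, label) pairs for one data item
    weights: dict mapping labeller_id to a non-negative weight (hypothetical)
    """
    if not votes:
        raise ValueError("at least one vote is required")
    totals = defaultdict(float)
    for labeller, label in votes:
        # Labellers without a known weight fall back to default_weight
        totals[label] += weights.get(labeller, default_weight)
    # Ties go to the alphabetically first label (an assumption, for determinism)
    return max(sorted(totals), key=lambda label: totals[label])
```

With votes [("a", "positive"), ("b", "negative"), ("c", "negative")] and weights {"a": 0.9, "b": 0.3, "c": 0.4}, the function returns "positive": one heavily weighted labeller outweighs two lightly weighted ones, which is how this differs from a plain majority vote.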


2009 ◽  
Vol 28 (11) ◽  
pp. 2737-2740
Author(s):  
Xiao ZHANG ◽  
Shan WANG ◽  
Na LIAN
