Investigations on optimizing performance of the distributed computing in heterogeneous environment using machine learning technique for large scale data set

Abstract: Classification is one of the most important and widely used tasks in machine learning. Its purpose is to learn, from a set of training samples, a rule that assigns data to pre-existing categories. The Support Vector Machine (SVM), which has been applied successfully in many scientific and engineering areas, is among the most promising classification methods in machine learning. With the advent of big data, however, many machine learning methods have been challenged by the characteristics of such data. The standard SVM was proposed for batch learning, in which all data are available at the same time, and it has high time complexity: increasing the number of training samples increases the demand for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and to large-scale data. This paper analyzes, identifies, and classifies existing methods for adapting the SVM to online conditions and large-scale data, discusses how these methods might be employed to classify big data, and proposes research directions for future studies. Given its advantages, the SVM can be among the first candidates for adaptation to big data and for big data classification. To this end, appropriate data preprocessing techniques should be developed to convert data into a form suitable for learning, and existing frameworks for parallel and distributed processing should be employed so that SVMs become scalable and properly online, and thus able to handle big data.
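The abstract notes that the standard SVM is a batch method whose cost grows with the number of training samples. One well-known way to make a linear SVM online is Pegasos, a stochastic sub-gradient solver that processes one sample per step; it is shown here only as an illustration and is not necessarily one of the methods surveyed in the paper. The following is a minimal pure-Python sketch, assuming labels in {-1, +1} and no bias term; the names `pegasos_train`, `predict`, and `toy_blobs`, the default `lam`/`epochs` values, and the toy data are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pegasos_train(X, y, lam=0.01, epochs=5, seed=0):
    """Linear SVM (no bias term) trained with Pegasos-style stochastic sub-gradient steps.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    Each step looks at a single sample, so the cost per update is O(d)
    and does not grow with the number of training samples.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    radius = 1.0 / math.sqrt(lam)  # the SVM optimum lies inside this ball
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            # The margin is checked with the current w, before it is updated.
            hinge_active = y[i] * dot(w, X[i]) < 1.0
            # Sub-gradient step for the L2 regularizer (lam/2)*||w||^2 ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... plus the hinge-loss term when the sample violates the margin.
            if hinge_active:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
            # Optional projection step from the original Pegasos formulation.
            norm = math.sqrt(dot(w, w))
            if norm > radius:
                w = [wj * radius / norm for wj in w]
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if dot(w, x) >= 0.0 else -1


def toy_blobs(n_per_class=200, seed=1):
    """Hypothetical toy data: two Gaussian clusters centred at (2, 2) and (-2, -2)."""
    rng = random.Random(seed)
    pos = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(n_per_class)]
    neg = [[rng.gauss(-2.0, 1.0), rng.gauss(-2.0, 1.0)] for _ in range(n_per_class)]
    return pos + neg, [1] * n_per_class + [-1] * n_per_class


if __name__ == "__main__":
    X, y = toy_blobs()
    w = pegasos_train(X, y)
    acc = sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)
    print(f"weights={w}, training accuracy={acc:.3f}")
```

Because each update touches a single sample, the same loop can consume a stream of examples instead of a fully loaded data set. A bias term can be emulated by appending a constant feature to every vector.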

Nonlinear Component Analysis for Large-Scale Data Set Using Fixed-Point Algorithm

Advances in Neural Networks – ISNN 2009 (Lecture Notes in Computer Science), 2009, pp. 144-151
DOI: 10.1007/978-3-642-01513-7_16

Author(s): Weiya Shi, Yue-Fei Guo

Keyword(s): Fixed Point, Large Scale, Component Analysis, Data Set, Fixed Point Algorithm, Nonlinear Component, Large Scale Data, Scale Data

An Improved Kernel Principal Component Analysis for Large-Scale Data Set

Advances in Neural Networks – ISNN 2010 (Lecture Notes in Computer Science), 2010, pp. 9-16
DOI: 10.1007/978-3-642-13318-3_2
Cited by ~1

Author(s): Weiya Shi, Dexian Zhang

Keyword(s): Principal Component Analysis, Large Scale, Principal Component, Component Analysis, Kernel Principal Component Analysis, Data Set, Large Scale Data, Scale Data

The Research on Large Scale Data Set Clustering Algorithm Based on Tag Set

Computational Intelligence and Intelligent Systems (Communications in Computer and Information Science), 2016, pp. 365-372
DOI: 10.1007/978-981-10-0356-1_38

Author(s): Qiang Chen

Keyword(s): Large Scale, Clustering Algorithm, Data Set, Large Scale Data, Scale Data

Counting From Sky: A Large-Scale Data Set for Remote Sensing Object Counting and a Benchmark Method

IEEE Transactions on Geoscience and Remote Sensing, 2020, pp. 1-14
DOI: 10.1109/tgrs.2020.3020555

Author(s): Guangshuai Gao, Qingjie Liu, Yunhong Wang

Keyword(s): Remote Sensing, Large Scale, Data Set, Large Scale Data, Object Counting, Scale Data

Exploration of Tourist Activities in Urban Destination Using Venue Check-In Data

Journal of Hospitality & Tourism Research, 2019, Vol. 44 (3), pp. 472-498
DOI: 10.1177/1096348019889121

Author(s): Huy Quan Vu, Jian Ming Luo, Gang Li, Rob Law

Keyword(s): Data Collection, Large Scale, Traditional Approach, Urban Tourism, Data Set, Tourism Marketing, Large Scale Data, New Type, Scale Data

Understanding the differences and similarities in the activities of tourists from various cultures is important for tourism managers who need to develop appropriate plans and strategies to support urban tourism marketing and management. However, tourism managers still struggle to obtain such understanding because the traditional approach to data collection, which relies on surveys and questionnaires, cannot capture tourist activities at a large scale. In this article, we present a method for studying tourist activities based on a new type of data: venue check-ins. The effectiveness of the approach is demonstrated through a case study of France, a major tourism country. Analysis of a large-scale data set from 19 tourism cities in France reveals interesting differences and similarities in the activities of tourists from 14 markets (countries). The findings provide valuable insights for various urban tourism applications.
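The abstract does not describe how check-ins are turned into comparisons between markets. As a hypothetical illustration only (not the authors' method), one simple approach is to summarise each market by the share of its check-ins in each venue category and then compare markets with cosine similarity. The record format `(market, venue_category)`, the function names, and the sample records below are all assumptions.

```python
import math
from collections import Counter, defaultdict


def activity_profiles(checkins):
    """Turn (market, venue_category) records into per-market category shares."""
    counts = defaultdict(Counter)
    for market, category in checkins:
        counts[market][category] += 1
    profiles = {}
    for market, cat_counts in counts.items():
        total = sum(cat_counts.values())
        profiles[market] = {cat: n / total for cat, n in cat_counts.items()}
    return profiles


def cosine_similarity(p, q):
    """Cosine similarity of two sparse category-share vectors (1.0 = identical mix)."""
    shared = set(p) & set(q)
    num = sum(p[c] * q[c] for c in shared)
    den = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return num / den if den else 0.0


if __name__ == "__main__":
    # Hypothetical sample records for two unnamed markets; not data from the study.
    records = [
        ("market_A", "museum"), ("market_A", "museum"), ("market_A", "bar"),
        ("market_B", "museum"), ("market_B", "shopping"),
    ]
    profiles = activity_profiles(records)
    print(profiles)
    print(cosine_similarity(profiles["market_A"], profiles["market_B"]))
```

Normalising counts into shares keeps markets with very different numbers of check-ins comparable; a real analysis would likely also need to account for city and venue-category granularity.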