Artificial Intelligence and Machine Learning for Large-Scale Data

Author(s): Vo Ngoc Phu, Vo Thi Ngoc Tran

2021
Author(s): Mohammad Hassan Almaspoor, Ali Safaei, Afshin Salajegheh, Behrouz Minaei-Bidgoli

Abstract: Classification is one of the most important and widely used problems in machine learning; its purpose is to create a rule, based on a training set, for assigning data to pre-existing categories. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by big data characteristics. The standard SVM was designed for batch learning, in which all data are available at the same time. The SVM also has high time complexity: increasing the number of training samples intensifies the need for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for SVM compatibility with online conditions and large-scale data. These methods can be employed to classify big data and suggest research directions for future studies. Considering its advantages, the SVM can be among the first options for adaptation to big data classification. For this purpose, appropriate techniques should be developed to preprocess data and convert it into a form suitable for learning. Existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online, able to handle big data.
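The two concerns the abstract raises, batch-only training and cost that grows with the number of training samples, can be illustrated with a minimal online linear SVM trained by stochastic subgradient descent (Pegasos-style). This is an illustrative sketch, not a specific method from the paper; the toy data, hyperparameters, and cyclic update order are assumptions.

```python
def pegasos_train(X, y, lam=0.01, epochs=50):
    """Online linear SVM via a Pegasos-style subgradient method.

    Each update touches a single sample, so memory stays O(d)
    regardless of dataset size -- the property online/large-scale
    SVM variants aim for. (Illustrative sketch only.)
    """
    dim = len(X[0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):      # one sample at a time
            t += 1
            eta = 1.0 / (lam * t)     # decaying step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:
                # Subgradient of hinge loss + L2 regularizer
                w = [(1 - eta * lam) * wj + eta * yi * xj
                     for wj, xj in zip(w, xi)]
            else:
                w = [(1 - eta * lam) * wj for wj in w]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy linearly separable data: label is the sign of the first coordinate.
X = [[1.0, 0.2], [0.8, -0.5], [1.2, 0.1],
     [-1.0, 0.3], [-0.7, -0.2], [-1.1, 0.4]]
y = [1, 1, 1, -1, -1, -1]
w = pegasos_train(X, y)
print([predict(w, x) for x in X])
```

Because each step uses only one sample, the same loop works when examples arrive as a stream, which is the online setting the surveyed methods target.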


2019, Vol 31 (2), pp. 329-338
Author(s): Jian Hu, Haiwan Zhu, Yimin Mao, Canlong Zhang, Tian Liang, ...

Landslide hazard prediction is a difficult, time-consuming process when traditional methods are used. This paper presents a method that uses machine learning to predict landslide hazard levels automatically. Because rainfall data is difficult to obtain and process effectively in landslide hazard prediction, and because the M-chameleon algorithm has limitations in handling large-scale data sets, a new method based on an uncertain DM-chameleon algorithm (developed M-chameleon) is proposed to build the landslide susceptibility model. First, the method designs a new two-phase clustering algorithm based on M-chameleon that effectively processes large-scale data sets. Second, a new E-H distance formula is designed by combining the Euclidean and Hausdorff distances, which enables the method to manage uncertain data effectively; an uncertain data model is presented at the same time to quantify triggering factors. Finally, the model for predicting landslide hazards is constructed and verified using data from the Baota district of the city of Yan'an, China. The experimental results show that the uncertain DM-chameleon algorithm can effectively improve the accuracy of landslide prediction and has high feasibility. Furthermore, relationships between hazard factors and landslide hazard levels can be extracted from the clustering results.
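The abstract states that the E-H distance combines the Euclidean and Hausdorff distances to compare uncertain data, but does not give the formula. A plausible sketch is below, treating each uncertain observation as a finite set of sampled points; the weighted-sum combination, the centroid choice, and the `alpha` parameter are illustrative assumptions, not the paper's actual definition.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(X, Y):
        return max(min(euclidean(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))

def centroid(A):
    dim = len(A[0])
    return [sum(p[d] for p in A) / len(A) for d in range(dim)]

def eh_distance(A, B, alpha=0.5):
    """Hypothetical E-H combination: a weighted sum of the Euclidean
    distance between set centroids and the Hausdorff distance between
    the sets. alpha and this weighting are assumptions for illustration."""
    return (alpha * euclidean(centroid(A), centroid(B))
            + (1 - alpha) * hausdorff(A, B))

# Two uncertain observations, each represented by sampled measurements.
A = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
B = [[3.0, 0.0], [4.0, 0.0], [3.0, 1.0]]
print(round(eh_distance(A, B), 3))  # -> 3.0
```

The centroid term captures the overall displacement between two uncertain observations, while the Hausdorff term penalizes the worst-case mismatch between their sample sets, which is one way such a hybrid could handle uncertainty in triggering-factor measurements.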


Neurology, 2021, DOI: 10.1212/WNL.0000000000012884
Author(s): Hugo Vrenken, Mark Jenkinson, Dzung Pham, Charles R.G. Guttmann, Deborah Pareto, ...

Patients with multiple sclerosis (MS) have heterogeneous clinical presentations, symptoms and progression over time, making MS difficult to assess and comprehend in vivo. The combination of large-scale data-sharing and artificial intelligence creates new opportunities for monitoring and understanding MS using magnetic resonance imaging (MRI). First, the development of validated MS-specific image analysis methods can be boosted by verified reference, test and benchmark imaging data; using detailed expert annotations, artificial intelligence algorithms can be trained on such MS-specific data. Second, understanding of disease processes could be greatly advanced through shared data from large MS cohorts with clinical, demographic and treatment information. Patterns in such data that may be imperceptible to a human observer could be detected through artificial intelligence techniques. This applies from image analysis (lesions, atrophy or functional network changes) to large multi-domain datasets (imaging, cognition, clinical disability, genetics, etc.). After reviewing data-sharing and artificial intelligence, this paper highlights three areas that offer strong opportunities for advances in the next few years: crowdsourcing, personal data protection, and organized analysis challenges. Difficulties, as well as specific recommendations to overcome them, are discussed in order to best leverage data sharing and artificial intelligence to improve image analysis, imaging and the understanding of MS.
