A MapReduce solution for associative classification of big data

2016 ◽  
Vol 332 ◽  
pp. 33-55 ◽  
Author(s):  
Alessio Bechini ◽  
Francesco Marcelloni ◽  
Armando Segatori


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Data mining is an essential task because the digital world creates huge volumes of data daily. Associative classification is one of the data mining tasks used to classify data according to the demands of knowledge users. Most associative classification algorithms are not able to analyze big data, which are mostly continuous in nature. This motivates both an analysis of existing discretization algorithms, which convert continuous data into discrete values, and the development of a novel discretizer, the Reliable Distributed Fuzzy Discretizer, for big data sets. Many discretizers suffer from over-splitting of partitions. Our proposed method is implemented in a distributed fuzzy environment and aims to avoid over-splitting of partitions by introducing a novel stopping criterion. The proposed discretization method is compared with an existing distributed fuzzy partitioning method and achieves good accuracy in the performance of associative classifiers.
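For readers unfamiliar with the idea, the sketch below shows a generic entropy-based recursive discretizer that stops splitting a partition when it becomes too small or when the best cut yields negligible information gain. It is only a minimal illustration of how a stopping criterion prevents over-splitting; it is not the Reliable Distributed Fuzzy Discretizer described in the abstract, and the thresholds (`min_size`, `min_gain`) are illustrative assumptions.

```python
# Generic sketch, NOT the paper's Reliable Distributed Fuzzy Discretizer:
# recursive, entropy-based discretization of one continuous attribute with
# two stopping criteria that keep the partition from being over-split.
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def discretize(values, labels, min_size=30, min_gain=0.01):
    """Return sorted cut points for one continuous attribute (illustrative thresholds)."""
    values, labels = np.asarray(values, dtype=float), np.asarray(labels)
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    cuts = []

    def split(lo, hi):
        n = hi - lo
        if n < 2 * min_size:                 # stop: resulting partitions would be too small
            return
        base = entropy(labels[lo:hi])
        best_gain, best_cut = 0.0, None
        for i in range(lo + min_size, hi - min_size + 1):
            if values[i] == values[i - 1]:   # only cut between distinct attribute values
                continue
            left, right = labels[lo:i], labels[i:hi]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if gain > best_gain:
                best_gain, best_cut = gain, i
        if best_cut is None or best_gain < min_gain:  # stop: no cut gives a meaningful gain
            return
        cuts.append((values[best_cut - 1] + values[best_cut]) / 2)
        split(lo, best_cut)
        split(best_cut, hi)

    split(0, len(values))
    return sorted(cuts)
```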


2021 ◽  
Author(s):  
Mohammad Hassan Almaspoor ◽  
Ali Safaei ◽  
Afshin Salajegheh ◽  
Behrouz Minaei-Bidgoli

Abstract Classification is one of the most important and widely used tasks in machine learning; its purpose is to build a rule that assigns data to pre-existing categories based on a training set. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by big data characteristics. The standard SVM was proposed for batch learning, in which all data are available at the same time. The SVM has high time complexity, i.e., increasing the number of training samples intensifies the need for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for adapting the SVM to online conditions and large-scale data. These methods might be employed to classify big data, and the paper proposes research areas for future studies. Considering its advantages, the SVM can be among the first options for handling and classifying big data. For this purpose, appropriate techniques should be developed for data preprocessing in order to convert data into a form suitable for learning. Existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online, able to handle big data.
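For a concrete picture of what "online" SVM learning means, the sketch below trains a linear SVM with stochastic gradient descent on the hinge loss, updating on one example at a time. It is a minimal illustration of the general idea, not any specific method surveyed in the abstract; the regularization strength and learning rate are illustrative assumptions. In practice, scikit-learn's SGDClassifier with the hinge loss and partial_fit offers a comparable, production-ready route.

```python
# Minimal sketch of an online linear SVM: stochastic gradient descent on the
# L2-regularized hinge loss, one streaming example at a time.
import numpy as np

class OnlineLinearSVM:
    def __init__(self, n_features, lam=1e-4, lr=0.01):
        self.w = np.zeros(n_features)   # weight vector
        self.b = 0.0                    # bias term
        self.lam = lam                  # L2 regularization strength (illustrative)
        self.lr = lr                    # learning rate (illustrative)

    def partial_fit(self, x, y):
        """Update the model with a single example; y must be -1 or +1."""
        margin = y * (self.w @ x + self.b)
        grad_w = self.lam * self.w      # regularization gradient on every step
        grad_b = 0.0
        if margin < 1:                  # hinge loss active: add its subgradient
            grad_w -= y * x
            grad_b -= y
        self.w -= self.lr * grad_w
        self.b -= self.lr * grad_b

    def predict(self, x):
        return 1 if self.w @ x + self.b >= 0 else -1

# Usage on a stream of (x, y) pairs:
#   clf = OnlineLinearSVM(n_features=d)
#   for x, y in stream:
#       clf.partial_fit(x, y)
```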


Author(s):  
Kseniia Antipova

This article explores the main approaches of Russian and foreign authors to defining big data; reflects the classification of data and the components of big data; and provides a comparative characterization of the legal regulation of big data. The subject of this research is the legislation of the Russian Federation and the legislation of the European Union regulating the collection, processing, and use of big data, personal data, and information; the judicial and arbitration practice of the Russian Federation in the sphere of personal data; normative legal acts of the Russian Federation; governmental regulation of the Russian Federation and foreign countries in the area of processing, use, and transmission of data; as well as legal doctrine in the field of research dedicated to the nature of big data. The relevance of this research is substantiated by the fact that there is as yet no conceptual uniformity with regard to big data in the world, and the essence and methods of regulating big data are not fully explored. The goal of this research is to determine the legal qualification of the data that comprise big data. The tasks are to define the term “big data”; demonstrate the approaches to determining the legal nature of big data; classify big data; outline the criteria for distinguishing the data that comprise the concept of big data; and formulate a model for the optimal regulation of relations arising from the collection, processing, and use of such data. An original definition of big data in the narrow and broad senses is provided. As a result, the author distinguishes the types of data and reflects the legal qualification of data depending on the category of data contained therein: industrial data, user data, and personal data. Attention is also turned to the contractual form of big data circulation.


2016 ◽  
Vol 12 (S325) ◽  
pp. 39-45 ◽  
Author(s):  
Maria Süveges ◽  
Sotiria Fotopoulou ◽  
Jean Coupon ◽  
Stéphane Paltani ◽  
Laurent Eyer ◽  
...  

Abstract Throughout the processing and analysis of survey data, a ubiquitous issue nowadays is that we are spoilt for choice when we need to select a methodology for some of its steps. The alternative methods usually fail and excel in different data regions and have various advantages and drawbacks, so a combination that unites the strengths of all while suppressing their weaknesses is desirable. We propose to use a two-level hierarchy of learners. Its first level consists of training and applying the candidate base methods on the first part of a known set. At the second level, we feed the output probability distributions from all base methods to a second learner trained on the remaining known objects. Using the classification of variable stars and photometric redshift estimation as examples, we show that the hierarchical combination achieves a general improvement over averaging-type combination methods, corrects systematics present in all base methods, and is easy to train and apply; thus, it is a promising tool in the astronomical “Big Data” era.
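A minimal sketch of the two-level scheme described above, assuming scikit-learn classifiers stand in for the base methods: level one fits several base learners on the first part of the known set, and level two trains a meta-learner on their class-probability outputs for the remaining known objects. The particular models and the 50/50 split are illustrative choices, not the authors' actual setup.

```python
# Two-level hierarchy of learners (stacking on held-out probabilities).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def train_hierarchy(X, y, random_state=0):
    # Level 1: fit each base method on the first part of the known set.
    X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=random_state)
    base_models = [
        RandomForestClassifier(n_estimators=200, random_state=random_state),
        KNeighborsClassifier(n_neighbors=15),
        LogisticRegression(max_iter=1000),
    ]
    for m in base_models:
        m.fit(X1, y1)

    # Level 2: feed the base methods' class-probability outputs on the
    # remaining known objects to a second learner.
    meta_features = np.hstack([m.predict_proba(X2) for m in base_models])
    meta_model = LogisticRegression(max_iter=1000)
    meta_model.fit(meta_features, y2)
    return base_models, meta_model

def predict_hierarchy(base_models, meta_model, X_new):
    # Apply the same probability stacking to new objects.
    meta_features = np.hstack([m.predict_proba(X_new) for m in base_models])
    return meta_model.predict(meta_features)
```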


2019 ◽  
Vol 8 (3) ◽  
pp. 27-31
Author(s):  
R. P. L. Durgabai ◽  
P. Bhargavi ◽  
S. Jyothi

Data, in today’s world, is essential. Big Data technology is rising to examine data for fast insights and strategic decisions. Big data refers to the facility to assemble and examine the vast amounts of data generated by different departments directly or indirectly involved in agriculture. Due to a lack of resources, pest analysis of the rice crop is in poor condition, which affects production. In Andhra Pradesh, rice is cultivated in almost all districts. The goal is to provide better solutions for detecting pest attack conditions in all districts using Big Data Analytics and to support better decisions for high productivity of the rice crop in Andhra Pradesh.


2020 ◽  
Vol 17 (11) ◽  
pp. 5182-5197
Author(s):  
Amrinder Kaur ◽  
Rakesh Kumar

User interaction over the internet is growing day by day. Social network users send massive amounts of information to the network to share with others. This increases the information on social media, and hence a mechanism is needed to handle or manage such high-dimensional data, termed Big Data. Big Data reduction can be performed using a feature selection approach, but the classification of such massive data remains a challenging task for researchers. To overcome this problem, a metaheuristic-based Genetic Algorithm (GA) is used to select the most suitable rows for training. The selected rows undergo a feature extraction process performed by Principal Component Analysis (PCA). The extracted principal components are then optimized using another metaheuristic algorithm, Whale Optimization. As the proposed algorithm uses unlabelled data, clustering is performed to label the data. Two different distribution indexes are calculated: one for data with GA-selected rows, and one for data with GA-selected rows combined with PCA and Whale Optimization. The distribution index is the ratio of the total number of elements in one cluster to the total number of elements in the second cluster; a high distribution index leads to better accuracy when classifying the text data. The data is clustered using the K-Means algorithm to find the cluster indexes. The proposed algorithm presents a hybrid classification mechanism with upper and lower boundaries of classified labels using an Artificial Neural Network (ANN) and a Support Vector Machine (SVM).
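A small sketch of the distribution index defined above, assuming the input is the reduced feature matrix produced by the earlier GA/PCA/Whale steps (which are not reproduced here): the data is clustered into two groups with K-Means and the index is taken as the ratio of the cluster sizes. Reading the ratio as larger-to-smaller is an assumption made for illustration.

```python
# Sketch of the distribution index: ratio of the two K-Means cluster sizes.
import numpy as np
from sklearn.cluster import KMeans

def distribution_index(X, random_state=0):
    """Ratio of the larger cluster's size to the smaller one's (k = 2)."""
    labels = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit_predict(X)
    sizes = np.bincount(labels, minlength=2)
    return sizes.max() / max(sizes.min(), 1)   # guard against an empty cluster
```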


Author(s):  
Yuriy V. Kostyuchenko ◽  
Maxim Yuschenko

This paper considers approaches to utilizing big data (social network content) for understanding social behavior in conflict zones and analyzing the dynamics of illegal armed groups. The analysis is directed at identifying underage militants. Probabilistic and stochastic methods for the analysis and classification of the number, composition, and dynamics of illegal armed groups in active conflict areas are proposed. Data from the armed conflict (the anti-terrorist operation in Donbas, Eastern Ukraine, in the period 2014-2015) is used for the analysis. The numerical distribution of age, gender composition, origin, social status, and nationality of child militants among illegal armed groups has been calculated. Conclusions on the applicability of the described method in criminological practice, as well as on the possibilities of interpreting the obtained results in the context of the study of terrorism, are proposed.

