Reliable Distributed Fuzzy Discretizer for Associative Classification of Big Data

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Data mining is an essential task because the digital world creates huge volumes of data daily. Associative classification is one of the data mining tasks used to classify data according to the demands of knowledge users. Most associative classification algorithms are unable to analyze big data, which is mostly continuous in nature. This motivates both an analysis of existing discretization algorithms, which convert continuous data into discrete values, and the development of a novel discretizer, the Reliable Distributed Fuzzy Discretizer, for big data sets. Many discretizers suffer from over-splitting of partitions. Our proposed method is implemented in a distributed fuzzy environment and aims to avoid over-splitting of partitions by introducing a novel stopping criterion. The proposed discretization method is compared with an existing distributed fuzzy partitioning method and achieves good accuracy in the performance of associative classifiers.
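To make the general idea concrete, the following is a minimal single-machine sketch, not the paper's algorithm: an equal-frequency discretizer with a simple stopping rule that refuses to create a partition holding too few records, plus triangular fuzzy memberships over the resulting cut points. The fraction threshold, maximum bin count, and NumPy implementation are assumptions made for the sketch.

```python
import numpy as np

def fuzzy_discretize(values, max_bins=5, min_fraction=0.05):
    """Illustrative discretizer: equal-frequency cut points with a stopping
    rule (stop before any partition would hold fewer than min_fraction of
    the records), then triangular fuzzy sets centred on the cut points."""
    values = np.sort(np.asarray(values, dtype=float))
    n = len(values)
    cuts = [values[0], values[-1]]
    bins = 1
    while bins < max_bins:
        # candidate equal-frequency cut points for bins + 1 partitions
        candidate = np.quantile(values, np.linspace(0.0, 1.0, bins + 2))
        smallest = np.histogram(values, bins=candidate)[0].min()
        if smallest < min_fraction * n:   # stopping criterion: avoid over-splitting
            break
        cuts = list(candidate)
        bins += 1

    centers = cuts

    def membership(x):
        # triangular membership of x in each fuzzy set centred on a cut point
        mu = np.zeros(len(centers))
        for i, c in enumerate(centers):
            left = centers[i - 1] if i > 0 else c
            right = centers[i + 1] if i < len(centers) - 1 else c
            if left <= x <= c and c > left:
                mu[i] = (x - left) / (c - left)
            elif c <= x <= right and right > c:
                mu[i] = (right - x) / (right - c)
            elif x == c:
                mu[i] = 1.0
        return mu

    return centers, membership

centers, mu = fuzzy_discretize(np.random.default_rng(0).normal(size=1000))
print(centers, mu(0.0))
```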

Web Services ◽  
2019 ◽  
pp. 221-239 ◽  
Author(s):  
Arushi Jain ◽  
Vishal Bhatnagar ◽  
Pulkit Sharma

There is a proliferation in the amount and volume of data generated, which will persist for many years to come. Big data clustering is the exercise of taking a set of objects and dividing them into groups such that the objects in the same group are more similar to each other, according to a given set of parameters, than to those in other groups. These groups are known as clusters. Cluster analysis is one of the main tasks in the field of data mining and a commonly used technique for statistical analysis of data. Big data collaborative filtering, in turn, is defined as a technique that filters the information sought by the user and discovers patterns by combining multiple data sets, such as viewpoints, multiple agents, and pre-existing data about users' behavior stored in matrices. Collaborative filtering is especially required when a huge data set is present.
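As a concrete illustration of the matrix view of collaborative filtering described above, here is a minimal item-based sketch on a toy user-item rating matrix. The ratings, the cosine similarity measure, and the weighted-average prediction rule are illustrative choices, not taken from the chapter.

```python
import numpy as np

# Toy user-item rating matrix: rows are users, columns are items, 0 = unrated.
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

def predict(user, item):
    """Predict a missing rating as a similarity-weighted average of the
    user's ratings on other items (item-based collaborative filtering)."""
    sims, vals = [], []
    for other in range(ratings.shape[1]):
        if other != item and ratings[user, other] > 0:
            sims.append(cosine_sim(ratings[:, item], ratings[:, other]))
            vals.append(ratings[user, other])
    sims = np.array(sims)
    return float(sims @ vals / sims.sum()) if sims.sum() else 0.0

print(predict(user=1, item=1))  # estimate user 1's rating for item 1
```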



2021 ◽  
Vol 129 (10) ◽  
pp. 1336
Author(s):  
Sonali Dubey ◽  
Rohit Kumar ◽  
Abhishek K. Rai ◽  
Awadhesh K. Rai

Laser-induced breakdown spectroscopy (LIBS) is emerging as an analytical tool for investigating geological materials. The unique abilities of this technique have proven its potential in the area of geology. Detection of light elements, portability for in-field analysis, spot detection, and the absence of sample preparation are some of the features that make this technique appropriate for the study of geological materials. Applications of the LIBS technique have developed tremendously in recent years. In this report, results from previous and recent studies on the investigation of geological materials with the LIBS technique are reviewed. We first introduce investigations reporting advancements in LIBS instrumentation and its applications, especially in the areas of gemology and extraterrestrial/planetary exploration. The investigation of gemstones by the LIBS technique has not been widely reviewed in the past compared with LIBS applications in planetary exploration or other geological applications. It is anticipated that large data sets are appropriate for the classification of gemstone samples and that multivariate/chemometric methods will be useful for analyzing such data sets. Recent advances in LIBS instrumentation for the study of meteorites and for depth penetration in Martian rocks and regolith have proven the feasibility of LIBS on robotic vehicles in the Martian environment. Keywords: LIBS, Gemstone, geological samples, Extra-terrestrial
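To hint at what a multivariate/chemometric classification of spectra can look like, here is a generic sketch on synthetic spectra using a PCA-plus-classifier pipeline. The data, the class structure, and the scikit-learn pipeline are assumptions for illustration only; the review does not prescribe this workflow.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for LIBS spectra: each row is an emission spectrum,
# each label a (hypothetical) gemstone class; real work would use measured data.
rng = np.random.default_rng(0)
n_samples, n_wavelengths, n_classes = 120, 500, 3
labels = rng.integers(0, n_classes, n_samples)
spectra = rng.normal(size=(n_samples, n_wavelengths))
for c in range(n_classes):                          # class-specific "emission lines"
    spectra[labels == c, c * 50:(c * 50) + 5] += 5.0

# Typical chemometric pipeline: dimensionality reduction followed by a classifier.
model = make_pipeline(PCA(n_components=10), RandomForestClassifier(random_state=0))
print(cross_val_score(model, spectra, labels, cv=5).mean())
```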


Author(s):  
Trupti Vishwambhar Kenekar ◽  
Ajay R. Dani

As big data is a group of structured, unstructured, and semi-structured data collected from various sources, it is important to mine it while providing privacy for individual data. Differential privacy is one of the best measures, as it provides a strong privacy guarantee. The chapter proposes differentially private frequent itemset mining using MapReduce, which requires less time for privately mining large datasets. The chapter discusses the problem of preserving data privacy, the challenges of preserving data privacy in a big data environment, data privacy techniques, and their applications to unstructured data. Analyses of experimental results on structured and unstructured data sets are also presented.
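The chapter's MapReduce algorithm is not reproduced here; the following is a minimal single-machine sketch of the underlying idea of protecting itemset support counts with the Laplace mechanism. The candidate-generation strategy, the privacy-budget handling, and the toy transactions are simplifications assumed for the sketch.

```python
from collections import Counter
from itertools import combinations

import numpy as np

def dp_frequent_itemsets(transactions, epsilon=1.0, threshold=2, max_len=2):
    """Count all itemsets up to max_len, then add Laplace noise to each count.
    Adding or removing one transaction changes a count by at most 1, so noise
    with scale 1/epsilon gives epsilon-differential privacy per released count
    (splitting the budget across counts is ignored in this toy version)."""
    rng = np.random.default_rng(0)
    counts = Counter()
    for t in transactions:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(set(t)), k):
                counts[itemset] += 1
    noisy = {s: c + rng.laplace(scale=1.0 / epsilon) for s, c in counts.items()}
    return {s: round(c, 2) for s, c in noisy.items() if c >= threshold}

txns = [["milk", "bread"], ["milk", "eggs"], ["bread", "eggs", "milk"], ["bread"]]
print(dp_frequent_itemsets(txns, epsilon=0.5))
```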


2016 ◽  
Vol 332 ◽  
pp. 33-55 ◽  
Author(s):  
Alessio Bechini ◽  
Francesco Marcelloni ◽  
Armando Segatori

2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Win-Tsung Lo ◽  
Yue-Shan Chang ◽  
Ruey-Kai Sheu ◽  
Chun-Chieh Chiu ◽  
Shyan-Ming Yuan

The decision tree is one of the best-known classification methods in data mining. Many studies have been proposed that focus on improving the performance of decision trees. However, those algorithms were developed for and run on traditional distributed systems, so their latency cannot be improved when processing the huge data generated by ubiquitous sensing nodes without the help of new technology. In order to improve data processing latency when mining huge data, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (Compute Unified Device Architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We conducted many experiments to evaluate the performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5 to 55 times faster than Weka-j48 and achieves an 18-times speedup over SPRINT for large data sets.
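To make the data-parallel part of decision tree induction concrete, here is a sketch that evaluates every candidate split threshold of one feature at once. Per-threshold impurity evaluation is the kind of work that a GPU can accelerate; NumPy vectorisation stands in for CUDA kernels here, and the binary labels and Gini impurity are assumptions of the sketch rather than details from the paper.

```python
import numpy as np

def best_split_vectorized(feature, labels):
    """Evaluate all candidate thresholds of one feature in a single pass
    using cumulative class counts and weighted Gini impurity."""
    order = np.argsort(feature)
    x, y = feature[order], labels[order]
    n = len(y)
    left_pos = np.cumsum(y)[:-1]          # positives left of each split point
    left_n = np.arange(1, n)
    right_pos = y.sum() - left_pos
    right_n = n - left_n
    gini_left = 1 - (left_pos / left_n) ** 2 - ((left_n - left_pos) / left_n) ** 2
    gini_right = 1 - (right_pos / right_n) ** 2 - ((right_n - right_pos) / right_n) ** 2
    weighted = (left_n * gini_left + right_n * gini_right) / n
    best = int(np.argmin(weighted))
    threshold = (x[best] + x[best + 1]) / 2.0
    return threshold, float(weighted[best])

rng = np.random.default_rng(0)
xs = rng.normal(size=1000)
ys = (xs > 0.3).astype(int)
print(best_split_vectorized(xs, ys))
```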


2019 ◽  
Vol 8 (S3) ◽  
pp. 35-40
Author(s):  
S. Mamatha ◽  
T. Sudha

In this digital world, as organizations evolve rapidly around data-centric assets, the explosion of data and the size of databases have been growing exponentially. Data is generated from different sources such as business processes, transactions, social networking sites, and web servers, and it exists in structured as well as unstructured form. The term "big data" is used for large data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data varies in size, ranging from a few dozen terabytes to many petabytes in a single data set. Difficulties include capture, storage, search, sharing, analytics, and visualization. Big data is available in structured, unstructured, and semi-structured formats. Relational databases fail to store this multi-structured data. Apache Hadoop is an efficient, robust, reliable, and scalable framework to store, process, transform, and extract big data. The Hadoop framework is open-source, free software available from the Apache Software Foundation. In this paper we present Hadoop, HDFS, MapReduce, and a c-means big data algorithm to minimize the effort of big data analysis using MapReduce code. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools and related fields.
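As a minimal illustration of the MapReduce programming model mentioned above, here is a word-count-style map and reduce pair in plain Python. Chaining the phases locally stands in for Hadoop's shuffle; the toy clinical records are made up, and this is not the paper's c-means implementation.

```python
from collections import defaultdict

def map_phase(lines):
    # emit (key, 1) pairs; in Hadoop Streaming this would read from stdin
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # the shuffle groups pairs by key; the reducer sums values per key
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)

records = ["patient admitted icu", "patient discharged", "icu readmission"]
print(reduce_phase(map_phase(records)))
```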


The digital world, with digital processing, requires large storage space. The continuous explosion of data such as text, images, audio, video, data centers, and backup data leads to several problems in both storage and retrieval. In this paper, drought analysis and prediction are carried out using big data processing tools such as Hadoop and Hive, which offer high processing speed. Previously, drought was analyzed and predicted using traditional techniques such as the AVISO model, which is complex to process, requires more processing time, cannot process huge data, and has more security issues such as malware in the database and abuse of privileges. The system proposed in this paper can process huge data and has higher processing speed. To analyze drought, a dataset with more than ten lakh records is processed, and the drought type is found using a MapReduce algorithm that maps and reduces the data through numerical summarization. Drought types D0, D1, D2, D3, and D4 are analyzed to obtain the reduced output. The obtained drought types are clustered using Hive. To predict drought, a random forest algorithm acts as the predictor, creating multiple decision trees and finding the best split among them. Finally, the predicted output is visualized using a time series model. The tools used in this paper, Hadoop and Hive, can process huge data and are the solution for big data. Hadoop is an open-source software framework for storing data and processing it efficiently, even when the data size is very large. Hadoop uses the Hadoop Distributed File System (HDFS) for storage and MapReduce for processing the data. Hive is a query processing tool built on top of Hadoop that uses a Structured Query Language (SQL)-like language called HiveQL (HQL). In this paper, Hive is used to cluster the data obtained from MapReduce. Thus, using big data improves performance by more than 50% compared to the traditional system.
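A small sketch of the prediction step described above: a random forest trained on synthetic rainfall, temperature, and soil-moisture features with integer labels 0 to 4 standing in for D0 to D4. The features, the synthetic labeling rule, and the use of scikit-learn are assumptions for illustration; the paper's actual drought dataset and Hadoop pipeline are not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
rainfall = rng.gamma(shape=2.0, scale=30.0, size=n)   # mm per month (synthetic)
temperature = rng.normal(30, 5, size=n)               # degrees C (synthetic)
soil_moisture = rng.uniform(0, 1, size=n)
X = np.column_stack([rainfall, temperature, soil_moisture])

# crude synthetic rule: drier and hotter -> more severe drought class (0..4 ~ D0..D4)
severity = np.clip((120 - rainfall) / 30 + (temperature - 25) / 10, 0, 4)
y = severity.astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```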


2018 ◽  
Vol 8 (3) ◽  
pp. 120-125
Author(s):  
Ahmad Alaiad ◽  
Hassan Najadat ◽  
Nusaiba Al-Mnayyis ◽  
Ashwaq Khalil

Data envelopment analysis (DEA) has been widely used in many fields. Recently, it has been adopted by the healthcare sector to improve the efficiency and performance of healthcare organisations and thus reduce overall costs and increase productivity. In this paper, we demonstrate the results of applying the DEA model to Jordanian hospitals. The dataset consists of 28 hospitals and is classified into two groups: efficient and non-efficient hospitals. We applied different associative classification data mining techniques (JCBA, WeightedClassifier, and J48) to generate strong rules using the Waikato Environment for Knowledge Analysis. We also applied the open-source DEA software and MaxDEA software to run the DEA model. The results showed that JCBA has the highest accuracy. However, the WeightedClassifier method generates the highest number of rules, while the JCBA method generates the minimum number of rules. The results have several implications for practice in the healthcare sector and for decision makers. Keywords: Component, DEA, DMU, output-oriented model, health care system.
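To show what an output-oriented DEA model looks like computationally, here is a small linear-programming sketch with SciPy on made-up hospital data. The input and output columns and their figures are hypothetical; the paper itself used the open-source DEA software and MaxDEA rather than this formulation.

```python
import numpy as np
from scipy.optimize import linprog

def dea_output_oriented(inputs, outputs, dmu):
    """Output-oriented CCR envelopment model for one decision-making unit:
    maximise phi subject to  X @ lam <= x_dmu  and  Y @ lam >= phi * y_dmu,
    with lam >= 0. Solved as a small LP; variables are [phi, lam_1..lam_n]."""
    n_dmus = inputs.shape[0]
    c = np.concatenate(([-1.0], np.zeros(n_dmus)))      # maximise phi
    A_ub, b_ub = [], []
    for i in range(inputs.shape[1]):                     # input constraints
        A_ub.append(np.concatenate(([0.0], inputs[:, i])))
        b_ub.append(inputs[dmu, i])
    for r in range(outputs.shape[1]):                    # output constraints
        A_ub.append(np.concatenate(([outputs[dmu, r]], -outputs[:, r])))
        b_ub.append(0.0)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * (n_dmus + 1))
    return res.x[0]   # phi >= 1; a unit is efficient when phi == 1

# three hypothetical hospitals: inputs = [beds, staff], output = [patients treated]
X = np.array([[100.0, 50.0], [120.0, 80.0], [90.0, 60.0]])
Y = np.array([[500.0], [450.0], [480.0]])
for k in range(3):
    print(f"hospital {k}: phi = {dea_output_oriented(X, Y, k):.3f}")
```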


An advanced incremental processing technique is proposed for data analysis so that clustering results can be kept up to date. Data arrives continuously from different data-generating sources such as social networks, online shopping, sensors, and e-commerce [1]. Because of this big data, the results of data mining applications become stale and obsolete over time. Cloud intelligence applications often perform iterative computations (e.g., PageRank) on continuously changing datasets. Although earlier studies extend MapReduce for efficient iterative computations, it is still expensive to run an entirely new large-scale MapReduce iterative job to incorporate new changes to the underlying data sets in a timely manner. Our implementation of MapReduce runs [4] on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes several terabytes of data on thousands of machines. Programmers find the system easy to use. Across many MapReduce applications, we observe that in many cases the changes affect only a very small portion of the data set, and the newly converged state is very close to the previously converged state. I2MapReduce exploits this observation to save re-computation by starting from the previously converged state [2] and performing incremental updates on the changing data. The approach helps improve job completion time and reduces the running time of refreshing big data results.
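A toy sketch of the incremental idea, not I2MapReduce itself: keep the previously reduced state and, when a small delta of records arrives, update only the keys it touches instead of re-running the whole job. Word counting and the class interface are illustrative assumptions.

```python
from collections import defaultdict

class IncrementalWordCount:
    """Keeps the previously reduced state; a delta of added/removed records
    triggers re-computation only for the keys that the delta touches."""
    def __init__(self):
        self.state = defaultdict(int)   # previously converged/reduced results

    def apply_delta(self, added=(), removed=()):
        touched = set()
        for record in added:
            for word in record.split():
                self.state[word] += 1
                touched.add(word)
        for record in removed:
            for word in record.split():
                self.state[word] -= 1
                touched.add(word)
        # only the touched keys needed re-computation
        return {k: self.state[k] for k in touched}

job = IncrementalWordCount()
job.apply_delta(added=["big data mining", "data clustering"])
print(job.apply_delta(added=["incremental data processing"]))  # re-reduces 3 keys only
```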

