Map-Reduce Clustering in Incremental Big Data Processing

An advanced incremental processing technique is proposed for data analysis so that clustering results stay up to date. Data arrives continuously from many generating sources such as social networks, online shopping, sensors, and e-commerce [1]. Because of this, the results of Big Data mining applications become stale and obsolete over time. Cloud intelligence applications regularly perform iterative computations (e.g., PageRank) on continuously changing datasets. Although prior studies extend MapReduce for efficient iterative computation, it is still too expensive to perform a completely new large-scale MapReduce iterative job just to incorporate timely, small changes to the underlying data sets. Our implementation of MapReduce runs [4] on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes several terabytes of data on thousands of machines. Programmers find the system easy to use: across hundreds of MapReduce applications, we observe that in many cases the changes affect only a very small part of the data set, and the newly iteratively converged state is very close to the previously converged state. i2MapReduce clustering exploits this observation to save re-computation by starting from the previously converged state [2] and performing incremental updates on the changing data. The approach improves the per-iteration time and reduces the running time of refreshing Big Data mining results.
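The incremental idea above can be sketched in miniature: a hypothetical single-process word-count job that caches each record's map output so that, when a few records change, only the affected keys are re-reduced. Function names such as `incremental_update` are illustrative, not i2MapReduce's actual API.

```python
from collections import defaultdict

def map_word_count(record):
    """Map phase: emit (word, 1) for each token in one input record."""
    for word in record.split():
        yield word, 1

def initial_run(records):
    """Full pass: cache per-key map output so later deltas can reuse it."""
    state = defaultdict(list)                 # key -> [(record_id, value)]
    for rec_id, rec in records.items():
        for key, val in map_word_count(rec):
            state[key].append((rec_id, val))
    results = {k: sum(v for _, v in vs) for k, vs in state.items()}
    return state, results

def incremental_update(state, results, changed_records):
    """Re-map only the changed records; re-reduce only the affected keys."""
    changed_ids = set(changed_records)
    affected = set()
    for key in list(state):                   # drop stale contributions
        kept = [(r, v) for r, v in state[key] if r not in changed_ids]
        if len(kept) != len(state[key]):
            state[key] = kept
            affected.add(key)
    for rec_id, rec in changed_records.items():
        for key, val in map_word_count(rec):
            state[key].append((rec_id, val))
            affected.add(key)
    for key in affected:                      # incremental reduce
        total = sum(v for _, v in state[key])
        if total:
            results[key] = total
        else:
            del results[key], state[key]
    return results
```

A usage example: after a full run over two records, changing one record touches only the keys that record contributed to, while untouched keys keep their cached totals.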

Author(s):  
Trupti Vishwambhar Kenekar ◽  
Ajay R. Dani

As Big Data is a collection of structured, unstructured and semi-structured data gathered from various sources, it is important both to mine it and to provide privacy for individual data. Differential privacy is one of the best measures, providing a strong privacy guarantee. The chapter proposes differentially private frequent itemset mining using MapReduce, which requires less time for privately mining large datasets. The chapter discusses the problem of preserving data privacy, the challenges of preserving data privacy in a Big Data environment, data privacy techniques, and their application to unstructured data. Analyses of experimental results on structured and unstructured data sets are also presented.
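As a minimal sketch of the privacy mechanism the chapter builds on (not the chapter's own MapReduce implementation): an ε-differentially private release of item supports via the Laplace mechanism. The function names and threshold are hypothetical.

```python
import math
import random

def laplace_noise(scale):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_frequent_items(item_counts, epsilon, threshold, sensitivity=1.0):
    """Release items whose Laplace-noised support clears the threshold.

    Adding or removing one transaction changes each count by at most
    `sensitivity`, so noise of scale sensitivity/epsilon gives each
    released count epsilon-differential privacy.
    """
    scale = sensitivity / epsilon
    released = {}
    for item, count in item_counts.items():
        noisy = count + laplace_noise(scale)
        if noisy >= threshold:
            released[item] = noisy
    return released
```

Smaller ε means a larger noise scale, i.e. stronger privacy at the cost of more false inclusions/exclusions around the threshold.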


Hadmérnök ◽  
2020 ◽  
Vol 15 (4) ◽  
pp. 141-158
Author(s):  
Eszter Katalin Bognár

In modern warfare, the most important innovation to date has been the utilisation of information as a weapon. The basis of successful military operations is the ability to correctly assess a situation based on credible collected information. In today’s military, the primary challenge is not the actual collection of data. It has become more important to extract relevant information from that data. This requirement cannot be successfully completed without necessary improvements in tools and techniques to support the acquisition and analysis of data. This study defines Big Data and its concept as applied to military reconnaissance, focusing on the processing of imagery and textual data, bringing to light modern data processing and analytics methods that enable effective processing.


Author(s):  
Khyati R Nirmal ◽  
K.V.V. Satyanarayana

<p><span>In recent times, Big Data analysis has emerged as an essential area in the field of Computer Science. Extracting significant information from Big Data by separating the data into distinct groups is a crucial task, and it is beyond the capacity of a commonly used personal machine. It is necessary to adopt a distributed environment such as the MapReduce paradigm and to migrate data mining algorithms to it. In data mining, partition-based K-means clustering is one of the most widely used algorithms for grouping data according to the degree of similarity between data points. It requires the number of clusters K and the initial cluster centroids as input. The parameters chosen by the algorithm or supplied by the user strongly influence its behaviour, so it is necessary to migrate K-means clustering to MapReduce and to predict the value of K using a machine learning approach. An efficient method for selecting the initial clusters must also be devised and combined with it. This paper comprises a survey of several methods for predicting the value of K in K-means clustering, together with a survey of different methodologies for finding the initial centres of the clusters. Along with the initial value of K and initial centroid selection, the objective of the proposed work is to address the analysis of categorical data.</span></p>
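One of the surveyed ingredients, initial-centroid selection, can be illustrated with a small k-means++-style seeding sketch. This is a generic, well-known technique used here for illustration, not a specific method proposed by the surveyed papers.

```python
import random

def kmeanspp_init(points, k, rng=None):
    """k-means++ seeding: pick the first centroid at random, then pick
    each further centroid with probability proportional to its squared
    distance from the nearest centroid chosen so far."""
    rng = rng or random.Random(0)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # squared distance of every point to its nearest chosen centroid
        d2 = [min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                  for c in centroids) for p in points]
        r = rng.random() * sum(d2)
        acc = 0.0
        for point, weight in zip(points, d2):  # weighted sampling
            acc += weight
            if acc >= r:
                centroids.append(point)
                break
    return centroids
```

Because already-chosen points have zero weight, the seeding tends to spread centroids across well-separated groups, which is exactly what a poor random initialisation fails to do.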


2019 ◽  
Vol 16 (8) ◽  
pp. 3211-3215 ◽  
Author(s):  
S. Prince Mary ◽  
D. Usha Nandini ◽  
B. Ankayarkanni ◽  
R. Sathyabama Krishna

Integration of cloud and Big Data is a difficult and challenging task, and finding the number of resources needed to complete a job is equally difficult. Virtualization is therefore implemented; a MapReduce job involves three phases: the map phase, the shuffle phase and the reduce phase. Many researchers have already applied heterogeneous MapReduce applications and used the least-work-left policy for distributed server systems. In this paper we discuss how virtualization is used for Hadoop jobs for effective data processing and for finding the processing time of a job, and a balanced partition algorithm is used. The main objective is to implement virtualization on our local machines.
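The least-work-left policy mentioned above can be sketched as follows. This is a generic illustration, not the paper's implementation: `servers` maps a (hypothetical) server name to its outstanding work, and each new job goes to the least-loaded server.

```python
def least_work_left(servers, job_size):
    """Dispatch a job to the server with the least outstanding work,
    then charge that server for the new job."""
    target = min(servers, key=servers.get)
    servers[target] += job_size
    return target
```

Repeated dispatches naturally balance load: a server only receives work while its backlog is the smallest in the pool.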


2017 ◽  
Vol 45 (4) ◽  
pp. 194-201 ◽  
Author(s):  
Shanyong Wang ◽  
Jun Li ◽  
Dingtao Zhao

Purpose: The purpose of this paper is to apply an extended technology acceptance model to examine medical data analysts’ intention to use a medical big data processing technique.
Design/methodology/approach: A questionnaire survey was used to collect data from 293 medical data analysts, which were analyzed with the assistance of structural equation modeling.
Findings: The results indicate that perceived usefulness, social influence and attitude are important to the intention to use a medical big data processing technique, and the direct effect of perceived usefulness on intention to use is greater than that of social influence and attitude. Perceived usefulness is influenced by perceived ease of use. Attitude is influenced by perceived usefulness, and attitude acts as a mediator between perceived usefulness and usage intention. Unexpectedly, attitude is not influenced by perceived ease of use or social influence.
Originality/value: This research examines medical data analysts’ intention to use a medical big data processing technique and provides several implications for its use.


2019 ◽  
Vol 8 (S3) ◽  
pp. 35-40
Author(s):  
S. Mamatha ◽  
T. Sudha

In this digital world, as organizations evolve rapidly around data-centric assets, the explosion of data and the size of databases have grown exponentially. Data is generated from different sources such as business processes, transactions, social networking sites, web servers, etc., and exists in structured as well as unstructured form. The term “Big Data” is used for large data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data varies in size, ranging from a few dozen terabytes to many petabytes in a single data set. Difficulties include capture, storage, search, sharing, analytics and visualization. Big data is available in structured, unstructured and semi-structured formats, and relational databases fail to store such multi-structured data. Apache Hadoop is an efficient, robust, reliable and scalable framework to store, process, transform and extract big data. The Hadoop framework is open-source, free software available from the Apache Software Foundation. In this paper we present Hadoop, HDFS, MapReduce and a c-means big data algorithm to minimize the effort of big data analysis using MapReduce code. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools and related fields.
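A minimal, single-machine sketch of the fuzzy c-means updates that such an approach would run inside MapReduce jobs (1-D data, two clusters, deterministic seeding; this is a generic textbook formulation, not the paper's own code):

```python
def fuzzy_c_means(xs, m=2.0, iters=50):
    """Fuzzy c-means on 1-D data with c = 2 clusters: each point gets a
    degree of membership in every cluster, not a hard assignment."""
    centers = [min(xs), max(xs)]            # deterministic seeding for c = 2
    c = len(centers)
    u = []
    for _ in range(iters):
        # membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = []
        for x in xs:
            d = [abs(x - ck) + 1e-12 for ck in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                for k in range(c))
                      for j in range(c)])
        # center update: membership-weighted mean with weights u^m
        centers = [sum(u[i][j] ** m * xs[i] for i in range(len(xs))) /
                   sum(u[i][j] ** m for i in range(len(xs)))
                   for j in range(c)]
    return centers, u
```

In a MapReduce setting, the map phase would compute per-point memberships and partial weighted sums, and the reduce phase would combine them into the new centers.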


Big data is unstructured and structured data that cannot be processed by our traditional systems; it is characterized not only by the volume of data but also by its velocity and variety. Processing means storing and analyzing the data to extract knowledge for decision making. Every living and non-living thing, and every device, generates a tremendous amount of data every fraction of a second. Hadoop is a software framework for processing big data to extract knowledge from stored data, enhance business and help solve societal problems. Hadoop has two main components: HDFS for storage and MapReduce for processing. HDFS consists of a name node and data nodes for storage; MapReduce comprises the JobTracker and TaskTracker frameworks. Whenever a client asks Hadoop to store data, the name node responds with data nodes that have free capacity, and the client writes the data to those data nodes; Hadoop’s replication factor then copies the blocks of data to other data nodes to provide fault tolerance, since HDFS uses commodity hardware for storage. The name node stores the metadata of the data nodes and, being a single point of failure in Hadoop, is backed up by a secondary name node. Whenever a client wants to process data, it contacts the name node and the JobTracker, which communicates with the TaskTrackers to get the tasks done. All of these Hadoop components are frameworks on top of the OS that manage system resources for efficient big data processing. Big data processing performance is measured with benchmark programs. In our research work we compared the processing (i.e., execution) time of the word-count benchmark implemented as Hadoop MapReduce Python code, a Pig script and a Hive query, using the same input file, big.txt. The MapReduce execution time was 1 min 29 s, the Pig execution time 57 s and the Hive execution time 31 s, so we can say that Hive is much faster than Pig and the MapReduce Python code.
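The word-count benchmark compared above can be sketched as a pair of MapReduce-style Python functions. This is a single-process illustration of the map/shuffle/reduce phases, not the actual benchmarked Hadoop job.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit one (word, 1) pair per token."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Shuffle + reduce: sort pairs by key (standing in for the shuffle
    phase), then sum the values within each key group."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(v for _, v in group)
```

In Hadoop streaming, the same two functions would read and write tab-separated key/value lines on stdin/stdout, with the framework performing the sort between them.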




2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jui-Chan Huang ◽  
Po-Chang Ko ◽  
Cher-Min Fong ◽  
Sn-Man Lai ◽  
Hsin-Hung Chen ◽  
...  

With the increase in the number of online shoppers, customer loyalty is directly related to product sales. This research mainly explores the statistical modeling and simulation of online-shopping customer loyalty based on machine learning and big data analysis, using a machine learning clustering algorithm to model customer loyalty. A k-means interactive mining algorithm based on a hash structure is called to perform data mining on the multidimensional hierarchical tree of corporate credit risk; the support thresholds for the different levels of data mining are continuously adjusted according to specific requirements, and effective association rules are selected until satisfactory results are obtained. After credit-risk assessment and early-warning modeling for the enterprise, an initial preselected model is obtained. The information to be collected is first fetched by a web crawler from the target website into a temporary web-page database, where it goes through a series of preprocessing steps such as completion, deduplication, analysis, and extraction to ensure that each crawled page is parsed correctly and to avoid incorrect data caused by network errors during crawling. Correctly parsed data is stored for the next step of data cleaning or data analysis. To parse HTML documents, a Java program first sets the subject keyword and URL and parses the HTML from the retrieved file or string by analyzing the structure of the website; it then uses CSS selectors to locate the web-page list information, retrieves the data, and stores it in Elements. In the overall fit test of the model, the root mean square error of approximation (RMSEA) value is 0.053, between 0.05 and 0.08. The results show that the model designed in this study achieves a relatively good fit, strengthens customers’ perception of shopping websites, and indicates that relationship trust plays a large role in maintaining customer loyalty.
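The selector-based extraction step described above can be sketched in Python rather than Java (the study's language), using only the standard library: a small `HTMLParser` subclass that collects the text of elements carrying a target class attribute, standing in for the CSS-selector lookup. The class names and sample markup are hypothetical.

```python
from html.parser import HTMLParser

class ListExtractor(HTMLParser):
    """Collect text from elements with a target class attribute, a
    stdlib stand-in for a CSS class selector like '.product'."""
    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0          # nesting depth inside a matched element
        self.items = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth += 1
            self.items.append("")        # start a new extracted item
        elif self.depth:
            self.depth += 1              # nested tag inside a match

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.items[-1] += data.strip()
```

Feeding it a product list pulls out only the elements whose class matches, mirroring the "retrieve the data and store it in Elements" step of the crawler pipeline.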

