The Big Data Mining Approach for Finding top rated URL

2015, Vol 7 (1), pp. 17-32
Author(s): J.S. Shyam Mohan, P. Shanmugapriya, Bhamidipati Vinay Pawan Kumar

Abstract: Finding the most widely used URLs on online shopping sites for a particular category is difficult because the data sets involved are heterogeneous, multi-dimensional, and depend on many factors. Traditional data mining methods are limited to homogeneous data sources, so they do not adequately account for the characteristics of heterogeneous data. This paper presents a consistent Big Data mining search that performs analytics on text data to find the top-rated URLs. Although many heuristic search methods are available, our proposed method solves this search problem more effectively than traditional data mining methods. Sample results are obtained in optimal time, and comparison with other methods shows that the approach is effective and efficient.
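The abstract does not describe its search procedure, so the following is only a minimal illustrative sketch of the final ranking step, assuming ratings have already been extracted from the text data as hypothetical `(url, category, rating)` records. It computes each URL's mean rating within one category and returns the top `k`; the function name `top_rated_urls` and the `min_ratings` threshold are assumptions, not the paper's method.

```python
from collections import defaultdict
from typing import Iterable


def top_rated_urls(
    records: Iterable[tuple[str, str, float]],
    category: str,
    k: int = 3,
    min_ratings: int = 1,
) -> list[tuple[str, float]]:
    """Hypothetical helper: rank URLs in `category` by mean rating.

    Assumes each record is (url, category, rating) already extracted
    from review text; this is not the paper's Big Data search method.
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for url, cat, rating in records:
        if cat != category:
            continue  # only aggregate ratings for the requested category
        totals[url] += rating
        counts[url] += 1
    # Mean rating per URL, skipping URLs with too few ratings to trust.
    means = [
        (url, totals[url] / counts[url])
        for url in counts
        if counts[url] >= min_ratings
    ]
    # Highest mean first; ties broken by URL so the output is deterministic.
    means.sort(key=lambda item: (-item[1], item[0]))
    return means[:k]
```

For example, with the hypothetical records `("https://shop-a.example/p1", "phones", 4.5)`, `("https://shop-a.example/p1", "phones", 3.5)`, `("https://shop-b.example/p2", "phones", 5.0)` and `("https://shop-c.example/p3", "laptops", 4.0)`, calling `top_rated_urls(records, "phones", k=2)` returns `[("https://shop-b.example/p2", 5.0), ("https://shop-a.example/p1", 4.0)]`. On real heterogeneous data, raising `min_ratings` keeps a single high rating from dominating the ranking.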

2017, Vol 2017, pp. 1-12
Author(s): Shuo Li, Zhanjie Song, Wenhuan Lu, Daniel Sun, Jianguo Wei

Privacy is a major concern in big data mining. In this paper, we propose a novel self-recovery speech watermarking framework that supports trustworthy communication in big data mining. In the framework, the watermark is a compressed version of the original speech and is embedded into the least significant bit (LSB) layers. At the receiver, the watermark is used to detect tampered regions and recover the tampered speech. To fit the complexity of scenarios in big data infrastructures, the number of LSB layers is treated as a parameter. This work derives explicit mathematical relationships between the number of LSB layers and the other parameters. Once the number of LSB layers has been chosen, the best values of the other parameters are deduced using the exclusive method. Additionally, we observed that six LSB layers is the limit for watermark embedding when the total number of bit layers is sixteen. Experimental results indicated that when the number of LSB layers decreased from six to three, the imperceptibility of the watermark increased, while the quality of the recovered signal decreased accordingly. This is a trade-off, so the number of LSB layers should be chosen according to the application conditions of a given big data infrastructure.
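To make the "number of LSB layers as a parameter" idea concrete, here is a minimal sketch of plain LSB replacement on 16-bit speech samples, assuming the watermark has already been turned into a bit stream. It does not reproduce the paper's compression, tamper localization, or recovery steps; the helper names `embed_lsb` and `extract_lsb` are hypothetical. Replacing the lowest `n_lsb` bits changes each sample by at most 2^n_lsb - 1, which is why using fewer layers is less perceptible but leaves less room for the watermark.

```python
import numpy as np


def embed_lsb(samples, bits, n_lsb: int) -> np.ndarray:
    """Hypothetical helper: write watermark bits into the n_lsb lowest bits
    of 16-bit samples (MSB-first within each sample's LSB field)."""
    if not 1 <= n_lsb <= 16:
        raise ValueError("n_lsb must be between 1 and 16")
    # Reinterpret signed samples as raw 16-bit patterns.
    raw = np.asarray(samples, dtype=np.int16).view(np.uint16)
    capacity = raw.size * n_lsb  # total bits available in the LSB layers
    bits = np.asarray(bits, dtype=np.uint16)
    if bits.size > capacity:
        raise ValueError(f"payload of {bits.size} bits exceeds capacity {capacity}")
    if np.any(bits > 1):
        raise ValueError("payload must contain only 0/1 values")
    # Zero-pad the payload and group it into one n_lsb-bit value per sample.
    padded = np.zeros(capacity, dtype=np.uint16)
    padded[: bits.size] = bits
    groups = padded.reshape(raw.size, n_lsb)
    weights = (1 << np.arange(n_lsb - 1, -1, -1)).astype(np.uint16)
    values = (groups * weights).sum(axis=1).astype(np.uint16)
    # Clear the LSB layers, then write the payload values into them.
    mask = np.uint16((1 << n_lsb) - 1)
    return ((raw & ~mask) | values).view(np.int16)


def extract_lsb(samples, n_lsb: int, n_bits: int) -> np.ndarray:
    """Read back the first n_bits written by embed_lsb."""
    raw = np.asarray(samples, dtype=np.int16).view(np.uint16)
    shifts = np.arange(n_lsb - 1, -1, -1)
    bits = (raw[:, None] >> shifts) & 1  # one row of n_lsb bits per sample
    return bits.reshape(-1)[:n_bits].astype(np.uint8)
```

With `n_lsb=3`, eight samples carry up to 24 watermark bits and no sample moves by more than 7; with `n_lsb=6`, the same eight samples carry 48 bits at the cost of changes of up to 63.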


Distance measures are at the core of data mining techniques such as classification, clustering, and statistical analysis. All clustering taxonomies (partitioning, hierarchical, density-based, grid-based, model-based, fuzzy, and graph-based) use distance measures to assign data points to different clusters and to construct and validate those clusters. Big data mining extends data mining to the dimensions of big data. When a traditional clustering algorithm is applied in big data mining, its distance measure must scale to huge datasets and support heterogeneous data and sources as well as the velocity of big data. From theoretical, practical, and existing-research perspectives, the paper uses the volume, variety, and velocity criteria of big data to identify suitable distance measures for big data mining and examines how distance measures work within the clustering taxonomy. The study also analyzes the accuracy of each distance measure using a confusion matrix obtained through clustering.
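As a rough illustration of evaluating distance measures through clustering, the sketch below plugs three common measures (Euclidean, Manhattan, cosine) into a simple partitioning (k-means-style) loop and scores each result with a confusion matrix. The measures, the farthest-first initialization, and the best-permutation accuracy are choices made for this example, not the paper's experimental setup. For simplicity the centroid update is the mean for every measure, which is only the exact minimizer for squared Euclidean distance.

```python
from itertools import permutations

import numpy as np


def euclidean(a, b):
    return np.sqrt(((a - b) ** 2).sum(axis=-1))


def manhattan(a, b):
    return np.abs(a - b).sum(axis=-1)


def cosine(a, b):
    # 1 - cosine similarity; undefined for zero vectors.
    num = (a * b).sum(axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return 1.0 - num / den


DISTANCES = {"euclidean": euclidean, "manhattan": manhattan, "cosine": cosine}


def farthest_first_init(X, k, dist):
    """Deterministic seeding: start at X[0], then add the farthest point."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([dist(X, c) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids, dtype=float)  # copy, so X is never modified


def kmeans(X, k, dist, n_iter=20):
    """Partition-based clustering with a pluggable distance measure."""
    centroids = farthest_first_init(X, k, dist)
    for _ in range(n_iter):
        # (n, k) matrix of distances from every point to every centroid.
        labels = dist(X[:, None, :], centroids[None, :, :]).argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels


def confusion_matrix(y_true, y_pred, k):
    cm = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # rows: true class, columns: assigned cluster
    return cm


def clustering_accuracy(cm):
    # Cluster ids are arbitrary, so use the best one-to-one cluster/class match.
    k = cm.shape[0]
    best = max(sum(cm[i, perm[i]] for i in range(k)) for perm in permutations(range(k)))
    return best / cm.sum()
```

Looping over `DISTANCES`, clustering the same labeled data with each measure, and comparing `clustering_accuracy(confusion_matrix(y, labels, k))` gives a per-measure accuracy table. The permutation search is only practical for a small number of clusters; larger `k` would call for an assignment solver instead.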


IEEE Access, 2019, Vol 7, pp. 154035-154043
Author(s): Hangjun Zhou, Guang Sun, Sha Fu, Jing Liu, Xingxing Zhou, ...
