An Improved Integrated Hash and Attributed based Encryption Model on High Dimensional Data in Cloud Environment

Cloud computing is a distributed architecture in which users can store private data, public data, or application software components. Many cloud-based privacy protection solutions have been implemented, but most of them handle only a limited range of data resources and storage formats. Data confidentiality concerns and inefficient data access methods are the major issues that keep cloud users from storing their high dimensional data. As more and more cloud-based applications become available and are stored on various cloud servers, a novel multi-user privacy protection mechanism needs to be designed and developed to improve privacy protection for high dimensional data. In this paper, a novel integrity algorithm with an attribute-based encryption model is implemented to ensure the confidentiality of high dimensional data on cloud storage. The main objective of this model is to store, transmit, and retrieve high dimensional cloud data with low computational time and high security. Experimental results show that the proposed model has high data scalability, less computational time, and lower memory usage than traditional cloud-based privacy protection models.
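
The abstract does not specify the hash or the encryption scheme, so the following is only a minimal sketch of the general idea of pairing an integrity digest with attribute-gated access. SHA-256 is a swapped-in choice, and the `policy` set is a hypothetical plain-Python stand-in for an attribute-based encryption policy; no actual encryption is performed, and the names `store` and `retrieve` are illustrative.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest kept alongside a record for integrity checks."""
    return hashlib.sha256(data).hexdigest()


def store(data: bytes, policy: set) -> dict:
    # 'policy' is a hypothetical "all of these attributes" requirement.
    # It only imitates the access decision of an ABE policy; the data is
    # NOT encrypted in this sketch.
    return {"data": data, "sha256": digest(data), "policy": frozenset(policy)}


def retrieve(record: dict, user_attrs: set) -> bytes:
    # Access check: the user's attributes must cover the whole policy.
    if not record["policy"] <= set(user_attrs):
        raise PermissionError("user attributes do not satisfy the policy")
    # Integrity check: recompute the digest and compare in constant time.
    if not hmac.compare_digest(digest(record["data"]), record["sha256"]):
        raise ValueError("integrity check failed")
    return record["data"]
```

In a real deployment the record would be encrypted under the policy by an ABE library, and the digest would be computed over the ciphertext or plaintext depending on the threat model.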

On Cluster-Aware Supervised Learning: Frameworks, Convergent Algorithms, and Applications

INFORMS Journal on Computing ◽

10.1287/ijoc.2020.1053 ◽

2021 ◽

Author(s):

Shutong Chen ◽

Weijun Xie

Keyword(s):

Supervised Learning ◽

Random Forests ◽

Stationary Point ◽

Clustering Analysis ◽

Prediction Accuracy ◽

High Dimensional Data ◽

Computational Time ◽

High Dimensional ◽

Support Vector ◽

Numerical Studies

This paper proposes a cluster-aware supervised learning (CluSL) framework, which integrates the clustering analysis with supervised learning. The objective of CluSL is to simultaneously find the best clusters of the data points and minimize the sum of loss functions within each cluster. This framework has many potential applications in healthcare, operations management, manufacturing, and so on. Because CluSL, in general, is nonconvex, we develop a regularized alternating minimization (RAM) algorithm to solve it, where at each iteration, we penalize the distance between the current clustering solution and the one from the previous iteration. By choosing a proper penalty function, we show that each iteration of the RAM algorithm can be computed efficiently. We further prove that the proposed RAM algorithm will always converge to a stationary point within a finite number of iterations. This is the first known convergence result in cluster-aware learning literature. Furthermore, we extend CluSL to the high-dimensional data sets, termed the F-CluSL framework. In F-CluSL, we cluster features and minimize loss function at the same time. Similarly, to solve F-CluSL, a variant of the RAM algorithm (i.e., F-RAM) is developed and proven to be convergent to an [Formula: see text]-stationary point. Our numerical studies demonstrate that the proposed CluSL and F-CluSL can outperform the existing ones such as random forests and support vector classification, both in the interpretability of learning results and in prediction accuracy.

Summary of Contribution: Aligned with the mission and scope of the INFORMS Journal on Computing, this paper proposes a cluster-aware supervised learning (CluSL) framework, which integrates the clustering analysis with supervised learning. Because CluSL is, in general, nonconvex, a regularized alternating projection algorithm is developed to solve it and is proven to always find a stationary solution. We further generalize the framework to the high-dimensional data set, F-CluSL. Our numerical studies demonstrate that the proposed CluSL and F-CluSL can deliver more interpretable learning results and outperform the existing ones such as random forests and support vector classification in computational time and prediction accuracy.
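
The alternating scheme described in this abstract can be illustrated with a small sketch. It is not the paper's RAM algorithm: the per-cluster model (a one-dimensional least-squares line), the squared loss, and the penalty (a fixed cost `lam` charged whenever a point leaves its previous cluster) are all assumptions chosen to keep the example short.

```python
def fit_line(points):
    """Least-squares fit y = a*x + b; falls back to a constant if x has no spread."""
    n = len(points)
    if n == 0:
        return 0.0, 0.0
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0.0:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx


def fit_all(data, assign, k):
    """Fit one line per cluster from the current assignment."""
    return [fit_line([p for p, c in zip(data, assign) if c == j]) for j in range(k)]


def alternating_sketch(data, assign, k, lam=0.1, max_iter=50):
    assign = list(assign)
    for _ in range(max_iter):
        # Step 1: with clusters fixed, minimize the loss inside each cluster.
        models = fit_all(data, assign, k)
        # Step 2: with models fixed, reassign each point; switching away
        # from the previous cluster costs an extra 'lam' (assumed penalty).
        new_assign = []
        for (x, y), prev in zip(data, assign):
            costs = [(y - (a * x + b)) ** 2 + (0.0 if j == prev else lam)
                     for j, (a, b) in enumerate(models)]
            new_assign.append(min(range(k), key=costs.__getitem__))
        if new_assign == assign:  # assignment stopped changing
            break
        assign = new_assign
    return assign, fit_all(data, assign, k)
```

The switch cost discourages points from oscillating between clusters with nearly equal loss, which is the intuition behind penalizing the distance to the previous clustering solution.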

Local Differential Privacy Protection of High-Dimensional Perceptual Data by the Refined Bayes Network

Sensors ◽

10.3390/s20092516 ◽

2020 ◽

Vol 20 (9) ◽

pp. 2516

Author(s):

Chunhua Ju ◽

Qiuyang Gu ◽

Gongxing Wu ◽

Shuangzhu Zhang

Keyword(s):

Privacy Protection ◽

Data Privacy ◽

Differential Privacy ◽

High Dimensional Data ◽

Statistical Characteristics ◽

High Dimensional ◽

Bayes Network ◽

Crowd Sensing ◽

Original Dataset ◽

Perception System

Although the crowd-sensing perception system brings great data value to people through the release and analysis of high-dimensional perception data, it also poses a serious hidden danger to the privacy of participants. Various privacy protection methods based on differential privacy have been proposed, but most of them cannot simultaneously solve the complex attribute association problem among high-dimensional perception data and the privacy threat from untrustworthy servers. To address this problem, we put forward a local privacy protection mechanism based on a Bayes network for high-dimensional perceptual data. This mechanism protects the users’ data locally from the very beginning, eliminates the possibility of other parties directly accessing the user’s original data, and fundamentally protects the user’s data privacy. After receiving the users’ locally protected data, the perception server recognizes the dimensional correlation of the high-dimensional data based on the Bayes network, divides the high-dimensional attribute set into multiple relatively independent low-dimensional attribute sets, and then sequentially synthesizes the new dataset. The mechanism effectively retains the attribute dimension correlation of the original perception data and ensures that the synthetic dataset and the original dataset have statistical characteristics that are as similar as possible. To verify its effectiveness, we conduct a multitude of simulation experiments. The results show that the synthetic data produced by this mechanism under effective local privacy protection has relatively high data utility.
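
The abstract does not say which local perturbation the mechanism uses. As an illustration of the local differential privacy idea (each user perturbs their own value before it leaves the device, and the server still recovers aggregate statistics), here is a sketch of binary randomized response, a standard primitive swapped in for this example. The function names are illustrative.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def perturb_bit(bit, epsilon, rng):
    """Run on the user's side: report the true bit or its flip."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def estimate_frequency(reports, epsilon):
    """Run on the server: unbiased estimate of the true fraction of 1-bits."""
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    # E[observed] = p*f + (1 - p)*(1 - f); solve for the true frequency f.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

The server never sees an individual's true bit, yet the debiased estimate approaches the true frequency as the number of reports grows. A smaller `epsilon` gives stronger privacy and a noisier estimate.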

Functional Dimension Reduction for Chemometrics

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch100 ◽

2011 ◽

pp. 661-666

Author(s):

Tuomas Kärnä ◽

Amaury Lendasse

Keyword(s):

Least Squares ◽

High Dimensional Data ◽

Computational Time ◽

High Dimensional ◽

Support Vector ◽

Data Set ◽

Spectrometric Data ◽

Function Fitting ◽

Svm Model ◽

Data Points

High dimensional data are becoming more and more common in data analysis. This is especially true in fields that are related to spectrometric data, such as chemometrics. Due to the development of more accurate spectrometers, one can obtain spectra of thousands of data points. Such high dimensional data are problematic in machine learning due to increased computational time and the curse of dimensionality (Haykin, 1999; Verleysen & François, 2005; Bengio, Delalleau, & Le Roux, 2006). It is therefore advisable to reduce the dimensionality of the data. In the case of chemometrics, the spectra are usually rather smooth and low on noise, so function fitting is a convenient tool for dimensionality reduction. The fitting is obtained by fixing a set of basis functions and computing the fitting weights according to the least squares error criterion. This article describes an unsupervised method for finding a good function basis that is specifically built to suit the data set at hand. The basis consists of a set of Gaussian functions that are optimized for an accurate fitting. The obtained weights are further scaled using a Delta Test (DT) to improve the prediction performance. A Least Squares Support Vector Machine (LS-SVM) model is used for estimation.
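
The function-fitting step can be sketched as follows: a spectrum sampled at many points is replaced by a handful of least-squares weights on Gaussian basis functions. In this sketch the centers and the shared width are fixed by hand, which is an assumption; the article optimizes the basis and additionally scales the weights with the Delta Test, and neither step is shown here.

```python
import math


def gaussian(x, center, width):
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def fit_weights(xs, ys, centers, width):
    """Least-squares weights via the normal equations (Phi^T Phi) w = Phi^T y."""
    Phi = [[gaussian(x, c, width) for c in centers] for x in xs]
    k = len(centers)
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(k)]
    return solve(A, b)


def reconstruct(x, weights, centers, width):
    """Evaluate the fitted function at x."""
    return sum(w * gaussian(x, c, width) for w, c in zip(weights, centers))
```

For example, a smooth spectrum sampled at 81 wavelengths and fitted with two Gaussians is represented by two weights, and those weights (rather than the 81 raw values) would be passed to the downstream regression model.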

PPDP-PCAO: An Efficient High-Dimensional Data Releasing Method With Differential Privacy Protection

IEEE Access ◽

10.1109/access.2019.2957858 ◽

2019 ◽

Vol 7 ◽

pp. 176429-176437 ◽

Cited By ~ 1

Author(s):

Wanjie Li ◽

Xing Zhang ◽

Xiaohui Li ◽

Guanghui Cao ◽

Qingyun Zhang

Keyword(s):

Privacy Protection ◽

Differential Privacy ◽

High Dimensional Data ◽

High Dimensional

Outlier Detection in High Dimensional Data Based on Elastic Net Regression

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3479.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 325-328

Keyword(s):

Outlier Detection ◽

High Dimensional Data ◽

Absolute Error ◽

Database Systems ◽

Research Area ◽

Large Datasets ◽

Computational Time ◽

High Dimensional ◽

Squared Error ◽

Convergence Results

Outlier detection in large datasets is an active research area in computer science fields such as data mining, database systems, and distributed systems. Outlier detection faces many challenges due to the absence of data samples from the outlier class. Many algorithms have been proposed to overcome these challenges and to improve the efficiency of regression approaches for large datasets. Currently, no particularly efficient regression technique is designed for outlier detection. In this research, we propose an ElasticNet regression model for detecting outliers in high dimensional data. To validate the efficiency and competence of the proposed algorithm, it is implemented in the open source software Weka Explorer. The following error measures are calculated on the annthyroid dataset: mean absolute error 0.0022, RMSE 0.0387, relative absolute error (RAE) 0.4562, and root relative squared error (RSE) 7.8722. The ElasticNet model consumes less computational time, converges quickly, and provides high accuracy, with 98.25% of instances correctly classified.
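
For readers unfamiliar with the four error measures quoted above, the sketch below computes them using their usual textbook definitions, in which the relative measures compare the model against a baseline that always predicts the mean of the actual values. Whether the reported figures are fractions or percentages is not stated in the abstract, so this sketch returns plain ratios; the function name is illustrative.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, RMSE, RAE and RSE (root relative squared error) as plain ratios."""
    n = len(actual)
    mean_a = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Baseline errors of always predicting the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": abs_err / base_abs,
        "RSE": math.sqrt(sq_err / base_sq),
    }
```

A relative error below 1 means the model beats the mean-only baseline, which is why RAE and RSE are easier to compare across datasets than MAE and RMSE.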

Utility metric for unsupervised feature selection

PeerJ Computer Science ◽

10.7717/peerj-cs.477 ◽

2021 ◽

Vol 7 ◽

pp. e477

Author(s):

Amalia Villa ◽

Abhijith Mundanad Narayanan ◽

Sabine Van Huffel ◽

Alexander Bertrand ◽

Carolina Varon

Keyword(s):

Feature Selection ◽

Manifold Learning ◽

State Of The Art ◽

High Dimensional Data ◽

Subset Selection ◽

The State ◽

Computational Time ◽

High Dimensional ◽

Learning Stage ◽

Unsupervised Feature Selection

Feature selection techniques are very useful approaches for dimensionality reduction in data analysis. They provide interpretable results by reducing the dimensions of the data to a subset of the original set of features. When the data lack annotations, unsupervised feature selectors are required for their analysis. Several algorithms for this aim exist in the literature, but despite their large applicability, they can be very inaccessible or cumbersome to use, mainly due to the need for tuning non-intuitive parameters and the high computational demands. In this work, a publicly available ready-to-use unsupervised feature selector is proposed, with comparable results to the state-of-the-art at a much lower computational cost. The suggested approach belongs to the methods known as spectral feature selectors. These methods generally consist of two stages: manifold learning and subset selection. In the first stage, the underlying structures in the high-dimensional data are extracted, while in the second stage a subset of the features is selected to replicate these structures. This paper suggests two contributions to this field, related to each of the stages involved. In the manifold learning stage, the effect of non-linearities in the data is explored, making use of a radial basis function (RBF) kernel, for which an alternative solution for the estimation of the kernel parameter is presented for cases with high-dimensional data. Additionally, the use of a backwards greedy approach based on the least-squares utility metric for the subset selection stage is proposed. The combination of these new ingredients results in the utility metric for unsupervised feature selection (U2FS) algorithm. The proposed U2FS algorithm succeeds in selecting the correct features in a simulation environment. In addition, the performance of the method on benchmark datasets is comparable to the state-of-the-art, while requiring less computational time. Moreover, unlike the state-of-the-art, U2FS does not require any tuning of parameters.
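
The subset selection stage can be illustrated with a naive backward greedy loop: starting from all features, repeatedly drop the one whose removal increases the least-squares error the least. This is only a sketch of the general idea, not the U2FS implementation. The target vector `y` stands in for the structure extracted in the manifold learning stage (an assumption), the error is recomputed from scratch for every candidate rather than through an efficient utility update, and the selected columns are assumed to be linearly independent.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def ls_error(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on the chosen columns."""
    A = [[sum(r[i] * r[j] for r in X) for j in cols] for i in cols]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in cols]
    w = solve(A, b)
    return sum((t - sum(wi * r[i] for wi, i in zip(w, cols))) ** 2
               for r, t in zip(X, y))


def backward_greedy(X, y, n_keep):
    """Drop features one at a time until only n_keep remain."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        # Remove the feature whose absence hurts the fit the least.
        drop = min(kept, key=lambda f: ls_error(X, y, [c for c in kept if c != f]))
        kept.remove(drop)
    return kept
```

Working backwards lets each feature be judged in the context of all the others still in the model, which is why a redundant or uninformative column tends to be discarded first.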

A randomized block policy gradient algorithm with differential privacy in Content Centric Networks

International Journal of Distributed Sensor Networks ◽

10.1177/15501477211059934 ◽

2021 ◽

Vol 17 (12) ◽

pp. 155014772110599

Author(s):

Lin Wang ◽

Xingang Xu ◽

Xuhui Zhao ◽

Baozhu Li ◽

Ruijuan Zheng ◽

...

Keyword(s):

Privacy Protection ◽

Differential Privacy ◽

Effective Means ◽

High Dimensional Data ◽

Computational Cost ◽

Gradient Methods ◽

Multimedia Data ◽

Gradient Algorithm ◽

High Dimensional ◽

Policy Gradient

Policy gradient methods are an effective means of solving mobile multimedia data transmission problems in Content Centric Networks. Current policy gradient algorithms incur a high computational cost when processing high-dimensional data, and they do not take privacy disclosure into account, even though privacy protection is important in data training. Therefore, we propose a randomized block policy gradient algorithm with differential privacy. To reduce computational complexity when processing high-dimensional data, we randomly select a block coordinate to update the gradients at each round. To solve the privacy protection problem, we add a differential privacy protection mechanism to the algorithm, and we prove that it preserves the [Formula: see text]-privacy level. We conduct extensive simulations in four environments: CartPole, Walker, HalfCheetah, and Hopper. Compared with methods such as important-sampling momentum-based policy gradient, Hessian-aided momentum-based policy gradient, and REINFORCE, our algorithm shows a faster convergence rate in the same environment.
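
The two ingredients named in the abstract, random block selection and a noisy gradient, can be sketched on a generic objective. This is not the paper's algorithm: the gradient clipping, the Gaussian noise with standard deviation `sigma * clip`, and the `grad_fn(theta, block)` interface are assumptions, and calibrating `sigma` to a target privacy level is outside the scope of the sketch.

```python
import random


def dp_block_step(theta, grad_fn, blocks, lr, clip, sigma, rng):
    """One update of a single randomly chosen coordinate block.

    grad_fn(theta, block) is a hypothetical callback returning the partial
    derivatives for the indices in 'block' only, so the cost per round
    scales with the block size rather than the full dimension.
    """
    block = rng.choice(blocks)
    g = grad_fn(theta, block)
    # Clip the block gradient to bound the influence of any single sample.
    norm = sum(v * v for v in g) ** 0.5
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    new_theta = list(theta)
    for i, v in zip(block, g):
        # Gaussian noise proportional to the clipping bound (assumed form).
        noisy = v * scale + rng.gauss(0.0, sigma * clip)
        new_theta[i] -= lr * noisy
    return new_theta, block
```

Coordinates outside the selected block are left untouched in a given round, so each round touches only a fraction of the parameters.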

Large Sample Covariance Matrices and High-Dimensional Data Analysis

10.1017/cbo9781107588080 ◽

2015 ◽

Cited By ~ 26

Author(s):

Jianfeng Yao ◽

Shurong Zheng ◽

Zhidong Bai

Keyword(s):

Data Analysis ◽

High Dimensional Data ◽

Covariance Matrices ◽

High Dimensional ◽

Large Sample ◽

Sample Covariance Matrices ◽

Sample Covariance ◽

High Dimensional Data Analysis

Fractal-Based Methods as a Technique for Estimating the Intrinsic Dimensionality of High-Dimensional Data: A Survey

Informatica ◽

10.15388/informatica.2016.84 ◽

2016 ◽

Vol 27 (2) ◽

pp. 257-281 ◽

Cited By ~ 5

Author(s):

Rasa Karbauskaitė ◽

Gintautas Dzemyda

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Intrinsic Dimensionality

Control Cloud Data Access Privilege Anonymity with Attributed Based Encryption

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i8.68 ◽

2017 ◽

Vol 7 (8) ◽

pp. 279

Author(s):

P. Sudheer ◽

T. Lakshmi Surekha

Keyword(s):

Data Privacy ◽

Low Cost ◽

Data Access ◽

Privacy Concerns ◽

Cloud Data ◽

Computing Paradigm ◽

Attribute Based Encryption ◽

Data Content ◽

Identity Privacy ◽

Cloud Servers

Cloud computing is a revolutionary computing paradigm that enables flexible, on-demand, and low-cost usage of computing resources, but because the data is outsourced to cloud servers, various privacy concerns emerge from it. Various schemes based on attribute-based encryption have been proposed to secure cloud storage, mainly with a focus on data content privacy. A semi-anonymous privilege control scheme, AnonyControl, addresses not only data privacy but also user identity privacy. AnonyControl decentralizes the central authority to limit identity leakage and thus achieves semi-anonymity. A further variant, AnonyControl-F, fully prevents identity leakage and achieves full anonymity.
