GNS: Forge High Anonymity Graph by Nonlinear Scaling Spectrum

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Yong Zeng ◽  
Yixin Li ◽  
Zhongyuan Jiang ◽  
Jianfeng Ma

Generating random graphs that retain specific structural properties of real graphs is crucial, both for anonymizing graphs and for producing targeted graph data sets. The state-of-the-art method, spectral graph forge (SGF), was proposed at INFOCOM 2018. It uses a low-rank approximation of the matrix obtained by discarding part of the spectrum, which provides privacy protection for published graphs while preserving data availability to a certain extent. As shown in SGF, at least 20% of the spectrum must be discarded to defend against deanonymization attacks. However, data availability decreases significantly as more of the spectrum is discarded. Is there a way to generate a graph that preserves as much of the spectrum as possible while still guaranteeing anonymity? To solve this problem, this paper proposes graph nonlinear scaling (GNS). We prove that GNS preserves all eigenvectors while providing high anonymity for the forged graph. Specifically, GNS scales the eigenvalues of the original spectrum and constructs the forged graph from the scaled eigenvalues and the original eigenvectors. This approach maximizes the preservation of spectral information to guarantee data availability, while remaining highly robust to deanonymization attacks. The experimental results show that when SGF discards only 10% of the spectrum, the forged graph has high data availability, but a distance vector deanonymization attack on that forged graph can identify almost 100% of the nodes. At the same availability, only about 20% of the nodes in the forged graph obtained from GNS can be identified. Moreover, our method is better than SGF at capturing the real graph’s structure in terms of modularity, the number of partitions, and average clustering.
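In matrix terms, the contrast between the two approaches can be sketched as follows. This is only an illustration under stated assumptions: $A$ is taken to be a symmetric $n \times n$ matrix of the graph with eigenpairs $(\lambda_i, u_i)$, and the symbols $k$ (number of retained eigenpairs) and $f$ (the nonlinear scaling function) are placeholders introduced here. The abstract does not specify $f$, nor how the resulting real-valued matrix is turned back into a graph.

$$
A = \sum_{i=1}^{n} \lambda_i\, u_i u_i^{T}, \qquad
\tilde{A}_{\mathrm{SGF}} = \sum_{i=1}^{k} \lambda_i\, u_i u_i^{T} \quad (k < n), \qquad
\tilde{A}_{\mathrm{GNS}} = \sum_{i=1}^{n} f(\lambda_i)\, u_i u_i^{T}.
$$

The truncated sum drops $n - k$ eigenvectors entirely, whereas the scaled sum keeps every $u_i$ and changes only the weights attached to them.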

Author(s):  
Д.А. Желтков ◽  
Е.Е. Тыртышников

The matrix cross approximation method is a fast method for approximating matrices by low-rank matrices, with a complexity of $O((m+n)r^2)$ arithmetic operations. Its main feature is the following: if a matrix is not stored as an array but is given as a function of two integer arguments, then a low-rank approximation can be computed by evaluating only $O((m+n)r)$ values of this function. However, if the matrix is extremely large or its elements are computationally expensive to evaluate, the approximation can still be time-consuming. In such cases, the method can be accelerated with parallel algorithms. In this paper we propose an efficient parallel algorithm for the case in which every matrix element has the same evaluation cost.
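The idea of approximating a matrix from a few of its rows and columns can be illustrated with a minimal serial sketch. It is not the authors' parallel algorithm: the greedy pivoting rule, the stopping tolerance, and the names `cross_approximation`, `CountingEntry`, and `rank_two_entry` are all assumptions made for this example. Each step evaluates one row and one column of the matrix, so $r$ steps cost about $(m+n)r$ function evaluations.

```python
def cross_approximation(entry, m, n, max_rank, tol=1e-8):
    """Greedy cross approximation of an m-by-n matrix given only as entry(i, j).

    Returns lists us, vs such that A[i][j] ~= sum_k us[k][i] * vs[k][j].
    Illustrative sketch: pivoting and stopping rules are simple heuristics.
    """
    us, vs = [], []
    used_rows = set()
    i = 0  # start from the first row (an arbitrary choice)

    def residual(r, c):
        # Entry of A minus the current low-rank approximation.
        return entry(r, c) - sum(u[r] * v[c] for u, v in zip(us, vs))

    for _ in range(max_rank):
        used_rows.add(i)
        row = [residual(i, c) for c in range(n)]        # n evaluations
        j = max(range(n), key=lambda c: abs(row[c]))    # pivot column
        pivot = row[j]
        if abs(pivot) <= tol:
            break  # residual row is negligible: stop (heuristic criterion)
        col = [residual(r, j) for r in range(m)]        # m evaluations
        us.append([x / pivot for x in col])
        vs.append(row)
        free = [r for r in range(m) if r not in used_rows]
        if not free:
            break
        # Next pivot row: largest entry of the residual column just computed.
        i = max(free, key=lambda r: abs(col[r]))
    return us, vs


class CountingEntry:
    """Wraps an entry function and counts how often it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, i, j):
        self.calls += 1
        return self.fn(i, j)


def rank_two_entry(i, j):
    # Hypothetical test matrix: a sum of two separable terms, so its rank is 2.
    return (i + 1) * (j + 1) + i * i / (j + 1)
```

Calling `cross_approximation(CountingEntry(rank_two_entry), 30, 20, max_rank=5)` touches only a few rows and columns of the 30-by-20 matrix instead of all 600 entries.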


2021 ◽  
Vol 47 (3) ◽  
pp. 1-37
Author(s):  
Srinivas Eswar ◽  
Koby Hayashi ◽  
Grey Ballard ◽  
Ramakrishnan Kannan ◽  
Michael A. Matheson ◽  
...  

We consider the problem of low-rank approximation of massive dense nonnegative tensor data, for example, to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes, and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called Parallel Low-rank Approximation with Nonnegativity Constraints, which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). We describe our parallel distributions and algorithms, which are careful to avoid unnecessary communication and computation, show how to extend the software to include new algorithms and/or constraints, and report efficiency and scalability results for both synthetic and real-world data sets.
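As a point of reference for the multiplicative updating techniques mentioned above, the following is a minimal single-process sketch of nonnegative matrix factorization with Lee-Seung style multiplicative updates. It is not the package's API and involves no distribution or GPU use; the function names, the random initialization, and the small `eps` safeguard are assumptions made for illustration only.

```python
import random


def matmul(a, b):
    # Plain dense matrix product on lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(a, w, h):
    # Frobenius norm of A - W H.
    wh = matmul(w, h)
    return sum(
        (a[i][j] - wh[i][j]) ** 2 for i in range(len(a)) for j in range(len(a[0]))
    ) ** 0.5


def nmf_multiplicative(a, rank, iters=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix A by W H with W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    # Strictly positive random start so the updates never divide by zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise.
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)]
             for k in range(rank)]
        # W <- W * (A H^T) / (W H H^T), elementwise.
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(m)]
    return w, h
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, the factors stay nonnegative without any explicit projection step.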


Author(s):  
Gianluca Ceruti ◽  
Christian Lubich

We propose and analyse a numerical integrator that computes a low-rank approximation to large time-dependent matrices that are either given explicitly via their increments or are the unknown solution to a matrix differential equation. Furthermore, the integrator is extended to the approximation of time-dependent tensors by Tucker tensors of fixed multilinear rank. The proposed low-rank integrator is different from the known projector-splitting integrator for dynamical low-rank approximation, but it retains the important robustness to small singular values that has so far been known only for the projector-splitting integrator. The new integrator also offers some potential advantages over the projector-splitting integrator: It avoids the backward time integration substep of the projector-splitting integrator, which is a potentially unstable substep for dissipative problems. It offers more parallelism, and it preserves symmetry or anti-symmetry of the matrix or tensor when the differential equation does. Numerical experiments illustrate the behaviour of the proposed integrator.


10.2196/20597 ◽  
2020 ◽  
Vol 8 (12) ◽  
pp. e20597
Author(s):  
Ki-Hun Kim ◽  
Kwang-Jae Kim

Background: A lifelogs-based wellness index (LWI) is a function for calculating wellness scores based on health behavior lifelogs (eg, daily walking steps and sleep times collected via a smartwatch). A wellness score intuitively shows the users of smart wellness services the overall condition of their health behaviors. LWI development includes estimation (ie, estimating coefficients in LWI with data). A panel data set comprising health behavior lifelogs allows LWI estimation to control for unobserved variables, thereby resulting in less bias. However, these data sets typically have missing data due to events that occur in daily life (eg, smart devices stop collecting data when batteries are depleted), which can introduce biases into LWI coefficients. Thus, the appropriate choice of method to handle missing data is important for reducing biases in LWI estimations with panel data. However, there is a lack of research in this area. Objective: This study aims to identify a suitable missing-data handling method for LWI estimation with panel data. Methods: Listwise deletion, mean imputation, expectation maximization–based multiple imputation, predictive-mean matching–based multiple imputation, k-nearest neighbors–based imputation, and low-rank approximation–based imputation were comparatively evaluated by simulating an existing case of LWI development. A panel data set comprising health behavior lifelogs of 41 college students over 4 weeks was transformed into a reference data set without any missing data. Then, 200 simulated data sets were generated by randomly introducing missing data at proportions from 1% to 80%. The missing-data handling methods were each applied to transform the simulated data sets into complete data sets, and coefficients in a linear LWI were estimated for each complete data set. For each proportion for each method, a bias measure was calculated by comparing the estimated coefficient values with values estimated from the reference data set. Results: Methods performed differently depending on the proportion of missing data. For 1% to 30% proportions, low-rank approximation–based imputation, predictive-mean matching–based multiple imputation, and expectation maximization–based multiple imputation were superior. For 31% to 60% proportions, low-rank approximation–based imputation and predictive-mean matching–based multiple imputation performed best. For over 60% proportions, only low-rank approximation–based imputation performed acceptably. Conclusions: Low-rank approximation–based imputation was the best of the 6 data-handling methods regardless of the proportion of missing data. This superiority is generalizable to other panel data sets comprising health behavior lifelogs given their verified low-rank nature, for which low-rank approximation–based imputation is known to perform effectively. This result will guide missing-data handling in reducing coefficient biases in new development cases of linear LWIs with panel data.


Author(s):  
Artem Khoroshev ◽  

The practical applicability of BLR-factorization (low-rank approximation of the matrix of unknowns of a system of linear equations) to finite element modeling of the electromagnetic field topology of nonlinear magnetic systems is considered. A method for estimating the accuracy of the computed solution of the system of linear algebraic equations (SLAE) is presented, and the influence of the prescribed accuracy of the low-rank approximation of the matrix of unknowns on the upper bound of the relative forward error of the computed SLAE solution is shown. Using a model problem as an example, the dependence of the accuracy of the computed integral characteristics of an electromechanical apparatus on the tolerance of the low-rank approximation of the matrix of unknowns is shown, as well as the effect of this tolerance on the convergence of the nonlinear numerical solution process. The reduction in computational complexity and in the computer memory required for solving the SLAE is assessed quantitatively. The applicability of BLR-factorization to finite element modeling of the electromagnetic field topology without the use of Krylov subspace methods is evaluated.


2008 ◽  
Vol 20 (11) ◽  
pp. 2839-2861 ◽  
Author(s):  
Dit-Yan Yeung ◽  
Hong Chang ◽  
Guang Dai

In recent years, metric learning in the semisupervised setting has aroused a lot of research interest. One type of semisupervised metric learning utilizes supervisory information in the form of pairwise similarity or dissimilarity constraints. However, most methods proposed so far are either limited to linear metric learning or unable to scale well with the data set size. In this letter, we propose a nonlinear metric learning method based on the kernel approach. By applying low-rank approximation to the kernel matrix, our method can handle significantly larger data sets. Moreover, our low-rank approximation scheme can naturally lead to out-of-sample generalization. Experiments performed on both artificial and real-world data show very promising results.


2016 ◽  
Vol 27 (6) ◽  
pp. 846-887 ◽  
Author(s):  
MIHAI CUCURINGU ◽  
PUCK ROMBACH ◽  
SANG HOON LEE ◽  
MASON A. PORTER

We introduce several novel and computationally efficient methods for detecting “core–periphery structure” in networks. Core–periphery structure is a type of mesoscale structure that consists of densely connected core vertices and sparsely connected peripheral vertices. Core vertices tend to be well-connected both among themselves and to peripheral vertices, which tend not to be well-connected to other vertices. Our first method, which is based on transportation in networks, aggregates information from many geodesic paths in a network and yields a score for each vertex that reflects the likelihood that that vertex is a core vertex. Our second method is based on a low-rank approximation of a network's adjacency matrix, which we express as a perturbation of a tensor-product matrix. Our third approach uses the bottom eigenvector of the random-walk Laplacian to infer a coreness score and a classification into core and peripheral vertices. We also design an objective function to (1) help classify vertices into core or peripheral vertices and (2) provide a goodness-of-fit criterion for classifications into core versus peripheral vertices. To examine the performance of our methods, we apply our algorithms to both synthetically generated networks and a variety of networks constructed from real-world data sets.


2020 ◽  
Vol 14 (12) ◽  
pp. 2791-2798
Author(s):  
Xiaoqun Qiu ◽  
Zhen Chen ◽  
Saifullah Adnan ◽  
Hongwei He
