Detecting outlying subspaces for high-dimensional data: the new task, algorithms, and performance

2006 ◽  
Vol 10 (3) ◽  
pp. 333-355 ◽  
Author(s):  
Ji Zhang ◽  
Hai Wang
GigaScience ◽  
2021 ◽  
Vol 10 (10) ◽  
Author(s):  
Thierry Meurers ◽  
Raffael Bild ◽  
Kieu-Mi Do ◽  
Fabian Prasser

Abstract. Background: Data anonymization is an important building block for ensuring privacy, and it fosters the reuse of data. However, transforming data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging, particularly for complex datasets that contain a large number of attributes. In this article we present how we extended the open source software ARX to improve its support for high-dimensional biomedical datasets. Findings: To improve ARX's capability to find optimal transformations when processing high-dimensional data, we implemented two novel search algorithms. The first is a greedy top-down approach modeled on the previously implemented bottom-up search. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. The novel algorithms mostly outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets. Conclusion: With our additions we have significantly enhanced ARX's ability to handle high-dimensional data, in terms of both processing performance and usability, and can thus further facilitate data sharing.


2015 ◽  
Vol 710 ◽  
pp. 127-131
Author(s):  
Qing Chao Jiang

In association rule mining, the generation of frequent itemsets is a key factor that influences the efficiency and performance of the algorithm. As data dimensionality increases, traditional association rule mining algorithms can no longer meet the demands of high-dimensional data mining. Building on the Apriori algorithm, this paper proposes the Split Matrix_Apriori algorithm. By generating a Boolean matrix of the database, Split Matrix_Apriori reduces the number of database scans needed to generate the frequent itemsets. By adopting a grouping strategy when processing the Boolean matrix, the algorithm remains efficient on high-dimensional data. Split Matrix_Apriori thus significantly improves the efficiency of association rule mining.
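
The Boolean-matrix idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's algorithm: it shows only a plain Apriori search whose support counts come from a Boolean matrix built in a single pass over the transactions, and it does not attempt the grouping (split) strategy the abstract mentions. The function names `boolean_matrix` and `frequent_itemsets`, and the use of an absolute `min_support` count, are assumptions made for illustration.

```python
from itertools import combinations


def boolean_matrix(transactions):
    """Encode transactions as Boolean rows, one column per distinct item."""
    items = sorted({item for t in transactions for item in t})
    matrix = [[item in t for item in items] for t in transactions]
    return items, matrix


def frequent_itemsets(transactions, min_support):
    """Apriori over a Boolean matrix (hypothetical helper, not the paper's code).

    The transactions are read once to build the matrix; every later support
    count is computed from the matrix columns instead of rescanning the data.
    """
    items, matrix = boolean_matrix(transactions)
    n_rows = len(matrix)
    columns = {item: [row[j] for row in matrix] for j, item in enumerate(items)}

    def support(itemset):
        # A row supports the itemset when all of its item columns are True.
        return sum(all(columns[i][r] for i in itemset) for r in range(n_rows))

    result = {}
    current = [(i,) for i in items if support((i,)) >= min_support]
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        frequent = set(current)
        candidates = set()
        for a in current:
            for b in current:
                union = tuple(sorted(set(a) | set(b)))
                # Apriori pruning: every (k-1)-subset must itself be frequent.
                if len(union) == k and all(
                    sub in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        current = sorted(c for c in candidates if support(c) >= min_support)
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(frequent_itemsets(data, 3).items()):
        print(itemset, count)
```

Storing the matrix column-wise keeps each support count a simple AND across the selected columns, which is the property that lets a matrix-based variant avoid repeated database scans.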


2020 ◽  
Author(s):  
Chitrak Gupta ◽  
John Kevin Cava ◽  
Daipayan Sarkar ◽  
Eric Wilson ◽  
John Vant ◽  
...  

Abstract. Molecular dynamics (MD) simulations have become the backbone of today's computational biophysics. Simulation tools such as NAMD, AMBER and GROMACS have accumulated more than 100,000 users. Despite this remarkable success, now also bolstered by compatibility with graphics processing units (GPUs) and exascale computers, even the most scalable simulations cannot access biologically relevant timescales: the number of numerical integration steps necessary for solving differential equations in a million-to-billion-dimensional space is computationally intractable. Recent advances in deep learning have made it possible to find patterns in high-dimensional data. In addition, deep learning has also been used for simulating physical dynamics. Here, we use LSTMs to predict future molecular dynamics from current and previous timesteps, and examine how this physics-guided learning can benefit researchers in computational biophysics. In particular, we test fully connected feed-forward neural networks and recurrent neural networks with LSTM/GRU memory cells, implemented in the TensorFlow and PyTorch frameworks and trained on data from NAMD simulations, to predict conformational transitions in two different biological systems. We find that models are easier to train on non-equilibrium MD data, and that performance improves under the assumption that each atom is independent of all other atoms in the system. Our work represents a case study for high-dimensional data that switches stochastically between fast and slow regimes. Resolving such data sets will enable real-world applications in the interpretation of data from atomic force microscopy experiments.
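
Predicting a future frame "from current and previous timesteps" implies a supervised dataset of (history window, next frame) pairs. The sketch below shows one way such pairs could be built under the per-atom independence assumption mentioned in the abstract. It is not the authors' pipeline: the function name `make_windows`, the `history` parameter, and the nested-list trajectory layout are all assumptions for illustration, and no model training is shown.

```python
def make_windows(trajectory, history):
    """Build (past window, next position) training pairs, one atom at a time.

    trajectory: list of frames; each frame is a list of per-atom coordinate
                tuples (hypothetical layout, not a NAMD file format).
    history:    number of past timesteps fed to the model for each prediction.

    Treating atoms independently means each atom contributes its own time
    series, so one trajectory yields n_atoms * (n_frames - history) pairs.
    """
    if history < 1:
        raise ValueError("history must be at least 1")
    pairs = []
    n_atoms = len(trajectory[0]) if trajectory else 0
    for atom in range(n_atoms):
        series = [frame[atom] for frame in trajectory]
        for t in range(history, len(series)):
            # Inputs are the `history` positions before t; the target is position t.
            pairs.append((series[t - history:t], series[t]))
    return pairs


if __name__ == "__main__":
    # Four frames, two atoms, one coordinate per atom (toy data).
    frames = [
        [(0.0,), (10.0,)],
        [(1.0,), (11.0,)],
        [(2.0,), (12.0,)],
        [(3.0,), (13.0,)],
    ]
    for window, target in make_windows(frames, history=2):
        print(window, "->", target)
```

Each pair could then be passed to a recurrent model such as an LSTM; the windowing step itself is independent of the framework used.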


2009 ◽  
Vol 35 (7) ◽  
pp. 859-866
Author(s):  
Ming LIU ◽  
Xiao-Long WANG ◽  
Yuan-Chao LIU
