Multidimensional Scaling With Very Large Datasets

This work provides a procedure for constructing and visualizing profiles, i.e., groups of individuals with similar characteristics, for weighted and mixed data by combining two classical multivariate techniques: multidimensional scaling (MDS) and the k-prototypes clustering algorithm. The well-known drawback of classical MDS on large datasets, its high computational cost, is circumvented by selecting a small random sample of the dataset, whose individuals are clustered by means of an adapted version of the k-prototypes algorithm and mapped via classical MDS. Gower’s interpolation formula is then used to project the remaining individuals onto this configuration. Throughout the process, Gower’s distance is used to measure the proximity between individuals. The methodology is illustrated on a real dataset, obtained from the Survey of Health, Ageing and Retirement in Europe (SHARE), which was carried out in 19 countries and represents over 124 million aged individuals in Europe. The performance of the method was evaluated through a simulation study, whose results indicate that the new proposal overcomes the high computational cost of classical MDS with low error.
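The sample-then-interpolate idea can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes plain Euclidean distances on numeric data instead of Gower's distance for weighted, mixed data, it omits the k-prototypes clustering step, and the function names (`squared_distances`, `classical_mds`, `gower_interpolate`) are hypothetical. Classical MDS is run on a small random sample only, and Gower's interpolation formula then places each remaining individual using its squared distances to the sample.

```python
import numpy as np


def squared_distances(A, B):
    """Squared Euclidean distances between rows of A and rows of B.

    Assumption: Euclidean distance stands in for Gower's distance here.
    """
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def classical_mds(D2, k):
    """Classical MDS from an n x n matrix of squared distances.

    Returns the n x k principal coordinates X and the k leading
    eigenvalues, which satisfy X.T @ X = diag(eigenvalues).
    """
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # doubly centered inner products
    eigvals, eigvecs = np.linalg.eigh(B)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    lam = eigvals[order]
    X = eigvecs[:, order] * np.sqrt(lam)
    return X, lam


def gower_interpolate(X, lam, d2_new):
    """Gower's interpolation formula.

    X      : n x k sample configuration from classical MDS
    lam    : the k eigenvalues returned with X (assumed positive)
    d2_new : m x n squared distances from m new points to the sample
    Returns the m x k coordinates y = 0.5 * diag(lam)^-1 X^T (b - d^2),
    where b holds the squared norms of the rows of X.
    """
    b = np.sum(X ** 2, axis=1)
    return 0.5 * (b[None, :] - d2_new) @ X / lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(2000, 3))

    # Map a small random sample with classical MDS.
    idx = rng.choice(len(data), size=100, replace=False)
    sample = data[idx]
    rest = np.delete(data, idx, axis=0)
    X, lam = classical_mds(squared_distances(sample, sample), k=2)

    # Project the remaining individuals onto the sample configuration.
    Y = gower_interpolate(X, lam, squared_distances(rest, sample))
    print(X.shape, Y.shape)
```

Only the sample requires an eigendecomposition, so the expensive step scales with the sample size rather than with the full dataset; each remaining individual costs one matrix-vector product.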

Pairwise likelihood inference for spatial regressions estimated on very large datasets

Spatial Statistics, 2014, Vol 7, pp. 21-39. DOI: 10.1016/j.spasta.2013.10.001. Cited By ~ 10

Author(s): Giuseppe Arbia

Keyword(s): Large Datasets, Likelihood Inference, Pairwise Likelihood, Very Large Datasets

Scalable computation of streamlines on very large datasets

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09, 2009. DOI: 10.1145/1654059.1654076. Cited By ~ 40

Author(s): Dave Pugmire, Hank Childs, Christoph Garth, Sean Ahern, Gunther H. Weber

Keyword(s): Large Datasets, Very Large Datasets, Scalable Computation

Distributed processing of very large datasets with DataCutter

Parallel Computing, 2001, Vol 27 (11), pp. 1457-1478. DOI: 10.1016/s0167-8191(01)00099-0. Cited By ~ 118

Author(s): Michael D Beynon, Tahsin Kurc, Umit Catalyurek, Chialin Chang, Alan Sussman, ...

Keyword(s): Distributed Processing, Large Datasets, Very Large Datasets

Choice of species affects phylogenetic stability of deep nodes: an empirical example in Terrabacteria

Bioinformatics, 2019, Vol 35 (19), pp. 3608-3616. DOI: 10.1093/bioinformatics/btz121

Author(s): Ashley A Superson, Doug Phelan, Allyson Dekovich, Fabia U Battistuzzi

Keyword(s): Phylogenetic Reconstruction, Relative Weight, Large Datasets, Supplementary Information, Systematic Evaluation, Individual Species, Taxon Sampling, Full Dataset, Very Large Datasets, Choice Of Species

Abstract

Motivation: The promise of higher phylogenetic stability through increased dataset sizes within tree of life (TOL) reconstructions has not been fulfilled. Among the many possible causes are changes in species composition (taxon sampling) that could influence the phylogenetic accuracy of the methods by altering the relative weight of the evolutionary histories of each individual species. This effect would be stronger in clades that are represented by few lineages, which is common in many prokaryote phyla. Indeed, phyla with fewer taxa showed the most discordance among recent TOL studies. We implemented an approach to systematically test how the identity of taxa within a larger dataset and the number of taxa included affected the accuracy of phylogenetic reconstruction.

Results: Utilizing an empirical dataset within Terrabacteria, we found that even within scenarios consisting of the same number of taxa, the species used strongly affected phylogenetic stability. Furthermore, we found that trees with fewer species were more dissimilar to the tree produced from the full dataset. These results hold even when the tree is composed of many phyla and only one of them is being altered. Thus, the effect of taxon sampling in one group does not seem to be buffered by the presence of many other clades, making this issue relevant even to very large datasets. Our results suggest that a systematic evaluation of phylogenetic stability through taxon resampling is advisable even for very large datasets.

Availability and implementation: https://github.com/BlabOaklandU/PATS.git.

Supplementary information: Supplementary data are available at Bioinformatics online.

Automated single particle detection and tracking for large microscopy datasets

Royal Society Open Science, 2016, Vol 3 (5), pp. 160225. DOI: 10.1098/rsos.160225. Cited By ~ 11

Author(s): Rhodri S. Wilson, Lei Yang, Alison Dun, Annya M. Smyth, Rory R. Duncan, ...

Keyword(s): Single Molecule, Single Particle, Image Data, Ground Truth, Detection Algorithm, Large Datasets, Single Particle Tracking, Synthetic Image, Particle Detection, Very Large Datasets

Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolutions. Our ability to process these datasets is now essential to understanding many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components, from organelles to single molecules. We begin by validating the performance of our method on synthetic image data, and then extend the validation to include experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells at very high temporal rates. Our analysis of the dynamics of very large cohorts of tens of thousands of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provide a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates.
