Improved Unsupervised Representation Learning of Spatial Transcriptomic Data with Sparse Filtering

We have developed representation learning methods specifically to address the constraints and advantages of complex spatial data. Sparse filtering (SFt) uses principles of sparsity and mutual information to build representations from both global and local features from a minimal list of samples. Critically, the samples that comprise each representation are listed and ranked by informativeness. We used the Allen Mouse Brain Atlas gene expression data for prototyping and established performance metrics based on representation accuracy to labeled anatomy. SFt, implemented with the PyTorch machine learning libraries for Python, returned the most accurate reconstruction of anatomical ground truth of any method tested. SFt-generated gene lists could be further compressed, retaining 95% of informativeness with only 580 genes. Finally, we built classifiers capable of parsing anatomy with >95% accuracy using only 10 derived genes. Sparse learning is a powerful but underexplored means to derive biologically meaningful representations from complex datasets and a quantitative basis for compressed sensing of classifiable phenomena. SFt should be considered as an alternative to PCA or manifold learning for any high-dimensional dataset and as the basis for future spatial learning algorithms.
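For orientation, the sketch below computes the classic sparse filtering objective (soft absolute value, per-feature normalization, per-sample normalization, then an L1 sum). It is an assumption that the SFt method described above resembles this; the abstract does not give its formulation, and the function name and toy matrices are hypothetical.

```python
import math

def sparse_filtering_objective(features, eps=1e-8):
    """Classic sparse filtering objective (illustrative, not the authors' SFt).

    features: list of rows, one row per feature and one column per sample.
    """
    # Soft absolute value keeps the objective smooth near zero.
    f = [[math.sqrt(v * v + eps) for v in row] for row in features]
    # Normalize each feature (row) to unit L2 norm across samples.
    f = [[v / math.sqrt(sum(u * u for u in row)) for v in row] for row in f]
    # Normalize each sample (column) to unit L2 norm across features.
    n_samples = len(f[0])
    col_norms = [math.sqrt(sum(row[j] ** 2 for row in f)) for j in range(n_samples)]
    f = [[row[j] / col_norms[j] for j in range(n_samples)] for row in f]
    # The objective is the L1 norm of the doubly normalized features.
    return sum(sum(row) for row in f)

# Hypothetical toy inputs: one sparse and one dense feature matrix.
sparse_value = sparse_filtering_objective([[1.0, 0.0], [0.0, 1.0]])
dense_value = sparse_filtering_objective([[1.0, 1.0], [1.0, 1.0]])
```

Under this objective, a lower value indicates a sparser representation, so the sparse toy matrix scores lower than the dense one.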


Fully Data-Driven Pseudohealthy Synthesis for Planning Valve-Sparing Aortic Root Reconstruction using Conditional Variational Autoencoders

Current Directions in Biomedical Engineering ◽

10.1515/cdbme-2020-3072 ◽

2020 ◽

Vol 6 (3) ◽

pp. 284-287

Author(s):

Jannis Hagenah ◽

Mohamad Mehdi ◽

Floris Ernst

Keyword(s):

Aortic Root ◽

Similarity Index ◽

Ground Truth ◽

Representation Learning ◽

Patient Specific ◽

Ultrasound Images ◽

Specific Geometry ◽

The Individual ◽

Native Root ◽

Original Information

Abstract: Aortic root aneurysm is treated by replacing the dilated root with a grafted prosthesis that mimics the native root morphology of the individual patient. The challenge in predicting the optimal prosthesis size arises from the highly patient-specific geometry as well as the absence of the original information on the healthy root. Therefore, the estimation is only possible based on the available pathological data. In this paper, we show that representation learning with Conditional Variational Autoencoders is capable of turning the distorted geometry of the aortic root into smoother shapes while the information on the individual anatomy is preserved. We evaluated this method using ultrasound images of the porcine aortic root alongside their labels. The observed results show highly realistic resemblance in shape and size to the ground truth images. Furthermore, the similarity index has noticeably improved compared to the pathological images. This provides a promising technique for planning individual aortic root replacement.


A longitudinal study of e-commerce diversity in Europe

Electronic Commerce Research ◽

10.1007/s10660-021-09466-z ◽

2021 ◽

Author(s):

Adam Sadowski ◽

Karolina Lewandowska-Gwarda ◽

Renata Pisarek-Bartoszewska ◽

Per Engelseth

Keyword(s):

Spatial Data ◽

Online Shopping ◽

Statistical Data ◽

Spatial Diversity ◽

Regional Level ◽

Spatial Data Analysis ◽

Household Preferences ◽

Global And Local ◽

Data Analysis Methods ◽

Local Spatial Autocorrelation

Abstract: Owing to increased access to the Internet and the development of electronic commerce, e-commerce has become a common method of shopping in all countries. The purpose of this study is to examine e-commerce diversity in Europe at the regional level and to develop the concept of “E-commerce Supply Chain Management”. Statistical data derived from the European Statistical Office were applied to analyse the spatial diversity of e-retailing. Assessments of the regional diversity of e-retailing applied geographic information systems and exploratory spatial data analysis methods such as global and local spatial autocorrelation statistics. Clusters of regions with similar household preferences related to online shopping were identified. A spatial visualisation of the e-retailing diversity phenomenon may be utilised for the reconfiguration of supply chains and to adapt them to actual household preferences related to shopping methods.
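As an illustration of a global spatial autocorrelation statistic, the sketch below computes Moran's I. The abstract does not name the statistic used, so choosing Moran's I is an assumption, and the four-region chain and its values are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for region values and a spatial weight matrix.

    weights[i][j] is the spatial weight between regions i and j (0 if unrelated).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all spatial weights (often written S0).
    s0 = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between region pairs.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical example: four regions in a chain, each neighbouring the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 2.0, 3.0, 4.0], chain)     # similar neighbours
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # dissimilar neighbours
```

Positive values indicate that neighbouring regions tend to have similar values, and negative values indicate that neighbours tend to differ.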


Considerations for performance metrics of metagenomic next generation sequencing analyses

10.1101/2020.12.17.423212 ◽

2020 ◽

Author(s):

Jason G. Kralj ◽

Stephanie L. Servetas ◽

Samuel P. Forry ◽

Scott A. Jackson

Keyword(s):

Performance Metrics ◽

Limit Of Detection ◽

Clinical Performance ◽

Ground Truth ◽

Negative Control ◽

Fitness For Purpose ◽

Performance Metric ◽

The One ◽

Sensitivity Specificity ◽

Harmonic Means

Abstract: Evaluating the performance of metagenomics analyses has proven a challenge, due in part to limited ground-truth standards, broad application space, and numerous evaluation methods and metrics. Application of traditional clinical performance metrics (i.e., sensitivity, specificity, etc.) using taxonomic classifiers does not fit the “one-bug-one-test” paradigm. Ultimately, users need methods that evaluate fitness-for-purpose and identify their analyses’ strengths and weaknesses. Within a defined cohort, reporting performance metrics by taxon, rather than by sample, will clarify this evaluation. An estimated limit of detection, positive and negative control samples, and true positive and true negative results are necessary criteria for all investigated taxa. Use of summary metrics should be restricted to comparing results of similar cohorts and data, and should employ harmonic means and continuous products for each performance metric rather than the arithmetic mean. Such consideration will ensure meaningful comparisons and evaluation of fitness-for-purpose.
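A small sketch of why the choice of summary matters: with one poorly detected taxon, the arithmetic mean stays high while the harmonic mean and the product drop sharply. The per-taxon sensitivities are hypothetical, and reading "continuous product" as the plain product of the per-taxon values is an assumption.

```python
import math
import statistics

# Hypothetical per-taxon sensitivities: two well detected taxa, one poorly detected.
per_taxon_sensitivity = [0.99, 0.98, 0.10]

arithmetic = statistics.mean(per_taxon_sensitivity)
harmonic = statistics.harmonic_mean(per_taxon_sensitivity)
# Assumed reading of "continuous product": the product over all taxa.
product = math.prod(per_taxon_sensitivity)

print(f"arithmetic mean: {arithmetic:.3f}")
print(f"harmonic mean:   {harmonic:.3f}")
print(f"product:         {product:.5f}")
```

The harmonic mean and the product are both pulled toward the weakest taxon, which the arithmetic mean largely hides.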


Unsupervised Representation Learning of Spatial Data via Multimodal Embedding

Proceedings of the 28th ACM International Conference on Information and Knowledge Management - CIKM '19 ◽

10.1145/3357384.3358001 ◽

2019 ◽

Author(s):

Porter Jenkins ◽

Ahmad Farag ◽

Suhang Wang ◽

Zhenhui Li

Keyword(s):

Spatial Data ◽

Representation Learning


Learning Compositional Representations of Interacting Systems with Restricted Boltzmann Machines: Comparative Study of Lattice Proteins

Neural Computation ◽

10.1162/neco_a_01210 ◽

2019 ◽

Vol 31 (8) ◽

pp. 1671-1717 ◽

Cited By ~ 1

Author(s):

Jérôme Tubiana ◽

Simona Cocco ◽

Rémi Monasson

Keyword(s):

Graphical Model ◽

A Priori ◽

Protein Sequences ◽

Ground Truth ◽

Representation Learning ◽

Statistical Features ◽

Restricted Boltzmann Machines ◽

Interacting Systems ◽

Hidden Layer ◽

Stochastic Mapping

A restricted Boltzmann machine (RBM) is an unsupervised machine learning bipartite graphical model that jointly learns a probability distribution over data and extracts their relevant statistical features. RBMs were recently proposed for characterizing the patterns of coevolution between amino acids in protein sequences and for designing new sequences. Here, we study how the nature of the features learned by RBM changes with its defining parameters, such as the dimensionality of the representations (size of the hidden layer) and the sparsity of the features. We show that for adequate values of these parameters, RBMs operate in a so-called compositional phase in which visible configurations sampled from the RBM are obtained by recombining these features. We then compare the performance of RBM with other standard representation learning algorithms, including principal or independent component analysis (PCA, ICA), autoencoders (AE), variational autoencoders (VAE), and their sparse variants. We show that RBMs, due to the stochastic mapping between data configurations and representations, better capture the underlying interactions in the system and are significantly more robust with respect to sample size than deterministic methods such as PCA or ICA. In addition, this stochastic mapping is not prescribed a priori as in VAE, but learned from data, which allows RBMs to show good performance even with shallow architectures. All numerical results are illustrated on synthetic lattice protein data that share similar statistical features with real protein sequences and for which ground-truth interactions are known.


Comparing the Expression of Genes Related to Serotonin (5-HT) in C57BL/6J Mice and Humans Based on Data Available at the Allen Mouse Brain Atlas and Allen Human Brain Atlas

Neurology Research International ◽

10.1155/2017/7138926 ◽

2017 ◽

Vol 2017 ◽

pp. 1-14 ◽

Cited By ~ 6

Author(s):

C. A. Acevedo-Triana ◽

L. A. León ◽

F. P. Cardenas

Keyword(s):

Human Brain ◽

Mouse Brain ◽

Expression Patterns ◽

Brain Atlas ◽

Expression Data ◽

Expression Of Genes ◽

High Degree ◽

The Relationship ◽

Allen Mouse Brain Atlas

Brain atlases are tools based on comprehensive studies used to locate biological characteristics (structures, connections, proteins, and gene expression) in different regions of the brain. These atlases have been disseminated to the point where tools have been created to store, manage, and share the information they contain. This study used the data published by the Allen Mouse Brain Atlas (2004) for mice (C57BL/6J) and Allen Human Brain Atlas (2010) for humans (6 donors) to compare the expression of serotonin-related genes. Genes of interest were searched for manually in each case (in situ hybridization for mice and microarrays for humans), normalized expression data (z-scores) were extracted, and the results were graphed. Despite the differences in methodology, quantification, and subjects used in the process, a high degree of similarity was found between expression data. Here we compare expression in a way that allows the use of translational research methods to infer and validate knowledge. This type of study allows part of the relationship between structures and functions to be identified, by examining expression patterns and comparing levels of expression in different states, anatomical correlations, and phenotypes between different species. The study concludes by discussing the importance of knowing, managing, and disseminating comprehensive, open-access studies in neuroscience.


PubAnatomy 3D: Integrating Medline Exploration with the Allen Mouse Brain Atlas

Frontiers in Neuroinformatics ◽

10.3389/conf.fninf.2013.09.00082 ◽

2013 ◽

Vol 7 ◽

Author(s):

Yang Gang ◽

Dai Manhong ◽

Song Jean ◽

Mirel BarBara ◽

Meng Fan

Keyword(s):

Mouse Brain ◽

Brain Atlas ◽

Allen Mouse Brain Atlas


Accurate localization of linear probe electrodes across multiple brains

10.1101/2020.02.25.965210 ◽

2020 ◽

Cited By ~ 2

Author(s):

Liu D Liu ◽

Susu Chen ◽

Michael N Economo ◽

Nuo Li ◽

Karel Svoboda

Keyword(s):

Ground Truth ◽

Brain Structures ◽

Brain Regions ◽

Brain Atlas ◽

3 Dimensional ◽

Accurate Localization ◽

Large Numbers ◽

Linear Probe ◽

Electrode Localization ◽

Recording Electrodes

Abstract: Recently developed silicon probes have large numbers of recording electrodes on long linear shanks. Specifically, Neuropixels probes have 960 recording electrodes distributed over 9.6 mm shanks. Because of their length, Neuropixels probe recordings in rodents naturally span multiple brain areas. Typical studies collate recordings across several recording sessions and animals. Neurons recorded in different sessions and animals have to be aligned to each other and to a standardized brain coordinate system. Here we report a workflow for accurate localization of individual electrodes in standardized coordinates and aligned across individual brains. This workflow relies on imaging brains with fluorescent probe tracks and warping 3-dimensional image stacks to standardized brain atlases. Electrophysiological features are then used to anchor particular electrodes along the reconstructed tracks to specific locations in the brain atlas and therefore to specific brain structures. We performed ground-truth experiments, in which motor cortex outputs are labelled with ChR2 and a fluorescent protein. Recording from brain regions targeted by these outputs reveals better than 100 μm accuracy for electrode localization.


Optical to Planar X-ray Mouse Image Mapping in Preclinical Nuclear Medicine Using Conditional Adversarial Networks

Journal of Imaging ◽

10.3390/jimaging7120262 ◽

2021 ◽

Vol 7 (12) ◽

pp. 262

Author(s):

Eleftherios Fysikopoulos ◽

Maritina Rouchota ◽

Vasilis Eleftheriadis ◽

Christina-Anna Gatsiou ◽

Irinaios Pilatis ◽

...

Keyword(s):

Molecular Imaging ◽

Performance Metrics ◽

Ex Vivo ◽

Imaging Techniques ◽

Similarity Index ◽

Zero Point ◽

Ground Truth ◽

Generative Adversarial Network ◽

X Ray ◽

Photographic Images

In the current work, a pix2pix conditional generative adversarial network has been evaluated as a potential solution for generating adequately accurate synthesized morphological X-ray images by translating standard photographic images of mice. Such an approach will benefit 2D functional molecular imaging techniques, such as planar radioisotope and/or fluorescence/bioluminescence imaging, by providing high-resolution information for anatomical mapping, but not for diagnosis, using conventional photographic sensors. Planar functional imaging offers an efficient alternative to ex vivo biodistribution studies and/or 3D high-end molecular imaging systems since it can be effectively used to track new tracers and study the accumulation from zero point in time post-injection. The superimposition of functional information with an artificially produced X-ray image may enhance overall image information in such systems without added complexity and cost. The network has been trained on 700 input (photography)/ground truth (X-ray) paired mouse images and evaluated using a test dataset composed of 80 photographic images and 80 ground truth X-ray images. Performance metrics such as peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM) and Fréchet inception distance (FID) were used to quantitatively evaluate the proposed approach on the acquired dataset.
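Of the three metrics named above, PSNR is the simplest to state: 10·log10(MAX² / MSE). The sketch below shows it on flattened pixel lists; the function name, the 8-bit peak value of 255, and the pixel values are assumptions for illustration and are not taken from the paper.

```python
import math

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical flattened 8-bit pixel values for a ground-truth and a synthesized image.
ground_truth = [0, 255, 128, 64]
synthesized = [0, 250, 128, 64]
score = psnr(ground_truth, synthesized)
```

Higher PSNR means the synthesized image is closer to the ground truth in a pixel-wise sense; SSIM and FID capture structural and distributional similarity and are not sketched here.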


Spatial Gap-Filling of ESA CCI Satellite-Derived Soil Moisture Based on Linear Geostatistics

10.20944/preprints201909.0126.v1 ◽

2019 ◽

Author(s):

Ricardo M. Llamas ◽

Mario Guevara ◽

Danny Rorabaugh ◽

Michela Taufer ◽

Rodrigo Vargas

Keyword(s):

Soil Moisture ◽

Spatial Data ◽

Missing Values ◽

Linear Models ◽

European Space Agency ◽

Correlation Coefficients ◽

Ground Truth ◽

Frozen Soil ◽

Ground Truth Data ◽

The Usa

Soil moisture plays a key role in the Earth’s water and carbon cycles, but acquisition of continuous (i.e., gap-free) soil moisture measurements across large regions is a challenging task due to limitations of currently available point measurements. Satellites offer critical information for soil moisture over large areas on a regular basis (e.g., ESA CCI, NASA SMAP); however, there are regions where satellite-derived soil moisture cannot be estimated because of certain circumstances such as high canopy density, frozen soil, or extremely dry conditions. We compared and tested two approaches, Ordinary Kriging (OK) interpolation and General Linear Models (GLM), to model soil moisture and fill spatial data gaps from the European Space Agency Climate Change Initiative (ESA CCI) version 3.2 (and compared them with version 4.4) from January 2000 to September 2012, over a region of 465,777 km² across the Midwest of the USA. We tested our proposed methods to fill gaps in the original ESA CCI product, and two data subsets, removing 25% and 50% of the initially available valid pixels. We found a significant correlation coefficient (r = 0.523, RMSE = 0.092 m³ m⁻³) between the original satellite-derived soil moisture product and ground-truth data from the North American Soil Moisture Database (NASMD). Predicted soil moisture using OK also had significant correlation coefficients with NASMD data, when using 100% (r = 0.522, RMSE = 0.092 m³ m⁻³), 75% (r = 0.526, RMSE = 0.092 m³ m⁻³) and 50% (r = 0.53, RMSE = 0.092 m³ m⁻³) of available valid pixels for each month of the study period. GLM had lower but significant correlation coefficients with NASMD data (average r = 0.478, RMSE = 0.092 m³ m⁻³) when using the same subsets of available data (i.e., 100%, 75%, 50%). Our results provide support for OK as a technique to gap-fill spatial missing values of satellite-derived soil moisture products across the Midwest of the USA.
