Nearest neighbor density ratio estimation for large-scale applications in astronomy

With the explosive growth of data volume in modern applications such as web search and multimedia retrieval, hashing is becoming increasingly important for efficient nearest neighbor (similar item) search. Recently, a number of data-dependent methods have been developed, reflecting the great potential of learning for hashing. Inspired by the classic nonlinear dimensionality reduction algorithm, maximum variance unfolding, we propose a novel unsupervised hashing method, named maximum variance hashing. The idea is to maximize the total variance of the hash codes while preserving the local structure of the training data. To solve the derived optimization problem, we propose a column generation algorithm, which directly learns the binary-valued hash functions. We then extend it using anchor graphs to reduce the computational cost. Experiments on large-scale image datasets demonstrate that the proposed method outperforms state-of-the-art hashing methods in many cases.
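To make the search step concrete, the following is a minimal sketch of nearest neighbor retrieval over binary hash codes. It is not the maximum variance hashing method from the abstract: the hash functions here are random hyperplanes, used only as a stand-in for learned ones, and all function names (`random_hyperplanes`, `hash_code`, `hamming_search`) are hypothetical.

```python
import random


def random_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (an assumed stand-in for learned hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(x, planes):
    """Pack the signs of the projections of x into a single integer code."""
    code = 0
    for w in planes:
        bit = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0 else 0
        code = (code << 1) | bit
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def hamming_search(query_code, db_codes, k):
    """Return the indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))
    return order[:k]
```

Once every item is reduced to a short code, ranking candidates needs only an XOR and a bit count per item, which is why hashing scales to large collections.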

Unsupervised Recycled FPGA Detection Based on Direct Density Ratio Estimation

2021 IEEE 27th International Symposium on On-Line Testing and Robust System Design (IOLTS), 2021. DOI: 10.1109/iolts52814.2021.9486698

Author(s): Yuya Isaka, Foisal Ahmed, Michihiro Shintani, Michiko Inoue

Keyword(s): Density Ratio, Ratio Estimation

Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning

F1000Research, 2018, Vol 7, pp. 233. DOI: 10.12688/f1000research.14048.1

Author(s): Jonathan Z.L. Zhao, Eliseos J. Mucaki, Peter K. Rogan

Keyword(s): Machine Learning, Ionizing Radiation, Radiation Exposure, Large Scale, Nearest Neighbor, Error Rates, Support Vector, Dose Estimation, Gene Signatures, Ionizing Radiation Exposure

Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches.
Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets.
Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures.
Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.
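The forward variant of sequential feature selection with k-fold validation can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: a nearest-centroid classifier stands in for the SVM, the mRMR ranking and the backward and complete variants are omitted, and every function name is hypothetical.

```python
def kfold_indices(n, k):
    """Split range(n) into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def centroid_predict(train_X, train_y, x, features):
    """Nearest-centroid classifier (assumed stand-in for an SVM) on the chosen features."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(r[f] for r in rows) / len(rows) for f in features]
        dist = sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, features, k=3):
    """k-fold cross-validated accuracy using only the given feature indices."""
    correct = 0
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct += sum(
            centroid_predict(train_X, train_y, X[i], features) == y[i] for i in fold
        )
    return correct / len(X)


def forward_selection(X, y, k=3):
    """Greedily add the feature that most improves CV accuracy; stop when none helps."""
    remaining = list(range(len(X[0])))
    selected, best_score = [], 0.0
    while remaining:
        scored = [(cv_accuracy(X, y, selected + [f], k), f) for f in remaining]
        score, feature = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score
```

Scoring each candidate feature set by held-out accuracy, rather than training accuracy, is what keeps the greedy search from rewarding features that merely fit noise.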

Deep Learning Triplet Ordinal Relation Preserving Binary Code for Remote Sensing Image Retrieval Task

Remote Sensing, 2021, Vol 13 (23), pp. 4786. DOI: 10.3390/rs13234786

Author(s): Zhen Wang, Nannan Wu, Xiaohan Yang, Bingqi Yan, Pingping Liu

Keyword(s): Remote Sensing, Image Retrieval, Large Scale, Nearest Neighbor, Binary Code, Satellite Observation, Cross Entropy, Retrieval Task, Search Tasks, Low Dimensional

As satellite observation technology rapidly develops, the number of remote sensing (RS) images increases dramatically, which makes RS image retrieval more challenging in terms of speed and accuracy. Recently, an increasing number of researchers have turned their attention to this issue and to hashing algorithms, which map real-valued data onto a low-dimensional Hamming space and are widely used to respond quickly to large-scale RS image search tasks. However, most existing hashing algorithms only emphasize preserving point-wise or pair-wise similarity, which may lead to inferior approximate nearest neighbor (ANN) search results. To address this problem, we propose a novel triplet ordinal cross entropy hashing (TOCEH). In TOCEH, to better preserve the ranking orders across the two spaces, we establish a tensor graph representing the Euclidean triplet ordinal relationship among RS images and minimize the cross entropy between the probability distribution of the established Euclidean similarity graph and that of the Hamming triplet ordinal relation with the given binary code. During training, to avoid solving an NP-hard (non-deterministic polynomial-time hard) problem, we use a continuous function in place of the discrete encoding process. Furthermore, we design a quantization objective function based on the principle of preserving triplet ordinal relation to minimize the loss caused by the continuous relaxation procedure. Comparative RS image retrieval experiments are conducted on three publicly available datasets: UC Merced Land Use Dataset (UCMD), SAT-4, and SAT-6. The experimental results show that the proposed TOCEH algorithm outperforms many existing hashing algorithms in RS image retrieval tasks.
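The notion of a triplet ordinal relation can be illustrated with a small check of how many Euclidean orderings survive in Hamming space. This is only an evaluation-style sketch, not the TOCEH loss or its tensor graph; the function names and the choice to count strict orderings only are assumptions made for the example.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two real-valued feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hamming(a, b):
    """Hamming distance between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))


def triplet_agreement(features, codes):
    """Fraction of triplets (i, j, k) with d(i, j) < d(i, k) in Euclidean space
    whose ordering is also strictly preserved by the binary codes."""
    n = len(features)
    total = kept = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue
                if sq_euclidean(features[i], features[j]) < sq_euclidean(features[i], features[k]):
                    total += 1
                    if hamming(codes[i], codes[j]) < hamming(codes[i], codes[k]):
                        kept += 1
    return kept / total if total else 1.0
```

A pair-wise criterion would accept codes that keep similar items close yet scramble their relative ranking; a triplet criterion such as this one penalizes exactly that case.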

An experimental investigation of the effects of compressibility on a turbulent reacting mixing layer

Journal of Fluid Mechanics, 1998, Vol 356, pp. 25-64. DOI: 10.1017/s002211209700791x

Cited By ~ 26

Author(s): M. F. MILLER, C. T. BOWMAN, M. G. MUNGAL

Keyword(s): High Speed, Large Scale, Structural Changes, Density Ratio, Mixing Layer, Mixing Layers, Unexpected Result, Entrainment Ratio, Plan View, Compressible Mixing Layer

Experiments were conducted to investigate the effect of compressibility on turbulent reacting mixing layers with moderate heat release. Side- and plan-view visualizations of the reacting mixing layers, which were formed between a high-speed high-temperature vitiated-air stream and a low-speed ambient-temperature hydrogen stream, were obtained using a combined OH/acetone planar laser-induced fluorescence imaging technique. The instantaneous images of OH provide two-dimensional maps of the regions of combustion, and similar images of acetone, which was seeded into the fuel stream, provide maps of the regions of unburned fuel. Two low-compressibility (Mc=0.32, 0.35) reacting mixing layers with differing density ratios and one high-compressibility (Mc=0.70) reacting mixing layer were studied. Higher average acetone signals were measured in the compressible mixing layer than in its low-compressibility counterpart (i.e. same density ratio), indicating a lower entrainment ratio. Additionally, the compressible mixing layer had slightly wider regions of OH and 50% higher OH signals, which was an unexpected result since lowering the entrainment ratio had the opposite effect at low compressibilities. The large-scale structural changes induced by compressibility are believed to be primarily responsible for the difference in the behaviour of the high- and low-compressibility reacting mixing layers. It is proposed that the coexistence of broad regions of OH and high acetone signals is a manifestation of a more biased distribution of mixture compositions in the compressible mixing layer. Other mechanisms through which compressibility can affect the combustion are discussed.

Inlier-Based Outlier Detection via Direct Density Ratio Estimation

2008 Eighth IEEE International Conference on Data Mining, 2008. DOI: 10.1109/icdm.2008.49

Cited By ~ 32

Author(s): Shohei Hido, Yuta Tsuboi, Hisashi Kashima, Masashi Sugiyama, Takafumi Kanamori

Keyword(s): Outlier Detection, Density Ratio, Ratio Estimation

Interactions of Film Cooling Rows: Effects of Hole Geometry and Row Spacing on the Cooling Performance Downstream of the Second Row of Holes

Volume 5: Turbo Expo 2003, Parts A and B, 2003. DOI: 10.1115/gt2003-38195

Cited By ~ 5

Author(s): Christian Saumweber, Achmed Schulz

Keyword(s): Boundary Layer, Film Cooling, Large Scale, Density Ratio, Operating Conditions, Transfer Coefficients, Film Cooling Effectiveness, Heat Transfer Coefficients, Staggered Arrangement, Film Cooling Holes

A comprehensive set of generic experiments is conducted to investigate the interaction of film cooling rows. Five different film cooling configurations are considered at large scale, each consisting of two rows of film cooling holes in a staggered arrangement. The hole pitch-to-diameter ratio within each row is kept constant at P/D = 4. The spacing between the rows is x/D = 10, 20, or 30. Fanshaped holes or simple cylindrical holes with an inclination angle of 30 deg. and a hole length of 6 hole diameters are used. A hot gas Mach number of Mam = 0.3, an engine-like density ratio of ρc/ρm = 1.75, and a freestream turbulence intensity of Tu = 5.1% are established. Operating conditions are varied in terms of blowing ratio for the upstream and, independently, the downstream row in the range 0.5 < M < 2.0. The results illustrate the importance of considering ejection into an already film-cooled boundary layer. Adiabatic film cooling effectiveness and heat transfer coefficients are significantly increased. The decay of effectiveness with streamwise distance is much less pronounced downstream of the second row, primarily due to pre-cooling of the boundary layer by the first row of holes. Additionally, a comparison of measured effectiveness data with predictions according to the widely used superposition model of Sellers [11] is given for two rows of fanshaped holes.
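Two quantities in this abstract lend themselves to a short worked sketch: the blowing ratio and the superposition of row effectiveness. The code assumes the conventional definitions, M = (ρc Uc)/(ρm Um) for the blowing ratio and η = 1 - Π(1 - η_i) for the Sellers superposition of adiabatic effectiveness; the abstract itself does not spell these out, and the function names and any numeric inputs are illustrative rather than measured data.

```python
def velocity_ratio(blowing_ratio, density_ratio):
    """Coolant-to-mainstream velocity ratio U_c/U_m implied by
    M = (rho_c * U_c) / (rho_m * U_m), assuming that definition of M."""
    return blowing_ratio / density_ratio


def superposed_effectiveness(row_effectiveness):
    """Combine single-row adiabatic effectiveness values with the
    superposition rule eta = 1 - prod(1 - eta_i) (assumed form of the Sellers model)."""
    remaining = 1.0
    for eta in row_effectiveness:
        remaining *= 1.0 - eta
    return 1.0 - remaining
```

For instance, at the stated density ratio of 1.75 a blowing ratio of 1.75 corresponds to equal coolant and mainstream velocities, and two rows with hypothetical single-row effectiveness values of 0.3 and 0.2 would superpose to 1 - 0.7 × 0.8 = 0.44.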

Region-Based Graph Learning towards Large Scale Image Annotation

Graph-Based Methods in Computer Vision, 2012, pp. 244-260. DOI: 10.4018/978-1-4666-1891-6.ch013

Author(s): Bao Bing-Kun, Yan Shuicheng

Keyword(s): Large Scale, Nearest Neighbor, Image Annotation, Learning Algorithm, Label Propagation, Locality Sensitive Hashing, K Nearest Neighbor, Neighbor Graph, Nearest Neighbor Graph, Modeling Data

Graph-based learning provides a useful approach for modeling data in image annotation problems. In this chapter, the authors describe how to construct a region-based graph to annotate large-scale multi-label images. It is well recognized that analysis at the semantic region level may greatly improve image annotation performance compared with analysis at the whole-image level. However, the region-level approach increases the data scale by several orders of magnitude and poses new challenges to most existing algorithms. To this end, each image is first encoded as a Bag-of-Regions based on multiple image segmentations. Next, all image regions are assembled into a large k-nearest-neighbor graph using an efficient Locality Sensitive Hashing (LSH) method. Finally, a sparse and region-aware image-based graph is fed into the multi-label extension of the Entropic graph regularized semi-supervised learning algorithm (Subramanya & Bilmes, 2009). In combination, these steps make the framework capable of handling large-scale datasets. Extensive experiments on the NUS-WIDE (260k images) and COREL-5k datasets validate the effectiveness and efficiency of the framework for region-aware and scalable multi-label propagation.
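The graph construction step can be sketched as an approximate k-nearest-neighbor graph built from LSH buckets. This is a minimal illustration, not the chapter's implementation: it assumes sign-of-random-projection hashing, several independent hash tables, and exact Euclidean ranking among the candidates that share a bucket; `lsh_knn_graph` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lsh_knn_graph(points, k, n_bits=4, n_tables=4, seed=0):
    """Approximate kNN graph: only points sharing an LSH bucket are compared."""
    rng = random.Random(seed)
    dim = len(points[0])
    candidates = [set() for _ in points]
    for _ in range(n_tables):
        # One table = n_bits random hyperplanes; the bucket key is the sign pattern.
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            key = tuple(sum(w * x for w, x in zip(plane, p)) >= 0.0 for plane in planes)
            buckets[key].append(idx)
        for members in buckets.values():
            for i in members:
                candidates[i].update(m for m in members if m != i)
    graph = {}
    for i, cand in enumerate(candidates):
        # Rank the candidates exactly and keep the k closest (ties broken by index).
        ranked = sorted(cand, key=lambda j: (sq_dist(points[i], points[j]), j))
        graph[i] = ranked[:k]
    return graph
```

Restricting comparisons to bucket members avoids the quadratic cost of an exact kNN graph, at the price of occasionally missing a true neighbor that hashed elsewhere; using more tables reduces that risk.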
