SHALOS: Statistical Herschel-ATLAS lensed objects selection

2019 ◽  
Vol 627 ◽  
pp. A31 ◽  
Author(s):  
J. González-Nuevo ◽  
S. L. Suárez Gómez ◽  
L. Bonavera ◽  
F. Sánchez-Lasheras ◽  
F. Argüeso ◽  
...  

Context. The statistical analysis of large samples of strong lensing events can be a powerful tool for extracting valuable astrophysical or cosmological information. Their selection using submillimetre galaxies has been demonstrated to be very effective, with more than ∼200 proposed candidates in the case of Herschel-ATLAS data and several tens in the case of the South Pole Telescope. However, the number of confirmed events is still relatively low, i.e. a few tens, mostly because of the lengthy observational validation process on individual events. Aims. In this work we propose a new methodology with a statistical selection approach to increase by a factor of ∼5 the number of such events within the Herschel-ATLAS data set. Although the methodology can be applied to several selection problems, it has particular benefits in the case of the identification of strongly lensed galaxies: objectivity, minimal initial constraints in the main parameter space, and preservation of statistical properties. Methods. The proposed methodology is based on the Bhattacharyya distance as a measure of the similarity between the probability distributions of properties of two cross-matched galaxies. The particular implementation for the aim of this work is called SHALOS, and it combines the information of four different properties of the pair of galaxies: angular separation, luminosity percentile, redshift, and the ratio of the optical to submillimetre flux densities. Results. The SHALOS method provides a ranked list of strongly lensed galaxies. The number of candidates within the ∼340 deg² of the Herschel-ATLAS surveyed area with a final associated probability Ptot > 0.7 is 447, and they have an estimated mean amplification factor of 3.12 for a halo with a typical cluster mass. Additional statistical properties of the SHALOS candidates, such as the correlation function and the source number counts, are in agreement with previous results, indicating the statistical lensing nature of the selected sample.
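
As a rough illustration of the core similarity measure only (a sketch, not the SHALOS pipeline; the paper's exact rule for combining the four properties into Ptot is not reproduced here), the following Python snippet computes the Bhattacharyya distance between two binned probability distributions. The histogram settings and toy data are assumptions.

```python
import numpy as np

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete (binned) distributions.

    p and q are histograms over the same bins; they are normalised to
    sum to 1 before the Bhattacharyya coefficient is computed.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p /= p.sum()
    q /= q.sum()
    bc = np.sum(np.sqrt(p * q))   # Bhattacharyya coefficient, in [0, 1]
    return -np.log(bc)

# Toy example: two binned distributions of one cross-matched property.
rng = np.random.default_rng(0)
p = np.histogram(rng.normal(0.0, 1.0, 5000), bins=30, range=(-5, 5))[0]
q = np.histogram(rng.normal(0.5, 1.2, 5000), bins=30, range=(-5, 5))[0]
print(bhattacharyya_distance(p, q))
# Per-property distances could then be combined (e.g. summed) to rank
# candidate pairs -- an assumed combination, not the published rule.
```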

2021 ◽  
pp. 016555152110184
Author(s):  
Gunjan Chandwani ◽  
Anil Ahlawat ◽  
Gaurav Dubey

Document retrieval plays an important role in knowledge management, as it enables the discovery of relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing is performed to remove unnecessary and redundant words from the documents. Then, the documents are indexed by the cluster-based inverted indexing algorithm, which is developed by integrating the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. After the documents are indexed, query matching is performed for user queries using the Bhattacharyya distance. Finally, query optimisation is done using the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed on the WebKB and Twenty Newsgroups data sets. The analysis shows that the proposed algorithm offers high performance, with a precision of 1, a recall of 0.70 and an F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storage and retrieval of information.
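
A minimal sketch of the retrieval flow described above, under assumed data structures: an inverted index narrows the candidate set, and candidates are then ranked by the Bhattacharyya distance between term-frequency distributions. The piFCM clustering and the Pearson-correlation optimisation steps are omitted, and all names below are illustrative.

```python
import numpy as np
from collections import defaultdict

# Toy corpus; a real system would apply the stop-word removal and
# stemming described in the pre-processing step above.
docs = {
    0: "fuzzy clustering of documents",
    1: "inverted index for document retrieval",
    2: "query matching with probability distributions",
}

# Build an inverted index: term -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def term_distribution(text, vocab):
    """Normalised term-frequency vector over a fixed vocabulary."""
    counts = np.array([text.split().count(t) for t in vocab], dtype=float)
    return counts / counts.sum() if counts.sum() else counts

def bhattacharyya(p, q):
    # Small constant guards against log(0) for disjoint distributions.
    return -np.log(np.sum(np.sqrt(p * q)) + 1e-12)

query = "document retrieval"
vocab = sorted(index)
# Candidate documents: union of postings for the query terms.
candidates = set.union(*(index.get(t, set()) for t in query.split()))
q_dist = term_distribution(query, vocab)
# Rank candidates by Bhattacharyya distance (smaller = more similar).
ranked = sorted(candidates,
                key=lambda d: bhattacharyya(term_distribution(docs[d], vocab), q_dist))
for doc_id in ranked:
    print(doc_id, docs[doc_id])
```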


2020 ◽  
Vol 499 (4) ◽  
pp. 5641-5652
Author(s):  
Georgios Vernardos ◽  
Grigorios Tsagkatakis ◽  
Yannis Pantazis

ABSTRACT Gravitational lensing is a powerful tool for constraining substructure in the mass distribution of galaxies, be it from the presence of dark matter sub-haloes or due to physical mechanisms affecting the baryons throughout galaxy evolution. Such substructure is hard to model and is either ignored by traditional smooth modelling approaches or treated as well-localized massive perturbers. In this work, we propose a deep learning approach to quantify the statistical properties of such perturbations directly from images, where only the extended lensed source features within a mask are considered, without the need for any lens modelling. Our training data consist of mock lensed images in which perturbing Gaussian Random Fields permeate the smooth overall lens potential and, for the first time, images of real galaxies are used as the lensed source. We employ a novel deep neural network that can handle arbitrary uncertainty intervals associated with the training data set labels as input, provides probability distributions as output, and adopts a composite loss function. The method succeeds not only in accurately estimating the actual parameter values, but also reduces the predicted confidence intervals by 10 per cent in an unsupervised manner, i.e. without having access to the actual ground truth values. Our results are invariant to the inherent degeneracy between mass perturbations in the lens and complex brightness profiles of the source. Hence, we can quantitatively and robustly quantify the smoothness of the mass density of thousands of lenses, including confidence intervals, and provide a consistent ranking for follow-up science.
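
As a hedged sketch of how Gaussian Random Field perturbations of the kind used in the training set can be drawn (the paper's actual power-spectrum parametrisation and amplitudes are not reproduced here), one common approach colours white noise in Fourier space with a power-law spectrum; the slope and grid size below are illustrative.

```python
import numpy as np

def gaussian_random_field(n, slope=-4.0, seed=0):
    """Draw an n x n Gaussian random field with power spectrum P(k) ~ k**slope.

    White noise is generated in Fourier space, coloured by sqrt(P(k)),
    transformed back, and normalised to zero mean and unit variance.
    """
    rng = np.random.default_rng(seed)
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    k[0, 0] = np.inf                     # suppress the k = 0 (mean) mode
    amplitude = k ** (slope / 2.0)       # sqrt of the power spectrum
    noise = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    field = np.fft.ifft2(noise * amplitude).real
    return (field - field.mean()) / field.std()

# One realisation, e.g. as an additive perturbation to a smooth potential.
delta_psi = gaussian_random_field(128)
```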


Author(s):  
Valentin Raileanu ◽  

The article briefly describes the history and fields of application of extreme value theory, including climatology. The data format, the Generalized Extreme Value (GEV) probability distributions with Block Maxima, the Generalized Pareto (GP) distributions with Peaks Over Threshold (POT), and the analysis methods are presented. The distribution parameters are estimated using the Maximum Likelihood Estimation (MLE) method. Installation of the free R software, the minimal set of required commands, and the in2extRemes graphical (GUI) package are described. As an example, the results of a GEV analysis of a simulated data set in in2extRemes are presented.
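
The article works in R with in2extRemes; as an illustrative alternative (an assumption, not the article's own code), the same block-maxima GEV fit via MLE can be sketched in Python with scipy. Note that scipy's shape parameter follows the convention c = -ξ relative to the usual EVT parametrisation.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(1)
# Simulate 50 years of daily values and take annual (block) maxima.
daily = rng.gumbel(loc=20.0, scale=5.0, size=(50, 365))
annual_maxima = daily.max(axis=1)

# Maximum likelihood fit of the GEV; scipy's shape c equals -xi
# in the convention used by most EVT texts and by in2extRemes.
c, loc, scale = genextreme.fit(annual_maxima)
print(f"shape xi = {-c:.3f}, location = {loc:.3f}, scale = {scale:.3f}")

# 100-year return level: the 0.99 quantile of the fitted distribution,
# i.e. the level exceeded on average once per 100 blocks (years).
print("100-year return level:", genextreme.ppf(0.99, c, loc=loc, scale=scale))
```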


2014 ◽  
Author(s):  
Andreas Tuerk ◽  
Gregor Wiktorin ◽  
Serhat Güler

Quantification of RNA transcripts with RNA-Seq is inaccurate due to positional fragment bias, which is not represented appropriately by current statistical models of RNA-Seq data. This article introduces the Mix2 (read "mix-square") model, which uses a mixture of probability distributions to model the transcript-specific positional fragment bias. The parameters of the Mix2 model can be trained efficiently with the Expectation Maximization (EM) algorithm, resulting in simultaneous estimates of the transcript abundances and transcript-specific positional biases. Experiments are conducted on synthetic data and on the Universal Human Reference (UHR) and Human Brain Reference (HBR) samples from the MicroArray Quality Control (MAQC) data set. Comparing the correlation between qPCR and FPKM values with the state-of-the-art methods Cufflinks and PennSeq, we obtain an increase in R2 value from 0.44 to 0.6 and from 0.34 to 0.54, respectively. In the detection of differential expression between UHR and HBR, the true positive rate increases from 0.44 to 0.71 at a false positive rate of 0.1. Finally, the Mix2 model is used to investigate biases present in the MAQC data. This reveals 5 dominant biases which deviate from the common assumption of a uniform fragment distribution. The Mix2 software is available at http://www.lexogen.com/fileadmin/uploads/bioinfo/mix2model.tgz.
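
The Mix2 model's component families and bias terms are specific to RNA-Seq fragment data and are not reproduced here; the following minimal sketch shows the same EM machinery (alternating responsibilities and weighted parameter updates) on a generic two-component Gaussian mixture, with illustrative data.

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(x, n_iter=100):
    """Minimal EM for a two-component 1-D Gaussian mixture."""
    # Crude initialisation from the data quantiles.
    mu = np.quantile(x, [0.25, 0.75])
    sigma = np.array([x.std(), x.std()])
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        dens = weight * norm.pdf(x[:, None], mu, sigma)        # shape (n, 2)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates of all parameters.
        nk = resp.sum(axis=0)
        weight = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return weight, mu, sigma

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 400), rng.normal(4, 0.5, 600)])
print(em_gaussian_mixture(x))   # recovers weights ~0.4/0.6, means ~0/4
```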


1988 ◽  
Vol 1 (21) ◽  
pp. 48 ◽  
Author(s):  
Akira Kimura

The probability distribution of the maximum run of irregular wave heights is introduced theoretically. Probability distributions for the 2nd, 3rd, and further maximum runs are also introduced. Their statistical properties, including the means and their confidence regions, are applied to verifying that experiments with irregular waves realize a "severe sea state" in the test.
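
To make the notion of a "maximum run" concrete, here is a hedged Monte Carlo sketch: individual wave heights are drawn as independent Rayleigh variates (ignoring the correlation between successive waves that the theoretical treatment accounts for), and the longest run of consecutive exceedances of a threshold is recorded per realization. Record length and threshold are assumptions.

```python
import numpy as np

def max_run_length(heights, threshold):
    """Length of the longest run of consecutive heights above threshold."""
    longest = current = 0
    for h in heights:
        current = current + 1 if h > threshold else 0
        longest = max(longest, current)
    return longest

rng = np.random.default_rng(3)
n_waves, n_records = 500, 2000
# Rayleigh-distributed individual wave heights (an idealisation).
records = rng.rayleigh(scale=1.0, size=(n_records, n_waves))
threshold = np.median(records)   # run = exceedance of the median height
runs = [max_run_length(rec, threshold) for rec in records]
print("mean maximum run:", np.mean(runs),
      " 95% range:", np.percentile(runs, [2.5, 97.5]))
```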


Data in Brief ◽  
2019 ◽  
Vol 27 ◽  
pp. 104753 ◽  
Author(s):  
Guillermo Valencia Ochoa ◽  
José Núñez Alvarez ◽  
Marley Vanegas Chamorro

1996 ◽  
Vol 171 ◽  
pp. 225-228
Author(s):  
N. Metcalfe ◽  
T. Shanks ◽  
R. Fong ◽  
J. Gardner ◽  
N. Roche

Observers studying the cosmology and evolutionary history of our Universe through the statistical properties of 'normal' galaxies have four main tools at their disposal. (1) The number-redshift relation. Although a very powerful diagnostic, spectroscopic surveys are currently limited to B < 24^m and significantly incomplete in the range 23^m < B < 24^m. (2) Galaxy number-magnitude counts. Although by themselves they cannot constrain models as tightly as spectroscopy, they can be measured ∼4^m fainter, where cosmological effects are expected to be significant. (3) Galaxy colours over a wide wavelength range, which provide additional constraints. (4) The dependence of galaxy clustering on magnitude; ω(θ) can be measured to the limit of the counts. Here we report on the latest Durham count and clustering work.
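
Tool (4), the angular correlation function ω(θ), is typically measured with a pair-count estimator such as Landy-Szalay, w = (DD - 2DR + RR)/RR. Below is a hedged flat-sky sketch with assumed bin settings and mock data, not the Durham group's actual pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist, cdist

def pair_counts(sep, bins):
    return np.histogram(sep, bins=bins)[0]

def w_theta(data, randoms, bins):
    """Landy-Szalay estimator w = (DD - 2DR + RR) / RR on a flat-sky patch."""
    dd = pair_counts(pdist(data), bins)
    rr = pair_counts(pdist(randoms), bins)
    dr = pair_counts(cdist(data, randoms).ravel(), bins)
    nd, nr = len(data), len(randoms)
    # Normalise each histogram by its total number of pairs.
    dd = dd / (nd * (nd - 1) / 2)
    rr = rr / (nr * (nr - 1) / 2)
    dr = dr / (nd * nr)
    return (dd - 2 * dr + rr) / rr

rng = np.random.default_rng(4)
data = rng.uniform(0, 1, (500, 2))       # mock galaxies on a 1 deg^2 patch
randoms = rng.uniform(0, 1, (5000, 2))   # unclustered random catalogue
bins = np.logspace(-2, -0.5, 8)          # angular bins in degrees
print(w_theta(data, randoms, bins))      # consistent with 0 for this mock
```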


2019 ◽  
Vol 3 ◽  
pp. 11-20
Author(s):  
Binod Kumar Sah ◽  
A. Mishra

Background: The exponential and the Lindley (1958) distributions occupy central places among the class of continuous probability distributions and play important roles in statistical theory. The Generalised Exponential-Lindley Distribution (GELD), given by Mishra and Sah (2015), has both the exponential and the Lindley distributions as particular cases. Mixtures of distributions form an important class in the domain of probability distributions. A mixture distribution arises when some or all of the parameters in a probability function vary according to a certain probability law. In this paper, a Generalised Exponential-Lindley Mixture of Poisson Distribution (GELMPD) is obtained by mixing the Poisson distribution with the GELD. Materials and Methods: The derivation is based on the concept of generalisations of some continuous mixtures of the Poisson distribution. Results: The probability mass function of the GELMPD is obtained by mixing the Poisson distribution with the GELD. The first four moments about the origin of this distribution are obtained. The estimation of its parameters is discussed using both the method of moments and the maximum likelihood method. The distribution has been fitted to a number of discrete data sets which are negative binomial in nature, and it is observed that it gives a better fit than the Poisson-Lindley Distribution (PLD) of Sankaran (1970). Conclusion: The p-value of the GELMPD is found to be greater than that of the PLD. Hence, it is expected to be a better alternative to the PLD of Sankaran for similar discrete data sets that are negative binomial in nature.
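
The GELD density is not reproduced in this abstract, so as a hedged illustration take its particular case, the Lindley distribution: mixing a Poisson with a Lindley(θ) density numerically recovers Sankaran's closed-form Poisson-Lindley pmf, the comparison baseline above. The GELMPD replaces the Lindley mixing density with the GELD.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import poisson

def lindley_pdf(lam, theta):
    """Lindley(theta) mixing density: theta^2/(theta+1) (1+x) e^{-theta x}."""
    return theta**2 / (theta + 1) * (1 + lam) * np.exp(-theta * lam)

def mixed_poisson_pmf(k, theta):
    """P(K = k) when K | lam ~ Poisson(lam) and lam ~ Lindley(theta)."""
    integrand = lambda lam: poisson.pmf(k, lam) * lindley_pdf(lam, theta)
    return quad(integrand, 0, np.inf)[0]

def pld_pmf(k, theta):
    """Sankaran's (1970) closed-form Poisson-Lindley pmf."""
    return theta**2 * (k + theta + 2) / (theta + 1) ** (k + 3)

theta = 1.5
for k in range(5):
    # The numerical mixture and the closed form agree to quadrature accuracy.
    print(k, mixed_poisson_pmf(k, theta), pld_pmf(k, theta))
```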


2019 ◽  
Vol 624 ◽  
pp. A98 ◽  
Author(s):  
L. Wang ◽  
W. J. Pearson ◽  
W. Cowley ◽  
J. W. Trayford ◽  
M. Béthermin ◽  
...  

Aims. We study the statistical properties of dusty star-forming galaxies across cosmic time, such as their number counts, luminosity functions (LF), and the dust-obscured star formation rate density (SFRD). Methods. We used the most recent de-blended Herschel catalogue in the COSMOS field to measure the number counts and LFs at far-infrared (FIR) and sub-millimetre (sub-mm) wavelengths. The de-blended catalogue was generated by combining the Bayesian source extraction tool XID+ with an informative prior derived from the associated deep multi-wavelength photometric data. Results. Through our de-confusion technique, and based on the deep multi-wavelength photometric information, we are able to achieve more accurate measurements while probing roughly ten times below the Herschel confusion limit. Our number counts at 250 μm agree well with previous Herschel studies. However, our counts at 350 and 500 μm lie below previous Herschel results because those studies suffered from source confusion and blending issues. Our number counts at 450 and 870 μm show excellent agreement with previous determinations derived from single-dish and interferometric observations. Our measurements of the LF at 250 μm and the total IR LF agree well with previous results in the overlapping redshift and luminosity ranges. The increased dynamic range of our measurements allows us to better measure the faint end of the LF and to measure the dust-obscured SFRD out to z ∼ 6. We find that the fraction of obscured star formation activity is at its highest (>80%) around z ∼ 1. We do not find a shift of balance in the SFRD between z ∼ 3 and z ∼ 4, from being dominated by unobscured star formation at higher redshift to obscured star formation at lower redshift. However, we do find 3 < z < 4 to be an interesting transition period, as the portion of the total SFRD that is obscured by dust is significantly lower at higher redshifts.
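
As a hedged sketch of the number-counts measurement only (illustrative fluxes, bins, and survey area; not the XID+ de-blended catalogue), Euclidean-normalised differential counts S^2.5 dN/dS can be computed from a flux catalogue as follows.

```python
import numpy as np

def euclidean_counts(fluxes_mjy, area_deg2, bins_mjy):
    """Euclidean-normalised differential counts S^2.5 dN/dS [mJy^1.5 deg^-2]."""
    n, edges = np.histogram(fluxes_mjy, bins=bins_mjy)
    width = np.diff(edges)
    centre = np.sqrt(edges[:-1] * edges[1:])   # geometric bin centres
    dnds = n / (width * area_deg2)             # dN/dS per unit area
    return centre, centre**2.5 * dnds

rng = np.random.default_rng(5)
# Mock flux catalogue drawn from a power law N(>S) ~ S^-1.5 (illustrative).
fluxes = 1.0 * (1 - rng.uniform(size=20000)) ** (-1 / 1.5)
centre, counts = euclidean_counts(fluxes, area_deg2=2.0,
                                  bins_mjy=np.logspace(0, 2, 10))
for s, c in zip(centre, counts):
    print(f"S = {s:7.2f} mJy   S^2.5 dN/dS = {c:10.1f}")
```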


Hydrology ◽  
2019 ◽  
Vol 6 (4) ◽  
pp. 89 ◽  
Author(s):  
De Luca ◽  
Galasso

In this work, the authors investigated the feasibility of calibrating a model suitable for the generation of continuous high-resolution rainfall series using only data from annual maximum rainfall (AMR) series, which are usually longer than continuous high-resolution records, or are the only available data for many locations. In detail, the basic version of the Neyman–Scott Rectangular Pulses (NSRP) model was considered, and numerical experiments were carried out in order to analyze which parameters most influence the extreme value frequency distributions, and whether the reproduction of heavy rainfall can be improved with respect to the usual calibration with continuous data. The results obtained are highly promising, as the authors found acceptable relationships between the extreme value distributions and the statistical properties of pulse intensity and duration. Moreover, the proposed procedure is flexible, and it is clearly applicable to a generic rainfall generator in which the probability distributions, the shape of the pulses, and the extreme value distributions can assume any mathematical expression.
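
A hedged sketch of the basic NSRP generator (parameter names, values, and distributional choices below are assumptions; published variants differ in their conventions): storm origins follow a Poisson process, each storm spawns a Poisson number of cells displaced by exponential lags, and each cell is a rectangular pulse with exponential duration and intensity.

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative NSRP parameters (units: hours, mm/h) -- assumptions only.
lam  = 1 / 48.0   # storm-origin rate of the Poisson process
nu   = 5.0        # mean number of cells per storm
beta = 1 / 2.0    # rate of the exponential cell displacement after the origin
eta  = 1 / 1.5    # rate of the exponential cell (pulse) duration
mu_x = 4.0        # mean of the exponential cell intensity

T = 24 * 365.0                          # simulate one year of rainfall
n_storms = rng.poisson(lam * T)
origins = rng.uniform(0, T, n_storms)

t = np.arange(0, T, 0.1)                # 6-minute resolution grid
intensity = np.zeros_like(t)
for origin in origins:
    for _ in range(rng.poisson(nu)):
        start = origin + rng.exponential(1 / beta)
        end = start + rng.exponential(1 / eta)
        # Each cell contributes a rectangular pulse of constant intensity;
        # overlapping pulses add up.
        intensity[(t >= start) & (t < end)] += rng.exponential(mu_x)

print("mean intensity [mm/h]:", intensity.mean())
# Aggregate the 6-minute series to hourly depths and take the annual maximum.
hourly = np.add.reduceat(intensity, np.arange(0, len(t), 10)) * 0.1
print("annual maximum hourly depth [mm]:", hourly.max())
```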

