Cooperative photometric redshift estimation

ABSTRACT In modern galaxy surveys, photometric redshifts play a central role in a broad range of studies, from gravitational lensing and dark matter distribution to galaxy evolution. Using a dataset of ∼25,000 galaxies from the second data release of the Kilo Degree Survey (KiDS), we obtain photometric redshifts with five different methods: (i) Random Forest, (ii) Multi Layer Perceptron with Quasi Newton Algorithm, (iii) Multi Layer Perceptron with an optimization network based on the Levenberg-Marquardt learning rule, (iv) the Bayesian Photometric Redshift model (BPZ), and (v) a classical SED template fitting procedure (Le Phare). We show how SED fitting techniques can provide useful information on the galaxy spectral type, which can be used to improve the capability of machine learning methods by constraining systematic errors and reducing the occurrence of catastrophic outliers. We use this classification to train specialized regression estimators, demonstrating that such a hybrid approach, involving SED fitting and machine learning in a single collaborative framework, is capable of improving the overall prediction accuracy of photometric redshifts.
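
The hybrid idea in this abstract (use the spectral type from SED fitting to route each galaxy to a regressor trained only on that type) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a one-feature least-squares line stands in for the Random Forest and MLP regressors, and the names `fit_line`, `train_per_class`, `predict`, the class labels, and the toy numbers are all hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x (stand-in for an ML regressor)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def train_per_class(colours, redshifts, sed_classes):
    """Group the training set by SED class and fit one regressor per class."""
    groups = defaultdict(lambda: ([], []))
    for colour, z, label in zip(colours, redshifts, sed_classes):
        groups[label][0].append(colour)
        groups[label][1].append(z)
    return {label: fit_line(xs, ys) for label, (xs, ys) in groups.items()}


def predict(models, colour, sed_class):
    """Route a galaxy to the regressor trained on its own SED class."""
    a, b = models[sed_class]
    return a + b * colour


# Toy training set (hypothetical values): one colour per galaxy, a
# spectroscopic redshift, and a spectral type assumed to come from SED fitting.
colours = [0.2, 0.4, 0.6, 1.0, 1.2, 1.4]
redshifts = [0.1, 0.2, 0.3, 0.2, 0.3, 0.4]
classes = ["spiral", "spiral", "spiral", "elliptical", "elliptical", "elliptical"]

models = train_per_class(colours, redshifts, classes)
z_hat = predict(models, 0.8, "spiral")
```

The design point is only the routing step: each class gets its own model, so a colour-redshift relation that differs between spectral types is not averaged into a single fit.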

ANNz2 - Photometric redshift and probability density function estimation using machine-learning

Proceedings of the International Astronomical Union ◽

10.1017/s1743921314010849 ◽

2014 ◽

Vol 10 (S306) ◽

pp. 316-318

Author(s):

Iftach Sadeh

Keyword(s):

Machine Learning ◽

Probability Density ◽

Function Estimation ◽

Photometric Redshifts ◽

Galaxy Surveys ◽

The Public ◽

Photometric Redshift ◽

Machine Learning Methods ◽

Density Function Estimation ◽

Full Probability

ABSTRACT Large photometric galaxy surveys allow the study of questions at the forefront of science, such as the nature of dark energy. The success of such surveys depends on the ability to measure the photometric redshifts of objects (photo-zs), based on limited spectral data. A new major version of the public photo-z estimation software, ANNz, is presented here. The new code incorporates several machine-learning methods, such as artificial neural networks and boosted decision/regression trees, which are all used in concert. The objective of the algorithm is to dynamically optimize the performance of the photo-z estimation, and to properly derive the associated uncertainties. In addition to single-value solutions, the new code also generates full probability density functions in two independent ways.

Morpho-photometric redshifts

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/stz2477 ◽

2019 ◽

Vol 489 (4) ◽

pp. 4802-4808 ◽

Cited By ~ 2

Author(s):

Kristen Menou

Keyword(s):

Poor Performance ◽

Sloan Digital Sky Survey ◽

Gradient Boosting ◽

Learning Tools ◽

Multi Layer Perceptron ◽

Comparable Data ◽

Photometric Redshifts ◽

Data Set ◽

Photometric Redshift ◽

Sky Survey

ABSTRACT Machine learning (ML) is one of two standard approaches (together with SED fitting) for estimating the redshifts of galaxies when only photometric information is available. ML photo-z solutions have traditionally ignored the morphological information available in galaxy images or partly included it in the form of hand-crafted features, with mixed results. We train a morphology-aware photometric redshift machine using modern deep learning tools. It uses a custom architecture that jointly trains on galaxy fluxes, colours, and images. Galaxy-integrated quantities are fed to a Multi-Layer Perceptron (MLP) branch, while images are fed to a convolutional (convnet) branch that can learn relevant morphological features. This split MLP-convnet architecture, which aims to disentangle strong photometric features from comparatively weak morphological ones, proves important for strong performance: a regular convnet-only architecture, while exposed to all available photometric information in images, delivers comparatively poor performance. We present a cross-validated MLP-convnet model trained on 130 000 SDSS-DR12 (Sloan Digital Sky Survey – Data Release 12) galaxies that outperforms a hyperoptimized Gradient Boosting solution (hyperopt+XGBoost), as well as the equivalent MLP-only architecture, on the redshift bias metric. The fourfold cross-validated MLP-convnet model achieves a bias δz/(1 + z) = (−0.70 ± 1) × 10⁻³, approaching the performance of a reference ANNZ2 ensemble of 100 distinct models trained on a comparable data set. The relative performance of the morphology-aware and morphology-blind models indicates that galaxy morphology does improve ML-based photometric redshift estimation.
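
The redshift bias metric δz/(1 + z) quoted above can be illustrated with a short sketch. It assumes the bias is the mean of the normalized residuals (z_phot − z_spec)/(1 + z_spec); the abstract does not say whether a mean or a median is used, so that choice, like the function names, is an assumption.

```python
def normalized_residuals(z_phot, z_spec):
    """Per-galaxy residual (z_phot - z_spec) / (1 + z_spec)."""
    return [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]


def bias(z_phot, z_spec):
    """Mean normalized residual (assumed definition of the bias metric)."""
    residuals = normalized_residuals(z_phot, z_spec)
    return sum(residuals) / len(residuals)


# Hypothetical example: one galaxy overestimated by 0.1 at z = 0, one exact.
example_bias = bias([0.1, 1.0], [0.0, 1.0])
```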

A lack of evolution in the very bright end of the galaxy luminosity function from z ≃ 8 to 10

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa313 ◽

2020 ◽

Vol 493 (2) ◽

pp. 2059-2084 ◽

Cited By ~ 10

Author(s):

R A A Bowler ◽

M J Jarvis ◽

J S Dunlop ◽

R J McLure ◽

D J McLeod ◽

...

Keyword(s):

Galaxy Evolution ◽

Luminosity Function ◽

Near Infrared ◽

Good Description ◽

Number Density ◽

Photometric Redshifts ◽

Star Forming ◽

Photometric Redshift ◽

Wide Range ◽

Best Fitting

ABSTRACT We utilize deep near-infrared survey data from the UltraVISTA fourth data release (DR4) and the VIDEO survey, in combination with overlapping optical and Spitzer data, to search for bright star-forming galaxies at z ≳ 7.5. Using a full photometric redshift fitting analysis applied to the ∼6 deg² of imaging searched, we find 27 Lyman break galaxies (LBGs), including 20 new sources, with best-fitting photometric redshifts in the range 7.4 < z < 9.1. From this sample, we derive the rest-frame UV luminosity function at z = 8 and z = 9 out to extremely bright UV magnitudes (M_UV ≃ −23) for the first time. We find an excess in the number density of bright galaxies in comparison to the typically assumed Schechter functional form derived from fainter samples. Combined with previous studies at lower redshift, our results show that there is little evolution in the number density of very bright (M_UV ∼ −23) LBGs between z ≃ 5 and z ≃ 9. The tentative detection of an LBG with best-fitting photometric redshift of z = 10.9 ± 1.0 in our data is consistent with the derived evolution. We show that a double power-law fit with a brightening characteristic magnitude (ΔM*/Δz ≃ −0.5) and a steadily steepening bright-end slope (Δβ/Δz ≃ −0.5) provides a good description of the z > 5 data over a wide range in absolute UV magnitude (−23 < M_UV < −17). We postulate that the observed evolution can be explained by a lack of mass quenching at very high redshifts in combination with increasing dust obscuration within the first ∼1 Gyr of galaxy evolution.

Gaussian mixture models for blended photometric redshifts

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/stz2687 ◽

2019 ◽

Vol 490 (3) ◽

pp. 3966-3986 ◽

Cited By ~ 1

Author(s):

Daniel M Jones ◽

Alan F Heavens

Keyword(s):

Mixture Models ◽

Model Comparison ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Computationally Efficient ◽

Redshift Distribution ◽

Photometric Redshifts ◽

Galaxy Surveys ◽

Photometric Redshift ◽

Bayesian Model Comparison

ABSTRACT Future cosmological galaxy surveys such as the Large Synoptic Survey Telescope (LSST) will photometrically observe very large numbers of galaxies. Without spectroscopy, the redshifts required for the analysis of these data will need to be inferred using photometric redshift techniques that are scalable to large sample sizes. The high number density of sources will also mean that around half are blended. We present a Bayesian photometric redshift method for blended sources that uses Gaussian mixture models to learn the joint flux–redshift distribution from a set of unblended training galaxies, and Bayesian model comparison to infer the number of galaxies comprising a blended source. The use of Gaussian mixture models renders both of these applications computationally efficient and therefore suitable for upcoming galaxy surveys.

Improved photometric redshifts with colour-constrained galaxy templates for future wide-area surveys

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa2100 ◽

2020 ◽

Vol 497 (2) ◽

pp. 1935-1945

Author(s):

Bomee Lee ◽

Ranga-Ram Chary

Keyword(s):

Galaxy Evolution ◽

Training Sample ◽

Spectral Energy ◽

Colour Space ◽

Photometric Redshifts ◽

Mid Infrared ◽

Photometric Redshift ◽

Template Library ◽

Galaxy Sample ◽

Best Fitting

ABSTRACT Cosmology and galaxy evolution studies with LSST, Euclid, and Roman will require accurate redshifts for the detected galaxies. In this study, we present improved photometric redshift estimates for galaxies using a template library that populates three-colour space and is constrained by HST/CANDELS photometry. For the training sample, we use a sample of galaxies having photometric redshifts that allows us to train on a large, unbiased galaxy sample having deep, unconfused photometry at optical-to-mid infrared wavelengths. Galaxies in the training sample are assigned to cubes in 3D colour space, V − H, I − J, and z − H. We then derive the best-fitting spectral energy distributions of the training sample at the fixed CANDELS median photometric redshifts to construct the new template library for each individual colour cube (i.e. colour-cube-based template library). We derive photometric redshifts (photo-z) of our target galaxies using our new colour-cube-based template library and with photometry in only a limited set of bands, as expected for the aforementioned surveys. As a result, our method yields σ_NMAD of 0.026 and an outlier fraction of 6 per cent using only photometry in the LSST and Euclid/Roman bands. This is an improvement of ∼10 per cent on σ_NMAD and a reduction in outlier fraction of ∼13 per cent compared to other techniques. In particular, we improve the photo-z precision by about 30 per cent at 2 < z < 3. We also assess photo-z improvements from adding K or mid-infrared bands to the ugrizYJH photometry. Our colour-cube-based template library is a powerful tool to constrain photometric redshifts for future large surveys.
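
The σ_NMAD statistic quoted above is not defined in the abstract. The sketch below assumes the convention common in the photo-z literature, σ_NMAD = 1.4826 × median(|Δz − median(Δz)|) with Δz = (z_phot − z_spec)/(1 + z_spec); the function name and the toy values are hypothetical.

```python
import statistics


def sigma_nmad(z_phot, z_spec):
    """Normalized median absolute deviation of (z_phot - z_spec)/(1 + z_spec).

    The factor 1.4826 scales the MAD so that it matches the standard
    deviation for Gaussian-distributed residuals.
    """
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    centre = statistics.median(dz)
    return 1.4826 * statistics.median(abs(d - centre) for d in dz)


# Hypothetical residuals around z_spec = 0 for five galaxies.
scatter = sigma_nmad([-0.02, -0.01, 0.0, 0.01, 0.02], [0.0] * 5)
```

Because it is built on medians, this scatter estimate is insensitive to a small number of catastrophic outliers, which is why it is usually reported alongside a separate outlier fraction.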

Euclid preparation

Astronomy and Astrophysics ◽

10.1051/0004-6361/202039403 ◽

2020 ◽

Vol 644 ◽

pp. A31

Author(s):

G. Desprez ◽

S. Paltani ◽

J. Coupon ◽

I. Almosallam ◽

...

Keyword(s):

Machine Learning ◽

Probability Distributions ◽

Broad Band ◽

Color Space ◽

Ground Truth ◽

Validation Sample ◽

Photometric Redshifts ◽

Photometric Redshift ◽

Two Samples ◽

Template Fitting

Forthcoming large photometric surveys for cosmology require precise and accurate photometric redshift (photo-z) measurements for the success of their main science objectives. However, to date, no method has been able to produce photo-zs at the required accuracy using only the broad-band photometry that those surveys will provide. An assessment of the strengths and weaknesses of current methods is a crucial step in the eventual development of an approach to meet this challenge. We report on the performance of 13 photometric redshift codes, assessing both their single-value redshift estimates and their redshift probability distributions (PDZs) on a common set of data, focusing particularly on the 0.2 − 2.6 redshift range that the Euclid mission will probe. We designed a challenge using emulated Euclid data drawn from three photometric surveys of the COSMOS field. The data were divided into two samples: one calibration sample, for which photometry and redshifts were provided to the participants; and the validation sample, containing only the photometry to ensure a blinded test of the methods. Participants were invited to provide a single-value redshift estimate and a PDZ for each source in the validation sample, along with a rejection flag that indicates the sources they consider unfit for use in cosmological analyses. The performance of each method was assessed through a set of informative metrics, using cross-matched spectroscopic and highly accurate photometric redshifts as the ground truth. We show that the rejection criteria set by participants are efficient in removing strong outliers, that is to say sources for which the photo-z deviates by more than 0.15(1 + z) from the spectroscopic redshift (spec-z). We also show that, while all methods are able to provide reliable single-value estimates, several machine-learning methods do not manage to produce useful PDZs.
We find that no machine-learning method provides good results in the regions of galaxy color-space that are sparsely populated by spectroscopic redshifts, for example z > 1. However, they generally perform better than template-fitting methods at low redshift (z < 0.7), indicating that template-fitting methods do not use all of the information contained in the photometry. We introduce metrics that quantify both photo-z precision and completeness of the samples (post-rejection), since both contribute to the final figure of merit of the science goals of the survey (e.g., cosmic shear from Euclid). Template-fitting methods provide the best results in these metrics, but we show that a combination of template-fitting results and machine-learning results with rejection criteria can outperform any individual method. On this basis, we argue that further work in identifying how to best select between machine-learning and template-fitting approaches for each individual galaxy should be pursued as a priority.
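
The strong-outlier criterion stated in this abstract, |z_phot − z_spec| > 0.15(1 + z_spec), and the effect of a rejection flag can be sketched as below. Only the 0.15 threshold comes from the abstract; the function names, the treatment of completeness as the fraction of sources kept, and the toy values are assumptions made for illustration, not the challenge's actual metrics.

```python
def outlier_fraction(z_phot, z_spec, rejected=None, threshold=0.15):
    """Fraction of non-rejected sources with |dz| > threshold * (1 + z_spec)."""
    if rejected is None:
        rejected = [False] * len(z_phot)
    kept = [(zp, zs) for zp, zs, r in zip(z_phot, z_spec, rejected) if not r]
    if not kept:
        return 0.0
    n_outliers = sum(1 for zp, zs in kept if abs(zp - zs) > threshold * (1.0 + zs))
    return n_outliers / len(kept)


def completeness(rejected):
    """Fraction of sources that survive the rejection flag (assumed definition)."""
    return 1.0 - sum(rejected) / len(rejected)


# Hypothetical validation sample: the second source is a strong outlier
# and is the only one flagged for rejection.
z_spec = [0.5, 1.0, 2.0, 0.3]
z_phot = [0.52, 1.5, 2.1, 0.31]
flags = [False, True, False, False]

before = outlier_fraction(z_phot, z_spec)
after = outlier_fraction(z_phot, z_spec, rejected=flags)
kept_fraction = completeness(flags)
```

Reporting the outlier fraction together with the kept fraction reflects the trade-off described above: a rejection flag can remove outliers, but only at the cost of a smaller usable sample.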

Hybrid Approach to Sentiment Analysis based on Syntactic Analysis and Machine Learning

Language and Information ◽

10.29403/li.14.2.9 ◽

2010 ◽

Vol 14 (2) ◽

pp. 159-181

Author(s):

MUNPYO HONG ◽

MIYOUNG SHIN ◽

Shinhye Park ◽

Hyungmin Lee

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Hybrid Approach

Recommendation Systems for Education: Systematic Review

Electronics ◽

10.3390/electronics10141611 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1611

Author(s):

María Cora Urdaneta-Ponte ◽

Amaia Mendez-Zorrilla ◽

Ibon Oleagordia-Ruiz

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Formal Education ◽

Research Work ◽

Hybrid Approach ◽

Recommendation Systems ◽

Educational Resources ◽

Future Research ◽

Collaborative Approach ◽

Developmental Approach

Recommendation systems have emerged as a response to the growing overload of information online, which has become a problem for users in terms of both the time spent searching and the amount of information retrieved. In the field of recommendation systems in education, the relevance of the recommended educational resources improves the student’s learning process, hence the importance of being able to provide relevant, useful information suitably and reliably. The purpose of this systematic review is to analyze the work undertaken on recommendation systems that support educational practices, with a view to acquiring information on the type of education and areas dealt with, the developmental approach used, and the elements recommended, as well as to detect any gaps in this area for future research work. A systematic review was carried out that included 98 articles from a total of 2937 found in the main databases (IEEE, ACM, Scopus and WoS). It established that most systems are geared towards recommending educational resources for users of formal education, and that the main approaches used in recommendation systems are the collaborative approach, the content-based approach, and the hybrid approach, with a tendency to use machine learning in the last two years. Finally, possible future areas of research and development in this field are presented.

The Classification of Medicinal Plant Leaves Based on Multispectral and Texture Feature Using Machine Learning Approach

Agronomy ◽

10.3390/agronomy11020263 ◽

2021 ◽

Vol 11 (2) ◽

pp. 263

Author(s):

Samreen Naeem ◽

Aqib Ali ◽

Christophe Chesneau ◽

Muhammad H. Tahir ◽

Farrukh Jamal ◽

...

Keyword(s):

Machine Learning ◽

Medicinal Plant ◽

Texture Feature ◽

Stevia Rebaudiana ◽

Ocimum Sanctum ◽

Multi Layer Perceptron ◽

Plant Leaves ◽

Chi Square ◽

Lemon Balm

This study proposes a machine-learning-based classification of medicinal plant leaves. A dataset of six varieties of medicinal plant leaves was collected from the Department of Agriculture, The Islamia University of Bahawalpur, Pakistan. These plants are commonly named in English as (herbal) Tulsi, Peppermint, Bael, Lemon balm, Catnip, and Stevia, and scientifically named in Latin as Ocimum sanctum, Mentha balsamea, Aegle marmelos, Melissa officinalis, Nepeta cataria, and Stevia rebaudiana, respectively. The multispectral and digital image datasets were collected via a computer vision laboratory setup. For the preprocessing step, we crop the region of the leaf and transform it into a gray-level format. Secondly, we perform seed intensity-based edge/line detection using a Sobel filter and draw five regions of observation. A fused dataset of 65 features is extracted, combining texture, run-length matrix, and multispectral features. For the feature optimization process, we employ a chi-square feature selection approach and select 14 optimized features. Finally, five machine learning classifiers, namely multi-layer perceptron, logit-boost, bagging, random forest, and simple logistic, are deployed on the optimized medicinal plant leaves dataset, and the multi-layer perceptron classifier shows a relatively promising accuracy of 99.01% compared with the other classifiers. The per-class accuracies of the multi-layer perceptron classifier on the six medicinal plant leaves are 99.10% for Tulsi, 99.80% for Peppermint, 98.40% for Bael, 99.90% for Lemon balm, 98.40% for Catnip, and 99.20% for Stevia.

New Hybrid Approach for Developing Automated Machine Learning Workflows: A Real Case Application in Evaluation of Marcellus Shale Gas Production

Fuels ◽

10.3390/fuels2030017 ◽

2021 ◽

Vol 2 (3) ◽

pp. 286-303

Author(s):

Vuong Van Pham ◽

Ebrahim Fathi ◽

Fatemeh Belyadi

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Shale Gas ◽

Field Data ◽

Marcellus Shale ◽

Hybrid Approach ◽

Gas Production ◽

Bayesian Optimization ◽

Real Field ◽

Engineering Problems

The success of machine learning (ML) techniques implemented in different industries relies heavily on operator expertise and domain knowledge, which are used in manually choosing an algorithm and setting up the specific algorithm parameters for a problem. Due to the manual nature of model selection and parameter tuning, it is impossible to quantify or evaluate the quality of this manual process, which in turn limits the ability to perform comparison studies between different algorithms. In this study, we propose a new hybrid approach for developing machine learning workflows that automates algorithm selection and hyperparameter optimization. The proposed approach provides a robust, reproducible, and unbiased workflow that can be quantified and validated using different scoring metrics. We take the most common workflows used in applications of artificial intelligence (AI) and ML to engineering problems, including grid/random search, Bayesian search and optimization, and genetic programming, and compare them with our new hybrid approach, which integrates the Tree-based Pipeline Optimization Tool (TPOT) and Bayesian optimization. The performance of each workflow is quantified using different scoring metrics such as Pearson correlation (i.e., R² correlation) and Mean Square Error (i.e., MSE). For this purpose, actual field data obtained from 1567 gas wells in Marcellus Shale, with 121 features from reservoir, drilling, completion, stimulation, and operation, are tested using the different proposed workflows. The proposed new hybrid workflow is then used to evaluate the type well used for evaluation of Marcellus shale gas production. In conclusion, our automated hybrid approach showed significant improvement in comparison to the other proposed workflows on both scoring metrics.
The new hybrid approach provides a practical tool that supports automated model and hyperparameter selection; it is tested using real field data and can be applied to different engineering problems using artificial intelligence and machine learning. The new hybrid model is tested in a real field and compared with conventional type wells developed by field engineers. It is found that the type well of the field is very close to the P50 predictions of the field, which shows great success in the completion design of the field performed by field engineers. It also shows that the field average production could have been improved by 8% if shorter cluster spacing and higher proppant loading per cluster had been used during the frac jobs.
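
The two scoring metrics named in this abstract, Pearson correlation (reported as R²) and mean square error, can be computed as in the sketch below. It assumes that "R² correlation" means the squared Pearson coefficient between predicted and observed values; the function names and the sample values are hypothetical and do not come from the Marcellus data.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("Pearson r is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


def mean_square_error(y_true, y_pred):
    """Average squared difference between observations and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted values for four wells.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r_squared = pearson_r(observed, predicted) ** 2
mse = mean_square_error(observed, predicted)
```

The two metrics are complementary: the squared correlation ignores a constant offset or rescaling of the predictions, while the MSE penalizes both, so a workflow comparison is more informative when it reports both.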
