SpArcFiRe: Enhancing Spiral Galaxy Recognition Using Arm Analysis and Random Forests

Automated quantification of galaxy morphology is necessary because the size of upcoming sky surveys will overwhelm human volunteers. Existing classification schemes are inadequate because (a) their uncertainty increases near the boundary of classes and astronomers need more control over these uncertainties; (b) galaxy morphology is continuous rather than discrete; and (c) sometimes we need to know not only the type of an object, but whether a particular image of the object exhibits visible structure. We propose that regression is better suited to these tasks than classification, and focus specifically on determining the extent to which an image of a spiral galaxy exhibits visible spiral structure. We use the human vote distributions from Galaxy Zoo 1 (GZ1) to train a random forest of decision trees to reproduce the fraction of GZ1 humans who vote for the “Spiral” class. We prefer the random forest model over other black box models like neural networks because it allows us to trace post hoc the precise reasoning behind the regression of each image. Finally, we demonstrate that using features from SpArcFiRe—a code designed to isolate and quantify arm structure in spiral galaxies—improves regression results over and above using traditional features alone, across a sample of 470,000 galaxies from the Sloan Digital Sky Survey.
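As a rough illustration of the approach described in this abstract, the sketch below trains a random forest regressor to reproduce a vote fraction in [0, 1] and then traces the splits one tree followed for a single object. It is a minimal sketch under stated assumptions, not the authors' pipeline: the features and target are synthetic, the helper `trace_tree` is hypothetical, and scikit-learn is assumed as the library.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 500 "galaxies" with 4 made-up features.
n_galaxies, n_features = 500, 4
X = rng.normal(size=(n_galaxies, n_features))

# Made-up target in [0, 1], playing the role of the GZ1 "Spiral" vote fraction.
spiral_fraction = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - X[:, 1])))

# Regression, not classification: the forest predicts the fraction itself.
train, test = slice(0, 400), slice(400, None)
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X[train], spiral_fraction[train])

predicted = forest.predict(X[test])
rmse = float(np.sqrt(np.mean((predicted - spiral_fraction[test]) ** 2)))


def trace_tree(tree, x):
    """Hypothetical helper: list the splits one fitted tree follows for x.

    Returns a list of (feature index, threshold, went_left) tuples and the
    value stored in the leaf that x reaches.
    """
    t = tree.tree_
    node, steps = 0, []
    while t.children_left[node] != -1:  # -1 marks a leaf node
        feat, thr = int(t.feature[node]), float(t.threshold[node])
        # scikit-learn evaluates splits on float32 copies of the inputs.
        go_left = float(np.float32(x[feat])) <= thr
        steps.append((feat, thr, go_left))
        node = t.children_left[node] if go_left else t.children_right[node]
    return steps, float(t.value[node][0][0])


# Post hoc trace for the first held-out object and the first tree.
steps, leaf_value = trace_tree(forest.estimators_[0], X[400])
for feat, thr, go_left in steps:
    print(f"feature {feat} {'<=' if go_left else '>'} {thr:.3f}")
print(f"tree prediction: {leaf_value:.3f}, test RMSE: {rmse:.3f}")
```

Each printed line is one threshold test on one feature, which is the sense in which a tree-based regression can be inspected after the fact; repeating the trace over all trees in `forest.estimators_` would give the full set of paths behind one prediction.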

Enhancing Automatic Prediction of Spirality using SpArcFiRe's Spiral Arm Analysis and Random Forests

10.20944/preprints201806.0279.v1 ◽

2018 ◽

Author(s):

Pedro Silva ◽

Leon T. Cao ◽

Wayne B. Hayes

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithm ◽

Box Models ◽

Machine Learning Model ◽

Human Volunteers ◽

Post Hoc ◽

Black Box Models ◽

Spiral Arm

Automated machine classifications of galaxies are necessary because the size of upcoming surveys will overwhelm human volunteers. We improve upon existing machine classification methods by adding the output of SpArcFiRe to the inputs of a machine learning model. We use the human classifications from Galaxy Zoo 1 (GZ1) to train a random forest of decision trees to reproduce the human vote distributions of the Spiral class. We prefer the random forest model over other black box models like neural networks because it allows us to trace post hoc the precise reasoning behind the classification of each galaxy. We find that, across a sample of 470,000 Sloan galaxies large enough that details would be visible if present, combining SpArcFiRe outputs with existing SDSS features provides a better machine classification than either set alone when compared with Galaxy Zoo 1. We suggest that adding SpArcFiRe outputs as features to any machine learning algorithm will likely improve its performance.

SpArcFiRe: Enhancing Spiral Galaxy Recognition using Arm Analysis and Random Forests

10.20944/preprints201806.0279.v2 ◽

2018 ◽

Author(s):

Pedro Silva ◽

Leon T. Cao ◽

Wayne B. Hayes

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithm ◽

Black Box ◽

Box Models ◽

Machine Learning Model ◽

Human Volunteers ◽

Post Hoc ◽

Black Box Models

Surface Photometry of Spiral Galaxy NGC 5005 and Elliptical Galaxy NGC 4278

Baghdad Science Journal ◽

10.21123/bsj.15.3.314-323 ◽

2018 ◽

Vol 15 (3) ◽

pp. 314-323

Author(s):

Baghdad Science Journal

Keyword(s):

Spiral Galaxy ◽

Surface Brightness ◽

Elliptical Galaxy ◽

Position Angle ◽

Sloan Digital Sky Survey ◽

Surface Photometry ◽

Contour Maps ◽

Sky Survey ◽

Flat Field ◽

The Galaxy

Two galaxies, the spiral galaxy NGC 5005 and the elliptical galaxy NGC 4278, were chosen to study their photometric properties using surface photometric techniques with griz filters. Observations were obtained from the Sloan Digital Sky Survey (SDSS). Data reduction of all images, such as bias and flat-field correction, was done by the SDSS pipeline. The overall structure of the two galaxies (a bulge, a disk) was studied, together with isophotal contour maps, surface brightness profiles and a bulge/disk decomposition of the galaxy images, and the disk position angle, ellipticity and inclination of the galaxies were estimated.

Deep learning approach for classifying, detecting and predicting photometric redshifts of quasars in the Sloan Digital Sky Survey stripe 82

Astronomy and Astrophysics ◽

10.1051/0004-6361/201731106 ◽

2018 ◽

Vol 611 ◽

pp. A97 ◽

Cited By ~ 11

Author(s):

J. Pasquet-Itam ◽

J. Pasquet

Keyword(s):

Random Forest ◽

Random Forest Classifier ◽

Sloan Digital Sky Survey ◽

Light Curves ◽

Support Vector ◽

Learning Approach ◽

Photometric Redshifts ◽

K Nearest Neighbors ◽

Sky Survey ◽

Extraction Step

We have applied a convolutional neural network (CNN) to classify and detect quasars in the Sloan Digital Sky Survey Stripe 82 and also to predict the photometric redshifts of quasars. The network takes the variability of objects into account by converting light curves into images. The width of the images, noted w, corresponds to the five magnitudes ugriz and the height of the images, noted h, represents the date of the observation. The CNN provides good results: its precision is 0.988 for a recall of 0.90, compared to a precision of 0.985 for the same recall with a random forest classifier. Moreover, 175 new quasar candidates are found with the CNN at a fixed recall of 0.97. Combining the probabilities given by the CNN and the random forest improves performance further, with a precision of 0.99 for a recall of 0.90. For the redshift predictions, the CNN presents excellent results, better than those obtained with a feature extraction step and different classifiers (K-nearest-neighbors, a support vector machine, a random forest and a Gaussian process classifier). Indeed, the accuracy of the CNN reaches 78.09% within |Δz| < 0.1, 86.15% within |Δz| < 0.2 and 91.2% within |Δz| < 0.3, and the root mean square (rms) value is 0.359. The performance of the KNN is lower in all three |Δz| ranges: its accuracy within |Δz| < 0.1, |Δz| < 0.2 and |Δz| < 0.3 is 73.72%, 82.46% and 90.09% respectively, and its rms amounts to 0.395. The CNN thus successfully reduces the dispersion and the catastrophic redshifts of quasars. This new method is very promising for future large databases such as that of the Large Synoptic Survey Telescope.
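The redshift figures quoted in this abstract (accuracy within |Δz| thresholds, and an rms value) can be computed along the following lines. This is a minimal sketch that assumes Δz is the predicted redshift minus the reference redshift, which the abstract does not spell out; the function name `redshift_metrics` and the four sample values are invented for illustration.

```python
import numpy as np


def redshift_metrics(z_pred, z_true, thresholds=(0.1, 0.2, 0.3)):
    """Fraction of objects with |dz| below each threshold, plus the rms of dz.

    Assumes dz = z_pred - z_true (an assumption, not taken from the source).
    """
    dz = np.asarray(z_pred, dtype=float) - np.asarray(z_true, dtype=float)
    within = {t: float(np.mean(np.abs(dz) < t)) for t in thresholds}
    rms = float(np.sqrt(np.mean(dz ** 2)))
    return within, rms


# Invented toy values: three good estimates and one catastrophic outlier.
z_true = np.array([1.0, 2.0, 0.5, 3.0])
z_pred = np.array([1.05, 2.15, 0.5, 3.5])

within, rms = redshift_metrics(z_pred, z_true)
for threshold, fraction in within.items():
    print(f"|dz| < {threshold}: {100 * fraction:.1f}%")
print(f"rms = {rms:.3f}")
```

In this toy case the single outlier leaves the threshold fractions untouched beyond its own miss but dominates the rms, which is why the two kinds of figure are usually reported together.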

The oxygen abundance in the H II regions of the spiral galaxy M101 determined from the Sloan Digital Sky Survey spectra

Kinematics and Physics of Celestial Bodies ◽

10.3103/s0884591307040046 ◽

2007 ◽

Vol 23 (4) ◽

pp. 163-170 ◽

Cited By ~ 2

Author(s):

Yu. S. Sholudchenko ◽

I. Yu. Izotova ◽

L. S. Pilyugin

Keyword(s):

Spiral Galaxy ◽

Sloan Digital Sky Survey ◽

H Ii Regions ◽

Oxygen Abundance ◽

Sky Survey

A New Approach to Galaxy Morphology. I. Analysis of the Sloan Digital Sky Survey Early Data Release

The Astrophysical Journal ◽

10.1086/373919 ◽

2003 ◽

Vol 588 (1) ◽

pp. 218-229 ◽

Cited By ~ 265

Author(s):

Roberto G. Abraham ◽

Sidney van den Bergh ◽

Preethi Nair

Keyword(s):

Sloan Digital Sky Survey ◽

Early Data ◽

New Approach ◽

Sky Survey ◽

Data Release ◽

Galaxy Morphology

Detecting neutral hydrogen at z ≳ 3 in large spectroscopic surveys of quasars

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa2388 ◽

2020 ◽

Vol 498 (2) ◽

pp. 1951-1962

Author(s):

Michele Fumagalli ◽

Sotiria Fotopoulou ◽

Laura Thomson

Keyword(s):

Random Forest ◽

Neutral Hydrogen ◽

Sloan Digital Sky Survey ◽

Broad Absorption Line ◽

Quasar Spectra ◽

Sky Survey ◽

William Herschel ◽

High Column

We present a pipeline based on a random forest classifier for the identification of high column density clouds of neutral hydrogen (i.e. the Lyman limit systems, LLSs) in absorption within large spectroscopic surveys of z ≳ 3 quasars. We test the performance of this method on mock quasar spectra that reproduce the expected data quality of the Dark Energy Spectroscopic Instrument and the WHT (William Herschel Telescope) Enhanced Area Velocity Explorer surveys, finding ≳90 per cent completeness and purity for $N_{\rm H\,I} \gtrsim 10^{17.2}~{\rm cm^{-2}}$ LLSs against quasars of g < 23 mag at z ≈ 3.5–3.7. After training and applying our method on 10 000 quasar spectra at z ≈ 3.5–4.0 from the Sloan Digital Sky Survey (Data Release 16), we identify ≈6600 LLSs with $N_{\rm H\,I} \gtrsim 10^{17.5}~{\rm cm^{-2}}$ between z ≈ 3.1 and 4.0 with a completeness and purity of ≳90 per cent for the classification of LLSs. Using this sample, we measure a number of LLSs per unit redshift of ℓ(z) = 2.32 ± 0.08 at z = [3.3, 3.6]. We also present results on the performance of random forest for the measurement of the LLS redshifts and H I column densities, and for the identification of broad absorption line quasars.
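Completeness and purity, the two figures of merit quoted in this abstract, can be computed from a truth mask and a classifier's flags as sketched below. The sketch uses the standard definitions (completeness is the fraction of true systems that are recovered, purity is the fraction of flagged systems that are real); the function name and the toy masks are invented, and nothing here reproduces the paper's pipeline.

```python
import numpy as np


def completeness_purity(is_true, is_flagged):
    """Return (completeness, purity) for boolean truth and prediction masks.

    completeness = true positives / all true systems
    purity       = true positives / all flagged systems
    Either value is NaN when its denominator is zero.
    """
    is_true = np.asarray(is_true, dtype=bool)
    is_flagged = np.asarray(is_flagged, dtype=bool)
    true_positives = int(np.sum(is_true & is_flagged))
    n_true, n_flagged = int(is_true.sum()), int(is_flagged.sum())
    completeness = true_positives / n_true if n_true else float("nan")
    purity = true_positives / n_flagged if n_flagged else float("nan")
    return completeness, purity


# Invented toy masks: 4 real systems, 4 flagged, 3 in common.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
flags = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

completeness, purity = completeness_purity(truth, flags)
print(f"completeness = {completeness:.2f}, purity = {purity:.2f}")
```

For a probabilistic classifier such as a random forest, both numbers depend on the probability cut used to produce the flags, so they are normally quoted for a stated threshold.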

Drug Classification using Black-box models and Interpretability

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38203 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1518-1529

Author(s):

Pooja Thakkar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Learning Models ◽

Drug Classification ◽

Box Models ◽

Machine Learning Model ◽

Black Box Models ◽

Insight Into ◽

Machine Learning Models

The focus of this study is on drug classification using Machine Learning models, and on interpretability using LIME and SHAP to gain a thorough understanding of those models. To do this, the researchers used machine learning models such as random forest, decision tree, and logistic regression to classify drugs. Then, using LIME and SHAP, they examined whether these models were interpretable, which allowed them to better understand their results. The paper concludes that LIME and SHAP can be used to gain insight into a Machine Learning model and to determine which attribute is responsible for the divergence in the outcomes. According to the LIME and SHAP results, the Random Forest and Decision Tree models are the best models to use for drug classification, with Na to K and BP being the most significant features. Keywords: Machine Learning, Black-box models, LIME, SHAP, Decision Tree
