scholarly journals Machine learning in APOGEE

2019 ◽  
Vol 629 ◽  
pp. A34 ◽  
Author(s):  
Rafael Garcia-Dias ◽  
Carlos Allende Prieto ◽  
Jorge Sánchez Almeida ◽  
Pedro Alonso Palicio

Context. The vast volume of data generated by modern astronomical surveys offers test beds for the application of machine learning. In these exploratory applications, it is important to evaluate potential existing tools and determine those that are optimal for extracting scientific knowledge from the available observations. Aims. We explore the possibility of using unsupervised clustering algorithms to separate stellar populations with distinct chemical patterns. Methods. Star clusters are likely the most chemically homogeneous populations in the Galaxy, and therefore any practical approach to identifying distinct stellar populations should at least be able to separate clusters from each other. We have applied eight clustering algorithms combined with four dimensionality reduction strategies to automatically distinguish stellar clusters using chemical abundances of 13 elements. Our test-bed sample includes 18 stellar clusters with a total of 453 stars. Results. We have applied statistical tests showing that some pairs of clusters (e.g., NGC 2458–NGC 2420) are indistinguishable from each other when chemical abundances from the Apache Point Observatory Galactic Evolution Experiment (APOGEE) are used. However, for most clusters we are able to automatically assign membership with metric scores similar to those of previous works. The confusion level of the automatically selected clusters is consistent with statistical tests that demonstrate the impossibility of perfectly distinguishing all the clusters from each other. These statistical tests and confusion levels establish a limit for the prospect of blindly identifying stars born in the same cluster based solely on chemical abundances. Conclusion. We find that some of the algorithms we explored are capable of blindly identifying stellar populations with similar ages and chemical distributions in the APOGEE data. Even though we are not able to fully separate the clusters from each other, the main confusion arises from clusters with similar ages. Because some stellar clusters are chemically indistinguishable, our study supports the notion of extending weak chemical tagging that involves families of clusters instead of individual clusters.
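The abstract above scores automatic cluster assignments against known membership but does not name the metrics. As an illustration only, the sketch below assumes homogeneity, completeness, and V-measure, which are common choices for this task and not necessarily the ones the authors used. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) in nats, estimated from label co-occurrences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_labels, predicted_labels):
    """Return (homogeneity, completeness, V-measure) for a clustering."""
    h_true = entropy(true_labels)
    h_pred = entropy(predicted_labels)
    homogeneity = 1.0 if h_true == 0 else (
        1.0 - conditional_entropy(true_labels, predicted_labels) / h_true)
    completeness = 1.0 if h_pred == 0 else (
        1.0 - conditional_entropy(predicted_labels, true_labels) / h_pred)
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Toy case (hypothetical labels): star clusters "B" and "C" are merged
# into one group, mimicking two chemically indistinguishable clusters.
true_membership = ["A", "A", "B", "B", "C", "C"]
predicted_groups = [0, 0, 1, 1, 1, 1]
print(v_measure(true_membership, predicted_groups))
```

In this toy case completeness stays at 1 because no true cluster is split, while homogeneity drops because one recovered group mixes two true clusters. That is the kind of confusion the abstract describes for chemically similar clusters.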

2018 ◽  
Vol 612 ◽  
pp. A98 ◽  
Author(s):  
Rafael Garcia-Dias ◽  
Carlos Allende Prieto ◽  
Jorge Sánchez Almeida ◽  
Ignacio Ordovás-Pascual

Context. The volume of data generated by astronomical surveys is growing rapidly. Traditional analysis techniques in spectroscopy either demand intensive human interaction or are computationally expensive. In this scenario, machine learning, and unsupervised clustering algorithms in particular, offer interesting alternatives. The Apache Point Observatory Galactic Evolution Experiment (APOGEE) offers a vast data set of near-infrared stellar spectra, which is perfect for testing such alternatives. Aims. Our research applies an unsupervised classification scheme based on K-means to the massive APOGEE data set. We explore whether the data are amenable to classification into discrete classes. Methods. We apply the K-means algorithm to 153 847 high resolution spectra (R ≈ 22 500). We discuss the main virtues and weaknesses of the algorithm, as well as our choice of parameters. Results. We show that a classification based on normalised spectra captures the variations in stellar atmospheric parameters, chemical abundances, and rotational velocity, among other factors. The algorithm is able to separate the bulge and halo populations, and distinguish dwarfs, sub-giants, RC, and RGB stars. However, a discrete classification in flux space does not result in a neat organisation in the parameters’ space. Furthermore, the lack of obvious groups in flux space causes the results to be fairly sensitive to the initialisation, and disrupts the efficiency of commonly-used methods to select the optimal number of clusters. Our classification is publicly available, including extensive online material associated with the APOGEE Data Release 12 (DR12). Conclusions. Our description of the APOGEE database can help greatly with the identification of specific types of targets for various applications. We find a lack of obvious groups in flux space, and identify limitations of the K-means algorithm in dealing with this kind of data.
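The sensitivity to initialisation reported above can be illustrated with a minimal K-means sketch. This is not the authors' pipeline: it assumes plain Lloyd iterations with randomly sampled starting centroids, and it uses a small synthetic 2D point cloud without obvious groups as a hypothetical stand-in for spectra in flux space.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points]


def kmeans(points, k, seed, max_iter=100):
    """Lloyd's algorithm; the seed controls the random initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(squared_distance(p, centroids[lab])
                  for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Synthetic cloud with no obvious groups (an assumption for illustration).
data_rng = random.Random(0)
cloud = [(data_rng.random(), data_rng.random()) for _ in range(200)]
for seed in range(5):
    _, _, inertia = kmeans(cloud, k=4, seed=seed)
    print(f"seed={seed} inertia={inertia:.4f}")
```

When the data contain no well-separated groups, different seeds may converge to different partitions with different within-cluster sums of squares. A common mitigation is to run several initialisations and keep the lowest-inertia solution.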


2018 ◽  
Vol 621 ◽  
pp. A9 ◽  
Author(s):  
Junjie Mao ◽  
Jelle de Plaa ◽  
Jelle S. Kaastra ◽  
Ciro Pinto ◽  
Liyi Gu ◽  
...  

Context. Chemical abundances in the X-ray halos (also known as the intracluster medium, ICM) of clusters and groups of galaxies can be measured via prominent emission line features in their X-ray spectra. Elemental abundances are footprints of time-integrated yields of various stellar populations that have left their specific abundance patterns prior to and during the cluster and group evolution. Aim. We aim to constrain nitrogen abundances in the CHEmical Evolution RGS Sample (CHEERS), which contains 44 nearby groups and clusters of galaxies, to gain a better understanding of their chemical enrichment. Method. We examined the high-resolution spectra of the CHEERS sample and took various systematic effects in the spectral modelling into account. We compared the observed abundance ratios with those in the Galactic stellar populations and also with predictions from stellar yields (low- and intermediate-mass stars, massive stars, and degenerate stars). Results. The nitrogen abundance can only be well constrained (≳3σ) in one cluster of galaxies and seven groups of galaxies. The [O/Fe] – [Fe/H] relation of the ICM is comparable to that for the Galaxy, while the [N/Fe] and [N/O] ratios of the ICM are both higher than in the Galaxy. Future studies on nitrogen radial distributions are required to tell whether the obtained higher [N/Fe] and [N/O] ratios are biased as a result of the small extraction region (r/r500 ≲ 0.05) that we adopt here. Since abundances of odd-Z elements are more sensitive to the initial metallicity of stellar populations, accurate abundance measurements of N, Na, and Al are required to better constrain the chemical enrichment in the X-ray halos of clusters and groups of galaxies.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
José Castela Forte ◽  
Galiya Yeshmagambetova ◽  
Maureen L. van der Grinten ◽  
Bart Hiemstra ◽  
Thomas Kaufmann ◽  
...  

Abstract. Critically ill patients constitute a highly heterogeneous population, with seemingly distinct patients having similar outcomes, and patients with the same admission diagnosis having opposite clinical trajectories. We aimed to develop a machine learning methodology that identifies and provides better characterization of patient clusters at high risk of mortality and kidney injury. We analysed prospectively collected data including co-morbidities, clinical examination, and laboratory parameters from a minimally-selected population of 743 patients admitted to the ICU of a Dutch hospital between 2015 and 2017. We compared four clustering methodologies and trained a classifier to predict and validate cluster membership. The contribution of different variables to the predicted cluster membership was assessed using SHapley Additive exPlanations values. We found that deep embedded clustering yielded better results compared to the traditional clustering algorithms. The best cluster configuration was achieved for 6 clusters. All clusters were clinically recognizable, and differed in in-ICU, 30-day, and 90-day mortality, as well as incidence of acute kidney injury. We identified two high mortality risk clusters with at least 60%, 40%, and 30% increased ICU, 30-day, and 90-day mortality, respectively, and a low risk cluster with 25–56% lower mortality risk. This machine learning methodology combining deep embedded clustering and variable importance analysis, which we made publicly available, is a possible solution to challenges previously encountered by clustering analyses in heterogeneous patient populations and may help improve the characterization of risk groups in critical care.


Author(s):  
Dhamanpreet Kaur ◽  
Matthew Sobiesk ◽  
Shubham Patil ◽  
Jin Liu ◽  
Puran Bhagat ◽  
...  

Abstract. Objective. This study seeks to develop a fully automated method of generating synthetic data from a real dataset that could be employed by medical organizations to distribute health data to researchers, reducing the need for access to real data. We hypothesize the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data. Materials and Methods. We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data. Results. Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules. Discussion. Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools. Conclusion. We conclude the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.


2016 ◽  
Vol 11 (S321) ◽  
pp. 50-50
Author(s):  
Daisuke Toyouchi ◽  
Masashi Chiba

Abstract. We investigate the structure and dynamics of the Milky Way (MW) disk stars based on the analysis of the Apache Point Observatory Galactic Evolution Experiment (APOGEE) data, to infer the past evolution histories of the MW disk component(s) possibly affected by radial migration and/or satellite accretions. APOGEE is the first near-infrared spectroscopic survey for a large number of the MW disk stars, providing their radial velocities and chemical abundances without significant dust extinction effects. We here adopt red-clump (RC) stars (Bovy et al. 2014), for which the distances from the Sun are determined precisely, and analyze their radial velocities and chemical abundances in the MW disk regions covering Galactocentric distances, R, from 5 kpc to 14 kpc. We investigate their dynamical properties, such as mean rotational velocities, 〈Vφ〉, and velocity dispersions, as a function of R, based on the MCMC Bayesian method. We find that at all radii, the dynamics of alpha-poor stars, which are candidate young disk stars, differ markedly from those of alpha-rich stars, which are candidate old disk stars. Our Jeans analysis of the sample stars reveals characteristic spatial and dynamical properties of the MW disk, which are generally in agreement with the recent independent work by Bovy et al. (2015), which used a different method from ours.


2008 ◽  
Vol 480 (2) ◽  
pp. 379-395 ◽  
Author(s):  
L. Pompéia ◽  
V. Hill ◽  
M. Spite ◽  
A. Cole ◽  
F. Primas ◽  
...  

2013 ◽  
Vol 560 ◽  
pp. A44 ◽  
Author(s):  
M. Van der Swaelmen ◽  
V. Hill ◽  
F. Primas ◽  
A. A. Cole

2020 ◽  
Vol 5 (6) ◽  
pp. 651-658 ◽  
Author(s):  
Mirpouya Mirmozaffari ◽  
Azam Boskabadi ◽  
Gohar Azeem ◽  
Reza Massah ◽  
Elahe Boskabadi ◽  
...  

Machine learning is growing quickly, has enabled numerous academic discoveries, and is extensively evaluated in several areas. Optimization, a vital part of machine learning, has attracted considerable attention from practitioners. The primary purpose of this paper is to combine optimization and machine learning to extract hidden rules, remove unrelated data, identify the most productive Decision-Making Units (DMUs) in the optimization part, and identify the algorithm with the highest accuracy in the machine learning part. In the optimization part, we evaluate the productivity of 30 banks from eight developing countries over the period 2015-2019 by utilizing Data Envelopment Analysis (DEA). An additive DEA model for measuring the efficiency of decision processes is used. Additive models are often named Slack Based Measure (SBM) models; this group of models measures efficiency via slack variables. After applying the proposed model, the Malmquist Productivity Index (MPI) is computed to evaluate the productivity of companies. In the machine learning part, we use a specific two-layer data mining filtering pre-process for clustering algorithms to increase the efficiency and to find the superior algorithm. This study tackles data and methodology-related issues in measuring the productivity of banks in developing countries and highlights the significance of DMU productivity and algorithm accuracy in the banking industry by comparing the suggested models.


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al. ◽  

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), which is a top-down hierarchical clustering algorithm and can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on actual data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.

