Variable star classification using multiview metric learning

2019 ◽  
Vol 491 (3) ◽  
pp. 3805-3819 ◽  
Author(s):  
K B Johnston ◽  
S M Caballero-Nieves ◽  
V Petit ◽  
A M Peter ◽  
R Haber

ABSTRACT Comprehensive observations of variable stars can include time domain photometry in a multitude of filters, spectroscopy, estimates of colour (e.g. U-B), etc. When the objective is to classify variable stars, traditional machine learning techniques distill these various representations (or views) into a single feature vector and attempt to discriminate among desired categories. In this work, we propose an alternative approach that inherently leverages multiple views of the same variable star. Our multiview metric learning framework enables robust characterization of star categories by directly learning to discriminate in a multifaceted feature space, thus eliminating the need to combine feature representations prior to fitting the machine learning model. We also demonstrate how to extend standard multiview learning, which employs multiple vectorized views, to the matrix-variate case, which allows novel variable star signature representations. The performance of our proposed methods is evaluated on the UCR Starlight and LINEAR data sets. Both the vector and matrix-variate versions of our multiview learning framework perform favourably, demonstrating the ability to discriminate variable star categories.

Author(s):  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S Quintana-Ortí

More than 10 years of research related to the development of efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent, publicly available routines using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods behave so differently depending on the matrix structure that identifying general rules to select the optimal method for a given matrix becomes extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
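
For readers unfamiliar with the operation being tuned, the following is a minimal CPU reference sketch of SpMV. It assumes the compressed sparse row (CSR) layout, which is one common storage format; the abstract does not say which formats or GPU routines were evaluated, and the function name `spmv_csr` and the small example matrix are illustrative only.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i] .. row_ptr[i + 1] delimits the entries of row i
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Only the stored nonzeros of this row contribute to y[row].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix (not taken from the paper):
# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = spmv_csr(values, col_idx, row_ptr, x)
print(y)
```

The number of nonzeros per row drives the inner loop length, which is one reason the best-performing realization can depend so strongly on matrix structure.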


2021 ◽  
Author(s):  
Rogini Runghen ◽  
Daniel B Stouffer ◽  
Giulio Valentino Dalla Riva

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete data set. Thus, identifying potentially present but unobserved interactions is crucial both in understanding the structure of large scale data, and in predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another, and the accuracy of link prediction. However, the dimension of the object we need to learn to predict interactions in a network grows quickly with the number of nodes. Therefore, it becomes computationally and conceptually challenging for large networks. Here, we present a new predictive procedure combining a graph embedding method with machine learning techniques to predict interactions on the basis of nodes' metadata. Graph embedding methods project the nodes of a network onto a low-dimensional latent feature space. The position of the nodes in the latent feature space can then be used to predict interactions between nodes. Learning a mapping of the nodes' metadata to their position in a latent feature space corresponds to a classic, low-dimensional machine learning problem. In our current study we used the Random Dot Product Graph model to estimate the embedding of an observed network, and we tested different neural network architectures to predict the position of nodes in the latent feature space. Flexible machine learning techniques that map the nodes onto their latent positions allow us to account for multivariate and possibly complex nodes' metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand.
We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while being computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks, and to a wide variety of data from different systems. As such, both from a network science and data science perspective, our work offers a flexible and generalisable procedure for link prediction.
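
The prediction step of a Random Dot Product Graph can be sketched in a few lines: once each node has a latent position, the modelled probability of an interaction between two nodes is the dot product of their positions. The sketch below assumes the positions are already known; the node names and coordinates are hypothetical, and the step that predicts positions from metadata (neural networks in the study) is not shown.

```python
def link_probability(x_i, x_j):
    """Dot product of two latent positions, clipped to the interval [0, 1]."""
    p = sum(a * b for a, b in zip(x_i, x_j))
    return min(1.0, max(0.0, p))


# Hypothetical 2-dimensional latent positions (not from the paper).
positions = {
    "node_a": [0.8, 0.1],
    "node_b": [0.7, 0.2],
    "node_c": [0.1, 0.9],
}

# Score every unordered pair of nodes.
names = sorted(positions)
scores = {
    (u, v): link_probability(positions[u], positions[v])
    for i, u in enumerate(names)
    for v in names[i + 1:]
}

for pair, p in scores.items():
    print(pair, round(p, 3))
```

Under this model a newly added node only needs a predicted latent position to be scored against every existing node, which is consistent with the abstract's point that the learning problem stays low dimensional.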


2021 ◽  
Vol 3 (4) ◽  
pp. 32-37
Author(s):  
J. Adassuriya ◽  
J. A. N. S. S. Jayasinghe ◽  
K. P. S. C. Jayaratne

Machine learning algorithms play an important role in modern technology and address automation problems in many fields, as these techniques can identify features with a sensitivity that humans or other programming techniques cannot match. In addition, the growing availability of data demands faster, more accurate, and more reliable automated methods for extracting, reforming, preprocessing, and analyzing information in science. The development of machine learning techniques to automate complex manual procedures is timely research in astrophysics, a field where experts deal with large data sets every day. In this study, an automated classifier was built for six classes of stars with widely varying properties (Beta Cephei, Delta Scuti, Gamma Doradus, Red Giants, RR Lyrae, and RV Tauri), using features extracted from a training dataset of stellar light curves obtained from the Kepler mission. A Random Forest classifier was used as the machine learning model, and both periodic and non-periodic features extracted from the light curves were used as its inputs. Our implementation achieved an accuracy of 86.5%, an average precision of 0.86, an average recall of 0.87, and an average F1-score of 0.86 on the testing dataset obtained from the Kepler mission.
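
The reported metrics can be illustrated with a small sketch that computes accuracy and class-averaged precision, recall, and F1 from predicted labels. It assumes unweighted (macro) averaging over classes, which the abstract does not specify, and the toy labels below are invented for illustration, not Kepler results.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) across classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy labels, invented for illustration only.
y_true = ["RR Lyrae", "RR Lyrae", "Delta Scuti", "Delta Scuti", "Red Giant", "Red Giant"]
y_pred = ["RR Lyrae", "Delta Scuti", "Delta Scuti", "Delta Scuti", "Red Giant", "Red Giant"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_p, macro_r, macro_f1 = macro_average(per_class_scores(y_true, y_pred))
print(round(accuracy, 3), round(macro_p, 3), round(macro_r, 3), round(macro_f1, 3))
```

Macro averaging treats every class equally regardless of how many stars it contains; a support-weighted average would give different numbers when the classes are imbalanced.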


2021 ◽  
Author(s):  
Kalum J. Ost ◽  
David W. Anderson ◽  
David W. Cadotte

With the widespread adoption of electronic health records and new technologies capable of producing data on an unprecedented scale, a shift must occur in how we practice medicine in order to utilize these resources. We are entering an era in which the capacity of even the cleverest human doctor is simply insufficient. As such, realizing “personalized” or “precision” medicine requires new methods that can leverage the massive amounts of data now available. Machine learning techniques provide one important toolkit in this venture, as they are fundamentally designed to deal with (and, in fact, benefit from) massive datasets. The clinical applications for such machine learning systems are still in their infancy, however, and the field of medicine presents a unique set of design considerations. In this chapter, we will walk through how we selected and adjusted the “Progressive Learning framework” to account for these considerations in the case of Degenerative Cervical Myelopathy. We additionally compare a model designed with these techniques to similar static models run in “perfect world” scenarios (free of the clinical issues addressed), and we use simulated clinical data acquisition scenarios to demonstrate the advantages of our machine learning approach in providing personalized diagnoses.


2017 ◽  
Vol 152 ◽  
pp. 03011
Author(s):  
Alejandro García-Varela ◽  
Muriel Pérez ◽  
Beatriz Sabogal ◽  
Adolfo Quiroz

Author(s):  
Samreen Naeem ◽  
Aqib Ali ◽  
Jamal Abdul Nasir ◽  
Arooj Fatima ◽  
Farrukh Jamal ◽  
...  

The purpose of this study is to detect corn seed Fusarium disease using a hybrid feature space and conventional machine learning (ML) approaches. A novel machine learning approach is employed to classify six types of corn seed: five infected with Fusarium (moniliforme, graminearum, gibberella, verticillioides, kernel) and healthy corn seed. The classification is based on a multi-feature dataset that combines geometric, texture, and histogram features extracted from digital images. For each corn seed image, a total of twenty-five multi-features were computed on every area of interest (AOI), with sizes (50 × 50), (100 × 100), (150 × 150), and (200 × 200). A total of seven optimized features were selected using a machine learning-based algorithm named “Correlation-based Feature Selection”. For experimentation, “Random forest”, “BayesNet”, and “LogitBoost” were applied to the optimized multi-feature, user-supplied dataset, divided into 70% training and 30% testing. In a comparative analysis of the three ML classifiers, RF, BN, and LB achieved high classification rates of 96.67%, 97.22%, and 97.78%, respectively, when the AOI size (200 × 200) was supplied to the classifiers.


2020 ◽  
Vol 634 ◽  
pp. A57 ◽  
Author(s):  
W. Dobbels ◽  
M. Baes ◽  
S. Viaene ◽  
S. Bianchi ◽  
J. I. Davies ◽  
...  

Context. Dust plays an important role in shaping a galaxy’s spectral energy distribution (SED). It absorbs ultraviolet (UV) to near-infrared radiation and re-emits this energy in the far-infrared (FIR). The FIR is essential to understand dust in galaxies. However, deep FIR observations require a space mission, none of which are still active today. Aims. We aim to infer the FIR emission across six Herschel bands, along with dust luminosity, mass, and effective temperature, based on the available UV to mid-infrared (MIR) observations. We also want to estimate the uncertainties of these predictions, compare our method to energy balance SED fitting, and determine possible limitations of the model. Methods. We propose a machine learning framework to predict the FIR fluxes from 14 UV–MIR broadband fluxes. We used a low redshift sample by combining DustPedia and H-ATLAS, and extracted Bayesian flux posteriors through SED fitting. We trained shallow neural networks to predict the far-infrared fluxes, uncertainties, and dust properties. We evaluated them on a test set using a root mean square error (RMSE) in log-space. Results. Our results (RMSE = 0.19 dex) significantly outperform UV–MIR energy balance SED fitting (RMSE = 0.38 dex), and are inherently unbiased. We can identify when the predictions are off, for example when the input has large uncertainties on WISE 22 μm, or when the input does not resemble the training set. Conclusions. The galaxies for which we have UV–FIR observations can be used as a blueprint for galaxies that lack FIR data. This results in a “virtual FIR telescope”, which can be applied to large optical-MIR galaxy samples. This helps bridge the gap until the next FIR mission.
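
To make the evaluation metric concrete, the sketch below computes a root mean square error in log-space, in dex, meaning that residuals are differences of base-10 logarithms. The function name `rmse_dex` and the flux values are illustrative assumptions, not data from the paper.

```python
import math


def rmse_dex(predicted, observed):
    """RMSE of log10(predicted) - log10(observed), in dex.

    Both sequences must contain strictly positive fluxes.
    """
    residuals = [math.log10(p) - math.log10(o) for p, o in zip(predicted, observed)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Illustrative fluxes in arbitrary units (not from the paper).
predicted = [10.0, 100.0, 1000.0]
observed = [10.0, 1000.0, 100.0]

error = rmse_dex(predicted, observed)
print(round(error, 4))

# A scatter of d dex corresponds to a multiplicative factor of 10**d.
factor = 10 ** 0.19
print(round(factor, 2))
```

Read this way, the reported 0.19 dex corresponds to a typical multiplicative error of roughly a factor of 1.5, and 0.38 dex to roughly a factor of 2.4.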


Entropy ◽  
2019 ◽  
Vol 21 (11) ◽  
pp. 1084 ◽  
Author(s):  
Wang ◽  
Kou ◽  
Song

In this paper, the risk pattern of e-bike riders in China was examined, based on tree-structured machine learning techniques. Three-year crash/violation data were acquired from the Kunshan traffic police department, China. Firstly, high-risk (HR) electric bicycle (e-bike) riders were defined as those with at-fault crash involvement, while others (i.e. non-at-fault or without crash involvement) were considered as non-high-risk (NHR) riders, based on quasi-induced exposure theory. Then, for e-bike riders, their demographics and previous violation-related features were developed based on the crash/violation records. After that, a systematic machine learning (ML) framework was proposed so as to capture the complex risk patterns of those e-bike riders. An ensemble sampling method was selected to deal with the imbalanced datasets. Four tree-structured machine learning methods were compared, and a gradient boost decision tree (GBDT) appeared to be the best. The feature importance and partial dependence were further examined. Interesting findings include the following: (1) tree-structured ML models are able to capture complex risk patterns and interpret them properly; (2) spatial-temporal violation features were found as important indicators of high-risk e-bike riders; and (3) violation behavior features appeared to be more effective than violation punishment-related features, in terms of identifying high-risk e-bike riders. In general, the proposed ML framework is able to identify the complex crash risk pattern of e-bike riders. This paper provides useful insights for policy-makers and traffic practitioners regarding e-bike safety improvement in China.


2020 ◽  
Author(s):  
Diego A. Delgadillo-Duran ◽  
Cesar A. Vargas-García ◽  
Viviana M. Varón-Ramírez ◽  
Francisco Calderón ◽  
Andrea C. Montenegro ◽  
...  

Abstract Knowledge of soil chemical properties can be decisive for crop management and total yield. Traditional property estimation approaches are time-consuming and require complex lab setups, preventing farmers from promptly taking steps towards optimal practices in their crops. Property estimation from spectral signals (vis-NIRS) emerged as a low-cost, non-invasive, and non-destructive alternative. Current approaches use mathematical and statistical techniques rather than a machine learning framework. Here we propose both regression and classification with machine learning techniques, and assess their performance in predicting the values and inferring the categories of common soil properties (pH, soil organic matter, Ca, Na, K and Mg) using the most common metrics. In sugarcane soils, we use regression to estimate properties and classification to assess the status of each soil property, and we report the direct relation between spectral bands and direct measurements of certain properties. In both cases, we achieved performance similar to that reported in the literature for similar setups.


Author(s):  
Pedro Maia De Santana

Machine learning techniques applied to radio frequency (RF) signals are used for many applications in addition to data communication. In this paper, the authors propose a machine learning solution for classifying the number of people within an indoor environment. The main idea is to identify a pattern in the received signal characteristics that depends on the number of people. Experimental measurements are performed using a software-defined radio platform inside a laboratory. The collected data is post-processed by applying a feature mapping technique based on mean, standard deviation, and Shannon information entropy. This feature-space data is then used to train a supervised machine learning network to classify scenarios with zero, one, two, and three people inside. The proposed solution achieves significant classification accuracy.
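
The feature mapping described above can be sketched as a function that reduces a window of received-signal samples to three numbers. This is a minimal sketch under stated assumptions: the entropy is estimated from an equal-width histogram with a hypothetical `n_bins` parameter, the standard deviation is the population form, and the sample values are invented; the paper's actual estimator and window length are not given in the abstract.

```python
import math
from collections import Counter


def shannon_entropy(samples, n_bins=8):
    """Shannon entropy (bits) of an equal-width histogram of the samples."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0  # a constant signal carries no information
    width = (hi - lo) / n_bins
    # The maximum sample would fall just past the last bin, so clamp it.
    counts = Counter(min(int((s - lo) / width), n_bins - 1) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def feature_vector(samples, n_bins=8):
    """Map one window of samples to [mean, standard deviation, entropy]."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    return [mean, std, shannon_entropy(samples, n_bins)]


# Invented window of received-signal magnitudes, for illustration only.
window = [0.0, 1.0, 0.0, 1.0]
features = feature_vector(window, n_bins=2)
print(features)
```

Each window then becomes one row of the training set, labelled with the number of people present, which any supervised classifier can consume.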

