Machine learning applied to multifrequency data in astrophysics: blazar classification

ABSTRACT The study of machine learning (ML) techniques for the autonomous classification of astrophysical sources is of great interest, and we explore its applications in the context of a multifrequency data-frame. We test the use of supervised ML to classify blazars according to their synchrotron peak frequency, either lower or higher than $10^{15}$ Hz. We select a sample of 4178 blazars labelled as 1279 high synchrotron peak (HSP: $\rm \nu$-peak > $10^{15}$ Hz) and 2899 low synchrotron peak (LSP: $\rm \nu$-peak < $10^{15}$ Hz). A set of multifrequency features was defined to represent each source, including spectral slopes ($\alpha _{\nu _1, \nu _2}$) between the radio, infrared (IR), optical, and X-ray bands, as well as IR colours. We describe the optimization of five ML classification algorithms that classify blazars into LSP or HSP: random forests (RFs), support vector machine (SVM), K-nearest neighbours (KNN), Gaussian Naive Bayes (GNB), and the Ludwig auto-ML framework. In our particular case, the SVM algorithm had the best performance, reaching a balanced accuracy of 93 per cent. A joint-feature permutation test revealed that the radio-IR and radio-optical spectral slopes are the most relevant for the ML modelling, followed by the IR colours. This work shows that ML algorithms can distinguish multifrequency spectral characteristics and handle the classification of blazars into LSPs and HSPs. It hints at the potential use of ML for the autonomous determination of broadband spectral parameters (such as the synchrotron ν-peak), or even for the search for new blazars in all-sky databases.
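The abstract reports balanced accuracy rather than plain accuracy, which matters because the two classes are unequal in size (1279 HSP against 2899 LSP). The sketch below is an illustration only, not the authors' code: it uses the class sizes from the abstract and a hypothetical trivial classifier that labels every source as LSP. The function names are assumptions made for this example.

```python
# Class sizes are taken from the abstract; the classifier below is hypothetical.
N_HSP = 1279
N_LSP = 2899


def accuracy(tp, fn, tn, fp):
    """Fraction of all sources that are labelled correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of the per-class recalls for a two-class problem."""
    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    return (recall_pos + recall_neg) / 2


# Trivial classifier that calls every source LSP.
# With HSP as the positive class, no HSP is recovered and every LSP is.
plain = accuracy(tp=0, fn=N_HSP, tn=N_LSP, fp=0)
balanced = balanced_accuracy(tp=0, fn=N_HSP, tn=N_LSP, fp=0)

print(f"plain accuracy:    {plain:.3f}")
print(f"balanced accuracy: {balanced:.3f}")
```

Such a classifier would score about 69 per cent in plain accuracy while its balanced accuracy stays at 50 per cent, so a balanced accuracy of 93 per cent cannot be reached by favouring the majority class.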

MACHINE LEARNING ALGORITHMS FOR IDENTIFICATION OF ABNORMAL GLOW CURVES AND ASSOCIATED ABNORMALITY IN CaSO4:DY-BASED PERSONNEL MONITORING DOSIMETERS

Radiation Protection Dosimetry ◽

10.1093/rpd/ncaa108 ◽

2020 ◽

Vol 190 (3) ◽

pp. 342-351

Author(s):

Munir S Pathan ◽

S M Pradhan ◽

T Palani Selvam

Keyword(s):

Machine Learning ◽

Glow Curve ◽

Good Accuracy ◽

Machine Learning Algorithms ◽

Support Vector ◽

Computationally Efficient ◽

Artificial Neural Network Ann ◽

First Time

Abstract In the present study, machine learning (ML) methods for the identification of abnormal glow curves (GCs) of CaSO4:Dy-based thermoluminescence dosimeters (TLDs) in individual monitoring are presented. The classifier algorithms random forest (RF), artificial neural network (ANN) and support vector machine (SVM) are employed to identify not only abnormal glow curves but also the type of abnormality. For the first time, a simple and computationally efficient algorithm based on RF is presented for GC classification. About 4000 GCs are used for the training and validation of the ML algorithms. The performance of all algorithms is compared using various parameters. Results show an accuracy of 99.05% for the classification of GCs by the RF algorithm, whereas accuracies of 96.7% and 96.1% are achieved using ANN and SVM, respectively. The RF-based classifier is recommended for GC classification as well as for assisting fault determination in the TLD reader system.

Extracted features based multi-class classification of orthodontic images

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i4.pp3558-3567 ◽

2020 ◽

Vol 10 (4) ◽

pp. 3558

Author(s):

Hicham Riri ◽

Mohammed Ed-Dhahraouy ◽

Abdelmajid Elmoutaouakkil ◽

Abderrahim Beni-Hssane ◽

Farid Bourzgui

Keyword(s):

Machine Learning ◽

Local Binary Pattern ◽

Principal Component ◽

Machine Learning Algorithms ◽

Support Vector ◽

Linear Discriminant ◽

Nearest Neighbours ◽

Multi Class Classification ◽

Pca Algorithm

The purpose of this study is to investigate computer vision and machine learning methods for the classification of orthodontic images, in order to provide orthodontists with a solution for multi-class classification of patients’ images to evaluate the evolution of their treatment. To that end, we propose three algorithms based on extracted features, such as facial features and skin colour in the YCbCr colour space, assigned to the nodes of a decision tree to classify orthodontic images: an algorithm for intra-oral images, an algorithm for mould images and an algorithm for extra-oral images. We then compared our method against an approach that uses the Local Binary Pattern (LBP) algorithm to extract textural features from images. For that approach, we applied principal component analysis (PCA) to reduce redundant parameters and classified the LBP features with six classifiers: Quadratic Support Vector Machine (SVM), Cubic SVM, Radial Basis Function SVM, Cosine K-Nearest Neighbours (KNN), Euclidean KNN, and Linear Discriminant Analysis (LDA). The presented algorithms have been evaluated on a dataset of images of 98 different patients, and experimental results show that our proposed method achieves high accuracy compared with the machine learning algorithms, among which the LDA classifier achieves an accuracy of 84.5%.
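The LBP step mentioned in the abstract can be illustrated with a minimal sketch of the basic 8-neighbour operator. This is not the authors' implementation: the neighbour ordering (clockwise from the top-left corner), the `>=` comparison, and the function names are assumptions chosen for the example, and the toy image is made up.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for a 3x3 grey-level patch (list of 3 rows).

    Neighbours are read clockwise from the top-left corner; this ordering is
    one common convention and is an assumption here.
    """
    center = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbours):
        # Set the bit when the neighbour is at least as bright as the centre.
        if value >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over the interior pixels of a 2-D image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


# Made-up 4x4 image with a uniform diagonal gradient.
image = [
    [10, 20, 30, 40],
    [20, 30, 40, 50],
    [30, 40, 50, 60],
    [40, 50, 60, 70],
]
hist = lbp_histogram(image)
```

The histogram serves as the textural feature vector; in a pipeline like the one described, a dimensionality reduction step such as PCA would be applied to these vectors before classification.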

Comparison of SVM, RF and SGD Methods for Determination of Programmer's Performance Classification Model in Social Media Activities

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1770 ◽

2020 ◽

Vol 4 (2) ◽

pp. 329-335

Author(s):

Rusydi Umar ◽

Imam Riadi ◽

Purwono

Keyword(s):

Social Media ◽

Gradient Descent ◽

Classification Model ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Svm Algorithm ◽

Vector Machines ◽

Performance Patterns ◽

A Company

The failure of most startups in Indonesia is caused by team performance that is not solid and competent. Programmers are an integral profession in a startup team. The development of social media can be used as a strategic tool for recruiting the best programmer candidates in a company. This strategic tool takes the form of an automatic classification system for social media posts from prospective programmers. The classification results are expected to predict the performance pattern of each candidate, with a predicate of good or bad performance. To obtain an effective strategic tool, the classification method with the best accuracy must be chosen, so a comparison of several methods is needed. This study compares the Support Vector Machines (SVM), Random Forest (RF) and Stochastic Gradient Descent (SGD) classification methods. With 10-fold cross-validation (k = 10), the accuracy reaches 81.3% for SVM, 74.4% for RF, and 80.1% for SGD, so the SVM method is chosen as the model for classifying programmer performance from social media activities.
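The accuracies above are reported with k = 10 cross-validation. As a minimal sketch of that evaluation scheme, assuming a plain shuffled k-fold split (the abstract does not say whether the folds were stratified), the code below averages held-out accuracy over ten folds. The helper names, the `fit` and `predict` callables, and the toy data are all hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding spreads the remainder so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Average held-out accuracy over k folds.

    `fit(X_train, y_train)` returns a model and `predict(model, X_test)`
    returns a list of labels; both are supplied by the caller.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        preds = predict(model, [X[j] for j in test_idx])
        correct = sum(p == y[j] for p, j in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy usage with a majority-class baseline (not one of the compared methods).
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_test):
    return [model] * len(X_test)


X = [[i] for i in range(50)]
y = ["good"] * 30 + ["bad"] * 20
score = cross_val_accuracy(fit_majority, predict_majority, X, y, k=10)
```

Each sample is used for testing exactly once, so the averaged score reflects performance on data the model did not see during fitting.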

Early Prediction of Seven-Day Mortality in Intensive Care Unit Using a Machine Learning Model: Results from the SPIN-UTI Project

Journal of Clinical Medicine ◽

10.3390/jcm10050992 ◽

2021 ◽

Vol 10 (5) ◽

pp. 992

Author(s):

Martina Barchitta ◽

Andrea Maugeri ◽

Giuliana Favara ◽

Paolo Marco Riela ◽

Giovanni Gallo ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Intensive Care Units ◽

Learning Algorithm ◽

Area Under The Curve ◽

Support Vector ◽

Icu Admission ◽

Risk Of Death ◽

Saps Ii ◽

Svm Algorithm

Patients in intensive care units (ICUs) are at higher risk of poor prognosis and mortality. Here, we aimed to evaluate the ability of the Simplified Acute Physiology Score (SAPS II) to predict the risk of 7-day mortality, and to test a machine learning algorithm that combines the SAPS II with additional patient characteristics at ICU admission. We used data from the “Italian Nosocomial Infections Surveillance in Intensive Care Units” network. A Support Vector Machine (SVM) algorithm was used to classify 3782 patients according to sex, patient’s origin, type of ICU admission, non-surgical treatment for acute coronary disease, surgical intervention, SAPS II, presence of invasive devices, trauma, impaired immunity, antibiotic therapy and onset of HAI. The accuracy of SAPS II in distinguishing patients who died from those who did not was 69.3%, with an Area Under the Curve (AUC) of 0.678. Using the SVM algorithm, instead, we achieved an accuracy of 83.5% and an AUC of 0.896. Notably, SAPS II was the variable that weighed most in the model, and its removal resulted in an AUC of 0.653 and an accuracy of 68.4%. Overall, these findings suggest that the present SVM model is a useful tool for the early prediction of patients at higher risk of death at ICU admission.
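The abstract reports both accuracy and AUC. As a rough illustration of how the two differ, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation), alongside accuracy at a fixed threshold. The labels and scores are made up for the example and are not study data; the function names and the 0.5 threshold are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(labels, scores, threshold):
    """Accuracy when scores at or above the threshold are predicted positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


# Made-up risk scores for eight patients (1 = died within 7 days).
labels = [0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.70, 0.30, 0.65, 0.90]

auc = roc_auc(labels, scores)
acc = accuracy_at(labels, scores, threshold=0.5)
print(f"AUC = {auc:.3f}, accuracy at 0.5 = {acc:.3f}")
```

AUC depends only on how the scores rank the patients, whereas accuracy depends on the chosen threshold, which is why the two figures are usually reported together.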

NLOS Multipath Classification of GNSS Signal Correlation Output Using Machine Learning

Sensors ◽

10.3390/s21072503 ◽

2021 ◽

Vol 21 (7) ◽

pp. 2503

Author(s):

Taro Suzuki ◽

Yoshiharu Amano

Keyword(s):

Machine Learning ◽

Satellite System ◽

Training Data ◽

Support Vector ◽

Positioning Errors ◽

Automated Method ◽

Global Navigation Satellite ◽

Better Than ◽

Signal Correlation

This paper proposes a method for detecting non-line-of-sight (NLOS) multipath, which causes large positioning errors in a global navigation satellite system (GNSS). We use GNSS signal correlation output, which is the most primitive GNSS signal processing output, to detect NLOS multipath based on machine learning. The shape of the multi-correlator outputs is distorted due to the NLOS multipath. The features of the shape of the multi-correlator outputs are used to discriminate the NLOS multipath. We implement two supervised learning methods, a support vector machine (SVM) and a neural network (NN), and compare their performance. In addition, we propose an automated method of collecting LOS and NLOS training data for machine learning. The evaluation of the proposed NLOS detection method in an urban environment confirmed that NN was better than SVM, and 97.7% of NLOS signals were correctly discriminated.

Delineating Smallholder Maize Farms from Sentinel-1 Coupled with Sentinel-2 Data Using Machine Learning

Sustainability ◽

10.3390/su13094728 ◽

2021 ◽

Vol 13 (9) ◽

pp. 4728

Author(s):

Zinhle Mashaba-Munghemezulu ◽

George Johannes Chirima ◽

Cilence Munghemezulu

Keyword(s):

Machine Learning ◽

Food Security ◽

Rural Communities ◽

Machine Learning Algorithms ◽

Support Vector ◽

Subsistence Agriculture ◽

Smallholder Farms ◽

Main Driver ◽

Sentinel 2

Rural communities rely on smallholder maize farms for subsistence agriculture, the main driver of local economic activity and food security. However, their planted area estimates are unknown in most developing countries. This study explores the use of Sentinel-1 and Sentinel-2 data to map smallholder maize farms. The random forest (RF) and support vector machine (SVM) algorithms and model stacking (ST) were applied. Results show that the classification of combined Sentinel-1 and Sentinel-2 data improved the RF, SVM and ST algorithms by 24.2%, 8.7%, and 9.1%, respectively, compared to the classification of Sentinel-1 data alone. Similarities in the estimated areas (7001.35 ± 1.2 ha for RF, 7926.03 ± 0.7 ha for SVM and 7099.59 ± 0.8 ha for ST) show that machine learning can estimate smallholder maize areas with high accuracy. The study concludes that single-date Sentinel-1 data were insufficient to map smallholder maize farms, whereas single-date Sentinel-1 data combined with Sentinel-2 data were sufficient. These results can be used to support the generation and validation of national crop statistics, thus contributing to food security.

Smart Design Nano-Hybrid Formulations by Machine Learning

Proceedings ◽

10.3390/iecp2020-08700 ◽

2020 ◽

Vol 78 (1) ◽

pp. 5

Author(s):

Raquel de Melo Barbosa ◽

Fabio Fonseca de Oliveira ◽

Gabriel Bezerra Motta Câmara ◽

Tulio Flavio Accioly de Lima e Moura ◽

Fernanda Nervo Raffin ◽

...

Keyword(s):

Machine Learning ◽

Experimental Data ◽

Water Solubility ◽

Inorganic Materials ◽

Fine Tuning ◽

Support Vector ◽

Drug Solubility ◽

Physical Behavior ◽

Best Fit

Nano-hybrid formulations combine organic and inorganic materials in self-assembled platforms for drug delivery. Laponite is a synthetic clay, biocompatible, and a guest of compounds. Poloxamines are amphiphilic four-armed compounds and have pH-sensitive and thermosensitive properties. The association of Laponite and Poloxamine can be used to improve attachment to drugs and to increase the solubility of β-Lapachone (β-Lap). β-Lap has antiviral, antiparasitic, antitumor, and anti-inflammatory properties. However, the low water solubility of β-Lap limits its clinical and medical applications. All samples were prepared by mixing Tetronic 1304 and LAP in a range of 1–20% (w/w) and 0–3% (w/w), respectively. The β-Lap solubility was analyzed by UV-vis spectrophotometry, and physical behavior was evaluated across a range of temperatures. The data were analyzed with response surface methodology (RSM) and two kinds of machine learning (ML): multilayer perceptron (MLP) and support vector machine (SVM). The ML techniques, generated from a training process based on experimental data, obtained the best correlation coefficient adjustment for drug solubility and adequate physical classifications of the systems. The SVM method presented the best fit results of β-Lap solubilization. In silico tools promoted fine-tuning, and near-experimental data show β-Lap solubility and classification of physical behavior to be an excellent strategy for use in developing new nano-hybrid platforms.

Multivariate Analysis for the Classification of Chocolate According to its Percentage of Cocoa by Using Terahertz Time-Domain Spectroscopy (THz-TDS)

Proceedings ◽

10.3390/foods_2020-08029 ◽

2020 ◽

Vol 70 (1) ◽

pp. 109

Author(s):

Jimy Oblitas ◽

Jorge Ruiz

Keyword(s):

Machine Learning ◽

Time Domain ◽

Electromagnetic Pulse ◽

Machine Learning Algorithms ◽

Classification Models ◽

Terahertz Time Domain Spectroscopy ◽

Time Domain Spectroscopy ◽

Svm Algorithm ◽

Classification Of Images

Terahertz time-domain spectroscopy is a useful technique for determining some physical characteristics of materials, and is based on selective frequency absorption of a broad-spectrum electromagnetic pulse. In order to investigate the potential of this technology to classify cocoa percentages in chocolates, the terahertz spectra (0.5–10 THz) of five chocolate samples (50%, 60%, 70%, 80% and 90% of cocoa) were examined. The acquired data matrices were analyzed with the MATLAB 2019b application, from which the dielectric function was obtained along with the absorbance curves, and were classified by using 24 mathematical classification models, achieving differentiations of around 93% obtained by the Gaussian SVM algorithm model with a kernel scale of 0.35 and a one-against-one multiclass method. It was concluded that the combined processing and classification of images obtained from the terahertz time-domain spectroscopy and the use of machine learning algorithms can be used to successfully classify chocolates with different percentages of cocoa.
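The best model above combines a Gaussian kernel (kernel scale 0.35) with a one-against-one multiclass scheme. The sketch below illustrates both ideas in plain Python rather than MATLAB. The way the kernel scale enters the formula is an assumption, and the pairwise classifier is a toy stand-in for trained binary SVMs, so this shows the voting scheme only, not the authors' model.

```python
import math
from collections import Counter
from itertools import combinations

CLASSES = [50, 60, 70, 80, 90]  # cocoa percentages from the abstract
KERNEL_SCALE = 0.35             # value quoted in the abstract


def gaussian_kernel(x, y, scale=KERNEL_SCALE):
    """exp(-||x - y||^2 / scale^2); this scaling convention is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / scale ** 2)


# One-against-one uses one binary classifier per unordered pair of classes.
PAIRS = list(combinations(CLASSES, 2))


def one_vs_one_predict(pairwise_winner, sample):
    """Majority vote over all class pairs.

    `pairwise_winner(a, b, sample)` is a hypothetical binary classifier that
    returns either a or b.
    """
    votes = Counter(pairwise_winner(a, b, sample) for a, b in PAIRS)
    return votes.most_common(1)[0][0]


# Toy stand-in for the binary SVMs: choose the class nearer to a scalar sample.
def nearest_of_pair(a, b, sample):
    return a if abs(sample - a) <= abs(sample - b) else b


prediction = one_vs_one_predict(nearest_of_pair, 72)
print(len(PAIRS), "pairwise classifiers; predicted class:", prediction)
```

With five classes the scheme needs ten binary classifiers, and the predicted class is the one that wins the most pairwise contests.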
