Automatic Search of Cataclysmic Variables Based on LightGBM in LAMOST-DR7

Universe ◽  
2021 ◽  
Vol 7 (11) ◽  
pp. 438
Author(s):  
Zhiyuan Hu ◽  
Jianyu Chen ◽  
Bin Jiang ◽  
Wenyu Wang

The search for special and rare celestial objects has always played an important role in astronomy. Cataclysmic Variables (CVs) are rare binary systems with accretion disks. Most CVs are in the quiescent period, and their spectra show emission lines of the Balmer series, HeI, and HeII; the few CVs in the outburst period show absorption lines of the Balmer series. Owing to their scarcity, expanding the spectral data of CVs is of positive significance for studying the formation of accretion disks and the evolution of binary star system models. Research on astronomical spectra has now entered the era of Big Data. The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) has produced tens of millions of spectra; the latest release, LAMOST-DR7, includes 10.6 million low-resolution spectra covering 4926 sky regions, providing ideal data support for searching for CV candidates. To process and analyze this massive amount of spectral data, this study employed the Light Gradient Boosting Machine (LightGBM) algorithm, an ensemble tree model, to automatically conduct the search in LAMOST-DR7. Finally, 225 CV candidates were found, and four new CV candidates were verified with SIMBAD and published catalogs. This study also built Gradient Boosting Decision Tree (GBDT), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) models and used Accuracy, Precision, Recall, the F1-score, and the ROC curve to compare the four models comprehensively. Experimental results showed that LightGBM is the most efficient. The search for CVs based on LightGBM not only enriches the existing CV spectral library, but also provides a reference for the data mining of other rare celestial objects in massive spectral data.

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jong Ho Kim ◽  
Haewon Kim ◽  
Ji Su Jang ◽  
Sung Mi Hwang ◽  
So Young Lim ◽  
...  

Abstract Background Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study was to develop and validate a model that predicts difficult laryngoscopy by machine learning, using neck circumference and thyromental height as predictors that can be applied even to patients with limited airway evaluation. Methods Variables for prediction of difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 or 4 in the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with an equal distribution of difficult laryngoscopy. The training set was used to train five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine), and the prediction models were validated on the test set. Results The model using random forest performed best (area under the receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86]; area under the precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions Machine learning can predict difficult laryngoscopy through a combination of several predictors, including neck circumference and thyromental height. The performance of the model can be improved with more data, new variables, and combinations of models.
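The protocol above (stratified 80/20 split on a rare outcome, then AUROC/AUPRC on the held-out set) can be sketched in a few lines; the data here are synthetic, with seven features and an imbalance roughly mimicking the rarity of difficult laryngoscopy:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# Imbalanced toy data standing in for the 1677-patient airway dataset.
X, y = make_classification(n_samples=1677, n_features=7,
                           weights=[0.9, 0.1], random_state=0)

# stratify=y keeps the rate of the rare outcome equal in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
prob = rf.predict_proba(X_te)[:, 1]
print(f"AUROC = {roc_auc_score(y_te, prob):.2f}")
print(f"AUPRC = {average_precision_score(y_te, prob):.2f}")
```

On imbalanced outcomes, reporting AUPRC alongside AUROC (as the study does) is the more informative choice, since AUROC can look optimistic when negatives dominate.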


Risks ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 202
Author(s):  
Ge Gao ◽  
Hongxin Wang ◽  
Pengbin Gao

In China, SMEs are facing financing difficulties, and commercial banks and financial institutions are the main financing channels for SMEs. Thus, a reasonable and efficient credit risk assessment system is important for credit markets. Based on traditional statistical methods and AI technology, a soft voting fusion model, which incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), is constructed to improve the predictive accuracy of SMEs’ credit risk. To verify the feasibility and effectiveness of the proposed model, we use data from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020, including financial information and default records. The results show that the accuracy of the soft voting fusion model is higher than that of a single machine learning (ML) algorithm, which provides a theoretical basis for the government to control credit risk in the future and offers important references for banks to make credit decisions.
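Soft voting averages the predicted class probabilities of the base learners rather than their hard labels. A minimal sketch with scikit-learn's `VotingClassifier`, using `GradientBoostingClassifier` as a stand-in for the XGBoost/LightGBM members (the real libraries' scikit-learn wrappers would slot into the same `estimators` list):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy default/non-default data standing in for the SME financial records.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# voting="soft" averages predict_proba outputs; SVC needs probability=True
# so it can contribute probability estimates to the average.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True, random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
print(f"soft-voting accuracy = {ensemble.score(X_te, y_te):.3f}")
```

Averaging probabilities lets a confident minority of models outvote an uncertain majority, which is the usual reason a soft fusion beats its individual members.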


2021 ◽  
Author(s):  
Seong Hwan Kim ◽  
Eun-Tae Jeon ◽  
Sungwook Yu ◽  
Kyungmi O ◽  
Chi Kyung Kim ◽  
...  

Abstract We aimed to develop a novel prediction model for early neurological deterioration (END) based on an interpretable machine learning (ML) algorithm for atrial fibrillation (AF)-related stroke and to evaluate the prediction accuracy and feature importance of ML models. Data from multi-center prospective stroke registries in South Korea were collected. After stepwise data preprocessing, we utilized logistic regression, support vector machine, extreme gradient boosting, light gradient boosting machine (LightGBM), and multilayer perceptron models. We used the Shapley additive explanations (SHAP) method to evaluate feature importance. Of the 3,623 stroke patients, the 2,363 who had arrived at the hospital within 24 hours of symptom onset and had available information regarding END were included. Of these, 318 (13.5%) had END. The LightGBM model showed the highest area under the receiver operating characteristic curve (0.778; 95% CI, 0.726–0.830). The feature importance analysis revealed that fasting glucose level and the National Institute of Health Stroke Scale score were the most influential factors. Among ML algorithms, the LightGBM model was particularly useful for predicting END, as it revealed new and diverse predictors. Additionally, the SHAP method can be adjusted to individualize the features’ effects on the predictive power of the model.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mohammad-Reza Mohammadi ◽  
Fahime Hadavimoghaddam ◽  
Maryam Pourmahdi ◽  
Saeid Atashrouz ◽  
Muhammad Tajammal Munir ◽  
...  

Abstract Due to industrial development, the design and optimal operation of processes in chemical and petroleum processing plants require accurate estimation of the hydrogen solubility in various hydrocarbons. Equations of state (EOSs) are limited in accurately predicting hydrogen solubility, especially at high-pressure and/or high-temperature conditions, which may lead to energy waste and potential safety hazards in plants. In this paper, five robust machine learning models, including extreme gradient boosting (XGBoost), adaptive boosting support vector regression (AdaBoost-SVR), gradient boosting with categorical features support (CatBoost), light gradient boosting machine (LightGBM), and a multi-layer perceptron (MLP) optimized by the Levenberg–Marquardt (LM) algorithm, were implemented for estimating the hydrogen solubility in hydrocarbons. To this end, a databank including 919 experimental data points of hydrogen solubility in 26 various hydrocarbons was gathered from 48 different systems over a broad range of operating temperatures (213–623 K) and pressures (0.1–25.5 MPa). The hydrocarbons come from six families: alkanes, alkenes, cycloalkanes, aromatics, polycyclic aromatics, and terpenes. The carbon number of the hydrocarbons ranges from 4 to 46, corresponding to a molecular weight range of 58.12–647.2 g/mol. Molecular weight, critical pressure, and critical temperature of the solvents, along with operating pressure and temperature, were selected as input parameters to the models. The XGBoost model best fits all the experimental solubility data, with a root mean square error (RMSE) of 0.0007 and an average absolute percent relative error (AAPRE) of 1.81%. Also, the proposed models for estimating the solubility of hydrogen in hydrocarbons were compared with five EOSs, including Soave–Redlich–Kwong (SRK), Peng–Robinson (PR), Redlich–Kwong (RK), Zudkevitch–Joffe (ZJ), and perturbed-chain statistical associating fluid theory (PC-SAFT).
The XGBoost model introduced in this study is a promising model that can be applied as an efficient estimator for hydrogen solubility in various hydrocarbons and is capable of being utilized in the chemical and petroleum industries.


1982 ◽  
Vol 69 ◽  
pp. 219-230 ◽  
Author(s):  
G. Hensler

Abstract A numerical method for 3D magnetohydrodynamical investigations of accretion disks in close binary systems is presented, which allows for good spatial resolution of structures (hot spot, accretion column). The gas is treated as individual gas cells (pseudo-particles) whose motion is calculated within a grid consisting of one spherical inner part for 3D MHD and two plane outer parts. Viscous interactions of the gas cells are taken into account by a special treatment connected with the grid geometry. We present one result of 2D hydrodynamical calculations for a binary with the following parameters, representative of Cataclysmic Variables: M1 = 1 Mʘ, r1 = 10⁻² Rʘ, M2 = 0.5 Mʘ, p = 0.2 d, Ṁ = 10⁻⁹ Mʘ y⁻¹. Column density and radiative flux distributions over the disk are shown and briefly discussed in comparison with the theoretical understanding of Dwarf Novae drawn from observations.


Author(s):  
Saifur Rahman ◽  
Muhammad Irfan ◽  
Mohsin Raza ◽  
Khawaja Moyeezullah Ghori ◽  
Shumayla Yaqoob ◽  
...  

Physical activity is essential for physical and mental health, and its absence is highly associated with severe health conditions and disorders. Therefore, tracking activities of daily living can help promote quality of life. Wearable sensors can provide a reliable and economical means of tracking such activities, and such sensors are readily available in smartphones and watches. This study is the first of its kind to develop a wearable sensor-based physical activity classification system using a special class of supervised machine learning approaches called boosting algorithms. The study presents a performance analysis of several boosting algorithms (extreme gradient boosting—XGB, light gradient boosting machine—LGBM, gradient boosting—GB, categorical boosting—CB and AdaBoost) in a fair and unbiased way, using a uniform dataset, feature set, feature selection method, performance metric and cross-validation technique. The study utilizes a smartphone-based dataset of thirty individuals. The results showed that the proposed method could accurately classify the activities of daily living with very high performance (above 90%). These findings suggest the strength of the proposed system in classifying activities of daily living using only the smartphone sensors' data, and it can assist in reducing physical inactivity to promote a healthier lifestyle and wellbeing.


Electronics ◽  
2019 ◽  
Vol 8 (7) ◽  
pp. 743 ◽  
Author(s):  
Alice Stazio ◽  
Juan G. Victores ◽  
David Estevez ◽  
Carlos Balaguer

The examination of Personal Protective Equipment (PPE) to assure the complete integrity of health personnel in contact with infected patients is one of the most necessary tasks when treating patients affected by infectious diseases, such as Ebola. This work focuses on the study of machine vision techniques for the detection of possible defects on the PPE that could arise after contact with the aforementioned pathological patients. A preliminary study on the use of image classification algorithms to identify blood stains on PPE subsequent to the treatment of the infected patient is presented. To produce training data for these algorithms, a synthetic dataset was generated from a simulated model of a PPE suit with blood stains. Furthermore, the study proceeded with the utilization of images of the PPE with a physical emulation of blood stains, taken by a real prototype. The dataset exhibits a great imbalance between positive and negative samples; therefore, all the selected classification algorithms must be able to manage this kind of data. Classifiers range from Logistic Regression and Support Vector Machines to bagging and boosting techniques such as Random Forest, Adaptive Boosting, Gradient Boosting and eXtreme Gradient Boosting. All these algorithms were evaluated on accuracy, precision, recall and F1 score; additionally, execution times were considered. All the classifiers produced promising results, and, in particular, Logistic Regression proved to be the most suitable classification algorithm in terms of F1 score and execution time on both datasets.
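Evaluating on both F1 score and execution time, as done here, is a two-axis comparison worth sketching. Synthetic imbalanced data stands in for the PPE image features, and three of the paper's classifier families are shown:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Heavily imbalanced toy data, mimicking the few-positive blood-stain setting.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

results = {}
for name, clf in [("LogisticRegression", LogisticRegression(max_iter=1000)),
                  ("RandomForest", RandomForestClassifier(random_state=0)),
                  ("GradientBoosting", GradientBoostingClassifier(random_state=0))]:
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - t0
    f1 = f1_score(y_te, clf.predict(X_te))
    results[name] = (f1, elapsed)
    print(f"{name}: F1 = {f1:.3f}, fit time = {elapsed:.3f}s")
```

With few positives, F1 is the right headline metric: a classifier that always predicts "no stain" scores 95% accuracy here but F1 = 0.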


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Seyed Ali Madani ◽  
Mohammad-Reza Mohammadi ◽  
Saeid Atashrouz ◽  
Ali Abedi ◽  
Abdolhossein Hemmati-Sarapardeh ◽  
...  

Abstract Accurate prediction of the solubility of gases in hydrocarbons is a crucial factor in designing enhanced oil recovery (EOR) operations by gas injection, as well as separation and chemical reaction processes in a petroleum refinery. In this work, nitrogen (N2) solubility in normal alkanes, the major constituents of crude oil, was modeled using five representative machine learning (ML) models, namely gradient boosting with categorical features support (CatBoost), random forest, light gradient boosting machine (LightGBM), k-nearest neighbors (k-NN), and extreme gradient boosting (XGBoost). A large solubility databank containing 1982 data points was utilized to establish the models for predicting N2 solubility in normal alkanes as a function of pressure, temperature, and molecular weight of the normal alkanes over broad ranges of operating pressure (0.0212–69.12 MPa) and temperature (91–703 K). The molecular weight of the normal alkanes ranged from 16 to 507 g/mol. Also, five equations of state (EOSs), including Redlich–Kwong (RK), Soave–Redlich–Kwong (SRK), Zudkevitch–Joffe (ZJ), Peng–Robinson (PR), and perturbed-chain statistical associating fluid theory (PC-SAFT), were used comparatively with the ML models to estimate N2 solubility in normal alkanes. Results revealed that the CatBoost model is the most precise model in this work, with a root mean square error of 0.0147 and a coefficient of determination of 0.9943. ZJ EOS also provided the best estimates for N2 solubility in normal alkanes among the EOSs. Lastly, the results of relevancy factor analysis indicated that pressure has the greatest influence on N2 solubility in normal alkanes, and that N2 solubility increases with the molecular weight of the normal alkanes.
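The relevancy factor mentioned at the end is commonly computed as the Pearson correlation between each input and the target: its sign gives the direction of influence and its magnitude the strength. A sketch on invented data whose coefficients (chosen purely for illustration) make pressure dominant, mirroring the paper's finding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented N2-solubility surface over the paper's stated operating ranges:
# rising mainly with pressure, weakly with Mw, falling slightly with T.
n = 1982
P = rng.uniform(0.02, 69.1, n)    # MPa
T = rng.uniform(91, 703, n)       # K
Mw = rng.uniform(16, 507, n)      # g/mol
sol = 0.01 * P + 0.0005 * Mw - 0.0002 * T + rng.normal(0, 0.05, n)

# Relevancy factor r = Pearson correlation of each input with the target.
for name, x in [("pressure", P), ("temperature", T), ("Mw", Mw)]:
    r = np.corrcoef(x, sol)[0, 1]
    print(f"r({name}, solubility) = {r:+.3f}")
```

Because each r is computed marginally, strongly correlated inputs can share credit; the paper's conclusion that pressure dominates corresponds to pressure having the largest |r|.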


Agronomy ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 1550
Author(s):  
Mohamed Zakaria Gouda ◽  
El Mehdi Nagihi ◽  
Lotfi Khiari ◽  
Jacques Gallichand ◽  
Mahmoud Ismail

Soil texture is a key soil property influencing many agronomic practices including fertilization and liming. Therefore, an accurate estimation of soil texture is essential for adopting sustainable soil management practices. In this study, we used different machine learning algorithms trained on vis–NIR spectra from existing soil spectral libraries (ICRAF and LUCAS) to predict soil textural fractions (sand–silt–clay %). In addition, we predicted the soil textural groups (G1: Fine, G2: Medium, and G3: Coarse) using routine chemical characteristics as auxiliary variables. With the ICRAF dataset, multilayer perceptron resulted in good predictions for sand and clay (R2 = 0.78 and 0.85, respectively) and categorical boosting outperformed the other algorithms (random forest, extreme gradient boosting, linear regression) for silt prediction (R2 = 0.81). For the LUCAS dataset, categorical boosting consistently showed a high performance for sand, silt, and clay predictions (R2 = 0.79, 0.76, and 0.85, respectively). Furthermore, the soil texture groups (G1, G2, and G3) were classified using the light gradient boosting machine algorithm with a high accuracy (83% and 84% for ICRAF and LUCAS, respectively). These results, using spectral data, are very promising for rapid diagnosis of soil texture and group in order to adjust agricultural practices.


Author(s):  
R. Madhuri ◽  
S. Sistla ◽  
K. Srinivasa Raju

Abstract Assessing floods and their likely impact in climate change scenarios will enable the facilitation of sustainable management strategies. In this study, five machine learning (ML) algorithms, namely (i) Logistic Regression, (ii) Support Vector Machine, (iii) K-nearest neighbor, (iv) Adaptive Boosting (AdaBoost) and (v) Extreme Gradient Boosting (XGBoost), were tested for the Greater Hyderabad Municipal Corporation (GHMC), India, to evaluate their ability to classify locations as flooded or non-flooded under climate change scenarios. A geo-spatial database with eight flood influencing factors, namely rainfall, elevation, slope, distance from nearest stream, evapotranspiration, land surface temperature, normalised difference vegetation index and curve number, was developed for 2000, 2006 and 2016. XGBoost performed the best, with the highest mean area under curve score of 0.83. Hence, XGBoost was adopted to simulate the future flood locations corresponding to probable highest rainfall events under four Representative Concentration Pathways (RCPs), namely 2.6, 4.5, 6.0 and 8.5, along with other flood influencing factors for 2040, 2056, 2050 and 2064, respectively. The resulting ranges of flood risk probabilities are predicted as 39–77%, 16–39%, 42–63% and 39–77% for the respective years.
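The reported "ranges of flood risk probabilities" come from scoring every location with the fitted classifier's predicted probability. A sketch with synthetic stand-ins for the eight influencing factors, using scikit-learn's GradientBoostingClassifier in place of XGBoost (whose `XGBClassifier` exposes the same `predict_proba` interface):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Eight synthetic flood-influencing factors (rainfall, elevation, slope, ...),
# with a binary flooded/non-flooded label per location.
X, y = make_classification(n_samples=1000, n_features=8,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Per-location flood-risk probability; the min-max over a scenario's
# locations gives a risk range like those reported in the abstract.
risk = clf.predict_proba(X_te)[:, 1]
print(f"flood risk range: {100 * risk.min():.0f}%-{100 * risk.max():.0f}%")
```

For the future scenarios, the same fitted model would be scored on feature sets where rainfall is replaced by the RCP-specific projections.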

