Predicting Asteroid Types: Importance of Individual and Combined Features

Author(s):  
Hanna Klimczak, Wojciech Kotłowski, Dagmara Oszkiewicz, Francesca DeMeo, Agnieszka Kryszczyńska, et al.

Asteroid taxonomies provide a link to the surface composition and mineralogy of those objects, although that connection is not fully unique. Currently, one of the most commonly used asteroid taxonomies is that of Bus–DeMeo. In that scheme, a taxonomic class is assigned from spectra covering the 0.45–2.45 μm range. Such observations are available for only a few hundred asteroids (out of over one million). On the other hand, a growing number of space- and ground-based surveys deliver multi-filter photometry, which is often used to predict asteroid types. Those surveys are typically dedicated to studying other astronomical objects and are thus not optimized for asteroid taxonomic classification. The goal of this study was to quantify the importance and performance of different asteroid spectral features, parameterizations, and methods in predicting asteroid types. Furthermore, we aimed to identify the key spectral features that can be used to optimize future surveys toward asteroid characterization. Such broad surveys are typically restricted to a few bands; therefore, selecting the bands that best link them to asteroid taxonomy is crucial for maximizing the science output for solar system studies. First, we verified that, with the increased number of asteroid spectra, the Bus–DeMeo procedure for creating the taxonomy still produces the same overall scheme. Second, we confirmed that machine learning methods such as naive Bayes, support vector machine (SVM), gradient boosting, and multilayer networks can reproduce that taxonomic classification at a high rate of over 81% balanced accuracy for types and 93% for complexes. We found that a multilayer perceptron with three layers of 32 neurons, a stochastic gradient descent solver, a batch size of 32, and adaptive learning performed best in the classification task.
Furthermore, the top five features (spectral slope and reflectance at 1.05, 0.9, 0.65, and 1.1 μm) are enough to obtain a balanced accuracy of 93% for the prediction of complexes and six features (spectral slope and reflectance at 1.4, 1.05, 0.9, 0.95, and 0.65 μm) to obtain 81% balanced accuracy for taxonomic types. Thus, to optimize future surveys toward asteroid classification, we recommend using filters that cover those features.
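The best-performing configuration named above maps directly onto scikit-learn's MLPClassifier. The sketch below is a hedged illustration: only the hyperparameters quoted in the abstract (three hidden layers of 32 neurons, SGD solver, batch size 32, adaptive learning rate) come from the study; the six "spectral" features and taxonomic labels are synthetic stand-ins, not the Bus–DeMeo data.

```python
# Sketch of the abstract's best classifier: an MLP with three hidden
# layers of 32 neurons, SGD solver, batch size 32, adaptive learning.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
n = 400
# Six illustrative features standing in for spectral slope plus
# reflectance at 1.4, 1.05, 0.9, 0.95 and 0.65 um.
X = rng.normal(size=(n, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy "complex" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 32, 32), solver="sgd",
                    batch_size=32, learning_rate="adaptive",
                    max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
bal_acc = balanced_accuracy_score(y_te, clf.predict(X_te))
```

Balanced accuracy, the metric reported in the abstract, averages per-class recall and so is robust to the uneven class sizes typical of taxonomic datasets.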

2021
Author(s):
Ankit Ghosh, Alok Kole

Smart grid is an essential concept in the transformation of the electricity sector into an intelligent, digitalized energy network that can deliver optimal energy from the source to consumers. Smart grids, being self-sufficient systems, are constructed by integrating information, telecommunication, and advanced power technologies with existing electricity systems. Artificial Intelligence (AI) is an important technology driver in smart grids, and the application of AI techniques is becoming more apparent because traditional modelling, optimization, and control techniques have their own limitations. Machine Learning (ML), a subset of AI, enables intelligent decision-making and response to sudden changes in customer energy demand, unexpected disruptions of the power supply, sudden variations in renewable energy output, or any other catastrophic events in a smart grid. This paper presents a comparison among some state-of-the-art ML algorithms for predicting smart grid stability. The selected dataset contains results from simulations of smart grid stability. ML algorithms such as Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbour (KNN), Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF), Stochastic Gradient Descent (SGD) classifier, XGBoost, and Gradient Boosting classifiers have been implemented to forecast smart grid stability. A comparative analysis among the different ML models has been performed based on the evaluation metrics accuracy, precision, recall, F1-score, AUC-ROC, and AUC-PR. The test results are quite promising, with the XGBoost classifier outperforming all the other models with an accuracy of 97.5%, recall of 98.4%, precision of 97.6%, F1-score of 97.9%, AUC-ROC of 99.8%, and AUC-PR of 99.9%.
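The comparative setup described above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the authors' code: the grid-stability dataset is replaced by a synthetic binary problem, only four of the nine listed models are shown, and scikit-learn's GradientBoostingClassifier stands in for XGBoost, which lives in a separate package.

```python
# Hedged sketch of a multi-model comparison scored by accuracy and F1.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=600, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SGD": SGDClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    scores[name] = (accuracy_score(y_te, pred), f1_score(y_te, pred))
```

Holding the train/test split fixed across all models, as here, is what makes the per-metric comparison in the abstract meaningful.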


Author(s):  
Pawar A B, Jawale M A, Kyatanavar D N

The use of Natural Language Processing techniques for the detection of fake news is analyzed in this research paper. Fake news consists of misleading content spread by unreliable sources and can cause damage to individuals and society. For this analysis, a dataset obtained from the web resource OpenSources.co, largely drawn from Signal Media, is used. TF-IDF features of bi-grams were used in combination with PCFG (Probabilistic Context-Free Grammar) features on a set of 11,000 documents extracted as news articles. This set was tested on several classification algorithms, namely SVM (Support Vector Machines), Stochastic Gradient Descent, Bounded Decision Trees, and Gradient Boosting with Random Forests. The experimental analysis found that the combination of Stochastic Gradient Descent with TF-IDF of bi-grams gives an accuracy of 77.2% in detecting fake content, while the PCFG features show slight recall deficiencies.
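The winning combination reported above, TF-IDF over bi-grams feeding a Stochastic Gradient Descent classifier, can be sketched as a scikit-learn pipeline. The handful of toy "articles" below are invented placeholders, not the OpenSources.co data, and the PCFG features are omitted.

```python
# Hedged sketch: TF-IDF of bi-grams + SGD classifier for fake-news detection.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

docs = [
    "officials confirm the report in a press briefing",
    "study published in a peer reviewed journal",
    "shocking miracle cure doctors do not want you to know",
    "you will not believe this one weird trick",
]
labels = [0, 0, 1, 1]  # 0 = legitimate, 1 = fake

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(2, 2)),  # bi-grams only, per the abstract
    SGDClassifier(random_state=0),
)
pipeline.fit(docs, labels)
pred = pipeline.predict(["doctors do not want you to know this trick"])
```

Restricting `ngram_range` to (2, 2) captures word-pair cues (e.g. "miracle cure") that single tokens miss, at the cost of a much sparser feature space.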



Author(s):  
R. Ilehag, J. Leitloff, M. Weinmann, A. Schenk

Abstract. The classification of urban materials using remote sensing data, in particular hyperspectral data, is common practice. Spectral libraries can be utilized to train a classifier, since they provide spectral features of selected urban materials. However, urban materials can have similar spectral characteristics due to high inter-class correlation, which can lead to misclassification. Spectral libraries rarely provide imagery of their samples, which precludes classifying urban materials with additional textural information. Thus, this paper performs material classification and compares the benefits of close-range acquired spectral and textural features. The spectral features consist of either the original spectra, a PCA-based encoding, or a compressed representation of the original spectra retrieved using a deep autoencoder. The textural features are generated using a deep denoising convolutional autoencoder. Both feature sets are gathered from the recently published spectral library KLUM. Three classifiers are used: the two well-established Random Forest and Support Vector Machine classifiers, in addition to a Histogram-based Gradient Boosting Classification Tree. The achieved overall accuracy was within the range of 70–80%, with a standard deviation of 2–10% across all classification approaches. This indicates that the number of samples is still insufficient for some of the material classes in this classification task. Nonetheless, the classification results indicate that the spectral features are more important than the textural features for assigning material labels.


2020
Vol 12 (11)
pp. 187
Author(s):
Amgad Muneer, Suliman Mohamed Fati

The advent of social media, particularly Twitter, raises many issues due to a misunderstanding regarding the concept of freedom of speech. One of these issues is cyberbullying, which is a critical global issue that affects both individual victims and societies. Many attempts have been introduced in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these attempts rely on the victims’ interactions, they are not practical. Therefore, detection of cyberbullying without the involvement of the victims is necessary. In this study, we attempted to explore this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Moreover, seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each of these algorithms was evaluated using accuracy, precision, recall, and F1 score as the performance metrics to determine the classifiers’ recognition rates applied to the global dataset. The experimental results show the superiority of LR, which achieved a median accuracy of around 90.57%. Among the classifiers, logistic regression achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00).
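The per-metric comparison above (best F1 for LR, best precision for SGD, best recall for SVM) rests on scoring each classifier with accuracy, precision, recall, and F1. A minimal sketch of that evaluation protocol, on synthetic features rather than the 37,373-tweet dataset, looks like this:

```python
# Hedged sketch of scoring one classifier with the four metrics named above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

X, y = make_classification(n_samples=500, n_features=20, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)
metrics = {
    "accuracy": accuracy_score(y_te, pred),
    "precision": precision_score(y_te, pred),
    "recall": recall_score(y_te, pred),
    "f1": f1_score(y_te, pred),
}
```

Reporting all four metrics together matters here: a cyberbullying detector with perfect recall but poor precision would flag many harmless tweets, which accuracy alone would hide.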


2020
Vol 8 (6)
pp. 3226-3232

Predicting the probability of hospital readmission is one of the most vital issues and an important research area in the healthcare sector. Curing any disease that might arise requires essential resources such as medical staff, expertise, beds, and rooms, which secure excellent medical service. For example, heart failure (HF) or diabetes is a syndrome that can reduce patients' quality of life and has a serious influence on healthcare systems. These diseases can result in a high rate of readmission and hence high costs. In this context, machine learning algorithms are utilized to curb readmission levels and improve patients' quality of life. Unfortunately, comparatively few studies in the literature have endeavored to address this issue, while a large proportion of research has focused on predicting the probability of detecting diseases. Given this evident shortage, this paper surveys studies that predict the probability of hospital readmission using machine learning techniques such as Logistic Regression (LR), Support Vector Machine (SVM), Artificial Neural Networks (ANNs), Linear Discriminant Analysis (LDA), Bayes algorithms, Random Forest (RF), Decision Trees (DTs), AdaBoost, and Gradient Boosting (GB). Specifically, we explore the different techniques used in the medical area under the machine learning research field. In addition, we define four features used as criteria for an effective comparison among the employed techniques: goal, data size, method, and performance. Furthermore, recommendations are drawn from the comparison regarding the selection of the best techniques in the medical field.
Based on the outcomes of this research, bagging with decision trees was found to be the best technique for predicting diabetes, whereas SVM performed best for predicting breast cancer and hospital readmission.
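The survey's takeaway names bagged decision trees as the best performer for the diabetes task. A hedged sketch of that combination, on synthetic patient features rather than any of the surveyed datasets, is:

```python
# Hedged sketch: bagging over decision trees, scored by cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=15, random_state=3)
# Bagging trains each of the 50 trees on a bootstrap resample of the data
# and averages their votes, reducing the variance of a single deep tree.
bagged_trees = BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=50, random_state=3)
cv_acc = cross_val_score(bagged_trees, X, y, cv=5).mean()
```

Cross-validated accuracy, rather than a single split, is the fairer basis for the kind of technique-versus-technique comparison the survey performs.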


2022
Vol 12 (1)
Author(s):
Min Kim, Younghyun Kang, Seng Chan You, Hyung-Deuk Park, Sang-Soo Lee, et al.

Abstract. This study assesses the utility of machine learning (ML) algorithms in predicting clinically relevant atrial high-rate episodes (AHREs), which can be recorded by a pacemaker. We aimed to develop ML-based models to predict clinically relevant AHREs based on the clinical parameters of patients with implanted pacemakers, in comparison to logistic regression (LR). We included 721 patients without known atrial fibrillation or atrial flutter from a prospective multicenter (11 tertiary hospitals) registry comprising all geographical regions of Korea from September 2017 to July 2020. Predictive models of clinically relevant AHREs were developed using the random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) algorithms. Model training was conducted with data from seven hospitals, and model performance was evaluated using data from four hospitals. During a median follow-up of 18 months, clinically relevant AHREs were noted in 104 patients (14.4%). The three ML-based models improved the discrimination of AHREs (area under the receiver operating characteristic curve: RF 0.742, SVM 0.675, and XGB 0.745 vs. LR 0.669). The XGB model also had a greater resolution component of the Brier score (RF 0.008, SVM 0.008, and XGB 0.021 vs. LR 0.013) than the other models. The use of ML-based models for patient classification was associated with improved prediction of clinically relevant AHREs after pacemaker implantation.
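The evaluation above rests on two quantities: ROC AUC for discrimination and the Brier score for probabilistic calibration. A hedged sketch of computing both, comparing logistic regression with a gradient-boosted model (scikit-learn's GradientBoostingClassifier standing in for XGBoost), on synthetic features with roughly the study's 14% event rate:

```python
# Hedged sketch: ROC AUC + Brier score for LR vs. a boosted model.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, brier_score_loss

X, y = make_classification(n_samples=600, n_features=10, weights=[0.85],
                           random_state=4)  # ~15% positives, like the registry
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=4)

results = {}
for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("XGB-like", GradientBoostingClassifier(random_state=4))]:
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    results[name] = (roc_auc_score(y_te, prob),    # discrimination
                     brier_score_loss(y_te, prob)) # overall calibration error
```

Note that `brier_score_loss` returns the overall Brier score (lower is better); the study reports its resolution component, which is a separate decomposition term where higher is better.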


2019
Vol 40 (Supplement_1)
Author(s):
T. Kuznetsova, N. Cauwenberghs, F. Haddad, A. Alonso-Betanzos, C. Vens

Abstract. Background: Current heart failure guidelines emphasize the importance of timely detection of subclinical left ventricular (LV) remodelling and dysfunction for more precise risk stratification of asymptomatic subjects. Both LV diastolic dysfunction (LVDD) and LV hypertrophy (LVH), as assessed by echocardiography, are known independent prognostic markers of future cardiovascular events in the community. However, selective screening strategies for identifying individuals at risk who would benefit most from in-depth cardiac phenotyping are lacking. Purpose: We assessed the utility of several Machine Learning (ML) classifiers built on clinical and biochemical features for detecting subclinical LV abnormalities. Methods: We included 1407 participants (mean age 51 years, 51% women) randomly recruited from the general population. We used echocardiographic parameters reflecting LV diastolic function and structure to define LV abnormalities (LVDD, n=239; LVH, n=135). Four supervised ML algorithms (Random Forest (RF), Gradient Boosting (GB), Stochastic Gradient Descent (SGD), and Support Vector Machines (SVM)) were then built on routine clinical, hemodynamic, and laboratory data (n=61 features) to categorize LVDD and LVH (two prediction tasks). We applied a 10-fold stratified cross-validation set-up. Results: ML classifiers exhibited a high area under the ROC curve (AUC) for predicting LVDD, with values between 88.5% and 93.1% (Figure, left panel). Age, BMI, different components of blood pressure, antihypertensive treatment, and routine biomarkers such as serum electrolytes, creatinine, blood sugar, leptin, uric acid, lipid profile, and blood cell counts were the top selected features for predicting LVDD. The prediction AUC of the ML algorithms for detection of LVH was somewhat lower than for LVDD and ranged from 72.5% to 78.7% (Figure, right panel).
The top selected features for the LVH classifier were similar to those for LVDD, but also included social class, serum gamma-glutamyl transferase, fasting insulin, plasma renin activity, and cortisol. Conclusions: ML algorithms combining routinely measured clinical and laboratory data showed high accuracy in LVDD and LVH prediction. These ML classifiers might be useful to preselect individuals at risk for further in-depth echocardiographic examination, monitoring, and implementation of preventive strategies in order to delay the transition to symptomatic disease.
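The validation scheme described above, 10-fold stratified cross-validation of ML classifiers scored by ROC AUC, can be sketched as follows. The 61 features are synthetic stand-ins for the clinical and laboratory variables, with a positive rate near the study's LVDD prevalence.

```python
# Hedged sketch: 10-fold stratified cross-validated ROC AUC for two models.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=61, weights=[0.83],
                           random_state=5)  # ~17% positives, like LVDD
# Stratification keeps the positive fraction constant across all 10 folds,
# which matters when the outcome is this imbalanced.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=5)
aucs = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in [("RF", RandomForestClassifier(random_state=5)),
                        ("GB", GradientBoostingClassifier(random_state=5))]
}
```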


Sensors
2021
Vol 21 (8)
pp. 2647
Author(s):
Jianming Zhang, Junxiang Lian, Zhaoxiang Yi, Shuwang Yang, Ying Shan

In order to detect gravitational waves and characterise their sources, three laser links are constructed between three identical satellites so that interferometric measurements for scientific experiments can be carried out. The attitude of the spacecraft in the initial phase of laser link docking is provided by a star sensor (SSR) onboard the satellite. If the attitude measurement capacity of the SSR is improved, the efficiency of establishing the laser links can be elevated. An important technology for satellite attitude determination using SSRs is star identification, for which a guide star catalogue (GSC) is at present the only basis. Hence, a method for improving the GSC in terms of storage, completeness, and uniformity is studied in this paper. First, the relationship between the number of stars in the field of view (FOV) of a staring SSR and the noise equivalent angle (NEA) of the SSR, which determines the accuracy of the SSR, is discussed. Then, according to the relationship between the number of stars (NOS) in the FOV, the brightness of the stars, and the size of the FOV, two constraints are used to select stars from the SAO GSC. Finally, the performance of the GSCs generated by Decision Trees (DT), K-Nearest Neighbours (KNN), Support Vector Machine (SVM), the Magnitude Filter Method (MFM), Gradient Boosting (GB), a Neural Network (NN), Random Forest (RF), and Stochastic Gradient Descent (SGD) is assessed. The results show that the GSC generated by the KNN method is better than those of the other methods in terms of storage, uniformity, and completeness. The KNN-generated GSC is suitable for high-accuracy spacecraft applications, such as gravitational detection satellites.
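The best-performing KNN approach above amounts to treating guide-star selection as a classification over star attributes. The sketch below is a hedged stand-in: the magnitudes and positions are randomly generated placeholders (not the SAO GSC), and the keep/discard rule is a toy brightness cut, not the paper's two constraints.

```python
# Hedged sketch: KNN as a keep/discard classifier over star attributes.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
n = 500
mag = rng.uniform(1.0, 9.0, n)                     # visual magnitude
ra = rng.uniform(0.0, 360.0, n)                    # right ascension, deg
dec = rng.uniform(-90.0, 90.0, n)                  # declination, deg
keep = (mag < 6.0).astype(int)                     # toy selection rule

X = np.column_stack([mag, ra, dec])
X_tr, X_te, y_tr, y_te = train_test_split(X, keep, random_state=6)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
acc = knn.score(X_te, y_te)
```

In practice the attribute scales differ widely (magnitudes span ~8 units, positions span hundreds of degrees), so standardizing the features before KNN would be an important refinement.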

