Solar Flare Prediction Using Two-tier Ensemble with Deep Learning and Gradient Boosting Machine

Author(s): Chau Pham, Vung Pham, Tommy Dang

2022, Vol 15 (1), pp. 1-20
Author(s): Ravinder Kumar, Lokesh Kumar Shrivastav

Designing a system for analyzing high-frequency data (big data) is a challenging and crucial task in data science. Big data analytics involves developing efficient machine learning algorithms together with big data processing techniques or frameworks. Data processing systems that can handle high-frequency data efficiently are currently in high demand. This paper proposes the processing and analysis of stochastic high-frequency stock market data using a modified version of the Gradient Boosting Machine (GBM). The experimental results are compared with those of deep learning and Auto-Regressive Integrated Moving Average (ARIMA) methods. The modified GBM achieves the highest accuracy (R2 = 0.98) and the lowest error (RMSE = 0.85) of the three approaches.
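The abstract reports R2 and RMSE but does not define them. The sketch below shows the standard definitions in plain Python. It is not the authors' modified GBM, and the price series is hypothetical. RMSE is expressed in the units of the target, so a value such as 0.85 can only be interpreted relative to the scale of the series being predicted.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((y - y_hat)^2)), in the units of the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical prices and model outputs, for illustration only (not from the paper).
actual = [100.0, 101.5, 102.0, 101.0, 103.5]
predicted = [100.4, 101.0, 102.3, 101.2, 103.0]
print(f"RMSE = {rmse(actual, predicted):.3f}, R2 = {r_squared(actual, predicted):.3f}")
```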


Author(s): Sandeep Kumar, K. K. Singh

Rain gardens are effective in reducing stormwater runoff, and their efficiency depends on several parameters such as soil type, vegetation and meteorological factors. Rain gardens have been evaluated by various researchers. However, knowledge for the sound design of rain gardens is still very limited. This is particularly true of accurate modeling of their infiltration rate and of how much it differs from the infiltration rate of the natural ground surface. The present study uses experimentally observed infiltration rates of rain gardens with different types of vegetation (grass, candytuft, marigold and daisy at different plant densities) and under different flow conditions. These rates were then modeled with the popular Philip's infiltration model, which is valid for natural ground surfaces. They were also modeled with two soft computing tools, Gradient Boosting Machine (GBM) and Deep Learning (DL). GBM and DL perform promisingly, in terms of correlation coefficient (CC), RMSE, MAE, MSE and NSE, compared with the relation proposed by Philip (1957). Most of the values predicted by both GBM and DL fall within scatter limits of ±5%. By contrast, the values from Philip's model fall within the ±25% error lines or even outside them. GBM performs better than DL, with the highest correlation coefficient and Nash-Sutcliffe model efficiency (NSE) coefficient and the lowest root mean square error. The results of the study will be useful for selecting plant types and planting densities for rain gardens in urban areas.
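The sketch below makes three ideas from the abstract concrete:

* Philip's model, written here in its common two-term form, where the rate f(t) = S / (2 sqrt(t)) + A is the derivative of cumulative infiltration I(t) = S sqrt(t) + A t.
* The Nash-Sutcliffe efficiency.
* The "within ±5%" scatter band.

The abstract does not say which form of Philip's relation or which fitting procedure the study used, so that form is an assumption. The sorptivity S, the parameter A and all the rate values are hypothetical.

```python
import math
from typing import Sequence


def philip_rate(t: float, sorptivity: float, a: float) -> float:
    """Two-term Philip (1957) infiltration rate f(t) = S / (2 * sqrt(t)) + A, valid for t > 0.

    It is the time derivative of cumulative infiltration I(t) = S * sqrt(t) + A * t.
    """
    return sorptivity / (2.0 * math.sqrt(t)) + a


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((O - S)^2) / sum((O - mean(O))^2); 1 is a perfect fit."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def fraction_within(observed: Sequence[float], simulated: Sequence[float], tol: float = 0.05) -> float:
    """Share of predictions whose relative error |S - O| / |O| is at most tol (0.05 = the ±5% lines)."""
    hits = sum(1 for o, s in zip(observed, simulated) if abs(s - o) <= tol * abs(o))
    return hits / len(observed)


# Hypothetical observed rates and Philip-model predictions (illustrative values only).
times = [0.25, 0.5, 1.0, 2.0]          # elapsed time, h
observed = [62.0, 47.0, 38.0, 31.0]    # infiltration rate, mm/h
# S in mm/h^0.5 and A in mm/h are hypothetical parameter values.
philip = [philip_rate(t, sorptivity=30.0, a=20.0) for t in times]
print(f"NSE = {nse(observed, philip):.3f}, within ±5%: {fraction_within(observed, philip):.0%}")
```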


2020, Vol 22 (6), pp. 1603-1619
Author(s): Mohammad Ali Ghorbani, Farzin Salmasi, Mandeep Kaur Saggi, Amandeep Singh Bhatia, Ercan Kahya, ...

Gates in dams and irrigation canals are used to control discharge or to regulate the water surface. To compute the discharge under a gate, the discharge coefficient (Cd) must first be determined precisely. Taking a novel approach, this study investigates the effect of sill shape under a vertical sluice gate on Cd. It estimates Cd with four artificial intelligence methods: (i) random forest (RF), (ii) deep learning (DL), (iii) gradient boosting machine (GBM), and (iv) generalized linear model (GLM). A sluice gate with twelve different sill shapes was fabricated and tested at the University of Tabriz, Iran. Different flow rates were considered in the hydraulic laboratory with four gate openings, giving a total of 180 test runs. The results showed that installing a sill under the vertical gate has a positive effect on flow discharge. Sill shapes can be characterized by their hydraulic radius (Rs). Sensitivity analysis of the dimensionless parameters showed that Rs/G, the ratio of the hydraulic radius of the sill to the gate opening G, plays a significant role in determining Cd. A semicircular sill increases Cd more than the other shapes.
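In laboratory studies, Cd is usually obtained by inverting a discharge equation using measured flow rates. The abstract does not state which form was used here. The sketch below therefore assumes the common free-flow sluice-gate relation Q = Cd * b * G * sqrt(2 g H). In this notation, b (gate width) and H (upstream depth) are introduced here, and G is the gate opening as in the abstract. All flume values, including the sill's Rs, are hypothetical.

```python
import math

G_ACCEL = 9.81  # gravitational acceleration, m/s^2


def discharge_coefficient(q: float, width: float, opening: float, upstream_depth: float,
                          g: float = G_ACCEL) -> float:
    """Back-calculate Cd from the assumed free-flow relation Q = Cd * b * G * sqrt(2 g H).

    SI units: q in m^3/s; width b, gate opening G and upstream depth H in m.
    """
    return q / (width * opening * math.sqrt(2.0 * g * upstream_depth))


def discharge(cd: float, width: float, opening: float, upstream_depth: float,
              g: float = G_ACCEL) -> float:
    """Forward form of the same relation: predicted Q for a given Cd."""
    return cd * width * opening * math.sqrt(2.0 * g * upstream_depth)


# Hypothetical flume run (not data from the study).
b, gate_opening, h_up, q_measured = 0.30, 0.02, 0.25, 0.008
rs = 0.015  # hypothetical hydraulic radius of the sill, m
cd = discharge_coefficient(q_measured, b, gate_opening, h_up)
print(f"Cd = {cd:.3f}, Rs/G = {rs / gate_opening:.2f}")
```

A back-calculated Cd and dimensionless ratios such as Rs/G are the kind of target and feature that RF, DL, GBM and GLM models would be trained on. The sketch does not reproduce the study's feature set.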


2022, Vol 15 (1), pp. 1-19
Author(s): Ravinder Kumar, Lokesh Kumar Shrivastav

Stochastic time series analysis of high-frequency stock market data is a very challenging task for analysts because efficient tools and techniques for big data analytics are lacking. This has opened opportunities for developers and researchers to build intelligent, machine learning based tools and techniques for data analytics. This paper proposes an ensemble for stock market data prediction that uses three of the most prominent machine learning techniques. The stock market dataset has a raw size of 39364 KB with all attributes and a processed size of 11826 KB with 872435 instances. The proposed ensemble model comprises Deep Learning, Gradient Boosting Machine (GBM) and distributed Random Forest. The performance of the ensemble model is compared with that of each individual method, i.e. deep learning, GBM and Random Forest. The ensemble model performs best, achieving the highest accuracy (0.99) and the lowest error (RMSE = 0.1).
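The abstract does not say how the three models' outputs are combined. The most basic option is to average their predictions sample by sample, optionally with weights. Stacking, which trains a meta-model on the base outputs, is another option. The sketch below shows plain averaging only. The model names and all numbers are hypothetical. An ensemble average lowers the error only when the base models' errors are not strongly correlated, so the improvement shown is not guaranteed in general.

```python
import math
from typing import List, Optional, Sequence


def ensemble_average(predictions: Sequence[Sequence[float]],
                     weights: Optional[Sequence[float]] = None) -> List[float]:
    """Combine per-model predictions by (weighted) averaging, sample by sample.

    predictions[m][i] is model m's prediction for sample i.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total
            for i in range(len(predictions[0]))]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical targets and outputs of three base models (illustrative numbers only).
y = [10.0, 10.4, 10.1, 10.8]
dl = [10.3, 10.2, 10.0, 11.0]
gbm = [9.8, 10.5, 10.3, 10.6]
drf = [10.1, 10.6, 9.9, 10.9]
combined = ensemble_average([dl, gbm, drf])
for name, pred in [("DL", dl), ("GBM", gbm), ("DRF", drf), ("Ensemble", combined)]:
    print(f"{name:8s} RMSE = {rmse(y, pred):.3f}")
```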


2021, pp. bjophthalmol-2020-318609
Author(s): Wei Wang, Xiaotong Han, Jiaqing Zhang, Xianwen Shang, Jason Ha, ...

Background/aims: To investigate the feasibility and accuracy of using machine learning (ML) techniques on self-reported questionnaire data to predict the 10-year risk of cataract surgery, and to identify meaningful predictors of cataract surgery in middle-aged and older Australians.
Methods: Baseline information on demographics, socioeconomic status, medical and family history, lifestyle, diet and self-rated health status was collected as risk factors. Cataract surgery events were confirmed using the Medicare Benefits Schedule Claims dataset. Three ML algorithms (random forests [RF], gradient boosting machine and deep learning) and one traditional regression algorithm (logistic model) were compared on the accuracy of their predictions of the risk of cataract surgery. Performance was assessed using 10-fold cross-validation. The main outcome measures were the areas under the receiver operating characteristic curves (AUCs).
Results: In total, 207 573 participants aged 45 years and above without a history of cataract surgery at baseline were recruited from the 45 and Up Study. Gradient boosting machine (AUC 0.790, 95% CI 0.785 to 0.795), RF (AUC 0.785, 95% CI 0.780 to 0.790) and deep learning (AUC 0.781, 95% CI 0.775 to 0.786) performed robustly and outperformed the traditional logistic regression method (AUC 0.767, 95% CI 0.762 to 0.773, all p<0.05). Age, self-rated eye vision and health insurance were consistently identified as important predictors in all models.
Conclusions: The study demonstrated that ML modelling could predict the 10-year risk of cataract surgery reasonably accurately from questionnaire data alone and was marginally superior to the conventional logistic model.
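The evaluation described above combines two standard pieces: splitting the data into 10 folds, and scoring each held-out fold by ROC AUC. The sketch below shows a plain, unstratified k-fold split and the AUC computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The abstract does not say whether the folds were stratified or which software was used, so neither detail is implied here. The pairwise AUC loop costs O(positives × negatives). At the scale of 207 573 participants, a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for plain (unstratified) k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds that together cover every index
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


# Fold sizes for 25 hypothetical records split 10 ways.
print([len(test) for _, test in k_fold_indices(25, k=10)])  # prints [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
# Toy check with hypothetical labels and risk scores: 3 of 4 positive-negative pairs are ranked correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```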


Author(s): Ahmet Haşim Yurttakal, Hasan Erbay, Türkan İkizceli, Seyhan Karaçavuş, Cenker Biçer

Breast cancer, which develops from cells in the breast tissue, is the most common cancer among women. Early-stage detection could reduce death rates significantly, and the stage at detection determines the treatment process. Mammography is used to discover breast cancer at an early stage, before any physical signs appear. However, mammography can return false negatives. If a lesion is suspected of having a greater than two percent chance of being cancerous, a biopsy is recommended. About 30 percent of biopsies reveal malignancy, which means the rate of unnecessary biopsies is high. To reduce unnecessary biopsies, Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE-MRI) has recently been used to detect breast cancer because of its excellent capability in soft tissue imaging. DCE-MRI is now a highly recommended method for identifying breast cancer, for monitoring its development and for interpreting tumorous regions. However, reading DCE-MRI is time-consuming, and its accuracy depends on the radiologist's experience. Radiomic data, on the other hand, are used in medical imaging and can capture disease characteristics that cannot be seen by the naked eye. Radiomic features are hand-crafted (hard-coded) features that provide crucial information about the imaged disease. Deep learning methods such as convolutional neural networks (CNNs), by contrast, learn features automatically from the dataset. In medical imaging in particular, CNNs perform better than methods based on hard-coded features. However, combining the power of these two types of features increases accuracy significantly, which is especially critical in medicine. Herein, a stacked ensemble of gradient boosting and deep learning models was developed to classify breast tumors using DCE-MRI images. The model uses radiomic features extracted from pixel information in breast DCE-MRI images.
Prior to training the model, factor analysis was applied to the radiomic features to refine the feature set and eliminate uninformative features. The performance metrics, together with comparisons against several well-known machine learning methods, show that the ensemble model outperforms its counterparts. The ensemble model's accuracy is 94.87% and its AUC is 0.9728. Its recall and precision are 1.0 and 0.9130, respectively, and its F1-score is 0.9545.
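The reported F1-score follows directly from the reported precision and recall, because F1 is their harmonic mean. The sketch below defines the standard confusion-matrix metrics and checks that arithmetic. The confusion counts in the last line are hypothetical, and the code does not model the stacked ensemble itself.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def f1_from(precision: float, recall: float) -> float:
    """F1 computed directly from precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check against the precision and recall reported in the abstract.
print(f"{f1_from(0.9130, 1.0):.4f}")  # prints 0.9545, matching the reported F1-score

# Hypothetical confusion counts, for illustration only.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```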

