Regression-Guided Clustering: A Semisupervised Method for Circulation-to-Environment Synoptic Classification

Abstract Regression-guided clustering is introduced as a means of constructing circulation-to-environment synoptic climatological classifications. Rather than applying an unsupervised clustering algorithm to synoptic-scale atmospheric circulation data, one instead augments the atmospheric circulation dataset with predictions from a supervised regression model linking circulation to environment. The combined dataset is then entered into the clustering algorithm. The level of influence of the environmental dataset can be controlled by a simple weighting factor. The method is generic in that the choice of regression model and clustering algorithm is left to the user. Examples are given using standard multivariate linear regression models and the k-means clustering algorithm, both established methods in synoptic climatology. Results for southern British Columbia, Canada, indicate that model performance can be made to range between that of a fully unsupervised algorithm and a fully supervised algorithm.
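The procedure described in this abstract can be illustrated with a minimal sketch. It assumes NumPy, a least-squares linear regression, and a plain Lloyd's k-means. The function names, and the way `weight` scales the two blocks (`1 - weight` on circulation, `weight` on the regression predictions), are assumptions made for illustration; the abstract does not specify how the weighting factor is applied.

```python
import numpy as np


def standardize(a):
    """Scale each column to zero mean and unit variance (constant columns become zero)."""
    std = a.std(axis=0)
    std[std == 0] = 1.0
    return (a - a.mean(axis=0)) / std


def kmeans(data, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.full(len(data), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def regression_guided_clusters(circulation, environment, k, weight=0.5, seed=0):
    """Cluster circulation data augmented with regression predictions of the environment.

    weight = 0 reduces to clustering on circulation alone (unsupervised);
    weight = 1 clusters on the regression predictions alone.
    """
    xs = standardize(np.asarray(circulation, dtype=float))
    env = np.asarray(environment, dtype=float).reshape(len(xs), -1)
    design = np.column_stack([np.ones(len(xs)), xs])
    coef, *_ = np.linalg.lstsq(design, env, rcond=None)  # multivariate linear regression
    predicted = design @ coef
    combined = np.hstack([(1 - weight) * xs, weight * standardize(predicted)])
    return kmeans(combined, k, seed=seed)
```

Calling `regression_guided_clusters(X, Y, k=3, weight=0.5)` on a circulation matrix `X` (rows are days, columns are grid points or principal components) and an environmental matrix `Y` returns one cluster label per day. Sweeping `weight` from 0 to 1 moves the classification from the unsupervised end toward the supervised end.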


A Case Study of a Ross Ice Shelf Airstream Event: A New Perspective*

Monthly Weather Review ◽

10.1175/2009mwr2880.1 ◽

2009 ◽

Vol 137 (11) ◽

pp. 4030-4046 ◽

Cited By ~ 31

Author(s):

Daniel F. Steinhoff ◽

Saptarshi Chaudhuri ◽

David H. Bromwich

Keyword(s):

Large Scale ◽

Model Performance ◽

Thermal Infrared ◽

Integrated Approach ◽

Cloud Formation ◽

Synoptic Scale ◽

Ice Shelf ◽

Low Level ◽

Ross Ice Shelf

Abstract A case study illustrating cloud processes and other features associated with the Ross Ice Shelf airstream (RAS), in Antarctica, is presented. The RAS is a semipermanent low-level wind regime primarily over the western Ross Ice Shelf, linked to the midlatitude circulation and formed from terrain-induced and large-scale forcing effects. An integrated approach utilizes Moderate Resolution Imaging Spectroradiometer (MODIS) satellite imagery, automatic weather station (AWS) data, and Antarctic Mesoscale Prediction System (AMPS) forecast output to study the synoptic-scale and mesoscale phenomena involved in cloud formation over the Ross Ice Shelf during a RAS event. A synoptic-scale cyclone offshore of Marie Byrd Land draws moisture across West Antarctica to the southern base of the Ross Ice Shelf. Vertical lifting associated with flow around the Queen Maud Mountains leads to cloud formation that extends across the Ross Ice Shelf to the north. The low-level cloud has a warm signature in thermal infrared imagery, resembling a surface feature of turbulent katabatic flow typically ascribed to the RAS. Strategically placed AWS sites allow assessment of model performance within and outside of the RAS signature. AMPS provides realistic simulation of conditions aloft but experiences problems at low levels due to issues with the model PBL physics. Key meteorological features of this case study, within the context of previous studies on longer time scales, are inferred to be common occurrences. The assumption that warm thermal infrared signatures are surface features is found to be too restrictive.


Assessment of Discharge and Sediment Flows in a River Through a Combined Hydraulic and Hydrologic Routing Technique

10.21203/rs.3.rs-459084/v2 ◽

2021 ◽

Author(s):

Abebe Tadesse Bulti

Keyword(s):

Regression Model ◽

River Basin ◽

Swat Model ◽

Model Performance ◽

Simulation Models ◽

Flood Routing ◽

Recent Approach ◽

Routing Method ◽

Routing Techniques ◽

Routing Methods

Abstract Advances in flood routing techniques are important for good prediction and forecasting of flow discharge in river basins. Hydraulic and hydrologic routing techniques are widely applied, separately, in most simulation models. A combined hydrologic and hydraulic routing method is a recent approach used to improve modeling in hydrological studies. The main drawback of hydrologic routing methods is inaccuracy in the downstream areas of a river basin, where the effects of hydraulic structures and river dynamics are dominant. Hydraulic routing approaches perform relatively well in the downstream reaches of a river. This research was carried out on the Awash River basin, upstream of the Koka dam. A combined hydrologic and hydraulic approach was used to assess the discharge and sediment flow in the river basin. The hydrologic routing method was applied to the upstream part of the basin through a SWAT model. The HEC-RAS model was applied to the middle and downstream areas of the study basin, based on the hydraulic routing principle. A combined routing method can improve the results of a simulation and increase the accuracy of peak-flow prediction. It can simulate flow discharge over both short and long durations, with good model performance indicators. In addition, sediment modeling was done by comparing a regression model, the SWAT model, and a combination of the HEC-RAS and SWAT models. The results of the sediment modeling indicate that the regression model and the combined model show good agreement in predicting suspended sediment in the river basin. The integrated application of such different types of models can be one option for sediment modeling.


Machine Learning for Mortality Prediction in Pediatric Myocarditis

Frontiers in Pediatrics ◽

10.3389/fped.2021.644922 ◽

2021 ◽

Vol 9 ◽

Author(s):

Fu-Sheng Chou ◽

Laxmi V. Ghimire

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Regression Model ◽

Current Knowledge ◽

External Validation ◽

Model Development ◽

Kidney Injury ◽

High Specificity ◽

Mortality Prediction ◽

Linear Regression Models

Background: Pediatric myocarditis is a rare disease. The etiologies are multiple. Mortality associated with the disease is 5–8%. Prognostic factors were identified with the use of national hospitalization databases. Applying these identified risk factors for mortality prediction has not been reported. Methods: We used the Kids' Inpatient Database for this project. We manually curated fourteen variables as predictors of mortality based on the current knowledge of the disease, and compared performance of mortality prediction between linear regression models and a machine learning (ML) model. For ML, the random forest algorithm was chosen because of the categorical nature of the variables. Based on variable importance scores, a reduced model was also developed for comparison. Results: We identified 4,144 patients from the database for randomization into the primary (for model development) and testing (for external validation) datasets. We found that the conventional logistic regression model had low sensitivity (~50%) despite high specificity (>95%) or overall accuracy. On the other hand, the ML model struck a good balance between sensitivity (89.9%) and specificity (85.8%). The reduced ML model with the top five variables (mechanical ventilation, cardiac arrest, ECMO, acute kidney injury, ventricular fibrillation) was sufficient to approximate the prediction performance of the full model. Conclusions: The ML algorithm outperforms the linear regression model for mortality prediction in pediatric myocarditis in this retrospective dataset. Prospective studies are warranted to further validate the applicability of our model in clinical settings.
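A short worked example may help show why low sensitivity can coexist with high specificity and high overall accuracy when mortality is rare. The counts below are hypothetical (1,000 patients with 6% mortality) and are not taken from the study; the function name is likewise an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual deaths that were predicted
    specificity = tn / (tn + fp)  # share of survivors correctly predicted
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 60 deaths among 1,000 patients, half of them missed.
sens, spec, acc = sensitivity_specificity(tp=30, fn=30, tn=920, fp=20)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because survivors dominate the sample, a model that misses half of the deaths still scores about 95% accuracy in this illustration, which is why sensitivity is reported separately.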


Improving Model Performance by Including Post-ICU Frailty in a Cox Proportional Hazard Regression Model With Time-Varying Covariates

CHEST Journal ◽

10.1016/j.chest.2021.06.072 ◽

2021 ◽

Vol 160 (6) ◽

pp. e678-e679

Author(s):

Xu Ma

Keyword(s):

Regression Model ◽

Model Performance ◽

Time Varying ◽

Proportional Hazard ◽

Hazard Regression ◽

Cox Proportional Hazard ◽

Cox Proportional Hazard Regression ◽

Time Varying Covariates


Estimation in Dynamic Linear Regression Models with Infinite Variance Errors

Econometric Theory ◽

10.1017/s0266466600007982 ◽

1993 ◽

Vol 9 (4) ◽

pp. 570-588 ◽

Cited By ~ 28

Author(s):

Keith Knight

Keyword(s):

Asymptotic Behavior ◽

Linear Regression ◽

Regression Model ◽

Linear Regression Model ◽

Regression Models ◽

Infinite Variance ◽

Linear Regression Models ◽

Asymptotically Normal ◽

Second Moments ◽

True Values

This paper considers the asymptotic behavior of M-estimates in a dynamic linear regression model where the errors have infinite second moments but the exogenous regressors satisfy the standard assumptions. It is shown that under certain conditions, the estimates of the parameters corresponding to the exogenous regressors are asymptotically normal and converge to the true values at the standard n−½ rate.


A Betweenness Centrality Guided Clustering Algorithm and Its Applications to Cancer Diagnosis

Mining Intelligence and Knowledge Exploration - Lecture Notes in Computer Science ◽

10.1007/978-3-319-71928-3_4 ◽

2017 ◽

pp. 35-42 ◽

Cited By ~ 2

Author(s):

R. Jothi

Keyword(s):

Cancer Diagnosis ◽

Betweenness Centrality ◽

Clustering Algorithm ◽

Guided Clustering


Forecasting Of Covid-19 Cases Using Machine Learning Approach

Current Respiratory Medicine Reviews ◽

10.2174/1573398x17666210129131009 ◽

2021 ◽

Vol 17 ◽

Author(s):

Sachin Kumar ◽

Karan Veer

Keyword(s):

Machine Learning ◽

Regression Model ◽

Model Performance ◽

Real Data ◽

Absolute Error ◽

Viral Disease ◽

Support Vector ◽

Family Welfare ◽

Accuracy Score ◽

Learning Approaches

Aims: The objective of this research is to predict COVID-19 cases in India using machine learning approaches. Background: COVID-19, a respiratory disease caused by a member of the coronavirus family, led to a worldwide pandemic in 2020. The virus was first detected in Wuhan, China, in December 2019, and took less than three months to spread across the globe. Objective: In this paper, we propose a regression model based on the support vector machine (SVM) to forecast the number of deaths, the number of recovered cases, and the total confirmed cases for the next 30 days. Method: For prediction, the data were collected from GitHub and India's Ministry of Health and Family Welfare from March 14, 2020, to December 3, 2020. The model was designed in Python 3.6 in Anaconda to forecast corona trends until September 21, 2020. The proposed methodology predicts values using an SVM-based regression model with polynomial, linear, and RBF kernels. The dataset was divided into train and test datasets with 40% and 60% test sizes and verified with real data. Model performance was evaluated using mean square error, mean absolute error, and percentage accuracy. Results and Conclusion: The results show that the polynomial model obtained an accuracy score above 95%, the linear model scored above 90%, and the RBF model scored above 85% in predicting cumulative deaths, confirmed cases, and recovered cases.
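Two of the error measures named in this abstract, mean square error and mean absolute error, can be sketched in plain Python. The function names and the sample values are hypothetical, and the abstract's "percentage accuracy" is left out because its definition is not given.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between observed and forecast values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of absolute differences between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily case counts and forecasts, for illustration only.
actual = [10, 12, 15]
predicted = [11, 12, 13]
print(mean_squared_error(actual, predicted), mean_absolute_error(actual, predicted))
```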


Identification of Circulating Fluidized Bed Boiler Bed Temperature Based on Hyper-Plane-Shaped Fuzzy C-Regression Model

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026820500297 ◽

2020 ◽

Vol 19 (04) ◽

pp. 2050029

Author(s):

Jianzhong Shi

Keyword(s):

Regression Model ◽

Fluidized Bed ◽

Membership Function ◽

Clustering Algorithm ◽

Circulating Fluidized Bed ◽

Fuzzy Model ◽

Identification Algorithm ◽

Identification Process ◽

Bed Temperature ◽

Hyper Plane

Bed temperature in the dense-phase zone is the key parameter of a circulating fluidized bed (CFB) boiler for stable combustion and economic operation. It is difficult to establish an accurate bed temperature model because of the complexity of the circulating fluidized bed combustion system. The T-S fuzzy model has been widely applied in system identification because it can approximate complex nonlinear systems with high accuracy. Fuzzy c-regression model (FCRM) clustering based on hyper-plane-shaped distance has advantages in describing the T-S fuzzy model, and the Gaussian function has been adopted as the antecedent membership function of the T-S fuzzy model. However, the Gaussian fuzzy membership function is more suitable for clustering algorithms that use point-to-point distance, such as fuzzy c-means (FCM). In this paper, a hyper-plane-shaped FCRM clustering algorithm for T-S fuzzy model identification is proposed. The antecedent membership function of the proposed identification algorithm is defined by a hyper-plane-shaped membership function, and an improved fuzzy partition method is applied. To illustrate the efficiency of the proposed identification algorithm, it is applied to four nonlinear systems, where it shows higher identification accuracy and a simplified identification process. Finally, the algorithm is used in a circulating fluidized bed boiler bed temperature identification process and yields a better identification result.


Mechanisms of variability in decadal sea-level trends in the Baltic Sea over the 20th century

Earth System Dynamics ◽

10.5194/esd-8-1031-2017 ◽

2017 ◽

Vol 8 (4) ◽

pp. 1031-1046 ◽

Cited By ~ 3

Author(s):

Sitar Karabil ◽

Eduardo Zorita ◽

Birgit Hünicke

Keyword(s):

Regression Model ◽

Baltic Sea ◽

Sea Level ◽

Atmospheric Circulation ◽

Data Sets ◽

Climatic Data ◽

The Baltic Sea ◽

Underlying Factor ◽

Coastal Sea ◽

The Baltic

Abstract. Coastal sea-level trends in the Baltic Sea display decadal-scale variations around a long-term centennial trend. In this study, we analyse the spatial and temporal characteristics of the decadal trend variations and investigate the links between coastal sea-level trends and atmospheric forcing on a decadal timescale. For this analysis, we use monthly means of sea-level and climatic data sets. The sea-level data set is composed of long tide gauge records and gridded sea surface height (SSH) reconstructions. Climatic data sets are composed of sea-level pressure, air temperature, precipitation, evaporation, and climatic variability indices. The analysis indicates that atmospheric forcing is a driving factor of decadal sea-level trends. However, its effect is geographically heterogeneous. This impact is large in the northern and eastern regions of the Baltic Sea. In the southern Baltic Sea area, the impacts of atmospheric circulation on decadal sea-level trends are smaller. To identify the influence of the large-scale factors other than the effect of atmospheric circulation in the same season on Baltic Sea sea-level trends, we filter out the direct signature of atmospheric circulation for each season separately on the Baltic Sea level through a multivariate linear regression model and analyse the residuals of this regression model. These residuals hint at a common underlying factor that coherently drives the decadal sea-level trends in the whole Baltic Sea. We found that this underlying effect is partly a consequence of decadal precipitation trends in the Baltic Sea basin in the previous season. The investigation of the relation between the AMO index and sea-level trends implies that this detected underlying factor is not connected to oceanic forcing driven from the North Atlantic region.
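The filtering step described in this abstract, removing the linear signature of atmospheric circulation and analysing what remains, can be sketched as follows. This is a minimal illustration assuming NumPy and ordinary least squares; the function name and argument layout are hypothetical, and the study's actual predictors and seasonal handling are not reproduced here.

```python
import numpy as np


def regression_residuals(target, predictors):
    """Remove the least-squares linear fit of `predictors` (plus an intercept) from `target`.

    target: 1-D series, e.g. sea-level trends for one season.
    predictors: 2-D array, one column per circulation-related variable.
    """
    target = np.asarray(target, dtype=float)
    predictors = np.asarray(predictors, dtype=float).reshape(len(target), -1)
    design = np.column_stack([np.ones(len(target)), predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef
```

To follow the abstract's season-by-season approach, one would call `regression_residuals` once per season with that season's values. By construction the residuals are uncorrelated with the predictors used in the fit, so any remaining common signal must come from other factors.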
