Regularized Feature Selection in Categorical PLS for Multicollinear Data

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Tahir Mehmood

This article presents an algorithm that models categorical multicollinear data while balancing model accuracy on test data against the number of features selected. Multicollinear data are generated in all scientific fields, and in such data some variables are noise while others are influential with respect to the response variable. In mathematical and statistical modeling of public health data, both the features and the response are often categorical. These datasets are usually collinear, which makes partial least squares (PLS) a candidate method; however, PLS does not perform feature selection by default and handles only quantitative features. Categorical PLS (Cat-PLS) was introduced recently. We have implemented regularized feature selection in Cat-PLS, using filter-based feature selection with measures of categorical association: Cramér's V, the Phi coefficient, Tschuprow's T coefficient, the contingency coefficient, Yule's Q, and Yule's Y. A Monte Carlo simulation with 100 runs indicates that Cramér's V × VIP is the better choice in terms of model performance, number of selected features, and interpretability for modeling stillbirths, which is taken as the case study. The framework can be used in related areas to explore and model similar data structures.
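As an illustration of one of the association measures named above, the following is a minimal sketch of Cramér's V computed from two categorical sequences. It is not the article's algorithm: the function name `cramers_v`, the toy data, and the idea of ranking features by their V value against the response are assumptions made here for illustration, and the combination with VIP described in the abstract is not implemented.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two categorical sequences (no bias correction)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    joint = Counter(zip(x, y))  # observed cell counts
    row = Counter(x)            # marginal counts of x
    col = Counter(y)            # marginal counts of y
    k = min(len(row), len(col))
    if k < 2:
        return 0.0              # a constant variable carries no association
    chi2 = 0.0
    for a, row_count in row.items():
        for b, col_count in col.items():
            expected = row_count * col_count / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical toy data: two categorical features and a categorical response.
response = ["yes", "yes", "no", "no"]
features = {
    "f1": ["a", "a", "b", "b"],  # perfectly aligned with the response
    "f2": ["u", "v", "u", "v"],  # independent of the response
}

# Filter-style ranking: features with a larger V are more associated.
ranking = sorted(features, key=lambda f: cramers_v(features[f], response), reverse=True)
```

In a filter-based setting, a threshold on such a score would decide which features are kept; the threshold and any weighting by VIP are choices the abstract does not specify.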

2017 ◽  
Vol 12 (No. 1) ◽  
pp. 51-59 ◽  
Author(s):  
N. Dragičević ◽  
B. Karleuša ◽  
N. Ožanić

In recent decades, various methods for assessing erosion intensity and sediment production have been developed. The need for better model performance has led to more frequent use of sensitivity and uncertainty assessments to reduce errors that arise from the model concept and its main assumptions. The analysis presented in this paper applies to the Gavrilović method (Erosion Potential Method), an empirical and semi-quantitative method that can estimate sediment production and sediment transport as well as erosion intensity, and can indicate areas potentially threatened by erosion. The emphasis of this paper is on a sensitivity analysis of the method, which has not previously been conducted for the Gavrilović method. The sensitivity analysis was conducted for fourteen parameters included in the method, all in relation to different model outputs. Each parameter was examined and discussed individually with respect to its effect on the method outputs and ranked into categories depending on its influence on one or more model outputs. The objective was to explore the constraints of the Gavrilović method and its response to changes in each individual parameter, in order to provide a better understanding of the method and of the weight and contribution of each parameter within it. Parameters that could be used in future research for method modification and calibration in areas with different catchment characteristics (e.g. climate, geology) were identified. The most sensitive model parameters identified by the sensitivity analysis are also those considered significant in the scientific literature on erosion. The sensitivity analysis was carried out as a case study for the Dubracina catchment area, Croatia.


2020 ◽  
Vol 12 (6) ◽  
pp. 2208 ◽  
Author(s):  
Jamie E. Filer ◽  
Justin D. Delorit ◽  
Andrew J. Hoisington ◽  
Steven J. Schuldt

Remote communities such as rural villages, post-disaster housing camps, and military forward operating bases are often located in remote and hostile areas with limited or no access to established infrastructure grids. Operating these communities with conventional assets requires constant resupply, which yields a significant logistical burden, creates negative environmental impacts, and increases costs. For example, a 2000-member isolated village in northern Canada relying on diesel generators required 8.6 million USD of fuel per year and emitted 8500 tons of carbon dioxide. Remote community planners can mitigate these negative impacts by selecting sustainable technologies that minimize resource consumption and emissions. However, the alternatives often come at a higher procurement cost and mobilization requirement. To assist planners with this challenging task, this paper presents the development of a novel infrastructure sustainability assessment model capable of generating optimal tradeoffs between minimizing environmental impacts and minimizing life-cycle costs over the community’s anticipated lifespan. Model performance was evaluated using a case study of a hypothetical 500-person remote military base with 864 feasible infrastructure portfolios and 48 procedural portfolios. The case study results demonstrated the model’s novel capability to assist planners in identifying optimal combinations of infrastructure alternatives that minimize negative sustainability impacts, leading to remote communities that are more self-sufficient with reduced emissions and costs.
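The notion of an optimal tradeoff between two minimized objectives can be illustrated with a Pareto filter. This is a minimal sketch under stated assumptions: the abstract does not say how the model generates its tradeoffs, and the function `pareto_front`, the portfolio names, and the cost and impact values below are hypothetical.

```python
def pareto_front(portfolios):
    """Return the portfolios not dominated in (cost, impact), both minimized.

    Each portfolio is a (name, cost, impact) tuple. One portfolio dominates
    another if it is no worse in both objectives and strictly better in one.
    """
    front = []
    for name, cost, impact in portfolios:
        dominated = any(
            other_cost <= cost
            and other_impact <= impact
            and (other_cost < cost or other_impact < impact)
            for _, other_cost, other_impact in portfolios
        )
        if not dominated:
            front.append((name, cost, impact))
    return front


# Hypothetical candidates: (name, life-cycle cost, environmental impact).
candidates = [
    ("A", 10.0, 8.0),
    ("B", 12.0, 5.0),
    ("C", 13.0, 6.0),  # worse than B in both objectives
    ("D", 15.0, 3.0),
]

front = pareto_front(candidates)
```

Here portfolio C is dropped because B is cheaper and has lower impact; the remaining portfolios each represent a different compromise between cost and impact.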


2021 ◽  
Author(s):  
Birgid Schömig-Markiefka ◽  
Alexey Pryalukhin ◽  
Wolfgang Hulla ◽  
Andrey Bychkov ◽  
Junya Fukuoka ◽  
...  

Digital pathology makes computational analysis of histological slides and automation of routine pathological tasks possible. Histological slides are very heterogeneous with respect to staining, section thickness, and artifacts arising during tissue processing, cutting, staining, and digitization. In this study, we digitally reproduce major types of artifacts. Using six datasets from four different institutions digitized by different scanner systems, we systematically explore the influence of artifacts on the accuracy of a pre-trained, validated, deep learning-based model for prostate cancer detection in histological slides. We provide evidence that any histological artifact can, depending on its severity, lead to a substantial loss in model performance. Strategies for preventing losses in diagnostic model accuracy in the presence of artifacts are warranted. Stress-testing of diagnostic models using synthetically generated artifacts might be an essential step during clinical validation of deep learning-based algorithms.


Water ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 37
Author(s):  
Tomás de Figueiredo ◽  
Ana Caroline Royer ◽  
Felícia Fonseca ◽  
Fabiana Costa de Araújo Schütz ◽  
Zulimar Hernández

The European Space Agency Climate Change Initiative Soil Moisture (ESA CCI SM) product provides soil moisture estimates from radar satellite data with a daily temporal resolution. Although validation exercises with ground data have been performed since the product's launch, SM has not yet been consistently related to soil water storage, which is a key step for its application for prediction purposes. This study aimed to analyse the relationship between soil water storage (S), obtained from soil water balance computations with ground meteorological data, and soil moisture, obtained from radar data, as affected by soil water storage capacity (Smax). As a case study, a 14-year monthly series of soil water storage, produced via soil water balance computations using ground meteorological data from northeast Portugal and Smax from 25 mm to 150 mm, was matched with the corresponding monthly averaged SM product. Linear (I) and logistic (II) regression models relating S with SM were compared. Model performance (r² in the 0.8–0.9 range) varied non-monotonically with Smax and was highest at an Smax of 50 mm. The logistic model (II) performed better than the linear model (I) in the lower range of Smax. Improvements in model performance obtained by splitting the data series into two subsets, representing the soil water recharge and depletion phases of the year, revealed hysteresis in the relationship between S and SM.


Author(s):  
Stefan Hahn ◽  
Jessica Meyer ◽  
Michael Roitzsch ◽  
Christiaan Delmaar ◽  
Wolfgang Koch ◽  
...  

Spray applications enable a uniform distribution of substances on surfaces in a highly efficient manner, and thus can be found at workplaces as well as in consumer environments. A systematic literature review on modelling exposure by spraying activities has been conducted and status and further needs have been discussed with experts at a symposium. This review summarizes the current knowledge about models and their level of conservatism and accuracy. We found that extraction of relevant information on model performance for spraying from published studies and interpretation of model accuracy proved to be challenging, as the studies often accounted for only a small part of potential spray applications. To achieve a better quality of exposure estimates in the future, more systematic evaluation of models is beneficial, taking into account a representative variety of spray equipment and application patterns. Model predictions could be improved by more accurate consideration of variation in spray equipment. Inter-model harmonization with regard to spray input parameters and appropriate grouping of spray exposure situations is recommended. From a user perspective, a platform or database with information on different spraying equipment and techniques and agreed standard parameters for specific spraying scenarios from different regulations may be useful.


2020 ◽  
pp. 147592172097970
Author(s):  
Liangliang Cheng ◽  
Vahid Yaghoubi ◽  
Wim Van Paepegem ◽  
Mathias Kersemans

The Mahalanobis–Taguchi system is considered a promising and powerful tool for handling binary classification cases. However, it has several restrictions in screening useful features and in determining the decision boundary in an optimal manner. In this article, an integrated Mahalanobis classification system is proposed which builds on the concept of Mahalanobis distance and its space. The integrated Mahalanobis classification system integrates the decision boundary searching process, based on a particle swarm optimizer, directly into the feature selection phase for constructing the Mahalanobis distance space. This integration (a) avoids the need for user-dependent input parameters and (b) improves the classification performance. For the feature selection phase, the use of both a binary particle swarm optimizer and a binary gravitational search algorithm is investigated. To deal with possible overfitting in the case of sparse data sets, k-fold cross-validation is considered. The integrated Mahalanobis classification system is benchmarked against the classical Mahalanobis–Taguchi system and the recently proposed two-stage Mahalanobis classification system in terms of classification performance. Results are presented for both an experimental case study of complex-shaped metallic turbine blades with various damage types and a synthetic case study of cylindrical dogbone samples with creep and microstructural damage. The results indicate that the proposed integrated Mahalanobis classification system shows good and robust classification performance.
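The Mahalanobis distance that these systems build on can be sketched as follows. This is only a minimal illustration, not the integrated system described above: the helper names, the toy reference data, and the hand-picked threshold are assumptions, and the particle swarm search for the decision boundary and the feature selection phase are not implemented.

```python
from math import sqrt


def mean_and_cov(rows):
    """Sample mean vector and sample covariance matrix of reference rows."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]
    return mean, cov


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def mahalanobis(x, mean, cov):
    """Distance sqrt((x - mean)^T cov^{-1} (x - mean))."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, d)  # z = cov^{-1} d without forming the inverse
    return sqrt(sum(di * zi for di, zi in zip(d, z)))


def classify(x, mean, cov, threshold):
    """Binary decision: samples far from the reference group are abnormal."""
    return "abnormal" if mahalanobis(x, mean, cov) > threshold else "normal"


# Hypothetical reference ("healthy") samples with two features.
reference = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
mean, cov = mean_and_cov(reference)
label = classify((2.0, 0.0), mean, cov, threshold=2.0)
```

The threshold is fixed by hand here purely for illustration; according to the abstract, the proposed system searches for the decision boundary instead of relying on such a user-supplied value.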

