The Power of Sampling and Stacking for the PAKDD-2007 Cross-Selling Problem

Author(s):  
Paulo J.L. Adeodato ◽  
Germano C. Vasconcelos ◽  
Adrian L. Arnaud ◽  
Rodrigo C.L.V. Cunha ◽  
Domingos S.M.P. Monteiro ◽  
...  

This article presents an efficient solution to the PAKDD-2007 Competition cross-selling problem. The solution is based on a thorough approach that involves creating new input variables, efficient data preparation and transformation, an adequate data sampling strategy, and a combination of two of the most robust modeling techniques. Because the very small number of examples in the target class makes the problem hard, model robustness was pursued in two ways: by taking the median score of the 11 models developed with an adapted version of the 11-fold cross-validation process, and by combining two robust techniques, the MLP neural network and the n-tuple classifier, via stacking. Despite the complexity of the problem, performance on the prediction data set (unlabeled samples), measured through KS2 and ROC curves, proved very effective, and the solution finished as the first runner-up of the competition.
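Two steps in the abstract are simple to state precisely: combining the 11 fold models by their median score, and measuring separation with KS2. A minimal Python sketch follows, assuming each model outputs one score per customer and that KS2 is the two-sample Kolmogorov-Smirnov statistic (the maximum distance between the empirical score distributions of the two classes). The function names are hypothetical, and the stacking of the MLP and n-tuple classifier is not shown.

```python
from statistics import median


def median_ensemble(model_scores):
    """Combine per-example scores from several models by taking their median.

    model_scores[m][i] is the score that model m assigns to example i.
    """
    return [median(column) for column in zip(*model_scores)]


def ks2(scores, labels):
    """Two-sample Kolmogorov-Smirnov statistic between class score distributions.

    labels are 1 for the target class and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        # Empirical CDF of each class evaluated at threshold t.
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical scores from three models for four customers.
scores = median_ensemble([[0.2, 0.1, 0.7, 0.9],
                          [0.3, 0.2, 0.8, 0.6],
                          [0.1, 0.4, 0.9, 0.8]])
print(scores)                     # [0.2, 0.2, 0.8, 0.8]
print(ks2(scores, [0, 0, 1, 1]))  # 1.0: the classes are perfectly separated
```

With 11 models, as in the abstract, the median is simply the sixth-ranked score for each customer, which makes the combined score insensitive to one or two badly trained folds.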

2021 ◽  
pp. 004051752110205
Author(s):  
Xueqing Zhao ◽  
Ke Fan ◽  
Xin Shi ◽  
Kaixuan Liu

Virtual reality is a technology that lets users interact fully with a computer-simulated environment; for example, they can try on new clothes virtually and check the effect without undressing. In this paper, a virtual fit evaluation of pants using the Adaptive Network Fuzzy Inference System (ANFIS), VFE-ANFIS for short, is proposed. The VFE-ANFIS has two stages: training and evaluation. In the first stage, we trained the VFE-ANFIS on key pressure parameters collected from real and virtual try-ons of pants by users. In the second stage, we evaluated fit with the trained VFE-ANFIS: key pressure parameters of pants were determined for a new user, and the system output the evaluation result, fit or unfit. Because the number of input samples was small, we used 10-fold cross-validation to divide the data set into training and testing sets; the test accuracy of the VFE-ANFIS was 94.69% ± 2.4%, and the experimental results show that the proposed VFE-ANFIS could be applied to the virtual fit evaluation of pants.
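The accuracy above is reported as a mean ± standard deviation over 10-fold cross-validation. A minimal Python sketch of the splitting and summary steps follows; the ANFIS model itself is not shown, and the function names, the fixed seed and the round-robin fold assignment are assumptions rather than the authors' procedure.

```python
import random
from statistics import mean, stdev


def k_fold_splits(n_samples, k=10, seed=0):
    """Shuffle sample indices once and return k (train, test) index splits."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k nearly equal folds.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def summarize(fold_accuracies):
    """Mean and sample standard deviation of per-fold test accuracy."""
    return mean(fold_accuracies), stdev(fold_accuracies)


splits = k_fold_splits(25, k=10)
print([len(test) for _, test in splits])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Each sample is tested exactly once; training a model on `train`, scoring it on `test`, and passing the ten accuracies to `summarize` yields a figure in the same "mean ± std" form as the one reported.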


Geophysics ◽  
2014 ◽  
Vol 79 (1) ◽  
pp. IM1-IM9 ◽  
Author(s):  
Nathan Leon Foks ◽  
Richard Krahenbuhl ◽  
Yaoguo Li

Compressive inversion uses computational algorithms that decrease the time and storage requirements of a traditional inverse problem. Most compression approaches focus on the model domain, and very few, other than traditional downsampling, focus on the data domain for potential-field applications. To extend compression into the data domain, we have developed a direct and practical approach to the adaptive downsampling of potential-field data for large inversion problems. The approach is formulated to significantly reduce the quantity of data in relatively smooth or quiet regions of the data set while preserving the signal anomalies that contain the relevant target information. Two major benefits arise from this form of compressive inversion. First, because the approach compresses the problem in the data domain, it can be applied immediately without adding to or modifying existing inversion software. Second, because most industry software uses some form of model or sensitivity compression, adding this adaptive data sampling creates a complete compressive inversion methodology in which computational cost is reduced simultaneously in the model and data domains. We applied the method to a synthetic magnetic data set and two large field magnetic data sets; however, the method is also applicable to other data types. Our results showed that the relevant model information is maintained after inversion despite using only 1%–5% of the data.
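The core idea, thinning quiet regions while keeping anomalies, can be illustrated on a 1-D profile. The following Python sketch is hypothetical: the abstract does not give the selection criterion, so it assumes a sample is kept when it differs from a neighbour by more than a threshold, plus a sparse regular grid elsewhere. Real survey data are 2-D grids and the authors' rule may differ.

```python
def adaptive_downsample(values, threshold, stride):
    """Return indices of samples to keep from a 1-D potential-field profile.

    A sample is kept if it differs from either neighbour by more than
    `threshold` (an anomaly), or if it falls on a sparse regular grid of
    spacing `stride` (background coverage in smooth regions).
    """
    n = len(values)
    keep = []
    for i in range(n):
        left = values[max(i - 1, 0)]
        right = values[min(i + 1, n - 1)]
        variation = max(abs(values[i] - left), abs(values[i] - right))
        if variation > threshold or i % stride == 0 or i == n - 1:
            keep.append(i)
    return keep


# Hypothetical profile: quiet background with one short anomaly.
profile = [0.0] * 20 + [5.0, 10.0, 5.0] + [0.0] * 20
kept = adaptive_downsample(profile, threshold=1.0, stride=10)
print(kept)  # [0, 10, 19, 20, 21, 22, 23, 30, 40, 42]
print(len(kept), "of", len(profile))
```

Because the output is just a subset of observation indices, the reduced data set can be handed to unmodified inversion software, which is the first benefit the abstract describes.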


2010 ◽  
Vol 4 (4) ◽  
pp. 355-363 ◽  
Author(s):  
Hiroshi Yachi ◽  
Hiroshi Tachiya

This paper proposes a calibration method for parallel mechanisms using Response Surface Methodology (RSM). RSM is a statistical approach to estimating an unknown input-output relationship from a small set of efficient data collected on the intended system. Although it is difficult to identify the locations causing positional errors in a parallel mechanism and to measure the position and posture of the output point precisely, the proposed RSM-based calibration method aims to compensate for positional and postural errors without identifying the locations that cause them, using a small yet efficient measurement data set. This study analyzes the effectiveness of the proposed method by applying it to a Stewart platform, a typical spatial 6-DOF parallel mechanism.
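Response surface methodology typically fits a low-order (often second-order) polynomial to measured data by least squares. A minimal Python sketch under simplifying assumptions: a single commanded coordinate `x`, an error model e(x) = b0 + b1·x + b2·x², and a first-order compensation that subtracts the predicted error from the target. A 6-DOF platform would need a multivariable quadratic surface per output coordinate; the function names here are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic_surface(xs, errors):
    """Least-squares fit of error = b0 + b1*x + b2*x**2 (a 1-D response surface)."""
    # Normal equations (X^T X) b = X^T y with basis [1, x, x^2].
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(e * x ** i for x, e in zip(xs, errors)) for i in range(3)]
    return solve_linear(xtx, xty)


def compensated_command(target, coeffs):
    """First-order compensation: subtract the predicted error at the target."""
    b0, b1, b2 = coeffs
    return target - (b0 + b1 * target + b2 * target ** 2)


# Hypothetical calibration data: commanded position x and measured error e(x).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
errors = [0.01 + 0.002 * x + 0.003 * x ** 2 for x in xs]
coeffs = fit_quadratic_surface(xs, errors)
print(coeffs)                            # close to [0.01, 0.002, 0.003]
print(compensated_command(1.5, coeffs))  # command adjusted for predicted error
```

The point matching the abstract is that the fitted surface compensates error from measurements alone, without identifying which joints or links cause it.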


2021 ◽  
Author(s):  
Vazken Andréassian ◽  
Léonard Santos ◽  
Torben Sonnenborg ◽  
Alban de Lavenne ◽  
Göran Lindström ◽  
...  

Hydrological models are increasingly used under evolving climatic conditions. They should therefore be evaluated for their temporal transferability (application in different time periods) and their extrapolation capacity (application beyond the range of known past conditions). In theory, the parameters of hydrological models are independent of climate. In practice, however, many published studies based on the Split-Sample Test (Klemeš, 1986) have shown that model performance decreases systematically when a model is used outside its calibration period. The RAT test proposed here aims to evaluate model robustness to a changing climate by detecting potential undesirable dependencies of hydrological model performance on climate variables. Over a long data period, the test compares the annual values of several climate variables (temperature, precipitation and aridity index) with the model bias for each year. If a significant relationship exists between a climate variable and the bias, the model is not considered robust to climate change on that catchment. The test has been compared with the Generalized Split-Sample Test (Coron et al., 2012) and gave similar results.

Here, we report on a large-scale application of the test to three hydrological models of different complexity (GR6J, HYPE, MIKE-SHE) on a data set of 352 catchments in Denmark, France and Sweden. The results show that the test behaves differently depending on the variable evaluated (temperature, precipitation or aridity) and on the hydrological characteristics of each catchment. They also show that, despite their different levels of complexity, the three models have similar robustness over the overall data set. However, they are not robust on the same catchments and are therefore not sensitive to the same hydrological characteristics. This example highlights the applicability of the RAT test regardless of the model set-up and calibration procedure, and its ability to provide a first evaluation of model robustness to climate change.

References

Coron, L., Andréassian, V., Perrin, C., Lerat, J., Vaze, J., Bourqui, M. and Hendrickx, F., 2012. Crash testing hydrological models in contrasted climate conditions: An experiment on 216 Australian catchments. Water Resour. Res., 48, W05552, doi:10.1029/2011WR011721
Klemeš, V., 1986. Operational testing of hydrological simulation models. Hydrol. Sci. J., 31, 13–24, doi:10.1080/02626668609491024
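The RAT test relates a climate variable to the model's bias year by year and flags the model if the relationship is significant. A minimal Python sketch follows, assuming relative volume bias per year and a two-sided permutation test on the Pearson correlation; the abstract does not name the bias definition or the significance test, so both are assumptions, and the function names are hypothetical.

```python
import random
from statistics import mean


def annual_relative_bias(simulated, observed):
    """Relative volume bias of one year: (sum Qsim - sum Qobs) / sum Qobs."""
    return (sum(simulated) - sum(observed)) / sum(observed)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rat_check(climate_by_year, bias_by_year, n_perm=999, alpha=0.05, seed=0):
    """Two-sided permutation test on Pearson r between a climate variable and bias.

    Returns (r, p_value, robust); robust is False when the relationship is significant.
    """
    r_obs = pearson(climate_by_year, bias_by_year)
    rng = random.Random(seed)
    shuffled = list(bias_by_year)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling the years breaks any climate-bias link under the null hypothesis.
        rng.shuffle(shuffled)
        if abs(pearson(climate_by_year, shuffled)) >= abs(r_obs):
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)
    return r_obs, p_value, p_value >= alpha


# Hypothetical 20-year record for one catchment: bias drifts with temperature.
years = list(range(20))
temperature = [8.0 + 0.05 * y for y in years]
bias = [0.01 * y for y in years]
print(rat_check(temperature, bias))  # small p-value, so not robust on this catchment
```

Running the same check separately for temperature, precipitation and aridity on every catchment gives the kind of per-variable, per-catchment picture the abstract reports.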


2021 ◽  
Author(s):  
Bernhard Schmid

The work reported here builds upon a previous pilot study by the author on ANN-enhanced flow rating (Schmid, 2020), which explored the use of electrical conductivity (EC) in addition to stage to obtain ‘better’, i.e. more accurate and robust, estimates of streamflow. Including EC is advantageous when the relationship of EC versus flow rate is not chemostatic in character. In the majority of cases, EC is indeed not chemostatic but tends to decrease with increasing discharge (so-called dilution behaviour), as reported by, e.g., Moatar et al. (2017), Weijs et al. (2013) and Tunqui Neira et al. (2020). This is also in line with the author’s experience.

The research presented here takes the neural-network-based approach a major step further by adding the temporal rate of change in stage and the direction of change in EC to the input variables, which thus comprise stage, EC, change in stage and direction of change in EC: four predictors of flow rate in total. Information on the temporal changes in both flow rate and EC helps the Artificial Neural Network (ANN) characterize hysteretic behaviour, in which EC takes different values for rising and falling flow rates, as described, for instance, by Singley et al. (2017).

The ANN employed is a Multilayer Perceptron (MLP) with stage, EC, change in stage and direction of change in EC from the Mödling data set (Schmid, 2020) as input variables. The Mödling brook is a small Austrian stream whose catchment is of fairly mixed composition (forests, agricultural and urbanized areas), and its EC-versus-flow relationship reflects dilution behaviour. The 4-5-1 network configuration (the 4 input variables above, 5 hidden nodes and discharge as the single output) with learning rate 0.05 and momentum 0.15 performed best: after 100,000 epochs, its average testing RMSE (root mean square error) on the scaled output was 0.0138, compared with 0.0216 for the best-performing 2-5-1 MLP using only stage and EC as inputs.

References

Moatar, F., Abbott, B.W., Minaudo, C., Curie, F. and Pinay, G.: Elemental properties, hydrology, and biology interact to shape concentration-discharge curves for carbon, nutrients, sediment and major ions. Water Resour. Res., 53, 1270–1287, 2017.
Schmid, B.H.: Enhanced flow rating using neural networks with water stage and electrical conductivity as predictors. EGU2020-1804, EGU General Assembly 2020.
Singley, J.G., Wlostowski, A.N., Bergstrom, A.J., Sokol, E.R., Torrens, C.L., Jaros, C., Wilson, C.E., Hendrickson, P.J. and Gooseff, M.N.: Characterizing hyporheic exchange processes using high-frequency electrical conductivity-discharge relationships on subhourly to interannual timescales. Water Resour. Res., 53, 4124–4141, 2017.
Tunqui Neira, J.M., Andréassian, V., Tallec, G. and Mouchel, J.-M.: A two-sided affine power scaling relationship to represent the concentration-discharge relationship. Hydrol. Earth Syst. Sci., 24, 1823–1830, 2020.
Weijs, S.V., Mutzner, R. and Parlange, M.B.: Could electrical conductivity replace water level in rating curves for alpine streams? Water Resour. Res., 49, 343–351, 2013.
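The input construction and the 4-5-1 network described above can be sketched in plain Python. The topology, learning rate 0.05 and momentum 0.15 come from the abstract; the sigmoid hidden layer, linear output, uniform weight initialisation, per-sample updates and the definition of "direction of change" as the sign of the EC difference are assumptions. All names are hypothetical, and inputs are expected to be scaled beforehand, as the abstract's "scaled output" suggests.

```python
import math
import random


def make_features(stage, ec):
    """Rows of [stage, EC, change in stage, direction of change in EC] for t >= 1."""
    rows = []
    for t in range(1, len(stage)):
        d_ec = ec[t] - ec[t - 1]
        direction = (d_ec > 0) - (d_ec < 0)  # +1 rising, -1 falling, 0 unchanged
        rows.append([stage[t], ec[t], stage[t] - stage[t - 1], direction])
    return rows


class MLP:
    """n_in-n_hidden-1 perceptron: sigmoid hidden layer, linear output,
    per-sample gradient descent with momentum."""

    def __init__(self, n_in=4, n_hidden=5, lr=0.05, momentum=0.15, seed=0):
        rng = random.Random(seed)
        self.lr, self.momentum = lr, momentum
        # Each weight row ends with a bias term.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.v1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.v2 = [0.0] * (n_hidden + 1)

    def _hidden(self, x):
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + row[-1])))
                for row in self.w1]

    def _output(self, h):
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.w2[-1]

    def predict(self, x):
        return self._output(self._hidden(x))

    def train_step(self, x, y):
        """One backpropagation update on the squared error 0.5 * (y_hat - y)**2."""
        h = self._hidden(x)
        err = self._output(h) - y
        inputs = list(x) + [1.0]
        # Hidden-layer update uses the output weights from before this step.
        for j, hj in enumerate(h):
            delta = err * self.w2[j] * hj * (1.0 - hj)
            for i, xi in enumerate(inputs):
                self.v1[j][i] = self.momentum * self.v1[j][i] - self.lr * delta * xi
                self.w1[j][i] += self.v1[j][i]
        for j, hj in enumerate(h + [1.0]):
            self.v2[j] = self.momentum * self.v2[j] - self.lr * err * hj
            self.w2[j] += self.v2[j]


def rmse(model, xs, ys):
    return math.sqrt(sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys))
```

Usage would be `model = MLP()`, followed by repeated passes of `model.train_step(x, y)` over the scaled training rows and `rmse(model, test_xs, test_ys)` on held-out rows; the reported 100,000 epochs correspond to that many passes.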


Author(s):  
Li Yang ◽  
Qi Wang ◽  
Yu Rao

Film cooling is an important and widely used technology for protecting the hot sections of gas turbines. The last decades witnessed fast growth in film cooling research and publications. However, apart from the correlations for single-row film cooling and the Sellers correlation for cooling superposition, generalized models for film cooling under superposition conditions were rare. Meanwhile, the numerous data obtained for complex hole distributions were not merged or integrated across different sources, and recent new data had no avenue to contribute to a compatible model. The technical barriers that obstructed the generalization of film cooling models were: a) the lack of a generalizable model; b) the large number of input variables needed to describe film cooling. The present study aimed to establish a generalizable model describing multiple-row film cooling over a large parameter space, including hole locations, hole size, hole angles, blowing ratios, etc. The method allowed data measured over different streamwise lengths and surface areas to be integrated into a single model in the form of 1-D sequences. A Long Short-Term Memory (LSTM) model was designed to model the local behavior of film cooling, and it was carefully trained, tested and validated. The presented results showed that the method was accurate within the CFD data set generated in this study. The presented method could serve as a base model that allows past and future film cooling research to contribute to a common database. The model could also be transferred from simulation data sets to experimental data sets using advanced machine learning algorithms in the future.
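The Sellers superposition mentioned above is usually written as η = 1 − Π(1 − η_i), where η_i is the adiabatic effectiveness that row i would produce alone at the location of interest. A minimal Python sketch of that rule follows; the function name is hypothetical, and the LSTM model itself is not shown.

```python
from math import prod


def sellers_superposition(row_effectiveness):
    """Overall adiabatic film-cooling effectiveness from several rows (Sellers method).

    Each row is assumed to cool the fraction of the driving temperature
    difference left uncooled by the rows upstream of it.
    """
    return 1.0 - prod(1.0 - eta for eta in row_effectiveness)


# Two rows with single-row effectiveness 0.3 and 0.5 at the same location.
print(sellers_superposition([0.3, 0.5]))  # about 0.65
```

The rule treats the rows as acting independently of one another, so it needs only single-row data as input.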


BMJ Open ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. e037161
Author(s):  
Hyunmin Ahn

Objectives We investigated the usefulness of machine learning artificial intelligence (AI) in classifying the severity of ophthalmic emergencies to support timely hospital visits. Study design This retrospective study analysed patients who first visited the Armed Forces Daegu Hospital between May and December 2019. General patient information, events and symptoms were the input variables. Events, symptoms, diagnoses and treatments were the output variables, which were classified into four classes (red, orange, yellow and green, ranging from immediate emergency to non-emergency cases). A class-balanced validation set of about 200 cases was randomly selected before any training. An ensemble AI model combining fully connected neural networks with the synthetic minority oversampling technique (SMOTE) was adopted. Participants A total of 1681 patients were included. Major outcomes Model performance was evaluated using accuracy, precision, recall and F1 scores. Results The accuracy of the model was 99.05%. The precision for the red, orange, yellow and green classes was 100%, 98.10%, 92.73% and 100%, respectively. The recall for each class was 100%, 100%, 98.08% and 95.33%, and the F1 scores were 100%, 99.04%, 95.33% and 96.00%. Conclusions We provided support for an AI method to classify ophthalmic emergency severity based on symptoms.
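The results are reported as overall accuracy plus per-class precision, recall and F1. A minimal Python sketch computing these from a confusion matrix follows, assuming rows are true classes and columns are predicted classes; the matrix in the example is made up and is not the study's data.

```python
def per_class_metrics(confusion):
    """Accuracy and per-class (precision, recall, F1) from a square confusion matrix.

    confusion[i][j] is the number of cases of true class i predicted as class j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column total
        actual = sum(confusion[k])                          # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics.append((precision, recall, f1))
    accuracy = sum(confusion[k][k] for k in range(n)) / sum(map(sum, confusion))
    return accuracy, metrics


# Hypothetical two-class confusion matrix.
accuracy, metrics = per_class_metrics([[8, 2], [1, 9]])
print(accuracy)  # 0.85
print(metrics)
```

As a check on the formula F1 = 2PR / (P + R), the reported yellow-class precision (92.73%) and recall (98.08%) give an F1 of about 95.33%, matching the abstract.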


2018 ◽  
Vol 218 ◽  
pp. 01007 ◽  
Author(s):  
Erwin Nashrullah ◽  
Abdul Halim

Analysing and simulating the dynamic behaviour of a home power system as part of a community-based energy system requires a load model of either aggregate or disaggregated power use. Moreover, in the context of home energy efficiency, a specific and accurate residential load model can help system designers develop tools for reducing energy consumption effectively. In this paper, a new method for developing two types of residential polynomial load model is presented. The model parameters are computed using a median filter and least-squares estimation, implemented in MATLAB. We use the AMPds data set, which has 1-minute data sampling, to show the effectiveness of the proposed method. After simulation, the model's performance is evaluated by the root-mean-squared error between the original data and the model output. The simulation results indicate that the proposed model is adequate for helping system designers analyse home energy use.
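The abstract names a median filter followed by least-squares estimation, evaluated by RMSE, but not the polynomial form. A minimal Python sketch, assuming the common ZIP polynomial load model P/P0 = a_Z·(V/V0)² + a_I·(V/V0) + a_P; the paper's two model types may differ, the function names are hypothetical, and Python stands in for the MATLAB implementation only for illustration.

```python
from math import sqrt
from statistics import median


def median_filter(signal, window=3):
    """Replace each sample by the median of a centred window (edges truncated)."""
    half = window // 2
    return [median(signal[max(0, i - half):i + half + 1]) for i in range(len(signal))]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_zip(voltage_pu, power_pu):
    """Least-squares fit of P = a_z*V**2 + a_i*V + a_p (per-unit quantities)."""
    basis = [[v * v, v, 1.0] for v in voltage_pu]
    xtx = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * p for r, p in zip(basis, power_pu)) for i in range(3)]
    return solve_linear(xtx, xty)


def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


print(median_filter([1.0, 1.1, 9.0, 1.2, 1.1]))  # spike at index 2 suppressed
voltage = [0.90, 0.95, 1.00, 1.05, 1.10]
power = [0.3 * v * v + 0.5 * v + 0.2 for v in voltage]
print(fit_zip(voltage, power))  # close to [0.3, 0.5, 0.2]
```

In a real pipeline the 1-minute power readings would be median-filtered first, then fitted, and `rmse` would compare the filtered or original data with the model output.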


2020 ◽  
Vol 48 (5) ◽  
pp. 030006052091922
Author(s):  
Qiao Yang ◽  
Xian Zhong Jiang ◽  
Yong Fen Zhu ◽  
Fang Fang Lv

Objective We aimed to analyze the risk factors for, and to establish a predictive tool for, bloodstream infections (BSI) in patients with cirrhosis. Methods A total of 2888 patients with cirrhosis were retrospectively included. Risk factors for BSI were tested in a multivariate analysis using logistic regression, and the multivariate logistic regression model was validated using five-fold cross-validation. Results Variables independently associated with a higher incidence of BSI were white blood cell count (odds ratio [OR] = 1.094, 95% confidence interval [CI] 1.063–1.127), C-reactive protein (OR = 1.005, 95% CI 1.002–1.008), total bilirubin (OR = 1.003, 95% CI 1.002–1.004) and previous antimicrobial exposure (OR = 4.556, 95% CI 3.369–6.160); albumin (OR = 0.904, 95% CI 0.883–0.926), platelet count (OR = 0.996, 95% CI 0.994–0.998) and serum creatinine (OR = 0.989, 95% CI 0.985–0.994) were associated with lower odds of BSI. The area under the receiver operating characteristic (ROC) curve of the risk assessment scale was 0.850, with a sensitivity of 0.762 and a specificity of 0.801. There was no significant difference between the ROC curves from cross-validation and from the risk assessment scale. Conclusions We developed a predictive tool for BSI in patients with cirrhosis that could help identify such episodes early, at admission, and so improve outcomes in these patients.
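The results give odds ratios with 95% confidence intervals from logistic regression. Under the usual Wald construction (an assumption, since the abstract does not say how the intervals were computed), OR = exp(β) and the interval is exp(β ± 1.96·SE). The Python sketch below backs out SE from the reported interval for previous antimicrobial exposure and rebuilds the interval. `bsi_probability` is a hypothetical helper: the abstract reports no intercept or coefficients for the scale.

```python
from math import exp, log


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic-regression coefficient."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Back out the coefficient's standard error from a reported OR interval."""
    return (log(upper) - log(lower)) / (2 * z)


def bsi_probability(intercept, coefficients, values):
    """Predicted probability from a fitted logistic model: 1 / (1 + e^-(b0 + sum b_i x_i))."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + exp(-eta))


# Previous antimicrobial exposure: OR = 4.556, 95% CI 3.369-6.160 (from the abstract).
beta = log(4.556)
se = se_from_ci(3.369, 6.160)
print(odds_ratio_ci(beta, se))  # approximately (4.556, 3.369, 6.160)
```

Because each OR is per unit of the predictor, a small OR such as 1.005 for C-reactive protein can still matter when the predictor spans a wide range of values.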

