variables selection
Recently Published Documents


Total documents: 108 (five years: 25)
H-index: 17 (five years: 3)

Molecules ◽  
2022 ◽  
Vol 27 (2) ◽  
pp. 335
Author(s):  
Ning Ai ◽  
Yibo Jiang ◽  
Sainab Omar ◽  
Jiawei Wang ◽  
Luyue Xia ◽  
...  

Near-infrared (NIR) spectroscopy and characteristic variables selection methods were used to develop a rapid method for determining the cellulose, hemicellulose, and lignin contents of Sargassum horneri. Baseline calibration models for the three components were established using partial least squares regression on the full set of variables (full-PLSR). Further PLSR calibration models were then built with four characteristic variables selection methods: interval partial least squares (iPLS), competitive adaptive reweighted sampling (CARS), correlation coefficient (CC), and genetic algorithm (GA). All four resulting models, namely iPLS-PLSR, CARS-PLSR, CC-PLSR, and GA-PLSR, performed better than the full-PLSR calibration model, and iPLS gave the best-performing models. For iPLS-PLSR, the coefficient of determination (R²), root mean square error (RMSE), and residual predictive deviation (RPD) of the prediction set were 0.8955, 0.8232%, and 3.0934 for cellulose; 0.8669, 0.4697%, and 2.7406 for hemicellulose; and 0.7307, 0.7533%, and 1.9272 for lignin, respectively. These findings indicate that the NIR calibration models can be used to predict cellulose, hemicellulose, and lignin contents in Sargassum horneri quickly and accurately.
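The three prediction-set statistics quoted above can be computed from paired reference and predicted values. The following is a minimal sketch, not the authors' code: the function name and the toy numbers are hypothetical, and it assumes the standard deviation in RPD uses the same divisor (n) as the RMSE. Under that assumption RPD equals 1/sqrt(1 - R²), which is consistent with the reported figures (for cellulose, 1/sqrt(1 - 0.8955) ≈ 3.093). Other conventions, such as an n - 1 divisor for the standard deviation, give slightly different RPD values.

```python
import math


def prediction_metrics(observed, predicted):
    """Return (R2, RMSE, RPD) for paired observed and predicted values.

    Assumption: both RMSE and the standard deviation used in RPD divide by n.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / n)
    rpd = sd / rmse
    return r2, rmse, rpd


# Hypothetical reference values (%) and model predictions (%)
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
r2, rmse, rpd = prediction_metrics(observed, predicted)
print(f"R2={r2:.4f} RMSE={rmse:.4f} RPD={rpd:.4f}")
```

With these toy values the residual sum of squares is 4 and the total sum of squares is 20, so R² is 0.8 and RPD is sqrt(5).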


2021 ◽  
Vol 12 (8) ◽  
pp. 2380-2497
Author(s):  
Mirele Marques Borges ◽  
Cláudio José Müller

This research investigated the stages of building a machine learning model to predict an indicator of the number of medical appointments per day in supplementary health care in the region of Porto Alegre, RS, Brazil, and proposed a metric for anomaly detection. The methodology combined a literature review with an applied case study, and the statistical software R was used to prepare the data and create the model. The stages of the case study were: database extraction; division of the data into training and test sets; creation of functions and feature engineering; variables selection and correlation analysis; choice of algorithms with cross-validation and tuning; training of the models; application of the models to the test data; selection of the best model; and proposal of the metric for anomaly detection. At the end of these stages, the best model in terms of mean absolute error (MAE) was the Random Forest, which outperformed Linear Regression and the Neural Network. Using the standard deviation metric, it was also possible to identify nine anomaly points and thirty-eight warning points. The study concluded that the feature engineering and variables selection steps were essential for creating and selecting the model, and that the proposed metric achieved its objective of generating alerts on the indicator, flagging cases with possible problems or opportunities.
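A standard-deviation alert rule of the kind described above might look like the sketch below. The abstract does not give the thresholds, so the two-sigma warning band and three-sigma anomaly band are assumptions, as are all function names and numbers. The study itself used R; Python is used here only for illustration.

```python
import statistics


def mean_absolute_error(actual, predicted):
    """MAE: the average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_bands(train_actual, train_predicted):
    """Estimate the centre and spread of residuals on a reference period."""
    residuals = [a - p for a, p in zip(train_actual, train_predicted)]
    return statistics.mean(residuals), statistics.pstdev(residuals)


def classify(actual, predicted, center, spread, warn_k=2.0, anomaly_k=3.0):
    """Label each point by how many standard deviations its residual is
    from the reference centre (warn_k and anomaly_k are assumed values)."""
    labels = []
    for a, p in zip(actual, predicted):
        z = abs((a - p) - center) / spread
        if z > anomaly_k:
            labels.append("anomaly")
        elif z > warn_k:
            labels.append("warning")
        else:
            labels.append("normal")
    return labels


# Hypothetical daily appointment counts and model predictions
center, spread = fit_bands([99, 101, 99, 101], [100, 100, 100, 100])
new_actual = [100.5, 102.5, 96.5]
new_predicted = [100.0, 100.0, 100.0]
print(classify(new_actual, new_predicted, center, spread))
print(mean_absolute_error(new_actual, new_predicted))
```

Estimating the bands on a reference period, rather than on the points being scored, keeps a single extreme value from inflating its own threshold.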


2021 ◽  
Vol 2078 (1) ◽  
pp. 012012
Author(s):  
Song Yao ◽  
Lipeng Cui ◽  
Sining Ma

In recent years, sparse models have been a research hotspot in the field of artificial intelligence. The Lasso model ignores the group structure among variables and can only select scattered individual variables, whereas Group Lasso can only select whole groups of variables. To address this problem, the Sparse Group Log Ridge model is proposed, which can select both groups of variables and individual variables within a group. The model is solved using the MM algorithm combined with the block coordinate descent algorithm. Finally, experiments show the advantages of the model in terms of variables selection and prediction.
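The contrast between variable-level and group-level selection can be seen in the penalty terms themselves. The sketch below computes the Lasso penalty, the Group Lasso penalty, and the classic sparse group lasso penalty that mixes the two. It does not implement the Sparse Group Log Ridge penalty, whose exact form is not given in the abstract; the function names and the mixing parameter `alpha` are illustrative assumptions.

```python
import math


def lasso_penalty(beta, lam):
    """L1 penalty: acts on each coefficient separately, ignoring groups."""
    return lam * sum(abs(b) for b in beta)


def group_lasso_penalty(beta, groups, lam):
    """Sum of group-wise L2 norms, weighted by sqrt(group size).

    A group's norm is zero only if every coefficient in it is zero,
    so this penalty keeps or drops whole groups.
    """
    total = 0.0
    for group in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in group))
        total += math.sqrt(len(group)) * norm
    return lam * total


def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Convex mix of both penalties (alpha=1 is Lasso, alpha=0 is Group Lasso)."""
    return (alpha * lasso_penalty(beta, lam)
            + (1.0 - alpha) * group_lasso_penalty(beta, groups, lam))


# Hypothetical coefficients: the first group is active, the second is all zero
beta = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
print(lasso_penalty(beta, 1.0))
print(group_lasso_penalty(beta, groups, 1.0))
print(sparse_group_lasso_penalty(beta, groups, 1.0, 0.5))
```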


Author(s):  
Lilian Passos Scatalon ◽  
Rogério Eduardo Garcia ◽  
Ellen Francine Barbosa

The selection of variables in a given experiment is crucial, since it is the theoretical foundation that guides how data should be collected and analyzed. However, selecting variables is an intricate activity, especially considering areas such as Software Engineering and Education, whose studies should also consider human-related variables in the design. In this scenario, we aim to investigate how a support mechanism helps in the variables selection activity of the experiment process. To do so, we conducted a preliminary study on the use of an experimental framework composed of a catalog of variables. We explored the domain of the integration of software testing into programming education. Participants were divided into two groups (ad hoc and framework support) and asked to select variables for a given experiment goal. We analyzed the results by identifying threats to validity in their experimental design drafts. Results show a significant number of threats of type inadequate explication of constructs for both groups. Nonetheless, the framework helped to increase the clarity of concepts selected as variables. The cause of most raised threats, even with the framework support, was an inaccuracy in selecting the values of such variables (i.e. treatments and fixed values).


2021 ◽  
Vol 12 (1) ◽  
pp. 75-81
Author(s):  
M. A. Khodasevich ◽  
D. A. Borisevich

The aim of the work was a multivariate calibration of the concentration of unrefined sunflower oil, considered as an adulterant, in a mixture with flaxseed oil. The study is relevant because of the need for a simple and effective method of detecting the falsification of flaxseed oil, which is superior to olive oil in its content of essential polyunsaturated fatty acids. Unlike olive oil, only a few works are devoted to identifying adulteration of flaxseed oil. Multivariate calibration was carried out using a model based on principal component analysis, cluster analysis, and projection to latent structures of absorbance spectra in the UV, visible, and near-IR ranges. Calibration used three methods for spectral variables selection: the successive projections algorithm, the searching combination moving window method, and ranking of variables by correlation coefficient. Applying the successive projections algorithm, ranking variables by correlation coefficient, and searching combination moving window reduced the root mean square error of prediction from 0.63 % for wideband projection to latent structures to 0.46 %, 0.50 %, and 0.03 %, respectively. The developed method of multivariate calibration by projection to latent structures of absorbance spectra in the UV, visible, and near-IR ranges, using spectral variables selection by searching combination moving window, is a simple and effective method of detecting adulteration of flaxseed oil.
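Of the three selection methods named above, ranking by correlation coefficient is the simplest to illustrate. The sketch below scores each spectral variable by the absolute Pearson correlation between its absorbance and the reference concentration and keeps the top-ranked ones. The function names, the cut-off `top_k`, and the toy spectra are hypothetical, and the paper's actual procedure for choosing how many variables to keep is not described in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def rank_by_correlation(spectra, target, top_k):
    """Return indices of the top_k variables with the largest |r| to target.

    spectra is a list of samples, each a list of absorbances per variable.
    """
    n_vars = len(spectra[0])
    scores = []
    for j in range(n_vars):
        column = [row[j] for row in spectra]
        scores.append((abs(pearson(column, target)), j))
    # Highest |r| first; ties broken by variable index
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scores[:top_k]]


# Hypothetical data: 4 samples, 3 spectral variables
spectra = [
    [1.0, 2.0, 1.0],
    [-1.0, 4.0, 3.0],
    [-1.0, 6.0, 2.0],
    [1.0, 8.0, 4.0],
]
concentration = [1.0, 2.0, 3.0, 4.0]
print(rank_by_correlation(spectra, concentration, top_k=2))
```

In this toy set the second variable is proportional to the concentration, the third correlates with it at r = 0.8, and the first is uncorrelated, so the ranking keeps variables 1 and 2.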


2021 ◽  
Author(s):  
Gunta Lazdane ◽  
Dace Rezeberga ◽  
Ieva Briedite ◽  
Inara Kantane ◽  
...  

Results of an anonymous online survey of people aged 18 and over living in Latvia, conducted using internationally (I-SHARE) and nationally validated questionnaires. The data include the following variables: selection, socio-demographics, social distancing measures, couple and family relationships, sexual behavior, access to condoms and contraceptives, access to reproductive health services, antenatal care, pregnancy and maternal and child health, abortion, sexual and gender-based violence, HIV/STI, mental health, and nutrition. (2021-02-08)


2020 ◽  
Vol 12 (4) ◽  
pp. 47-60
Author(s):  
Michaela Kavčáková ◽  
Kristína Kočišová

The aim of the paper is to explore the possibilities of diagnosing corporate credit risk through data envelopment analysis (DEA) and to design an appropriate model for diagnosing credit risk that can be used in different sectors of the national economy (e.g. the agricultural sector, the service sector, or the industry and innovation sector). The model differs from the conventional application of DEA in its variables selection and in its construction of the production-possibility frontier. We illustrate the application of the model on a sample of 110 randomly selected companies over the 2013-2017 period. ICT companies were chosen because this sector is considered to be a driving force behind the growth of the economy. The data were obtained from Finstat. The results identify three zones of corporate financial health with different stages of credit risk. They show that DEA achieves a satisfactory rate of correct classification into the relevant zone (financial health, grey, and financial distress zone), but also a relatively high error rate in identifying companies in financial distress.


Author(s):  
Mikhail Bazilevskii ◽  

When constructing a regression model, the primary problem faced by the researcher is that it is not clear what form the equation connecting the explained and explanatory variables should take. This initial stage of construction is called the selection of the model's structural specification. In parallel with choosing a regression specification, the question arises of which explanatory variables should be included in the equation. This is the problem of variables selection in regression models. Its essence is to single out, from the set of "candidates" for inclusion, a subset of the most informative ones based on some quality criterion. The article is devoted to the problem of variables selection in regression models estimated using ordinary least squares. A previously proposed approach to selecting a given number of variables, based on mixed 0–1 linear programming, is considered. The unknown parameters in this problem are the beta coefficients of the standardized regression and Boolean variables that determine which factors enter the model. The optimal values of the unknown parameters are found by maximizing the coefficient of determination of the regression. Unfortunately, solving this problem requires the number of selected factors to be set manually, and that number is often impossible to determine in advance. Therefore, the goal was to formalize the problem so that its solution also determines the optimal number of selected regressors. For this purpose, the adjusted coefficient of determination, which depends on the number of model factors, was used as the objective function. As a result, a mixed integer linear programming problem was formulated. The unknown parameters in it are still the beta coefficients and Boolean variables, together with an integer variable: the number of regressors.
Based on data on the prices and characteristics of sedans and hatchbacks of the American automobile industry, a computational experiment was carried out that confirmed the correctness of the developed mathematical apparatus. The problem formalized in this work as a mixed integer linear programming problem is preferable from a computational point of view to the same problem formalized in the modern scientific literature as a mixed quadratic linear programming problem.
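The selection criterion described above can be illustrated on a toy problem. The sketch below is not the mixed integer linear programming formulation from the article: it swaps in a brute-force search over all subsets, which is feasible only for a handful of candidates, and scores each subset by the adjusted coefficient of determination of an OLS fit, 1 - (1 - R²)(n - 1)/(n - m - 1) for m regressors. All function names and data are hypothetical.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_r2(rows, y, subset):
    """Adjusted R^2 of an OLS fit (with intercept) on the chosen columns."""
    n = len(y)
    design = [[1.0] + [row[j] for j in subset] for row in rows]
    p = len(subset) + 1
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - len(subset) - 1)


def best_subset(rows, y):
    """Exhaustively pick the subset of columns with the highest adjusted R^2."""
    k = len(rows[0])
    candidates = (s for size in range(1, k + 1)
                  for s in combinations(range(k), size))
    return max(candidates, key=lambda s: adjusted_r2(rows, y, s))


# Toy data: y depends on column 0 only; columns 1 and 2 carry no extra signal
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x1 = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
rows = [list(sample) for sample in zip(x0, x1, x2)]
y = [2.0 * a + e for a, e in zip(x0, noise)]
print(best_subset(rows, y))  # (0,)
```

Because the noise in this toy set is orthogonal to every candidate column, adding columns 1 or 2 does not reduce the residual sum of squares, so the adjustment for model size makes the single-regressor model the winner. This is the behaviour that lets the criterion determine the number of regressors rather than having it fixed in advance.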


IOP SciNotes ◽  
2020 ◽  
Vol 1 (1) ◽  
pp. 014201
Author(s):  
Qianqian Li ◽  
Yue Huang ◽  
Kuangda Tian
