surrogate variables

Molecules ◽  
2021 ◽  
Vol 26 (21) ◽  
pp. 6357
Author(s):  
Fabricio A. Chiappini ◽  
Mirta R. Alcaraz ◽  
Graciela M. Escandar ◽  
Héctor C. Goicoechea ◽  
Alejandro C. Olivieri

In this review, recent advances and applications of multi-way calibration protocols based on the processing of multi-dimensional chromatographic data are discussed. We first describe the various modes in which multi-way chromatographic data sets can be generated, including some important characteristics that should be taken into account when selecting an adequate data processing model. We then discuss the different ways in which the collected instrumental data can be arranged, and the most commonly applied models and algorithms for decomposing the data arrays. This decomposition yields estimates of surrogate variables (scores), which are useful for analyte quantitation in the presence of uncalibrated interferences, thereby achieving the second-order advantage. Recent experimental reports based on multi-way liquid and gas chromatographic data are then reviewed. Finally, analytical figures of merit that should always accompany quantitative calibration reports are described.


Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1210
Author(s):  
Elzbieta Turska ◽  
Szymon Jurga ◽  
Jaroslaw Piskorski

We apply tree-based classification algorithms, namely classification trees (using the rpart algorithm), random forests, and XGBoost, to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which are a large amount of missing data and heavy class imbalance (there are few severe mood disorder cases). We find that all algorithms are specific, but only the rpart algorithm is sensitive; i.e., it is able to detect real cases of mood disorder. We conclude that this is because the rpart algorithm uses surrogate variables to handle missing data. The most important social-studies-related result is that the adolescents’ relationships with their parents are the single most important factor in developing mood disorders, far more important than other factors such as socio-economic status or school success.
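
The role of surrogate variables in handling missing data can be illustrated with a minimal sketch. It is not the rpart implementation: the data, the thresholds, and the function names (`agreement`, `best_surrogate_threshold`, `go_left`) are hypothetical, and details of real surrogate splits (such as reversed split directions or ranking several surrogates) are left out. The idea shown is only that a second variable is tuned to mimic the primary split and is consulted when the primary value is missing.

```python
# Toy illustration of a surrogate split; None marks a missing value.

def agreement(primary, surrogate, p_thr, s_thr):
    """Fraction of rows with both values observed that both splits send the same way."""
    pairs = [(p, s) for p, s in zip(primary, surrogate)
             if p is not None and s is not None]
    same = sum((p < p_thr) == (s < s_thr) for p, s in pairs)
    return same / len(pairs)

def best_surrogate_threshold(primary, surrogate, p_thr):
    """Pick the surrogate threshold that best reproduces the primary split."""
    candidates = sorted({s for s in surrogate if s is not None})
    return max(candidates, key=lambda t: agreement(primary, surrogate, p_thr, t))

def go_left(p, s, p_thr, s_thr):
    """Route one row: use the primary variable if observed, otherwise the surrogate."""
    if p is not None:
        return p < p_thr
    return s < s_thr

# Hypothetical data: the primary variable is missing for the last two rows.
primary = [1.0, 2.0, 3.0, 4.0, None, None]
surrogate = [10, 20, 30, 40, 15, 35]
p_thr = 2.5  # assumed primary split: "primary < 2.5 goes left"

s_thr = best_surrogate_threshold(primary, surrogate, p_thr)
routes = [go_left(p, s, p_thr, s_thr) for p, s in zip(primary, surrogate)]
```

Rows with a missing primary value are still routed through the tree instead of being dropped or imputed, which is the behaviour the abstract credits for the higher sensitivity.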


Tourism ◽  
2020 ◽  
Vol 68 (4) ◽  
pp. 389-401
Author(s):  
Silvana Astudillo ◽  
Ana Serrano ◽  
Diana López ◽  
Barbara Sofía Pasaco González

Airbnb, present in Ecuador, is a platform that since 2008 has offered a new lodging concept best described as a sharing economy model based on the rental of private rooms and apartments. The article provides an overview of Airbnb’s activities in Ecuador, specifically in 22 cities, each the capital of one of 22 provinces, using the 16 metrics available on the AirDNA platform. Factor analysis was applied to reduce the number of variables to three main surrogate variables (lodging typology, prices and rates, market metrics) that characterize Airbnb and retain the original factor variability. Additionally, based on the occupancy frequency of Airbnb’s rental places, cluster analysis made it possible to group the cities in which Airbnb is active according to the following indicators: Amazon destinations, traditions, sun and beach, nature culture and events, and the country’s capital. The research provided a clear image of Airbnb’s approach and impact on the formal accommodation sector, which ultimately will enable the sector to come up with innovative products to compete more efficiently with Airbnb’s market range.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Antonio Federico ◽  
Veera Hautanen ◽  
Nils Christian ◽  
Andreas Kremer ◽  
Angela Serra ◽  
...  

We present manually curated transcriptomics data of psoriasis and atopic dermatitis patients retrieved from the NCBI Gene Expression Omnibus and EBI ArrayExpress repositories. We collected 39 transcriptomics datasets, deriving from DNA microarrays and RNA-Sequencing technologies, for a total of 1677 samples. We provide quality-checked, homogenised and preprocessed gene expression matrices and their corresponding metadata tables along with the estimated surrogate variables. These data represent a ready-made valuable source of knowledge for translational researchers in the dermatology field.


2020 ◽  
Vol 4 (s1) ◽  
pp. 112-113
Author(s):  
Nikolay A. Ivanov ◽  
Nadia Dahmane ◽  
Jeffrey P. Greenfield ◽  
Christopher E. Mason

OBJECTIVES/GOALS: It has been previously shown that pediatric high-grade glioma (pHGG) survival differs between sexes. We set out to determine whether there are sex-specific differences in the genomic landscapes of pHGG that may underlie this sex disparity. METHODS/STUDY POPULATION: We downloaded Illumina 450k DNAm data from ArrayExpress and Gene Expression Omnibus. The minfi package was used to process raw DNAm data. Sex chromosomes and CpGs that are common SNPs were removed. Surrogate variables (SVs) were estimated via the sva Bioconductor package. Differentially methylated CpGs were identified by fitting a multiple linear regression model for the DNAm level at each CpG, with independent variables being sex (a binary variable) and the estimated SVs. RNAseq data were downloaded from Cavatica, and differential gene expression analysis was carried out via the DESeq2 package. RESULTS/ANTICIPATED RESULTS: In the pediatric glioblastoma (GBM) DNAm data [58 female & 91 male IDH wt samples; ages 0.1–21 yrs], we found 7,371 differentially methylated cytosines (DMCs) at FDR≤0.05. Of the DMCs, 289 had DNAm differences between male and female samples ≥10%. The majority of probes (68%) were in CpG islands, shelves, or shores. We also found 4 differentially methylated regions (DMRs) between sexes (FWER≤0.1). In the adult GBM DNAm samples [32 F & 32 M IDH wt samples; ages 22–75 yrs], we found only 117 DMCs at FDR≤0.05, and no DMRs. In the RNAseq dataset [68 F & 54 M pHGG samples, ages 0.08–30.6 yrs], we found 383 differentially expressed genes (at FDR≤0.05), and 16 of them (4%) overlapped a DMC. DISCUSSION/SIGNIFICANCE OF IMPACT: Our findings demonstrate that pHGG exhibits sex-specific methylome differences. Interestingly, this difference is greater in the pediatric population as compared to adults. The pHGG transcriptome also differs by sex, which may be related to differential DNAm in a minority of cases.
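
The per-CpG regression on sex plus estimated SVs described in the methods can be sketched as follows. This is a deliberately simplified stand-in, not the algorithm of the sva package: here the SVs are taken as the leading singular vectors of the residuals left after regressing out the known design. The simulated data, the hidden `batch` factor, and the function names are assumptions made for illustration only.

```python
import numpy as np

def estimate_svs(Y, design, n_sv):
    """Simplified SV estimate: top left singular vectors of the residual matrix.

    Y is samples x features; design is samples x p (known covariates).
    """
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    resid = Y - design @ beta
    U, _, _ = np.linalg.svd(resid, full_matrices=False)
    return U[:, :n_sv]

def fit_with_svs(Y, design, svs):
    """Fit one linear model per feature with the design columns plus the SVs."""
    X = np.column_stack([design, svs])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef  # rows: design columns, then SVs; columns: features

# Simulated data (hypothetical): 40 samples, 200 features, one hidden batch factor.
rng = np.random.default_rng(0)
n, m = 40, 200
sex = np.tile([0.0, 1.0], n // 2)                # binary variable of interest
batch = np.tile([0.0, 0.0, 1.0, 1.0], n // 4)    # unobserved nuisance factor
design = np.column_stack([np.ones(n), sex])      # intercept + sex

effect = np.zeros(m)
effect[:10] = 2.0                                # only the first 10 features differ by sex
loadings = rng.normal(size=m)                    # batch effect per feature
Y = np.outer(sex, effect) + np.outer(batch, loadings) + 0.3 * rng.normal(size=(n, m))

svs = estimate_svs(Y, design, n_sv=1)
coef = fit_with_svs(Y, design, svs)
sex_effect = coef[1]                             # estimated sex coefficient per feature
```

In this toy setting the single estimated SV tracks the hidden batch factor, so including it in each per-feature model absorbs that unwanted variation while the sex coefficient is estimated.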


2019 ◽  
Vol 35 (19) ◽  
pp. 3663-3671
Author(s):  
Stephan Seifert ◽  
Sven Gundlach ◽  
Silke Szymczak

Motivation: It has been shown that the machine learning approach random forest can be successfully applied to omics data, such as gene expression data, for classification or regression and to select variables that are important for prediction. However, the complex relationships between predictor variables, in particular between causal predictor variables, make the interpretation of currently applied variable selection techniques difficult. Results: Here we propose a new variable selection approach called surrogate minimal depth (SMD) that incorporates surrogate variables into the concept of minimal depth (MD) variable importance. Applying SMD, we show that simulated correlation patterns can be reconstructed and that the increased consideration of variable relationships improves variable selection. When compared with existing state-of-the-art methods and MD, SMD has higher empirical power to identify causal variables while the resulting variable lists are equally stable. In conclusion, SMD is a promising approach to get more insight into the complex interplay of predictor variables and outcome in a high-dimensional data setting. Availability and implementation: https://github.com/StephanSeifert/SurrogateMinimalDepth. Supplementary information: Supplementary data are available at Bioinformatics online.
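
The difference between MD and SMD can be illustrated on a single toy tree. This sketch is not the SurrogateMinimalDepth package: the nested-dict tree format, the variable names, and the convention that the root has depth 0 are assumptions. It assumes that under MD a variable's depth is that of its first primary split, while under SMD an appearance as a surrogate for a split also counts.

```python
def minimal_depths(tree, use_surrogates=False, depth=0, out=None):
    """Return {variable: smallest depth at which it appears} for one tree.

    With use_surrogates=False only primary split variables count (MD-like);
    with use_surrogates=True surrogate variables of a node count too (SMD-like).
    """
    out = {} if out is None else out
    if tree is None:  # reached a leaf
        return out
    names = [tree["var"]]
    if use_surrogates:
        names += tree.get("surrogates", [])
    for name in names:
        out[name] = min(out.get(name, depth), depth)
    minimal_depths(tree.get("left"), use_surrogates, depth + 1, out)
    minimal_depths(tree.get("right"), use_surrogates, depth + 1, out)
    return out

# Hypothetical tree: g2 is a surrogate for the root split on g1.
toy_tree = {
    "var": "g1", "surrogates": ["g2"],
    "left": {"var": "g3", "surrogates": []},
    "right": {"var": "g2", "surrogates": ["g4"]},
}

md = minimal_depths(toy_tree)
smd = minimal_depths(toy_tree, use_surrogates=True)
```

Here g2 moves from depth 1 to depth 0 once surrogates are counted, and g4, which is never a primary split variable, receives a depth at all. In a forest these per-tree depths would be averaged over trees before selecting variables.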


2018 ◽  
Vol 315 (5) ◽  
pp. R945-R962
Author(s):  
Peter Bie

The classical concepts of human sodium balance include 1) a total pool of Na+ of ≈4,200 mmol (total body sodium, TBS) distributed primarily in the extracellular fluid (ECV) and bone, 2) intake variations of 0.03 to ≈6 mmol·kg body mass⁻¹·day⁻¹, 3) asymptotic transitions between steady states with a halftime (T½) of 21 h, 4) changes in TBS driven by sodium intake measuring ≈1.3 day [ΔTBS/Δ(Na+ intake/day)], 5) adjustment of Na+ excretion to match any diet thus providing metabolic steady state, and 6) regulation of TBS via controlled excretion (90–95% renal) mediated by surrogate variables. The present focus areas include 1) uneven, nonosmotic distribution of increments in TBS primarily in “skin,” 2) long-term instability of TBS during constant Na+ intake, and 3) physiological regulation of renal Na+ excretion primarily by neurohumoral mechanisms dependent on ECV rather than arterial pressure. Under physiological conditions 1) the nonosmotic distribution of Na+ seems conceptually important, but quantitatively ill defined; 2) long-term variations in TBS represent significant deviations from steady state, but the importance is undetermined; and 3) the neurohumoral mechanisms of sodium homeostasis competing with pressure natriuresis are essential for systematic analysis of short-term and long-term regulation of TBS. Sodium homeostasis and blood pressure regulation are intimately related. Real progress is slow and will accelerate only through recognition of the present level of ignorance. Nonosmotic distribution of sodium, pressure natriuresis, and volume-mediated regulation of renal sodium excretion are essential intertwined concepts in need of clear definitions, conscious models, and future attention.
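
The half-time in item 3 and the ratio in item 4 are mutually consistent if one assumes a single-compartment, first-order model, which the abstract does not state explicitly. In the sketch below, the symbols $I$ (sodium intake rate) and $k$ (fractional excretion rate constant) are introduced only for illustration.

```latex
\begin{align*}
\frac{d\,\mathrm{TBS}}{dt} &= I - k\,\mathrm{TBS},
  \qquad k = \frac{\ln 2}{T_{1/2}} \\
\mathrm{TBS}_{\mathrm{ss}} &= \frac{I}{k}
  \;\Rightarrow\;
  \frac{\Delta \mathrm{TBS}}{\Delta I} = \frac{1}{k} = \frac{T_{1/2}}{\ln 2} \\
&= \frac{21\ \mathrm{h}}{0.693} \approx 30.3\ \mathrm{h} \approx 1.26\ \mathrm{day}
\end{align*}
```

Under that assumption, a half-time of 21 h corresponds to a time constant of about 1.26 day, in line with the ≈1.3 day quoted for ΔTBS/Δ(Na+ intake/day).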


2018 ◽  
Vol 4 (11) ◽  
pp. 125
Author(s):  
Christos Diou ◽  
Pantelis Lelekas ◽  
Anastasios Delopoulos

(1) Background: Evidence-based policymaking requires data about the local population’s socioeconomic status (SES) at a detailed geographical level; however, such information is often not available or is too expensive to acquire. Researchers have proposed estimating SES indicators by analyzing Google Street View images; however, these methods are also resource-intensive, since they require large volumes of manually labeled training data. (2) Methods: We propose a methodology for automatically computing surrogate variables of SES indicators using street images of parked cars and deep multiple instance learning. Our approach does not require any manually created labels, apart from data already available from statistical authorities, while the entire pipeline for image acquisition, parked car detection, car classification, and surrogate variable computation is fully automated. The proposed surrogate variables are then used in linear regression models to estimate the target SES indicators. (3) Results: We implement and evaluate a model based on the proposed surrogate variable in 30 municipalities of varying SES in Greece. Our model has R² = 0.76 and a correlation coefficient of 0.874 with the true unemployment rate, while it achieves a mean absolute percentage error of 0.089 and a mean absolute error of 1.87 on a held-out test set. Similar results are also obtained for other socioeconomic indicators, related to education level and occupational prestige. (4) Conclusions: The proposed methodology can be used to estimate SES indicators at the local level automatically, using images of parked cars detected via Google Street View, without the need for any manual labeling effort.
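
The final step, regressing an SES indicator on a surrogate variable and reporting R², correlation, MAE, and MAPE, can be sketched with a one-variable linear model. The numbers below are invented for illustration and are not the paper's data; the function names are hypothetical, and the metrics are computed on the fitting data rather than on a held-out set.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def metrics(y_true, y_pred):
    """R^2, Pearson correlation, mean absolute error, mean absolute percentage error."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    ss_pred = sum((p - mp) ** 2 for p in y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    return {
        "r2": 1 - ss_res / ss_tot,
        "r": cov / sqrt(ss_tot * ss_pred),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "mape": sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n,
    }

# Hypothetical surrogate values per municipality and unemployment rates (%).
surrogate = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
unemployment = [22.0, 19.5, 18.0, 14.5, 13.0, 9.0]

slope, intercept = fit_line(surrogate, unemployment)
predicted = [slope * s + intercept for s in surrogate]
scores = metrics(unemployment, predicted)
```

For a least-squares line with an intercept, R² on the fitting data equals the squared correlation between predictions and targets, which matches the reported pair of values (0.874² ≈ 0.76).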

