Validating the genomic signature of pediatric septic shock

2008 ◽  
Vol 34 (1) ◽  
pp. 127-134 ◽  
Author(s):  
Natalie Cvijanovich ◽  
Thomas P. Shanley ◽  
Richard Lin ◽  
Geoffrey L. Allen ◽  
Neal J. Thomas ◽  
...  

We previously generated genome-wide expression data (microarray) from children with septic shock, data with the potential to lead the field into novel areas of investigation. Herein we seek to validate our data through a bioinformatic approach centered on a validation patient cohort. Forty-two children with a clinical diagnosis of septic shock and 15 normal controls served as the training data set, while 30 separate children with septic shock and 14 separate normal controls served as the test data set. Class prediction modeling using the training data set and the previously reported genome-wide expression signature of pediatric septic shock correctly identified 95–100% of controls and septic shock patients in the test data set, depending on the class prediction algorithm and the gene selection method. Subjecting the test data set to the same filtering strategy used for the training data set demonstrated 75% concordance between the two gene lists. Subjecting the test data set to a purely statistical filtering strategy, with highly stringent correction for multiple comparisons, demonstrated <50% concordance with the previous gene filtering strategy. However, functional analysis of this statistics-based gene list demonstrated functional annotations and signaling pathways similar to those seen in the training data set. In particular, we validated that pediatric septic shock is characterized by large-scale repression of genes related to zinc homeostasis and lymphocyte function. These data demonstrate that the previously reported genome-wide expression signature of pediatric septic shock is applicable to a validation cohort of patients.

Author(s):  
Yanxiang Yu ◽  
Chicheng Xu ◽  
Siddharth Misra ◽  
Weichang Li ◽  
...  

Compressional and shear sonic traveltime logs (DTC and DTS, respectively) are crucial for subsurface characterization and seismic-well tie. However, these two logs are often missing or incomplete in many oil and gas wells. Therefore, many petrophysical and geophysical workflows include sonic log synthesis or pseudo-log generation based on multivariate regression or rock physics relations. From March 1 to May 7, 2020, the SPWLA PDDA SIG hosted a contest aiming to predict the DTC and DTS logs from seven “easy-to-acquire” conventional logs using machine-learning methods (GitHub, 2020). In the contest, a total of 20,525 data points with half-foot resolution from three wells were collected to train regression models using machine-learning techniques. Each data point had seven features, the conventional “easy-to-acquire” logs (caliper, neutron porosity, gamma ray (GR), deep resistivity, medium resistivity, photoelectric factor, and bulk density), as well as two sonic logs (DTC and DTS) as the targets. A separate data set of 11,089 samples from a fourth well was then used as the blind test data set. The prediction performance of each model was evaluated using root mean square error (RMSE) as the metric: $\mathrm{RMSE} = \sqrt{\frac{1}{2m}\sum_{i=1}^{m}\left[\left(\mathrm{DTC}_{\mathrm{pred}}^{i}-\mathrm{DTC}_{\mathrm{true}}^{i}\right)^{2}+\left(\mathrm{DTS}_{\mathrm{pred}}^{i}-\mathrm{DTS}_{\mathrm{true}}^{i}\right)^{2}\right]}$, where $m$ is the number of test samples. In the benchmark model (Yu et al., 2020), we used a Random Forest regressor and conducted minimal preprocessing of the training data set; an RMSE score of 17.93 was achieved on the test data set. The top five models from the contest, on average, beat the performance of our benchmark model by 27% in the RMSE score. In the paper, we review these five solutions, covering their preprocessing techniques and machine-learning models, including neural networks, long short-term memory (LSTM), and ensemble trees. 
We found that data cleaning and clustering were critical for improving the performance in all models.
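
The contest metric described above pools the squared errors of both sonic logs before taking the root. A minimal sketch of that calculation follows; the function name `contest_rmse` and the sample values are illustrative assumptions, not taken from the contest code.

```python
import math


def contest_rmse(dtc_pred, dtc_true, dts_pred, dts_true):
    """Joint RMSE over DTC and DTS: sqrt(1/(2m) * sum of both squared errors).

    Hypothetical helper; argument names are assumptions for illustration.
    """
    m = len(dtc_true)
    if m == 0 or not (len(dtc_pred) == len(dts_pred) == len(dts_true) == m):
        raise ValueError("all four sequences must be non-empty and equally long")
    squared_error = sum(
        (pc - tc) ** 2 + (ps - ts) ** 2
        for pc, tc, ps, ts in zip(dtc_pred, dtc_true, dts_pred, dts_true)
    )
    # Divide by 2m because each sample contributes two targets (DTC and DTS).
    return math.sqrt(squared_error / (2 * m))


# Made-up values, only to show the call shape.
score = contest_rmse([80.0, 95.0], [82.0, 93.0], [140.0, 170.0], [150.0, 160.0])
```

Because both logs share one score, a large error on DTS (which typically has larger values) can dominate the metric even when DTC is predicted well.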


2021 ◽  
Author(s):  
Louise Bloch ◽  
Christoph M. Friedrich

Abstract Background: Predicting whether subjects with Mild Cognitive Impairment (MCI) will prospectively develop Alzheimer's Disease (AD) is important for the recruitment and monitoring of subjects for therapy studies. Machine Learning (ML) is suitable to improve early AD prediction. The etiology of AD is heterogeneous, which leads to noisy data sets. Additional noise is introduced by multicentric study designs and varying acquisition protocols. This article examines whether an automatic and fair data valuation method based on Shapley values can identify subjects with noisy data. Methods: An ML workflow was developed and trained on a subset of the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Validation was carried out on an independent ADNI test data set and on the Australian Imaging, Biomarker and Lifestyle Flagship Study of Ageing (AIBL) cohort. The workflow included volumetric Magnetic Resonance Imaging (MRI) feature extraction, subject sample selection using data Shapley, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) for model training, and Kernel SHapley Additive exPlanations (SHAP) values for model interpretation. This model interpretation enables clinically relevant explanation of individual predictions. Results: On the independent ADNI test data set, the XGBoost models trained on the entire training data set reached a mean classification accuracy of 58.54 %. The XGBoost models that excluded 116 of the 467 subjects from the training data set based on their Logistic Regression (LR) data Shapley values outperformed these models by 14.13 % (8.27 percentage points). On the AIBL data set, the XGBoost models trained on the entire training data set reached a mean accuracy of 60.35 %. An improvement of 24.86 % (15.00 percentage points) was reached for the XGBoost models when the 72 subjects with the smallest RF data Shapley values were excluded from the training data set. 
Conclusion: The data Shapley method was able to improve the classification accuracies for the test data sets. Noisy data was associated with the number of ApoEϵ4 alleles and volumetric MRI measurements. Kernel SHAP showed that the black-box models learned biologically plausible associations.
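
The results above quote each gain twice, once as a relative improvement and once in percentage points. The short sketch below shows how the two figures relate, using only the accuracies stated in the abstract; the helper name `relative_gain_percent` is an assumption for illustration.

```python
def relative_gain_percent(baseline_accuracy, gain_in_points):
    """Relative improvement (%) implied by an absolute gain in percentage points."""
    return 100.0 * gain_in_points / baseline_accuracy


# ADNI test set: baseline 58.54 %, gain of 8.27 percentage points.
adni_relative = relative_gain_percent(58.54, 8.27)
adni_improved_accuracy = 58.54 + 8.27

# AIBL data set: baseline 60.35 %, gain of 15.00 percentage points.
aibl_relative = relative_gain_percent(60.35, 15.00)
aibl_improved_accuracy = 60.35 + 15.00

print(round(adni_relative, 2), round(aibl_relative, 2))
```

Rounded to two decimals, the relative gains reproduce the 14.13 % and 24.86 % quoted in the abstract, which implies improved mean accuracies of roughly 66.81 % (ADNI) and 75.35 % (AIBL).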


2012 ◽  
Vol 51 (01) ◽  
pp. 39-44 ◽  
Author(s):  
K. Matsuoka ◽  
K. Yoshino

Summary Objectives: The aim of this study is to present a method of assessing psychological tension that is optimized for each individual on the basis of heart rate variability (HRV) data which, to eliminate the influence of inter-individual variability, are measured over a long period of daily life. Methods: HRV and body accelerations were recorded from nine normal subjects for two months of normal daily life. Fourteen HRV indices were calculated from the 512 seconds of HRV data prior to each mental tension level report. Data to be analyzed were limited to those with body accelerations of 30 mG (0.294 m/s²) and lower. Further, the differences from the reference values in the same time zone were calculated for both the mental tension score (Δtension) and the HRV index values (ΔHRVI). A multiple linear regression model that estimates Δtension from the principal component scores of ΔHRVI was then constructed for each individual. The data were divided into a training data set and a test data set in accordance with the twofold cross-validation method. Multiple linear regression coefficients were determined using the training data set, and the generalization capability of the optimized model was checked using the test data set. Results: The subjects’ mean Pearson correlation coefficient was 0.52 with the training data set and 0.40 with the test data set. The subjects’ mean coefficient of determination was 0.28 with the training data set and 0.11 with the test data set. Conclusion: We proposed a method of assessing psychological tension that is optimized for each individual based on HRV data measured over a long period of daily life.
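
The evaluation scheme described above (fit on one half, score on the other, then swap) can be sketched as follows. This is a simplified illustration, not the study's method: it uses a single predictor instead of multiple regression on principal component scores, and all function names are assumptions.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson_r(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def twofold_cv(x, y, seed=0):
    """Return (pearson_r, r_squared) on the held-out half for each of two folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    folds = [idx[:half], idx[half:]]
    results = []
    for k in (0, 1):
        test, train = folds[k], folds[1 - k]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        predicted = [a + b * x[i] for i in test]
        observed = [y[i] for i in test]
        results.append((pearson_r(predicted, observed), r_squared(observed, predicted)))
    return results
```

On held-out data the coefficient of determination is not simply the square of the correlation, since predictions can be biased or mis-scaled; this is consistent with the test-set figures above, where 0.11 is lower than 0.40² = 0.16.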


2020 ◽  
Vol 58 (8) ◽  
pp. 1667-1679
Author(s):  
Benedikt Franke ◽  
J. Weese ◽  
I. Waechter-Stehle ◽  
J. Brüning ◽  
T. Kuehne ◽  
...  

Abstract The transvalvular pressure gradient (TPG) is commonly estimated using the Bernoulli equation. However, the method is known to be inaccurate. Therefore, an adjusted Bernoulli model for accurate TPG assessment was developed and evaluated. Numerical simulations were used to calculate TPG_CFD in patient-specific geometries of aortic stenosis as ground truth. Geometries, aortic valve areas (AVA), and flow rates were derived from computed tomography scans. Simulations were divided into a training data set (135 cases) and a test data set (36 cases). The training data were used to fit an adjusted Bernoulli model as a function of AVA and flow rate. The model-predicted TPG_Model was evaluated using the test data set and also compared against the common Bernoulli equation (TPG_B). TPG_B and TPG_Model both correlated well with TPG_CFD (r > 0.94), but significantly overestimated it. The average difference between TPG_Model and TPG_CFD was much lower: 3.3 mmHg vs. 17.3 mmHg between TPG_B and TPG_CFD. Also, the standard error of estimate was lower for the adjusted model: SEE_Model = 5.3 mmHg vs. SEE_B = 22.3 mmHg. The adjusted model was more accurate than the conventional Bernoulli equation. The model might help to improve non-invasive assessment of TPG.


Heart ◽  
2018 ◽  
Vol 104 (23) ◽  
pp. 1921-1928 ◽  
Author(s):  
Ming-Zher Poh ◽  
Yukkee Cheung Poh ◽  
Pak-Hei Chan ◽  
Chun-Ka Wong ◽  
Louise Pun ◽  
...  

Objective: To evaluate the diagnostic performance of a deep learning system for automated detection of atrial fibrillation (AF) in photoplethysmographic (PPG) pulse waveforms. Methods: We trained a deep convolutional neural network (DCNN) to detect AF in 17 s PPG waveforms using a training data set of 149 048 PPG waveforms constructed from several publicly available PPG databases. The DCNN was validated using an independent test data set of 3039 smartphone-acquired PPG waveforms from adults at high risk of AF at a general outpatient clinic against ECG tracings reviewed by two cardiologists. Six established AF detectors based on handcrafted features were evaluated on the same test data set for performance comparison. Results: In the validation data set (3039 PPG waveforms) consisting of three sequential PPG waveforms from 1013 participants (mean (SD) age, 68.4 (12.2) years; 46.8% men), the prevalence of AF was 2.8%. The area under the receiver operating characteristic curve (AUC) of the DCNN for AF detection was 0.997 (95% CI 0.996 to 0.999) and was significantly higher than all the other AF detectors (AUC range: 0.924–0.985). The sensitivity of the DCNN was 95.2% (95% CI 88.3% to 98.7%), specificity was 99.0% (95% CI 98.6% to 99.3%), positive predictive value (PPV) was 72.7% (95% CI 65.1% to 79.3%) and negative predictive value (NPV) was 99.9% (95% CI 99.7% to 100%) using a single 17 s PPG waveform. Using the three sequential PPG waveforms in combination (<1 min in total), the sensitivity was 100.0% (95% CI 87.7% to 100%), specificity was 99.6% (95% CI 99.0% to 99.9%), PPV was 87.5% (95% CI 72.5% to 94.9%) and NPV was 100% (95% CI 99.4% to 100%). Conclusions: In this evaluation of PPG waveforms from adults screened for AF in a real-world primary care setting, the DCNN had high sensitivity, specificity, PPV and NPV for detecting AF, outperforming other state-of-the-art methods based on handcrafted features.
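
The predictive values reported above depend on the 2.8% prevalence as well as on sensitivity and specificity. A minimal sketch of that relationship via Bayes' rule follows; the function name is an assumption, and because the inputs are the rounded figures from the abstract, the outputs agree with the published values only approximately.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence (all in 0..1)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Single 17 s waveform figures quoted in the abstract.
ppv, npv = predictive_values(sensitivity=0.952, specificity=0.990, prevalence=0.028)
```

With these inputs the sketch gives a PPV of about 73% and an NPV of about 99.9%, close to the reported 72.7% and 99.9%. It also shows why PPV stays well below specificity at low prevalence: even a 1% false-positive rate applies to the 97.2% of waveforms without AF.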


2019 ◽  
Author(s):  
Jacob Schreiber ◽  
Jeffrey Bilmes ◽  
William Stafford Noble

Abstract Motivation: Recent efforts to describe the human epigenome have yielded thousands of uniformly processed epigenomic and transcriptomic data sets. These data sets characterize a rich variety of biological activity in hundreds of human cell lines and tissues (“biosamples”). Understanding these data sets, and specifically how they differ across biosamples, can help explain many cellular mechanisms, particularly those driving development and disease. However, due primarily to cost, the total number of assays that can be performed is limited. Previously described imputation approaches, such as Avocado, have sought to overcome this limitation by predicting genome-wide epigenomics experiments using learned associations among available epigenomic data sets. However, these previous imputations have focused primarily on measurements of histone modification and chromatin accessibility, despite other biological activity being crucially important. Results: We applied Avocado to a data set of 3,814 tracks of data derived from the ENCODE compendium, spanning 400 human biosamples and 84 assays. The resulting imputations cover measurements of chromatin accessibility, histone modification, transcription, and protein binding. We demonstrate the quality of these imputations by comprehensively evaluating the model’s predictions and by showing significant improvements in protein binding performance compared to the top models in an ENCODE-DREAM challenge. Additionally, we show that the Avocado model allows for efficient addition of new assays and biosamples to a pre-trained model, achieving high accuracy at predicting protein binding, even with only a single track of training data. Availability: Tutorials and source code are available under an Apache 2.0 license at https://github.com/jmschrei/[email protected] or [email protected]


Jurnal Segara ◽  
2020 ◽  
Vol 16 (3) ◽  
Author(s):  
Arip Rahman

Shallow-water bathymetry estimation from remote sensing data has become increasingly widespread as an alternative to traditional bathymetric measurement, which is hampered by technical and logistical problems. Bathymetry data were derived from Sentinel-2A images at visible wavelengths (blue, green and red) with 10-meter spatial resolution for the waters around Kemujan Island, Karimunjawa National Park, Central Java. A total of 1,280 sounding points were used as the training data set and 854 sounding points as the test data set. Dark Object Subtraction (DOS) was used to atmospherically correct the Sentinel-2A images. Several algorithms were applied to derive the bathymetry data: linear transform, ratio transform and support vector machine (SVM). The highest correlation between predicted and observed depth resulted from the SVM algorithm, with a coefficient of determination (R²) of 0.71 (training data) and 0.56 (test data). In the accuracy assessment of the three methods using RMSE and MAE, the SVM algorithm had the smallest values (< 1 m). This indicates that the SVM algorithm is more accurate than the other two methods. The bathymetry map derived from Sentinel-2A imagery cannot be used as a reference for navigation.


2002 ◽  
Vol 92 (5) ◽  
pp. 553-562 ◽  
Author(s):  
S. Chakraborty ◽  
C. D. Fernandes ◽  
M. J. d' A. Charchar ◽  
M. R. Thomas

Pathogenic variation in Colletotrichum gloeosporioides infecting species of the tropical pasture legume Stylosanthes at its center of diversity was determined from 296 isolates collected from wild host population and selected germ plasm of S. capitata, S. guianensis, S. scabra, and S. macrocephala in Brazil. A putative host differential set comprising 11 accessions was selected from a bioassay of 18 isolates on 19 host accessions using principal component analysis. A similar analysis of anthracnose severity data for a subset of 195 isolates on the 11 differentials indicated that an adequate summary of pathogenic variation could be obtained using only five of these differentials. Of the five differentials, S. seabrana ‘Primar’ was resistant and S. scabra ‘Fitzroy’ was susceptible to most isolates. A cluster analysis was used to determine eight natural race clusters using the 195 isolates. Linear discriminant functions were developed for eight race clusters using the 195 isolates as the training data set, and these were applied to classify a test data set of the remaining 101 isolates. All except 11 isolates of the test data set were classified into one of the eight race clusters. Over 10% of the 296 isolates were weakly pathogenic to all five differentials and another 40% were virulent on just one differential. The unclassified isolates represent six new races with unique virulence combinations, of which one isolate is virulent on all five differentials. The majority of isolates came from six field sites, and Shannon's index of diversity indicated considerable variation between sites. Pathogenic diversity was extensive at three sites where selected germ plasm were under evaluation, and complex race clusters and unclassified isolates representing new races were more prevalent at these sites compared with sites containing wild Stylosanthes populations.
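
The site comparison above relies on Shannon's index of diversity. The abstract does not give the exact formulation or logarithm base used, so the sketch below assumes the standard definition H = -Σ p_i ln p_i over the proportion of isolates in each race cluster; the counts are hypothetical and not taken from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over non-empty categories."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one isolate")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical isolate counts per race cluster at two field sites.
site_with_even_races = shannon_index([10, 10, 10, 10])
site_with_one_dominant_race = shannon_index([37, 1, 1, 1])
```

A site whose isolates are spread evenly over several race clusters scores higher than a site dominated by a single cluster, which is the sense in which the index separates the more and less diverse sites.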


2021 ◽  
Vol 12 ◽  
Author(s):  
Lili Lu ◽  
Yuru Shang ◽  
Dietmar Zechner ◽  
Christina Susanne Mullins ◽  
Michael Linnebacher ◽  
...  

Background: Whether a diagnosis of neuroendocrine neoplasm (NEN) increases patients' risk of suicide has not been investigated so far. Identifying NEN patients at risk of suicide is important to increase their quality of life and life expectancy. Methods and findings: Cancer cases were extracted from the Surveillance, Epidemiology, and End Results program and were divided into NEN and non-NEN cohorts. Subsequently, the NEN patients were randomly split into a training data set and a validation data set. Analyzing the training data set, we developed a score for assessing the risk of suicide for patients with NEN. In addition, we validated the score using the validation data set and evaluated whether this score could also be applied to other cancer entities by using the test data set, a non-NEN cohort. The odds ratio (OR) of suicide between NEN and non-NEN patients was determined. Moreover, the performance of the score was evaluated by the receiver operating characteristic curve and the area under the curve (AUC). Compared to non-NEN, NEN significantly increased the risk of suicide 1.8-fold (NEN vs. non-NEN; OR, 1.832; P < 0.001). In addition, we observed that age, gender, race, marital status, tumor stage, histologic grade, surgery, and chemotherapy were associated with suicide among NEN patients, and a synthesized score based on these factors could significantly distinguish suicide individuals from non-suicide individuals in the training data set (AUC, 0.829; P < 0.001) and in the validation data set (AUC, 0.735; P < 0.001). This score also performed well when it was assessed on the test data set (AUC, 0.690; P < 0.001). This demonstrates that the score might also be applicable to other cancer entities. Conclusions: This population-based study suggests that NEN patients have a higher risk of suicide than non-NEN patients. 
In addition, this study provides a score that can identify NEN patients at high risk of suicide. Thus, this score in combination with current screening and prevention strategies for suicide may improve the quality of life and life expectancy of NEN patients.

