discovery dataset
Recently Published Documents


TOTAL DOCUMENTS

17
(FIVE YEARS 13)

H-INDEX

2
(FIVE YEARS 1)

Cells ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 287
Author(s):  
Khaled Bin Satter ◽  
Paul Minh Huy Tran ◽  
Lynn Kim Hoang Tran ◽  
Zach Ramsey ◽  
Katheine Pinkerton ◽  
...  

Publicly available gene expression datasets were analyzed to develop a chromophobe and oncocytoma related gene signature (COGS) to distinguish chromophobe renal cell carcinoma (chRCC) from renal oncocytoma (RO). The datasets GSE11151, GSE19982, GSE2109, GSE8271 and GSE11024 were combined into a discovery dataset. The transcriptomic differences were identified with unsupervised learning in the discovery dataset (97.8% accuracy) with density-based UMAP (DBU). The top 30 genes were identified by univariate gene expression analysis and ROC analysis to create a gene signature called COGS. COGS, combined with DBU, was able to differentiate chRCC from RO in the discovery dataset with an accuracy of 97.8%. The classification accuracy of COGS was validated in an independent meta-dataset consisting of TCGA-KICH and GSE12090, where COGS could differentiate chRCC from RO with 100% accuracy. The differentially expressed genes were involved in carbohydrate metabolism, transcriptomic regulation by TP53, beta-catenin-dependent Wnt signaling, and cytokine (IL-4 and IL-13) signaling highly active in cancer cells. Using multiple datasets and machine learning, we constructed and validated COGS as a tool that can differentiate chRCC from RO and complement histology in routine clinical practice to distinguish these two tumors.
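
The univariate ROC step described above can be illustrated with a minimal sketch. It ranks each gene by how well that gene alone separates two classes and keeps the top k. The gene names, expression values, and the helper names `auc` and `rank_genes` are invented for illustration and are not taken from the study, which may have used different filtering rules.

```python
from typing import Dict, List, Tuple


def auc(pos: List[float], neg: List[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_genes(expr: Dict[str, List[float]], labels: List[int], top_k: int) -> List[Tuple[str, float]]:
    """Rank genes by how well each one alone separates the two classes."""
    scored = []
    for gene, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        a = auc(pos, neg)
        # A gene that is lower in class 1 is as informative as one that is higher.
        scored.append((gene, max(a, 1.0 - a)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data (hypothetical): 1 = chRCC, 0 = RO.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "GENE_A": [5.1, 4.8, 5.5, 2.0, 2.2, 1.9],  # higher in class 1
    "GENE_B": [1.0, 1.2, 0.9, 3.1, 3.3, 2.9],  # lower in class 1
    "GENE_C": [2.0, 3.0, 1.0, 2.5, 1.5, 3.5],  # weakly informative
}
top = rank_genes(expr, labels, top_k=2)
print(top)
```

In a real analysis `top_k` would be 30, and the ranking would be computed on the discovery dataset only, so that the validation data stay untouched.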


2021 ◽  
Author(s):  
Xi Jiang ◽  
Xiao-Jing Shou ◽  
Zhongbo Zhao ◽  
Fanchao Meng ◽  
Jiao Le ◽  
...  

Objective: Autism spectrum disorder (ASD) is associated with altered brain development, but it is unclear which specific structural changes may serve as potential diagnostic markers. This study aimed to identify and model brain-wide differences in structural connectivity using MRI diffusion tensor imaging (DTI) in young ASD and typically developing (TD) children (3.5-6 years old). Methods: Ninety-three ASD and 26 TD children were included in a discovery dataset, and 12 ASD and 9 TD children from different sites were included as independent validation datasets. Brain-wide (294 regions) structural connectivity was measured using DTI (fractional anisotropy, FA) under sedation, together with symptom severity and behavioral and cognitive development. A connection matrix was constructed for each child for comparisons between ASD and TD groups. Pattern classification was performed and the resulting model was tested on two independent datasets. Results: Thirty-three structural connections showed increased FA in ASD compared to TD children and were associated with both symptom severity and general cognitive development. The majority (29/33) involved the frontal lobe and comprised five different networks with functional relevance to default mode, motor control, social recognition, language and reward. Overall, classification accuracy was very high in the discovery dataset (96.77%), and was 91.67% and 88.89% in the two independent validation datasets. Conclusions: Identified structural connectivity differences primarily involving the frontal cortex can very accurately distinguish individual ASD from TD children and may therefore represent a robust early brain biomarker.


2021 ◽  
Author(s):  
Bokan Bao ◽  
Vahid H Gazestani ◽  
Yaqiong Xiao ◽  
Raphael Kim ◽  
Austin W.T. Chiang ◽  
...  

Importance: ASD diagnosis remains behavior-based and the median age of the first diagnosis remains unchanged at ~52 months, which is nearly 5 years after its first trimester origin. Long delays between ASD's prenatal onset and eventual diagnosis are likely a missed opportunity. However, accurate and clinically-translatable early-age diagnostic methods do not exist due to ASD genetic and clinical heterogeneity. There is a need for early-age diagnostic biomarkers of ASD that are robust against its heterogeneity. Objective: To develop a single blood-based molecular classifier that accurately diagnoses ASD at the age of first symptoms. Design, Setting, and Participants: N=264 ASD, typically developing (TD), and language delayed (LD) toddlers with their clinical, diagnostic, and leukocyte RNA data collected. Datasets included Discovery (n=175 ASD, TD subjects), Longitudinal (n=33 ASD, TD subjects), and Replication (n=89 ASD, TD, LD subjects). We developed an ensemble of ASD classifiers by testing 42,840 models composed of 3,570 feature selection sets and 12 classification methods. Models were trained on the Discovery dataset with 5-fold cross validation. Results were used to construct a Bayesian model averaging-based (BMA) ensemble classifier model that was tested in Discovery and Replication datasets. Data were collected from 2007 to 2012 and analyzed from August 2019 to April 2021. Main Outcomes and Measures: Primary outcomes were (1) comparisons of the performance of 42,840 classifier models in correctly identifying ASD vs TD and LD in Discovery and Replication datasets; and (2) performance of the ensemble model composed of 1,076 models and weighted by the Bayesian model averaging technique. Results: Of 42,840 models trained in the Discovery dataset, 1,076 averaged AUC-ROC>0.8. These 1,076 models used 191 different feature routes and 2,764 gene features. 
Using weighted BMA of these features and routes, an ensemble classifier model was constructed which demonstrated excellent performance in Discovery and Replication datasets with ASD classification AUC-ROC scores of 84% to 88%. ASD classification accuracy was comparable against LD and TD subjects and in the Longitudinal dataset. ASD toddlers with ensemble scores above and below the ASD ensemble mean had similar diagnostic and psychometric scores, but those below the ASD ensemble mean had more prenatal risk events than TD toddlers. Ensemble features include genes with immune/inflammation, response to cytokines, transcriptional regulation, mitotic cell cycle, and PI3K-AKT, RAS, and Wnt signaling pathways. Conclusions and Relevance: An ensemble ASD molecular classifier has high and replicable accuracy across the spectrum of ASD clinical characteristics and across toddlers aged 1 to 4 years, which has potential for clinical translation.
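
As a rough illustration of the ensemble idea in this abstract, the sketch below checks the model count (3,570 feature selection sets times 12 classifiers) and combines per-model probabilities with a weighted average after an AUC filter. The weighting rule here is a simplifying assumption: true Bayesian model averaging weights each model by its posterior probability, which the abstract does not spell out. The function name `ensemble_score` and all numeric inputs are hypothetical.

```python
from typing import List

# Figures from the abstract: 3,570 feature selection sets x 12 classification methods.
n_models = 3570 * 12
assert n_models == 42840


def ensemble_score(probs: List[float], aucs: List[float], min_auc: float = 0.8) -> float:
    """Weighted average of per-model ASD probabilities for one subject.

    Assumption: weights are proportional to (AUC - min_auc). Real Bayesian model
    averaging would weight each model by its posterior probability instead.
    """
    kept = [(p, a - min_auc) for p, a in zip(probs, aucs) if a > min_auc]
    if not kept:
        raise ValueError("no model passed the AUC filter")
    total = sum(w for _, w in kept)
    return sum(p * w for p, w in kept) / total


# Hypothetical outputs of four models for one toddler.
probs = [0.90, 0.70, 0.20, 0.60]
aucs = [0.90, 0.85, 0.75, 0.82]
score = ensemble_score(probs, aucs)
print(round(score, 3))
```

The third model is dropped by the filter, mirroring how only the 1,076 models with mean AUC-ROC above 0.8 entered the reported ensemble.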


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 10544-10544
Author(s):  
Tiancheng Han ◽  
Yuanyuan Hong ◽  
Pei Zhihua ◽  
Song Xiaofeng ◽  
Jianing Yu ◽  
...  

10544 Background: Screening for biomarkers in the cell-free DNA (cfDNA) of peripheral blood is a non-invasive and promising method for cancer diagnosis. Among diverse types of biomarkers, epigenetic biomarkers have been reported to be among the most promising. Epigenetic modifications are widespread on the human genome and generally have strong signals due to the similar methylation patterns shared by adjacent CpG sites. Although some epigenetic diagnostic methods have been developed based on cfDNA, few of them can be applied to pan-cancer detection, and their sensitivities are barely satisfactory for early cancer detection. Methods: Targeted methylation sequencing was performed using our in-house-designed panel targeting regions with abundant cancer-specific methylation CpGs. The cfDNA samples from 80 healthy individuals and 549 cancer patients of 14 cancer types were separately sequenced. The dataset was randomly split into one discovery dataset and one validation dataset. Moreover, cfDNA samples from four cancer patients were diluted with healthy cfDNA to generate 12 in vitro simulated samples with a low circulating tumor DNA (ctDNA) fraction. Additionally, DNA extracted from 130 unmatched formalin-fixed paraffin-embedded (FFPE) tumor samples of 10 cancer types was sequenced to screen the diagnostic biomarkers. Adjacent CpG sites were first merged into methylation-correlated blocks (MCB) according to the correlations of their methylation levels in tumor DNA. The MCBs with higher methylation levels in tumor DNA than in healthy cfDNA (from the discovery dataset) were defined as our hypermethylation biomarkers. For each cfDNA sample, a hypermethylation score (HM-score) was computed to measure the overall methylation level difference of selected biomarkers. The performance of our method was evaluated with the real-world dataset, while the limit of detection was estimated using the simulated low-ctDNA samples. 
Results: Our model based on 37 hypermethylation MCB biomarkers achieved an area under the curve (AUC) of 0.89 and 0.86 in the real-world pan-cancer discovery and validation cfDNA datasets, respectively. Furthermore, the overall specificity and sensitivity were 100% and 76.19% in the discovery dataset, and 96.67% and 72.86% in the validation dataset. In the validation dataset, 28/40 (70%) of early-stage colorectal cancer patients and 10/20 (50%) of non-small-cell lung cancer patients were successfully diagnosed. Additionally, all the simulated samples with theoretical ctDNA fractions over 0.5% were predicted as diseased, demonstrating the ability of our method to detect tumor signals at early stages. Conclusions: Our cfDNA-based epigenetic method outperforms currently available methods in various cancer types, and is promising for early-stage cancer detection and for samples with low ctDNA fractions.
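
The abstract does not give the HM-score formula, so the following is only a sketch under a stated assumption: the score is taken to be the mean methylation excess over a healthy baseline across the selected blocks, and a sample is called cancer when the score exceeds a cutoff. The names `hm_score` and `sens_spec`, the cutoff, and all methylation values are hypothetical. The last two lines simply reproduce the detection rates quoted in the abstract.

```python
from typing import List, Sequence, Tuple


def hm_score(sample: Sequence[float], healthy_mean: Sequence[float]) -> float:
    """Hypothetical score: mean methylation excess over the healthy baseline, per block."""
    diffs = [max(s - h, 0.0) for s, h in zip(sample, healthy_mean)]
    return sum(diffs) / len(diffs)


def sens_spec(scores: List[float], is_cancer: List[bool], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a score above the cutoff is called cancer."""
    tp = sum(1 for s, c in zip(scores, is_cancer) if c and s > cutoff)
    tn = sum(1 for s, c in zip(scores, is_cancer) if not c and s <= cutoff)
    n_pos = sum(is_cancer)
    n_neg = len(is_cancer) - n_pos
    return tp / n_pos, tn / n_neg


# Invented methylation levels (0..1) for three blocks.
healthy_mean = [0.10, 0.05, 0.20]
samples = [
    [0.12, 0.05, 0.18],  # healthy
    [0.09, 0.06, 0.21],  # healthy
    [0.40, 0.30, 0.25],  # cancer, strong signal
    [0.11, 0.06, 0.20],  # cancer, weak signal (missed at this cutoff)
]
is_cancer = [False, False, True, True]
scores = [hm_score(s, healthy_mean) for s in samples]
sensitivity, specificity = sens_spec(scores, is_cancer, cutoff=0.02)

# Early-stage detection rates quoted in the abstract.
crc_rate = 28 / 40    # colorectal cancer
nsclc_rate = 10 / 20  # non-small-cell lung cancer
print(sensitivity, specificity, crc_rate, nsclc_rate)
```

The toy cutoff favors specificity over sensitivity, which is the same trade-off visible in the reported figures (100% specificity with 76.19% sensitivity in the discovery dataset).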


2021 ◽  
Vol 39 (6_suppl) ◽  
pp. 453-453
Author(s):  
William Paul Skelton ◽  
Rohit K. Jain ◽  
Catherine Curran ◽  
Gregory Russell Pond ◽  
Syeda Mahrukh Hussnain Naqvi ◽  
...  

453 Background: The renin-angiotensin system (RAS) is involved in regulation of angiogenesis and cell proliferation. Preclinical data also indicate that angiotensin inhibition may improve drug delivery by enhancing tumor perfusion, partly by downregulating transforming growth factor (TGF)-β. Since TGF-β appears to be associated with resistance in patients (pts) with metastatic urothelial carcinoma (mUC) receiving PD1/L1 inhibitors, we hypothesized that angiotensin converting enzyme inhibitors (ACEI) and angiotensin receptor blockers (ARBs) may enhance the outcomes of mUC pts receiving PD1/L1 inhibitors. Methods: Data from mUC pts who received PD1/L1 inhibitors as monotherapy were obtained: pts from the Dana-Farber Cancer Institute (DFCI) served as the discovery dataset, while data from Moffitt Cancer Center (MCC) served as the validation dataset. Data for ACEI and ARB administration were collected, with concurrent administration defined as ongoing therapy from the time of starting PD1/L1 inhibitor treatment. Logistic regression was used to investigate the impact of concurrent ACEI/ARB on the primary endpoint, any regression of tumor (ART, any decrease in size of tumor on scan), after controlling for known prognostic factors (performance status, sites of metastasis, neutrophil/lymphocyte ratio, platelet count, hemoglobin). Overall survival (OS), the secondary endpoint, was analyzed using Cox proportional hazards regression. Results: Data were available for 178 pts from DFCI (discovery dataset) with mUC who received a PD1/L1 inhibitor, of whom 153 (86%) had received prior platinum and 33 pts (18.5%) received concurrent ACEI/ARBs. Multivariable analysis controlling for known prognostic factors revealed that patients who received ACEIs or ARBs had greater ART (HR 3.0 [95% CI 1.25-7.17], p = 0.014) and improved OS (HR 0.49 [95% CI 0.28-0.88], p = 0.016). 
In the MCC validation dataset, 101 pts were available of whom 59 (58.4%) had received prior platinum and 22 pts (21.8%) received concurrent ACEI/ARBs. Univariate analysis showed that those patients who were treated with ACEI/ARB had an improved ART (OR 3.32 [95% CI 1.22-9.06] p = 0.019). On multivariable analysis, there was a borderline significant association of ACEI/ARB with ART (OR = 3.03, p = 0.075), but no association was observed with OS. Conclusions: In this hypothesis-generating study, concurrent angiotensin inhibitors including ACEI or ARBs were associated with tumor regression in mUC pts receiving PD-1/L1 inhibitors. The inconsistent association with OS may be partly due to modest sample size and comorbidities associated with the need for ACEI/ARBs. These results require validation in a prospective study.
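
The reported estimates, confidence intervals, and p-values can be cross-checked with a short calculation. The sketch below assumes the intervals are Wald intervals on the log scale, which is the usual default for logistic and Cox models but is not stated in the abstract; the helper name `p_from_ci` is hypothetical.

```python
import math


def p_from_ci(estimate: float, lower: float, upper: float) -> float:
    """Two-sided Wald p-value recovered from a ratio estimate and its 95% CI."""
    z_95 = 1.959964  # standard normal quantile for a 95% interval
    se = (math.log(upper) - math.log(lower)) / (2 * z_95)
    z = math.log(estimate) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Univariate ART result in the validation cohort: OR 3.32 [1.22-9.06], reported p = 0.019.
p_validation = p_from_ci(3.32, 1.22, 9.06)
# Multivariable ART result in the discovery cohort, reported as 3.0 [1.25-7.17], p = 0.014.
p_discovery = p_from_ci(3.0, 1.25, 7.17)
print(round(p_validation, 3), round(p_discovery, 3))
```

Under that assumption both recovered p-values agree with the reported ones to within rounding, which suggests the quoted intervals and p-values are internally consistent.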


2020 ◽  
Author(s):  
Meng Wang ◽  
Ang Li ◽  
Yong Liu ◽  
Hao Yan ◽  
Yuqing Sun ◽  
...  

Abstract Background: Schizophrenia (SZ) typically manifests heterogeneous phenotypes involving positive, negative and cognitive symptoms. However, the underlying neural mechanisms of these symptoms remain unclear. Functional gradient is a measure that characterizes the continuous, hierarchical organization of the brain. Methods: We aimed to investigate whether reproducible disruptions of functional gradient existed in SZ compared to normal controls (NC), and whether these abnormalities were associated with severity of clinical and cognitive symptoms in SZ. All analyses were implemented in two independent large-sample multi-site datasets (discovery dataset, 400 SZ and 336 NC; replication dataset, 279 SZ and 262 NC). First, functional gradient across the cerebral cortex was calculated in each subject. Second, vertex-wise comparisons of cortical gradient between SZ and NC groups were performed to identify abnormalities in SZ. Meanwhile, reproducibility and robustness analyses were implemented to validate these abnormalities. Finally, regression analyses were performed using generalized additive models to link these abnormalities to severity of clinical and cognitive symptoms in SZ. Results: We found an abnormal gradient map in SZ in the discovery dataset, which was reproducible in the replication dataset. The abnormal gradient pattern was also robust when performing methodological alternatives and control analyses. 
Further, these reproducible abnormalities can reliably predict symptoms of clinical and cognitive domains across the two independent datasets. Conclusion: These findings demonstrated that alterations in functional gradient can provide a reliable signature of SZ, characterizing the heterogeneous symptoms of clinical or cognitive domains, and may be further investigated to understand the neurobiological mechanisms of these symptoms. Impact Statement: In our study, using a functional gradient measure, statistical learning technology, and two independent multi-site case-control resting-state fMRI datasets (discovery dataset: 736 subjects; replication dataset: 541 subjects), we comprehensively investigated functional hierarchical organization in the cerebral cortex of SZ and its association with interindividual severity of symptoms. We found reproducible and robust abnormalities of functional gradient in SZ, which provided a reliable signature to characterize negative and general psychopathology symptoms, as well as cognitive deficits. Our findings can provide new insights to understand the neurobiological mechanisms of clinical and cognitive symptoms in SZ.


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii12-ii13
Author(s):  
William Chen ◽  
Harish N Vasudevan ◽  
Abrar Choudhury ◽  
Calixto-Hope G Lucas ◽  
Stephen Magill ◽  
...  

Abstract BACKGROUND Clinical biomarkers for identifying patients at risk for recurrence after resection of meningioma are lacking and are needed for guiding adjuvant therapy. The aim of this study was to identify a prognostic gene expression signature for meningioma. METHODS Targeted gene expression analysis was performed on a discovery dataset of 96 meningiomas with suitable tissue identified from a retrospective institutional biorepository. Recurrence was dichotomized based on the median time to local recurrence (TTR). With median follow-up of 6.4 years, the discovery dataset was enriched for clinical endpoints of local recurrence (58%), mortality (42%), and disease-specific mortality (49% of deaths). A 266 gene expression panel was used to interrogate the discovery dataset, and a prognostic gene signature and risk score was generated using prediction analysis for microarrays (PAM) and elastic net regression. The risk score was validated using gene expression data (GSE58037) from 56 meningiomas resected at an independent institution (20% local recurrence, 18% mortality, median follow-up 5.4 years). RESULTS A 36-gene signature was identified achieving an AUC of 0.86 for TTR faster than the median in the discovery cohort. A risk score between 0 and 1 based on this signature was strongly associated with shorter TTR (F-test, P< 0.0001), and on multivariate Cox regression (MVA), was independently associated with recurrence (RR 1.56 per 0.1 increase, 95% CI 1.30–1.90, P< 0.0001) and mortality (RR 1.32 per 0.1 increase, 1.07–1.64, P=0.01) after adjusting for WHO grade, age, extent of resection, and sex. Similarly, in the validation dataset, the gene risk score was correlated with shorter TTR (P=0.002) and associated with mortality on MVA (RR 1.86 per 0.1 increase, 1.19–2.88, P=0.005) after adjustment for WHO grade. 
CONCLUSIONS The prognostic meningioma gene expression risk score presented here could be useful in identifying patients at higher risk of progression after resection.
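
To make the "RR per 0.1 increase" figures concrete, the sketch below converts them into the relative risk implied by a larger score difference. It assumes the log-linear form of a Cox-type model, in which the risk multiplies by the same factor for each 0.1 step; the function name `relative_risk` and the chosen score differences are illustrative only.

```python
def relative_risk(rr_per_step: float, score_diff: float, step: float = 0.1) -> float:
    """Relative risk implied by a score difference under a log-linear (Cox-type) model."""
    return rr_per_step ** (score_diff / step)


# Recurrence in the discovery cohort: RR 1.56 per 0.1 increase in risk score.
rr_discovery = relative_risk(1.56, 0.3)   # two patients whose scores differ by 0.3
# Mortality in the validation cohort: RR 1.86 per 0.1 increase.
rr_validation = relative_risk(1.86, 0.2)  # two patients whose scores differ by 0.2
print(round(rr_discovery, 2), round(rr_validation, 2))
```

Because the score is bounded between 0 and 1, even moderate differences between two patients compound into a several-fold difference in estimated risk under this assumption.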


2020 ◽  
Author(s):  
Saffron A.G. Willis-Owen ◽  
Clara Domingo Sabugo ◽  
Elizabeth Starren ◽  
Liming Liang ◽  
Maxim B. Freidin ◽  
...  

Summary: Lung cancer is the most frequent cause of cancer death worldwide [1]. It is male predominant and, for reasons that are unknown, also associated with significantly worse outcomes in men [2]. Here we compared gene co-expression networks in affected and unaffected pulmonary tissue derived from 126 patients with Stage IA–IV lung cancer. We observed marked degradation of a sex-associated gene co-expression network in tumour tissue. The disturbance was linked to fractional loss of the Y chromosome and was detected in 28% of male tumours in the discovery dataset and 27% of male tumours in a 123-sample replication dataset. Depression of Y chromosome expression was accompanied by extensive autosomal DNA hypomethylation. The male-specific H3K4 demethylase, KDM5D, was identified as an apex hub within this co-expression network. Male patients exhibiting relative tumour KDM5D deficiency had an increased risk of death in the discovery dataset (Hazard Ratio [HR] 3.80, 95% CI 1.40–10.3, P=0.009) and in an independent sample of 1,100 male lung tumours (HR 1.67, 95% CI 1.4–2.0, P=1.2×10⁻¹⁰). Our findings identify tumour-specific weakening of male-specific expression, in particular deficiency of KDM5D, as a common replicable prognostic marker and credible mechanism underlying sex disparity in cancer.


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Yinghua Zhao ◽  
Lianying Yang ◽  
Changqing Sun ◽  
Yang Li ◽  
Yangzhige He ◽  
...  

Acute appendicitis is one of the most common causes of acute abdomen, but confident preoperative diagnosis remains a challenge. In order to profile noninvasive urinary biomarkers that could discriminate acute appendicitis from other acute abdomens, we carried out mass spectrometric experiments on urine samples from patients with different acute abdomens and evaluated the diagnostic potential of urinary proteins with various machine-learning models. First, outlier protein pools of acute appendicitis and controls were constructed using the discovery dataset (32 acute appendicitis and 41 control acute abdomens) against a reference set of 495 normal urine samples. Ten outlier proteins were then selected by a feature selection algorithm and were used to construct machine-learning models using naïve Bayes, support vector machine, and random forest algorithms. The models were assessed in the discovery dataset by leave-one-out cross validation and were verified in the validation dataset (16 acute appendicitis and 45 control acute abdomens). Among the three models, the random forest model achieved the best performance: the accuracy was 84.9% in the leave-one-out cross validation of the discovery dataset and 83.6% (sensitivity: 81.2%, specificity: 84.4%) in the validation dataset. In conclusion, we developed a 10-protein diagnostic panel using the random forest model that was able to distinguish acute appendicitis from confusable acute abdomens with high specificity, which indicates the clinical application potential of noninvasive urinary markers in disease diagnosis.
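
Leave-one-out cross validation, as used on the discovery dataset above, can be sketched in a few lines. A nearest-centroid classifier stands in for the naïve Bayes, support vector machine, and random forest models of the study, purely to keep the example dependency-free. The two-feature data and the names `nearest_centroid` and `loocv_accuracy` are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def nearest_centroid(train: List[Sample]) -> Callable[[Sequence[float]], int]:
    """Stand-in classifier: predict the class whose mean feature vector is closest."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Sequence[float]) -> int:
        def dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(centroids, key=lambda label: dist(centroids[label]))

    return predict


def loocv_accuracy(data: List[Sample]) -> float:
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if nearest_centroid(train)(x) == y:
            correct += 1
    return correct / len(data)


# Invented two-protein intensities; 1 = appendicitis, 0 = other acute abdomen.
data = [
    ((5.0, 1.0), 1), ((5.5, 1.2), 1), ((4.8, 0.9), 1),
    ((1.0, 4.0), 0), ((1.2, 4.4), 0), ((0.9, 3.8), 0),
]
accuracy = loocv_accuracy(data)
print(accuracy)
```

With 73 discovery samples, the same loop would train 73 models, each tested on the single sample it never saw, so the resulting accuracy is less optimistic than a score computed on the training data itself.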


Author(s):  
Łukasz Borchmann ◽  
Dawid Wisniewski ◽  
Andrzej Gretkowski ◽  
Izabela Kosmala ◽  
Dawid Jurkiewicz ◽  
...  
