SpiderLearner: An ensemble approach to Gaussian graphical model estimation

Multivariate biological data are often modeled using networks in which nodes represent biological variables (e.g., genes) and edges represent associations (e.g., coexpression). A Gaussian graphical model (GGM), or partial correlation network, is an undirected graphical model in which a weighted edge between two nodes represents the magnitude of their partial correlation, and the absence of an edge indicates zero partial correlation. A GGM provides a roadmap of direct dependencies between variables, offering a valuable systems-level perspective. Many methods exist for estimating GGMs; estimated GGMs are typically highly sensitive to the choice of method, posing an outstanding statistical challenge. We address this challenge by developing SpiderLearner, a tool that combines a range of candidate GGM estimation methods to construct an ensemble estimate as a weighted average of results from each candidate. In simulation studies, SpiderLearner performs better than or comparably to the best of the candidate methods. We apply SpiderLearner to estimate a GGM for gene expression in a publicly available dataset of 260 ovarian cancer patients. Using the community structure of the GGM, we develop a network-based risk score which we validate in six independent datasets. The risk score requires only seven genes, each of which has important biological function. Our method is flexible, extensible, and has demonstrated potential to identify de novo biomarkers for complex diseases. An open-source implementation of our method is available at https://github.com/katehoffshutta/SpiderLearner.
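
The two ideas in this abstract, an ensemble formed as a weighted average of candidate estimates and edges read off as partial correlations, can be illustrated with a minimal sketch. It assumes each candidate method returns a precision (inverse covariance) matrix and that the weights are already known; the function names, the two toy matrices, and the equal weights are hypothetical and are not taken from the SpiderLearner package, which chooses its weights by its own procedure.

```python
import math

def ensemble_precision(candidates, weights):
    """Weighted average of candidate precision matrices (lists of lists)."""
    if len(candidates) != len(weights):
        raise ValueError("need one weight per candidate")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    p = len(candidates[0])
    return [[sum(w * m[i][j] for m, w in zip(candidates, weights))
             for j in range(p)] for i in range(p)]

def partial_correlations(theta):
    """rho_ij = -theta_ij / sqrt(theta_ii * theta_jj); diagonal set to 1."""
    p = len(theta)
    return [[1.0 if i == j
             else -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
             for j in range(p)] for i in range(p)]

# Two hypothetical candidate estimates on three variables:
# one keeps the edge between variables 0 and 1, the other removes it.
dense = [[2.0, -1.0, 0.0], [-1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
sparse = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]

theta = ensemble_precision([dense, sparse], [0.5, 0.5])  # assumed weights
rho = partial_correlations(theta)
print(rho[0][1])  # edge weight between variables 0 and 1
```

With equal weights the ensemble keeps the edge between variables 0 and 1 but shrinks it, which is the behaviour one would expect when the candidates disagree about an edge.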

Clinical Implications of the SYNTAX Study

Interventional Cardiology Review ◽

10.15420/icr.2009.4.1.48 ◽

2009 ◽

Vol 4 (1) ◽

pp. 48 ◽

Cited By ~ 1

Author(s):

Patrick Serruys ◽

Scot Garg ◽

Keyword(s):

Percutaneous Coronary Intervention ◽

De Novo ◽

Artery Bypass ◽

Bypass Graft ◽

Coronary Intervention ◽

Syntax Score ◽

Cerebrovascular Events ◽

Percutaneous Coronary ◽

Syntax Study ◽

Better Than

Recent years have seen an ongoing debate as to whether coronary artery bypass graft (CABG) surgery or percutaneous coronary intervention (PCI) is the more appropriate revascularisation strategy for patients with coronary artery disease (CAD). The Synergy between Percutaneous Coronary Intervention with TAXUS and Cardiac Surgery (SYNTAX) study was conducted with the intention of defining the specific roles of each therapy in the management of de novo three-vessel disease or left main CAD. Interim results after 12 months show that PCI leads to significantly higher rates of major adverse cardiac or cerebrovascular events compared with CABG (17.8% versus 12.4%; p=0.002), largely owing to increased rates of repeat revascularisation. However, CABG was much more likely to lead to stroke. Interestingly, categorisation of patients by CAD complexity according to the SYNTAX score has shown that there are certain patients in whom PCI can yield results that are comparable to, if not better than, those achieved with CABG. Careful clinical evaluation and comprehensive assessment of CAD severity, alongside application of the SYNTAX score, can aid practitioners in selecting the most suitable therapy for each individual CAD patient.

A Three-Dimensional Tetraphenylethylene-Based Fluorescence Covalent Organic Framework for Molecular Recognition

10.26434/chemrxiv.12982019 ◽

2020 ◽

Author(s):

Junxia Ren ◽

Yaozu Liu ◽

Xin Zhu ◽

Yangyang Pan ◽

Yujie Wang ◽

...

Keyword(s):

Molecular Recognition ◽

Fluorescence Probe ◽

Three Dimensional ◽

Hazardous Substances ◽

Covalent Organic Framework ◽

Highly Sensitive ◽

Volatile Organic ◽

Polycyclic Aromatic ◽

Better Than ◽

Organic Framework

The development of highly sensitive recognition of hazardous chemicals, such as volatile organic compounds (VOCs) and polycyclic aromatic hydrocarbons (PAHs), is of significant importance because of widespread social concerns related to the environment and human health. Here, we report a three-dimensional (3D) covalent organic framework (COF, termed JUC-555) bearing tetraphenylethylene (TPE) side chains as an aggregation-induced emission (AIE) fluorescence probe for sensitive molecular recognition. Due to the rotational restriction of TPE rotors in the highly interpenetrated framework after inclusion of dimethylformamide (DMF), JUC-555 shows impressive AIE-based strong fluorescence. Meanwhile, owing to the large pore size (11.4 Å) and suitable intermolecular distance of aligned TPE (7.2 Å) in JUC-555, the obtained material demonstrates excellent performance in the molecular recognition of hazardous chemicals, e.g., nitroaromatic explosives, PAHs, and even thiophene compounds, via a fluorescence quenching mechanism. The quenching constant (KSV) is two orders of magnitude better than those of other fluorescence-based porous materials reported to date. This research thus opens 3D functionalized COFs as a promising identification tool for environmentally hazardous substances.
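
For readers unfamiliar with the quenching constant, KSV is conventionally defined through the Stern-Volmer relation I0/I = 1 + KSV[Q], where I0 and I are the fluorescence intensities without and with quencher at concentration [Q]. The sketch below estimates KSV as the slope of a least-squares line through the origin. The function name and the intensity values are hypothetical (the data are generated from an assumed KSV of 1000 per molar) and are not measurements from this work.

```python
def stern_volmer_constant(concentrations, intensities, i0):
    """Least-squares slope through the origin of (I0/I - 1) versus [Q]."""
    ys = [i0 / i - 1.0 for i in intensities]
    num = sum(q * y for q, y in zip(concentrations, ys))
    den = sum(q * q for q in concentrations)
    return num / den

# Hypothetical titration: synthetic intensities generated from an
# assumed K_SV of 1000 M^-1, so the fit should recover that value.
assumed_ksv = 1000.0
i0 = 100.0
quencher_molar = [1e-4, 2e-4, 4e-4]
intensity = [i0 / (1.0 + assumed_ksv * q) for q in quencher_molar]

ksv = stern_volmer_constant(quencher_molar, intensity, i0)
print(ksv)
```

A larger KSV means a given quencher concentration suppresses more of the fluorescence, which is why it serves as a sensitivity measure for the probe.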

Evaluation for estimating of the PDF and the CDF of Generalized Inverted Exponential Distribution with Application in Industry

Advances in Mathematics: Scientific Journal ◽

10.37418/amsj.9.1.39 ◽

2020 ◽

pp. 507-522

Author(s):

Parisa Torkaman

Keyword(s):

Least Squares ◽

Exponential Distribution ◽

Mean Squared Error ◽

Weighted Least Squares ◽

Real Data ◽

Minimum Variance ◽

Cumulative Distribution ◽

Estimation Methods ◽

Data Set ◽

Better Than

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. This paper considers estimation of the probability density function and the cumulative distribution function of this distribution using five different estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS), and percentile (PC) estimators. The performance of these estimation procedures is compared by numerical simulation, based on the mean squared error (MSE). Simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than LS, WLS, and PC. Finally, results from a real data set are analyzed.
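
The abstract does not state the parameterization it uses. As a minimal sketch, the code below assumes the commonly cited two-parameter form with shape alpha and scale lambda, F(x) = 1 - (1 - exp(-lambda/x))^alpha for x > 0, and gives the matching density and quantile function. The function names and parameter values are illustrative only.

```python
import math

def gied_cdf(x, alpha, lam):
    """CDF under the assumed form: 1 - (1 - exp(-lam/x))**alpha, x > 0."""
    return 1.0 - (1.0 - math.exp(-lam / x)) ** alpha

def gied_pdf(x, alpha, lam):
    """Density obtained by differentiating gied_cdf with respect to x."""
    e = math.exp(-lam / x)
    return alpha * lam / x ** 2 * e * (1.0 - e) ** (alpha - 1.0)

def gied_quantile(u, alpha, lam):
    """Inverse of gied_cdf for 0 < u < 1 (useful for simulation)."""
    return -lam / math.log(1.0 - (1.0 - u) ** (1.0 / alpha))

# Illustrative parameter values (assumptions, not from the paper)
alpha, lam = 2.0, 1.5
x = gied_quantile(0.3, alpha, lam)
print(x, gied_cdf(x, alpha, lam), gied_pdf(x, alpha, lam))
```

Applying gied_quantile to uniform random draws would give simulated samples on which estimators of the PDF and CDF could be compared by mean squared error, in the spirit of the simulation study described above.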

Identification of Essential Proteins in Yeast Using Mean Weighted Average and Recursive Feature Elimination

Recent Patents on Computer Science ◽

10.2174/2213275911666180918155521 ◽

2019 ◽

Vol 12 (1) ◽

pp. 5-10 ◽

Cited By ~ 5

Author(s):

Sivagnanam Rajamanickam Mani Sekhar ◽

Siddesh Gaddadevara Matt ◽

Sunilkumar S. Manvi ◽

Srinivasa Krishnarajanagar Gopalalyengar

Keyword(s):

Drug Design ◽

Weighted Average ◽

Living Organism ◽

Experimental Result ◽

Recursive Feature Elimination ◽

Protein Interaction Data ◽

Essential Proteins ◽

Protein Protein Interaction ◽

Result Show ◽

Better Than

Background: Essential proteins are significant for drug design, cell development, and the survival of living organisms. Different methods have been developed to predict essential proteins using topological and biological features. Objective: Predicting essential proteins effectively and in a timely manner remains challenging, as the availability of protein-protein interaction data depends on network correctness. Methods: In the proposed solution, two approaches, Mean Weighted Average and Recursive Feature Elimination, are used to predict essential proteins and are compared to select the better one. In Mean Weighted Average, data from consecutive slots are aggregated into a count to obtain the nearest value, which is taken as the prescription for the best proteins for the slot, whereas in Recursive Feature Elimination the whole data set is split into different slots and the essential proteins for each slot are determined. Results: The results show that the accuracy of Recursive Feature Elimination is at least nine percent higher than that of Mean Weighted Average and betweenness centrality. Conclusion: Essential proteins are encoded by genes that are essential for the survival of living beings and for drug design. Different approaches have been proposed to predict essential proteins using either experimental or computational methods. The experimental results show that the proposed work performs better than other approaches.

A modified expectation‐maximization algorithm for latent Gaussian graphical model

Canadian Journal of Statistics ◽

10.1002/cjs.11643 ◽

2021 ◽

Author(s):

Chaowen Zheng ◽

Jingfang Huang ◽

Ian A. Wood ◽

Yichao Wu

Keyword(s):

Expectation Maximization ◽

Graphical Model ◽

Expectation Maximization Algorithm ◽

Gaussian Graphical Model

Incorporation of biologic factors for the staging of de novo stage IV breast cancer

npj Breast Cancer ◽

10.1038/s41523-020-00186-5 ◽

2020 ◽

Vol 6 (1) ◽

Author(s):

Zhen-Yu He ◽

Chen-Lu Lian ◽

Jun Wang ◽

Jian Lei ◽

Li Hua ◽

...

Keyword(s):

Breast Cancer ◽

Risk Score ◽

Distant Metastasis ◽

De Novo ◽

Stage Iv ◽

Staging System ◽

Sensitivity Analyses ◽

Operating Characteristics ◽

Prognostic Analysis ◽

Stage Iv Breast Cancer

This study aimed to investigate the prognostic value of biological factors, including histological grade, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER2) status in de novo stage IV breast cancer. Based on eligibility, patient data deposited between 2010 and 2014 were collected from the Surveillance, Epidemiology, and End Results database. The receiver operating characteristics curve, Kaplan–Meier analysis, and Cox proportional hazard analysis were used for analysis. We included 8725 patients with a median 3-year breast cancer-specific survival (BCSS) of 52.6%. Higher histologic grade, HER2-negative, ER-negative, and PR-negative disease were significantly associated with lower BCSS in the multivariate prognostic analysis. A risk score staging system separated patients into four risk groups. The risk score was assigned according to a point system: 1 point for grade 3, 1 point if hormone receptor-negative, and 1 point if HER2-negative. The 3-year BCSS was 76.3%, 64.5%, 48.5%, and 23.7% in patients with 0, 1, 2, and 3 points, respectively, with a median BCSS of 72, 52, 35, and 16 months, respectively (P < 0.001). The multivariate prognostic analysis showed that the risk score staging system was an independent prognostic factor associated with BCSS. Patients with a higher risk score had a lower BCSS. Sensitivity analyses replicated similar findings after stratification according to tumor stage, nodal stage, the sites of distant metastasis, and the number of distant metastases. In conclusion, our risk score staging system shows promise for the prognostic stratification of de novo stage IV breast cancer.
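
The point system described above is simple enough to write down directly. The sketch below is a hedged illustration: the function name and argument names are hypothetical, the abstract does not say how hormone receptor-negative status is derived from ER and PR, so it is taken here as a ready-made boolean, and the lookup tables only restate the group-level figures quoted in the abstract.

```python
# Group-level outcomes as reported in the abstract, keyed by risk score
BCSS_3YR_PERCENT = {0: 76.3, 1: 64.5, 2: 48.5, 3: 23.7}
MEDIAN_BCSS_MONTHS = {0: 72, 1: 52, 2: 35, 3: 16}

def risk_score(grade, hormone_receptor_negative, her2_negative):
    """1 point each for grade 3, HR-negative, and HER2-negative disease."""
    return (int(grade == 3)
            + int(bool(hormone_receptor_negative))
            + int(bool(her2_negative)))

# Hypothetical patient: grade 3, hormone receptor-positive, HER2-negative
score = risk_score(grade=3, hormone_receptor_negative=False, her2_negative=True)
print(score, BCSS_3YR_PERCENT[score], MEDIAN_BCSS_MONTHS[score])
```

The looked-up values are cohort summaries for each score group, not individual predictions.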

A rapid and accurate method for screening T-2 toxin in food and feed using competitive AlphaLISA

FEMS Microbiology Letters ◽

10.1093/femsle/fnab029 ◽

2021 ◽

Vol 368 (6) ◽

Author(s):

Liwen Zhang ◽

Qingyu Lv ◽

Yuling Zheng ◽

Xuan Chen ◽

Decong Kong ◽

...

Keyword(s):

High Sensitivity ◽

Accurate Method ◽

Detection Methods ◽

Cereal Crops ◽

Detection Range ◽

Highly Sensitive ◽

Food And Feed ◽

Good Repeatability ◽

Feed Samples ◽

Better Than

T-2 is a common mycotoxin contaminating cereal crops. Chronic consumption of food contaminated with T-2 toxin can lead to death, so simple and accurate detection methods in food and feed are necessary. In this paper, we establish a highly sensitive and accurate method for detecting T-2 toxin using AlphaLISA. The system consists of acceptor beads labeled with T-2-bovine serum albumin (BSA), streptavidin-labeled donor beads and biotinylated T-2 antibodies. T-2 in the sample matrix competes with T-2-BSA for antibodies. Adding biotinylated antibodies to the test well followed by T-2 and T-2-BSA acceptor beads yielded a detection range of 0.03–500 ng/mL. The half-maximal inhibitory concentration was 2.28 ng/mL and the coefficient of variation was <10%. In addition, this method had no cross-reaction with other related mycotoxins. This optimized method for extracting T-2 from food and feed samples achieved a recovery rate of approximately 90% in T-2 concentrations as low as 1 ng/mL, better than the performance of a commercial ELISA kit. This competitive AlphaLISA method offers high sensitivity, good specificity, good repeatability and simple operation for detecting T-2 toxin in food and feed.

Mendelian Randomization Focused Analysis of Vitamin D on the Secondary Prevention of Ischemic Stroke

Stroke ◽

10.1161/strokeaha.120.032634 ◽

2021 ◽

Author(s):

Yap-Hang Chan ◽

C. Mary Schooling ◽

Jie Zhao ◽

Shiu-Lun Au Yeung ◽

Jo Jo Hai ◽

...

Keyword(s):

Myocardial Infarction ◽

Vitamin D ◽

Ischemic Stroke ◽

Risk Score ◽

Genetic Risk ◽

Odds Ratio ◽

Mendelian Randomization ◽

De Novo ◽

Genetic Risk Score ◽

Ischemic Disease

Background and Purpose: Experimental studies showed vitamin D (Vit-D) could promote vascular regeneration and repair. Prior randomized studies had focused mainly on primary prevention. Whether Vit-D protects against ischemic stroke and myocardial infarction recurrence among subjects with prior ischemic insults was unknown. Here, we dissected through Mendelian randomization any effect of Vit-D on the secondary prevention of recurrent ischemic stroke and myocardial infarction. Methods: Based on a genetic risk score for Vit-D constructed from a derivation cohort sample (n=5331, 45% Vit-D deficient, 89% genotyped) via high-throughput exome-chip screening of 12 prior genome-wide association study–identified genetic variants of Vit-D mechanistic pathways (rs2060793, rs4588, and rs7041; F statistic, 73; P<0.001), we performed a focused analysis on prospective recurrence of myocardial infarction (MI) and ischemic stroke in an independent subsample with established ischemic disease (n=441, all with prior first ischemic event; follow-up duration, 41.6±14.3 years) under a 2-sample, individual-data, prospective Mendelian randomization approach. Results: In the ischemic disease subsample, 11.1% (n=49/441) had developed recurrent ischemic stroke or MI and 13.3% (n=58/441) had developed recurrent or de novo ischemic stroke/MI. Kaplan-Meier analyses showed that genetic risk score predicted improved event-free survival from recurrent ischemic stroke or MI (log-rank, 13.0; P=0.001). Cox regression revealed that genetic risk score independently predicted reduced risk of recurrent ischemic stroke or MI combined (hazards ratio, 0.62 [95% CI, 0.48–0.81]; P<0.001), after adjustment for potential confounders. Mendelian randomization supported that Vit-D is causally protective against the primary end points of recurrent ischemic stroke or MI (Wald estimate: odds ratio, 0.55 [95% CI, 0.35–0.81]) and any recurrent or de novo ischemic stroke/MI (odds ratio, 0.64 [95% CI, 0.42–0.91]) and recurrent MI alone (odds ratio, 0.52 [95% CI, 0.30–0.81]). Conclusions: Genetically predicted lowering in Vit-D level is causal for the recurrence of ischemic vascular events in persons with prior ischemic stroke or MI.
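
The "Wald estimate" mentioned above refers, in its textbook form, to the ratio of the gene-outcome association to the gene-exposure association. The sketch below shows that generic calculation on the log-odds scale with a first-order standard error. It is not the authors' individual-data analysis, and every number and name in it is a made-up illustration.

```python
import math

def wald_ratio(beta_gene_outcome, se_gene_outcome, beta_gene_exposure):
    """Ratio estimate with a first-order (delta-method) standard error."""
    estimate = beta_gene_outcome / beta_gene_exposure
    se = se_gene_outcome / abs(beta_gene_exposure)
    return estimate, se

def odds_ratio_with_ci(log_odds, se, z=1.96):
    """Exponentiate a log-odds estimate and its approximate 95% limits."""
    return (math.exp(log_odds),
            math.exp(log_odds - z * se),
            math.exp(log_odds + z * se))

# Hypothetical inputs: log-odds of the outcome per unit of genetic score,
# and change in exposure per unit of the same genetic score.
log_or, se = wald_ratio(beta_gene_outcome=-0.12, se_gene_outcome=0.04,
                        beta_gene_exposure=0.20)
odds_ratio, lower, upper = odds_ratio_with_ci(log_or, se)
print(odds_ratio, lower, upper)
```

An odds ratio below 1 with an interval that excludes 1 would be read as evidence that higher genetically predicted exposure is protective, under the usual instrumental-variable assumptions.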

Estimating extreme river discharges in Europe through a Bayesian Network

10.5194/hess-2016-250 ◽

2016 ◽

Author(s):

Dominik Paprotny ◽

Oswaldo Morales Nápoles

Keyword(s):

Large Scale ◽

Graphical Model ◽

Flood Hazard ◽

Physical Models ◽

Method Performance ◽

Discharge Data ◽

Dependency Structure ◽

Spatial Coverage ◽

Geographical Characteristics ◽

Better Than

Large-scale hydrological modelling of flood hazard requires adequate extreme discharge data. Models based on physics are applied alongside those utilizing only statistical analysis. The former require enormous computation power, while the latter are most limited in accuracy and spatial coverage. In this paper we introduce an alternate, statistical approach based on Bayesian Networks (BN), a graphical model for dependent random variables. We use a non-parametric BN to describe the joint distribution of extreme discharges in European rivers and variables describing the geographical characteristics of their catchments. Data on annual maxima of daily discharges from more than 1800 river gauge stations were collected, together with information on terrain, land use and climate of catchments that drain to those locations. The (conditional) correlations between the variables are modelled through copulas, with the dependency structure defined in the network. The results show that using this method, mean annual maxima and return periods of discharges could be estimated with an accuracy similar to existing studies using physical models for Europe, and better than a comparable global statistical method. Performance of the model varies slightly between regions of Europe, but is consistent between different time periods, and is not affected by a split-sample validation. The BN was applied to a large domain covering all sizes of rivers in the continent, both for present and future climate, showing large variation in influence of climate change on river discharges, as well as large differences between emission scenarios. The method could be used to provide quick estimates of extreme discharges at any location for the purpose of obtaining input information for hydraulic modelling.

Contiguity: Contig adjacency graph construction and visualisation

10.7287/peerj.preprints.1037v1 ◽

2015 ◽

Cited By ~ 8

Author(s):

Mitchell J Sullivan ◽

Nouri L Ben Zakour ◽

Brian M Forde ◽

Mitchell Stanton-Cook ◽

Scott A Beatson

Keyword(s):

De Novo ◽

Reference Sequence ◽

De Bruijn Graph ◽

Interactive Software ◽

Graph Exploration ◽

Adjacency Graph ◽

Highly Sensitive ◽

Long Read ◽

Genome Assemblies ◽

Adjacency Graphs

Contiguity is interactive software for the visualization and manipulation of de novo genome assemblies. Contiguity creates and displays information on contig adjacency which is contextualized by the simultaneous display of a comparison between assembled contigs and reference sequence. Where scaffolders allow unambiguous connections between contigs to be resolved into a single scaffold, Contiguity allows the user to create all potential scaffolds in ambiguous regions of the genome. This enables the resolution of novel sequence or structural variants from the assembly. In addition, Contiguity provides a sequencing and assembly agnostic approach for the creation of contig adjacency graphs. To maximize the number of contig adjacencies determined, Contiguity combines information from read pair mappings, sequence overlap and De Bruijn graph exploration. We demonstrate how highly sensitive graphs can be achieved using this method. Contig adjacency graphs allow the user to visualize potential arrangements of contigs in unresolvable areas of the genome. By combining adjacency information with comparative genomics, Contiguity provides an intuitive approach for exploring and improving sequence assemblies. It is also useful in guiding manual closure of long read sequence assemblies. Contiguity is an open-source application, implemented using Python and the Tkinter GUI package, that can run on any Unix, OSX and Windows operating system. It has been designed and optimized for bacterial assemblies. Contiguity is available at http://mjsull.github.io/Contiguity.
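
To make the idea of "all potential scaffolds in ambiguous regions" concrete, the sketch below treats a contig adjacency graph as a plain dictionary and enumerates every simple path between two contigs. This is an illustrative toy, not Contiguity's code or data model: the contig names and the function are hypothetical, and contig orientation, which a real assembly graph must track, is ignored.

```python
def all_simple_paths(graph, start, end, path=None):
    """Return every path from start to end that visits no contig twice."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for neighbour in graph.get(start, []):
        if neighbour not in path:  # skip contigs already on this path
            paths.extend(all_simple_paths(graph, neighbour, end, path))
    return paths

# Hypothetical ambiguous region: contig1 can be followed by contig2 or
# contig3, and both of those connect to contig4.
adjacency = {
    "contig1": ["contig2", "contig3"],
    "contig2": ["contig4"],
    "contig3": ["contig4"],
    "contig4": [],
}

scaffolds = all_simple_paths(adjacency, "contig1", "contig4")
for scaffold in scaffolds:
    print(" -> ".join(scaffold))
```

Each returned path is one candidate arrangement of contigs through the ambiguous region; choosing among them would need extra evidence such as a reference comparison.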
