DISA tool: discriminative and informative subspace assessment with categorical and numerical outcomes

2021 ◽  
Author(s):  
Leonardo Duarte Rodrigues Alexandre ◽  
Rafael S. Costa ◽  
Rui Henriques

Motivation: Pattern discovery and subspace clustering play a central role in the biological domain, supporting, for instance, the discovery of putative regulatory modules from omic data for both descriptive and predictive ends. In the presence of target variables (e.g. phenotypes), regulatory patterns should further satisfy delineated discriminative power properties, which are well established for categorical outcomes yet largely disregarded for numerical outcomes, such as risk profiles and quantitative phenotypes. Results: DISA (Discriminative and Informative Subspace Assessment), a Python software package, is proposed to assess patterns in the presence of numerical outcomes using well-established measures together with a novel principle able to statistically assess the correlation gain of the subspace against the overall space. Results confirm that discriminative criteria can be soundly extended to numerical outcomes without the drawbacks commonly associated with discretization procedures. A case study is provided to show the properties of the proposed method. Availability: DISA is freely available at https://github.com/JupitersMight/DISA under the MIT license.

2009 ◽  
Vol 75 (23) ◽  
pp. 7537-7541 ◽  
Author(s):  
Patrick D. Schloss ◽  
Sarah L. Westcott ◽  
Thomas Ryabin ◽  
Justine R. Hall ◽  
Martin Hartmann ◽  
...  

ABSTRACT mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. As a case study, we used mothur to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments. This analysis of more than 222,000 sequences was completed in less than 2 h with a laptop computer.


Author(s):  
RAFFAELLA GUGLIELMANN ◽  
LILIANA IRONI

Fuzzy systems properly integrated with Qualitative Reasoning approaches yield a hybrid identification method, called FS-QM, that outperforms traditional data-driven approaches in terms of robustness, interpretability and efficiency in both rich and poor data contexts. This results from embedding into the fuzzy system the entire system dynamics predicted by the simulation of its qualitative model, represented by fuzzy rules. However, the intrinsic inability of qualitative simulation to scale up to complex and large systems significantly reduces its applicability to real-world problems. The novelty of this paper is a divide-and-conquer approach that aims at making qualitative simulation tractable and the derived behavioural description comprehensible and exhaustive, and consequently usable for system identification. Partitioning the complete model into smaller ones prevents the generation of a complete temporal ordering of all unrelated events, which is one of the major causes of intractable branching in qualitative simulation. The set of generated behaviours is drastically but beneficially reduced, as it still captures the entire range of possible dynamical distinctions. Thus, the properties of the corresponding fuzzy-rule base, which guarantee robustness and interpretability of the identified model, are preserved. The strategy we propose is discussed through a case study from the biological domain.


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
Y X Li ◽  
J Jiang ◽  
Y Zhang ◽  
J P Li ◽  
Y Huo

Abstract Introduction Clinical data repositories (CDR) including electronic health record (EHR) data have great potential for outcome prediction and risk modeling. However, most CDRs have only been used for data display, and using data from a CDR for outcome prediction often requires careful study design and sophisticated modeling techniques before a hypothesis can be tested. Purpose We built a prediction tool integrated with a CDR and based on pattern discovery, aiming to bridge the above gap, and used the system in a case study on contrast-related acute kidney injury (AKI). Methods A cardiovascular CDR integrated with multiple hospital informatics systems was established. For the case study on AKI, we included patients undergoing cardiac catheterization from January 13, 2015 to April 27, 2017, excluding those with dialysis, end-stage renal disease, renal transplant, and missing pre- or post-procedural creatinine. To handle missing data, a prior-history-note composer was designed to fill in structured data on 14 diseases related to cardiovascular problems. Crucial data such as ejection fraction were extracted from the structured reports. AKI was defined according to the Acute Kidney Injury Network criteria as an increase in serum creatinine from the most recent baseline to the post-procedure 7-day peak. To build the predictive models, we selected 17 variables covered in existing AKI models. Pattern discovery was recently developed as an interpretable predictive model that works on incomplete, noisy data. In this study, we developed a pattern-discovery-based visual analytics tool and trained it on 70% of the data, up to August 2016, with three interactive knowledge incorporation modes to develop 3 models: 1) pure data-driven, 2) domain knowledge, and 3) clinician-interactive. In the last two modes, a physician using the visual analytics could change the variables and further refine the model, respectively. We tested the tool and compared it with other models on the 30% of consecutive patients dated afterwards, as shown in Figure 1. Results Among 2,560 patients in the final dataset with 17 pre-procedure variables derived from CDR data, 169 (7.3%) had AKI. We measured 4 existing models, whose areas under the receiver operating characteristic curve (AUCs) for the test set were 0.70 (Mehran's), 0.72 (Chen's), 0.67 (Gao's) and 0.62 (AGEF), respectively. A pure data-driven machine learning method (Easy Ensemble) achieved an AUC of 0.72. The AUCs of our 3 models were 0.77, 0.80 and 0.82, respectively, with the last, in which physician knowledge is incorporated, being the highest. Conclusions We developed a novel pattern-discovery-based outcome prediction tool integrated with a CDR and using purely EHR data. In the case of predicting contrast-related AKI, the tool proved user-friendly for physicians and demonstrated competitive performance in comparison with the state-of-the-art models.


Author(s):  
S. Sarker

The case study describes the process of implementation of an integrated software package at the Thai subsidiary (SMTL) of a Hong Kong-based multinational company (SMHK) engaged in the manufacturing of electronic equipment.


1997 ◽  
Vol XVII (6) ◽  
pp. 98-107
Author(s):  
Hyoseob Kim ◽  
Cornelia Boldyreff

1983 ◽  
Vol 6 (2) ◽  
pp. 50-57 ◽  
Author(s):  
Lynn S. Fuchs ◽  
Stanley L. Deno ◽  
Phyllis K. Mirkin

This paper provides a rationale for and describes a continuous evaluation system, data-based program modification (DBPM), which has demonstrated technical adequacy, logistical feasibility, and instructional effectiveness. Additionally, the paper illustrates the use of DBPM with a case study, and then describes the DBPM software package that stores, summarizes, analyzes, and displays a graph of student performance data.

