multiple dataset
Recently Published Documents


TOTAL DOCUMENTS: 18 (five years: 7)

H-INDEX: 4 (five years: 0)

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Anthony Mammoliti ◽  
Petr Smirnov ◽  
Minoru Nakano ◽  
Zhaleh Safikhani ◽  
Christopher Eeles ◽  
...  

Abstract: Reproducibility is essential to open science, as findings that cannot be reproduced by independent research groups have limited relevance, regardless of their validity. It is therefore crucial for scientists to describe their experiments in sufficient detail so they can be reproduced, scrutinized, challenged, and built upon. However, the intrinsic complexity and continuous growth of biomedical data make it increasingly difficult to process, analyze, and share with the community in a FAIR (findable, accessible, interoperable, and reusable) manner. To overcome these issues, we created a cloud-based platform called ORCESTRA (orcestra.ca), which provides a flexible framework for the reproducible processing of multimodal biomedical data. It enables processing of clinical, genomic, and perturbation profiles of cancer samples through automated, user-customizable processing pipelines. ORCESTRA creates integrated and fully documented data objects with persistent identifiers (DOIs) and manages multiple dataset versions, which can be shared for future studies.
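The data-object idea described above (an integrated, documented object tied to a DOI and a specific pipeline version) can be sketched with a small, purely hypothetical Python structure. This is not the ORCESTRA API, only an illustration of the metadata such an object needs in order to stay findable and reproducible; every field name and value below is a placeholder.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical record for one versioned, DOI-tagged data object."""
    name: str               # dataset name
    version: str            # dataset version managed by the platform
    doi: str                # persistent identifier for this exact version
    pipeline_commit: str    # revision of the processing pipeline that built it
    modalities: tuple = ()  # e.g. clinical, genomic, perturbation profiles

record = DatasetRecord(
    name="example-dataset",
    version="2021.1",
    doi="10.5281/zenodo.0000000",  # placeholder, not a real DOI
    pipeline_commit="abc1234",
    modalities=("clinical", "genomic"),
)
print(record.doi, record.version)
```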


2021 ◽  
Vol 105 ◽  
pp. 102224
Author(s):  
Tehsin Kanwal ◽  
Adeel Anjum ◽  
Saif U.R. Malik ◽  
Haider Sajjad ◽  
Abid Khan ◽  
...  

2020 ◽  
Author(s):  
Stephen Coleman ◽  
Paul D.W. Kirk ◽  
Chris Wallace

Abstract
Motivation: Cluster analysis is an integral part of precision medicine and systems biology, used to define groups of patients or biomolecules. However, problems such as choosing the number of clusters and issues with high-dimensional data arise consistently. An ensemble approach, such as consensus clustering, can overcome some of the difficulties associated with high-dimensional data, frequently exploring more relevant clustering solutions than individual models. Another tool for cluster analysis, Bayesian mixture modelling, has alternative advantages, including the ability to infer the number of clusters present, and extensibility. However, inference for these models is often performed using Markov chain Monte Carlo (MCMC) methods, which can suffer from problems such as poor exploration of the posterior distribution and long runtimes. This makes applying Bayesian mixture models and their extensions to ‘omics data challenging. We apply consensus clustering to Bayesian mixture models to address these problems.
Results: Consensus clustering of Bayesian mixture models successfully finds generating structure in our simulation study and captures multiple modes in the likelihood surface. This approach also offers significant reductions in runtime compared to traditional Bayesian inference when a parallel environment is available. We propose a heuristic to decide upon ensemble size and then apply consensus clustering to Multiple Dataset Integration, an extension of Bayesian mixture models for integrative analyses, on three ‘omics datasets for budding yeast. We find clusters of genes that are co-expressed and have common regulatory proteins, which we validate using external knowledge, showing that consensus clustering can be applied to any MCMC-based clustering method.
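A minimal sketch of the general consensus-clustering idea outlined in this abstract, not the authors' implementation: many short, independently seeded clustering runs are combined into a consensus matrix recording how often pairs of samples co-cluster, and a final partition is cut from that matrix. K-means stands in here for the short MCMC chains over a Bayesian mixture model, and scikit-learn is an assumed dependency.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

def consensus_matrix(X, n_runs=50, n_clusters=4, seed=0):
    """Entry (i, j) is the fraction of runs in which samples i and j
    were assigned to the same cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    consensus = np.zeros((n, n))
    for _ in range(n_runs):
        # Each run stands in for one short, independently seeded chain.
        labels = KMeans(n_clusters=n_clusters, n_init=1,
                        random_state=int(rng.integers(2**31 - 1))).fit_predict(X)
        consensus += labels[:, None] == labels[None, :]
    return consensus / n_runs

def consensus_labels(consensus, n_clusters=4):
    """Cut a final partition from the consensus matrix, treating
    1 - co-clustering frequency as a distance."""
    model = AgglomerativeClustering(n_clusters=n_clusters,
                                    metric="precomputed",
                                    linkage="average")
    return model.fit_predict(1.0 - consensus)

# Toy example: two well-separated groups of samples.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(4, 1, (30, 5))])
cm = consensus_matrix(X, n_runs=25, n_clusters=2)
print(consensus_labels(cm, n_clusters=2))
```

In the paper's setting, the individual runs would be draws from MCMC chains over Bayesian mixture or Multiple Dataset Integration models rather than K-means fits; only the pooling into a consensus matrix is illustrated here.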


2020 ◽  
Author(s):  
Maarten JMF Reijnders ◽  
Robert M Waterhouse

Abstract: The Gene Ontology (GO) is a cornerstone of functional genomics research that drives discoveries through knowledge-informed computational analysis of biological data from large-scale assays. Key to this success is how the GO can be used to support hypotheses or conclusions about the biology or evolution of a study system by identifying annotated functions that are overrepresented in subsets of genes of interest. Graphical visualisations of such GO term enrichment results are critical to aid interpretation and avoid biases by presenting researchers with intuitive visual data summaries. Amongst current visualisation tools and resources there is a lack of standalone open-source software solutions that facilitate systematic comparisons of multiple lists of GO terms. To address this we developed GO-Figure!, open-source Python software for producing user-customisable semantic similarity scatterplots of redundancy-reduced GO term lists. The lists are simplified by grouping together GO terms with similar functions using their quantified information contents and semantic similarities, with user control over grouping thresholds. Representatives are then selected for plotting in two-dimensional semantic space, where similar GO terms are placed closer to each other on the scatterplot, with an array of user-customisable graphical attributes. GO-Figure! offers a simple solution for command-line plotting of informative summary visualisations of lists of GO terms, designed to support exploratory data analyses and multiple dataset comparisons.
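A rough sketch of the redundancy-reduction step the abstract describes, grouping GO terms using information content and semantic similarity. The greedy thresholding below is a generic illustration, not the GO-Figure! algorithm, and the similarity and information-content values in the example are invented.

```python
import numpy as np

def reduce_terms(term_ids, similarity, info_content, threshold=0.5):
    """Greedy redundancy reduction: repeatedly keep the term with the
    highest information content and absorb every remaining term whose
    semantic similarity to it meets the threshold.

    term_ids     : list of GO identifiers
    similarity   : (n, n) pairwise semantic similarity matrix in [0, 1]
    info_content : (n,) information content per term
    """
    remaining = list(range(len(term_ids)))
    representatives = {}
    while remaining:
        # The most informative remaining term becomes a representative.
        rep = max(remaining, key=lambda i: info_content[i])
        members = [i for i in remaining if similarity[rep, i] >= threshold]
        representatives[term_ids[rep]] = [term_ids[i] for i in members]
        remaining = [i for i in remaining if i not in members]
    return representatives

# Toy example with made-up similarities and information contents.
terms = ["GO:0008150", "GO:0009987", "GO:0050896"]
sim = np.array([[1.0, 0.7, 0.2],
                [0.7, 1.0, 0.3],
                [0.2, 0.3, 1.0]])
ic = np.array([1.2, 0.8, 2.0])
print(reduce_terms(terms, sim, ic, threshold=0.5))
```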


Author(s):  
Dane Burden ◽  
Nic Roniger ◽  
Matt Romney

Abstract: The unique characteristics of individual pipelines come from over a century of evolving design, construction, maintenance, regulation, and operation. This is especially true for legacy, pre-regulated pipelines. Because of the unique nature of the threats present on these assets, there is a need for inspection technologies and techniques that can increase pipeline integrity. Reconditioned and repaired pipe containing puddle weld repairs is one such threat. An advanced analysis was completed on a 10-inch, 68-mile light products pipeline. The pipeline was constructed with reconditioned pipe that was estimated to contain tens of thousands of puddle welds. Historical in-line inspection (ILI) data generally underperformed in classifying and discriminating puddle welds from metal loss features. The primary objective of this project was to assess the probability of identification (POI) of a multiple dataset ILI tool utilizing multiple magnetic flux leakage (MFL) magnetization directions and residual (RES) magnetization measurements. A secondary objective was to scrutinize the data for signs of coincident features. Hydrostatic testing failures showed that puddle welds with porosity and cracking were susceptible to failure and that the identification of these features would be beneficial. Analysis of historical puddle weld investigations and newly completed multiple dataset ILI data revealed strong identification capabilities in the RES dataset. The high-field magnetizations offered secondary confirmation but often saturated out thermal effects or material differences. The final report included over 40,000 identified puddle welds and five classifications for further investigation. Field investigations for 212 features were completed and the results were compared to the ILI data to assess performance. A confusion matrix was created for true positive (TP), true negative (TN), false positive (FP), and false negative (FN) conditions. The smallest TP puddle weld dimension was 0.7″ × 0.7″, and the population had a statistical sensitivity value of 98% (132 TP and 3 FP). Three additional anomalies denoted as atypical were also investigated. The ILI signatures at these locations were consistent with previous repairs in which puddle welds with cracking were found and repaired. Two of the three features investigated were found to have cracking. Crack propagation was found to be both axial and non-axial in orientation. The results show that puddle welds can be detected and identified with extremely high accuracy. In addition, the preliminary classification results for atypical puddle welds show a high potential for identifying secondary coincident features. This paper details the stages, deliverables, and results from an ILI advanced analysis focused on puddle welds.
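The field-verification results above are summarized through a confusion matrix. The sketch below shows the standard metrics typically derived from such a matrix (sensitivity, specificity, positive predictive value, accuracy); the counts passed in are illustrative placeholders, not the paper's reported TP/FP/FN/TN breakdown.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from a confusion matrix:
    sensitivity (recall) = TP / (TP + FN), specificity = TN / (TN + FP),
    positive predictive value = TP / (TP + FP)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Illustrative counts only; the paper's full breakdown is not reproduced here.
print(confusion_metrics(tp=90, fp=5, fn=10, tn=95))
```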


Cloud computing is an emerging field with many possibilities for maintenance at the infrastructure and software layers. A storage architecture is associated with two processes, storage and retrieval, and it plays a vital role in how quickly data can be retrieved. The retrieved data is presented according to its weight. This paper presents a novel secure storage and ranking mechanism for documents in the cloud. As no previous reference for any data is kept at the server, the data is encrypted based on the correlation between data files, calculated by cosine similarity. Ranking of the retrieved data is done through a supervised machine learning mechanism. The parameters are evaluated on the basis of computation time and the total number of true retrievals for multi-keyword search. Multiple datasets from Kaggle are used to evaluate and cross-validate the proposed algorithm.
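The similarity step this abstract leans on, cosine similarity between document representations, can be sketched briefly. The TF-IDF vectorisation and scikit-learn usage below are assumptions for illustration; the paper does not specify its vectorisation, encryption, or ranking model.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document collection standing in for files stored in the cloud.
docs = [
    "secure storage of encrypted cloud documents",
    "retrieval and ranking of cloud documents",
    "weather report for the coming week",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)

# Pairwise cosine similarity between documents: values near 1 indicate
# strongly related files, values near 0 indicate unrelated files.
print(np.round(cosine_similarity(doc_vectors), 2))

# Rank documents against a multi-keyword query with the same measure.
query_vector = vectorizer.transform(["cloud document ranking"])
scores = cosine_similarity(query_vector, doc_vectors).ravel()
print(scores.argsort()[::-1])  # document indices, best match first
```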


2019 ◽  
Vol 8 (4) ◽  
pp. 137-152
Author(s):  
Clement Ntori ◽  
Emmanuel Arhin ◽  
Ibrahim Abdul Sulemana ◽  
D. Y. Antwi Boateng

2018 ◽  
Vol 39 (12) ◽  
pp. 3926-3938 ◽  
Author(s):  
Duole Feng ◽  
Le Yu ◽  
Yuanyan Zhao ◽  
Yuqi Cheng ◽  
Yidi Xu ◽  
...  

Author(s):  
Haruna Chiroma ◽  
Abdulsalam Ya'u Gital ◽  
Adamu I. Abubakar ◽  
Sanah Abdullahi Muaz ◽  
Jaafar Z. Maitama ◽  
...  
