FAIRness in Biomedical Data Discovery

Author(s):  
Alina Trifan ◽  
José Oliveira
2014 ◽  
Vol 21 (2) ◽  
pp. 379-383 ◽  
Author(s):  
Jeffrey W Pennington ◽  
Byron Ruth ◽  
Michael J Italia ◽  
Jeffrey Miller ◽  
Stacey Wrazien ◽  
...  

2018 ◽  
Vol 25 (3) ◽  
pp. 300-308 ◽  
Author(s):  
Xiaoling Chen ◽  
Anupama E Gururaj ◽  
Burak Ozyurt ◽  
Ruiling Liu ◽  
Ergin Soysal ◽  
...  

Abstract. Objective: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. Materials and Methods: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Results and Conclusion: Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the fraction of the top 10 search results that are relevant) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
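
The P@10 figure in the abstract above is a value between 0 and 1, so it is most naturally read as the share of relevant results among the top 10, presumably averaged over the benchmark queries. A minimal Python sketch of that computation follows. The function names, dataset IDs, and relevance judgments are hypothetical illustrations and are not taken from the DataMed code base or its benchmark.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for dataset_id in top_k if dataset_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalizes queries that return fewer than k results.
    return hits / k


def mean_precision_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 10) -> float:
    """Average P@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in runs]
    if not scores:
        raise ValueError("at least one query is required")
    return sum(scores) / len(scores)


# Toy example with made-up dataset IDs (assumption: binary relevance judgments).
ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
query_a = (ranking, {"d1", "d3", "d5", "d7", "d9"})
query_b = (ranking, {"d1", "d2", "d3", "d4", "d5", "d6", "d7"})
score = mean_precision_at_k([query_a, query_b], k=10)
```

Here `query_a` has 5 relevant results in its top 10 and `query_b` has 7, so the mean lands between the two per-query values.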


2017 ◽  
Vol 25 (3) ◽  
pp. 337-344 ◽  
Author(s):  
Ram Dixit ◽  
Deevakar Rogith ◽  
Vidya Narayana ◽  
Mandana Salimi ◽  
Anupama Gururaj ◽  
...  

Abstract. Objective: To present user needs and usability evaluations of DataMed, a Data Discovery Index (DDI) that allows searching for biomedical data from multiple sources. Materials and Methods: We conducted 2 phases of user studies. Phase 1 was a user needs analysis conducted before the development of DataMed, consisting of interviews with researchers. Phase 2 involved iterative usability evaluations of DataMed prototypes. We analyzed data qualitatively to document researchers’ information and user interface needs. Results: Biomedical researchers’ information needs in data discovery are complex, multidimensional, and shaped by their context, domain knowledge, and technical experience. User needs analyses validate the need for a DDI, while usability evaluations of DataMed show that even though aggregating metadata into a common search engine and applying traditional information retrieval tools are promising first steps, there remain challenges for DataMed due to incomplete metadata and the complexity of data discovery. Discussion: Biomedical data poses distinct problems for search when compared to websites or publications. Making data available is not enough to facilitate biomedical data discovery: new retrieval techniques and user interfaces are necessary for dataset exploration. Consistent, complete, and high-quality metadata are vital to enable this process. Conclusion: While available data and researchers’ information needs are complex and heterogeneous, a successful DDI must meet those needs and fit into the processes of biomedical researchers. Research directions include formalizing researchers’ information needs, standardizing overviews of data to facilitate relevance judgments, implementing user interfaces for concept-based searching, and developing evaluation methods for open-ended discovery systems such as DDIs.


A major challenge in the analysis of clinical data and knowledge discovery is to provide integrated, advanced, and efficient tools, methods, and technologies for accessing and processing progressively increasing amounts of data in multiple formats. The paper presents a platform for multidimensional large-scale biomedical data management and analytics, which covers all phases of data discovery, data integration, data preprocessing, data storage, data analytics, and visualization. The goal is to offer an intelligent solution in the form of an integrated, scalable workflow development environment consisting of a suite of software tools that automate the computational process of conducting scientific experiments.


2019 ◽  
Author(s):  
George Alter ◽  
Alejandra Gonzalez-Beltran ◽  
Lucila Ohno-Machado ◽  
Philippe Rocca-Serra

Abstract. This article presents elements in the Data Tags Suite (DATS) metadata schema describing data access, data use conditions, and consent information. DATS is a product of the bioCADDIE Project, which created a data discovery index for searching across all types of biomedical data. The “access and use” metadata items in DATS are designed from the perspective of a researcher who wants to find and re-use existing data. Data reuse is often controlled to protect the privacy of subjects and patients. We focus on the impact of data protection procedures on data users. However, these procedures are part of a larger environment around patient privacy protection, and this article puts DATS metadata into the context of the administrative, legal, and technical systems used to protect confidential data.


2021 ◽  
Vol 15 (8) ◽  
pp. 912-926
Author(s):  
Ge Zhang ◽  
Pan Yu ◽  
Jianlin Wang ◽  
Chaokun Yan

Background: There have been rapid developments in various bioinformatics technologies, which have led to the accumulation of a large amount of biomedical data. However, these datasets usually involve thousands of features and include much irrelevant or redundant information, which leads to confusion during diagnosis. Feature selection is a solution that consists of finding the optimal subset, which is known to be an NP problem because of the large search space. Objective: To address this issue, this paper proposes a hybrid feature selection method, called IGICRO, based on an improved chemical reaction optimization algorithm (ICRO) and an information gain (IG) approach. Methods: IG is adopted to obtain some important features. The neighborhood search mechanism is combined with ICRO to increase the diversity of the population and improve the capacity of local search. Results: Experimental results on eight publicly available data sets demonstrate that our proposed approach outperforms the original CRO and other state-of-the-art approaches.
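
The IG filtering step mentioned in the abstract above can be illustrated with the standard entropy-based definition of information gain. The Python sketch below assumes discrete (already binned) feature values and class labels. It shows only a generic IG ranking, not the IGICRO method or its ICRO search stage, and all function names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each feature-value group.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]) -> List[str]:
    """Feature names sorted from highest to lowest information gain."""
    return sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)


# Toy data (made up): "gene_a" separates the classes perfectly, "gene_b" not at all.
labels = [0, 0, 1, 1]
features = {
    "gene_b": ["low", "high", "low", "high"],
    "gene_a": ["low", "low", "high", "high"],
}
ranking = rank_features(features, labels)
```

In a hybrid scheme of the kind the abstract describes, a ranking like this would typically be used to keep only the top-scoring features before a search algorithm explores subsets of them.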


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Yan Gao ◽  
Yan Cui

A Correction to this paper has been published: https://doi.org/10.1038/s41467-020-20480-x


Author(s):  
Shilpa Nadimpalli Kobren ◽  
Dustin Baldridge ◽  
Matt Velinder ◽  
Joel B. Krier ◽  
...  

Abstract. Purpose: Genomic sequencing has become an increasingly powerful and relevant tool to be leveraged for the discovery of genetic aberrations underlying rare, Mendelian conditions. Although the computational tools incorporated into diagnostic workflows for this task are continually evolving and improving, we nevertheless sought to investigate commonalities across sequencing processing workflows to reveal consensus and standard practice tools and highlight exploratory analyses where technical and theoretical method improvements would be most impactful. Methods: We collected details regarding the computational approaches used by a genetic testing laboratory and 11 clinical research sites in the United States participating in the Undiagnosed Diseases Network via meetings with bioinformaticians, online survey forms, and analyses of internal protocols. Results: We found that tools for processing genomic sequencing data can be grouped into four distinct categories. Whereas well-established practices exist for initial variant calling and quality control steps, there is substantial divergence across sites in later stages for variant prioritization and multimodal data integration, demonstrating a diversity of approaches for solving the most mysterious undiagnosed cases. Conclusion: The largest differences across diagnostic workflows suggest that advances in structural variant detection, noncoding variant interpretation, and integration of additional biomedical data may be especially promising for solving chronically undiagnosed cases.

