NIH Training and Education for Biomedical Data Science

Author(s):  
Valerie Florance
F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 251
Author(s):  
John Van Horn ◽  
Sumiko Abe ◽  
José Luis Ambite ◽  
Teresa K. Attwood ◽  
Niall Beard ◽  
...  

The increasing richness and diversity of biomedical data types create major organizational and analytical impediments to rapid translational impact in the context of training and education. As biomedical datasets increase in size, variety and complexity, they challenge conventional methods for sharing, managing and analyzing those data. In May 2017, we convened a two-day meeting between the BD2K Training Coordinating Center (TCC), ELIXIR Training/TeSS, GOBLET, H3ABioNet, EMBL-ABR, bioCADDIE and the CSIRO, in Huntington Beach, California, to compare and contrast our respective activities and to consider how these might be leveraged for wider impact on an international scale. Discussions focused on i) the role of training for biomedical data science; ii) the need to promote core competencies; and iii) the development of career paths. These led to specific conversations about i) the value of standardizing and sharing data science training resources; ii) challenges in encouraging adoption of training material standards; iii) strategies and best practices for the personalization and customization of learning experiences; iv) processes of identifying stakeholders and determining how they should be accommodated; and v) discussions of joint partnerships to lead the world on data science training in ways that benefit all stakeholders. Generally, international cooperation was viewed as essential for accommodating the widest possible participation in the modern bioscience enterprise, providing skills in a truly “FAIR” manner, and addressing the importance of data science understanding worldwide. Several recommendations for the exchange of educational frameworks are made, along with potential sources for support, and plans for further cooperative efforts are presented.


Author(s):  
Bethany Percha

Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


Author(s):  
Siobhan Cleary ◽  
Cathal Seoighe

Diploidy has profound implications for population genetics and susceptibility to genetic diseases. Although two copies are present for most genes in the human genome, they are not necessarily both active or active at the same level in a given individual. Genomic imprinting, resulting in exclusive or biased expression in favor of the allele of paternal or maternal origin, is now believed to affect hundreds of human genes. A far greater number of genes display unequal expression of gene copies due to cis-acting genetic variants that perturb gene expression. The availability of data generated by RNA sequencing applied to large numbers of individuals and tissue types has generated unprecedented opportunities to assess the contribution of genetic variation to allelic imbalance in gene expression. Here we review the insights gained through the analysis of these data about the extent of the genetic contribution to allelic expression imbalance, the tools and statistical models for gene expression imbalance, and what the results obtained reveal about the contribution of genetic variants that alter gene expression to complex human diseases and phenotypes. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
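
As an illustration of the kind of statistical model the abstract above refers to, the following is a minimal sketch of an exact two-sided binomial test for allelic imbalance at a single heterozygous site. The binomial model, the function name and the read counts are assumptions made for illustration; the review itself may cover different or more elaborate models (for example, ones that account for overdispersion or reference-mapping bias).

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int,
                             expected_ref_fraction: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance (illustrative).

    Null hypothesis: each read comes from the reference allele with
    probability `expected_ref_fraction` (0.5 means balanced expression).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("at least one read is required")
    p = expected_ref_fraction
    # Probability of every possible reference-read count under the null.
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Sum the outcomes that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    pvalue = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, pvalue)


# Hypothetical read counts at one heterozygous site.
balanced = allelic_imbalance_pvalue(5, 5)
monoallelic = allelic_imbalance_pvalue(10, 0)
```

A small p-value suggests that the two alleles are not expressed equally at that site. In practice, many sites are tested at once, so a multiple-testing correction would normally follow.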


2019 ◽  
Vol 21 (4) ◽  
pp. 1182-1195
Author(s):  
Andrew C Liu ◽  
Krishna Patel ◽  
Ramya Dhatri Vunikili ◽  
Kipp W Johnson ◽  
Fahad Abdu ◽  
...  

Sepsis is a series of clinical syndromes caused by the immunological response to infection. Clinical evidence of sepsis can typically be attributed to bacterial infection or bacterial endotoxins, but infections due to viruses, fungi or parasites can also lead to sepsis. Regardless of the etiology, rapid clinical deterioration, prolonged stay in intensive care units and high risk for mortality correlate with the incidence of sepsis. Despite its prevalence and morbidity, improvement in sepsis outcomes has remained limited. In this comprehensive review, we summarize the current landscape of risk estimation, diagnosis, treatment and prognosis strategies in the setting of sepsis and discuss future challenges. We argue that the advent of modern technologies such as in-depth molecular profiling, biomedical big data and machine intelligence methods will augment the treatment and prevention of sepsis. The volume, variety, veracity and velocity of heterogeneous data generated as part of healthcare delivery and recent advances in biotechnology-driven therapeutics and companion diagnostics may provide a new wave of approaches to identify the most at-risk sepsis patients and reduce the symptom burden in patients within shorter turnaround times. Developing novel therapies by leveraging modern drug discovery strategies including computational drug repositioning, cell and gene therapy, clustered regularly interspaced short palindromic repeats (CRISPR)-based genetic editing systems, immunotherapy, microbiome restoration, nanomaterial-based therapy and phage therapy may help to develop treatments to target sepsis. We also provide empirical evidence for potential new sepsis targets including FER and STARD3NL. Implementing data-driven methods that use real-time collection and analysis of clinical variables to trace, track and treat sepsis-related adverse outcomes will be key. Understanding the root and route of sepsis and its comorbid conditions that complicate treatment outcomes and lead to organ dysfunction may help to facilitate identification of the most at-risk patients and prevent further deterioration. To conclude, leveraging the advances in precision medicine, biomedical data science and translational bioinformatics approaches may help to develop better strategies to diagnose and treat sepsis in the next decade.


Author(s):  
José Luis Ambite ◽  
Jonathan Gordon ◽  
Lily Fierro ◽  
Gully Burns ◽  
Joel Mathew

The availability of massive datasets in genetics, neuroimaging, mobile health, and other subfields of biology and medicine promises new insights but also poses significant challenges. To realize the potential of big data in biomedicine, the National Institutes of Health launched the Big Data to Knowledge (BD2K) initiative, funding several centers of excellence in biomedical data analysis and a Training Coordinating Center (TCC) tasked with facilitating online and in-person training of biomedical researchers in data science. A major initiative of the BD2K TCC is to automatically identify, describe, and organize data science training resources available on the Web and provide personalized training paths for users. In this paper, we describe the construction of ERuDIte, the Educational Resource Discovery Index for Data Science, and its release as linked data. ERuDIte contains over 11,000 training resources including courses, video tutorials, conference talks, and other materials. The metadata for these resources is described uniformly using Schema.org. We use machine learning techniques to tag each resource with concepts from the Data Science Education Ontology, which we developed to further describe resource content. Finally, we map references to people and organizations in learning resources to entities in DBpedia, DBLP, and ORCID, embedding our collection in the web of linked data. We hope that ERuDIte will provide a framework to foster open linked educational resources on the Web.
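
To make the idea of uniform Schema.org metadata more concrete, here is a minimal sketch of how one training resource might be serialized as JSON-LD. It is not ERuDIte's actual schema: the choice of properties, the `to_jsonld` helper, the example course, the placeholder ORCID URL and the concept label are all hypothetical, chosen only to show the general shape of such a record.

```python
import json


def to_jsonld(resource: dict) -> str:
    """Serialize a training resource as Schema.org JSON-LD (illustrative only)."""
    doc = {
        "@context": "https://schema.org",
        "@type": resource["type"],  # e.g. "Course" or "VideoObject"
        "name": resource["name"],
        "url": resource["url"],
        "provider": {"@type": "Organization", "name": resource["provider"]},
        # sameAs links a person to external identifiers such as ORCID.
        "author": [
            {"@type": "Person", "name": a["name"], "sameAs": a["same_as"]}
            for a in resource["authors"]
        ],
        # Concept tags; real tags would come from an ontology, not free text.
        "about": [{"@type": "DefinedTerm", "name": c} for c in resource["concepts"]],
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)


# Hypothetical resource; every value below is a placeholder.
example_resource = {
    "type": "Course",
    "name": "Introduction to Machine Learning (example)",
    "url": "https://example.org/courses/intro-ml",
    "provider": "Example University",
    "authors": [
        {"name": "Jane Doe", "same_as": ["https://orcid.org/0000-0000-0000-0000"]}
    ],
    "concepts": ["Machine Learning"],
}

jsonld_text = to_jsonld(example_resource)
print(jsonld_text)
```

Describing every resource with the same vocabulary is what would allow records from different providers to be indexed and queried together, and the `sameAs` links are one plausible way to connect authors to external datasets.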


2018 ◽  
Vol 1 (1) ◽  
pp. 181-205 ◽  
Author(s):  
Pierre Baldi

Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field, and surveys current applications of deep learning to biomedical data organized around five subareas, roughly of increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.


2020 ◽  
Vol 3 (1) ◽  
pp. 43-59
Author(s):  
Peter M. Kasson

Infectious disease research spans scales from the molecular to the global—from specific mechanisms of pathogen drug resistance, virulence, and replication to the movement of people, animals, and pathogens around the world. All of these research areas have been impacted by the recent growth of large-scale data sources and data analytics. Some of these advances rely on data or analytic methods that are common to most biomedical data science, while others leverage the unique nature of infectious disease, namely its communicability. This review outlines major research progress in the past few years and highlights some remaining opportunities, focusing on data or methodological approaches particular to infectious disease.


Author(s):  
William La Cava ◽  
Heather Williams ◽  
Weixuan Fu ◽  
Steve Vitale ◽  
Durga Srivatsan ◽  
...  

Motivation: Many researchers with domain expertise are unable to easily apply machine learning (ML) to their bioinformatics data due to a lack of ML and/or coding expertise. Methods that have been proposed thus far to automate ML mostly require programming experience as well as expert knowledge to tune and apply the algorithms correctly. Here, we study a method of automating biomedical data science using a web-based AI platform to recommend model choices and conduct experiments. We have two goals in mind: first, to make it easy to construct sophisticated models of biomedical processes; and second, to provide a fully automated AI agent that can choose and conduct promising experiments for the user, based on the user’s experiments as well as prior knowledge. To validate this framework, we conduct an experiment on 165 classification problems, comparing it with state-of-the-art automated approaches. Finally, we use this tool to develop predictive models of septic shock in critical care patients. Results: We find that matrix factorization-based recommendation systems outperform metalearning methods for automating ML. This result mirrors the results of earlier recommender systems research in other domains. The proposed AI is competitive with state-of-the-art automated ML methods in terms of choosing optimal algorithm configurations for datasets. In our application to prediction of septic shock, the AI-driven analysis produces a competent ML model (AUROC 0.85±0.02) that performs on par with state-of-the-art deep learning results for this task, with much less computational effort. Availability and implementation: PennAI is available free of charge and open-source. It is distributed under the GNU General Public License (GPL) version 3. Supplementary information: Supplementary data are available at Bioinformatics online.
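
The general idea behind a matrix factorization-based recommender for ML algorithms can be sketched as follows. This is not PennAI's implementation: the function names, hyperparameters, algorithm labels and toy scores are all assumptions made for illustration. Datasets play the role of "users", algorithms play the role of "items", and observed performance scores are the "ratings" to be completed.

```python
import random


def factorize(scores, n_rows, n_cols, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit scores[(i, j)] ~ dot(P[i], Q[j]) by SGD over observed entries only."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), y in scores.items():
            err = y - sum(p * q for p, q in zip(P[i], Q[j]))
            for f in range(rank):
                p, q = P[i][f], Q[j][f]
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, i, j):
    """Predicted score of algorithm j on dataset i."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def rmse(scores, P, Q):
    """Root-mean-square error on the observed entries."""
    sq = [(y - predict(P, Q, i, j)) ** 2 for (i, j), y in scores.items()]
    return (sum(sq) / len(sq)) ** 0.5


def recommend(scores, P, Q, dataset, n_cols):
    """Index of the untried algorithm with the highest predicted score."""
    untried = [j for j in range(n_cols) if (dataset, j) not in scores]
    return max(untried, key=lambda j: predict(P, Q, dataset, j)) if untried else None


# Toy data (made up): rows are datasets, columns are algorithms.
ALGORITHMS = ["logistic_regression", "random_forest", "gradient_boosting"]
observed = {
    (0, 0): 0.70, (0, 1): 0.85, (0, 2): 0.88,
    (1, 0): 0.65, (1, 1): 0.80, (1, 2): 0.83,
    (2, 0): 0.90, (2, 1): 0.75,
    (3, 0): 0.68,  # dataset 3 has only one experiment so far
}

P0, Q0 = factorize(observed, 4, 3, epochs=0)  # untrained starting point
P, Q = factorize(observed, 4, 3)
initial_error, final_error = rmse(observed, P0, Q0), rmse(observed, P, Q)
next_algorithm = recommend(observed, P, Q, dataset=3, n_cols=3)
print(ALGORITHMS[next_algorithm], round(final_error, 3))
```

The appeal of this formulation is that results from earlier experiments on other datasets inform which algorithm to try next on a new dataset, without hand-crafted dataset descriptors. A real system would also have to handle hyperparameter configurations and cold-start datasets with no results at all.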


Data Science ◽  
2017 ◽  
Vol 1 (1-2) ◽  
pp. 19-25 ◽  
Author(s):  
Lawrence E. Hunter
