Perspective on Data Science

Author(s): Roger D. Peng, Hilary S. Parker

The field of data science currently enjoys a broad definition that includes a wide array of activities which borrow from many other established fields of study. Having such a vague characterization of a field in the early stages might be natural, but over time maintaining such a broad definition becomes unwieldy and impedes progress. In particular, the teaching of data science is hampered by the seeming need to cover many different points of interest. Data scientists must ultimately identify the core of the field by determining what makes the field unique and what it means to develop new knowledge in data science. In this review we attempt to distill some core ideas from data science by focusing on the iterative process of data analysis and develop some generalizations from past experience. Generalizations of this nature could form the basis of a theory of data science and would serve to unify and scale the teaching of data science to large audiences. Expected final online publication date for the Annual Review of Statistics, Volume 9 is March 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

2021, Vol 8 (1)

Author(s): Stephan C. Meylan, Elika Bergelson

Children's linguistic knowledge and the learning mechanisms by which they acquire it grow substantially in infancy and toddlerhood, yet theories of word learning largely fail to incorporate these shifts. Moreover, researchers’ often-siloed focus on either familiar word recognition or novel word learning limits the critical consideration of how these two relate. As a step toward a mechanistic theory of language acquisition, we present a framework of “learning through processing” and relate it to the prevailing methods used to assess children's early knowledge of words. Incorporating recent empirical work, we posit a specific, testable timeline of qualitative changes in the learning process in this interval. We conclude with several challenges and avenues for building a comprehensive theory of early word learning: better characterization of the input, reconciling results across approaches, and treating lexical knowledge in the nascent grammar with sufficient sophistication to ensure generalizability across languages and development. Expected final online publication date for the Annual Review of Linguistics, Volume 8 is January 2022.


Author(s): Bethany Percha

Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021.


Author(s): Siobhan Cleary, Cathal Seoighe

Diploidy has profound implications for population genetics and susceptibility to genetic diseases. Although two copies are present for most genes in the human genome, they are not necessarily both active, or active at the same level, in a given individual. Genomic imprinting, which results in exclusive or biased expression in favor of the allele of paternal or maternal origin, is now believed to affect hundreds of human genes. A far greater number of genes display unequal expression of gene copies due to cis-acting genetic variants that perturb gene expression. The availability of RNA sequencing data from large numbers of individuals and tissue types has created unprecedented opportunities to assess the contribution of genetic variation to allelic imbalance in gene expression. Here we review what analysis of these data has revealed about the extent of the genetic contribution to allelic expression imbalance, the tools and statistical models for analyzing allelic imbalance in gene expression, and what the results reveal about the contribution of expression-altering genetic variants to complex human diseases and phenotypes. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021.


Author(s): Paul B. Talbert, Steven Henikoff

Nucleosomes wrap DNA and impede access for the machinery of transcription. The core histones that constitute nucleosomes are subject to a diversity of posttranslational modifications, or marks, that impact the transcription of genes. The functions of these marks have sometimes been difficult to infer because the enzymes that write and read them are complex, multifunctional proteins. Here, we examine the evidence for the functions of marks and argue that the major marks perform a fairly small number of roles in either promoting transcription or preventing it. Acetylations and phosphorylations on the histone core disrupt histone-DNA contacts and/or destabilize nucleosomes to promote transcription. Ubiquitylations stimulate methylations that provide a scaffold either for the formation of silencing complexes or for resistance to those complexes, and that carry a memory of the transcriptional state. Tail phosphorylations deconstruct silencing complexes in particular contexts. We speculate that these fairly simple roles form the basis of transcriptional regulation by histone marks. Expected final online publication date for the Annual Review of Genomics and Human Genetics, Volume 22 is August 2021.


Author(s): George-John Nychas, Emma Sims, Panagiotis Tsakanikas, Fady Mohareb

Food safety is one of the main challenges facing the agri-food industry, and one it is expected to address in the current environment of tremendous technological progress, in which consumers’ lifestyles and preferences are in a constant state of flux. Food chain transparency and trust are drivers of food integrity control and of improvements in efficiency and economic growth. Similarly, the circular economy has great potential to reduce waste and improve the efficiency of operations in multi-stakeholder ecosystems. Throughout the food chain cycle, all food commodities are exposed to multiple hazards, resulting in a high likelihood of contamination. Such biological or chemical hazards may be naturally present, accidentally introduced, or fraudulently imposed at any stage of food production, risking consumers’ health and their faith in the food industry. Today, massive amounts of data are generated not only by next-generation food safety monitoring systems along the entire food chain (primary production included) but also by the internet of things, media, and other devices. These data should be used for the benefit of society, and data science should be a vital player in making this possible. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021.


Author(s): Paul Embrechts, Mario V. Wüthrich

For centuries, mathematicians and, later, statisticians have found natural research and employment opportunities in the realm of insurance. By definition, insurance offers financial cover against unforeseen events that involve an important component of randomness; consequently, probability theory and mathematical statistics enter insurance modeling in a fundamental way. In recent years, a data deluge, coupled with ever-advancing information technology and the birth of data science, has revolutionized or is about to revolutionize most areas of actuarial science as well as insurance practice. We discuss parts of this evolution and, in the case of non-life insurance, show how combining classical statistical tools, such as generalized linear models, with machine learning methods, such as neural networks, contributes to a better understanding and analysis of actuarial data. We further review areas of actuarial science where cross-fertilization between stochastics and insurance holds promise for both sides. Of course, the vastness of the field of insurance limits our choice of topics; we focus mainly on topics closer to our main areas of research. Expected final online publication date for the Annual Review of Statistics, Volume 9 is March 2022.


2021, Vol 75 (1)

Author(s): Graham J. Britton, Jeremiah J. Faith

Despite identification of numerous associations between microbiomes and diseases, the complexity of the human microbiome has hindered identification of individual species and strains that are causative in host phenotype or disease. Uncovering causative microbes is vital to fully understand disease processes and to harness the potential therapeutic benefits of microbiota manipulation. Developments in sequencing technology, animal models, and bacterial culturing have facilitated the discovery of specific microbes that impact the host and are beginning to advance the characterization of host-microbiome interaction mechanisms. We summarize the historical and contemporary experimental approaches taken to uncover microbes from the microbiota that affect host biology and describe examples of commensals that have specific effects on the immune system, inflammation, and metabolism. There is still much to learn, and we lay out challenges faced by the field and suggest potential remedies for common pitfalls encountered in the hunt for causative commensal microbes. Expected final online publication date for the Annual Review of Microbiology, Volume 75 is October 2021.


Author(s): Eiji Ohtani

Hydrogen and deuterium isotopic evidence indicates that terrestrial water came mostly from meteorites, with an additional influx from nebular gas during accretion. Two Earth models, with large (7–12 ocean masses) and small (1–4 ocean masses) water budgets, can explain the geochemical, cosmochemical, and geological observations. Geophysical and mineral physics data indicate that the upper and lower mantles are generally dry, whereas the mantle transition zone is wetter, with a heterogeneous water distribution. Subducting slabs are a source of water influx, and, in addition to dehydration of slabs in the shallow upper mantle, there are three major sites of deep dehydration: the base of the upper mantle and the top and bottom of the lower mantle. Hydrated regions surround these dehydration sites. Under the large water budget model, the core may be a hidden reservoir of hydrogen.

Earth is a water planet. Where and when was water delivered, and how much? How does water circulate in Earth? This review looks at the current answers to these fundamental questions. Expected final online publication date for the Annual Review of Earth and Planetary Sciences, Volume 49 is May 2021.


Author(s): Andrea A. Cabrera, Martine Bérubé, Xênia M. Lopes, Marie Louis, Tom Oosting, ...

Studies of cetacean evolution using genetics and other biomolecules have come a long way, from the use of allozymes and short sequences of mitochondrial or nuclear DNA to the assembly of full nuclear genomes and the characterization of proteins and lipids. Cetacean research has also advanced from using only contemporary samples to analyzing samples dating back thousands of years and retrieving data from indirect environmental sources, including water or sediments. Combined, these studies have profoundly deepened our understanding of the origin of cetaceans; their adaptation and speciation processes; and the past population changes, migrations, and admixture events that gave rise to the diversity of cetaceans found today. Expected final online publication date for the Annual Review of Ecology, Evolution, and Systematics, Volume 52 is November 2021.


Author(s): Jingqi Chen, Guiying Dong, Liting Song, Xingzhong Zhao, Jixin Cao, ...

The accumulation of vast amounts of multimodal data for the human brain, in both normal and disease conditions, has provided unprecedented opportunities for understanding why and how brain disorders arise. Compared with traditional analyses of single datasets, the integration of multimodal datasets covering different types of data (e.g., genomics, transcriptomics, and imaging) has shed light on the mechanisms underlying brain disorders in greater detail at both the microscopic and macroscopic levels. In this review, we first briefly introduce popular large-scale brain datasets. Then, we discuss in detail how integration of multimodal human brain datasets can reveal the genetic predispositions and abnormal molecular pathways of brain disorders. Finally, we present an outlook on how future data integration efforts may advance the diagnosis and treatment of brain disorders. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021.

