A scoping review of ‘big data’, ‘informatics’, and ‘bioinformatics’ in the animal health and veterinary medical literature

2019 ◽  
Vol 20 (1) ◽  
pp. 1-18 ◽  
Author(s):  
Zenhwa Ouyang ◽  
Jan Sargeant ◽  
Alison Thomas ◽  
Kate Wycherley ◽  
Rebecca Ma ◽  
...  

Abstract: Research in big data, informatics, and bioinformatics has grown dramatically (Andreu-Perez J, et al., 2015, IEEE Journal of Biomedical and Health Informatics 19, 1193–1208). Advances in gene sequencing technologies, surveillance systems, and electronic medical records have increased the amount of health data available. Unconventional data sources such as social media, wearable sensors, and internet search engine activity have also contributed to the influx of health data. The purpose of this study was to describe how ‘big data’, ‘informatics’, and ‘bioinformatics’ have been used in the animal health and veterinary medical literature and to map and chart publications using these terms through time. A scoping review methodology was used. A literature search of the terms ‘big data’, ‘informatics’, and ‘bioinformatics’ was conducted in the context of animal health and veterinary medicine. Relevance screening on abstract and full-text was conducted sequentially. To be relevant, articles had to use the words ‘big data’, ‘informatics’, or ‘bioinformatics’ in the title or abstract and full-text and had to deal with one of the major animal species encountered in veterinary medicine. Data items collected for all relevant articles included species, geographic region, first author affiliation, and journal of publication. The study level, study type, and data sources were collected for primary studies. After relevance screening, 1093 articles were classified. While there was a steady increase in ‘bioinformatics’ articles between 1995 and the end of the study period, ‘informatics’ articles reached their peak in 2012, then declined. The first ‘big data’ publication in animal health and veterinary medicine was in 2012. While few articles used the term ‘big data’ (n = 14), recent growth in ‘big data’ articles was observed.
All geographic regions produced publications in ‘informatics’ and ‘bioinformatics’, while only North America, Europe, Asia, and Australia/Oceania produced publications about ‘big data’. ‘Bioinformatics’ primary studies tended to use genetic data and tended to be conducted at the genetic level. In contrast, ‘informatics’ primary studies tended to use non-genetic data sources and tended to be conducted at an organismal level. The rapidly evolving definition of ‘big data’ may lead to avoidance of the term.

2021 ◽  
Author(s):  
Meghan Shyama Nagpal ◽  
Antonia Barbaric ◽  
Diana Sherifali ◽  
Plinio P Morita ◽  
Joseph A Cafazzo

BACKGROUND Complications due to Type 2 Diabetes (T2D) can be mitigated through proper self-management, which can positively change health behaviours. Technological tools are available to help people living with T2D manage their condition, and such tools provide a large repository of patient-generated health data (PGHD). Analytics can provide insights about the ambulatory behaviours of people living with T2D. OBJECTIVE The objective of this review was to investigate what analytical insights can be derived through PGHD with respect to ambulatory behaviours of people living with T2D. METHODS A scoping review using the Arksey & O’Malley framework was conducted, with two reviewers performing a comprehensive search of the literature. Three electronic databases (PubMed, IEEE, ACM) were searched using keywords associated with diabetes, behaviours, and analytics. Several rounds of screening using predetermined inclusion and exclusion criteria were conducted and studies were selected. Critical examination took place through a descriptive-analytical narrative method, and data extracted from the studies were classified into thematic categories. These categories reflect the findings of this study as per our objective. RESULTS We identified 43 studies that met the inclusion criteria for this review. While 70% of the studies examined PGHD independently, 30% of the studies combined PGHD with other data sources. The majority of these studies used machine learning algorithms to perform their analysis. Themes identified through this review include 1) predicting diabetes / obesity, 2) factors that contribute to diabetes / obesity, 3) insights from social media & online forums, 4) predicting glycemia, 5) improved adherence / outcomes, 6) analysis of sedentary behaviours, 7) deriving behavioural patterns, 8) discovering clinical findings, and 9) developing design principles.
CONCLUSIONS The increased volume and availability of PGHD has the potential to yield analytical insights regarding the ambulatory behaviours of people living with T2D. From the literature, we determined that analytics can predict outcomes and identify granular behavioural patterns from PGHD. This review determined the broad range of insights that can be examined through PGHD and that would not be available through other data sources.


Author(s):  
Pethuru Raj

The implications of the digitization process among a bevy of trends are definitely many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources, structures, scopes, sizes, and speeds. In this chapter, the author shows some of the impactful developments brewing in the IT space, how the tremendous amount of data getting produced and processed all over the world impacts the IT and business domains, how next-generation IT infrastructures are accordingly getting refactored, remedied, and readied for the impending big data-induced challenges, how likely the move of the big data analytics discipline towards fulfilling the digital universe requirements of extracting and extrapolating actionable insights for the knowledge-parched is, and finally, the establishment and sustenance of the dreamt smarter planet.


Big Data ◽  
2016 ◽  
pp. 757-777
Author(s):  
Pethuru Raj

The implications of the digitization process among a bevy of trends are definitely many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources, structures, scopes, sizes, and speeds. In this chapter, the authors show some of the impactful developments brewing in the IT space, how the tremendous amount of data getting produced and processed all over the world impacts the IT and business domains, how next-generation IT infrastructures are accordingly being refactored, remedied, and readied for the impending big data-induced challenges, how likely the move of the big data analytics discipline towards fulfilling the digital universe requirements of extracting and extrapolating actionable insights for the knowledge-parched is, and finally, the establishment and sustenance of the smarter planet.


2015 ◽  
pp. 187-221
Author(s):  
Pethuru Raj

The implications of the digitization process among a bevy of trends are definitely many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources, structures, scopes, sizes, and speeds. In this chapter, the author shows some of the impactful developments brewing in the IT space, how the tremendous amount of data getting produced and processed all over the world impacts the IT and business domains, how next-generation IT infrastructures are accordingly getting refactored, remedied, and readied for the impending big data-induced challenges, how likely the move of the big data analytics discipline towards fulfilling the digital universe requirements of extracting and extrapolating actionable insights for the knowledge-parched is, and finally, the establishment and sustenance of the dreamt smarter planet.


2017 ◽  
Vol 4 (3) ◽  
pp. 160721 ◽  
Author(s):  
A. A. Hill ◽  
M. Crotta ◽  
B. Wall ◽  
L. Good ◽  
S. J. O'Brien ◽  
...  

Foodborne infection is a result of exposure to complex, dynamic food systems. The efficiency of foodborne infection is driven by ongoing shifts in genetic machinery. Next-generation sequencing technologies can provide high-fidelity data about the genetics of a pathogen. However, food safety surveillance systems do not currently provide similar high-fidelity epidemiological metadata to associate with genetic data. As a consequence, it is rarely possible to transform genetic data into actionable knowledge that can be used to genuinely inform risk assessment or prevent outbreaks. Big data approaches are touted as a revolution in decision support, and pose a potentially attractive method for closing the gap between the fidelity of genetic and epidemiological metadata for food safety surveillance. We therefore developed a simple food chain model to investigate the potential benefits of combining ‘big’ data sources, including both genetic and high-fidelity epidemiological metadata. Our results suggest that, as for any surveillance system, the collected data must be relevant and characterize the important dynamics of a system if we are to properly understand risk: this suggests the need to carefully consider data curation, rather than the more ambitious claims of big data proponents that unstructured and unrelated data sources can be combined to generate consistent insight. Of interest is that the biggest influencers of foodborne infection risk were contamination load and processing temperature, not genotype. This suggests that understanding food chain dynamics would probably more effectively generate insight into foodborne risk than prescribing the hazard in ever more detail in terms of genotype.


SAGE Open ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 215824402110615
Author(s):  
Jyh-How Huang ◽  
Yu-Chia Hsu

Sports big data has been an emerging research area in recent years. The purpose of this study was to ascertain the most frequent research topics, application areas, data sources, and data usage characteristics in the existing literature, in order to understand the development of data-driven baseball research and the multidisciplinary participation in the big data era. A scoping review was conducted, focusing on the diversity of uses of publicly available major league baseball data. Next, co-occurrence analysis from bibliometrics was used to present a knowledge map of the reviewed literature. Finally, we propose a comprehensive baseball data research domain framework to visualize the ecosystem of publicly available sports data applications mapped to the four application domains in the big data maturity model. After searching and screening the Web of Science, Science Direct, and SPORTDiscus databases, 48 relevant papers with clearly indicated data sources and data fields were selected and fully reviewed for advanced analysis. The most relevant research hotspots for sports data are, in order, economics and finance, sports injury, and sports performance evaluation. Subjects studied included pitchers, position players, catchers, umpires, batters, free agents, and attendees. The most popular data sources are PITCHf/x, the Lahman Baseball Database, and baseball-reference.com. This review can serve as a valuable starting point for researchers to plan research strategies, to discover opportunities for cross-disciplinary research innovations, and to categorize their work in the context of the state of research.


Author(s):  
Shelly Vik ◽  
Behnam Sharif ◽  
Judy Seidel ◽  
Deborah A Marshall

Introduction: Technical solutions have been used in industry settings for many years to facilitate efficient management and analyses of big data sources. An initiative to apply a business solution to support development of simulation models for health systems research using nearly two decades of provincial administrative health data is described. Objectives and Approach: Administrative data including practitioner claims, hospitalizations and ambulatory care visits for patients with a diagnosis of osteoarthritis were obtained from Alberta Health for the period 1994/95 to 2012/13. These data were incorporated into a multidimensional data cube using Microsoft SQL Server Analysis Services. Initial steps required dimensional modeling to restructure the data into a star schema format. This involved appending several data sets and defining additional reference tables to contain stratification variables and denominator data for rate calculations. The modeling expert worked closely with the information technology team throughout the process and assessed validity of the output. Results: Development and validation of the multidimensional cube occurred in iterations over approximately 12 months. The final solution resulted in an analytics platform that compiled data from approximately 400 million records obtained from four different administrative data sources. Ten dimension tables containing 102 variables provided enhanced flexibility to conduct ad hoc stratified analyses in a fraction of the time that would be required using conventional methods. For example, some analyses that previously required a day of analyst time could be performed in less than 15 minutes. The efficiencies in analytic time were achieved by the pre-aggregated measures and slice and dice capability of the data cube, which negated many intermediary steps for data extraction and time consuming iterative analyses required for development of the simulation models.
Conclusion/Implications: This project demonstrated how a technical solution applied in industry can be utilized to address challenges encountered by researchers related to managing and analyzing large administrative health data sets. The methods could be applied in many other research settings to facilitate access to and analyses of information using big data.
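The star schema, pre-aggregated measures, and slice-and-dice operations described in this abstract can be illustrated with a minimal sketch. It is not the authors' Microsoft SQL Server Analysis Services implementation: the dimension names, keys, and the five fact rows below are hypothetical, chosen only to show how a fact table of dimension keys is pre-aggregated once and then queried by slicing and rolling up.

```python
from collections import Counter

# Hypothetical dimension (reference) tables; names and values are illustrative only.
DIM_SOURCE = {1: "practitioner claim", 2: "hospitalization", 3: "ambulatory visit"}
DIM_AGE_GROUP = {1: "<45", 2: "45-64", 3: "65+"}

# Hypothetical fact table: one row per record, holding only dimension keys.
FACTS = [
    {"fiscal_year": 1994, "source_id": 1, "age_id": 2},
    {"fiscal_year": 1994, "source_id": 2, "age_id": 3},
    {"fiscal_year": 1995, "source_id": 1, "age_id": 3},
    {"fiscal_year": 1995, "source_id": 1, "age_id": 3},
    {"fiscal_year": 1995, "source_id": 3, "age_id": 1},
]

DIMENSIONS = ("fiscal_year", "source_id", "age_id")


def build_cube(facts):
    """Pre-aggregate record counts for every observed combination of dimension keys."""
    return Counter(tuple(row[d] for d in DIMENSIONS) for row in facts)


def slice_cube(cube, **fixed):
    """Keep only the cells whose dimension values match the fixed values."""
    wanted = {DIMENSIONS.index(name): value for name, value in fixed.items()}
    return Counter({cell: n for cell, n in cube.items()
                    if all(cell[i] == v for i, v in wanted.items())})


def roll_up(cube, dimension):
    """Sum counts over all other dimensions, leaving totals for one dimension."""
    i = DIMENSIONS.index(dimension)
    totals = Counter()
    for cell, n in cube.items():
        totals[cell[i]] += n
    return totals


cube = build_cube(FACTS)
# Slice to practitioner claims (source_id=1), then roll up by fiscal year.
claims_by_year = roll_up(slice_cube(cube, source_id=1), "fiscal_year")
for year, count in sorted(claims_by_year.items()):
    print(year, DIM_SOURCE[1], count)
```

Because the counts are aggregated once in `build_cube`, each later stratified query touches only the cube cells rather than the raw records, which is the general idea behind the time savings the abstract reports.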


Author(s):  
Abraham Rudnick ◽  
Dougal Nolan ◽  
Patrick Daigle

LAY SUMMARY Information on Canadian military Veterans’ mental health is needed to develop and improve mental health services. It is not clear to what extent such information is available and connected across its sources. A comprehensive review of scientific and other authorized publications was conducted to identify information sources related to Canadian Veteran mental health, connections between them, and related policies or guidelines. Ten data sources related to military Veterans’ mental health in Canada were found, but no policies or guidelines specifically addressing information sharing across these data sets were discovered. Secure, Accessible, eFfective, and Efficient (SAFE) information sharing across these sources was implied but not confirmed. The authors recommend consideration be given to establishing a repository of relevant data sets and policies and guidelines for information sharing and standardization across all relevant data sets.


BMJ Open ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. e037860
Author(s):  
Jason Denzil Morgenstern ◽  
Emmalin Buajitti ◽  
Meghan O’Neill ◽  
Thomas Piggott ◽  
Vivek Goel ◽  
...  

Objective: To determine how machine learning has been applied to prediction applications in population health contexts. Specifically, to describe which outcomes have been studied, the data sources most widely used and whether reporting of machine learning predictive models aligns with established reporting guidelines. Design: A scoping review. Data sources: MEDLINE, EMBASE, CINAHL, ProQuest, Scopus, Web of Science, Cochrane Library, INSPEC and ACM Digital Library were searched on 18 July 2018. Eligibility criteria: We included English articles published between 1980 and 2018 that used machine learning to predict population-health-related outcomes. We excluded studies that only used logistic regression or were restricted to a clinical context. Data extraction and synthesis: We summarised findings extracted from published reports, which included general study characteristics, aspects of model development, reporting of results and model discussion items. Results: Of 22 618 articles found by our search, 231 were included in the review. The USA (n=71, 30.74%) and China (n=40, 17.32%) produced the most studies. Cardiovascular disease (n=22, 9.52%) was the most studied outcome. The median number of observations was 5414 (IQR=16 543.5) and the median number of features was 17 (IQR=31). Health records (n=126, 54.5%) and investigator-generated data (n=86, 37.2%) were the most common data sources. Many studies did not incorporate recommended guidelines on machine learning and predictive modelling. Predictive discrimination was commonly assessed using area under the receiver operator curve (n=98, 42.42%) and calibration was rarely assessed (n=22, 9.52%). Conclusions: Machine learning applications in population health have concentrated on regions and diseases well represented in traditional data sources, infrequently using big data. Important aspects of model development were under-reported.
Greater use of big data and reporting guidelines for predictive modelling could improve machine learning applications in population health. Registration number: Registered on the Open Science Framework on 17 July 2018 (available at https://osf.io/rnqe6/).
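The abstract distinguishes predictive discrimination (area under the receiver operating curve) from calibration, which it found was rarely assessed. The sketch below shows one common way each could be computed; it is an illustration under stated assumptions, not a method taken from the reviewed studies. The labels and predicted risks are made-up values, and the two-bin, equal-width calibration table is an arbitrary choice.

```python
def auc(y_true, y_score):
    """Discrimination: probability that a random positive case scores above
    a random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_table(y_true, y_score, n_bins=2):
    """Calibration: for each equal-width risk bin, return
    (mean predicted risk, observed event rate, number of cases)."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(y_true, y_score):
        k = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[k].append((s, y))
    return [(sum(s for s, _ in b) / len(b),
             sum(y for _, y in b) / len(b),
             len(b))
            for b in bins if b]


# Hypothetical outcomes (1 = event) and predicted risks, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

model_auc = auc(y_true, y_score)
table = calibration_table(y_true, y_score)
print(f"AUC = {model_auc:.3f}")
for mean_pred, observed, count in table:
    print(f"predicted {mean_pred:.2f} vs observed {observed:.2f} (n={count})")
```

A model can rank cases well (high AUC) while its predicted risks sit systematically above or below the observed event rates, which is why the two checks are reported separately.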

