What You Need to Know Before Implementing a Clinical Research Data Warehouse: Comparative Review of Integrated Data Repositories in Health Care Institutions (Preprint)

2020 ◽  
Author(s):  
Kristina K Gagalova ◽  
M Angelica Leon Elizalde ◽  
Elodie Portales-Casamar ◽  
Matthias Görges

Background: Integrated data repositories (IDRs), also referred to as clinical data warehouses, are platforms used for the integration of several data sources through specialized analytical tools that facilitate data processing and analysis. IDRs offer several opportunities for clinical data reuse, and the number of institutions implementing an IDR has grown steadily in the past decade.
Objective: The architectural choices of major IDRs are highly diverse and determining their differences can be overwhelming. This review aims to explore the underlying models and common features of IDRs, provide a high-level overview for those entering the field, and propose a set of guiding principles for small- to medium-sized health institutions embarking on IDR implementation.
Methods: We reviewed manuscripts published in peer-reviewed scientific literature between 2008 and 2020, and selected those that specifically describe IDR architectures. Of 255 shortlisted articles, we found 34 articles describing 29 different architectures. The different IDRs were analyzed for common features and classified according to their data processing and integration solution choices.
Results: Despite common trends in the selection of standard terminologies and data models, the IDRs examined showed heterogeneity in the underlying architecture design. We identified 4 common architecture models that use different approaches for data processing and integration. These different approaches were driven by a variety of features such as data sources, whether the IDR was for a single institution or a collaborative project, the intended primary data user, and purpose (research-only or including clinical or operational decision making).
Conclusions: IDR implementations are diverse and complex undertakings, which benefit from being preceded by an evaluation of requirements and definition of scope in the early planning stage. Factors such as data source diversity and intended users of the IDR influence data flow and synchronization, both of which are crucial factors in IDR architecture planning.

10.2196/17687 ◽  
2020 ◽  
Vol 4 (8) ◽  
pp. e17687
Author(s):  
Kristina K Gagalova ◽  
M Angelica Leon Elizalde ◽  
Elodie Portales-Casamar ◽  
Matthias Görges



2021 ◽  
Vol 2 (3) ◽  
pp. 59
Author(s):  
Susanti Krismon ◽  
Syukri Iska

This article examines the practice of paying wages in agriculture in Nagari Bukit Kandung, X Koto Diatas Subdistrict, Solok Regency, from the perspective of muamalah fiqh. The study is field research. The primary data come from farmers and farm laborers (8 people and 4 farm workers), while the secondary data, which supplement and strengthen the primary data, were obtained from documents related to this research in the form of the Bukit Kandung Nagari Profile. The data collection techniques used are observation, interviews, and documentation, and the data are processed qualitatively. The results show that, in Nagari Bukit Kandung, farm laborers ask to be paid in advance, before they carry out their work, even though there is no agreement to pay wages up front. Because wages are paid in advance, many farm workers do not work as the farmers expect, and some do not complete the work on time. According to the muamalah fiqh review, this practice of paying wages in agriculture in Nagari Bukit Kandung is not permitted, because the contract contains an element of gharar (uncertainty) and one party to the contract, namely the owner of the fields, is disadvantaged.


Author(s):  
Todd J Vision ◽  
Heather A Piwowar

Recently introduced funding agency policies seek to increase the availability of data from individual published studies for reuse by the research community at large. The success of such policies can be measured both by data input (“is useful data being made available?”) and research output (“are these data being reused by others?”). A key determinant of data input is the extent to which data producers receive adequate professional credit for making data available. One of us (HP) previously reported a large citation difference for published microarray studies with and without data available in a public repository. Analysis of a much larger sample, with more covariates, provides a more reliable estimate of this citation boost, as well as additional insights into patterns of reuse and how the availability of data affects publication impact. A more recent study tracking the reuse of 100 datasets from each of ten different primary data repositories reveals large variation in patterns of reuse and citation. Our findings (a) illuminate ways in which the reuses of archived data tend to differ in purpose from that of the original producers; (b) inform data archiving policy, such as how long data embargoes need to be in order to protect the proprietary interests of producers; (c) and allow us to answer the vexing question of what the return on investment is for data archiving. In conducting these studies, we have become aware of gaps in data citation practice and infrastructure that limit the extent to which researchers receive credit for their contributions. We describe early efforts to bake good data citation and usage tracking into cyberinfrastructure as part of DataONE, the Data Observation Network for Earth. Finally, we introduce total-impact, a tool that allows researchers to track the diverse impacts of all their research outputs, including data, and empowers them to be recognized for their scholarly work on their own terms. 
Software and Data Availability: Research software and data: https://github.com/hpiwowar (CCZero for data where possible, MIT for code); Dryad: new BSD license: http://code.google.com/p/dryad; DataONE: Apache license: http://www.dataone.org/developer-resources; total-impact: MIT license: https://github.com/total-impact. This is an abstract that was submitted to the iEvoBio 2012 conference, held on July 10-11, 2012, in Ottawa, Canada.


2019 ◽  
Vol 1 (1) ◽  
pp. 1-16
Author(s):  
Sukrianto Sukrianto ◽  
Elya Elya ◽  
Naima Naima

This study is entitled "The Role of Moral Teachers in Improving the Emotional Intelligence of Students in MI Muhammadiyah Nunu, Tatanga District, Palu City". The objectives of this study are: (1) to determine the role of the Akidah Akhlak teacher in increasing the emotional intelligence of students in MI Muhammadiyah Nunu, Tatanga District, Palu City, and (2) to determine the factors that support and inhibit Akidah Akhlak teachers in improving the emotional intelligence of students in MI Muhammadiyah Nunu, Tatanga District, Palu City. This is qualitative research. The subject of the research is the teacher of moral subjects, and the data collection methods are observation, interviews, and documentation. The data sources are primary and secondary data sources. The research uses pedagogical, psychological, and social approaches. The techniques used are data processing and data analysis. The results show that the Akidah Akhlak teacher increases the emotional intelligence of students in MI Muhammadiyah Nunu, Tatanga District, Palu City, in the following ways: the teacher is able to understand the emotions of students, processes the emotions of students, provides guidance to students, provides motivation for improving their emotional intelligence, fosters relations among students, and punishes students who violate school rules.
The factors that support the teacher in developing students' emotional intelligence are cooperation between teachers, improved human resources, facilities and infrastructure, and extracurricular activities. The inhibiting factors are that students do not obey school rules, students lack confidence, the demands of grades, and limited time for meetings.


2020 ◽  
Vol 3 (2) ◽  
pp. 464-470
Author(s):  
Sonitehe Gea ◽  
Victorinus Laoli

This research was conducted in Tetehosi II Village, Gunungsitoli Idanoi Subdistrict, under the title "The Effect of Work Discipline on Village Apparatus Performance in the Tetehosi II Village Gunungsitoli Idanoi Subdistrict". The researchers took a sample of 25 people, comprising all employees of the Tetehosi II Village Office, Gunungsitoli Idanoi Subdistrict. The data used in this study are primary data and secondary data. The research data sources were obtained using alar, and the type of research is quantitative. Statistically, the research results are as follows. The validity test of variables X and Y on each of the 20 questionnaire items showed, after correlation, that the items are valid, so the questionnaire is suitable for use in data processing. The researchers also checked whether the questionnaires returned by the respondents had been completed in accordance with the instructions given. Based on the hypothesis testing criteria, Ha (there is a relationship) was accepted and Ho (there is no relationship) was rejected, because t-calculated = 3.091 > t-table = 2.052. It can therefore be stated that Work Discipline (X) influences Performance (Y) at the Village Office in Gunungsitoli Idanoi Subdistrict.
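The decision rule quoted in this abstract can be illustrated with a minimal Python sketch. It only compares the two reported values; the function name `reject_null` is hypothetical, and the use of a two-sided comparison is an assumption, since the abstract does not state the significance level, the degrees of freedom, or whether the test was one- or two-sided.

```python
def reject_null(t_statistic: float, t_critical: float) -> bool:
    """Decision rule: reject H0 when |t| exceeds the critical value.

    Treating the test as two-sided is an assumption of this sketch.
    """
    return abs(t_statistic) > t_critical


# Values reported in the abstract.
t_calculated = 3.091
t_table = 2.052

if reject_null(t_calculated, t_table):
    decision = "reject Ho, accept Ha"
else:
    decision = "fail to reject Ho"

print(decision)
```

With the reported values the calculated statistic exceeds the table value, which matches the abstract's conclusion that Ha was accepted.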


Author(s):  
Andra Waagmeester ◽  
Paul Braun ◽  
Manoj Karingamadathil ◽  
Jose Emilio Labra Gayo ◽  
Siobhan Leachman ◽  
...  

Moths form a diverse group of species that are predominantly active at night. They are colourful, have an ecological role, but are less well described compared to their closest relatives, the butterflies. Much remains to be understood about moths, which is shown by the many issues within their taxonomy, including being a paraphyletic group and the inability to clearly distinguish them from butterflies (Fig. 1). We present the Wikimedia architecture as a hub of knowledge on moths. This ecosystem consists of 312 language editions of Wikipedia and sister projects such as Wikimedia commons (a multimedia repository), and Wikidata (a public knowledge graph). Through Wikidata, external data repositories can be integrated into this knowledge landscape on moths. Wikidata contains links to (open) data repositories on biodiversity like iNaturalist, Global Biodiversity Information Facility (GBIF) and the Biodiversity Heritage Library (BHL) which in return contain detailed content like species occurrence data, images or publications on moths. We present a workflow that integrates crowd-sourced information and images from iNaturalist, with content from GBIF and BHL into the different language editions of Wikipedia. The Wikipedia articles in turn feed information to other sources. Taxon pages on iNaturalist, for example, have an "About" tab, which is fed by the Wikipedia article describing the respective taxon, where the current language of the (iNaturalist) interface fetches the appropriate language version from Wikipedia. This is a nice example of data reuse, which is one of the pillars of FAIR (Findable, Accessible, Interoperable and Reusable) (Wilkinson et al. 2016). Wikidata provides the linked data hub in this flow of knowledge. Since Wikidata is available in RDF, it aligns well with the data model of the semantic web. 
This allows rapid integration with other linked data sources, and provides an intuitive portal for non-linked data to be integrated as linked data with this semantic web. Wikidata includes information on all sorts of things (e.g., people, species, locations, events), which is why it can structure data in a multitude of ways, leading to 9000+ properties. Because all those different domains and communities use the same source for different things, it is important to have good structure and documentation for a specific topic so that we and others can interpret the data. We present a schema that describes data about moth taxa on Wikidata. Since 2019, Wikidata has an EntitySchema namespace that allows contributors to specify applicable linked-data schemas. The schemas are expressed using Shape Expressions (ShEx) (Thornton et al. 2019), which is a formal modelling language for RDF, one of the data formats used on the Semantic Web. Since Wikidata is also rendered as RDF, it is possible to use ShEx to describe data models and user expectations in Wikidata (Waagmeester et al. 2021). These schemas can then be used to verify if a subset of Wikidata conforms to an expected or described data model. Starting from a document that describes an expected schema on moths, we have developed an EntitySchema (E321) for moths in Wikidata. This schema provides unambiguous guidance for contributors who have data they are not sure how to model. For example, a user with data about a particular species of moth may be working from a scientific article that states that the species is only found in New Zealand, and may be unsure of how to model that fact as a statement in Wikidata.
After consulting Schema E321, the user will find out about Property P183 “endemic_to” and then use that property to state that the species is endemic to New Zealand. As more contributors follow the data model expressed in schema E321, there will be structural consistency across items for moths in Wikidata. This reduces the risk of contributors using different combinations of properties and qualifiers to express the same meaning. If a contributor needs to express something that is not yet represented in Schema E321, they can extend the schema itself, as each schema can be edited. The multilingual affordances of the Wikidata platform allow users to edit in over 300 languages. In this way, contributors edit in their preferred language and see the structure of the data as well as the schemas in their language of choice. This broadens the range of people who can contribute to these data models and reduces the dominance of English. There are an estimated 160K+ moth species. This number is equal to the number of moths described in iNaturalist, while Wikidata contains 220K items on moths. As the biggest language edition, the English Wikipedia contains 65K moth articles; other language editions contain far fewer Wikipedia articles. The higher number of items on moths in Wikidata can be partly explained by Wikidata taxon synonyms being treated as distinct taxa. Wikidata, as a proxy of knowledge on moths, is instrumental in getting them better described in Wikipedia and other (FAIR) sources. In return, curation in Wikidata is carried out by a large community. This approach to data modelling has the advantage of allowing multilingual collaboration and iterative extension and improvement over time.
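The idea of checking an item against an expected data model can be sketched in a few lines of Python. This is not ShEx and not the content of EntitySchema E321: the required and optional property lists below are assumptions chosen for illustration, the example item is hypothetical, and `check_item` is a made-up helper. The property and item identifiers themselves (P31, P225, P105, P171, P183, Q16521, Q7432, Q664) are real Wikidata IDs.

```python
from typing import Dict, List

# Real Wikidata identifiers; their role in schema E321 is assumed here.
INSTANCE_OF = "P31"
TAXON_NAME = "P225"
TAXON_RANK = "P105"
PARENT_TAXON = "P171"
ENDEMIC_TO = "P183"
TAXON = "Q16521"
SPECIES = "Q7432"
NEW_ZEALAND = "Q664"

# Simplified, assumed shape: which properties must or may appear.
REQUIRED = [INSTANCE_OF, TAXON_NAME, TAXON_RANK, PARENT_TAXON]
OPTIONAL = [ENDEMIC_TO]

# An item is modelled as a mapping from property ID to a list of values.
Item = Dict[str, List[str]]


def check_item(item: Item) -> List[str]:
    """Return a list of problems; an empty list means the item conforms."""
    problems = []
    for prop in REQUIRED:
        if not item.get(prop):
            problems.append(f"missing required property {prop}")
    if item.get(INSTANCE_OF) and TAXON not in item[INSTANCE_OF]:
        problems.append(f"{INSTANCE_OF} should include {TAXON} (taxon)")
    for prop in item:
        if prop not in REQUIRED and prop not in OPTIONAL:
            problems.append(f"property {prop} is not covered by the shape")
    return problems


# Hypothetical moth species that a source says is found only in New Zealand.
moth = {
    INSTANCE_OF: [TAXON],
    TAXON_NAME: ["Examplea hypothetica"],  # made-up name
    TAXON_RANK: [SPECIES],
    PARENT_TAXON: ["Q_PARENT_GENUS"],      # placeholder, not a real ID
    ENDEMIC_TO: [NEW_ZEALAND],
}

print(check_item(moth))
```

A contributor who instead recorded the New Zealand fact under some other property would get a "not covered by the shape" message, which is the kind of structural inconsistency a shared schema is meant to prevent.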


2016 ◽  
Vol 55 (02) ◽  
pp. 114-124 ◽  
Author(s):  
E. Ammenwerth ◽  
W. O. Hackl

Summary
Background: Secondary use of clinical routine data is receiving an increasing amount of attention in biomedicine and healthcare. However, building and analysing integrated clinical routine data repositories are non-trivial, challenging tasks. As in most evolving fields, recognized standards, well-proven methodological frameworks, or accurately described best-practice approaches for the systematic planning of solutions for secondary use of routine medical record data are missing.
Objective: We propose a conceptual best-practice framework and procedure model for the systematic planning of intelligent reuse of integrated clinical routine data (SPIRIT).
Methods: SPIRIT was developed based on a broad literature overview and further refined in two case studies with different kinds of clinical routine data, including process-oriented nursing data from a large hospital group and high-volume multimodal clinical data from a neurologic intensive care unit.
Results: SPIRIT aims at tailoring secondary use solutions to specific needs of single departments without losing sight of the institution as a whole. It provides a general conceptual best-practice framework consisting of three parts: First, a secondary use strategy for the whole organization is determined. Second, comprehensive analyses are conducted from two different viewpoints to define the requirements regarding a clinical routine data reuse solution at the system level from the data perspective (BOTTOM UP) and at the strategic level from the future users' perspective (TOP DOWN). An obligatory clinical context analysis (IN BETWEEN) facilitates refinement, combination, and integration of the different requirements. The third part of SPIRIT is dedicated to implementation, which comprises design and realization of clinical data integration and management as well as data analysis solutions.
Conclusions: The SPIRIT framework is intended to be used to systematically plan the intelligent reuse of clinical routine data for multiple purposes, which often was not intended when the primary clinical documentation systems were implemented. SPIRIT helps to overcome this gap. It can be applied in healthcare institutions of any size or specialization and allows a stepwise setup and evolution of holistic clinical routine data reuse solutions.


2019 ◽  
Author(s):  
Clark C. Evans ◽  
Kyrylo Simonov

Abstract: A new way to conceptualize computations, Query Combinators, can be used to create a data processing environment shared among the entire medical research team. For a given research context, a domain specific query language can be created that represents data sources, analysis methods, and integrative domain knowledge. Research questions can then have an intuitive, high-level form that can be reasoned about and discussed.
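To make the combinator idea concrete, here is a minimal Python sketch in which a query is simply a function from one input to a list of outputs, and larger queries are built by combining smaller ones. This is an illustration of the general style only, not the authors' implementation; every name (`field`, `compose`, `keep`, `count`, `greater_than`) and the toy patient data are hypothetical.

```python
from typing import Any, Callable, List

# A query maps a single input value to a list of output values.
Query = Callable[[Any], List[Any]]


def field(name: str) -> Query:
    """Primitive query: look up a key; list values fan out, None yields nothing."""
    def run(record: Any) -> List[Any]:
        value = record.get(name)
        if value is None:
            return []
        return list(value) if isinstance(value, list) else [value]
    return run


def compose(*queries: Query) -> Query:
    """Chain queries: feed every output of one query into the next."""
    def run(value: Any) -> List[Any]:
        current = [value]
        for query in queries:
            current = [out for item in current for out in query(item)]
        return current
    return run


def keep(condition: Query) -> Query:
    """Pass the input through only if the condition yields a truthy value."""
    def run(value: Any) -> List[Any]:
        return [value] if any(condition(value)) else []
    return run


def count(query: Query) -> Query:
    """Aggregate: yield the number of outputs the query produces."""
    def run(value: Any) -> List[Any]:
        return [len(query(value))]
    return run


def greater_than(query: Query, threshold: float) -> Query:
    """Compare each output of the query with a threshold."""
    def run(value: Any) -> List[Any]:
        return [out > threshold for out in query(value)]
    return run


# Toy data (entirely made up).
database = {
    "patient": [
        {"name": "A", "visit": [{"systolic": 150}, {"systolic": 120}]},
        {"name": "B", "visit": [{"systolic": 118}]},
    ]
}

# Domain vocabulary built from the primitives.
patient = field("patient")
visit = field("visit")
systolic = field("systolic")
high_bp_visit = compose(visit, keep(greater_than(systolic, 140)))

# Research question: which patients have at least one high-pressure visit?
patients_with_high_bp = compose(patient, keep(high_bp_visit), field("name"))

print(patients_with_high_bp(database))
print(count(patient)(database))
```

The point of the design is that `high_bp_visit` is a named, reusable piece of domain knowledge, so the final question reads close to how a research team would phrase it.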


2018 ◽  
Vol 3 (1) ◽  
pp. 795
Author(s):  
Isabel De la Torre Díez ◽  
Guillermo Fernández Rodríguez ◽  
Gema Castillo ◽  
Aranzazu Berbey Alvarez

In recent years, thanks to progress in electronics and computing, it has become possible to process large volumes of clinical data. As a result, real world data (RWD) are gaining enormous relevance. RWD are data that originate in routine clinical practice and are used to make medical decisions about drugs or medical practice. This research aims to study the current situation of RWD in Spain. To achieve this objective, we assessed the data sources that feed RWD and analyzed the main publications based on RWD. Our findings are, first, that records, databases, and medical histories are highly computerized and hold a great deal of information that can be used for research; and second, that the scientific studies carried out are of high quality, but society is not aware of the importance of RWD, and there is a lack of coordination between the autonomous communities and the central government. Keywords: RWD, clinical data, medical decisions, practical decisions, medical histories

