Faculty Opinions recommendation of BAMboozle removes genetic variation from human sequence data for open data sharing.

Abstract: The risks associated with re-identification of human genetic data are severely limiting open data sharing in life sciences, even in studies where donor-related genetic variant information is not of primary interest. Here, we developed BAMboozle, a versatile tool to eliminate critical types of sensitive genetic information in human sequence data by reverting aligned reads to the genome reference sequence. Applying BAMboozle to functional genomics data, such as single-cell RNA-seq (scRNA-seq) and scATAC-seq datasets, confirmed the removal of donor-related single nucleotide polymorphisms (SNPs) and indels in a manner that did not disclose the altered positions. Importantly, BAMboozle only removes the genetic sequence variants of the sample (i.e., donor) while preserving other important aspects of the raw sequence data. For example, BAMboozled scRNA-seq data contained accurate cell-type-associated gene expression signatures and splice kinetic information, and could be used for methods benchmarking. Altogether, BAMboozle efficiently removes genetic variation in aligned sequence data, which represents a step towards open data sharing in many areas of genomics where the genetic variant information is not of primary interest.
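The idea of reverting aligned reads to the reference can be illustrated with a minimal sketch. This is not the BAMboozle implementation: the function name `revert_read_to_reference`, the plain-string inputs, and the decision to reject clipped reads are all assumptions made for illustration. The sketch replaces matched and mismatched bases with reference bases, fills deletions with reference bases, and drops insertions, so the read length may change.

```python
import re

# Standard SAM CIGAR operations: a length followed by one operation character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def _push(ops, length, op):
    """Append a CIGAR operation, merging it with the previous one if identical."""
    if ops and ops[-1][1] == op:
        ops[-1] = (ops[-1][0] + length, op)
    else:
        ops.append((length, op))


def revert_read_to_reference(ref, pos, cigar, seq):
    """Hypothetical helper: rebuild a read from reference bases only.

    ref   -- reference sequence as a string
    pos   -- 0-based leftmost reference position of the alignment
    cigar -- CIGAR string of the original alignment
    seq   -- original read sequence (used only for a length check)
    Returns (new_sequence, new_cigar).
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError("malformed CIGAR string")
    # Assumption of this sketch: clipping and padding are not handled.
    if any(op not in "M=XIDN" for _, op in ops):
        raise ValueError("unsupported CIGAR operation in this sketch")
    if sum(n for n, op in ops if op in "M=XI") != len(seq):
        raise ValueError("CIGAR does not match read length")

    bases, new_ops, ref_pos = [], [], pos
    for length, op in ops:
        if op == "I":
            continue  # inserted bases are donor-specific: drop them
        if ref_pos + length > len(ref):
            raise ValueError("alignment runs past the end of the reference")
        if op == "N":
            _push(new_ops, length, "N")  # keep splice junctions as they are
        else:
            # M, =, X and D all become reference-matching bases.
            bases.append(ref[ref_pos:ref_pos + length])
            _push(new_ops, length, "M")
        ref_pos += length
    return "".join(bases), "".join(f"{n}{op}" for n, op in new_ops)


# A read with one mismatch, one insertion and one deletion.
print(revert_read_to_reference("ACGTACGTAC", 2, "3M1I2M1D2M", "GCATCGAC"))
# → ('GTACGTAC', '8M')
```

A real tool would also need to handle base qualities, clipped bases, and auxiliary tags that can encode variant positions; none of that is covered here.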

anonymizeBAM: Versatile anonymization of human sequence data for open data sharing

10.1101/2021.01.11.426206 ◽

2021 ◽

Author(s):

Christoph Ziegenhain ◽

Rickard Sandberg

Keyword(s):

Data Sharing ◽

Genetic Variant ◽

Sequence Data ◽

Life Sciences ◽

Open Data ◽

Complete Removal ◽

Rna Seq ◽

Human Sequence ◽

Versatile Tool ◽

Variant Information

Abstract: The risks associated with re-identification of human genetic data are severely limiting open data sharing in life sciences. Here, we developed anonymizeBAM, a versatile tool for the anonymization of genetic variant information present in sequence data. Applying anonymizeBAM to single-cell RNA-seq and ATAC-seq datasets confirmed the complete removal of donor-related genetic information. Therefore, the accurate generation of de-identified sequence data will re-enable open sharing in sequencing-based studies for improved transparency, reproducibility, and innovation.

Data Dentistry: How Data Are Changing Clinical Care and Research

Journal of Dental Research ◽

10.1177/00220345211020265 ◽

2021 ◽

pp. 002203452110202

Author(s):

F. Schwendicke ◽

J. Krois

Keyword(s):

Health Care ◽

Data Sharing ◽

Clinical Care ◽

Open Data ◽

User Interaction ◽

Data Availability ◽

Related Data ◽

Data User ◽

Regulatory Data ◽

Consumer Data

Data are a key resource for modern societies and are expected to improve the quality, accessibility, affordability, safety, and equity of health care. Dental care and research are currently transforming into what we term data dentistry, with 3 main applications: 1) Medical data analysis uses deep learning, allowing one to master unprecedented amounts of data (language, speech, imagery) and put them to productive use. 2) Data-enriched clinical care integrates data from the individual (e.g., demographic, social, clinical and omics data, consumer data), setting (e.g., geospatial, environmental, provider-related data), and systems levels (payer or regulatory data to characterize input, throughput, output, and outcomes of health care) to provide a comprehensive and continuous real-time assessment of biologic perturbations, individual behaviors, and context. Such care may contribute to a deeper understanding of health and disease and to more precise, personalized, predictive, and preventive care. 3) Data for research include open research data and data sharing, allowing one to appraise, benchmark, pool, replicate, and reuse data. Concerns about and confidence in data-driven applications, stakeholders’ and system’s capabilities, and lack of data standardization and harmonization currently limit the development and implementation of data dentistry. Aspects of bias and data-user interaction require attention. Action items for the dental community circle around increasing data availability, refinement, and usage; demonstrating safety, value, and usefulness of applications; educating the dental workforce and consumers; providing performant and standardized infrastructure and processes; and incentivizing and adopting open data and data sharing.

Fengyun Meteorological Satellite Products for Earth System Science Applications

Advances in Atmospheric Sciences ◽

10.1007/s00376-021-0425-3 ◽

2021 ◽

Author(s):

Di Xian ◽

Peng Zhang ◽

Ling Gao ◽

Ruijing Sun ◽

Haizhen Zhang ◽

...

Keyword(s):

Data Sharing ◽

Satellite Data ◽

Prediction Models ◽

Weather Forecasting ◽

Numerical Models ◽

Weather Prediction ◽

Vegetation Indices ◽

Open Data ◽

Earth System ◽

Inversion Algorithm

Abstract: Following the progress of satellite data assimilation in the 1990s, the combination of meteorological satellites and numerical models has changed the way scientists understand the earth. With the evolution of numerical weather prediction models and earth system models, meteorological satellites will play a more important role in earth sciences in the future. As part of the space-based infrastructure, the Fengyun (FY) meteorological satellites have contributed to earth science sustainability studies through an open data policy and stable data quality since the first launch of the FY-1A satellite in 1988. The capability of earth system monitoring was greatly enhanced after the second-generation polar orbiting FY-3 satellites and geostationary orbiting FY-4 satellites were developed. Meanwhile, the quality of the products generated from the FY-3 and FY-4 satellites is comparable to the well-known MODIS products. FY satellite data has been utilized broadly in weather forecasting, climate and climate change investigations, environmental disaster monitoring, etc. This article reviews the instruments mounted on the FY satellites. Sensor-dependent level 1 products (radiance data) and inversion algorithm-dependent level 2 products (geophysical parameters) are introduced. As an example, some typical geophysical parameters, such as wildfires, lightning, vegetation indices, aerosol products, soil moisture, and precipitation estimation have been demonstrated and validated by in-situ observations and other well-known satellite products. To help users access the FY products, a set of data sharing systems has been developed and operated. The newly developed data sharing system based on cloud technology has been illustrated to improve the efficiency of data delivery.

Detecting a Local Signature of Genetic Hitchhiking Along a Recombining Chromosome

Genetics ◽

10.1093/genetics/160.2.765 ◽

2002 ◽

Vol 160 (2) ◽

pp. 765-777 ◽

Cited By ~ 9

Author(s):

Yuseob Kim ◽

Wolfgang Stephan

Keyword(s):

Genetic Variation ◽

Dna Polymorphism ◽

Nucleotide Diversity ◽

Sequence Data ◽

Directional Selection ◽

Genetic Hitchhiking ◽

Dna Sequence Data ◽

Local Reduction ◽

Polymorphism Data ◽

Ancestral Recombination Graphs

Abstract: The theory of genetic hitchhiking predicts that the level of genetic variation is greatly reduced at the site of strong directional selection and increases as the recombinational distance from the site of selection increases. This characteristic pattern can be used to detect recent directional selection on the basis of DNA polymorphism data. However, the large variance of nucleotide diversity in samples of moderate size imposes difficulties in detecting such patterns. We investigated the patterns of genetic variation along a recombining chromosome by constructing ancestral recombination graphs that are modified to incorporate the effect of genetic hitchhiking. A statistical method is proposed to test the significance of a local reduction of variation and a skew of the frequency spectrum caused by a hitchhiking event. This method also allows us to estimate the strength and the location of directional selection from DNA sequence data.

Significant population genetic structure detected for a new and highly restricted species of Atriplex (Chenopodiaceae) from Western Australia, and implications for conservation management

Australian Journal of Botany ◽

10.1071/bt11223 ◽

2012 ◽

Vol 60 (1) ◽

pp. 32 ◽

Cited By ~ 15

Author(s):

Laurence J. Clarke ◽

Duncan I. Jardine ◽

Margaret Byrne ◽

Kelly Shepherd ◽

Andrew J. Lowe

Keyword(s):

Genetic Variation ◽

New Species ◽

Genetic Structure ◽

Western Australia ◽

Sequence Data ◽

Isolation By Distance ◽

Conservation Strategies ◽

New Taxon ◽

Near Surface ◽

Two Populations

Atriplex sp. Yeelirrie Station (L. Trotter & A. Douglas LCH 25025) is a highly restricted, potentially new species of saltbush, known from only two sites ~30 km apart in central Western Australia. Knowledge of genetic structure within the species is required to inform conservation strategies as both populations occur within a palaeovalley that contains significant near-surface uranium mineralisation. We investigate the structure of genetic variation within populations and subpopulations of this taxon using nuclear microsatellites. Internal transcribed spacer sequence data places this new taxon within a clade of polyploid Atriplex species, and the maximum number of alleles per locus suggests it is hexaploid. The two populations possessed similar levels of genetic diversity, but exhibited a surprising level of genetic differentiation given their proximity. Significant isolation by distance over scales of less than 5 km suggests dispersal is highly restricted. In addition, the proportion of variation between the populations (12%) is similar to that among A. nummularia populations sampled at a continent-wide scale (several thousand kilometres), and only marginally less than that between distinct A. nummularia subspecies. Additional work is required to further clarify the exact taxonomic status of the two populations. We propose management recommendations for this potentially new species in light of its highly structured genetic variation.

The coalescent process in models with selection, recombination and geographic subdivision

Genetics Research ◽

10.1017/s0016672300029074 ◽

1991 ◽

Vol 57 (1) ◽

pp. 83-91 ◽

Cited By ~ 40

Author(s):

Norman Kaplan ◽

Richard R. Hudson ◽

Masaru Iizuka

Keyword(s):

Genetic Variation ◽

Population Genetic ◽

Genetic Model ◽

Sequence Data ◽

Balancing Selection ◽

Similar Model ◽

Proposed Model ◽

Coalescent Approach ◽

Neutral Mutations ◽

Better Than

Summary: A population genetic model with a single locus at which balancing selection acts and many linked loci at which neutral mutations can occur is analysed using the coalescent approach. The model incorporates geographic subdivision with migration, as well as mutation, recombination, and genetic drift of neutral variation. It is found that geographic subdivision can affect genetic variation even with high rates of migration, providing that selection is strong enough to maintain different allele frequencies at the selected locus. Published sequence data from the alcohol dehydrogenase locus of Drosophila melanogaster are found to fit the proposed model slightly better than a similar model without subdivision.

Archaeological documentation and data sharing: digital surveying and open data approach applied to archaeological fieldworks

Virtual Archaeology Review ◽

10.4995/var.2019.10377 ◽

2019 ◽

Vol 10 (20) ◽

pp. 17 ◽

Cited By ~ 3

Author(s):

Mattia Previtali ◽

Riccardo Valente

Keyword(s):

Information System ◽

Geographic Information System ◽

Data Sharing ◽

Open Data ◽

Geographic Information ◽

Published Data ◽

Archaeological Data ◽

Disciplinary Expertise ◽

Large Level ◽

The Impact

The open data paradigm is changing the research approach in many fields such as remote sensing and the social sciences. This is supported by governmental decisions and policies that are boosting the open data wave, and in this context archaeology is also affected by this new trend. In many countries, archaeological data are still protected or only limited access is allowed. However, the strong political and economic support for the publication of government data as open data will change the accessibility and disciplinary expertise in the archaeological field too. In order to maximize the impact of data, their technical openness is of primary importance. Indeed, since a spreadsheet is more usable than a PDF of a table, the availability of digital archaeological data, which is structured using standardised approaches, is of primary importance for the real usability of published data. In this context, the main aim of this paper is to present a workflow for archaeological data sharing as open data with a large level of technical usability and interoperability. Primary data is mainly acquired through the use of digital techniques (e.g. digital cameras and terrestrial laser scanning). The processing of this raw data is performed with commercial software for scan registration and image processing, allowing for a simple and semi-automated workflow. Outputs obtained from this step are then processed in modelling and drawing environments to generate digital models, both 2D and 3D. 
These crude geometrical data are then enriched with further information to generate a Geographic Information System (GIS) which is finally published as open data using Open Geospatial Consortium (OGC) standards to maximise interoperability.

Highlights:

- Open data will change the accessibility and disciplinary expertise in the archaeological field.
- The main aim of this paper is to present a workflow for archaeological data sharing as open data with a large level of interoperability.
- Digital acquisition techniques are used to document archaeological excavations and a Geographic Information System (GIS) is generated that is published as open data.

NIfTI-MRS: A standard format for magnetic resonance spectroscopic data

10.1101/2021.11.09.467912 ◽

2021 ◽

Author(s):

William T Clarke ◽

Mark Mikkelsen ◽

Georg Oeltzschner ◽

Tiffany Bell ◽

Amirmohammad Shamaei ◽

...

Keyword(s):

Data Sharing ◽

Open Data ◽

Imaging Data ◽

Standard Format ◽

Conversion Point ◽

Imaging Tool ◽

Data Formats ◽

Multiple Data ◽

Single Voxel ◽

Online Documentation

Purpose: The use of multiple data formats in the MRS community currently hinders data sharing and integration. NIfTI-MRS is proposed as a standard MR spectroscopy data format, which is implemented as an extension to the neuroimaging informatics technology initiative (NIfTI) format. Using this standardised format will facilitate data sharing, ease algorithm development, and encourage the integration of MRS analysis with other imaging modalities. Methods: A file format based on the NIfTI header extension framework was designed to incorporate essential spectroscopic metadata and additional encoding dimensions. A detailed description of the specification is provided. An open-source command-line conversion program is implemented to enable conversion of single-voxel and spectroscopic imaging data to NIfTI-MRS. To provide visualisation of data in NIfTI-MRS, a dedicated plugin is implemented for FSLeyes, the FSL image viewer. Results: Alongside online documentation, ten example datasets are provided in the proposed format. In addition, minimal examples of NIfTI-MRS readers have been implemented. The conversion software, spec2nii, currently converts fourteen formats to NIfTI-MRS, including DICOM and vendor proprietary formats. Conclusion: The proposed format aims to solve the issue of multiple data formats being used in the MRS community. By providing a single conversion point, it aims to simplify the processing and analysis of MRS data, thereby lowering the barrier to the use of MRS. Furthermore, it can serve as the basis for open data sharing, collaboration, and interoperability of analysis programs. It also opens the possibility of greater standardisation and harmonisation. By aligning with the dominant format in neuroimaging, NIfTI-MRS enables the use of mature tools present in the imaging community, demonstrated in this work by using a dedicated imaging tool, FSLeyes, as a viewer.

Supporting evidence-based analysis for modified risk tobacco products through a toxicology data-sharing infrastructure

F1000Research ◽

10.12688/f1000research.10493.2 ◽

2017 ◽

Vol 6 ◽

pp. 12 ◽

Cited By ~ 6

Author(s):

Stéphanie Boué ◽

Thomas Exner ◽

Samik Ghosh ◽

Vincenzo Belcastro ◽

Joh Dokler ◽

...

Keyword(s):

Data Sharing ◽

Data Science ◽

Disease Risk ◽

Open Data ◽

Supporting Evidence ◽

Tobacco Products ◽

Us Fda ◽

Using Data

The US FDA defines modified risk tobacco products (MRTPs) as products that aim to reduce harm or the risk of tobacco-related disease associated with commercially marketed tobacco products. Establishing a product’s potential as an MRTP requires scientific substantiation including toxicity studies and measures of disease risk relative to those of cigarette smoking. Best practices encourage verification of the data from such studies through sharing and open standards. Building on the experience gained from the OpenTox project, a proof-of-concept database and website (INTERVALS) has been developed to share results from both in vivo inhalation studies and in vitro studies conducted by Philip Morris International R&D to assess candidate MRTPs. As datasets are often generated by diverse methods and standards, they need to be traceable, curated, and the methods used well described so that knowledge can be gained using data science principles and tools. The data-management framework described here accounts for the latest standards of data sharing and research reproducibility. Curated data and methods descriptions have been prepared in ISA-Tab format and stored in a database accessible via a search portal on the INTERVALS website. The portal allows users to browse the data by study or mechanism (e.g., inflammation, oxidative stress) and obtain information relevant to study design, methods, and the most important results. Given the successful development of the initial infrastructure, the goal is to grow this initiative and establish a public repository for 21st-century preclinical systems toxicology MRTP assessment data and results that supports open data principles.
