CNSA: a data repository for archiving omics data

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Xueqin Guo ◽  
Fengzhen Chen ◽  
Fei Gao ◽  
Ling Li ◽  
Ke Liu ◽  
...  

Abstract With the application and development of high-throughput sequencing technology in the life and health sciences, the massive volume of multi-omics data poses a challenge for efficient management and utilization. Database development and biocuration are prerequisites for the reuse of these big data. Here, relying on China National GeneBank (CNGB), we present the CNGB Sequence Archive (CNSA) for archiving omics data, including raw sequencing data and its further analyzed results, which are currently organized into six objects: Project, Sample, Experiment, Run, Assembly and Variation. Moreover, CNSA has created a correlation model of living samples, sample information and analytical data for some projects. Both living samples and analytical data are directly correlated with the sample information. From any one of the three, the information or data of the other two can be obtained, so that all data can be traced throughout the life cycle from the living sample to the sample information to the analytical data. Complying with the data standards commonly used in the life sciences, CNSA is committed to building a comprehensive and curated data repository for storing, managing and sharing omics data. We will continue to improve the data standards and provide free access to open-data resources for worldwide scientific communities to support academic research and the bio-industry. Database URL: https://db.cngb.org/cnsa/.
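
The correlation model described in this abstract can be pictured with a small sketch. It is illustrative only and is not CNSA's actual schema or API: the class names, field names and identifiers below are hypothetical, and the only property taken from the abstract is that living samples and analytical data both link directly to the sample information, so that each can be reached from the other.

```python
from dataclasses import dataclass


@dataclass
class SampleInfo:
    sample_id: str   # hypothetical identifier, not a real CNSA accession
    organism: str


@dataclass
class LivingSample:
    specimen_id: str
    sample_id: str   # direct link to the sample information


@dataclass
class AnalyticalData:
    data_id: str
    kind: str        # e.g. "Run", "Assembly" or "Variation"
    sample_id: str   # direct link to the sample information


class TraceIndex:
    """Toy index that links living samples and analytical data via sample info."""

    def __init__(self):
        self.samples = {}
        self.living = {}
        self.data = {}

    def add_sample(self, info):
        self.samples[info.sample_id] = info

    def add_living(self, specimen):
        self.living[specimen.specimen_id] = specimen

    def add_data(self, record):
        self.data[record.data_id] = record

    def trace_from_living(self, specimen_id):
        """Return the sample info and all analytical data for a living sample."""
        sample_id = self.living[specimen_id].sample_id
        records = [d for d in self.data.values() if d.sample_id == sample_id]
        return self.samples[sample_id], records

    def trace_from_data(self, data_id):
        """Return the sample info and all living samples for an analytical record."""
        sample_id = self.data[data_id].sample_id
        specimens = [s for s in self.living.values() if s.sample_id == sample_id]
        return self.samples[sample_id], specimens
```

In this sketch, registering one `SampleInfo`, one `LivingSample` and one `AnalyticalData` that share a `sample_id` is enough for `trace_from_living` and `trace_from_data` to walk from either end to the other two.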

2020 ◽  
Author(s):  
Xueqin Guo ◽  
Fengzhen Chen ◽  
Fei Gao ◽  
Ling Li ◽  
Ke Liu ◽  
...  

Abstract With the application and development of high-throughput sequencing technology in life and health sciences, massive multi-dimensional biological data brings the problem of efficient management and utilization. Database development and biocuration are the prerequisites for the reuse of these big data. Here, relying on China National GeneBank (CNGB), we present CNGB Sequence Archive (CNSA) for archiving omics data, including raw sequencing data and its analytical data and related metadata which are organized into six objects, namely Project, Sample, Experiment, Run, Assembly, and Variation at present. Moreover, CNSA has created the correlation model of living samples, sample information, and analytical data on some projects, so that all data can be traced throughout the life cycle from the living sample to the sample information to the analytical data. Complying with the data standards commonly used in the life sciences, CNSA is committed to building a comprehensive and curated data repository for the storage, management and sharing of omics data, improving the data standards, and providing free access to open data resources for worldwide scientific communities to support academic research and the bio-industry. Database URL: https://db.cngb.org/cnsa/


2019 ◽  
Author(s):  
Wikum Dinalankara ◽  
Qian Ke ◽  
Donald Geman ◽  
Luigi Marchionni

Abstract Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high throughput sequencing data from the Cancer Genome Atlas.
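
The univariate ternary digitization mentioned in this abstract can be illustrated with a minimal sketch. It is not the divergence package's algorithm or API: the function names are hypothetical, and the baseline range is simply assumed to be a quantile interval of the baseline values for each feature. Entries below the interval are coded -1, entries inside it 0, and entries above it +1.

```python
import math


def baseline_interval(values, lower=0.05, upper=0.95):
    """Return a (low, high) range covering the central part of the baseline values.

    Assumption: a plain quantile interval stands in for the baseline range.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[math.floor(lower * last)], ordered[math.ceil(upper * last)]


def ternary_code(value, interval):
    """Code one entry as -1 (below), 0 (within) or +1 (above) the baseline range."""
    low, high = interval
    if value < low:
        return -1
    if value > high:
        return 1
    return 0


def digitize_profile(profile, baseline):
    """Digitize one sample profile feature by feature.

    profile:  dict mapping feature name -> value for a single sample
    baseline: dict mapping feature name -> list of baseline population values
    """
    return {
        feature: ternary_code(value, baseline_interval(baseline[feature]))
        for feature, value in profile.items()
    }
```

A binary code follows from the same sketch by mapping both -1 and +1 to 1, that is, by recording only whether an entry diverges from the baseline range.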


2019 ◽  
Vol 3 (2-2) ◽  
pp. 233
Author(s):  
Dasapta Erwin Irawan ◽  
Muhammad Aswan Syahputra ◽  
Prana Ugi ◽  
Deny Juanda Puradimaja

Hydrochemical analysis has emerged as a powerful methodology in geothermal system profiling. With more than 100 active volcanoes, Indonesia is the capital of geothermal energy. We therefore need an analytical, data-driven, and user-focused online application for geothermal water quality. We introduce Thermostats (https://aswansyahputra.shinyapps.io/thermostats/). We collected water quality data from 416 geothermal sites across Indonesia. The three main objectives are to provide an open, free-to-use online data repository, to visualize the dataset to suit users' needs, and to help users understand the geothermal system of each particular site. Ultimately, we hope users will like the system and contribute their own datasets to make it better for future users. We built this online app with Shiny because it is open source, lightweight and portable, and because it makes our descriptive, bivariate and multivariate statistics intuitive to load. We selected Principal Component Analysis and Cluster Analysis as two strong statistical methods for water sample classification. Users can add their own dataset by making a pull request on GitHub (https://github.com/dasaptaerwin/thermostats) or by sending it to us by email, so that it becomes visible in the application and is included in the visualization. We made the application portable, so it can be installed on a local computer or a server, enabling an easy and fluid way of sharing data between collaborators.


2021 ◽  
Vol 99 (2) ◽  
Author(s):  
Yuhua Fu ◽  
Pengyu Fan ◽  
Lu Wang ◽  
Ziqiang Shu ◽  
Shilin Zhu ◽  
...  

Abstract Despite the broad variety of available microRNA (miRNA) research tools and methods, their application to the identification, annotation, and target prediction of miRNAs in nonmodel organisms is still limited. In this study, we collected nearly all public sRNA-seq data to improve the annotation for known miRNAs and identify novel miRNAs that have not been annotated in pigs (Sus scrofa). We newly annotated 210 mature sequences in known miRNAs and found that 43 of the known miRNA precursors were problematic due to redundant/missing annotations or incorrect sequences. We also predicted 811 novel miRNAs with high confidence, which was twice the current number of known miRNAs for pigs in miRBase. In addition, we proposed a correlation-based strategy to predict target genes for miRNAs by using a large amount of sRNA-seq and RNA-seq data. We found that the correlation-based strategy provided additional evidence of expression compared with traditional target prediction methods. The correlation-based strategy also identified the regulatory pairs that were controlled by nonbinding sites with a particular pattern, which provided abundant complementarity for studying the mechanism of miRNAs that regulate gene expression. In summary, our study improved the annotation of known miRNAs, identified a large number of novel miRNAs, and predicted target genes for all pig miRNAs by using massive public data. This large data-based strategy is also applicable for other nonmodel organisms with incomplete annotation information.
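
The correlation-based strategy mentioned in this abstract can be sketched as follows. This is a simplified illustration and not the authors' pipeline: it assumes that miRNA and gene expression were measured across the same samples, that Pearson correlation is used, and that strongly negative correlations are the ones of interest. The function names, the threshold and the example identifiers are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / (sd_x * sd_y)


def candidate_targets(mirna_expr, gene_expr, threshold=-0.8):
    """List (miRNA, gene, r) pairs whose correlation is at or below the threshold.

    mirna_expr, gene_expr: dicts mapping a name to expression values,
    with samples in the same order in both dicts (an assumption of this sketch).
    """
    pairs = []
    for mirna, mirna_values in mirna_expr.items():
        for gene, gene_values in gene_expr.items():
            r = pearson(mirna_values, gene_values)
            if r <= threshold:
                pairs.append((mirna, gene, r))
    # Most strongly anti-correlated pairs first
    return sorted(pairs, key=lambda pair: pair[2])
```

In practice such correlation evidence would be combined with other evidence, as the abstract describes it as additional to traditional target prediction methods rather than a replacement for them.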


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sung Yong Park ◽  
Gina Faraci ◽  
Pamela M. Ward ◽  
Jane F. Emerson ◽  
Ha Youn Lee

Abstract COVID-19 global cases have climbed to more than 33 million, with over a million total deaths, as of September, 2020. Real-time massive SARS-CoV-2 whole genome sequencing is key to tracking chains of transmission and estimating the origin of disease outbreaks. Yet no methods have simultaneously achieved high precision, simple workflow, and low cost. We developed a high-precision, cost-efficient SARS-CoV-2 whole genome sequencing platform for COVID-19 genomic surveillance, CorvGenSurv (Coronavirus Genomic Surveillance). CorvGenSurv directly amplified viral RNA from COVID-19 patients’ Nasopharyngeal/Oropharyngeal (NP/OP) swab specimens and sequenced the SARS-CoV-2 whole genome in three segments by long-read, high-throughput sequencing. Sequencing of the whole genome in three segments significantly reduced sequencing data waste, thereby preventing dropouts in genome coverage. We validated the precision of our pipeline by both control genomic RNA sequencing and Sanger sequencing. We produced near full-length whole genome sequences from individuals who were COVID-19 test positive during April to June 2020 in Los Angeles County, California, USA. These sequences were highly diverse in the G clade with nine novel amino acid mutations including NSP12-M755I and ORF8-V117F. With its readily adaptable design, CorvGenSurv grants wide access to genomic surveillance, permitting immediate public health response to sudden threats.


2021 ◽  
Author(s):  
Samir Das ◽  
Rida Abou-Haidar ◽  
Henri Rabalais ◽  
Sonia Denise Lai Wing Sun ◽  
Zaliqa Rosli ◽  
...  

Abstract In January 2016, the Montreal Neurological Institute-Hospital (The Neuro) declared itself an Open Science organization. This vision extends beyond efforts by individual scientists seeking to release individual datasets, software tools, or building platforms that provide for the free dissemination of such information. It involves multiple stakeholders and an infrastructure that considers governance, ethics, computational resourcing, physical design, workflows, training, education, and intra-institutional reporting structures. The C-BIG repository was built in response as The Neuro’s institutional biospecimen and clinical data repository, and collects biospecimens as well as clinical, imaging, and genetic data from patients with neurological disease and healthy controls. It is aimed at helping scientific investigators, in both academia and industry, advance our understanding of neurological diseases and accelerate the development of treatments. As many neurological diseases are quite rare, they present several challenges to researchers due to their small patient populations. Overcoming these challenges required the aggregation of datasets from various projects and locations. The C-BIG repository achieves this goal and stands as a scalable working model for institutions to collect, track, curate, archive, and disseminate multimodal data from patients. In November 2020, a Registered Access layer was made available to the wider research community at https://cbigr-open.loris.ca, and in May 2021 fully open data will be released to complement the Registered Access data. This article outlines many of the aspects of The Neuro’s transition to Open Science by describing the data to be released, C-BIG’s full capabilities, and the design aspects that were implemented for effective data sharing.


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Timothy P. Jenkins ◽  
David I. Pritchard ◽  
Radu Tanasescu ◽  
Gary Telford ◽  
Marina Papaiakovou ◽  
...  

Abstract Background: Helminth-associated changes in gut microbiota composition have been hypothesised to contribute to the immune-suppressive properties of parasitic worms. Multiple sclerosis is an immune-mediated autoimmune disease of the central nervous system whose pathophysiology has been linked to imbalances in gut microbial communities. Results: In the present study, we investigated, for the first time, qualitative and quantitative changes in the faecal bacterial composition of human volunteers with remitting multiple sclerosis (RMS) prior to and following experimental infection with the human hookworm, Necator americanus (N+), and following anthelmintic treatment, and compared the findings with data obtained from a cohort of RMS patients subjected to placebo treatment (PBO). Bacterial 16S rRNA high-throughput sequencing data revealed significantly decreased alpha diversity in the faecal microbiota of PBO compared to N+ subjects over the course of the trial; additionally, we observed significant differences in the abundances of several bacterial taxa with putative immune-modulatory functions between study cohorts. Parabacteroides were significantly expanded in the faecal microbiota of N+ individuals for which no clinical and/or radiological relapses were recorded at the end of the trial. Conclusions: Overall, our data lend support to the hypothesis of a contributory role of parasite-associated alterations in gut microbial composition to the immune-modulatory properties of hookworm parasites.

