Diat.barcode, an open-access curated barcode library for diatoms

Abstract: Diatoms (Bacillariophyta) are ubiquitous microalgae that produce a siliceous exoskeleton and make a major contribution to the productivity of oceans and freshwaters. Their huge diversity makes them excellent ecological indicators of aquatic ecosystems. Diatoms are usually identified from the morphology of their exoskeleton. DNA barcoding is an alternative, and high-throughput sequencing enables the rapid analysis of many environmental samples at a lower cost than microscopic analysis. However, to identify environmental sequences correctly, an expertly curated reference library is needed. Several curated libraries for protists exist; none, however, is dedicated to diatoms. Diat.barcode is an open-access library dedicated to diatoms which has been maintained since 2012. Data come from two sources: (1) the NCBI nucleotide database and (2) unpublished sequencing data of culture collections. Since 2017, several experts have collaborated to curate this library for rbcL, a chloroplast marker suitable for species-level identification of diatoms. For the latest version of the database (version 7), 605 of the 3482 taxonomic names originally assigned by the authors of the rbcL sequences were modified after curation. The database is accessible at https://www6.inra.fr/carrtel-collection_eng/Barcoding-database.

Diat.barcode: a DNA tool to decipher diatom communities for the evaluation of environmental pressures

ARPHA Conference Abstracts ◽

10.3897/aca.4.e64940 ◽

2021 ◽

Vol 4 ◽

Cited By ~ 1

Author(s):

Frederic Rimet ◽

Teofana Chonova ◽

Gilles Gassiole ◽

Maria Kahlert ◽

François Keck ◽

...

Keyword(s):

Taxonomic Diversity ◽

R Package ◽

Life Forms ◽

Reference Database ◽

Sequencing Data ◽

Culture Collections ◽

Environmental Pressures ◽

Ecological Features ◽

DNA Metabarcoding ◽

Environmental Sequences

Diatoms (Bacillariophyta) are ubiquitous microalgae whose huge taxonomic diversity changes with environmental conditions. This makes them excellent ecological indicators for a range of ecosystems and applications (ecotoxicology, biomonitoring, paleo-environmental reconstruction, etc.). Current standardized methodologies for diatoms are based on microscopic determinations, which are time-consuming and prone to identification uncertainties. DNA metabarcoding has been proposed as a way to avoid these flaws, enabling the sequencing of a large quantity of barcodes from natural samples. A taxonomic identity is given to these barcodes by comparing their sequences to a barcoding reference library. However, to identify environmental sequences correctly, the reference database should contain a representative number of reference sequences to ensure good coverage of diatom diversity. Moreover, the reference database needs to be carefully curated by taxonomic experts, as its content has a direct impact on species detection. Diat.barcode is an open-access library for diatoms, maintained since 2012, that links diatom taxonomic identities to rbcL barcode sequences (rbcL is a chloroplast marker suitable for species-level identification of diatoms). Data are accumulated from three sources: (1) the NCBI nucleotide database, (2) unpublished sequencing data of culture collections and, more recently, (3) environmental sequences. Since 2017, an international network of experts in diatom taxonomy has curated this library. The latest version of the database (version 9.2) includes 8066 entries that correspond to more than 280 different genera and 1490 different species. In addition to the taxonomic information, morphological features (e.g. biovolumes, chloroplasts), life-forms (mobility, colony type) and ecological features (taxa preferences with respect to pollution) are given.
The database can be downloaded from the website (www6.inrae.fr/carrtel-collection/Barcoding-database/) or directly through the R package diatbarcode. Ready-to-use files for commonly used metabarcoding pipelines (Mothur and DADA2) are also available.
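
To make the comparison step concrete, the following is a minimal Python sketch of assigning a taxon to a query barcode by comparing it to a reference library. It is an illustration under stated assumptions, not the method used by Diat.barcode, Mothur or DADA2: the function names (`kmers`, `assign_taxon`), the k-mer Jaccard similarity, the 0.8 threshold and the toy sequences are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of overlapping k-mers in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_taxon(query: str, reference: Dict[str, str], k: int = 4,
                 min_similarity: float = 0.8) -> Tuple[Optional[str], float]:
    """Pick the reference taxon with the highest k-mer Jaccard similarity.

    Returns (None, best_score) when no reference reaches min_similarity,
    i.e. the query is left unassigned rather than forced onto a taxon.
    """
    query_kmers = kmers(query, k)
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in reference.items():
        ref_kmers = kmers(ref_seq, k)
        union = query_kmers | ref_kmers
        score = len(query_kmers & ref_kmers) / len(union) if union else 0.0
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_score < min_similarity:
        return None, best_score
    return best_taxon, best_score


# Made-up toy sequences (assumption): these are not real rbcL barcodes.
toy_library = {
    "Taxon A": "ATGGCTTCAGGTACCGTTAAGC",
    "Taxon B": "ATGCCCTTTGGGAAACCCTTTG",
}
print(assign_taxon("ATGGCTTCAGGTACCGTTAAGC", toy_library))
```

The sketch also shows why curation and coverage matter: a query can only receive a name that is present in the library, and a mislabeled reference entry propagates directly to every query that matches it.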

HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

PLoS ONE ◽

10.1371/journal.pone.0085879 ◽

2014 ◽

Vol 9 (1) ◽

pp. e85879 ◽

Cited By ~ 67

Author(s):

Fabrice P. A. David ◽

Julien Delafontaine ◽

Solenne Carat ◽

Frederick J. Ross ◽

Gregory Lefebvre ◽

...

Keyword(s):

Data Analysis ◽

Open Access ◽

High Throughput ◽

Web Application ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

Faculty Opinions recommendation of Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726132071.793531014 ◽

2017 ◽

Author(s):

Sarah Rowland-Jones ◽

Sophie Andrews

Keyword(s):

HIV Infection ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data

BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution

Bioinformatics ◽

10.1093/bioinformatics/btu010 ◽

2014 ◽

Vol 30 (9) ◽

pp. 1214-1219 ◽

Cited By ~ 6

Author(s):

C. Ye ◽

C. Hsiao ◽

H. Corrada Bravo

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Blind Deconvolution ◽

Sequencing Data ◽

Base Calling ◽

High Throughput Sequencing Data

Improvement, identification, and target prediction for miRNAs in the porcine genome by using massive, public high-throughput sequencing data

Journal of Animal Science ◽

10.1093/jas/skab018 ◽

2021 ◽

Vol 99 (2) ◽

Author(s):

Yuhua Fu ◽

Pengyu Fan ◽

Lu Wang ◽

Ziqiang Shu ◽

Shilin Zhu ◽

...

Keyword(s):

High Throughput Sequencing ◽

Target Genes ◽

Target Prediction ◽

Large Data ◽

Sequencing Data ◽

Regulate Gene Expression ◽

High Throughput Sequencing Data ◽

Annotation Information ◽

Public Data ◽

Broad Variety

Abstract: Despite the broad variety of available microRNA (miRNA) research tools and methods, their application to the identification, annotation, and target prediction of miRNAs in nonmodel organisms is still limited. In this study, we collected nearly all public sRNA-seq data to improve the annotation for known miRNAs and identify novel miRNAs that have not been annotated in pigs (Sus scrofa). We newly annotated 210 mature sequences in known miRNAs and found that 43 of the known miRNA precursors were problematic due to redundant/missing annotations or incorrect sequences. We also predicted 811 novel miRNAs with high confidence, which was twice the current number of known miRNAs for pigs in miRBase. In addition, we proposed a correlation-based strategy to predict target genes for miRNAs by using a large amount of sRNA-seq and RNA-seq data. We found that the correlation-based strategy provided additional evidence of expression compared with traditional target prediction methods. The correlation-based strategy also identified the regulatory pairs that were controlled by nonbinding sites with a particular pattern, which provided abundant complementarity for studying the mechanism of miRNAs that regulate gene expression. In summary, our study improved the annotation of known miRNAs, identified a large number of novel miRNAs, and predicted target genes for all pig miRNAs by using massive public data. This large data-based strategy is also applicable for other nonmodel organisms with incomplete annotation information.
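
The abstract does not spell out how the correlation-based strategy works. As a rough illustration only, the Python sketch below ranks candidate target genes by the Pearson correlation between one miRNA's expression and each gene's expression across matched samples. The choice of Pearson correlation, the focus on negative correlation, the -0.8 threshold, the function names and the toy values are all assumptions, not details taken from the study.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def candidate_targets(mirna_expr: Sequence[float],
                      gene_expr: Dict[str, Sequence[float]],
                      threshold: float = -0.8) -> List[Tuple[str, float]]:
    """Return (gene, r) pairs with r <= threshold, most negative first."""
    hits = []
    for gene, values in gene_expr.items():
        r = pearson(mirna_expr, values)
        if r <= threshold:
            hits.append((gene, r))
    return sorted(hits, key=lambda hit: hit[1])


# Toy expression values across five samples (made up for illustration).
mirna = [1.0, 2.0, 3.0, 4.0, 5.0]
genes = {
    "geneA": [10.0, 8.0, 6.0, 4.0, 2.0],  # falls as the miRNA rises
    "geneB": [3.0, 3.5, 2.9, 3.2, 3.1],   # essentially unrelated
}
print(candidate_targets(mirna, genes))
```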

High-precision and cost-efficient sequencing for real-time COVID-19 surveillance

Scientific Reports ◽

10.1038/s41598-021-93145-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sung Yong Park ◽

Gina Faraci ◽

Pamela M. Ward ◽

Jane F. Emerson ◽

Ha Youn Lee

Keyword(s):

Los Angeles ◽

Whole Genome Sequencing ◽

Real Time ◽

Genome Sequencing ◽

High Precision ◽

High Throughput Sequencing ◽

Whole Genome ◽

Sequencing Data ◽

Public Health Response ◽

Cost Efficient

Abstract: COVID-19 global cases have climbed to more than 33 million, with over a million total deaths, as of September 2020. Real-time massive SARS-CoV-2 whole genome sequencing is key to tracking chains of transmission and estimating the origin of disease outbreaks. Yet no methods have simultaneously achieved high precision, simple workflow, and low cost. We developed a high-precision, cost-efficient SARS-CoV-2 whole genome sequencing platform for COVID-19 genomic surveillance, CorvGenSurv (Coronavirus Genomic Surveillance). CorvGenSurv directly amplified viral RNA from COVID-19 patients’ nasopharyngeal/oropharyngeal (NP/OP) swab specimens and sequenced the SARS-CoV-2 whole genome in three segments by long-read, high-throughput sequencing. Sequencing of the whole genome in three segments significantly reduced sequencing data waste, thereby preventing dropouts in genome coverage. We validated the precision of our pipeline by both control genomic RNA sequencing and Sanger sequencing. We produced near full-length whole genome sequences from individuals who tested positive for COVID-19 between April and June 2020 in Los Angeles County, California, USA. These sequences were highly diverse in the G clade with nine novel amino acid mutations including NSP12-M755I and ORF8-V117F. With its readily adaptable design, CorvGenSurv grants wide access to genomic surveillance, permitting immediate public health response to sudden threats.

Experimental infection with the hookworm, Necator americanus, is associated with stable gut microbial diversity in human volunteers with relapsing multiple sclerosis

BMC Biology ◽

10.1186/s12915-021-01003-6 ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Timothy P. Jenkins ◽

David I. Pritchard ◽

Radu Tanasescu ◽

Gary Telford ◽

Marina Papaiakovou ◽

...

Keyword(s):

Multiple Sclerosis ◽

Experimental Infection ◽

High Throughput Sequencing ◽

Alpha Diversity ◽

Placebo Treatment ◽

Sequencing Data ◽

Faecal Microbiota ◽

Microbial Composition ◽

Necator Americanus ◽

Human Volunteers

Abstract: Background. Helminth-associated changes in gut microbiota composition have been hypothesised to contribute to the immune-suppressive properties of parasitic worms. Multiple sclerosis is an immune-mediated autoimmune disease of the central nervous system whose pathophysiology has been linked to imbalances in gut microbial communities. Results. In the present study, we investigated, for the first time, qualitative and quantitative changes in the faecal bacterial composition of human volunteers with remitting multiple sclerosis (RMS) prior to and following experimental infection with the human hookworm, Necator americanus (N+), and following anthelmintic treatment, and compared the findings with data obtained from a cohort of RMS patients subjected to placebo treatment (PBO). Bacterial 16S rRNA high-throughput sequencing data revealed significantly decreased alpha diversity in the faecal microbiota of PBO compared to N+ subjects over the course of the trial; additionally, we observed significant differences in the abundances of several bacterial taxa with putative immune-modulatory functions between study cohorts. Parabacteroides were significantly expanded in the faecal microbiota of N+ individuals for which no clinical and/or radiological relapses were recorded at the end of the trial. Conclusions. Overall, our data lend support to the hypothesis of a contributory role of parasite-associated alterations in gut microbial composition to the immune-modulatory properties of hookworm parasites.
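
Alpha diversity summarises the diversity within a single sample. The abstract does not say which metric was used, so the Python sketch below uses the Shannon index purely as one common example; the function name and the toy counts are hypothetical.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Toy read counts per taxon for two samples (made up for illustration).
even_sample = [10, 10, 10, 10]   # four equally abundant taxa
skewed_sample = [37, 1, 1, 1]    # one dominant taxon

print(shannon_index(even_sample), shannon_index(skewed_sample))
```

With the same number of taxa, the evenly distributed sample scores higher than the sample dominated by one taxon, which is the kind of difference a comparison between cohorts would pick up.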

deepBase v3.0: expression atlas and interactive analysis of ncRNAs from thousands of deep-sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkaa1039 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D877-D883

Author(s):

Fangzhou Xie ◽

Shurong Liu ◽

Junhao Wang ◽

Jiajia Xuan ◽

Xiaoqin Zhang ◽

...

Keyword(s):

High Throughput Sequencing ◽

Clinical Information ◽

Sequencing Data ◽

Normal Tissues ◽

Interactive Analysis ◽

High Throughput Sequencing Data ◽

Expression Atlas ◽

Expression Evolution ◽

Noninvasive Biomarkers ◽

Cancer Tissues

Abstract: Eukaryotic genomes encode thousands of small and large non-coding RNAs (ncRNAs). However, the expression, functions and evolution of these ncRNAs are still largely unknown. In this study, we have updated deepBase to version 3.0 (deepBase v3.0, http://rna.sysu.edu.cn/deepbase3/index.html), an increasingly popular and openly licensed resource that facilitates integrative and interactive display and analysis of the expression, evolution, and functions of various ncRNAs by deeply mining thousands of high-throughput sequencing datasets from tissue, tumor and exosome samples. We updated deepBase v3.0 to provide the most comprehensive expression atlas of small RNAs and lncRNAs by integrating ∼67 620 datasets from 80 normal tissues and ∼50 cancer tissues. The extracellular patterns of various ncRNAs were profiled to explore their applications for discovery of noninvasive biomarkers. Moreover, we constructed survival maps of tRNA-derived RNA fragments (tRFs), miRNAs, snoRNAs and lncRNAs by analyzing data from >45 000 cancer samples and the corresponding clinical information. We also developed interactive web tools to analyze the differential expression and biological functions of various ncRNAs in ∼50 types of cancers. This update is expected to provide a variety of new modules and graphic visualizations to facilitate analyses and explorations of the functions and mechanisms of various types of ncRNAs.

Improving gene function predictions using independent transcriptional components

Nature Communications ◽

10.1038/s41467-021-21671-w ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Carlos G. Urzúa-Traslaviña ◽

Vincent C. Leeuwenburgh ◽

Arkajyoti Bhattacharya ◽

Stefan Loipfinger ◽

Marcel A. T. M. van Vugt ◽

...

Keyword(s):

Independent Component Analysis ◽

High Throughput Sequencing ◽

Principal Component ◽

Component Analysis ◽

Independent Component ◽

Sequencing Data ◽

New Members ◽

High Throughput Sequencing Data ◽

Gene Sets ◽

Functional Understanding

Abstract: The interpretation of high-throughput sequencing data is limited by our incomplete functional understanding of coding and non-coding transcripts. Reliably predicting the function of such transcripts can overcome this limitation. Here we report the use of a consensus independent component analysis and guilt-by-association approach to predict over 23,000 functional groups comprising over 55,000 coding and non-coding transcripts using publicly available transcriptomic profiles. We show that, compared to using Principal Component Analysis, Independent Component Analysis-derived transcriptional components enable more confident functionality predictions, improve predictions when new members are added to the gene sets, and are less affected by gene multi-functionality. Predictions generated using human or mouse transcriptomic data are made available for exploration in a publicly available web portal.

Screening and survival analysis of melanoma immunodrug response-related genes and the function of magnetic nanoparticles in gene extraction

Materials Express ◽

10.1166/mex.2021.2037 ◽

2021 ◽

Vol 11 (8) ◽

pp. 1306-1312

Author(s):

Li Song ◽

Ningchao Du ◽

Haitao Luo ◽

Furong Li

Keyword(s):

Survival Analysis ◽

Magnetic Nanoparticles ◽

Drug Response ◽

High Throughput Sequencing ◽

Cox Proportional Hazards ◽

Sequencing Data ◽

Protein Coding ◽

Non Coding RNA ◽

Long Non Coding RNA ◽

RNA Genes

This study aimed to identify the association of protein-coding and long non-coding RNA genes with immunotherapy response in melanoma. Based on RNA sequencing data of melanoma specimens, the expression levels of protein-coding and long non-coding RNA genes were calculated using the Kallisto RNA-seq quantification method, and differentially expressed genes were detected using the DESeq2 method. Cox proportional hazards regression was used to evaluate the effects of gene expression on survival. According to the clinical data of 14 patients with drug response and 11 patients without drug response, 18 protein-coding genes and 14 long non-coding RNAs were differentially expressed (fold change > 2 and P < 0.01 after correction), and the differentially expressed coding genes were significantly enriched in the process of cell adhesion (P < 0.01). The results of survival analysis showed that 18 coding genes and 14 long non-coding RNA genes had significant effects on patient survival (P < 0.01). In this study, magnetic nanoparticles were used to extract genomic DNA and total RNA, owing to their paramagnetism and biocompatibility, and transcriptome high-throughput sequencing was then performed. The method has the advantages of removing dangerous reagents such as phenol and chloroform, replacing inorganic coating such as silica with organic oil, and shortening reaction time. Protein-coding and long non-coding RNA genes as well as magnetic nanoparticles may serve as potential cancer immune biomarker targets for developing future oncological treatments.
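
The selection criterion quoted in the abstract (a difference of more than twofold and a corrected P below 0.01) can be written as a simple filter. The Python sketch below is only an illustration: it assumes the twofold change applies in either direction, that results are available as log2 fold changes with adjusted P values (the form in which differential expression tools commonly report them), and the `GeneResult` type, function name and toy values are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence


class GeneResult(NamedTuple):
    gene: str
    log2_fold_change: float
    adjusted_p: float


def differentially_expressed(results: Sequence[GeneResult],
                             min_fold_change: float = 2.0,
                             max_adjusted_p: float = 0.01) -> List[str]:
    """Keep genes changed more than min_fold_change (up or down) with adjusted P below the cutoff."""
    min_log2 = math.log2(min_fold_change)
    return [r.gene for r in results
            if abs(r.log2_fold_change) > min_log2 and r.adjusted_p < max_adjusted_p]


# Toy results (made up for illustration, not data from the study).
toy_results = [
    GeneResult("G1", 1.5, 0.001),   # up more than twofold, significant
    GeneResult("G2", 0.5, 0.001),   # significant but change too small
    GeneResult("G3", -2.0, 0.005),  # down more than twofold, significant
    GeneResult("G4", 3.0, 0.2),     # large change but not significant
]
print(differentially_expressed(toy_results))
```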
