European Nucleotide Archive
Recently Published Documents



2021 ◽  
Author(s):  
Luc Cornet ◽  
Ilse Cleenwerck ◽  
Jessy Praet ◽  
Raphaël R. Leonard ◽  
Nicolas J. Vereecken ◽  
...  

Abstract Snodgrassella is a Betaproteobacteria genus found in the gut of honeybees (Apis spp.) and bumblebees (Bombus spp.). It is part of a conserved microbiome that is composed of few core phylotypes and is essential for bee health and metabolism. Phylogenomic analyses using whole genome sequences of 75 Snodgrassella strains from 4 species of honeybees and 14 species of bumblebees showed that these strains formed a monophyletic lineage within the Neisseriaceae family, that Snodgrassella isolates from Asian honeybees diverged early from the other species in their evolution, that isolates from honeybees and bumblebees were well separated and that this genus consists of at least seven species. We propose to formally name two new Snodgrassella species that were isolated from bumblebees, i.e. Snodgrassella gandavensis sp. nov. and Snodgrassella communis sp. nov. Possible evolutionary scenarios for 107 species- or group-specific genes revealed very limited evidence for horizontal gene transfer. Functional analyses revealed the importance of small proteins, defense mechanisms, amino acid transport and metabolism, inorganic ion transport and metabolism and carbohydrate transport and metabolism among these 107 specific genes.
Importance The microbiome of honeybees (Apis spp.) and bumblebees (Bombus spp.) is highly conserved and represented by few phylotypes. This simplicity in taxon composition makes the bee's microbiome an emergent model for the study of gut microbial communities. Since the description of the Snodgrassella genus, which was isolated from the gut of honeybees and bumblebees in 2013, a single species, i.e. Snodgrassella alvi, has been named. Here we demonstrate that this genus is actually composed of at least seven species, two of them (Snodgrassella gandavensis sp. nov. and Snodgrassella communis sp. nov.) being formally described in the present publication. We also report the presence of 107 genes specific to Snodgrassella species, showing notably the importance of small proteins and defense mechanisms in this genus.
Data summary Cornet L and Vandamme P, European Nucleotide Archive (ENA), Project accession: PRJEB47378. Cornet L and Vandamme P, European Nucleotide Archive (ENA), Reads accessions: SAMEA9570070 - SAMEA9570078. Cornet L and Vandamme P, European Nucleotide Archive (ENA), Genome accessions: GCA_914768015, GCA_914768025, GCA_914768035, GCA_914768045, GCA_914768055, GCA_914768065, GCA_914768075, GCA_914768085, GCA_914768095.


2021 ◽  
Author(s):  
Margarita Kalamara ◽  
James C. Abbott ◽  
Cait E. MacPhee ◽  
Nicola R. Stanley-Wall

Abstract Biofilms are communities of bacteria that are attached to a surface and surrounded by an extracellular matrix. The extracellular matrix protects the community from stressors in the environment, making biofilms robust. The Gram-positive soil bacterium Bacillus subtilis, particularly the isolate NCIB 3610, is widely used as a model for studying biofilm formation. B. subtilis NCIB 3610 forms colony biofilms that are architecturally complex and highly hydrophobic. The hydrophobicity is linked, in part, to the localisation of the protein BslA at the surface of the biofilm, which provides the community with increased resistance to biocides. As most of our knowledge about B. subtilis biofilm formation comes from one isolate, it is unclear if biofilm hydrophobicity is a widely distributed feature of the species. To address this knowledge gap, we collated a library of B. subtilis soil isolates and acquired their whole genome sequences. We used our new isolates to examine biofilm hydrophobicity and found that, although BslA is encoded and produced by all isolates in our collection, hydrophobicity is not a universal feature of B. subtilis colony biofilms. To test whether the matrix exopolymer poly γ-glutamic acid could be masking hydrophobicity in our hydrophilic isolates, we constructed deletion mutants and found, contrary to our hypothesis, that the presence of poly γ-glutamic acid was not the reason behind the observed hydrophilicity. This study highlights the natural variation in the properties of biofilms formed by different isolates and the importance of using a more diverse range of isolates as representatives of a species.
Repositories Raw sequence reads and annotated assemblies have been submitted to the European Nucleotide Archive under accession PRJEB43128.


2021 ◽  
Author(s):  
Grace A. Blackwell ◽  
Martin Hunt ◽  
Kerri M. Malone ◽  
Leandro Lima ◽  
Gal Horesh ◽  
...  

Abstract The open sharing of genomic data provides an incredibly rich resource for the study of bacterial evolution and function, and even anthropogenic activities such as the widespread use of antimicrobials. Whilst these archives are rich in data, considerable processing is required before biological questions can be addressed. Here, we assembled and characterised 661,405 bacterial genomes using a uniform standardised approach, retrieved from the European Nucleotide Archive (ENA) in November of 2018. A searchable COBS index has been produced, facilitating the easy interrogation of the entire dataset for a specific gene or mutation. Additional MinHash and pp-sketch indices support genome-wide comparisons and estimations of genomic distance. An analysis on this scale revealed the uneven species composition in the ENA/public databases, with just 20 of the total 2,336 species making up 90% of the genomes. The over-represented species tend to be acute/common human pathogens. This aligns with research priorities at different levels from individuals with targeted but focused research questions, areas of focus for the funding bodies or national public health agencies, to those identified globally as priority pathogens by the WHO for their resistance to front and last line antimicrobials. Understanding the actual and potential biases in bacterial diversity depicted in this snapshot, and hence within the data being submitted to the public sequencing archives, is essential if we are to target and fill gaps in our understanding of the bacterial kingdom.
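
The abstract above mentions MinHash indices for estimating genomic distance. As a rough illustration of the general idea only (not the authors' pipeline or the tools they used), the following Python sketch estimates the Jaccard similarity of two sequences from their k-mer sets. All function names, the k-mer length and the number of hash functions are assumptions chosen for the example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer under a given seed."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k=5, num_hashes=64):
    """One minimum hash value per seed; seq must contain at least k characters."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of positions where the two sketches agree."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)
```

Two sketches built with the same k and the same number of hashes can be compared without revisiting the full sequences, which is what makes this kind of index practical for very large collections. Production tools use much longer k-mers and more compact sketch schemes than this toy version.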


2021 ◽  
Author(s):  
Quentin John Groom ◽  
Mathias Dillen ◽  
Pieter Huybrechts ◽  
Rukaya Johaadien ◽  
Niki Kyriakopoulou ◽  
...  

When sequencing molecules from an organism it is standard practice to create voucher specimens. This ensures that the results are repeatable and that the identification of the organism can be verified. It also means that the sequence data can be linked to a whole host of other data related to the specimen, including traits, other sequences, environmental data, and geography. It is therefore critical that explicit, preferably machine readable, links exist between voucher specimens and sequence. However, such links do not exist in the databases of the International Nucleotide Sequence Database Collaboration (INSDC). If it were possible to create permanent bidirectional links between specimens and sequence it would not only make data more findable, but would also open new avenues for research. In the Biohackathon we built a semi-automated workflow to take specimen data from the Meise Herbarium and search for references to those specimens in the European Nucleotide Archive (ENA). We achieved this by matching data elements of the specimen and sequence together and by adding a “human-in-the-loop” process whereby possible matches could be confirmed. Although we found that it was possible to discover and match sequences to their vouchers in our collection, we encountered many problems of data standardization, missing data and errors. These problems make the process unreliable and unsuitable to rediscover all the possible links that exist. Ultimately, improved standards and training would remove the need for retrospective relinking of specimens with their sequence. Therefore, we make some tentative recommendations for how this could be achieved in the future.
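
The matching step described above can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the workflow built at the Biohackathon: the field names (`collector`, `collector_number`, `herbarium_code`, `specimen_voucher`), the scoring rule and the threshold are all hypothetical. It scores each sequence record by how many specimen fields appear in its voucher text and returns the plausible matches for a person to confirm.

```python
import re

# Hypothetical specimen fields compared against the sequence record.
MATCH_FIELDS = ("collector", "collector_number", "herbarium_code")


def normalise(text):
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_score(specimen, sequence_record):
    """Fraction of usable specimen fields found as whole words in the voucher text."""
    voucher = f" {normalise(sequence_record.get('specimen_voucher', ''))} "
    fields = [normalise(specimen.get(name, "")) for name in MATCH_FIELDS]
    fields = [f for f in fields if f]  # ignore missing or empty fields
    if not fields:
        return 0.0
    hits = sum(1 for f in fields if f" {f} " in voucher)
    return hits / len(fields)


def candidates_for_review(specimen, sequence_records, threshold=0.5):
    """Return (score, record) pairs at or above threshold, best first."""
    scored = [(match_score(specimen, r), r) for r in sequence_records]
    keep = [pair for pair in scored if pair[0] >= threshold]
    return sorted(keep, key=lambda pair: pair[0], reverse=True)
```

Returning scored candidates rather than a single answer leaves the final decision to a reviewer, which mirrors the human-in-the-loop confirmation described in the text. Missing fields are skipped rather than penalised, so incomplete records still surface for review.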


2021 ◽  
Vol 5 (1) ◽  
pp. 33-43
Author(s):  
Abdul-Hussein Ghazi

A species of freshwater prawn of the genus Macrobrachium was newly recorded from Al-Hammar marsh, southern Iraq. Morphological features, together with 18S rDNA analyses, indicated that the species is Macrobrachium lar. DNA sequences of specimens of this species from the marsh were deposited in GenBank as a new global isolate and published by the National Center for Biotechnology Information (NCBI), the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ). M. lar inhabits deep sections of streams and brackish water; adults live in freshwater, while juveniles can be found in brackish or salt water. The total length of M. lar recorded in this study ranged between 72 and 109 mm for males and between 61 and 93 mm for females.


2020 ◽  
Vol 49 (D1) ◽  
pp. D1502-D1506
Author(s):  
Ugis Sarkans ◽  
Anja Füllgrabe ◽  
Ahmed Ali ◽  
Awais Athar ◽  
Ehsan Behrangi ◽  
...  

Abstract ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI, established in 2002, initially as an archive for publication-related microarray data and was later extended to accept sequencing-based data. Over the last decade an increasing share of biological experiments involve multiple technologies assaying different biological modalities, such as epigenetics, and RNA and protein expression, and thus the BioStudies database (https://www.ebi.ac.uk/biostudies) was established to deal with such multimodal data. Its central concept is a study, which typically is associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as European Nucleotide Archive (ENA), as well as hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In future, all functional genomics data will be archived at BioStudies. The process will be seamless for the users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides the users through these changes.


2020 ◽  
Vol 49 (D1) ◽  
pp. D92-D96
Author(s):  
Eric W Sayers ◽  
Mark Cavanaugh ◽  
Karen Clark ◽  
Kim D Pruitt ◽  
Conrad L Schoch ◽  
...  

Abstract GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 9.9 trillion base pairs from over 2.1 billion nucleotide sequences for 478 000 formally described species. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. Recent updates include new resources for data from the SARS-CoV-2 virus, updates to the NCBI Submission Portal and associated submission wizards for dengue and SARS-CoV-2 viruses, new taxonomy queries for viruses and prokaryotes, and simplified submission processes for EST and GSS sequences.


2020 ◽  
Vol 49 (D1) ◽  
pp. D82-D85
Author(s):  
Peter W Harrison ◽  
Alisha Ahamed ◽  
Raheela Aslam ◽  
Blaise T F Alako ◽  
Josephine Burgin ◽  
...  

Abstract The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena), provided by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), has for almost forty years continued in its mission to freely archive and present the world's public sequencing data for the benefit of the entire scientific community and for the acceleration of the global research effort. Here we highlight the major developments to ENA services and content in 2020, focussing in particular on the recently released updated ENA browser, modernisation of our release process and our data coordination collaborations with specific research communities.


2020 ◽  
Vol 49 (D1) ◽  
pp. D121-D124
Author(s):  
Masanori Arita ◽  
Ilene Karsch-Mizrachi ◽  
Guy Cochrane

Abstract The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been the core infrastructure for collecting and providing nucleotide sequence data and metadata for >30 years. Three partner organizations, the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA have been collaboratively maintaining the INSDC for the benefit of not only science but all types of community worldwide.


2020 ◽  
Author(s):  
Saffiatou Darboe ◽  
Richard S. Bradbury ◽  
Jody Phelan ◽  
Abdoulie Kanteh ◽  
Abdul-Khalie Muhammad ◽  
...  

Abstract Non-typhoidal Salmonella associated with multidrug resistance cause invasive disease in sub-Saharan Africa. Specific lineages of serovars S. Typhimurium and S. Enteritidis are implicated. We characterised the genomic diversity of 100 clinical non-typhoidal Salmonella isolates collected from 93 patients in 2001 in the eastern region and from 2006 to 2018 in the western region of The Gambia. Phenotypic susceptibility testing used Kirby-Bauer disk diffusion, and whole genome sequencing used Illumina platforms. The predominant serovars were S. Typhimurium ST19 (31/100) and S. Enteritidis ST11 (18/100), restricted to invasive disease, with the notable absence of S. Typhimurium ST313. Phylogenetic analysis performed in the context of 495 African strains from the European Nucleotide Archive confirmed the presence of the S. Enteritidis virulent epidemic invasive multidrug resistant West African clade. Multidrug resistance, including resistance to chloramphenicol and azithromycin, has emerged among the West African S. Enteritidis clade (7/9, 78%) with potential for spread, thus having important implications for patient management and warranting systematic surveillance and epidemiologic investigations to inform control.
Data summary Sequences are deposited in the NCBI sequence reads archive (SRA) under BioProject ID: PRJEB38968. The genomic assemblies are available for download from the European Nucleotide Archive (ENA): http://www.ebi.ac.uk/ena/data/view/. Accession numbers SAMEA6991082 to SAME6991180

