NCBI’s Virus Discovery Codeathon: Building “FIVE” —The Federated Index of Viral Experiments API Index

Viruses, 2020, Vol 12 (12), pp. 1424
Author(s): Joan Martí-Carreras, Alejandro Rafael Gener, Sierra D. Miller, Anderson F. Brito, Christiam E. Camacho, ...

Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, one consequence of previously decentralized (unfederated) data is a lack of consensus on, and comparison between, feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomic annotations and virus–host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof of concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is implemented in BigQuery for efficient access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide simple ways to access or query the information they hold. To address this, a Python API query system was developed to make FIVE more accessible.
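The central idea of a federated index, keeping alternative annotations from several sources side by side instead of forcing a single consensus, can be illustrated with a small in-memory structure. This is a minimal sketch under stated assumptions: FIVE itself is implemented in BigQuery and its schema and Python API are not described here, so `FederatedIndex`, `Annotation`, the source names and the accession `ACC0001` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One feature annotation as reported by one source (hypothetical schema)."""
    source: str   # contributing database, e.g. "source_a" (hypothetical)
    feature: str  # feature name, e.g. a gene identifier
    value: str    # annotation text assigned by that source


class FederatedIndex:
    """Toy index keyed by accession; not FIVE's actual BigQuery schema."""

    def __init__(self):
        # accession -> feature -> annotations from every source that reported it
        self._index = defaultdict(lambda: defaultdict(list))

    def add_source(self, source, records):
        """Ingest (accession, feature, value) tuples contributed by one source."""
        for accession, feature, value in records:
            self._index[accession][feature].append(Annotation(source, feature, value))

    def query(self, accession, feature=None):
        """All annotations for an accession, optionally restricted to one feature."""
        features = self._index.get(accession, {})
        if feature is not None:
            return list(features.get(feature, []))
        return [ann for anns in features.values() for ann in anns]

    def conflicts(self, accession):
        """Features whose sources disagree, i.e. report more than one distinct value."""
        features = self._index.get(accession, {})
        return sorted(f for f, anns in features.items() if len({a.value for a in anns}) > 1)


if __name__ == "__main__":
    index = FederatedIndex()
    index.add_source("source_a", [("ACC0001", "gene1", "polymerase")])
    index.add_source("source_b", [("ACC0001", "gene1", "RNA-dependent RNA polymerase")])
    for ann in index.query("ACC0001", "gene1"):
        print(ann.source, ann.value)
    print(index.conflicts("ACC0001"))  # ['gene1']
```

Appending every source's value rather than overwriting is what allows alternative annotations to be displayed together; `conflicts` then surfaces exactly the features where consensus is missing.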

2021, Vol 149
Author(s): R. K. Sanayaima Singh, Md. Zubbair Malik, R. K. Brojen Singh

One of the main concerns about the fast-spreading coronavirus disease 2019 (Covid-19) pandemic is how to intervene. We analysed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) isolate data using a multifractal approach and found rich viral genome diversity, which could be one of the root causes of the rapid spread of Covid-19 and which is strongly affected by the pressure and health index of the regions the hosts inhabit. The calculated mutation rate (mr) reaches its maximum at a particular pressure, beyond which mr maintains diversity. The Hurst exponent and fractal dimension are optimal at a critical pressure (Pm), whereas for P > Pm and P < Pm we found rich genome diversity, relating to complicated genome organisation and virulence of the virus. The values of these complexity measures increase linearly with health index values.
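The Hurst exponent and fractal dimension mentioned above can be estimated from a genome once it is turned into a numeric series. As a hedged illustration only: the sketch below uses a purine/pyrimidine DNA-walk mapping and classic rescaled-range (R/S) analysis, a monofractal estimator swapped in for brevity. It is not the multifractal method of the study, and the mapping and window sizes are assumptions. For a self-affine series, the fractal dimension is commonly related to the Hurst exponent by D = 2 - H.

```python
import math
import random


def dna_walk_steps(seq):
    """Map nucleotides to +1 (purines A, G) or -1 (pyrimidines C, T); skip other symbols."""
    mapping = {"A": 1, "G": 1, "C": -1, "T": -1}
    return [mapping[base] for base in seq.upper() if base in mapping]


def rescaled_range(chunk):
    """R/S of one window: range of cumulative mean-adjusted sums divided by the std."""
    n = len(chunk)
    mean = sum(chunk) / n
    running, cumulative = 0.0, []
    for x in chunk:
        running += x - mean
        cumulative.append(running)
    r = max(cumulative) - min(cumulative)
    s = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    return r / s if s > 0 else None  # constant windows carry no R/S information


def hurst_rs(series, window_sizes):
    """Hurst exponent as the least-squares slope of log(mean R/S) against log(window size)."""
    xs, ys = [], []
    for n in window_sizes:
        values = []
        for start in range(0, len(series) - n + 1, n):  # non-overlapping windows
            rs = rescaled_range(series[start:start + n])
            if rs is not None:
                values.append(rs)
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
    if len(xs) < 2:
        raise ValueError("need at least two usable window sizes")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def fractal_dimension(hurst):
    """Fractal dimension of a self-affine trace, D = 2 - H."""
    return 2.0 - hurst


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(20000))  # uncorrelated toy sequence
    h = hurst_rs(dna_walk_steps(seq), [16, 32, 64, 128, 256, 512])
    print(f"H = {h:.2f}, D = {fractal_dimension(h):.2f}")
```

For an uncorrelated sequence, H should come out near 0.5 (R/S tends to overestimate slightly on short windows); values clearly above 0.5 indicate persistent, long-range correlated composition.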


2020
Author(s): Andrew J. Page, Nabil-Fareed Alikhan, Michael Strinden, Thanh Le Viet, Timofey Skvortsov

Spoligotyping of Mycobacterium tuberculosis provides a subspecies classification of this major human pathogen. Spoligotypes can be predicted from short-read genome sequencing data; however, no methods exist for long-read sequence data such as that from Nanopore or PacBio. We present a novel software package, Galru, which can rapidly detect the spoligotype of a Mycobacterium tuberculosis sample from as little as a single uncorrected long read. It allows near real-time spoligotyping from long-read data as it is being sequenced, giving rapid sample typing. We compare it with existing state-of-the-art software and find that it gives results identical to those obtained from short-read sequencing data. Galru is freely available from https://github.com/quadram-institute-bioscience/galru under the GPLv3 open source licence.
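Spoligotyping reduces to recording which of the 43 spacer sequences in the M. tuberculosis direct-repeat region are present, conventionally reported as a 15-digit octal code. The sketch below shows only that bookkeeping, under stated assumptions: it is not Galru's algorithm, it uses exact substring matching (which the sequencing errors in uncorrected long reads would often defeat), and the spacer strings are hypothetical placeholders rather than the real spacer sequences.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def detect_spacers(read, spacers):
    """Presence/absence of each spacer in a read, checking both strands.

    Exact matching is a simplifying assumption; a practical long-read tool
    needs error-tolerant matching.
    """
    fwd = read.upper()
    rev = reverse_complement(fwd)
    return [s.upper() in fwd or s.upper() in rev for s in spacers]


def octal_code(presence):
    """Convert 43 spacer presence flags to the 15-digit octal spoligotype code.

    Spacers 1-42 are read as 14 groups of three bits (one octal digit each),
    and spacer 43 becomes the final digit (0 or 1).
    """
    if len(presence) != 43:
        raise ValueError("expected presence/absence calls for 43 spacers")
    bits = "".join("1" if p else "0" for p in presence)
    digits = [str(int(bits[i:i + 3], 2)) for i in range(0, 42, 3)]
    digits.append(bits[42])
    return "".join(digits)


if __name__ == "__main__":
    # Hypothetical placeholder spacers, not the real DR-region spacer sequences
    toy_spacers = ["AAAACCCC", "GGGGTTTT", "ACGTACGT"]
    print(detect_spacers("CCAAAACCCCGG", toy_spacers))  # [True, True, False]
    print(octal_code([True] * 43))
```

The second toy spacer is found only on the reverse strand, which is why both orientations of the read are searched.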


2020, Vol 496 (2), pp. 2346-2361
Author(s): Berta Margalef-Bentabol, Marc Huertas-Company, Tom Charnock, Carla Margalef-Bentabol, Mariangela Bernardi, ...

With the advent of future big-data surveys, automated tools for unsupervised discovery are becoming ever more necessary. In this work, we explore the ability of deep generative networks to detect outliers in astronomical imaging data sets. The main advantage of such generative models is that they can learn complex representations directly from pixel space. These methods therefore let us look for subtle morphological deviations that are typically missed by more traditional moment-based approaches. We use a generative model to learn a representation of the expected data, as defined by the training set, and then look for deviations from that learned representation by searching for the best reconstruction of a given object. In this first proof-of-concept work, we apply our method to two different test cases. We first show that, from a set of simulated galaxies, we are able to detect ~90 per cent of merging galaxies if we train our network only on a sample of isolated ones. We then explore how the presented approach can be used to compare observations and hydrodynamic simulations by identifying observed galaxies that are not well represented in the models. The code used in this work is available at https://github.com/carlamb/astronomical-outliers-WGAN.
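The reconstruction-based outlier search described above can be illustrated with a much simpler model. This hedged sketch swaps principal component analysis (PCA) in for the deep generative network: the model is fitted only on "expected" objects (the analogue of isolated galaxies), and any object whose reconstruction error exceeds a high quantile of the training errors is flagged. Flattened vectors stand in for images, and the component count `k` and the 0.99 quantile are assumptions.

```python
import numpy as np


def fit_pca(train, k):
    """Mean and top-k principal axes of the training data (rows are objects)."""
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(x, mean, components):
    """Euclidean distance between each row and its projection onto the PCA subspace."""
    centered = x - mean
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)


def flag_outliers(train, test, k, quantile=0.99):
    """True where a test object reconstructs worse than `quantile` of training objects."""
    mean, components = fit_pca(train, k)
    threshold = np.quantile(reconstruction_error(train, mean, components), quantile)
    return reconstruction_error(test, mean, components) > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis = rng.normal(size=(3, 50))  # "expected" objects lie near a 3-D subspace

    def expected(n):
        return rng.normal(size=(n, 3)) @ basis + 0.01 * rng.normal(size=(n, 50))

    train = expected(500)
    test = np.vstack([expected(20), rng.normal(size=(20, 50))])  # last 20 are anomalies
    flags = flag_outliers(train, test, k=3)
    print("inliers flagged:", int(flags[:20].sum()), "anomalies flagged:", int(flags[20:].sum()))
```

A generative network plays the same role as PCA here but can capture non-linear structure in pixel space, which is what lets it pick up subtle morphological deviations.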


2019
Author(s): Joana Isidro, Susana Ferreira, Miguel Pinto, Fernanda Domingues, Mónica Oleastro, ...

Arcobacter butzleri is a food- and waterborne bacterium and an emerging human pathogen that frequently displays a multidrug-resistant character. Still, no comprehensive genome-scale comparative analysis has been performed so far, which has limited our knowledge of A. butzleri diversification and pathogenicity. Here, we performed a deep genome analysis of A. butzleri focused on decoding its core- and pan-genome diversity and the specific genetic traits underlying its pathogenic potential and diverse ecology. In total, 49 A. butzleri strains (collected from human, animal, food and environmental sources) were screened. A. butzleri (genome size 2.07-2.58 Mbp) revealed a large open pan-genome with 7474 genes (about 50% being singletons) and a small core-genome with 1165 genes. The core-genome is highly diverse (≥55% of the core genes presenting at least 40/49 alleles) and is enriched in genes associated with housekeeping functions. In contrast, the accessory genome presented a high proportion of loci with unknown function and was particularly overrepresented by genes associated with defence mechanisms. A. butzleri revealed a plastic virulome (including newly identified determinants), marked by the differential presence of multiple adaptation-related virulence factors, such as the urease cluster ureD(AB)CEFG (phenotypically confirmed), the hypervariable hemagglutinin-encoding hecA, a putative type I secretion system (T1SS) harboring another agglutinin potentially related to adherence, and a novel VirB/D4 T4SS likely linked to interbacterial competition and cytotoxicity. In addition, A. butzleri harbors a large repertoire of efflux pumps (EPs) (ten “core” and nine differentially present) and other antibiotic resistance determinants. We provide the first description of a genetic determinant of macrolide resistance in A. butzleri, by associating the inactivation of a TetR repressor (likely regulating an EP) with erythromycin resistance. Fluoroquinolone resistance correlated with the Thr-85-Ile substitution in GyrA, and ampicillin resistance was linked to an OXA-15-like β-lactamase. Remarkably, by decoding the polymorphism pattern of the porin- and adhesin-encoding main antigen PorA, this study strongly supports that this pathogen is able to exchange porA as a whole and/or its hypervariable epitope-encoding regions separately, leading to a multitude of chimeric PorA presentations that can impact pathogen-host interaction during infection. Ultimately, our unprecedented screening of short sequence repeats detected potential phase-variable genes related to adaptation and host/environment interaction, such as those involved in lipopolysaccharide modification and motility/chemotaxis, suggesting that phase variation likely modulates key adaptive functions of A. butzleri. In summary, this study constitutes a turning point in A. butzleri comparative genomics, revealing that this human gastrointestinal pathogen is equipped with vast virulence and antibiotic resistance arsenals which, coupled with its remarkable core- and pan-genome diversity, open a multitude of phenotypic fingerprints for environmental/host adaptation and pathogenicity.

IMPACT STATEMENT
Diarrhoeal diseases are the most common cause of human illness caused by foodborne hazards, but the surveillance of diarrhoeal diseases is biased towards the most commonly searched infectious agents (namely Campylobacter jejuni and C. coli). In fact, other less-studied pathogens are frequently found as the etiological agent when refined non-selective culture conditions are applied. A hallmark example is the diarrhoea-causing Arcobacter butzleri which, despite also being associated with extra-intestinal diseases, such as bacteremia in humans and mastitis in animals, and despite displaying high rates of antibiotic resistance, has not yet been thoroughly investigated regarding its epidemiology, diversity and pathogenicity. To overcome the general lack of knowledge on A. butzleri comparative genomics, we provide the first comprehensive genome-scale analysis of A. butzleri focused on exploring the intraspecies virulome content and diversity, resistance determinants, and how this pathogen shapes its genome towards ecological adaptation and host invasion. The rampant diversity and plasticity of A. butzleri revealed here reinforce the pathogenic potential of this food- and waterborne hazard, while opening multiple research lines that will certainly contribute to the future development of more robust species-oriented diagnostics and molecular surveillance of A. butzleri.

DATA SUMMARY
A. butzleri raw sequence reads generated in the present study were deposited in the European Nucleotide Archive (ENA) (BioProject PRJEB34441). The assembled contigs (.fasta and .gbk files), the nucleotide sequences of the predicted transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA) (.ffn files) and the respective amino acid sequences of the translated CDS sequences (.faa files) are available at http://doi.org/10.5281/zenodo.3434222. Detailed ENA accession numbers, as well as the draft genome statistics, are described in Table S1.
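The core, accessory and singleton categories reported above are derived from a gene presence/absence matrix across genomes. A minimal sketch with a toy matrix (not the study's data): genes present in every genome are core, genes present in exactly one genome are singletons, and the remaining accessory genes are labeled "shell" here. The strict 100% core threshold is an assumption; many pipelines use a slightly relaxed cut-off, exposed as the hypothetical `core_fraction` parameter.

```python
from collections import Counter


def classify_pangenome(presence, n_genomes, core_fraction=1.0):
    """Assign each gene to 'core', 'shell' or 'singleton'.

    presence maps gene ID -> set of genome IDs carrying that gene.
    """
    categories = {}
    for gene, genomes in presence.items():
        carriers = len(genomes)
        if carriers >= core_fraction * n_genomes:
            categories[gene] = "core"
        elif carriers == 1:
            categories[gene] = "singleton"
        else:
            categories[gene] = "shell"  # accessory, but shared by 2+ genomes
    return categories


def summarize(categories):
    """Pan-genome size plus the number of genes in each category."""
    counts = Counter(categories.values())
    return {"pan": len(categories), **{k: counts.get(k, 0) for k in ("core", "shell", "singleton")}}


if __name__ == "__main__":
    toy = {  # hypothetical genes across four hypothetical genomes
        "geneA": {"G1", "G2", "G3", "G4"},
        "geneB": {"G1", "G2"},
        "geneC": {"G3"},
        "geneD": {"G1", "G2", "G3", "G4"},
        "geneE": {"G4"},
    }
    print(summarize(classify_pangenome(toy, n_genomes=4)))
    # {'pan': 5, 'core': 2, 'shell': 1, 'singleton': 2}
```

In these terms, the study's figures correspond to a pan size of 7474 with 1165 core genes and about half of all genes in the singleton category.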


Author(s): Yixiong Chen, Yang Yang, Zhanyao Lei, Mingyuan Xia, Zhengwei Qi

Modern RESTful services expose RESTful APIs to integrate with diversified applications. Most RESTful API parameters are weakly typed, which greatly increases the possible input value space. This makes it difficult for automated testing tools to generate effective test cases that reveal web service defects related to parameter validation. We call this phenomenon the type collapse problem. To remedy this problem, we introduce FET (Format-encoded Type) techniques, including the FET, the FET lattice, and FET inference, to model fine-grained information for API parameters. Enhanced by FET techniques, automated testing tools can generate targeted test cases. We demonstrate Leif, a trace-driven fuzzing tool, as a proof-of-concept implementation of FET techniques. Experimental results on 27 commercial services show that FET inference precisely captures documented parameter definitions, which helps Leif discover 11 new bugs and reduce fuzzing time by 72% to 86% compared with state-of-the-art fuzzers.
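FET inference can be pictured as mapping each observed parameter value to the most specific format it matches and then combining those formats with the lattice join (least upper bound). The lattice, format names and regular expressions below are hypothetical and far smaller than anything the paper describes; this is a sketch of the idea, not Leif's implementation.

```python
import re

# Hypothetical format lattice: each format points to its parent (more general) format.
PARENT = {
    "integer": "number",
    "decimal": "number",
    "number": "string",
    "uuid": "string",
    "date": "string",
    "email": "string",
}
TOP = "string"

# Leaf formats checked against observed values, most specific first (hypothetical regexes).
PATTERNS = [
    ("integer", re.compile(r"-?\d+")),
    ("decimal", re.compile(r"-?\d+\.\d+")),
    ("uuid", re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("email", re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")),
]


def classify(value):
    """Most specific format whose pattern matches the whole value."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(value):
            return name
    return TOP


def ancestors(fmt):
    """The format itself followed by every more general format up to TOP."""
    chain = [fmt]
    while fmt in PARENT:
        fmt = PARENT[fmt]
        chain.append(fmt)
    return chain


def join(a, b):
    """Least upper bound of two formats; None acts as the bottom element."""
    if a is None:
        return b
    if b is None:
        return a
    above_a = set(ancestors(a))
    for fmt in ancestors(b):
        if fmt in above_a:
            return fmt
    return TOP


def infer(values):
    """Fold observed values into the tightest format that covers all of them."""
    result = None
    for value in values:
        result = join(result, classify(value))
    return result


if __name__ == "__main__":
    print(infer(["7", "42"]))    # integer
    print(infer(["7", "3.14"]))  # number
    print(infer(["7", "abc"]))   # string
```

A fuzzer could then draw test inputs from the inferred format (valid integers, boundary values, near-miss strings) rather than from the collapsed "string" type.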


2021, Vol 288 (1961)
Author(s): Anna Brüniche-Olsen, Kenneth F. Kellner, Jerrold L. Belant, J. Andrew DeWoody

More than 25% of species assessed by the International Union for Conservation of Nature (IUCN) are threatened with extinction. Understanding how environmental and biological processes have shaped genomic diversity may inform management practices. Using 68 extant avian species, we parsed the effects of habitat availability and life-history traits on genomic diversity over time to provide a baseline for conservation efforts. We used published whole-genome sequence data to estimate overall genomic diversity as indicated by historical long-term effective population sizes (Ne) and current genomic variability (H), then used environmental niche modelling to estimate Pleistocene habitat dynamics for each species. We found that Ne and H were positively correlated with habitat availability and related to key life-history traits (body mass and diet), suggesting the latter contribute to the overall genomic variation. We found that H decreased with increasing species extinction risk, suggesting that H may serve as a leading indicator of demographic trends related to formal IUCN conservation status in birds. Our analyses illustrate that genome-wide summary statistics estimated from sequence data reflect meaningful ecological attributes relevant to species conservation.
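The two diversity measures named above can be related with a little arithmetic. As a simplified, hedged illustration (not the estimation procedure of the study, which the abstract does not detail): genome-wide heterozygosity H is the fraction of called sites that are heterozygous, and under the standard neutral model for a diploid, θ = 4Neμ; taking H as a rough estimate of θ gives Ne ≈ H / (4μ). The mutation rate μ and the genotype encoding are assumptions.

```python
def heterozygosity(genotypes):
    """Fraction of called sites that are heterozygous.

    Each genotype is a pair of allele codes, e.g. (0, 1); None marks a site
    with no call and is excluded from the denominator.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called sites")
    return sum(1 for a, b in called if a != b) / len(called)


def effective_population_size(theta, mu):
    """Ne from theta = 4 * Ne * mu (diploid, standard neutral model)."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return theta / (4 * mu)


if __name__ == "__main__":
    # Toy genotypes: 2 heterozygous sites among 1,000 called sites, plus 1 uncalled site
    calls = [(0, 1), (1, 0)] + [(0, 0)] * 998 + [None]
    h = heterozygosity(calls)
    mu = 2.5e-9  # hypothetical per-site, per-generation mutation rate
    print(h, round(effective_population_size(h, mu)))  # 0.002 200000
```

Because Ne scales inversely with μ, an uncertain mutation rate shifts all Ne estimates by the same factor, which is one reason H is also reported directly.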


Author(s): David F. Thurston

Heavy haul rail transport in Canada has not advanced as far as in other countries. This gives these railways a chance to leap over intermediate technology in train control, asset management and levels of RAMS (reliability, availability, maintainability and safety) not offered in previous designs. Train control is part of a System of Systems (SoS) that provides the basis of safety for operations, while allowing other non-vital systems to be implemented more cheaply and quickly than systems currently being modified to perform these tasks. This paper performs several tasks. First, it will update the reader on work being performed under the auspices of the Railway Association of Canada for Enhanced Train Control, and then it will describe how train control will interact with other systems to provide the overall functionality for safe train movement. Finally, the base requirements for a train control system that meets the requirements for Enhanced Train Control will be described. The initial concepts and designs will be presented to show how the requirements will be met. In addition, the pilot project that is underway for proof of concept and operational readiness will be detailed. Test cases will be illustrated, and portions of an Operational Concept, a Requirements Analysis and Form, Fit and Function (F3) specifications will be documented. All of this will be the stepping stone for other systems to lead in advanced functionality and operation on the Canadian railways of the future.


2019, Vol 10 (1), pp. 37-42
Author(s): Mareike Busche, Boas Pucker, Prisca Viehöver, Bernd Weisshaar, Ralf Stracke

Different Musa species, subspecies, and cultivars are currently being investigated to reveal their genomic diversity. Here, we compare the genome sequence of one of the most commercially important cultivars, Musa acuminata Dwarf Cavendish, against the Pahang reference genome assembly. Numerous small sequence variants were detected, and the ploidy of the cultivar presented here was determined to be triploid based on sequence variant frequencies. Illumina sequence data also revealed a duplication of a large segment on the long arm of chromosome 2 in the Dwarf Cavendish genome. Comparison against previously sequenced cultivars provided evidence that this duplication is unique to Dwarf Cavendish. Although no functional relevance of this duplication was identified, this example shows the potential of plants to tolerate such aneuploidies.
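The ploidy call from variant frequencies rests on a simple expectation: at heterozygous sites, the fraction of reads carrying the alternative allele clusters near k/p for ploidy p (about 1/2 for a diploid; about 1/3 and 2/3 for a triploid). The sketch below scores each candidate ploidy by how close the observed fractions fall to its expected peaks. It is a hedged illustration, not the authors' procedure; the candidate set, the 0.1 to 0.9 filter for heterozygous sites, and the read counts are assumptions. One limitation is built in: the peaks of a higher ploidy can include those of a lower one (2/4 = 1/2), so ties are resolved toward the lower ploidy, and a real analysis should inspect the whole frequency distribution.

```python
def expected_peaks(ploidy):
    """Alternative-allele fractions expected at heterozygous sites: k/ploidy."""
    return [k / ploidy for k in range(1, ploidy)]


def ploidy_score(fractions, ploidy):
    """Mean squared distance from each observed fraction to its nearest expected peak."""
    peaks = expected_peaks(ploidy)
    return sum(min((f - p) ** 2 for p in peaks) for f in fractions) / len(fractions)


def infer_ploidy(alt_counts, depths, candidates=(2, 3, 4), min_frac=0.1, max_frac=0.9):
    """Pick the candidate ploidy whose peaks best fit the heterozygous-site fractions."""
    fractions = [
        alt / depth
        for alt, depth in zip(alt_counts, depths)
        if depth > 0 and min_frac < alt / depth < max_frac  # drop likely homozygous sites
    ]
    if not fractions:
        raise ValueError("no heterozygous sites passed the filter")
    # Ties (e.g. diploid vs tetraploid sharing the 1/2 peak) go to the lower ploidy
    return min(candidates, key=lambda p: (ploidy_score(fractions, p), p))


if __name__ == "__main__":
    alt = [10, 20, 11, 19, 9, 21, 30]
    depth = [30] * 7  # the last site is homozygous and is filtered out
    print(infer_ploidy(alt, depth))  # 3
```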


Author(s): Haige Han, Kenneth Bryan, Wunierfu Shiraigol, Dongyi Bai, Yiping Zhao, ...

The Mongolian horse is one of the oldest extant horse populations and, although domesticated, most animals are free-ranging and experience minimal human intervention. As an ancient population originating in one of the key domestication centers, the Mongolian horse may play a key role in understanding the origins and recent evolutionary history of horses. Here we describe an analysis of high-density genome-wide single-nucleotide polymorphism (SNP) data in 40 globally dispersed horse populations (n = 895). In particular, we have focused on new results from Chinese Mongolian horses (n = 100) that represent 5 distinct populations. These animals were genotyped for 670K SNPs, and the data were analyzed in conjunction with 35K SNP data for 35 distinct breeds. Analyses of these integrated SNP data sets demonstrated that the Chinese Mongolian populations were genetically distinct from other modern horse populations. In addition, compared to other domestic horse breeds, the Chinese Mongolian horse populations exhibited relatively high genomic diversity. These results suggest that, in genetic terms, extant Chinese Mongolian horses may be the modern populations most similar to the animals originally domesticated in this region of Asia. Chinese Mongolian horse populations may therefore retain ancestral genetic variants from the earliest domesticates. Further genomic characterization of these populations in conjunction with archaeogenetic sequence data should be prioritized for understanding recent horse evolution and the domestication process that has led to the wealth of diversity observed in modern global horse breeds.
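A common way to compare genomic diversity across populations from SNP genotypes is mean expected heterozygosity, He = 2p(1 - p) averaged over SNPs, where p is the alternative-allele frequency in a population, alongside observed heterozygosity. The abstract does not state which diversity statistics were used, so this is a hedged sketch: genotypes are assumed to be coded as alternative-allele dosages (0, 1, 2) with None for missing calls. Because the populations were genotyped on different panels (670K and 35K SNPs), a comparison would presumably be restricted to SNPs shared by both panels; that step is assumed rather than shown.

```python
def allele_frequency(dosages):
    """Alternative-allele frequency from 0/1/2 dosages, ignoring missing calls."""
    called = [d for d in dosages if d is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def expected_heterozygosity(genotype_matrix):
    """Mean 2p(1 - p) over SNPs; each row holds one SNP's dosages for all animals."""
    values = []
    for row in genotype_matrix:
        p = allele_frequency(row)
        if p is not None:
            values.append(2 * p * (1 - p))
    if not values:
        raise ValueError("no SNP has any called genotype")
    return sum(values) / len(values)


def observed_heterozygosity(genotype_matrix):
    """Fraction of called genotypes that are heterozygous (dosage 1), pooled over SNPs."""
    called = [d for row in genotype_matrix for d in row if d is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for d in called if d == 1) / len(called)


if __name__ == "__main__":
    # Toy population: 2 SNPs x 4 animals (hypothetical data)
    genotypes = [[0, 1, 2, 1], [0, 0, None, 0]]
    print(expected_heterozygosity(genotypes), observed_heterozygosity(genotypes))
```

Computing He per population on the same SNP set makes the "relatively high genomic diversity" comparison between breeds a like-for-like one.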


2020, Vol 10 (1)
Author(s): Eszter Kaszab, György Lengyel, Szilvia Marton, Ádám Dán, Krisztián Bányai, ...

Circoviruses, cycloviruses and other circular, replication-associated protein-encoding single-stranded (CRESS) DNA viruses have been detected in a variety of animal taxa. In this study, cloacal swab samples (n = 90) from 31 wild bird species living at various aquatic sites in Hungary were examined for CRESS DNA viruses to identify possible reservoirs of viruses pathogenic to domestic poultry. A total of 30 (33.3%) specimens tested positive with a pan-CRESS DNA virus-specific PCR. Goose circovirus (GoCV), Duck associated cyclovirus 1 (DuACyV-1) and Garrulus glandarius associated circular virus 1 (GgaCV-1) were detected in nine, three and two different bird species, respectively. Selected specimens were subjected to whole-genome sequencing. The obtained sequence data revealed conserved gene structure within the identified virus species and detected homologous (within GoCV) and possible heterologous recombination (within DuACyV-1) events. Results presented here provide new information on the genomic diversity and evolution of selected CRESS DNA viruses.

