The Nature and Extent of Plasmid Variation in Chlamydia trachomatis

Chlamydia trachomatis is an obligate intracellular pathogen of humans, causing both the sexually transmitted infection chlamydia and trachoma, the most common cause of infectious blindness. The majority of sequenced C. trachomatis clinical isolates carry a 7.5-kb plasmid, and it is becoming increasingly evident that this plasmid is a key determinant of pathogenicity. The discovery of the Swedish New Variant and the more recent Finnish variant highlight the importance of understanding the natural extent of variation in the plasmid. In this study we analysed 524 plasmid sequences from publicly available whole-genome sequence data. Single nucleotide polymorphisms (SNPs) in each of the eight coding sequences (CDS) were identified and analysed. Of the 7550 base positions in the plasmid, 224 carried a SNP, which equates to a SNP rate of 2.97%, nearly three times the previously calculated rate. After normalising for CDS size (i.e., number of SNPs per total number of nucleotides), CDS8 had the highest SNP rate at 3.97%, whilst CDS6 had the lowest at 1.94%. CDS5 had the highest total number of SNPs across the 524 sequences analysed (2267 SNPs), whereas CDS6 had the fewest, with only 85. Calculation of the genetic distances identified CDS6 as the least variable gene at the nucleotide level (d = 0.001) and CDS5 as the most variable (d = 0.007); however, at the amino acid level CDS2 was the least variable (d = 0.001), whilst CDS5 remained the most variable (d = 0.013). This study describes the largest in-depth analysis of the C. trachomatis plasmid to date, using plasmid sequence data mined from whole-genome sequences spanning 50 years and a worldwide distribution, and provides insights into the nature and extent of existing variation within the plasmid as well as guidance for the design of future diagnostic assays.
This is crucial at a time when single-target diagnostic assays are failing to detect natural mutants, putting those infected at risk of a serious long-term and life-changing illness.
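The SNP rate quoted above is the number of variable positions divided by the plasmid length (224 / 7550). The abstract does not state which distance model produced the d values, so the sketch below assumes an uncorrected p-distance averaged over all sequence pairs. The function names are hypothetical and the toy alignment is illustrative, not study data.

```python
from itertools import combinations


def snp_rate(n_variable_sites: int, length_bp: int) -> float:
    """Percentage of positions carrying at least one SNP."""
    return 100.0 * n_variable_sites / length_bp


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped sites that differ between two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gap positions (pairwise deletion)
        compared += 1
        diffs += a != b
    return diffs / compared if compared else 0.0


def mean_pairwise_distance(seqs: list[str]) -> float:
    """Average p-distance over all sequence pairs (assumed summary of per-CDS variability)."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

For example, `round(snp_rate(224, 7550), 2)` gives 2.97, matching the reported rate. An amino acid distance would apply the same `p_distance` to translated CDS alignments.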


148 Multiple Dysregulated Novel Pathways and Genes in Aleutian Mink Disease Revealed by Selection Signatures and Gene Network Analyses Using Whole-genome Sequence Data

Journal of Animal Science ◽

10.1093/jas/skab235.137 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 76-76

Author(s):

Seyed Milad Vahedi ◽

Karim Karimi ◽

Siavash Salek Ardestani ◽

Younes Miar

Keyword(s):

Sequence Data ◽

American Mink ◽

Enrichment Analysis ◽

Whole Genome Sequence ◽

Fixation Index ◽

Pathway Enrichment Analysis ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Network Analyses ◽

Genome Level

Abstract Aleutian disease (AD) is a chronic persistent infection in domestic mink caused by Aleutian mink disease virus (AMDV). Reduced female fertility and poorer pelt quality are the main reasons for AD's negative economic impact on the mink industry. A total of 79 American mink from the Canadian Center for Fur Animal Research at Dalhousie University (Truro, NS, Canada) were classified into positive (n = 48) and negative (n = 31) groups based on the results of counter immunoelectrophoresis (CIEP) tests. Whole-genome sequences comprising 4,176 scaffolds and 8,039,737 single nucleotide polymorphisms (SNPs) were used to trace selection footprints of the response to AMDV infection at the genome level. Window-based fixation index (Fst) and nucleotide diversity (θπ) statistics were estimated to compare the genomes of positive and negative animals. Genomic windows falling in the top 1% for both statistics were considered potential regions under selection pressure. A total of 98 genomic regions harboring 33 candidate genes were detected as selective signals. Most of the identified genes are involved in the development and function of the immune system (PPP3CA, SMAP2, TNFRSF21, SKIL, and AKIRIN2), musculoskeletal system (COL9A2, PPP1R9A, ANK2, AKAP9, and STRIT1), nervous system (ASCL1, ZFP69B, SLC25A27, MCF2, and SLC7A14), reproductive system (CAMK2D, GJB7, SSMEM1, and C6orf163), liver (PAH and DPYD), and lung (SLC35A1). Gene-expression network analysis showed interactions among 27 of the identified genes. Moreover, pathway enrichment analysis of the constructed gene network revealed significant oxytocin (KEGG: hsa04921) and GnRH signaling (KEGG: hsa04912) pathways, which are likely to be impaired by AMDV, leading to reduced fecundity in dams. These results provide a perspective on the genetic architecture of the response to AD in American mink and novel insight into the pathogenesis of AMDV.
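The abstract does not name the estimators used. As a minimal sketch, assuming biallelic SNPs summarised as alternate-allele frequencies in each group, the code below computes a window-level Hudson Fst (ratio of summed numerators to summed denominators) and per-site θπ for each group, ranks windows by Fst and by log2(θπ negative / θπ positive), and keeps windows in the top fraction of both rankings. The window size, the direction of the θπ ratio and all function names are assumptions for illustration.

```python
import math
from collections import defaultdict


def window_stats(snps, n_pos, n_neg, window):
    """Per-window (Hudson Fst, log2 theta-pi ratio negative/positive).

    snps: iterable of (position, alt_freq_positive, alt_freq_negative)
    n_pos, n_neg: sampled alleles per group (2 x animals for diploids)
    """
    # accumulators per window: Fst numerator, Fst denominator, pi_pos, pi_neg
    acc = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for pos, p1, p2 in snps:
        w = acc[pos // window]
        w[0] += (p1 - p2) ** 2 - p1 * (1 - p1) / (n_pos - 1) - p2 * (1 - p2) / (n_neg - 1)
        w[1] += p1 * (1 - p2) + p2 * (1 - p1)
        # unbiased per-site nucleotide diversity (expected heterozygosity)
        w[2] += 2 * p1 * (1 - p1) * n_pos / (n_pos - 1)
        w[3] += 2 * p2 * (1 - p2) * n_neg / (n_neg - 1)
    stats = {}
    for idx, (num, den, pi_pos, pi_neg) in acc.items():
        fst = num / den if den > 0 else 0.0
        # theta-pi per bp; a small pseudo-count avoids division by zero
        ratio = math.log2((pi_neg / window + 1e-9) / (pi_pos / window + 1e-9))
        stats[idx] = (fst, ratio)
    return stats


def top_windows(stats, fraction=0.01):
    """Windows ranked in the top `fraction` for both Fst and the theta-pi ratio."""
    k = max(1, math.ceil(fraction * len(stats)))
    by_fst = sorted(stats, key=lambda i: stats[i][0], reverse=True)[:k]
    by_pi = sorted(stats, key=lambda i: stats[i][1], reverse=True)[:k]
    return sorted(set(by_fst) & set(by_pi))
```

With 48 positive and 31 negative diploid mink, the allele counts would be `n_pos = 96` and `n_neg = 62`. Taking the intersection mirrors the abstract's "overlapped top 1%" criterion.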


Aquila: diploid personal genome assembly and comprehensive variant detection based on linked reads

10.1101/660605 ◽

2019 ◽

Cited By ~ 1

Author(s):

Xin Zhou ◽

Lu Zhang ◽

Ziming Weng ◽

David L. Dill ◽

Arend Sidow

Keyword(s):

Genetic Variation ◽

Genome Sequence ◽

Genome Assembly ◽

Sequence Data ◽

Association Studies ◽

Cost Effective ◽

Whole Genome Sequence ◽

Personal Genome ◽

Whole Genome ◽

Nucleotide Polymorphisms

Abstract Variant discovery in personal, whole-genome sequence data is critical for uncovering the genetic contributions to health and disease. We introduce a new approach, Aquila, that uses linked-read data for generating a high quality diploid genome assembly, from which it then comprehensively detects and phases personal genetic variation. Assemblies cover >95% of the human reference genome, with over 98% in a diploid state. Thus, the assemblies support detection and accurate genotyping of the most prevalent types of human genetic variation, including single nucleotide polymorphisms (SNPs), small insertions and deletions (small indels), and structural variants (SVs), in all but the most difficult regions. All heterozygous variants are phased in blocks that can approach arm-level length. The final output of Aquila is a diploid and phased personal genome sequence, and a phased VCF file that also contains homozygous and a few unphased heterozygous variants. Aquila represents a cost-effective evolution of whole-genome reconstruction that can be applied to cohorts for variation discovery or association studies, or to single individuals with rare phenotypes that could be caused by SVs or compound heterozygosity.
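In VCF, a `|` separator in the GT field marks a phased genotype and `/` an unphased one. The sketch below is not part of Aquila; it assumes a single-sample VCF with GT among the FORMAT keys, and its function names are hypothetical. It tallies the categories of calls that the output file described above would contain.

```python
def classify_genotype(gt: str) -> str:
    """Classify a VCF GT value as 'hom', 'het_phased', 'het_unphased' or 'missing'."""
    phased = "|" in gt
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) == 1:
        return "hom"
    return "het_phased" if phased else "het_unphased"


def summarise(vcf_lines):
    """Count genotype categories in the first sample column of a VCF."""
    counts = {"hom": 0, "het_phased": 0, "het_unphased": 0, "missing": 0}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")
        sample = dict(zip(fmt_keys, fields[9].split(":")))
        counts[classify_genotype(sample["GT"])] += 1
    return counts
```

A genotype such as `0|1` is counted as phased heterozygous, `0/1` as unphased heterozygous and `1/1` as homozygous.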


Single-Nucleotide Polymorphisms in the Whole-Genome Sequence Data of Shiga Toxin-Producing Escherichia coli O157:H7/H- Strains by Cultivation

Current Microbiology ◽

10.1007/s00284-017-1208-z ◽

2017 ◽

Vol 74 (4) ◽

pp. 425-430 ◽

Cited By ~ 3

Author(s):

Eiji Yokoyama ◽

Shinichiro Hirai ◽

Taichiro Ishige ◽

Satoshi Murakami

Keyword(s):

Escherichia Coli ◽

Single Nucleotide Polymorphisms ◽

Genome Sequence ◽

Shiga Toxin ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Escherichia Coli O157 ◽

Nucleotide Polymorphisms ◽

Single Nucleotide


Optimizing Sequencing Resources in Genotyped Livestock Populations Using Linear Programming

Frontiers in Genetics ◽

10.3389/fgene.2021.740340 ◽

2021 ◽

Vol 12 ◽

Author(s):

Hao Cheng ◽

Keyu Xu ◽

Jinghui Li ◽

Kuruvilla Joseph Abraham

Keyword(s):

Linear Programming ◽

Genome Sequence ◽

Sequence Data ◽

Low Cost ◽

Whole Genome Sequence ◽

Full Potential ◽

Whole Genome ◽

Efficient Allocation ◽

Nucleotide Polymorphisms ◽

Genome Sequence Data

Low-cost genome-wide single-nucleotide polymorphisms (SNPs) are routinely used in animal breeding programs. Compared to SNP arrays, whole-genome sequence data generated by next-generation sequencing (NGS) technologies have great potential in livestock populations. However, sequencing enough animals to exploit the full potential of whole-genome sequence data is not feasible. Thus, novel strategies are required for allocating sequencing resources in genotyped livestock populations such that the entire population can be imputed, maximizing the efficiency of whole-genome sequencing budgets. We present two applications of linear programming for the efficient allocation of sequencing resources. The first application identifies the minimum number of animals to sequence subject to the criterion that each haplotype in the population is contained in at least one of the animals selected for sequencing. The second application selects animals whose haplotypes include the largest possible proportion of the common haplotypes present in the population, assuming a limited sequencing budget. Both applications are available in an open-source program, LPChoose. In both applications, LPChoose performs similarly to or better than some other methods, suggesting that linear programming methods offer great potential for the efficient allocation of sequencing resources. The utility of these methods can be increased through the development of improved heuristics.
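LPChoose solves these problems with linear programming. The first application is an instance of set cover; the sketch below swaps in the classic greedy approximation instead of an LP solver to illustrate the idea: repeatedly pick the animal carrying the most haplotypes not yet covered. Haplotype labels (for example "block:haplotype" strings) and the function name are hypothetical.

```python
def greedy_min_cover(animal_haplotypes: dict[str, set[str]]) -> list[str]:
    """Greedy set cover: choose animals until every haplotype is carried by one of them."""
    uncovered = set().union(*animal_haplotypes.values())
    chosen: list[str] = []
    while uncovered:
        # animal carrying the most still-uncovered haplotypes (ties: alphabetical)
        best = max(sorted(animal_haplotypes),
                   key=lambda a: len(animal_haplotypes[a] & uncovered))
        chosen.append(best)
        uncovered -= animal_haplotypes[best]
    return chosen
```

Greedy selection is not guaranteed to find the true minimum, which is one reason an exact or LP-based formulation can be preferable.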


Whole-genome sequencing reveals insights into the adaptation of French Charolais cattle to Cuban tropical conditions

Genetics Selection Evolution ◽

10.1186/s12711-020-00597-9 ◽

2021 ◽

Vol 53 (1) ◽

Author(s):

Lino C. Ramírez-Ayala ◽

Dominique Rocha ◽

Sebas E. Ramos-Onsins ◽

Jordi Leno-Colorado ◽

Mathieu Charles ◽

...

Keyword(s):

Sequence Data ◽

Bos Indicus ◽

Whole Genome Sequence ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Extended Haplotype ◽

Functional Impact ◽

Phenotypic Differences ◽

Tropical Conditions ◽

Charolais Cattle

Abstract Background In the early 20th century, Cuban farmers imported Charolais cattle (CHFR) directly from France. These animals are now known as Chacuba (CHCU) and have become adapted to the harsh tropical environmental conditions in Cuba. These conditions include long periods of drought and food shortage with extreme temperatures that European taurine cattle have difficulty coping with. Results In this study, we used whole-genome sequence data from 12 CHCU individuals together with 60 whole-genome sequences from six additional taurine, indicine and crossbred breeds to estimate the genetic diversity and population structure of the CHCU animals and to accurately determine their ancestral origin. Although CHCU animals are assumed to form a closed population, the results of our admixture analysis indicate a limited introgression of Bos indicus. We used the extended haplotype homozygosity (EHH) approach to identify regions in the genome that may have had an important role in the adaptation of CHCU to tropical conditions. Putative selection events occurred in genomic regions with a high proportion of Bos indicus ancestry, but Bos indicus introgression alone was not sufficient to explain the adaptation of CHCU to tropical conditions. EHH suggested signals of potential adaptation in genomic windows that include genes of taurine origin involved in thermogenesis (ATP9A, GABBR1, PGR, PTPN1 and UCP1) and hair development (CCHCR1 and CDSN). Within these genes, we identified single nucleotide polymorphisms (SNPs) that may have a functional impact and contribute to some of the observed phenotypic differences between CHCU and CHFR animals. Conclusions Whole-genome data confirm that CHCU cattle are closely related to Charolais from France (CHFR) and Canada, but also reveal a limited introgression of Bos indicus genes in CHCU. We observed possible signals of recent adaptation to tropical conditions between CHCU and CHFR founder populations, which were largely independent of the Bos indicus introgression.
Finally, we report candidate genes and variants that may have a functional impact and explain some of the phenotypic differences observed between CHCU and CHFR cattle.
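EHH, as commonly defined, is the probability that two randomly chosen chromosomes carrying a core allele are identical over the whole interval from the core to a given marker. A minimal sketch, assuming phased haplotypes encoded as strings of 0/1 alleles and extension in one direction only (published analyses typically extend in both directions and use dedicated tools); the function name is hypothetical:

```python
from collections import Counter
from math import comb


def ehh(haplotypes: list[str], core: int, core_allele: str, distance: int) -> float:
    """EHH for carriers of `core_allele`, extending `distance` markers right of the core."""
    carriers = [h for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # undefined with fewer than two carrier chromosomes
    end = core + distance + 1
    # group carriers by their extended haplotype from the core to the marker
    counts = Counter(h[core:end] for h in carriers)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)
```

EHH starts at 1 at the core and decays as recombination breaks up haplotypes; a slow decay around one allele is the signal of possible recent selection.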


Utilizing Big Data to Identify Tiny Toxic Components: Digitalis

Foods ◽

10.3390/foods10081794 ◽

2021 ◽

Vol 10 (8) ◽

pp. 1794

Author(s):

Elizabeth Sage Hunter ◽

Robert Literman ◽

Sara M. Handy

Keyword(s):

Single Nucleotide Polymorphisms ◽

Dietary Supplements ◽

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Sequence Data ◽

Genus Level

The botanical genus Digitalis is equal parts colorful, toxic, and medicinal, and its bioactive compounds have a long history of therapeutic use. However, because of its extremely narrow therapeutic range, even trace amounts of Digitalis can cause adverse effects. Using chemical methods, the United States Food and Drug Administration traced a 1997 case of Digitalis toxicity to a shipment of Plantago (a common ingredient in dietary supplements marketed to improve digestion) contaminated with Digitalis lanata. With increased accessibility to next-generation sequencing technology, here we ask whether this case could have been solved rapidly using shallow genome sequencing strategies (e.g., genome skims). Using a modified implementation of the Site Identification from Short Read Sequences (SISRS) bioinformatics pipeline with whole-genome sequence data, we generated over 2 million genus-level single nucleotide polymorphisms in addition to species-informative single nucleotide polymorphisms. We simulated dietary supplement contamination by spiking low quantities (0–10%) of Digitalis whole-genome sequence data into a background of commonly used ingredients in products marketed for “digestive cleansing” and reliably detected Digitalis at the genus level while also discriminating between Digitalis species. This work serves as a roadmap for the development of novel DNA-based assays to quickly and reliably detect the presence of toxic species such as Digitalis in food products or dietary supplements using genomic methods, and highlights the power of harnessing the entire genome to identify botanical species.
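The spiking experiment can be illustrated with a toy simulation. SISRS works with SNPs identified from short reads; the sketch below instead uses exact k-mer matching as a simplified stand-in, treating k-mers present in the target genome but absent from the background as diagnostic and reporting the fraction of reads that carry one. The synthetic sequences, read length, k and function names are all hypothetical.

```python
import random


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_reads(genome: str, n: int, read_len: int, rng: random.Random) -> list[str]:
    """Draw n error-free reads at uniformly random start positions."""
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n)]
    return [genome[s:s + read_len] for s in starts]


def spike_and_detect(target, background, fraction, n_reads=200, read_len=100, k=21, seed=0):
    """Fraction of reads in a spiked mixture that carry a target-only k-mer."""
    rng = random.Random(seed)
    diagnostic = kmers(target, k) - kmers(background, k)  # k-mers unique to the target
    n_target = round(fraction * n_reads)
    reads = (sample_reads(target, n_target, read_len, rng)
             + sample_reads(background, n_reads - n_target, read_len, rng))
    hits = sum(1 for r in reads if kmers(r, k) & diagnostic)
    return hits / n_reads
```

Because every diagnostic k-mer is absent from the background by construction, an unspiked sample always returns 0; real data would add sequencing errors and shared sequence that this sketch ignores.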


Optimizing Sequencing Resources in Genotyped Livestock Populations Using Linear Programming

10.1101/2020.06.29.179093 ◽

2020 ◽

Author(s):

Hao Cheng ◽

Keyu Xu ◽

Kuruvilla Joseph Abraham

Keyword(s):

Linear Programming ◽

Genome Sequence ◽

Sequence Data ◽

Low Cost ◽

Fixed Number ◽

Whole Genome Sequence ◽

Full Potential ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Genome Sequence Data

Abstract Background Low-cost genome-wide single-nucleotide polymorphisms (SNPs) are routinely used in animal breeding programs. Compared to SNP arrays, the use of whole-genome sequence data generated by the next-generation sequencing technologies (NGS) has great potential in livestock populations. However, a large number of animals are required to be sequenced to exploit the full potential of whole-genome sequence data. Thus, novel strategies are desired to allocate sequencing resources in genotyped livestock populations such that the entire population can be sequenced or imputed efficiently. Methods We present two applications of linear programming models called LPChoose for sequencing resources allocation. The first application is to identify the minimum number of animals for sequencing while meeting the criteria that each haplotype in the population is contained in at least one of the animals selected for sequencing. The second is to sequence a fixed number of animals whose haplotypes include as large a proportion as possible of the haplotypes present in the population given a limited sequencing budget. In both cases, we assume that all animals have been haplotyped. We present results from approximation algorithms, and motivate the use of approximations through the correspondence of the problems we address with problems in computer science for which there are no known efficient algorithms. Results In both applications LPChoose performed consistently better than some existing methods making similar assumptions.
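The second application is a budgeted maximum-coverage problem. As a simplified stand-in for the LP formulation, the greedy sketch below picks up to `budget` animals, each time adding the one whose not-yet-covered haplotypes carry the largest total population frequency, and reports the fraction of haplotype frequency covered. The frequency weighting and the function name are assumptions; greedy selection is a known approximation for this kind of problem, not LPChoose's algorithm.

```python
def greedy_budgeted_cover(animal_haplotypes, haplotype_freq, budget):
    """Greedy max coverage: pick up to `budget` animals maximising covered haplotype frequency."""
    covered: set[str] = set()
    chosen: list[str] = []
    for _ in range(min(budget, len(animal_haplotypes))):
        candidates = [a for a in sorted(animal_haplotypes) if a not in chosen]
        # marginal gain = total frequency of haplotypes this animal would newly cover
        best = max(candidates,
                   key=lambda a: sum(haplotype_freq[h] for h in animal_haplotypes[a] - covered))
        chosen.append(best)
        covered |= animal_haplotypes[best]
    coverage = sum(haplotype_freq[h] for h in covered) / sum(haplotype_freq.values())
    return chosen, coverage
```

Weighting by frequency makes common haplotypes count for more, which matches the goal of covering as large a proportion of the population's haplotypes as the budget allows.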


Faculty Opinions recommendation of Optimal algorithms for haplotype assembly from whole-genome sequence data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13339986.14707085 ◽

2011 ◽

Author(s):

Alejandro Schaffer

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Optimal Algorithms ◽

Genome Sequence Data ◽

Haplotype Assembly


TIGER: inferring DNA replication timing from whole-genome sequence data

Bioinformatics ◽

10.1093/bioinformatics/btab166 ◽

2021 ◽

Cited By ~ 1

Author(s):

Amnon Koren ◽

Dashiell J Massey ◽

Alexa N Bracci

Keyword(s):

Dna Replication ◽

Genome Sequence ◽

Genomic Dna ◽

Sequence Data ◽

Replication Timing ◽

Whole Genome Sequence ◽

Supplementary Information ◽

Whole Genome ◽

Genome Sequence Data ◽

Dna Replication Timing

Abstract Motivation Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Despite being a central cellular process, DNA replication timing studies have been limited in scale due to technical challenges. Results We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole-genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale. Availability and Implementation TIGER is available at https://github.com/TheKorenLab/TIGER. Supplementary information Supplementary data are available at Bioinformatics online.
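The core idea is that, in proliferating cells, early-replicating regions are present at higher copy number, so read depth along a chromosome tracks replication timing. The sketch below shows only that idea: bin read start positions, normalise to the median bin, smooth with a rolling median and standardise. It is not TIGER's implementation, which also accounts for other sources of coverage variation; the bin size, smoothing window and function names are hypothetical.

```python
from statistics import mean, median, pstdev


def binned_coverage(read_starts, chrom_len, bin_size):
    """Count read start positions falling in each fixed-size bin."""
    n_bins = (chrom_len + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def timing_profile(counts, window=5):
    """Median-normalised, rolling-median-smoothed, z-scored coverage (higher = earlier)."""
    med = median(counts)  # assumes the median bin has non-zero coverage
    rel = [c / med for c in counts]  # relative copy number per bin
    half = window // 2
    smooth = [median(rel[max(0, i - half): i + half + 1]) for i in range(len(rel))]
    mu, sd = mean(smooth), pstdev(smooth)
    return [(s - mu) / sd if sd > 0 else 0.0 for s in smooth]
```

A rolling median is used here because it suppresses isolated outlier bins (for example, collapsed repeats) without blurring sharp timing transitions as much as a mean would.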


Whole genome sequence data of Bacillus australimaris strain B28A, isolated from Marine Water in India

Data in Brief ◽

10.1016/j.dib.2021.107240 ◽

2021 ◽

pp. 107240

Author(s):

Wael Ali Mohammed Hadi ◽

Boby T Edwin ◽

A Jayakumaran Nair

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Marine Water ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data
