Utilizing Big Data to Identify Tiny Toxic Components: Digitalis

The botanical genus Digitalis is equal parts colorful, toxic, and medicinal, and its bioactive compounds have a long history of therapeutic use. However, with an extremely narrow therapeutic range, even trace amounts of Digitalis can cause adverse effects. Using chemical methods, the United States Food and Drug Administration traced a 1997 case of Digitalis toxicity to a shipment of Plantago (a common ingredient in dietary supplements marketed to improve digestion) contaminated with Digitalis lanata. With increased accessibility to next generation sequencing technology, here we ask whether this case could have been cracked rapidly using shallow genome sequencing strategies (e.g., genome skims). Using a modified implementation of the Site Identification from Short Read Sequences (SISRS) bioinformatics pipeline with whole-genome sequence data, we generated over 2 M genus-level single nucleotide polymorphisms in addition to species-informative single nucleotide polymorphisms. We simulated dietary supplement contamination by spiking low quantities (0–10%) of Digitalis whole-genome sequence data into a background of commonly used ingredients in products marketed for “digestive cleansing” and reliably detected Digitalis at the genus level while also discriminating between Digitalis species. This work serves as a roadmap for the development of novel DNA-based assays to quickly and reliably detect the presence of toxic species such as Digitalis in food products or dietary supplements using genomic methods and highlights the power of harnessing the entire genome to identify botanical species.
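
The spike-in simulation described in this abstract can be illustrated with a minimal sketch. It assumes reads are held in memory as plain strings and that contamination is defined as a fraction of the total read count. The function name `spike_reads` and the toy reads are hypothetical; this is not the SISRS pipeline or the authors' code.

```python
import random


def spike_reads(background, contaminant, fraction, total, seed=0):
    """Return `total` labelled reads, of which `fraction` come from the contaminant pool."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the toy mixture is reproducible
    n_contaminant = round(total * fraction)
    n_background = total - n_contaminant
    # Sample with replacement so that small toy pools still work.
    mixed = [(rng.choice(contaminant), "contaminant") for _ in range(n_contaminant)]
    mixed += [(rng.choice(background), "background") for _ in range(n_background)]
    rng.shuffle(mixed)
    return mixed


# Hypothetical toy reads; real input would be sequencing reads from FASTQ files.
background_reads = ["ACGTACGT", "TTGACCAA", "GGCATCGA"]
contaminant_reads = ["CCCTTTAA", "AATTCCGG"]

mix = spike_reads(background_reads, contaminant_reads, fraction=0.05, total=200)
n_spiked = sum(1 for _, source in mix if source == "contaminant")
```

Keeping the source label alongside each read makes it possible to check afterwards how many spiked reads a detection step actually recovered.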


Single-Nucleotide Polymorphisms in the Whole-Genome Sequence Data of Shiga Toxin-Producing Escherichia coli O157:H7/H- Strains by Cultivation

Current Microbiology ◽

10.1007/s00284-017-1208-z ◽

2017 ◽

Vol 74 (4) ◽

pp. 425-430 ◽

Cited By ~ 3

Author(s):

Eiji Yokoyama ◽

Shinichiro Hirai ◽

Taichiro Ishige ◽

Satoshi Murakami

Keyword(s):

Escherichia Coli ◽

Single Nucleotide Polymorphisms ◽

Genome Sequence ◽

Shiga Toxin ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Escherichia Coli O157 ◽

Nucleotide Polymorphisms ◽

Single Nucleotide


Whole-Genome Sequencing of a Year-Round Fruiting Jackfruit Variety Reveals Very High Single Nucleotide Polymorphisms in Inter-Genic Regions

10.21203/rs.3.rs-1176760/v1 ◽

2021 ◽

Author(s):

Tofazzal Islam ◽

Nadia Afroz ◽

ChuShin Koh ◽

M. Nazmul Haque ◽

Md. Jillur Rahman ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Genome Sequence ◽

De Novo ◽

Agricultural Research ◽

Gc Content ◽

Whole Genome Sequence ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Tropical Fruit

Background: Jackfruit (Artocarpus heterophyllus Lam.) is a tropical and subtropical fruit tree distributed in Asia, Africa, and South America. It is the national fruit of Bangladesh and produces fruit in the summer season only. However, a year-round jackfruit variety, BARI Kanthal-3, developed by the Bangladesh Agricultural Research Institute (BARI), provides fruit from September to June. This study aimed to evaluate the agronomic performance of BARI Kanthal-3 and to generate a draft whole-genome sequence to obtain molecular insights into this unique variety. Results: The number of fruits, average fruit weight, fruit yield per plant, edible portion of the fruit, and β-carotene content of BARI Kanthal-3 (n = 5) were 422/plant/year, 5.60 kg, 236.32 kg/year, 53.5%, and 3614 mg/100 g, respectively. During de novo assembly, 817.7 Mb of the BARI Kanthal-3 genome was scaffolded, whereas in the reference-guided assembly almost 843 Mb was scaffolded. BUSCO assessment showed that 97.2% of the core genes were represented in the assembly, with 1.3% fragmented and 1.5% missing. A comparison of single-copy orthologues (SCOs) between BARI Kanthal-3 and three closely related and one distantly related species found 706 SCOs shared across the genomes of the five species. Phylogenetic analysis of the shared SCOs showed that A. heterophyllus is the closest species to BARI Kanthal-3. The estimated genome size of BARI Kanthal-3 was 1.04 giga base pairs (Gbp), with a heterozygosity rate of 1.62% and an estimated GC content of 34.10%. Variant analysis revealed that BARI Kanthal-3 includes 5.7 M (35%) simple and 10.4 M (65%) heterozygous single nucleotide polymorphisms (SNPs), and about 90% of these polymorphisms are located in inter-genic regions. Conclusion: The whole-genome sequence of A. heterophyllus cv. BARI Kanthal-3 reveals an extremely high number of single nucleotide polymorphisms in inter-genic regions. The findings of this study will help to better understand the evolution, domestication, phylogenetic relationships, and year-round fruiting of this highly nutritious fruit crop, and will support marker development for its molecular breeding.


Whole Genome Sequence Analysis of Brucella abortus Isolates from Various Regions of South Africa

Microorganisms ◽

10.3390/microorganisms9030570 ◽

2021 ◽

Vol 9 (3) ◽

pp. 570

Author(s):

Maphuti Betty Ledwaba ◽

Barbara Akorfa Glover ◽

Itumeleng Matle ◽

Giuseppe Profiti ◽

Pier Luigi Martelli ◽

...

Keyword(s):

South Africa ◽

Single Nucleotide Polymorphisms ◽

South African ◽

Genome Sequence ◽

Brucella Abortus ◽

Whole Genome Sequence ◽

Snp Analysis ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Single Nucleotide

The availability of whole genome sequences in public databases permits genome-wide comparative studies of various bacterial species. Whole genome sequence-single nucleotide polymorphisms (WGS-SNP) analysis has been used in recent studies and allows the discrimination of various Brucella species and strains. In the present study, 13 Brucella spp. strains from cattle of various locations in provinces of South Africa were typed and discriminated. WGS-SNP analysis indicated a maximum pairwise distance ranging from 4 to 77 single nucleotide polymorphisms (SNPs) between the South African Brucella abortus virulent field strains. Moreover, it was shown that the South African B. abortus strains grouped closely to B. abortus strains from Mozambique and Zimbabwe, as well as other Eurasian countries, such as Portugal and India. WGS-SNP analysis of South African B. abortus strains demonstrated that the same genotype circulated in one farm (Farm 1), whereas another farm (Farm 2) in the same province had two different genotypes. This indicated that brucellosis in South Africa spreads within the herd on some farms, whereas the introduction of infected animals is the mode of transmission on other farms. Three B. abortus vaccine S19 strains isolated from tissue and aborted material were identical, even though they originated from different herds and regions of South Africa. This might be due to the incorrect vaccination of animals older than the recommended age of 4–8 months or might be a problem associated with vaccine production.
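
The pairwise SNP distances reported in this abstract can be illustrated with a small sketch. It assumes the strains have already been aligned to a common reference, so that each strain is a string of equal length, and it skips positions with ambiguous bases. The function names and the toy alignment are hypothetical and do not come from the study.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both strains carry different unambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def pairwise_distances(alignment):
    """Return a dict mapping each strain pair (sorted by name) to its SNP distance."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment; "N" marks an ambiguous call that is ignored.
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAT",
    "strain_C": "ACGAACGTNT",
}

distances = pairwise_distances(toy_alignment)
max_distance = max(distances.values())
```

Strains separated by only a few SNPs, as on Farm 1 in the abstract, would show small values in such a matrix, while independently introduced genotypes would show larger ones.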


Optimizing Sequencing Resources in Genotyped Livestock Populations Using Linear Programming

Frontiers in Genetics ◽

10.3389/fgene.2021.740340 ◽

2021 ◽

Vol 12 ◽

Author(s):

Hao Cheng ◽

Keyu Xu ◽

Jinghui Li ◽

Kuruvilla Joseph Abraham

Keyword(s):

Linear Programming ◽

Genome Sequence ◽

Sequence Data ◽

Low Cost ◽

Whole Genome Sequence ◽

Full Potential ◽

Whole Genome ◽

Efficient Allocation ◽

Nucleotide Polymorphisms ◽

Genome Sequence Data

Low-cost genome-wide single-nucleotide polymorphisms (SNPs) are routinely used in animal breeding programs. Compared to SNP arrays, the use of whole-genome sequence data generated by the next-generation sequencing technologies (NGS) has great potential in livestock populations. However, sequencing a large number of animals to exploit the full potential of whole-genome sequence data is not feasible. Thus, novel strategies are required for the allocation of sequencing resources in genotyped livestock populations such that the entire population can be imputed, maximizing the efficiency of whole genome sequencing budgets. We present two applications of linear programming for the efficient allocation of sequencing resources. The first application is to identify the minimum number of animals for sequencing subject to the criterion that each haplotype in the population is contained in at least one of the animals selected for sequencing. The second application is the selection of animals whose haplotypes include the largest possible proportion of common haplotypes present in the population, assuming a limited sequencing budget. Both applications are available in an open source program LPChoose. In both applications, LPChoose has similar or better performance than some other methods suggesting that linear programming methods offer great potential for the efficient allocation of sequencing resources. The utility of these methods can be increased through the development of improved heuristics.
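
The first application described above is a set-cover problem: choose the fewest animals such that every haplotype appears in at least one chosen animal. The sketch below uses a simple greedy heuristic rather than linear programming, so it is not LPChoose and may select more animals than an optimal solution would. The function name and the toy haplotype data are hypothetical.

```python
def greedy_min_cover(animal_haplotypes):
    """Greedily pick animals until every haplotype is carried by a chosen animal."""
    uncovered = set().union(*animal_haplotypes.values())
    chosen = []
    while uncovered:
        # Pick the animal carrying the most still-uncovered haplotypes;
        # ties go to the alphabetically first animal.
        best = max(
            sorted(animal_haplotypes),
            key=lambda animal: len(animal_haplotypes[animal] & uncovered),
        )
        chosen.append(best)
        uncovered -= animal_haplotypes[best]
    return chosen


# Hypothetical toy data: each animal maps to the set of haplotypes it carries.
toy_animals = {
    "a1": {"h1", "h2", "h3"},
    "a2": {"h3", "h4"},
    "a3": {"h4", "h5"},
    "a4": {"h1", "h5"},
}

selected = greedy_min_cover(toy_animals)
```

In this toy case two animals suffice to cover all five haplotypes. An exact formulation would instead minimise the number of selected animals subject to one coverage constraint per haplotype.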


Mining for single nucleotide polymorphisms in pig genome sequence data

BMC Genomics ◽

10.1186/1471-2164-10-4 ◽

2009 ◽

Vol 10 (1) ◽

pp. 4 ◽

Cited By ~ 13

Author(s):

Hindrik HD Kerstens ◽

Sonja Kollers ◽

Arun Kommadath ◽

Marisol del Rosario ◽

Bert Dibbits ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Genome Sequence ◽

Sequence Data ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Sequence Data ◽

Pig Genome


Optimizing Sequencing Resources in Genotyped Livestock Populations Using Linear Programming

10.1101/2020.06.29.179093 ◽

2020 ◽

Author(s):

Hao Cheng ◽

Keyu Xu ◽

Kuruvilla Joseph Abraham

Keyword(s):

Linear Programming ◽

Genome Sequence ◽

Sequence Data ◽

Low Cost ◽

Fixed Number ◽

Whole Genome Sequence ◽

Full Potential ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Genome Sequence Data

Background: Low-cost genome-wide single-nucleotide polymorphisms (SNPs) are routinely used in animal breeding programs. Compared to SNP arrays, the use of whole-genome sequence data generated by the next-generation sequencing technologies (NGS) has great potential in livestock populations. However, a large number of animals are required to be sequenced to exploit the full potential of whole-genome sequence data. Thus, novel strategies are desired to allocate sequencing resources in genotyped livestock populations such that the entire population can be sequenced or imputed efficiently. Methods: We present two applications of linear programming models called LPChoose for sequencing resources allocation. The first application is to identify the minimum number of animals for sequencing while meeting the criteria that each haplotype in the population is contained in at least one of the animals selected for sequencing. The second is to sequence a fixed number of animals whose haplotypes include as large a proportion as possible of the haplotypes present in the population given a limited sequencing budget. In both cases, we assume that all animals have been haplotyped. We present results from approximation algorithms, and motivate the use of approximations through the correspondence of the problems we address with problems in computer science for which there are no known efficient algorithms. Results: In both applications LPChoose performed consistently better than some existing methods making similar assumptions.
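
The second application, sequencing a fixed number of animals so that their haplotypes cover as large a proportion of the population's haplotypes as possible, corresponds to a budgeted maximum-coverage problem. The sketch below is a greedy approximation and not the LPChoose linear programming model. It assumes each haplotype is weighted by how often it occurs in the population; the function name, the weighting scheme, and the toy data are hypothetical.

```python
def greedy_max_coverage(animal_haplotypes, haplotype_counts, budget):
    """Pick up to `budget` animals, each time adding the one with the largest coverage gain."""
    covered = set()
    chosen = []

    def gain(animal):
        # Population copies of haplotypes this animal would newly cover.
        return sum(haplotype_counts[h] for h in animal_haplotypes[animal] - covered)

    for _ in range(min(budget, len(animal_haplotypes))):
        candidates = [a for a in sorted(animal_haplotypes) if a not in chosen]
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # nothing left to gain, so stop before the budget is spent
        chosen.append(best)
        covered |= animal_haplotypes[best]

    total = sum(haplotype_counts.values())
    proportion = sum(haplotype_counts[h] for h in covered) / total if total else 0.0
    return chosen, proportion


# Hypothetical toy data: haplotype frequencies and the haplotypes each animal carries.
toy_counts = {"h1": 5, "h2": 3, "h3": 1, "h4": 1}
toy_carriers = {
    "a1": {"h1", "h3"},
    "a2": {"h1", "h2"},
    "a3": {"h3", "h4"},
}

one_animal, one_proportion = greedy_max_coverage(toy_carriers, toy_counts, budget=1)
two_animals, two_proportion = greedy_max_coverage(toy_carriers, toy_counts, budget=2)
```

With a budget of one animal the greedy choice favours the carrier of the two most common haplotypes; a second animal is then chosen only for what it adds beyond the first.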


Whole-Genome Sequence of a Reemerging Listeria monocytogenes Serovar 1/2a Strain in Central Italy

Microbiology Resource Announcements ◽

10.1128/mra.01069-18 ◽

2018 ◽

Vol 7 (23) ◽

Cited By ~ 1

Author(s):

Massimiliano Orsini ◽

Marina Torresi ◽

Claudio Patavino ◽

Patrizia Centorame ◽

Antonio Rinaldi ◽

...

Keyword(s):

Listeria Monocytogenes ◽

Single Nucleotide Polymorphisms ◽

Genome Sequence ◽

Central Italy ◽

Whole Genome Sequence ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Content Type

We report the whole-genome sequence of a Listeria monocytogenes strain isolated from a child in central Italy. Interestingly, the sequence showed a difference of only 13 single-nucleotide polymorphisms (SNPs) from a strain responsible for a severe listeriosis outbreak that occurred between January 2015 and March 2016 in the same region.


Faculty Opinions recommendation of Optimal algorithms for haplotype assembly from whole-genome sequence data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13339986.14707085 ◽

2011 ◽

Author(s):

Alejandro Schaffer

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Optimal Algorithms ◽

Genome Sequence Data ◽

Haplotype Assembly


TIGER: inferring DNA replication timing from whole-genome sequence data

Bioinformatics ◽

10.1093/bioinformatics/btab166 ◽

2021 ◽

Cited By ~ 1

Author(s):

Amnon Koren ◽

Dashiell J Massey ◽

Alexa N Bracci

Keyword(s):

Dna Replication ◽

Genome Sequence ◽

Genomic Dna ◽

Sequence Data ◽

Replication Timing ◽

Whole Genome Sequence ◽

Supplementary Information ◽

Whole Genome ◽

Genome Sequence Data ◽

Dna Replication Timing

Motivation: Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Despite being a central cellular process, DNA replication timing studies have been limited in scale due to technical challenges. Results: We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale. Availability and Implementation: TIGER is available at https://github.com/TheKorenLab/TIGER. Supplementary information: Supplementary data are available at Bioinformatics online.
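
The core idea, that earlier-replicating loci are over-represented in reads from proliferating cells, can be illustrated with a toy read-depth profile. The sketch below smooths per-window read counts and standardises them so that higher values suggest earlier replication. It is not the TIGER algorithm and omits the corrections for other sources of coverage variation that the abstract mentions; the function names, window size, and counts are hypothetical.

```python
from statistics import mean, pstdev


def smooth(values, window=3):
    """Centred moving average; windows shrink at the chromosome ends."""
    half = window // 2
    return [
        mean(values[max(0, i - half): min(len(values), i + half + 1)])
        for i in range(len(values))
    ]


def replication_profile(window_counts, window=3):
    """Smooth per-window read counts, then standardise to mean 0 and unit variance."""
    smoothed = smooth(window_counts, window)
    mu = mean(smoothed)
    sd = pstdev(smoothed)
    if sd == 0:
        return [0.0] * len(smoothed)  # flat coverage carries no timing signal
    return [(value - mu) / sd for value in smoothed]


# Hypothetical read counts in consecutive genomic windows along one chromosome.
toy_counts = [120, 118, 110, 100, 90, 85, 95, 105]

profile = replication_profile(toy_counts)
earliest_window = max(range(len(profile)), key=lambda i: profile[i])
```

In this toy profile the leftmost windows have the highest standardised depth and would be read as the earliest replicating.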


Whole genome sequence data of Bacillus australimaris strain B28A, isolated from Marine Water in India

Data in Brief ◽

10.1016/j.dib.2021.107240 ◽

2021 ◽

pp. 107240

Author(s):

Wael Ali Mohammed Hadi ◽

Boby T Edwin ◽

A Jayakumaran Nair

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Marine Water ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Data
