Utilizing Big Data to Identify Tiny Toxic Components: Digitalis

Foods ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 1794
Author(s):  
Elizabeth Sage Hunter ◽  
Robert Literman ◽  
Sara M. Handy

The botanical genus Digitalis is equal parts colorful, toxic, and medicinal, and its bioactive compounds have a long history of therapeutic use. However, with an extremely narrow therapeutic range, even trace amounts of Digitalis can cause adverse effects. Using chemical methods, the United States Food and Drug Administration traced a 1997 case of Digitalis toxicity to a shipment of Plantago (a common ingredient in dietary supplements marketed to improve digestion) contaminated with Digitalis lanata. With increased accessibility to next generation sequencing technology, here we ask whether this case could have been cracked rapidly using shallow genome sequencing strategies (e.g., genome skims). Using a modified implementation of the Site Identification from Short Read Sequences (SISRS) bioinformatics pipeline with whole-genome sequence data, we generated over 2 M genus-level single nucleotide polymorphisms in addition to species-informative single nucleotide polymorphisms. We simulated dietary supplement contamination by spiking low quantities (0–10%) of Digitalis whole-genome sequence data into a background of commonly used ingredients in products marketed for “digestive cleansing” and reliably detected Digitalis at the genus level while also discriminating between Digitalis species. This work serves as a roadmap for the development of novel DNA-based assays to quickly and reliably detect the presence of toxic species such as Digitalis in food products or dietary supplements using genomic methods and highlights the power of harnessing the entire genome to identify botanical species.
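The spiking simulation described above can be illustrated with a toy sketch. This is not the SISRS pipeline: the reads, the marker sequence, and the function names `spike_reads` and `count_marker_hits` are all hypothetical, and real detection would rely on the genus-level and species-informative SNP sites rather than exact substring matches.

```python
import random


def spike_reads(background, contaminant, fraction, total, seed=0):
    """Build a mixed read set of size `total` in which `fraction` of the
    reads are drawn (with replacement) from the contaminant."""
    rng = random.Random(seed)
    n_contaminant = round(total * fraction)
    reads = rng.choices(contaminant, k=n_contaminant)
    reads += rng.choices(background, k=total - n_contaminant)
    rng.shuffle(reads)
    return reads


def count_marker_hits(reads, markers):
    """Count reads that contain at least one marker sequence."""
    return sum(any(marker in read for marker in markers) for read in reads)


# Hypothetical toy data: two background reads, one contaminant read,
# and one marker that occurs only in the contaminant.
background_reads = ["ACGTACGTAC", "TTGACCAGTA"]
contaminant_reads = ["GGCATTCAGG"]
markers = {"CATTC"}

for fraction in (0.0, 0.01, 0.1):
    mixed = spike_reads(background_reads, contaminant_reads, fraction, total=1000)
    print(fraction, count_marker_hits(mixed, markers))
```

In this toy setting the hit count simply tracks the spiked fraction; with real sequence data, sequencing error and shared sequence between taxa would make the signal noisier.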

2021 ◽  
Author(s):  
Tofazzal Islam ◽  
Nadia Afroz ◽  
ChuShin Koh ◽  
M. Nazmul Haque ◽  
Md. Jillur Rahman ◽  
...  

Abstract Background Jackfruit (Artocarpus heterophyllus Lam.) is a tropical and sub-tropical fruit tree distributed in Asia, Africa, and South America. It is the national fruit of Bangladesh and produces fruit in the summer season only. However, a year-round jackfruit variety, BARI Kanthal-3, developed by the Bangladesh Agricultural Research Institute (BARI), provides fruit from September to June. This study aimed to evaluate the agronomic performance of BARI Kanthal-3 and to generate a draft whole genome sequence to obtain molecular insights into this unique variety. Results The number of fruits, average fruit weight, fruit yield per plant, edible portion of fruit, and β-carotene content of BARI Kanthal-3 (n = 5) were 422/plant/year, 5.60 kg, 236.32 kg/year, 53.5% and 3614 mg/100g, respectively. De novo assembly scaffolded 817.7 Mb of the BARI Kanthal-3 genome, whereas the reference-guided assembly scaffolded almost 843 Mb. BUSCO assessment showed that 97.2% of the core genes were represented in the assembly, with 1.3% fragmented and 1.5% missing. A comparison of single copy orthologues (SCOs) between BARI Kanthal-3 and three closely related and one distantly related species found 706 SCOs shared across the five genomes. The phylogenetic analysis of the shared SCOs showed that A. heterophyllus is the closest species to BARI Kanthal-3. The estimated genome size of BARI Kanthal-3 was 1.04 giga base pairs (Gbp) with a heterozygosity rate of 1.62%. The estimated GC content was 34.10%. Variant analysis revealed 5.7 M (35%) simple and 10.4 M (65%) heterozygous single nucleotide polymorphisms (SNPs) in BARI Kanthal-3, about 90% of which are located in inter-genic regions. Conclusion The whole-genome sequence of A. heterophyllus cv. BARI Kanthal-3 reveals an extremely high number of single nucleotide polymorphisms in inter-genic regions. The findings of this study will help in better understanding the evolution, domestication, phylogenetic relationships, and year-round fruiting of this highly nutritious fruit crop, and in developing markers for its molecular breeding.


2021 ◽  
Vol 9 (3) ◽  
pp. 570
Author(s):  
Maphuti Betty Ledwaba ◽  
Barbara Akorfa Glover ◽  
Itumeleng Matle ◽  
Giuseppe Profiti ◽  
Pier Luigi Martelli ◽  
...  

The availability of whole genome sequences in public databases permits genome-wide comparative studies of various bacterial species. Whole genome sequence-single nucleotide polymorphisms (WGS-SNP) analysis has been used in recent studies and allows the discrimination of various Brucella species and strains. In the present study, 13 Brucella spp. strains from cattle of various locations in provinces of South Africa were typed and discriminated. WGS-SNP analysis indicated a maximum pairwise distance ranging from 4 to 77 single nucleotide polymorphisms (SNPs) between the South African Brucella abortus virulent field strains. Moreover, it was shown that the South African B. abortus strains grouped closely to B. abortus strains from Mozambique and Zimbabwe, as well as other Eurasian countries, such as Portugal and India. WGS-SNP analysis of South African B. abortus strains demonstrated that the same genotype circulated in one farm (Farm 1), whereas another farm (Farm 2) in the same province had two different genotypes. This indicated that brucellosis in South Africa spreads within the herd on some farms, whereas the introduction of infected animals is the mode of transmission on other farms. Three B. abortus vaccine S19 strains isolated from tissue and aborted material were identical, even though they originated from different herds and regions of South Africa. This might be due to the incorrect vaccination of animals older than the recommended age of 4–8 months or might be a problem associated with vaccine production.
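The pairwise SNP distances reported above can be illustrated with a minimal sketch. It assumes the strains have already been reduced to equal-length aligned sequences; the strain names and sequences are invented for illustration, and the authors' actual WGS-SNP workflow is not described in the abstract.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences carry an unambiguous
    base (A, C, G or T) and the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def pairwise_distances(strains):
    """Return {(name_1, name_2): SNP distance} for every pair of strains."""
    return {
        (p, q): snp_distance(strains[p], strains[q])
        for p, q in combinations(sorted(strains), 2)
    }


# Hypothetical aligned sequences: two isolates from one farm, one from another.
strains = {
    "farm1_a": "ACGTACGT",
    "farm1_b": "ACGTACGT",
    "farm2_a": "ACGTTCGA",
}
distances = pairwise_distances(strains)
print(distances)
```

A distance of zero between two isolates is the kind of result that would suggest a single genotype circulating within a herd, while larger distances point to separate introductions.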


2021 ◽  
Vol 12 ◽  
Author(s):  
Hao Cheng ◽  
Keyu Xu ◽  
Jinghui Li ◽  
Kuruvilla Joseph Abraham

Low-cost genome-wide single-nucleotide polymorphisms (SNPs) are routinely used in animal breeding programs. Compared to SNP arrays, the use of whole-genome sequence data generated by the next-generation sequencing technologies (NGS) has great potential in livestock populations. However, sequencing a large number of animals to exploit the full potential of whole-genome sequence data is not feasible. Thus, novel strategies are required for the allocation of sequencing resources in genotyped livestock populations such that the entire population can be imputed, maximizing the efficiency of whole genome sequencing budgets. We present two applications of linear programming for the efficient allocation of sequencing resources. The first application is to identify the minimum number of animals for sequencing subject to the criterion that each haplotype in the population is contained in at least one of the animals selected for sequencing. The second application is the selection of animals whose haplotypes include the largest possible proportion of common haplotypes present in the population, assuming a limited sequencing budget. Both applications are available in an open source program LPChoose. In both applications, LPChoose has similar or better performance than some other methods suggesting that linear programming methods offer great potential for the efficient allocation of sequencing resources. The utility of these methods can be increased through the development of improved heuristics.
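The first application, choosing the fewest animals such that every haplotype is carried by at least one sequenced animal, can be illustrated with a small sketch. Note that this uses a simple greedy heuristic in place of linear programming and is not the LPChoose implementation; the animal and haplotype labels are hypothetical.

```python
def greedy_cover(animal_haplotypes):
    """Pick animals one at a time, always taking the animal that carries
    the most haplotypes not yet covered, until every haplotype is covered.

    `animal_haplotypes` maps an animal ID to the set of haplotypes it carries.
    """
    uncovered = set().union(*animal_haplotypes.values())
    chosen = []
    while uncovered:
        # Ties are broken by animal ID so that the result is deterministic.
        best = max(
            sorted(animal_haplotypes),
            key=lambda animal: len(animal_haplotypes[animal] & uncovered),
        )
        chosen.append(best)
        uncovered -= animal_haplotypes[best]
    return chosen


# Hypothetical population: four animals, four distinct haplotypes.
animals = {
    "A": {"h1", "h2"},
    "B": {"h2", "h3"},
    "C": {"h3", "h4"},
    "D": {"h1", "h4"},
}
print(greedy_cover(animals))
```

A greedy choice is fast but not guaranteed to find the smallest possible set, which is one reason an optimization formulation such as the one the authors describe can be attractive.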


BMC Genomics ◽  
2009 ◽  
Vol 10 (1) ◽  
pp. 4
Author(s):  
Hindrik HD Kerstens ◽  
Sonja Kollers ◽  
Arun Kommadath ◽  
Marisol del Rosario ◽  
Bert Dibbits ◽  
...  

2020 ◽  
Author(s):  
Hao Cheng ◽  
Keyu Xu ◽  
Kuruvilla Joseph Abraham

Abstract Background Low-cost genome-wide single-nucleotide polymorphisms (SNPs) are routinely used in animal breeding programs. Compared to SNP arrays, the use of whole-genome sequence data generated by the next-generation sequencing technologies (NGS) has great potential in livestock populations. However, a large number of animals are required to be sequenced to exploit the full potential of whole-genome sequence data. Thus, novel strategies are desired to allocate sequencing resources in genotyped livestock populations such that the entire population can be sequenced or imputed efficiently. Methods We present two applications of linear programming models called LPChoose for sequencing resources allocation. The first application is to identify the minimum number of animals for sequencing while meeting the criteria that each haplotype in the population is contained in at least one of the animals selected for sequencing. The second is to sequence a fixed number of animals whose haplotypes include as large a proportion as possible of the haplotypes present in the population given a limited sequencing budget. In both cases, we assume that all animals have been haplotyped. We present results from approximation algorithms, and motivate the use of approximations through the correspondence of the problems we address with problems in computer science for which there are no known efficient algorithms. Results In both applications LPChoose performed consistently better than some existing methods making similar assumptions.
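The fixed-budget application can be sketched in the same spirit. Reading it as an instance of what is commonly called the maximum coverage problem is an assumption of this sketch, not a statement from the abstract, and the greedy rule below stands in for the linear programming models of LPChoose. All animal and haplotype labels are hypothetical.

```python
def greedy_max_coverage(animal_haplotypes, budget):
    """Select up to `budget` animals, each time adding the animal that
    contributes the most haplotypes not yet covered.

    Returns the chosen animal IDs and the proportion of all distinct
    haplotypes in the population that they carry.
    """
    remaining = dict(animal_haplotypes)
    covered = set()
    chosen = []
    for _ in range(min(budget, len(remaining))):
        # Ties are broken by animal ID so that the result is deterministic.
        best = max(sorted(remaining), key=lambda a: len(remaining[a] - covered))
        if not remaining[best] - covered:
            break  # no remaining animal adds a new haplotype
        chosen.append(best)
        covered |= remaining.pop(best)
    all_haplotypes = set().union(*animal_haplotypes.values())
    proportion = len(covered) / len(all_haplotypes) if all_haplotypes else 0.0
    return chosen, proportion


# Hypothetical population: four animals, five distinct haplotypes,
# and a budget that allows only two animals to be sequenced.
animals = {
    "A": {"h1", "h2", "h3"},
    "B": {"h3", "h4"},
    "C": {"h5"},
    "D": {"h1", "h4"},
}
print(greedy_max_coverage(animals, budget=2))
```

This version treats every haplotype as equally important; weighting haplotypes by their population frequency would be a natural variation when common haplotypes matter most.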


2018 ◽  
Vol 7 (23) ◽  
Author(s):  
Massimiliano Orsini ◽  
Marina Torresi ◽  
Claudio Patavino ◽  
Patrizia Centorame ◽  
Antonio Rinaldi ◽  
...  

We report the whole-genome sequence of a Listeria monocytogenes strain isolated from a child in central Italy. Interestingly, the sequence showed a difference of only 13 single-nucleotide polymorphisms (SNPs) from a strain responsible for a severe listeriosis outbreak that occurred between January 2015 and March 2016 in the same region.


Author(s):  
Amnon Koren ◽  
Dashiell J Massey ◽  
Alexa N Bracci

Abstract Motivation Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Despite being a central cellular process, DNA replication timing studies have been limited in scale due to technical challenges. Results We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale. Availability and Implementation TIGER is available at https://github.com/TheKorenLab/TIGER. Supplementary information Supplementary data are available at Bioinformatics online.
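The idea that read depth along a chromosome reflects replication timing can be illustrated with a deliberately simplified sketch. It is not TIGER's algorithm: it only bins read start positions into fixed windows and standardizes the counts, and it ignores the other sources of coverage variation that the abstract says must be accounted for. The function names and the toy read positions are hypothetical.

```python
from statistics import mean, pstdev


def window_counts(read_starts, chrom_length, window):
    """Count read start positions falling in consecutive fixed-size windows."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // window] += 1
    return counts


def timing_profile(counts):
    """Standardize window counts to zero mean and unit variance.

    Higher values mean relatively more DNA copies in that window, which in
    a proliferating sample is consistent with earlier replication.
    """
    mu = mean(counts)
    sd = pstdev(counts)
    if sd == 0:
        return [0.0] * len(counts)
    return [(c - mu) / sd for c in counts]


# Hypothetical read start positions on a 50 bp toy chromosome.
starts = [5, 12, 18, 25, 27, 29, 31, 45]
counts = window_counts(starts, chrom_length=50, window=10)
profile = timing_profile(counts)
print(counts)
print([round(value, 2) for value in profile])
```

In practice, windows would span kilobases and the profile would need filtering and smoothing before it could be compared with experimental replication timing measurements.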

