Mining for single nucleotide polymorphisms in pig genome sequence data

The botanical genus Digitalis is equal parts colorful, toxic, and medicinal, and its bioactive compounds have a long history of therapeutic use. However, with an extremely narrow therapeutic range, even trace amounts of Digitalis can cause adverse effects. Using chemical methods, the United States Food and Drug Administration traced a 1997 case of Digitalis toxicity to a shipment of Plantago (a common ingredient in dietary supplements marketed to improve digestion) contaminated with Digitalis lanata. With increased accessibility to next generation sequencing technology, here we ask whether this case could have been cracked rapidly using shallow genome sequencing strategies (e.g., genome skims). Using a modified implementation of the Site Identification from Short Read Sequences (SISRS) bioinformatics pipeline with whole-genome sequence data, we generated over 2 M genus-level single nucleotide polymorphisms in addition to species-informative single nucleotide polymorphisms. We simulated dietary supplement contamination by spiking low quantities (0–10%) of Digitalis whole-genome sequence data into a background of commonly used ingredients in products marketed for “digestive cleansing” and reliably detected Digitalis at the genus level while also discriminating between Digitalis species. This work serves as a roadmap for the development of novel DNA-based assays to quickly and reliably detect the presence of toxic species such as Digitalis in food products or dietary supplements using genomic methods and highlights the power of harnessing the entire genome to identify botanical species.
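
The spiking experiment described in this abstract can be illustrated with a toy sketch. It is not the SISRS pipeline: it swaps SNP calling for a plain substring match against marker sequences, and the reads, the marker, and the function names `spike` and `marker_hits` are all hypothetical.

```python
from itertools import cycle, islice


def spike(background, contaminant, fraction, total):
    """Build a read pool of size `total` in which `fraction` of the reads
    come from the contaminant and the rest from the background."""
    n_contam = round(total * fraction)
    reads = list(islice(cycle(contaminant), n_contam))
    reads += list(islice(cycle(background), total - n_contam))
    return reads


def marker_hits(reads, markers):
    """Count reads that contain at least one marker sequence."""
    return sum(1 for read in reads if any(m in read for m in markers))


# Hypothetical toy data: two background reads, one contaminant read,
# and one marker that occurs only in the contaminant read.
background = ["ACGTACGTAA", "TTGACCAGTA"]
contaminant = ["GGCATTCGAT"]
markers = {"CATTCG"}

clean_pool = spike(background, contaminant, 0.0, 20)
spiked_pool = spike(background, contaminant, 0.10, 20)
print(marker_hits(clean_pool, markers), marker_hits(spiked_pool, markers))
```

A real analysis would map reads to reference loci and call alleles at genus-level and species-informative sites; the substring match only stands in for that step.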

Single-Nucleotide Polymorphisms in the Whole-Genome Sequence Data of Shiga Toxin-Producing Escherichia coli O157:H7/H- Strains by Cultivation

Current Microbiology ◽

10.1007/s00284-017-1208-z ◽

2017 ◽

Vol 74 (4) ◽

pp. 425-430 ◽

Cited By ~ 3

Author(s):

Eiji Yokoyama ◽

Shinichiro Hirai ◽

Taichiro Ishige ◽

Satoshi Murakami

Keyword(s):

Escherichia Coli ◽

Single Nucleotide Polymorphisms ◽

Genome Sequence ◽

Shiga Toxin ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Escherichia Coli O157 ◽

Nucleotide Polymorphisms ◽

Single Nucleotide

Discovery of single‐nucleotide polymorphisms (SNPs) in the uncharacterized genome of the ascomycete Ophiognomonia clavigignenti‐juglandacearum from 454 sequence data

Molecular Ecology Resources ◽

10.1111/j.1755-0998.2011.02998.x ◽

2011 ◽

Vol 11 (4) ◽

pp. 693-702 ◽

Cited By ~ 16

Author(s):

K. D. BRODERS ◽

K. E. WOESTE ◽

P. J. SanMIGUEL ◽

R. P. WESTERMAN ◽

G. J. BOLAND

Keyword(s):

Single Nucleotide Polymorphisms ◽

Sequence Data ◽

Nucleotide Polymorphisms ◽

Single Nucleotide

Complete Genome Sequence of Geobacillus thermoglucosidasius NCIMB 11955, the Progenitor of a Bioethanol Production Strain

Genome Announcements ◽

10.1128/genomea.01065-16 ◽

2016 ◽

Vol 4 (5) ◽

Cited By ~ 3

Author(s):

Lili Sheng ◽

Ying Zhang ◽

Nigel P. Minton

Keyword(s):

Single Nucleotide Polymorphisms ◽

Genome Sequence ◽

Complete Genome Sequence ◽

Complete Genome ◽

Bioethanol Production ◽

Nucleotide Polymorphisms ◽

Industrial Strain ◽

Single Nucleotide ◽

Geobacillus Thermoglucosidasius ◽

Production Strain

The industrially important thermophile Geobacillus thermoglucosidasius has the potential to produce chemicals and fuels from biomass-derived sugar feedstocks. Here, we present the genome sequence of strain NCIMB 11955, the progenitor of an ethanologenic industrial strain, revealing 11 single-nucleotide polymorphisms and 2 indels compared to strain DSM 2542 and two novel plasmids.

Whole-Genome Sequencing of a Year-Round Fruiting Jackfruit Variety Reveals Very High Single Nucleotide Polymorphisms in Inter-Genic Regions

10.21203/rs.3.rs-1176760/v1 ◽

2021 ◽

Author(s):

Tofazzal Islam ◽

Nadia Afroz ◽

ChuShin Koh ◽

M. Nazmul Haque ◽

Md. Jillur Rahman ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Genome Sequence ◽

De Novo ◽

Agricultural Research ◽

Gc Content ◽

Whole Genome Sequence ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Tropical Fruit

Background: Jackfruit (Artocarpus heterophyllus Lam.) is a tropical and sub-tropical fruit tree distributed in Asia, Africa, and South America. It is the national fruit of Bangladesh and produces fruit in the summer season only. However, a year-round jackfruit variety, BARI Kanthal-3, developed by the Bangladesh Agricultural Research Institute (BARI), provides fruit from September to June. This study aimed to evaluate the agronomic performance of BARI Kanthal-3 and to generate a draft whole-genome sequence to obtain molecular insights into this unique variety. Results: The number of fruits, average weight per fruit, fruit yield per plant, edible portion of the fruit, and β-carotene content of BARI Kanthal-3 (n = 5) were 422/plant/year, 5.60 kg, 236.32 kg/year, 53.5%, and 3614 mg/100g, respectively. During de novo assembly, 817.7 Mb of the BARI Kanthal-3 genome was scaffolded. However, in the reference-guided genome assembly, almost 843 Mb of the BARI Kanthal-3 genome was scaffolded. Through BUSCO assessment, 97.2% of the core genes were represented in the assembly, with 1.3% fragmented and 1.5% missing. By comparing the single copy orthologues (SCOs) in three closely and one distantly related species of BARI Kanthal-3, 706 SCOs were found to be shared across the genomes of the five species. The phylogenetic analysis of the shared SCOs showed that A. heterophyllus is the closest species to BARI Kanthal-3. The estimated genome size of BARI Kanthal-3 was 1.04 giga base pairs (Gbp), with a heterozygosity rate of 1.62%. The estimated GC content was 34.10%. Variant analysis revealed that BARI Kanthal-3 includes 5.7 M (35%) simple and 10.4 M (65%) heterozygous single nucleotide polymorphisms (SNPs), and about 90% of all these polymorphisms are located in inter-genic regions. Conclusion: The whole-genome sequence of A. heterophyllus cv. BARI Kanthal-3 reveals an extremely high number of single nucleotide polymorphisms in inter-genic regions. The findings of this study will help to better understand the evolution, domestication, phylogenetic relationships, and year-round fruiting of this highly nutritious fruit crop, and will support marker development for its molecular breeding.

Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

PLoS ONE ◽

10.1371/journal.pone.0131925 ◽

2015 ◽

Vol 10 (7) ◽

pp. e0131925 ◽

Cited By ~ 10

Author(s):

Francesca Bertolini ◽

Concetta Scimone ◽

Claudia Geraci ◽

Giuseppina Schiavo ◽

Valerio Joe Utzeri ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Sequence Data ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Single Nucleotide ◽

Comparative Sequence ◽

Horse Genome ◽

Equus Asinus

Whole Genome Sequence Analysis of Brucella abortus Isolates from Various Regions of South Africa

Microorganisms ◽

10.3390/microorganisms9030570 ◽

2021 ◽

Vol 9 (3) ◽

pp. 570

Author(s):

Maphuti Betty Ledwaba ◽

Barbara Akorfa Glover ◽

Itumeleng Matle ◽

Giuseppe Profiti ◽

Pier Luigi Martelli ◽

...

Keyword(s):

South Africa ◽

Single Nucleotide Polymorphisms ◽

South African ◽

Genome Sequence ◽

Brucella Abortus ◽

Whole Genome Sequence ◽

Snp Analysis ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Single Nucleotide

The availability of whole genome sequences in public databases permits genome-wide comparative studies of various bacterial species. Whole genome sequence-single nucleotide polymorphisms (WGS-SNP) analysis has been used in recent studies and allows the discrimination of various Brucella species and strains. In the present study, 13 Brucella spp. strains from cattle of various locations in provinces of South Africa were typed and discriminated. WGS-SNP analysis indicated a maximum pairwise distance ranging from 4 to 77 single nucleotide polymorphisms (SNPs) between the South African Brucella abortus virulent field strains. Moreover, it was shown that the South African B. abortus strains grouped closely to B. abortus strains from Mozambique and Zimbabwe, as well as other Eurasian countries, such as Portugal and India. WGS-SNP analysis of South African B. abortus strains demonstrated that the same genotype circulated in one farm (Farm 1), whereas another farm (Farm 2) in the same province had two different genotypes. This indicated that brucellosis in South Africa spreads within the herd on some farms, whereas the introduction of infected animals is the mode of transmission on other farms. Three B. abortus vaccine S19 strains isolated from tissue and aborted material were identical, even though they originated from different herds and regions of South Africa. This might be due to the incorrect vaccination of animals older than the recommended age of 4–8 months or might be a problem associated with vaccine production.
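
The pairwise SNP distances reported in this abstract can be illustrated with a minimal sketch. It assumes the genomes have already been aligned to equal length and counts only positions where both sequences carry an unambiguous base; the strain names and sequences are hypothetical toy data, not the study's isolates or its method.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in valid and y in valid and x != y
    )


def pairwise_distances(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the SNP distance for every unordered pair of strains."""
    return {
        (p, q): snp_distance(seqs[p], seqs[q])
        for p, q in combinations(sorted(seqs), 2)
    }


# Hypothetical aligned toy sequences; N marks an ambiguous base call.
toy = {
    "strain_A": "ACGTACGT",
    "strain_B": "ACGTACGA",
    "strain_C": "ACTTNCGA",
}
distances = pairwise_distances(toy)
print(distances)
```

Skipping gaps and ambiguous calls keeps low-quality positions from inflating the distance, at the cost of ignoring real indels.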

Optimizing Sequencing Resources in Genotyped Livestock Populations Using Linear Programming

Frontiers in Genetics ◽

10.3389/fgene.2021.740340 ◽

2021 ◽

Vol 12 ◽

Author(s):

Hao Cheng ◽

Keyu Xu ◽

Jinghui Li ◽

Kuruvilla Joseph Abraham

Keyword(s):

Linear Programming ◽

Genome Sequence ◽

Sequence Data ◽

Low Cost ◽

Whole Genome Sequence ◽

Full Potential ◽

Whole Genome ◽

Efficient Allocation ◽

Nucleotide Polymorphisms ◽

Genome Sequence Data

Low-cost genome-wide single-nucleotide polymorphisms (SNPs) are routinely used in animal breeding programs. Compared to SNP arrays, the use of whole-genome sequence data generated by the next-generation sequencing technologies (NGS) has great potential in livestock populations. However, sequencing a large number of animals to exploit the full potential of whole-genome sequence data is not feasible. Thus, novel strategies are required for the allocation of sequencing resources in genotyped livestock populations such that the entire population can be imputed, maximizing the efficiency of whole genome sequencing budgets. We present two applications of linear programming for the efficient allocation of sequencing resources. The first application is to identify the minimum number of animals for sequencing subject to the criterion that each haplotype in the population is contained in at least one of the animals selected for sequencing. The second application is the selection of animals whose haplotypes include the largest possible proportion of common haplotypes present in the population, assuming a limited sequencing budget. Both applications are available in an open source program LPChoose. In both applications, LPChoose has similar or better performance than some other methods suggesting that linear programming methods offer great potential for the efficient allocation of sequencing resources. The utility of these methods can be increased through the development of improved heuristics.
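
The first application in this abstract, choosing the fewest animals such that every haplotype is carried by at least one of them, is a set-cover problem. The sketch below solves a toy instance by exhaustive search rather than by linear programming, so it is not how LPChoose works and it only scales to a handful of animals. The animal and haplotype labels and the function name `min_cover` are hypothetical.

```python
from itertools import combinations


def min_cover(animals: dict[str, set[str]]) -> list[str]:
    """Return a smallest set of animals whose haplotypes cover all
    haplotypes seen in the population (brute force, toy sizes only)."""
    universe = set().union(*animals.values())
    names = sorted(animals)
    # Try subsets of increasing size; the first full cover found is minimal.
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            covered = set().union(*(animals[name] for name in combo))
            if covered == universe:
                return list(combo)
    return []


# Hypothetical population: each animal carries two haplotypes.
population = {
    "a1": {"h1", "h2"},
    "a2": {"h2", "h3"},
    "a3": {"h3", "h4"},
    "a4": {"h1", "h4"},
}
selected = min_cover(population)
print(selected)
```

In the linear-programming form, each animal gets a 0/1 decision variable, the objective minimizes their sum, and each haplotype contributes one constraint requiring that at least one carrier is selected.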

The Complete Mitochondrial Genome Sequence and Characterization of Single-Nucleotide Polymorphisms in the Control Region of the Asian Seabass (Lates calcarifer)

Marine Biotechnology ◽

10.1007/s10126-005-5051-z ◽

2006 ◽

Vol 8 (1) ◽

pp. 71-79 ◽

Cited By ~ 37

Author(s):

G. Lin ◽

L. C. Lo ◽

Z. Y. Zhu ◽

F. Feng ◽

R. Chou ◽

...

Keyword(s):

Mitochondrial Genome ◽

Single Nucleotide Polymorphisms ◽

Control Region ◽

Genome Sequence ◽

Complete Mitochondrial Genome ◽

Nucleotide Polymorphisms ◽

Lates Calcarifer ◽

Single Nucleotide ◽

Asian Seabass

An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms

Comparative and Functional Genomics ◽

10.1155/2007/35604 ◽

2007 ◽

Vol 2007 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

B. Jayashree ◽

Manindra S. Hanspal ◽

Rajgopal Srinivasan ◽

R. Vigneshwaran ◽

Rajeev K. Varshney ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Open Source ◽

Open Source Software ◽

Large Scale ◽

Sequence Data ◽

Snp Genotyping ◽

Model Organisms ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Web Interfaces

The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.
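
A CAPS assay works when a SNP creates or abolishes a restriction site, so that the two alleles give different digestion patterns. The sketch below shows that check in isolation and is not the pipeline described in the abstract. The three recognition sequences are the standard ones for these enzymes; the allele sequences and function names are hypothetical.

```python
# Recognition sequences for three common restriction enzymes.
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}


def site_positions(seq: str, site: str) -> list[int]:
    """Return every start index at which `site` occurs in `seq`."""
    seq = seq.upper()
    return [
        i
        for i in range(len(seq) - len(site) + 1)
        if seq[i : i + len(site)] == site
    ]


def caps_candidates(allele_a: str, allele_b: str, enzymes=ENZYMES) -> list[str]:
    """List enzymes whose cut-site positions differ between the two alleles."""
    return sorted(
        name
        for name, site in enzymes.items()
        if site_positions(allele_a, site) != site_positions(allele_b, site)
    )


# Hypothetical alleles differing by one A/G SNP inside an EcoRI site.
allele_a = "TTAGGAATTCCGTA"
allele_b = "TTAGGAGTTCCGTA"
candidates = caps_candidates(allele_a, allele_b)
print(candidates)
```

Only the forward strand is scanned here, which is sufficient for palindromic sites such as these three; non-palindromic sites would also need the reverse complement checked.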
