Automated genotyping of microsatellite loci from feces with high throughput sequences

Microsatellite markers remain an important tool for ecological and evolutionary research, but are unavailable for many non-model organisms. One such organism with rare ecological and evolutionary features is the epizoic barnacle Chelonibia testudinaria (Linnaeus, 1758). Chelonibia testudinaria appears to be a host generalist, and has an unusual sexual system, androdioecy. Genetic studies on host specificity and mating behavior are impeded by the lack of fine-scale, highly variable markers, such as microsatellite markers. In the present study, we discovered thousands of new microsatellite loci from next-generation sequencing data, and characterized 12 loci thoroughly. We conclude that 11 of these loci will be useful markers in future ecological and evolutionary studies on C. testudinaria.

Benchmarking Variant Identification Tools for Plant Diversity Discovery

10.21203/rs.2.9666/v2 ◽

2019 ◽

Author(s):

Xing Wu ◽

Christopher Heffelfinger ◽

Hongyu Zhao ◽

Stephen L. Dellaporta

Keyword(s):

Next Generation Sequencing ◽

High Throughput Sequencing ◽

Crop Improvement ◽

Variant Calling ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Variant Discovery ◽

Variant Filtering ◽

Generation Sequencing

Background: The ability to accurately and comprehensively identify genomic variations is critical for plant studies utilizing high-throughput sequencing. Most bioinformatics tools for processing next-generation sequencing data were originally developed and tested in human studies, raising questions as to their efficacy for plant research. A detailed evaluation of the entire variant calling pipeline, including alignment, variant calling, variant filtering, and imputation, was performed on different programs using both simulated and real plant genomic datasets. Results: A comparison of SOAP2, Bowtie2, and BWA-MEM found that BWA-MEM was consistently able to align the most reads with high accuracy, whereas Bowtie2 had the highest overall accuracy. Comparative results of GATK HaplotypeCaller versus SAMtools mpileup indicated that the choice of variant caller affected precision and recall differentially depending on the levels of diversity, sequence coverage, and genome complexity. A cross-reference experiment of S. lycopersicum and S. pennellii reference genomes revealed the inadequacy of a single reference genome for variant discovery that includes distantly related plant individuals. A machine-learning-based variant filtering strategy outperformed the traditional hard-cutoff strategy, resulting in a higher number of true positive variants and fewer false positive variants. A 2-step imputation method, which utilized a set of high-confidence SNPs as the reference panel, showed up to 60% higher accuracy than direct LD-based imputation. Conclusions: Programs in the variant discovery pipeline perform differently on plant genomic datasets. The choice of programs depends on the goal of the study and the available resources. This study provides important guidance for plant biologists utilizing next-generation sequencing data for diversity characterization and crop improvement.
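
To make the precision and recall comparison in this abstract concrete, here is a minimal Python sketch of how a call set can be scored against a truth set before and after a hard-cutoff filter. It is not the authors' benchmarking code: the record layout, the function names, and the cutoffs (QUAL of at least 30, depth of at least 10) are assumptions chosen only for illustration, and the toy data are made up.

```python
def precision_recall(called, truth):
    """Precision and recall of a called variant set against a truth set."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


def hard_filter(records, min_qual=30.0, min_depth=10):
    """Keep variant keys whose QUAL and depth pass fixed (hypothetical) cutoffs."""
    return {
        r["key"] for r in records
        if r["qual"] >= min_qual and r["depth"] >= min_depth
    }


# Toy data: variants keyed by (chrom, pos, ref, alt); all values are illustrative.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
records = [
    {"key": ("chr1", 100, "A", "G"), "qual": 60.0, "depth": 25},
    {"key": ("chr1", 200, "C", "T"), "qual": 20.0, "depth": 30},  # true, but low QUAL
    {"key": ("chr2", 50, "G", "A"), "qual": 45.0, "depth": 12},
    {"key": ("chr2", 90, "T", "C"), "qual": 15.0, "depth": 4},    # false positive
]

raw = precision_recall({r["key"] for r in records}, truth)
filtered = precision_recall(hard_filter(records), truth)
```

In this toy case the hard cutoff removes the false positive but also discards one true variant, so precision rises while recall falls. This is the kind of trade-off that a fixed threshold imposes.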

MVSC: A Multi-variation Simulator of Cancer Genome

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207323666200317121136 ◽

2020 ◽

Vol 23 (4) ◽

pp. 326-333

Author(s):

Ning Li ◽

Jialiang Yang ◽

Wen Zhu ◽

Ying Liang

Keyword(s):

Next Generation Sequencing ◽

Genome Structure ◽

Cancer Genome ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Genomic Variants ◽

Software Packages ◽

Tumor Genome ◽

Generation Sequencing

Background: Many forms of variations exist in the genome, which are the main causes of individual phenotypic differences. The detection of variants, especially those located in the tumor genome, still faces many challenges due to the complexity of the genome structure. Thus, the performance assessment of variation detection tools using next-generation sequencing platforms is urgently needed. Method: We have created a software package called the Multi-Variation Simulator of Cancer genomes (MVSC) to simulate common genomic variants, including single nucleotide polymorphisms, small insertion and deletion polymorphisms, and structural variations (SVs), which are analogous to human somatically acquired variations. Three sets of variations embedded in genomic sequences in different periods were dynamically and sequentially simulated one by one. Results: In cancer genome simulation, complex SVs are important because this type of variation is characteristic of the tumor genome structure. Overlapping variations of different sizes can also coexist in the same genome regions, adding to the complexity of cancer genome architecture. Our results show that MVSC can efficiently simulate a variety of genomic variants that cannot be simulated by existing software packages. Conclusion: The MVSC-simulated variants can be used to assess the performance of existing tools designed to detect SVs in next-generation sequencing data, and we also find that MVSC is memory and time-efficient compared with similar software packages.
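
As a rough illustration of what simulating variants in a reference sequence involves, the sketch below introduces random single-nucleotide substitutions into a toy sequence and records each one. It is not MVSC and covers none of its indel or structural-variant logic; the function name, the seed handling, and the `(position, ref, alt)` record layout are assumptions made for this example.

```python
import random

BASES = "ACGT"


def simulate_snps(ref, n_snps, seed=0):
    """Return (mutated_sequence, variants) with n_snps random substitutions.

    Each variant is a (0-based position, reference base, alternate base) tuple.
    """
    rng = random.Random(seed)  # seeded so that a simulation can be reproduced
    positions = sorted(rng.sample(range(len(ref)), n_snps))
    seq = list(ref)
    variants = []
    for pos in positions:
        # Pick an alternate base that differs from the reference base.
        alt = rng.choice([b for b in BASES if b != seq[pos]])
        variants.append((pos, seq[pos], alt))
        seq[pos] = alt
    return "".join(seq), variants


reference = "ACGT" * 25  # toy 100 bp reference
mutated, variants = simulate_snps(reference, n_snps=10, seed=1)
```

The returned `variants` list serves as the ground truth against which a variant caller's output could later be compared.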

DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

DNA Research ◽

10.1093/dnares/dst017 ◽

2013 ◽

Vol 20 (4) ◽

pp. 383-390 ◽

Cited By ~ 51

Author(s):

H. Nagasaki ◽

T. Mochizuki ◽

Y. Kodama ◽

S. Saruhashi ◽

S. Morizaki ◽

...

Keyword(s):

Cloud Computing ◽

Next Generation Sequencing ◽

High Throughput ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

High Throughput Analysis ◽

Annotation Pipeline ◽

Throughput Analysis ◽

Generation Sequencing

Microsatellite loci discovery from next-generation sequencing data and marker characterization in the epizoic barnacle Chelonibia testudinaria (Linnaeus, 1758)

10.7287/peerj.preprints.1715v3 ◽

2016 ◽

Author(s):

Christine Ewers-Saucedo ◽

John D Zardus ◽

John P Wares

Keyword(s):

Next Generation Sequencing ◽

Microsatellite Loci ◽

Next Generation Sequencing Data ◽

Model Organisms ◽

Next Generation ◽

Sequencing Data ◽

Genetic Studies ◽

Evolutionary Features ◽

Evolutionary Studies ◽

Generation Sequencing

Microsatellite markers remain an important tool for ecological and evolutionary research, but are unavailable for many non-model organisms. One such organism with rare ecological and evolutionary features is the epizoic barnacle Chelonibia testudinaria (Linnaeus, 1758). Chelonibia testudinaria appears to be a host generalist, and has an unusual sexual system, androdioecy. Genetic studies on host specificity and mating behavior are impeded by the lack of fine-scale, highly variable markers. In the present study, we discovered thousands of new microsatellite loci from next-generation sequencing data, and characterized 12 loci thoroughly. We conclude that 11 of these loci will be useful markers in future ecological and evolutionary studies on C. testudinaria.
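
For readers unfamiliar with how microsatellite loci can be located in sequence data, the following Python sketch scans a DNA string for perfect tandem repeats using a regular expression. The abstract does not say which tool the authors used, so this is only an illustration of the general idea; the function name and the thresholds (repeat units of 2 to 6 bp, at least 5 copies) are assumptions.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    start is 0-based; homopolymer units such as "AA" are skipped.
    """
    seq = seq.upper()
    # A short unit (lazy match) followed by at least min_repeats - 1 more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # a run of one base is not treated as a microsatellite here
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


example = "GGG" + "AC" * 8 + "CCC" + "GAT" * 5 + "AA"
loci = find_microsatellites(example)
```

Real microsatellite discovery also has to handle imperfect repeats, low-quality reads, and flanking regions suitable for primer design, none of which this sketch attempts.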

Benchmarking variant identification tools for plant diversity discovery

BMC Genomics ◽

10.1186/s12864-019-6057-7 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 7

Author(s):

Xing Wu ◽

Christopher Heffelfinger ◽

Hongyu Zhao ◽

Stephen L. Dellaporta

Keyword(s):

Next Generation Sequencing ◽

High Throughput Sequencing ◽

Crop Improvement ◽

Variant Calling ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Variant Discovery ◽

Variant Filtering ◽

Generation Sequencing

Background: The ability to accurately and comprehensively identify genomic variations is critical for plant studies utilizing high-throughput sequencing. Most bioinformatics tools for processing next-generation sequencing data were originally developed and tested in human studies, raising questions as to their efficacy for plant research. A detailed evaluation of the entire variant calling pipeline, including alignment, variant calling, variant filtering, and imputation, was performed on different programs using both simulated and real plant genomic datasets. Results: A comparison of SOAP2, Bowtie2, and BWA-MEM found that BWA-MEM was consistently able to align the most reads with high accuracy, whereas Bowtie2 had the highest overall accuracy. Comparative results of GATK HaplotypeCaller versus SAMtools mpileup indicated that the choice of variant caller affected precision and recall differentially depending on the levels of diversity, sequence coverage, and genome complexity. A cross-reference experiment of S. lycopersicum and S. pennellii reference genomes revealed the inadequacy of a single reference genome for variant discovery that includes distantly related plant individuals. A machine-learning-based variant filtering strategy outperformed the traditional hard-cutoff strategy, resulting in a higher number of true positive variants and fewer false positive variants. A 2-step imputation method, which utilized a set of high-confidence SNPs as the reference panel, showed up to 60% higher accuracy than direct LD-based imputation. Conclusions: Programs in the variant discovery pipeline perform differently on plant genomic datasets. The choice of programs depends on the goal of the study and the available resources. This study provides important guidance for plant biologists utilizing next-generation sequencing data for diversity characterization and crop improvement.
