Fast sequence-based microsatellite genotyping development workflow

2019 ◽  
Author(s):  
Olivier Lepais ◽  
Emilie Chancerel ◽  
Christophe Boury ◽  
Franck Salin ◽  
Aurélie Manicki ◽  
...  

Application of high-throughput sequencing technologies to microsatellite genotyping (SSRseq) has been shown to remove many of the limitations of electrophoresis-based methods and to refine inference of population genetic diversity and structure. We present here a streamlined SSRseq development workflow that includes microsatellite development, multiplexed marker amplification and sequencing, and automated bioinformatics data analysis. We illustrate its application to five groups of species across phyla (fungi, plant, insect and fish) with different levels of genomic resource availability. We found that relying on previously developed microsatellite assays is not optimal and results in a low number of reliably genotyped loci. In contrast, de novo ad hoc primer design gives highly multiplexed microsatellite assays that can be sequenced to produce high-quality genotypes for 20 to 40 loci. We highlight critical upfront development factors to consider for effective SSRseq setup in a wide range of situations. Sequence analysis accounting for all linked polymorphisms along the sequence quickly generates a powerful multi-allelic haplotype-based genotypic dataset, calling for new theoretical and analytical frameworks to extract more information from multi-nucleotide polymorphism marker systems.

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9085 ◽  
Author(s):  
Olivier Lepais ◽  
Emilie Chancerel ◽  
Christophe Boury ◽  
Franck Salin ◽  
Aurélie Manicki ◽  
...  

Application of high-throughput sequencing technologies to microsatellite genotyping (SSRseq) has been shown to remove many of the limitations of electrophoresis-based methods and to refine inference of population genetic diversity and structure. We present here a streamlined SSRseq development workflow that includes microsatellite development, multiplexed marker amplification and sequencing, and automated bioinformatics data analysis. We illustrate its application to five groups of species across phyla (fungi, plant, insect and fish) with different levels of genomic resource availability. We found that relying on previously developed microsatellite assays is not optimal and results in a low number of reliably genotyped loci. In contrast, de novo ad hoc primer design gives highly multiplexed microsatellite assays that can be sequenced to produce high-quality genotypes for 20–40 loci. We highlight critical upfront development factors to consider for effective SSRseq setup in a wide range of situations. Sequence analysis accounting for all linked polymorphisms along the sequence quickly generates a powerful multi-allelic haplotype-based genotypic dataset, calling for new theoretical and analytical frameworks to extract more information from multi-nucleotide polymorphism marker systems.


2018 ◽  
Author(s):  
Adrian Fritz ◽  
Peter Hofmann ◽  
Stephan Majda ◽  
Eik Dahms ◽  
Johannes Dröge ◽  
...  

Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets is required. Here, we describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series and differential abundance studies, includes real and simulated strain-level diversity, and generates second and third generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with truth standards for method evaluation. All data sets and the software are freely available at: https://github.com/CAMI-challenge/CAMISIM


Author(s):  
Yuansheng Liu ◽  
Xiaocai Zhang ◽  
Quan Zou ◽  
Xiangxiang Zeng

Summary: Removing duplicate and near-duplicate reads generated by high-throughput sequencing technologies can reduce the computational resources needed by downstream applications. Here we develop minirmd, a de novo tool to remove duplicate reads via multiple rounds of clustering using minimizers of different lengths. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on the reverse-complementary strand. Availability and implementation: https://github.com/yuansliu/minirmd. Supplementary information: Supplementary data are available at Bioinformatics online.
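To make the reverse-complement point concrete, the following minimal Python sketch removes exact duplicates while treating a read and its reverse complement as the same sequence. It is an illustrative simplification, not minirmd's method: it handles only exact matches and does no minimizer-based clustering, and the function names (`reverse_complement`, `canonical`, `remove_exact_duplicates`) are hypothetical.

```python
# Illustrative sketch only: exact-duplicate removal that is aware of the
# reverse-complementary strand. minirmd itself targets near-duplicates via
# minimizer-based clustering, which is NOT implemented here.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Return the lexicographically smaller of a read and its reverse complement."""
    return min(seq, reverse_complement(seq))


def remove_exact_duplicates(reads):
    """Keep the first occurrence of each read, counting both strands as one."""
    seen = set()
    kept = []
    for read in reads:
        key = canonical(read)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# "CGTT" is the reverse complement of "AACG", so it is dropped as a duplicate.
reads = ["AACG", "CGTT", "AACG", "GGGA"]
unique_reads = remove_exact_duplicates(reads)
```

Keying on a canonical form means strand orientation never has to be tested pairwise; extending this to near-duplicates would require a similarity criterion that the abstract does not specify.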


Author(s):  
Jasper Götting ◽  
Katrin Lazar ◽  
Nicolás M. Suárez ◽  
Lars Steinbrück ◽  
Tabea Rabe ◽  
...  

Reactivation and shedding of human cytomegalovirus (HCMV) in breast milk during lactation is highly frequent in HCMV-seropositive mothers. This represents a key transmission route for postnatal HCMV infection and can lead to severe disease in preterm neonates. Little is known about HCMV strain composition or longitudinal intrahost viral population dynamics in breast milk from immunocompetent women. We performed HCMV-specific target enrichment and high-throughput sequencing of 38 breast milk samples obtained in Germany between days 10 and 60 postpartum from 15 mothers with HCMV DNA lactia, and assembled HCMV consensus sequences de novo. The genotype distribution and number of HCMV strains present in each sample were determined by quantifying genotype-specific sequence motifs in 12 hypervariable viral genes, revealing a wide range of genotypes (82/109) for these genes in the cohort and a unique, longitudinally stable strain composition in each mother. Reactivation of up to three distinct HCMV strains was detected in 8 of 15 mothers, indicating that a representative subset of the woman’s HCMV reservoir might be locally reactivated early during lactation. As described previously, nucleotide diversity of samples with multiple strains was much higher than that of samples with single strains. Breast milk as a main source of postnatal mother-to-infant transmission may serve as a repository for viral diversity and thus play an essential role in the natural epidemiology of HCMV.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1229
Author(s):  
David Salgado ◽  
Irina M. Armean ◽  
Michael Baudis ◽  
Sergi Beltran ◽  
Salvador Capella-Gutierrez ◽  
...  

Copy number variations (CNVs) are major causative contributors to the genesis of both genetic diseases and human neoplasias. While “High-Throughput” sequencing technologies are increasingly becoming the primary choice for genomic screening analysis, their ability to efficiently detect CNVs is still heterogeneous and remains to be developed. The aim of this white paper is to provide a guiding framework for the future contributions of ELIXIR’s recently established human CNV Community, with implications beyond human disease diagnostics and population genomics. This white paper is the direct result of a strategy meeting that took place in September 2018 in Hinxton (UK) and involved representatives of 11 ELIXIR Nodes. The meeting led to the definition of priority objectives and tasks to address a wide range of CNV-related challenges, ranging from detection and interpretation to sharing and training. Here, we provide suggestions on how to align these tasks within the ELIXIR Platforms strategy, and on how to frame the activities of this new ELIXIR Community in the international context.


2015 ◽  
Author(s):  
Bong-Hyun Kim ◽  
Jiali Zhuang ◽  
Jie Wang ◽  
Zhiping Weng

Summary: High-throughput sequencing technologies such as ChIP-seq have deepened our understanding of many biological processes. De novo motif search is one of the key downstream computational analyses following ChIP-seq experiments, and several algorithms have been proposed for this purpose. However, most web-based systems do not perform independent filtering or enrichment analyses to ensure the quality of the discovered motifs. Here, we developed a web server, Factorbook Motif Pipeline, based on an algorithm used in analyzing ENCODE consortium ChIP-seq datasets. It performs comprehensive analysis on the set of peaks detected from a ChIP-seq experiment: (i) de novo motif discovery; (ii) independent composition and bias analyses; and (iii) matching to the annotated motifs. The statistical tests employed in our pipeline provide a reliable measure of confidence as to how significant the motifs reported in the discovery step are. Availability: Factorbook Motif Pipeline source code is accessible through the following URL: https://github.com/joshuabhk/factorbook-motif-pipeline


Marine Drugs ◽  
2019 ◽  
Vol 17 (8) ◽  
pp. 466 ◽  
Author(s):  
Ronghua Li ◽  
Michaël Bekaert ◽  
Luning Wu ◽  
Changkao Mu ◽  
Weiwei Song ◽  
...  

The marine gastropod Hemifusus tuba is served as a luxury food in Asian countries and used in traditional Chinese medicine to treat lumbago and deafness. The lack of genomic data on H. tuba is a barrier to aquaculture development, and the functional characteristics of potential bioactive molecules are poorly understood. In the present study, we used high-throughput sequencing technologies to generate the first transcriptomic database of H. tuba. A total of 41 unique conopeptides were retrieved from 44 unigenes, containing 6-cysteine frameworks belonging to four superfamilies. Duplication of mature regions and alternative splicing were also found in some of the conopeptides, and the de novo assembly identified a total of 76,306 transcripts with an average length of 824.6 nt, of which 75,620 (99.1%) were annotated. In addition, simple sequence repeat (SSR) detection identified 14,000 unigenes containing 20,735 SSRs, among which 23 polymorphic SSRs were screened. Thirteen of these markers could be amplified in Hemifusus ternatanus and seven in Rapana venosa. This study provides the first report of conopeptide genes in Buccinidae, as well as genomic resources for further drug development, gene discovery and population resource studies of this species.


2019 ◽  
Vol 5 (2) ◽  
pp. 38 ◽  
Author(s):  
Yelyzaveta Shlyakhtina ◽  
Katherine L. Moran ◽  
Maximiliano M. Portal

During the last decade, and mainly primed by major developments in high-throughput sequencing technologies, the catalogue of RNA molecules harbouring regulatory functions has increased at a steady pace. Current evidence indicates that hundreds of mammalian RNAs have regulatory roles at several levels, including transcription, translation/post-translation, chromatin structure, and nuclear architecture, thus suggesting that RNA molecules are indeed mighty controllers in the flow of biological information. Therefore, it is logical to suggest that there must exist a series of molecular systems that safeguard the faithful inheritance of RNA content throughout cell division and that those mechanisms must be tightly controlled to ensure the successful segregation of key molecules to the progeny. Interestingly, whilst a handful of integral components of mammalian cells seem to follow a general pattern of asymmetric inheritance throughout division, the fate of RNA molecules largely remains a mystery. Herein, we will discuss current concepts of asymmetric inheritance in a wide range of systems, including prions, proteins, and finally RNA molecules, to assess the overall biological impact of RNA inheritance on cellular plasticity and evolutionary fitness.


2018 ◽  
Vol 35 (12) ◽  
pp. 2066-2074 ◽  
Author(s):  
Yuansheng Liu ◽  
Zuguo Yu ◽  
Marcel E Dinger ◽  
Jinyan Li

Motivation: Advanced high-throughput sequencing technologies have produced massive amounts of read data, and algorithms have been specially designed to reduce the size of these datasets for efficient storage and transmission. Reordering reads with regard to their positions in de novo assembled contigs or in explicit reference sequences has been proven to be one of the most effective read compression approaches. As there is usually no good prior knowledge about the reference sequence, the current focus is on the novel construction of de novo assembled contigs. Results: We introduce a new de novo compression algorithm named minicom. This algorithm uses large k-minimizers to index the reads and subgroup those that have the same minimizer. Within each subgroup, a contig is constructed. Then some pairs of the contigs derived from the subgroups are merged into longer contigs according to a (w, k)-minimizer-indexed suffix–prefix overlap similarity between two contigs. This merging process is repeated after the longer contigs are formed until no pair of contigs can be merged. We compare the performance of minicom with two reference-based methods and four de novo methods on 18 datasets (13 RNA-seq datasets and 5 whole genome sequencing datasets). In the compression of single-end reads, minicom obtained the smallest file size for 22 of 34 cases with significant improvement. In the compression of paired-end reads, minicom achieved 20–80% compression gain over the best state-of-the-art algorithm. Our method also achieved a 10% size reduction of compressed files in comparison with the best algorithm under the reads-order preserving mode. These excellent performances are mainly attributed to the exploitation of the redundancy of the repetitive substrings in the long contigs. Availability and implementation: https://github.com/yuansliu/minicom Supplementary information: Supplementary data are available at Bioinformatics online.
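The first step described in this abstract, indexing reads by a minimizer and grouping those that share it, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not minicom's implementation: the minimizer is taken to be the lexicographically smallest k-mer of the whole read, reverse complements are ignored, and the contig construction and merging steps are omitted. The names `minimizer` and `group_by_minimizer` are hypothetical.

```python
from collections import defaultdict

# Illustrative sketch only. Assumption: the minimizer of a read is its
# lexicographically smallest k-mer. minicom's actual ordering, its choice
# of k, and its contig-building steps are not reproduced here.


def minimizer(read: str, k: int) -> str:
    """Return the lexicographically smallest k-mer contained in the read."""
    if len(read) < k:
        raise ValueError("read is shorter than k")
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def group_by_minimizer(reads, k: int):
    """Bucket reads so that reads sharing a minimizer land in one subgroup."""
    groups = defaultdict(list)
    for read in reads:
        groups[minimizer(read, k)].append(read)
    return dict(groups)


# The first two reads overlap and share the 4-mer "ACGT" as their minimizer;
# the third read falls into its own subgroup.
reads = ["GGACGTTG", "GACGTTGC", "TTTTGGGG"]
groups = group_by_minimizer(reads, 4)
```

Overlapping reads tend to contain the same smallest k-mer, which is why bucketing on it is a cheap way to collect candidates for one contig without comparing every pair of reads.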


2018 ◽  
Author(s):  
Guanliang Meng ◽  
Yiyuan Li ◽  
Chentao Yang ◽  
Shanlin Liu

The mitochondrial genome (mitogenome) plays important roles in evolutionary and ecological studies. With the advent of High Throughput Sequencing (HTS) technologies, it has become routine to use multiple mitogenome genes or entire mitogenomes to investigate the phylogeny and biodiversity of focal groups. We developed a mitogenome toolkit, MitoZ, consisting of independent modules for de novo assembly, findMitoScaf, annotation and visualization, that can generate a mitogenome assembly together with annotation and visualization results from HTS raw reads. We evaluated its performance using a total of 50 samples whose mitogenomes are publicly available. The results showed that MitoZ can recover more full-length mitogenomes with higher accuracy than the other available mitogenome assemblers. Overall, MitoZ provides a one-click solution to construct the annotated mitogenome from HTS raw data and will facilitate large scale ecological and evolutionary studies. MitoZ is free open source software distributed under the GPLv3 license and available at https://github.com/linzhi2013/MitoZ.

