LMAP_S: Lightweight Multigene Alignment and Phylogeny eStimation

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Emanuel Maldonado ◽  
Agostinho Antunes

Abstract Background Recent advances in genome sequencing technologies and the falling cost of high-throughput sequencing continue to give rise to a deluge of data available for downstream analyses. Evolutionary biologists, among others, often use genomic data to uncover phenotypic diversity and adaptive evolution in protein-coding genes. Multiple sequence alignments (MSA) and phylogenetic trees (PT) therefore need to be estimated with optimal results. However, preparing an initial dataset of multiple sequence file(s) (MSF), and the steps that follow, can be challenging for extensive amounts of data. A tool is thus needed that removes this potential source of error and automates the time-consuming steps of a typical workflow, with high-throughput and optimal MSA and PT estimation. Results We introduce LMAP_S (Lightweight Multigene Alignment and Phylogeny eStimation), a user-friendly command-line and interactive package designed to handle an improved alignment and phylogeny estimation workflow: MSF preparation, MSA estimation, outlier detection, refinement, consensus, phylogeny estimation, comparison and editing. Within this workflow, file and directory organization, execution, and manipulation of information are automated, with minimal manual user intervention. LMAP_S was developed for the multi-core workstation environment and provides a unique advantage for processing multiple datasets. Our software proved to be efficient throughout the workflow, including the (unlimited) handling of more than 20 datasets. Conclusions We have developed a simple and versatile package, LMAP_S, that enables researchers to effectively estimate MSAs and PTs for multiple datasets in a high-throughput fashion. LMAP_S integrates more than 25 software tools, providing more than 65 algorithm choices overall, distributed across five stages. At minimum, one FASTA file is required within a single input directory. To our knowledge, no other software combines MSA and phylogeny estimation with as many alternatives and provides means to find optimal MSAs and phylogenies. Moreover, we used a case study comparing methodologies that highlighted the usefulness of our software. LMAP_S has been developed as an open-source package, allowing its integration into more complex open-source bioinformatics pipelines. The LMAP_S package is released under the GPLv3 license and is freely available at https://lmap-s.sourceforge.io/.

2020 ◽  
Author(s):  
Dustin J. Wcisel ◽  
J. Thomas Howard ◽  
Jeffrey A. Yoder ◽  
Alex Dornburg

Abstract Background Advances in next-generation sequencing technologies have reduced the cost of whole transcriptome analyses, allowing characterization of non-model species at unprecedented levels. The rapid pace of transcriptomic sequencing has driven the public accumulation of a wealth of data for phylogenomic analyses; however, a lack of tools that help phylogeneticists efficiently identify orthologous sequences currently hinders effective harnessing of this resource. Results We introduce TOAST, an open-source R software package that can utilize the ortholog searches based on the software Benchmarking Universal Single-Copy Orthologs (BUSCO) to assemble multiple sequence alignments of orthologous loci from transcriptomes for any group of organisms. By streamlining search, query, and alignment, TOAST automates the generation of locus and concatenated alignments, and also presents a series of outputs from which users can not only explore missing data patterns across their alignments, but also reassemble alignments based on user-defined acceptable missing data levels for a given research question. Conclusions TOAST provides a comprehensive set of tools for assembly of sequence alignments of orthologs for comparative transcriptomic and phylogenomic studies. This software enables easy assembly of public and novel sequences for any target database of candidate orthologs, and fills a critically needed niche for tools that enable quantification and testing of the impact of missing data. As open-source software, TOAST is fully customizable for integration into existing or novel custom informatic pipelines for phylogenomic inference.
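
The idea of reassembling an alignment under a user-defined missing-data threshold can be illustrated with a minimal sketch. This is not TOAST's code or API (TOAST is an R package); it is a generic Python illustration, and the function names, the toy alignment, and the choice of `-` and `?` as missing-data characters are all assumptions made for the example.

```python
def missing_fraction(seq: str, missing_chars: str = "-?") -> float:
    """Return the fraction of alignment columns that are missing for one taxon."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(ch in missing_chars for ch in seq) / len(seq)


def filter_by_missing(alignment: dict[str, str], max_missing: float) -> dict[str, str]:
    """Keep only taxa whose missing-data fraction is at or below max_missing."""
    return {
        taxon: seq
        for taxon, seq in alignment.items()
        if missing_fraction(seq) <= max_missing
    }


# Hypothetical toy alignment; taxon names and sequences are made up.
toy_alignment = {
    "taxon_1": "ACGT----",  # half of the columns are gaps
    "taxon_2": "ACGTACGT",  # complete
    "taxon_3": "AC??ACGT",  # a quarter of the columns are unknown
}
kept = filter_by_missing(toy_alignment, max_missing=0.3)
```

With a threshold of 0.3, the half-empty `taxon_1` is dropped and the other two taxa are retained; raising or lowering `max_missing` changes which taxa survive.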


2019 ◽  
Author(s):  
Alex Dornburg ◽  
Dustin J. Wcisel ◽  
J. Thomas Howard ◽  
Jeffrey A. Yoder

Abstract Background Advances in next-generation sequencing technologies have reduced the cost of whole transcriptome analyses, allowing characterization of non-model species at unprecedented levels. The rapid pace of transcriptomic sequencing has driven the public accumulation of a wealth of data for phylogenomic analyses; however, a lack of tools that help phylogeneticists efficiently identify orthologous sequences currently hinders effective harnessing of this resource. Results We introduce TOAST, an open-source R software package that can utilize the ortholog searches based on the software Benchmarking Universal Single-Copy Orthologs (BUSCO) to assemble multiple sequence alignments of orthologous loci from transcriptomes for any group of organisms. By streamlining search, query, and alignment, TOAST automates the generation of locus and concatenated alignments, and also presents a series of outputs from which users can not only explore missing data patterns across their alignments, but also reassemble alignments based on user-defined acceptable missing data levels for a given research question. Conclusions TOAST provides a comprehensive set of tools for assembly of sequence alignments of orthologs for comparative transcriptomic and phylogenomic studies. This software enables easy assembly of public and novel sequences for any target database of candidate orthologs, and fills a critically needed niche for tools that enable quantification and testing of the impact of missing data. As open-source software, TOAST is fully customizable for integration into existing or novel custom informatic pipelines for phylogenomic inference.


2020 ◽  
Author(s):  
Alex Dornburg ◽  
Dustin J. Wcisel ◽  
J. Thomas Howard ◽  
Jeffrey A. Yoder

Abstract Background: Advances in next-generation sequencing technologies have reduced the cost of whole transcriptome analyses, allowing characterization of non-model species at unprecedented levels. The rapid pace of transcriptomic sequencing has driven the public accumulation of a wealth of data for phylogenomic analyses; however, a lack of tools that help phylogeneticists efficiently identify orthologous sequences currently hinders effective harnessing of this resource. Results: We introduce TOAST, an open-source R software package that can utilize the ortholog searches based on the software Benchmarking Universal Single-Copy Orthologs (BUSCO) to assemble multiple sequence alignments of orthologous loci from transcriptomes for any group of organisms. By streamlining search, query, and alignment, TOAST automates the generation of locus and concatenated alignments, and also presents a series of outputs from which users can not only explore missing data patterns across their alignments, but also reassemble alignments based on user-defined acceptable missing data levels for a given research question. Conclusions: TOAST provides a comprehensive set of tools for assembly of sequence alignments of orthologs for comparative transcriptomic and phylogenomic studies. This software enables easy assembly of public and novel sequences for any target database of candidate orthologs, and fills a critically needed niche for tools that enable quantification and testing of the impact of missing data. As open-source software, TOAST is fully customizable for integration into existing or novel custom informatic pipelines for phylogenomic inference. Software, a detailed manual, and example data files are available through GitHub: carolinafishes.github.io


Author(s):  
Stella C. Yuan ◽  
Eric Malekos ◽  
Melissa T. R. Hawkins

Abstract The use of museum specimens held in natural history repositories for population and conservation genetic research is increasing in tandem with the use of massively parallel sequencing technologies. Short Tandem Repeats (STRs), or microsatellite loci, are commonly used genetic markers in wildlife and population genetic studies. However, they have traditionally suffered from a host of issues including length homoplasy, high costs, low throughput, and difficulties in reproducibility across laboratories. Massively parallel sequencing technologies can address these problems, but DNA derived from museum specimens suffers from significant fragmentation and exogenous DNA contamination. Combatting these issues requires extra measures of stringency in the lab and during data analysis, yet no high-throughput sequencing study has evaluated microsatellite allelic dropout in DNA extracted from museum specimens. In this study, we evaluate genotyping errors derived from mammalian museum skin DNA extracts for previously characterized microsatellites across PCR replicates utilizing high-throughput sequencing. We found it useful to classify samples based on DNA concentration, which determined the rate at which genotypes were accurately recovered. Longer microsatellites performed worse in all museum specimens. Allelic dropout rates across loci were dependent on sample quantity, with high-concentration museum specimens performing as well as the frozen tissue sample and recovering quality metrics nearly as high. Based on our results, we provide a set of best practices for quality assurance and incorporation of reliable genotypes from museum specimens.


2019 ◽  
Author(s):  
Reneth Millas ◽  
Mary Espina ◽  
CM Sabbir Ahmed ◽  
Angelina Bernardini ◽  
Ekundayo Adeleke ◽  
...  

ABSTRACT One of the most important tools in genetic improvement is mutagenesis, which is used to induce genetic and phenotypic variation for trait improvement and the discovery of novel genes. A JTN-5203 (MG V) mutant population was generated using induced ethyl methane sulfonate (EMS) mutagenesis and was used to detect induced mutations in the FAD2-1A and FAD2-1B genes using a reverse genetics approach. An optimum concentration of EMS was used to treat 15,000 bulk JTN-5203 seeds, producing an M2 population of 1,820 individuals. DNA was extracted, normalized, and pooled from these individuals. Specific primers were designed from the FAD2-1A and FAD2-1B genes, which are involved in the fatty acid biosynthesis pathway, for further analysis using next-generation sequencing. High-throughput mutation discovery through a TILLING-by-Sequencing approach was used to detect novel allelic variations in this population. Several mutations and allelic variations with high impacts were detected for FAD2-1A and FAD2-1B. These include GC to AT transition mutations in FAD2-1A (20%) and FAD2-1B (69%). Mutation density for this population is estimated to be about 1/136 kb. Through mutagenesis and high-throughput sequencing technologies, novel alleles underlying the mutations observed in mutants with reduced polyunsaturated fatty acids will be identified, and these mutants can be further used in breeding soybean lines with an improved fatty acid profile, thereby developing heart-healthy soybeans.


Author(s):  
AA Kliuchnikova ◽  
SA Moshkovskii

Adenosine-to-inosine (A-to-I) RNA editing is a common mechanism of post-transcriptional modification in many metazoans, including vertebrates; the process is catalyzed by adenosine deaminases acting on RNA (ADARs). High-throughput sequencing technologies have revealed thousands of RNA editing sites throughout the human transcriptome; however, their functions are still poorly understood. The aim of this brief review is to draw the attention of clinicians and biomedical researchers to the phenomenon of ADAR-mediated RNA editing and its possible implications in the development of neuropathologies, antiviral immune responses and cancer.


2021 ◽  
Vol 4 ◽  
Author(s):  
Dalila Destanović ◽  
Lejla Ušanović ◽  
Lejla Lasić ◽  
Jasna Hanjalić ◽  
Belma Kalamujić Stroil

Chaetopteryx villosa (Fabricius, 1798) is a caddisfly species distributed throughout Europe, except in the Balkan and Apennine Peninsulas. However, phylogenetically close species belonging to the C. villosa group are widespread throughout Europe. Species of this group (C. villosa, C. gessneri, C. fusca, C. sahlbergi, C. atlantica, C. bosniaca, C. vulture, and C. trinacriae) have distinct distributions with some overlaps. Adult forms of these species are morphologically similar, whereas larval morphology is known for only some species. There are also indications of species hybridization (e.g., C. villosa x fusca). Presumably, a molecular approach to species determination in this group would be highly beneficial. In the BOLD database, there are 154 specimens of C. villosa with COI-5P barcodes. Of the remaining species, C. sahlbergi has 27 specimens with a barcode, C. fusca 20, C. gessneri 5, C. bosniaca 5, and C. atlantica 1, whereas sequences from C. vulture and C. trinacriae are missing. Therefore, we tested the discriminatory power of the COI-5P marker, the most common barcoding marker for species identification in animals, in the C. villosa group. Only sequences from public records originating from experienced research groups or taxonomists and containing a specimen photograph were taken as input. A total of 75 sequences from the BOLD database were obtained. Of these sequences, 11 belonged to C. fusca, 5 to C. gessneri, 52 to C. villosa, 5 to C. bosniaca, and 2 to C. sahlbergi. For the generation of overview trees, COI-5P barcodes of Rhyacophila fasciata and Rh. nubila were used as outgroups. All sequences were trimmed at the 5’ and 3’ ends, resulting in a final alignment length of 516 base pairs. Multiple sequence alignments and editing were done in the MEGA-X software. Analysis of nucleotide polymorphism was done in the DNASP6 software. MEGA-X was used to calculate the pairwise distance and overall mean p-distance, and to construct the overview trees. Analysis of DNA polymorphism revealed 14 haplotypes of C. villosa, 3 haplotypes of C. fusca, 2 haplotypes of C. gessneri, and one each for C. bosniaca and C. sahlbergi. There were no significant interspecific and intraspecific differences among haplotypes based on pairwise distances. The p-distance between one of the haplotypes of C. fusca and C. villosa was 0.000, whereas the p-distance among haplotypes of C. villosa varied from 0.001 to about 0.055. The mean overall p-distance among haplotypes of all species equaled 0.03. No species-specific clusters were observed when phylogenetic trees were constructed, except for C. gessneri, regardless of the method used (i.e., NJ, UPGMA, ML, ME, or MP). To minimize the possibility of species misidentification, we used only records submitted by NTNU-Norwegian University of Science and Technology (Norway), SNSB-Zoologische Staatssammlung Muenchen (Germany), Zoologisches Forschungsmuseum Alexander Koenig (Germany), University of Oulu, Zoological Museum (Finland), Prof. Hans Malicky and Prof. Mladen Kučinić. No records identified as hybrids were included in the analyses. With the exception of C. gessneri, the COI-5P marker failed to separate the species of the C. villosa group. However, it is highly unlikely that poor species determination was the basis for such a result. To enable a comprehensive and unbiased evaluation of the relationships within this group, data coverage in the BOLD database for most of the studied species should be enhanced, encompassing the different geographical distributions of samples. Further studies are needed to identify the array of molecular markers suitable for species delineation in a complex group such as C. villosa.
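
For readers unfamiliar with the p-distance reported above, the following minimal Python sketch shows how it is commonly defined: the proportion of compared sites at which two aligned sequences differ. It is not the authors' MEGA-X workflow. The function names and toy haplotypes are hypothetical, and skipping sites with gaps or ambiguous bases pair by pair is an assumption that may differ from the settings used in the study.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (pairwise deletion), which is an assumption of this sketch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_p_distance(seqs) -> float:
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 10-bp haplotypes, made up for illustration only.
haplotypes = {
    "hap_A": "ACGTACGTAC",
    "hap_B": "ACGTACGTAC",  # identical to hap_A
    "hap_C": "ACGTTCGTAC",  # differs from hap_A at one site
}
d_ab = p_distance(haplotypes["hap_A"], haplotypes["hap_B"])
d_ac = p_distance(haplotypes["hap_A"], haplotypes["hap_C"])
overall = mean_p_distance(haplotypes.values())
```

Two identical haplotypes give a p-distance of zero, which is the situation the abstract describes for one C. fusca haplotype and one C. villosa haplotype.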


2019 ◽  
Vol 16 (1) ◽  
Author(s):  
Jati Adiputra ◽  
Sridhar Jarugula ◽  
Rayapati A. Naidu

Abstract Background Grapevine leafroll disease is one of the most economically important viral diseases affecting grape production worldwide. Grapevine leafroll-associated virus 4 (GLRaV-4, genus Ampelovirus, family Closteroviridae) is one of the six GLRaV species documented in grapevines (Vitis spp.). GLRaV-4 is made up of several distinct strains that were previously considered putative species. Currently known strains of GLRaV-4 stand apart from other GLRaV species in lacking the minor coat protein. Methods In this study, the complete genome sequences of three strains of GLRaV-4 from Washington State vineyards were determined using a combination of high-throughput sequencing, Sanger sequencing and RACE. The genome sequences of these three strains were compared with corresponding sequences of GLRaV-4 strains reported from other grapevine-growing regions. Phylogenetic analysis, SimPlot, and the Recombination Detection Program (RDP) were used to identify putative recombination events among GLRaV-4 strains. Results The genome sizes of GLRaV-4 strain 4 (isolate WAMR-4), strain 5 (isolate WASB-5) and strain 9 (isolate WALA-9) from Washington State vineyards were determined to be 13,824 nucleotides (nt), 13,820 nt, and 13,850 nt, respectively. Multiple sequence alignments showed that an 11-nt sequence (5′-GTAATCTTTTG-3′) towards the 5′ terminus of the 5′ non-translated region (NTR) and a 10-nt sequence (5′-ATCCAGGACC-3′) towards the 3′ end of the 3′ NTR are conserved among the currently known GLRaV-4 strains. The LR-106 isolate of strain 4 and the Estellat isolate of strain 6 were identified as recombinants due to putative recombination events involving divergent sequences in the ORF1a from strain 5 and strain Pr. Conclusion Genome-wide analyses showed for the first time that recombination can occur between distinct strains of GLRaV-4, resulting in the emergence of genetically stable and biologically successful chimeric viruses. Although the origin of recombinant strains of GLRaV-4 remains elusive, intra-species recombination could be playing an important role in shaping the genetic diversity and evolution of the virus and modulating the biology and epidemiology of GLRaV-4 strains.

