Revising transcriptome assemblies with phylogenetic information in Agalma1.0

2017 ◽  
Author(s):  
August Guang ◽  
Mark Howison ◽  
Felipe Zapata ◽  
Charles Lawrence ◽  
Casey Dunn

Motivation: One of the most common transcriptome assembly errors is to mistake different transcripts of the same gene for transcripts from multiple closely related genes. It is difficult to identify these errors during assembly, but in a phylogenetic analysis they can be diagnosed from gene trees containing clades of tips from the same species with improbably short branch lengths. Results: treeinform is a module implemented in Agalma1.0 that uses phylogenetic analyses across species to refine transcriptome assemblies. It identifies transcripts of the same gene that were incorrectly assigned to multiple genes and reassigns them as transcripts of the same gene. Availability and Implementation: treeinform is implemented in Agalma1.0, available at https://bitbucket.org/caseywdunn/[email protected] Supplementary information: Supplementary information is available at bioRxiv.

PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0244202
Author(s):  
August Guang ◽  
Mark Howison ◽  
Felipe Zapata ◽  
Charles Lawrence ◽  
Casey W. Dunn

A common transcriptome assembly error is to mistake different transcripts of the same gene for transcripts from multiple closely related genes. This error is difficult to identify during assembly, but in a phylogenetic analysis such errors can be diagnosed from gene phylogenies, where they appear as clades of tips from the same species with improbably short branch lengths. treeinform is a method that uses phylogenetic information across species to refine transcriptome assemblies within species. It identifies transcripts of the same gene that were incorrectly assigned to multiple genes and reassigns them as transcripts of the same gene. The treeinform method is implemented in Agalma, available at https://bitbucket.org/caseywdunn/agalma, and the general approach is relevant in a variety of other contexts.
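The diagnostic described in this abstract can be illustrated with a minimal Python sketch. It is not the Agalma implementation: the `Node` class, the use of summed branch length within a clade as the length measure, and the threshold value are all assumptions made for illustration. The sketch walks a gene tree and reports the largest clades whose tips all come from one species and whose total branch length falls below the threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical gene-tree node; tips carry a species and a transcript ID."""
    length: float = 0.0                  # length of the branch leading to this node
    species: Optional[str] = None        # set on tips only
    transcript: Optional[str] = None     # set on tips only
    children: List["Node"] = field(default_factory=list)


def tips(node: Node) -> List[Node]:
    """Return all tips below (or equal to) the given node."""
    if not node.children:
        return [node]
    return [t for child in node.children for t in tips(child)]


def subtree_length(node: Node) -> float:
    """Sum of all branch lengths strictly below the given node."""
    return sum(c.length + subtree_length(c) for c in node.children)


def candidate_clades(node: Node, threshold: float) -> List[List[str]]:
    """Return maximal single-species clades shorter than the threshold."""
    if not node.children:
        return []
    clade_tips = tips(node)
    one_species = len({t.species for t in clade_tips}) == 1
    if one_species and subtree_length(node) < threshold:
        return [sorted(t.transcript for t in clade_tips)]
    # Otherwise keep looking further down the tree.
    return [c for child in node.children for c in candidate_clades(child, threshold)]


# Toy tree with invented values: sp1 has two near-identical tips,
# sp3 has two tips separated by a long branch.
tree = Node(children=[
    Node(length=0.05, children=[
        Node(0.0001, "sp1", "sp1_t1"),
        Node(0.0002, "sp1", "sp1_t2"),
    ]),
    Node(length=0.02, children=[
        Node(0.04, "sp2", "sp2_t1"),
        Node(length=0.01, children=[
            Node(0.03, "sp3", "sp3_t1"),
            Node(0.2, "sp3", "sp3_t2"),
        ]),
    ]),
])

merged = candidate_clades(tree, threshold=0.001)
print(merged)
```

In this toy tree only the two sp1 tips are flagged as likely transcripts of one gene. The two sp3 tips also form a single-species clade, but their long branches keep them above the threshold, so they are left as separate genes.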


Insects ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 320
Author(s):  
Sheng-Shan Lu ◽  
Junichi Takahashi ◽  
Wen-Chi Yeh ◽  
Ming-Lun Lu ◽  
Jing-Yi Huang ◽  
...  

The invasive alien species (IAS) Vespa bicolor is the first hornet reported to have become established in Taiwan, and it is a concern because it preys on the honeybee Apis mellifera, leading to colony losses and public concern. Thus, the aim of this study was to assess the current status of V. bicolor abundance, dispersal, and impact and to trace the origins of Taiwan’s V. bicolor population. Our studies took place in five areas in northern to central Taiwan. We used mtDNA in the phylogenetic analyses. Field surveys and ecological niche modeling (ENM) were used to understand the origins and current range of the invasive species. Two main subgroups of V. bicolor were found in the phylogenetic tree, and a clade with short branch lengths from Southeastern China and Taiwan formed one subgroup, which suggests that the Taiwan population may have originated from a single invasion event. Evidence shows that V. bicolor is not a severe pest of honeybees in the study area; however, using ENM, we predict the rapid dispersal of this species to the cooler, hilly mountain areas of Taiwan. The management of V. bicolor should also involve treating it as a local pest to reduce losses for beekeepers and public fear in Taiwan. Our findings highlight that the government, beekeepers, and researchers alike should be aware of the implications of V. bicolor’s rapid range expansion in Taiwan or in other countries.


2020 ◽  
Vol 36 (18) ◽  
pp. 4819-4821
Author(s):  
Anastasiia Kim ◽  
James H Degnan

Summary: PRANC computes the Probabilities of RANked gene tree topologies under the multispecies coalescent. A ranked gene tree is a gene tree accounting for the temporal ordering of internal nodes. PRANC can also estimate the maximum likelihood (ML) species tree from a sample of ranked or unranked gene tree topologies. It estimates the ML tree with estimated branch lengths in coalescent units. Availability and implementation: PRANC is written in C++ and freely available at github.com/anastasiiakim/PRANC. Supplementary information: Supplementary data are available at Bioinformatics online.
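To make the distinction between ranked and unranked topologies concrete, the Python sketch below counts how many rankings (temporal orderings of internal nodes) a single rooted binary topology admits. It does not reproduce PRANC or compute any coalescent probability. The nested-tuple tree encoding and the function names are assumptions for illustration. The count uses the standard formula (n-1)! divided by the product, over internal nodes, of the number of internal nodes in each node's subtree.

```python
from math import factorial


def internal_count(tree) -> int:
    """Number of internal nodes in a nested-tuple binary tree (leaves are strings)."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + internal_count(left) + internal_count(right)


def subtree_sizes(tree) -> list:
    """Internal-node count of the subtree rooted at each internal node."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return [internal_count(tree)] + subtree_sizes(left) + subtree_sizes(right)


def num_rankings(tree) -> int:
    """Number of distinct rankings compatible with one unranked topology."""
    sizes = subtree_sizes(tree)
    denominator = 1
    for size in sizes:
        denominator *= size
    return factorial(len(sizes)) // denominator


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
five_taxa = (("a", "b"), (("c", "d"), "e"))

print(num_rankings(balanced), num_rankings(caterpillar), num_rankings(five_taxa))
```

The balanced four-taxon tree has two rankings because either cherry can coalesce first, whereas the caterpillar tree has only one possible ordering of its internal nodes.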


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Patricia E. Sørensen ◽  
Wim Van Den Broeck ◽  
Kristoffer Kiil ◽  
Dziuginta Jasinskyte ◽  
Arshnee Moodley ◽  
...  

Despite phages’ ubiquitous presence and great importance in shaping microbial communities, little is known about the diversity of specific phages in different ecological niches. Here, we isolated, sequenced, and characterized 38 Escherichia coli-infecting phages (coliphages) from poultry faeces to gain a better understanding of the coliphage diversity in the poultry intestine. All phages belonged to either the Siphoviridae or Myoviridae family and their genomes ranged between 44,324 and 173,384 bp, with a G+C content between 35.5 and 46.4%. Phylogenetic analysis was performed based on single “marker” genes (the terminase large subunit, portal protein, and exonucleases) as well as the full draft genomes. Single gene analysis resulted in six distinct clusters. Only minor differences were observed between the different phylogenetic analyses, including branch lengths and additional duplicate or triplicate subclustering. Clusters corresponded to genome size, G+C content and phage subfamily. Phylogenetic analysis based on the full genomes supported these clusters. Moreover, several of our Siphoviridae phages might represent a novel unclassified phage genus. This study allowed for identification of several novel coliphages and provides new insights into the coliphage diversity in the intestine of poultry. Great diversity was observed amongst the phages, even though they were isolated from an otherwise similar ecosystem.


Holzforschung ◽  
2014 ◽  
Vol 68 (2) ◽  
pp. 247-251 ◽  
Author(s):  
Young Min Lee ◽  
Hanbyul Lee ◽  
Yeongseon Jang ◽  
Yirang Cho ◽  
Gyu-Hyeok Kim ◽  
...  

Twenty-four Alternaria strains have been isolated from wood samples in Korea and submitted to phylogenetic analyses. The gene trees generated from the ITS and histone gene region sequences revealed that, among the genus Alternaria, two species, Alternaria alternata sensu lato (s.l.) and Alternaria tenuissima, are involved in wood discoloration. In addition, the histone gene was useful as a marker for differentiating between A. alternata s.l. and A. tenuissima.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Thuan Duc Lao ◽  
Thuy Ai Huyen Le ◽  
Nguyen Binh Truong

An entomopathogenic fungus newly named Ophiocordyceps langbianensis was collected from Lang Biang Biosphere Reserve, located in Lam Dong Province, Vietnam. It is characterized as a species of Ophiocordyceps (Ophiocordycipitaceae, Hypocreales) having the unique characteristics of a cylindrical fertile part and several branched apical appendices. Each ascospore develops as two swollen, constricted part-spores. A phylogenetic analysis of multiple genes, including nrLSU, nrSSU, Rpb1, ITS and Tef, supported its systematic position in the genus Ophiocordyceps; it is related to O. brunneipunctata. Based on morphological and phylogenetic analyses, O. langbianensis was confirmed as a new species from Vietnam.


Genes ◽  
2019 ◽  
Vol 10 (1) ◽  
pp. 23 ◽  
Author(s):  
Ji-Hyeon Jeon ◽  
Seung-Chul Kim

Species belonging to Rosa section Synstylae (Rosaceae) are mainly distributed in East Asia and represent recently diverged lineages within the genus. For decades, inferring phylogenetic relationships within section Synstylae has been exceptionally challenging because of short branch lengths and low support values. Of approximately 36 species in section Synstylae, Rosa multiflora, Rosa luciae and Rosa maximowicziana are widely distributed in the Sino-Japanese floristic region. In this study, we assembled the chloroplast genomes of these three species to compare genomic features within section Synstylae and with other infrageneric groups. We found that the three Rosa sect. Synstylae species had lost infA genes through pseudogenization, and that their chloroplast genomes were almost identical to each other. Two protein-coding gene regions (ndhF and ycf1) and five non-coding regions (5’matK-trnK, psbI-trnS-trnG, rps16-trnG, rpoB-trnC, and rps4-trnT) were identified as highly informative markers. Within the three section Synstylae chloroplast genomes, 85 simple sequence repeat (SSR) motifs were detected, of which at least 13 were identified as effective markers. The phylogenetic relationships of R. multiflora, R. luciae and R. maximowicziana could not be resolved, even with chloroplast genome-wide data. This study reports chloroplast genomic data for Rosa sect. Synstylae and provides valuable markers for DNA barcoding and phylogenetic analyses in further studies.


2018 ◽  
Author(s):  
Benoit Morel ◽  
Alexey M. Kozlov ◽  
Alexandros Stamatakis

Motivation: Coalescent- and reconciliation-based methods are now widely used to infer species phylogenies from genomic data. They typically use per-gene phylogenies as input, which requires conducting multiple individual tree inferences on a large set of multiple sequence alignments (MSAs). At present, no easy-to-use parallel tool for this task exists. Ad hoc scripts for this purpose not only induce additional implementation overhead, but can also lead to poor resource utilization and long times-to-solution. We present ParGenes, a tool for simultaneously determining the best-fit model and inferring maximum likelihood (ML) phylogenies on thousands of independent MSAs using supercomputers. Results: ParGenes executes common phylogenetic pipeline steps such as model-testing, ML inference(s), bootstrapping, and computation of branch support values via a single parallel program invocation. We evaluated ParGenes by inferring > 20,000 phylogenetic gene trees with bootstrap support values from Ensembl Compara and VectorBase alignments in 28 hours on a cluster with 1024 nodes. Availability: GNU GPL at https://github.com/BenoitMorel/[email protected] Supplementary information: Supplementary material is available at Bioinformatics online.


2019 ◽  
Author(s):  
Mark S. Springer ◽  
John Gatesy

Summary coalescence methods were developed to address the negative impacts of incomplete lineage sorting on species tree estimation with concatenation. Coalescence methods are statistically consistent if certain requirements are met including no intralocus recombination, neutral evolution, and no gene tree reconstruction error. However, the assumption of no intralocus recombination may not hold for many DNA sequence data sets, and neutral evolution is not the rule for genetic markers that are commonly employed in phylogenomic coalescence analyses. Most importantly, the assumption of no gene tree reconstruction error is routinely violated, especially for rapid radiations that are deep in the Tree of Life. With the sequencing of complete genomes and novel pipelines, phylogenetic analysis of retroposon insertions has emerged as a valuable alternative to sequence-based phylogenetic analysis. Retroposon insertions avoid or reduce several problems that beset analysis of sequence data with summary coalescence methods: 1) intralocus recombination is avoided because retroposon insertions are singular evolutionary events, 2) neutral evolution is approximated in many cases, and 3) gene tree reconstruction errors are rare because retroposons have low rates of homoplasy. However, the analysis of retroposons within a multispecies coalescent framework has not been realized. Here, we propose a simple workaround in which a retroposon insertion matrix is first transformed into a series of incompletely resolved gene trees. Next, the program ASTRAL is used to estimate a species tree in the statistically consistent framework of the multispecies coalescent. The inferred species tree includes support scores at all nodes and internal branch lengths in coalescent units.
As a test case, we analyzed a retroposon dataset for palaeognath birds (ratites and tinamous) with ASTRAL and compared the resulting species tree to an MP-EST species tree for the same clade derived from thousands of sequence-based gene trees. The MP-EST species tree suggests an empirical case of the ‘anomaly zone’ with three very short internal branches at the base of Palaeognathae, and as predicted for anomaly zone conditions, the MP-EST species tree differs from the most common gene tree. Although identical in topology to the MP-EST tree, the ASTRAL species tree based on retroposons shows branch lengths that are much longer and incompatible with anomaly zone conditions. Simulation of gene trees from the retroposon-based species tree reveals that the most common gene tree matches the species tree. We contend that the wide discrepancies in branch lengths between sequence-based and retroposon-based species trees are explained by the greater accuracy of retroposon gene trees (bipartitions) relative to sequence-based gene trees. Coalescence analysis of retroposon data provides a promising alternative to the status quo by reducing gene tree reconstruction error that can have large impacts on both branch length estimates and evolutionary interpretations.
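The workaround described in this abstract, turning a retroposon insertion matrix into incompletely resolved gene trees, can be sketched in Python as follows. This is not the authors' pipeline. The character coding ('1' present, '0' absent, '?' missing), the taxon labels, the rule for skipping uninformative insertions, and the function name are assumptions for illustration. Each insertion becomes a Newick tree with a single resolved clade, which is the kind of input a summary method such as ASTRAL accepts.

```python
from typing import List, Optional


def insertion_to_newick(taxa: List[str], states: str) -> Optional[str]:
    """Convert one insertion (one state per taxon) into a one-clade Newick tree.

    Taxa with missing data ('?') are left out of the tree. Insertions with
    fewer than two taxa on either side carry no grouping information as an
    unrooted bipartition and return None.
    """
    present = [t for t, s in zip(taxa, states) if s == "1"]
    absent = [t for t, s in zip(taxa, states) if s == "0"]
    if len(present) < 2 or len(absent) < 2:
        return None
    # Taxa sharing the insertion form the only resolved clade; the rest
    # attach to the root as a polytomy.
    return "((" + ",".join(present) + ")," + ",".join(absent) + ");"


# Hypothetical taxa and insertion patterns, one string per retroposon locus.
taxa = ["A", "B", "C", "D", "E"]
matrix = ["11000", "1?100", "10000", "11110"]

trees = [t for t in (insertion_to_newick(taxa, row) for row in matrix) if t]
for tree in trees:
    print(tree)
```

Each printed line is a separate gene tree, so the resulting file has one tree per informative insertion. The last two patterns are dropped because only one taxon sits on one side of the split.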


2021 ◽  
Vol 16 (1) ◽  
pp. 711-718
Author(s):  
Thuan Duc Lao ◽  
Hanh Van Trinh ◽  
Loi Vuong ◽  
Luyen Tien Vu ◽  
Thuy Ai Huyen Le ◽  
...  

The entomopathogenic fungus T011, parasitizing a nymph of Cicada and collected in a coffee garden in Dak Lak Province, Vietnam, was preliminarily identified by morphology as Isaria cicadae, belonging to the order Hypocreales and family Clavicipitaceae. To confirm the identity of T011, a phylogenetic analysis of a concatenated set of multiple genes, including ITS, nrLSU, nrSSU, Rpb1, and Tef1, was applied to support the identification. Genomic DNA was isolated from the dried sample T011. PCR followed by sequencing was used to amplify and sequence the ITS, nrLSU, nrSSU, Rpb1, and Tef1 genes. For phylogenetic analysis, the concatenated data of the target genes were analyzed in MEGAX with 1,000 bootstrap replicates using the neighbor-joining, maximum likelihood, and maximum parsimony methods. The concatenated data set contained 62 sequences belonging to order Hypocreales, family Clavicipitaceae, and 2 outgroup sequences belonging to order Hypocreales, genus Verticillium. The phylogenetic analysis indicated that T011 was placed in the Cordyceps subclade and formed a monophyletic group with the reference Cordyceps cicadae (teleomorph of Isaria cicadae) with a high bootstrap value. This phylogenetic result was strongly supported by our morphological analysis, which identified the specimen as Isaria cicadae. In summary, phylogenetic analyses based on the concatenated dataset were successfully applied to strengthen the identification of T011 as Isaria cicadae.

