lowest common ancestor
Recently Published Documents


Total documents: 44 (five years: 10)
H-index: 7 (five years: 2)

PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258693
Author(s):  
Yuval Bussi ◽  
Ruti Kapon ◽  
Ziv Reich

Information theoretic approaches are ubiquitous and effective in a wide variety of bioinformatics applications. In comparative genomics, alignment-free methods, based on short DNA words, or k-mers, are particularly powerful. We evaluated the utility of varying k-mer lengths for genome comparisons by analyzing their sequence space coverage of 5805 genomes in the KEGG GENOME database. In subsequent analyses on four k-mer lengths spanning the relevant range (11, 21, 31, 41), hierarchical clustering of 1634 genus-level representative genomes using pairwise 21- and 31-mer Jaccard similarities best recapitulated a phylogenetic/taxonomic tree of life with clear boundaries for superkingdom domains and high subtree similarity for named taxons at lower levels (family through phylum). By analyzing ~14.2M prokaryotic genome comparisons by their lowest-common-ancestor taxon levels, we detected many potential misclassification errors in a curated database, further demonstrating the need for wide-scale adoption of quantitative taxonomic classifications based on whole-genome similarity.
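
The pairwise k-mer Jaccard similarity mentioned in the abstract above can be illustrated with a minimal Python sketch. The function names `kmers` and `jaccard` are hypothetical and are not taken from the paper. The sketch also ignores reverse complements and ambiguous bases, which genome comparison tools typically handle, so it should be read as an illustration of the measure rather than the authors' pipeline.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings (k-mers) of seq, upper-cased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int) -> float:
    """Jaccard similarity |A & B| / |A | B| of the k-mer sets of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k, so neither has any k-mers.
        return 0.0
    return len(ka & kb) / len(union)


if __name__ == "__main__":
    # Toy sequences (illustrative only); real genomes would use k such as 21 or 31.
    print(jaccard("ACGT", "ACGA", 2))
```

A matrix of such pairwise similarities is the kind of input a hierarchical clustering step would take.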


Viruses ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1215
Author(s):  
Hasan Arsın ◽  
Andrius Jasilionis ◽  
Håkon Dahle ◽  
Ruth-Anne Sandaa ◽  
Runar Stokke ◽  
...  

Marine viral sequence space is immense and presents a promising resource for the discovery of new enzymes interesting for research and biotechnology. However, bottlenecks in the functional annotation of viral genes and soluble heterologous production of proteins hinder access to downstream characterization, subsequently impeding the discovery process. While commonly utilized for the heterologous expression of prokaryotic genes, codon adjustment approaches have not been fully explored for viral genes. Herein, the sequence-based identification of a putative prophage is reported from within the genome of Hypnocyclicus thermotrophus, a Gram-negative, moderately thermophilic bacterium isolated from the Seven Sisters hydrothermal vent field. A prophage-associated gene cluster, consisting of 46 protein coding genes, was identified and given the proposed name Hypnocyclicus thermotrophus phage H1 (HTH1). HTH1 was taxonomically assigned to the viral family Siphoviridae, by lowest common ancestor analysis of its genome and phylogeny analyses based on proteins predicted as holin and DNA polymerase. The gene neighbourhood around the HTH1 lytic cassette was found most similar to viruses infecting Gram-positive bacteria. In the HTH1 lytic cassette, an N-acetylmuramoyl-L-alanine amidase (Amidase_2) with a peptidoglycan binding motif (LysM) was identified. A total of nine genes coding for enzymes putatively related to lysis, nucleic acid modification and of unknown function were subjected to heterologous expression in Escherichia coli. Codon optimization and codon harmonization approaches were applied in parallel to compare their effects on produced proteins. Comparison of protein yields and thermostability demonstrated that codon optimization yielded higher levels of soluble protein, but codon harmonization led to proteins with higher thermostability, implying a higher folding quality. 
Altogether, our study suggests that both codon optimization and codon harmonization are valuable approaches for successful heterologous expression of viral genes in E. coli, but codon harmonization may be preferable in obtaining recombinant viral proteins of higher folding quality.


2021 ◽  
Vol 4 ◽  
Author(s):  
Shivakumara Manu

Biodiversity is declining on a planetary scale at an alarming rate due to anthropogenic factors. Classical biodiversity monitoring approaches are time-consuming, resource-intensive, and not scalable to address the current biodiversity crisis. The environmental DNA-based next-generation biomonitoring framework provides an efficient, scalable, and holistic solution for evaluating changes in various ecological entities. However, its scope is currently limited to monitoring targeted groups of organisms using metabarcoding, which suffers from various PCR-induced biases. To utilise the full potential of next-generation biomonitoring, we aimed to develop PCR-free genomic technologies that can deliver unbiased biodiversity data across the tree of life in a single assay. Here, we present a novel metagenomic workflow comprising a lysis-free extracellular DNA enrichment protocol from large-volume filtered water samples, a completely PCR-free library preparation step, ultra-deep next-generation sequencing, and a pseudo-taxonomic assignment strategy using the dual lowest common ancestor algorithm. We demonstrate the utility of our approach in a pilot-scale, spatially replicated experimental setup in Chilika, a large hyper-diverse brackish lagoon ecosystem in India. Using incidence-based statistics, we show that biodiversity across the tree of life, from microorganisms to relatively low-abundance macroorganisms such as arthropods and fishes, can be effectively detected with about one billion paired-end reads using our reproducible workflow. With the decreasing cost of sequencing and the increasing availability of genomic resources from the Earth BioGenome Project, our approach can be tested in different ecosystems and adapted for large-scale rapid assessment of biodiversity across the tree of life.


Author(s):  
Shivakumara Manu ◽  
Govindhaswamy Umapathy

Biodiversity is declining on a planetary scale at an alarming rate due to anthropogenic factors. Classical biodiversity monitoring approaches are time-consuming, resource-intensive, and not scalable to address the current biodiversity crisis. The environmental DNA-based next-generation biomonitoring framework provides an efficient, scalable, and holistic solution for evaluating changes in various ecological entities. However, its scope is currently limited to monitoring targeted groups of organisms using metabarcoding, which suffers from various PCR-induced biases. To utilise the full potential of next-generation biomonitoring, we aimed to develop PCR-free genomic technologies that can deliver unbiased biodiversity data across the tree of life in a single assay. Here, we describe a novel metagenomic workflow comprising a customised extracellular DNA enrichment protocol from large-volume filtered water samples, a completely PCR-free library preparation step, ultra-deep next-generation sequencing, and a pseudo-taxonomic assignment strategy using the dual lowest common ancestor algorithm. We demonstrate the utility of our approach in a pilot-scale, spatially replicated experimental setup in Chilika, a large hyper-diverse brackish lagoon ecosystem in India. Using incidence-based statistics, we show that biodiversity across the tree of life, from microorganisms to relatively low-abundance macroorganisms such as arthropods and fishes, can be effectively detected with about one billion paired-end reads using our reproducible workflow. With the decreasing cost of sequencing and the increasing availability of genomic resources from the Earth BioGenome Project, our approach can be tested in different ecosystems and adapted for large-scale rapid assessment of biodiversity across the tree of life.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ab Rouf Khan ◽  
Mohammad Ahsan Chishti

Purpose: The purpose of this study is to exploit the lowest common ancestor technique in an m-ary data aggregation tree in the fog computing-enhanced IoT to assist in contact tracing for COVID-19. One of the promising characteristics of the Internet of Things (IoT) that can be used to address the current COVID-19 crisis is data aggregation. As the number of patients infected by the disease is already huge, the data on patient attributes, such as thermal image records and previous health records, is going to be enormous. The authors used data aggregation to efficiently aggregate the sensed data from the patients and analyse it. Among the various inferences drawn from the aggregated data, one of the most important is contact tracing. Contact tracing in COVID-19 deals with finding the person or group of persons who have transmitted the disease or were infected by it.
Design/methodology/approach: The authors propose to exploit the lowest common ancestor technique in an m-ary data aggregation tree in the fog computing-enhanced IoT to help health-care experts with contact tracing in a particular region or community. The authors argue that, in the current scenario of the COVID-19 pandemic, finding the person or group of persons who has infected a group of people is of extreme importance. Finding the individuals who have been infected or are infecting others can stop the pandemic from worsening by stopping community transmission. In a community where the outbreak has spiked, samples from either all persons or the patients showing symptoms are collected and stored in an m-ary tree-based structure sorted over time.
Findings: The authors exploited the lowest common ancestor technique in an m-ary data aggregation tree in the fog computing-enhanced IoT to help health-care experts with contact tracing in a particular region or community. The simulations were carried out randomly on a set of individuals. Algorithm 1 is executed on the samples collected at level-0 of the simulation model, and Algorithm 2 is implemented at level-1 to aggregate and transmit the data. The results show that a carrier can be easily identified from the collected samples using the approach designed in the paper.
Practical implications: The work presented in the paper can aid health-care experts fighting the COVID-19 pandemic by reducing community transmission through the efficient contact tracing mechanism it proposes.
Social implications: Fighting COVID-19 efficiently and protecting people from the pandemic has huge social implications in the current times of crisis.
Originality/value: To the best of the authors’ knowledge, this is the first proposal to use the lowest common ancestor technique in an m-ary data aggregation tree in the fog computing-enhanced IoT to trace the individuals who have infected others or were infected during the transmission of COVID-19. By creating a graph or an m-ary tree based on the interactions and connections between people in a particular community, such as location, friends and time, one can traverse it to find out who infected any two persons or a group of persons, or who was infected, by finding the lowest common ancestor in the m-ary tree.
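
The lowest common ancestor lookup that the abstract above relies on can be sketched for a tree of arbitrary branching factor. This is a minimal illustration and not the paper's Algorithm 1 or Algorithm 2, which are not reproduced in the abstract. The child-to-parent map, the person labels and the function name `lowest_common_ancestor` are all assumptions made for the example.

```python
from typing import Dict, Hashable, Optional


def lowest_common_ancestor(
    parent: Dict[Hashable, Optional[Hashable]], a: Hashable, b: Hashable
) -> Optional[Hashable]:
    """Return the deepest node that is an ancestor of both a and b.

    The tree is given as a child -> parent map in which the root maps to None.
    A node counts as its own ancestor. Returns None if the nodes share no
    ancestor. The map is assumed to describe a tree (no cycles).
    """
    # Collect a and all of its ancestors up to the root.
    ancestors = set()
    node: Optional[Hashable] = a
    while node is not None:
        ancestors.add(node)
        node = parent.get(node)

    # Walk up from b; the first node already seen is the LCA.
    node = b
    while node is not None:
        if node in ancestors:
            return node
        node = parent.get(node)
    return None


if __name__ == "__main__":
    # Hypothetical transmission tree: each person maps to the person who infected them.
    infected_by = {"P0": None, "P1": "P0", "P2": "P0",
                   "P3": "P1", "P4": "P1", "P5": "P2"}
    print(lowest_common_ancestor(infected_by, "P3", "P4"))
```

Because the walk only follows parent links, the number of children per node does not matter, so the same code covers binary and m-ary trees. Each query costs time proportional to the depth of the two nodes.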


2020 ◽  
Author(s):  
Gabriel Al-Ghalith ◽  
Dan Knights

One of the fundamental tasks in analyzing next-generation sequencing data is genome database search, in which DNA sequences are compared to known reference genomes for identification or annotation. Although algorithms exist for optimal database search with perfect sensitivity and specificity, these have largely been abandoned for next-generation sequencing (NGS) data in favor of faster heuristic algorithms that sacrifice alignment quality. Virtually all DNA alignment tools that are commonly used in genomic and metagenomic database search use approximate methods that sometimes report the wrong match, and sometimes fail to find a valid match when present. Here we introduce BURST, a high-throughput DNA short-read aligner that uses several new synergistic optimizations to enable provably optimal alignment in NGS datasets. BURST finds all equally good matches in the database above a specified identity threshold and can either report all of them, pick the most likely among tied matches, or provide lowest-common-ancestor taxonomic annotation among tied matches. BURST can align, disambiguate, and assign taxonomy at a rate of 1,000,000 query sequences per minute against the RefSeq v82 representative prokaryotic genome database (5,500 microbial genomes, 19 GB) at 98% identity on a 32-core computer, representing a speedup of up to 20,000-fold over current optimal gapped alignment techniques. This may have broader implications for clinical applications, strain tracking, and other situations where fast, exact, extremely sensitive alignment is desired.
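
Lowest-common-ancestor taxonomic annotation among tied matches, as described in the abstract above, can be illustrated by taking the longest shared prefix of the tied references' lineages. The sketch below is an assumption about how such a step could look and is not BURST's implementation. The function name `lca_lineage` and the example lineages are illustrative only.

```python
from typing import List, Sequence


def lca_lineage(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest common prefix of root-to-leaf taxonomic lineages.

    Each lineage is ordered from the highest rank to the lowest. The last
    element of the result is the lowest taxon shared by every tied match.
    """
    common: List[str] = []
    if not lineages:
        return common
    # zip stops at the shortest lineage, so a shorter lineage caps the result.
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            common.append(ranks[0])
        else:
            break
    return common


if __name__ == "__main__":
    # Two hypothetical tied matches for one read (illustrative lineages).
    tied = [
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
         "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
         "Enterobacteriaceae", "Shigella", "Shigella flexneri"],
    ]
    print(lca_lineage(tied)[-1])
```

When a read ties between references from different genera, the annotation backs off to the family they share instead of picking one species arbitrarily.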


2020 ◽  
Vol 34 (04) ◽  
pp. 6094-6101
Author(s):  
Guojia Wan ◽  
Bo Du ◽  
Shirui Pan ◽  
Gholamreza Haffari

Meta-paths are important tools for a wide variety of data mining and network analysis tasks in Heterogeneous Information Networks (HINs), owing to their flexibility and interpretability in capturing the complex semantic relations among objects. To date, most HIN analysis still relies on hand-crafted meta-paths, which require rich domain knowledge that is extremely difficult to obtain in complex, large-scale, and schema-rich HINs. In this work, we present a novel framework, Meta-path Discovery with Reinforcement Learning (MPDRL), to identify informative meta-paths from complex and large-scale HINs. To capture different semantic information between objects, we propose a novel multi-hop reasoning strategy in a reinforcement learning framework, which aims to infer the next promising relation that links a source entity to a target entity. To improve efficiency, we also develop a type-context representation embedding approach that scales the RL framework to million-scale HINs. As multi-hop reasoning generates rich meta-paths of various lengths, we further perform a meta-path induction step to summarize the important meta-paths using the Lowest Common Ancestor principle. Experimental results on two large-scale HINs, Yago and NELL, validate our approach and demonstrate that our algorithm not only achieves superior performance in the link prediction task, but also identifies useful meta-paths that would have been ignored by human experts.


Author(s):  
Tao Zhang ◽  
Qunfu Wu ◽  
Zhigang Zhang

Identifying the potential intermediate host of a novel coronavirus is vital to rapidly controlling the continued spread of COVID-19. We found genomic and evolutionary evidence of a 2019-nCoV-like coronavirus (named Pangolin-CoV) in dead Malayan pangolins. Pangolin-CoV is 91.02% and 90.55% identical at the whole-genome level to 2019-nCoV and BatCoV RaTG13, respectively. Pangolin-CoV is the lowest common ancestor of 2019-nCoV and RaTG13. The S1 protein of Pangolin-CoV is much more closely related to that of 2019-nCoV than to that of RaTG13. Five key amino-acid residues involved in the interaction with human ACE2 are completely consistent between Pangolin-CoV and 2019-nCoV, whereas four amino-acid mutations occur in RaTG13. This indicates that Pangolin-CoV has pathogenic potential similar to that of 2019-nCoV, and it should help trace the origin and probable intermediate host of 2019-nCoV.


2019 ◽  
Vol 35 (19) ◽  
pp. 3794-3802
Author(s):  
Guangxu Xun ◽  
Kishlay Jha ◽  
Ye Yuan ◽  
Yaqing Wang ◽  
Aidong Zhang

Motivation: MEDLINE is the primary bibliographic database maintained by the National Library of Medicine (NLM). MEDLINE citations are indexed with Medical Subject Headings (MeSH), a controlled vocabulary curated by NLM experts. This greatly facilitates applications in biomedical research and knowledge discovery. Currently, MeSH indexing is performed manually by human experts. To reduce the time and monetary cost of manual annotation, many automatic MeSH indexing systems have been proposed to assist it, including DeepMeSH and NLM’s official model, Medical Text Indexer (MTI). However, the existing models usually rely on the intermediate results of other models and suffer from efficiency issues. We propose an end-to-end framework, MeSHProbeNet (formerly named xgx), which uses deep learning and self-attentive MeSH probes to index MeSH terms. Each MeSH probe enables the model to extract one specific aspect of biomedical knowledge from an input article, so comprehensive biomedical information can be extracted with different MeSH probes and interpretability can be achieved at the word level. MeSH terms are finally recommended with a unified classifier, making MeSHProbeNet both time efficient and space efficient.
Results: MeSHProbeNet won first place in the latest batch of Task A in the 2018 BioASQ challenge. The result on the last test set of the challenge is reported in this paper. Compared with other state-of-the-art models, such as MTI and DeepMeSH, MeSHProbeNet achieves the highest scores in all the F-measures, including Example Based F-Measure, Macro F-Measure, Micro F-Measure, Hierarchical F-Measure and Lowest Common Ancestor F-Measure. We also show intuitively how MeSHProbeNet is able to extract comprehensive biomedical knowledge from an input article.


Author(s):  
Dayananda P. ◽  
Sowmyarani C. N.

The volume of semi-structured data is increasing continuously, and handling it efficiently is a challenging task. Keyword search is an important task because the required information can be retrieved without knowledge of the data storage hierarchy. There are several challenges in handling XML data. This chapter discusses these challenges in terms of lowest common ancestor (LCA) semantics, efficient query processing, and retrieval of top-k results for the data users need. Existing approaches are grouped into several classes according to how the problem and solution are tackled. Keyword search and ranking techniques for retrieving the desired information are analysed in detail.
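
LCA semantics for XML keyword search, as mentioned in the abstract above, can be illustrated with one common variant: return the smallest elements whose subtree contains every keyword. The sketch below is an assumption chosen for illustration and is not taken from the chapter. The function name `smallest_lca` and the sample document are hypothetical, matching is plain case-insensitive substring search on element text, and tail text and attributes are ignored.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence


def smallest_lca(root: ET.Element, keywords: Sequence[str]) -> List[ET.Element]:
    """Return elements whose subtree contains all keywords while no child subtree does."""
    wanted = {k.lower() for k in keywords}
    results: List[ET.Element] = []
    if not wanted:
        return results

    def visit(elem: ET.Element):
        # Keywords matched by this element's own text.
        text = (elem.text or "").lower()
        found = {k for k in wanted if k in text}
        child_complete = False
        for child in elem:
            child_found, child_done = visit(child)
            found |= child_found
            child_complete = child_complete or child_done
        complete = found == wanted
        # Keep only the lowest complete element on each path.
        if complete and not child_complete:
            results.append(elem)
        return found, complete or child_complete

    visit(root)
    return results


if __name__ == "__main__":
    doc = ET.fromstring(
        "<bib>"
        "<book><title>XML search</title><author>Smith</author></book>"
        "<book><title>Graph theory</title><author>Jones</author></book>"
        "<article><title>Keyword search</title><author>Smith</author></article>"
        "</bib>"
    )
    print([e.tag for e in smallest_lca(doc, ["search", "smith"])])
```

Returning only the lowest matching elements keeps the answer focused: a query that is satisfied inside one `book` does not also return the enclosing `bib` root.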

