genome dataset
Recently Published Documents


TOTAL DOCUMENTS

28
(FIVE YEARS 21)

H-INDEX

5
(FIVE YEARS 3)

2021 ◽  
Vol 288 (1965) ◽  
Author(s):  
Rupert Mazzucco ◽  
Christian Schlötterer

The influence of the microbiome on its host is well documented, but the interplay of its members is not yet well understood. Even for simple microbiomes, the interactions among members are difficult to study. Longitudinal studies provide a promising approach to studying such interactions through the temporal covariation of different taxonomic units. In contrast to most longitudinal studies, which span only a single host generation, we present here a post hoc analysis of a whole-genome dataset of 81 samples that follows microbiome composition for up to 180 host generations, covering nearly 10 years. The microbiome diversity remained rather stable in replicated Drosophila melanogaster populations exposed to two different temperature regimes. The composition, however, changed systematically across replicates of the two temperature regimes. Significant associations between families, mostly specific to one temperature regime, indicate functional interdependence of different microbiome components. These associations also involve moderately abundant families, which emphasizes their functional importance and highlights the importance of looking beyond the common constituents of the Drosophila microbiome.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yang Fang ◽  
Menglong Li ◽  
Xufeng Li ◽  
Yi Yang

Background: Phylogenetic profiling is widely used to predict novel members of large protein complexes and biological pathways. Although methods combined with phylogenetic trees have significantly improved prediction accuracy, computational efficiency is still an issue that limits their genome-wide application. Results: Here we introduce a new tree-based phylogenetic profiling algorithm named GFICLEE, which infers common single and continuous loss (SCL) events in the evolutionary patterns. We validated our algorithm with human pathways from three databases and compared its computational efficiency with that of current tree-based methods on genome datasets of 10 different scales. Our algorithm has better predictive performance with high computational efficiency. Conclusions: GFICLEE is a new method to infer genome-wide gene function. The accuracy and computational efficiency of GFICLEE make it possible to explore gene functions at the genome-wide level on a personal computer.
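
As a rough illustration of the general idea behind phylogenetic profiling (not the GFICLEE algorithm itself, which is tree-based and models loss events), the sketch below ranks candidate genes by how closely their presence/absence pattern across genomes matches a query gene. The gene names, the six-genome profiles, and the choice of Jaccard similarity are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two presence/absence profiles (sequences of 0/1)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def rank_candidates(query, profiles):
    """Rank every other gene by profile similarity to the query gene."""
    scores = {
        gene: jaccard(profiles[query], profile)
        for gene, profile in profiles.items()
        if gene != query
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical profiles: one column per genome, 1 = ortholog present.
profiles = {
    "geneA": (1, 1, 0, 1, 0, 1),
    "geneB": (1, 1, 0, 1, 0, 0),
    "geneC": (0, 0, 1, 0, 1, 0),
    "geneD": (1, 1, 1, 1, 1, 1),
}

ranking = rank_candidates("geneA", profiles)
for gene, score in ranking:
    print(gene, round(score, 3))
```

A flat similarity like this ignores the species tree entirely, which is the limitation that tree-based methods are meant to address.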


2021 ◽  
Author(s):  
Rooban Thavarajah ◽  
Elizabeth Joshua ◽  
Kannan Ranganathan

Introduction: Evasion of programmed cell death (PCD) is a hallmark of oncogenesis. There are different types of PCD. Ferroptosis, an iron-related form of PCD, is increasingly associated with neoplastic processes. Very few reports have investigated the role of ferroptosis in oral squamous cell carcinoma (OSCC). Here, an attempt is made to compare the expression of ferroptosis-related genes (FRGs) in human OSCC and normal oral tissues. Materials and Methods: The Gene Expression Omnibus repository was searched for OSCC mRNA datasets that included normal control tissues. Datasets that fulfilled the inclusion and exclusion criteria as well as the statistical correlation requirements were considered for this study. Differentially expressed mRNAs were identified. FRGs were identified from the literature and a ferroptosis database, and those FRGs that were differentially expressed were validated using The Human Cancer Genome dataset. Results: In all, 44 FRGs were identified as differentially expressed between OSCC and control tissues. Of the 44, 21 promoted ferroptosis, including 18 drivers of ferroptosis. Of the 21 FRGs that drive ferroptosis, 9 were significantly elevated in controls while the remaining 12 were elevated in OSCC. The roles of the differentially expressed FRGs were also studied. Of the 44 FRGs, 36 were validated using the human cancer genome dataset. Discussion and Conclusion: Drivers and suppressors of ferroptosis were differentially expressed in OSCC and controls. This suggests that ferroptosis has a dual role in oncogenesis, both as a promoter and as a suppressor. The specific FRGs identified in this study should help in understanding the role of PCD in OSCC progression and in designing better treatment.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Kaleb Abram ◽  
Zulema Udaondo ◽  
Carissa Bleker ◽  
Visanu Wanchai ◽  
Trudy M. Wassenaar ◽  
...  

In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome or medoid identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). Authenticity of the 14 phylogroups is supported by several different lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2613 single copy core genes, and differences in the rates of gene gain/loss/duplication. The methodology used in this work is able to reproduce known phylogroups, as well as to identify previously uncharacterized phylogroups in E. coli species.
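
The kind of distance a Mash-based analysis relies on can be sketched in a few lines. The example below builds bottom-s MinHash sketches from k-mers, estimates the Jaccard index from the merged sketch, and converts it to a Mash-style distance, D = -(1/k) ln(2j / (1 + j)). It is a simplified illustration, not the Mash tool: SHA-1 stands in for the hash function, k-mers are not canonicalized across strands, and the function names, `k`, `size`, and the toy sequences are assumptions.

```python
import hashlib
import math


def kmers(seq, k):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, size=50):
    """Bottom-`size` MinHash sketch: the smallest hash values over all k-mers."""
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sk_a, sk_b, size=50):
    """Estimate the Jaccard index from the `size` smallest hashes of the union."""
    a, b = set(sk_a), set(sk_b)
    merged = sorted(a | b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


def mash_distance(j, k):
    """Mash-style distance from a Jaccard estimate j and k-mer length k."""
    if j == 0:
        return 1.0
    return abs(-math.log(2 * j / (1 + j)) / k)


seq_a = "ACGTTGCAAGGCTTACGATCGATCGGATTACA"  # toy sequence, not real data
seq_b = "ACGTTGCAAGGCTTACGATCGTTCGGATTACA"  # same, with one substitution
j = jaccard_estimate(sketch(seq_a), sketch(seq_b))
print(j, mash_distance(j, k=5))
```

With pairwise distances of this kind in hand, a medoid for a group is simply the member with the smallest total distance to the other members.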


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Abdul Wahab ◽  
Hilal Tayara ◽  
Zhenyu Xuan ◽  
Kil To Chong

N4-methylcytosine (4mC) is a biochemical alteration of DNA that affects genetic operations such as gene expression, genomic imprinting, chromosome stability, and cell development without modifying the DNA nucleotides. In the proposed work, a computational model, 4mCNLP-Deep, uses a word embedding approach as the vector formulation and a deep learning based CNN algorithm to predict 4mC and non-4mC sites in the C. elegans genome dataset. A range of experimental settings, such as the corpus k-mer size and the number of cross-validation folds, was explored to find the best-performing configuration. 4mCNLP-Deep outperforms the state-of-the-art predictor on five evaluation metrics: accuracy (ACC) of 0.9354, Matthews correlation coefficient (MCC) of 0.8608, specificity (Sp) of 0.8996, sensitivity (Sn) of 0.9563, and area under the curve (AUC) of 0.9731, obtained with a 3-mer word2vec corpus and 3-fold cross-validation, corresponding to increments of 1.1%, 0.6%, 0.58%, 0.77%, and 4.89%, respectively. Finally, we developed an online web server, http://nsclbio.jbnu.ac.kr/tools/4mCNLP-Deep/, so that experimental researchers can obtain results easily.
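
Two ingredients mentioned in this abstract are easy to make concrete: splitting a sequence into overlapping k-mer "words" (the input a word2vec-style embedding would be trained on) and computing ACC, Sn, Sp, and MCC from a confusion matrix. The sketch below shows only these standard definitions. It is not the 4mCNLP-Deep model, and the function names and the confusion-matrix counts are hypothetical.

```python
import math


def kmer_tokens(seq, k=3):
    """Overlapping k-mers, used as 'words' for a word-embedding model."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


print(kmer_tokens("ACGTAC", k=3))

# Hypothetical counts, chosen only to exercise the formulas.
print(binary_metrics(tp=90, tn=85, fp=15, fn=10))
```

AUC is omitted because it needs ranked prediction scores rather than a single confusion matrix.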


2021 ◽  
pp. 17-25
Author(s):  
Mahmud Alosta ◽  
Alireza Souri

In recent years, massive amounts of genomic DNA sequence data have been generated, which has driven the development of new storage and archiving methods. Processing, storing, or transmitting such large volumes of DNA sequence data is a major challenge. Data compression (DC) techniques have been proposed to reduce the number of bits needed to store and transmit data. DC has recently become more popular, and a large number of techniques have been proposed with applications in several domains. In this paper, a lossless compression technique named arithmetic coding is employed to compress DNA sequences. To validate the performance of the proposed model, an artificial genome dataset is used and the results are investigated in terms of different evaluation parameters. Experiments were performed on artificial datasets, and the compression performance of arithmetic coding is compared with that of Huffman coding, LZW coding, and LZMA techniques. The simulation results show that arithmetic coding achieves significantly better compression, with a compression ratio of 0.261 at a bit rate of 2.16 bpc.
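
To make the idea of arithmetic coding concrete, here is a minimal sketch that uses exact rational arithmetic and a static symbol model built from the sequence itself. It narrows an interval once per symbol, decodes by reversing the process, and reports roughly how many bits per character are needed to name a point inside the final interval. It is an illustration under these assumptions, not the paper's implementation. Practical coders use fixed-precision integers with renormalization, and the function names and toy sequence are invented for the example.

```python
import math
from collections import Counter
from fractions import Fraction


def build_model(seq):
    """Static model: symbol -> (cumulative lower bound, probability)."""
    counts = Counter(seq)
    model, cum = {}, Fraction(0)
    for sym in sorted(counts):
        p = Fraction(counts[sym], len(seq))
        model[sym] = (cum, p)
        cum += p
    return model


def encode(seq, model):
    """Return the final interval [low, low + width) for the whole sequence."""
    low, width = Fraction(0), Fraction(1)
    for sym in seq:
        cum, p = model[sym]
        low += width * cum
        width *= p
    return low, width


def decode(value, length, model):
    """Recover `length` symbols from any value inside the final interval."""
    out = []
    low, width = Fraction(0), Fraction(1)
    for _ in range(length):
        target = (value - low) / width
        for sym, (cum, p) in model.items():
            if cum <= target < cum + p:
                out.append(sym)
                low += width * cum
                width *= p
                break
    return "".join(out)


def bits_per_char(width, length):
    """ceil(-log2(width)) + 1 bits suffice to identify a point in the interval."""
    neg_log2 = math.log2(width.denominator) - math.log2(width.numerator)
    return (math.ceil(neg_log2) + 1) / length


seq = "ACGTACGTAAAC"  # toy sequence, not real data
model = build_model(seq)
low, width = encode(seq, model)
print(decode(low + width / 2, len(seq), model) == seq)
print(bits_per_char(width, len(seq)))
```

The bit count excludes the cost of transmitting the model and the sequence length, which a real codec would also have to store.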


Viruses ◽  
2020 ◽  
Vol 13 (1) ◽  
pp. 12
Author(s):  
Larissa Catharina Costa ◽  
Rafael Valente Veiga ◽  
Juliane Fonseca Oliveira ◽  
Moreno S. Rodrigues ◽  
Roberto F. S. Andrade ◽  
...  

Zika virus (ZIKV) became a worldwide public health emergency after its introduction in the Americas. Brazil was implicated as central to the ZIKV dispersion; however, a better understanding of the pathways the virus took to arrive in Brazil and of its dispersion within the country is needed. An updated genome dataset was assembled from publicly available data. Bayesian phylogeography methods were applied to reconstruct the spatiotemporal history of ZIKV in the Americas and, in more detail, within Brazil. Our analyses reconstructed the Brazilian state of Pernambuco as the likely point of introduction of ZIKV in Brazil, possibly during the 2013 Confederations Cup. Pernambuco played an important role in spreading the virus to other Brazilian states. Our results also underscore the long cryptic circulation of ZIKV in all analyzed locations in Brazil. Conclusions: This study brings new insights into the early moments of ZIKV in the Americas, especially regarding the Brazil-Haiti cluster at the base of the American clade, and describes for the first time migration patterns within Brazil.


Nature ◽  
2020 ◽  
Vol 587 (7833) ◽  
pp. 252-257 ◽  
Author(s):  
Shaohong Feng ◽  
Josefin Stiller ◽  
Yuan Deng ◽  
Joel Armstrong ◽  
Qi Fang ◽  
...  

Whole-genome sequencing projects are increasingly populating the tree of life and characterizing biodiversity [1–4]. Sparse taxon sampling has previously been proposed to confound phylogenetic inference [5], and captures only a fraction of the genomic diversity. Here we report a substantial step towards the dense representation of avian phylogenetic and molecular diversity, by analysing 363 genomes from 92.4% of bird families, including 267 newly sequenced genomes produced for phase II of the Bird 10,000 Genomes (B10K) Project. We use this comparative genome dataset in combination with a pipeline that leverages a reference-free whole-genome alignment to identify orthologous regions in greater numbers than has previously been possible and to recognize genomic novelties in particular bird lineages. The densely sampled alignment provides a single-base-pair map of selection, has more than doubled the fraction of bases that are confidently predicted to be under conservation and reveals extensive patterns of weak selection in predominantly non-coding DNA. Our results demonstrate that increasing the diversity of genomes used in comparative studies can reveal more shared and lineage-specific variation, and improve the investigation of genomic characteristics. We anticipate that this genomic resource will offer new perspectives on evolutionary processes in cross-species comparative analyses and assist in efforts to conserve species.


Plants ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1523
Author(s):  
Wei Gou ◽  
Sheng-Bin Jia ◽  
Megan Price ◽  
Xian-Lin Guo ◽  
Song-Dong Zhou ◽  
...  

Hansenia Turcz., Haplosphaera Hand.-Mazz. and Sinodielsia H.Wolff are three Apiaceae genera endemic to the Hengduan Mountains and the Himalayas, which usually inhabit elevations greater than 2000 m. The phylogenetic relationships between and within the genera were uncertain, especially the placement of Hap. himalayensis and S. microloba. Therefore, we aimed to conduct comparative (simple sequence repeat (SSR) structure, codon usage bias, nucleotide diversity (Pi) and inverted repeat (IR) boundaries) and phylogenetic analyses of Hansenia, Haplosphaera and Sinodielsia (also compared with Chamaesium and Bupleurum) to reduce uncertainties in intergeneric and interspecific relationships. We newly assembled eight plastid genomes from Hansenia, Haplosphaera and Sinodielsia species, and analyzed them together with two plastid genomes from GenBank (Hap. phaea and S. yunnanensis). Phylogenetic analyses used these ten genomes and another 22 plastid genome sequences of Apiaceae. We found that the eight newly assembled genomes ranged from 155,435 bp to 157,797 bp in length and all had a typical quadripartite structure. Fifty-five to 75 SSRs were found in Hansenia, Haplosphaera and Sinodielsia species, and the most abundant SSR type was mononucleotide, which accounted for 58.47% of SSRs in Hansenia, 60.21% in Haplosphaera and 48.01% in Sinodielsia. There was no evident divergence in codon usage frequency between the three genera, with codon counts ranging from 21,134 to 21,254. The Pi analysis showed that the trnE(UUC)-trnT(GGU), trnH(GUG)-psbA and trnE(UUC)-trnT(GGU) spacer regions had the highest Pi values in the plastid genomes of Hansenia (0.01889), Haplosphaera (0.04333) and Sinodielsia (0.01222), respectively. The ndhG-ndhI spacer regions were found to have higher diversity values in all three genera (Pi values: 0.01028–0.2), and thus may provide potential DNA barcodes for phylogenetic analysis. IR boundary analysis showed that the lengths of the rps19 and ycf1 genes extending into the IRs were usually stable within a genus. Our phylogenetic tree demonstrated that Hap. himalayensis is sister to Han. weberbaueriana; meanwhile, Haplosphaera and Hansenia are nested together in the East Asia clade, and S. microloba is nested within individuals of S. yunnanensis in the Acronema clade. This study enriches the complete plastid genome dataset of Apiaceae genera and provides new insight into phylogeny reconstruction using complete plastid genomes of Hansenia, Haplosphaera and Sinodielsia.
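
For readers unfamiliar with the Pi statistic reported above, nucleotide diversity is commonly computed as the average number of pairwise differences per site across a set of aligned sequences. The sketch below implements that textbook definition for one aligned region. It is an assumption that the study used an equivalent calculation (typically done per sliding window by dedicated software), and the function name and toy alignment are invented. Gap characters are treated like any other symbol here.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site for equal-length aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same non-zero length")
    pairs = list(combinations(aligned, 2))
    diffs = sum(
        sum(1 for a, b in zip(s, t) if a != b)
        for s, t in pairs
    )
    return diffs / (len(pairs) * length)


# Toy alignment of one spacer region from three hypothetical samples.
alignment = ["ACGTACGT", "ACGAACGT", "ACGTACGA"]
pi = nucleotide_diversity(alignment)
print(round(pi, 5))
```

Regions with high Pi relative to the rest of the genome are the ones usually proposed as candidate barcodes.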


Author(s):  
Robert Vaser ◽  
Mile Šikić

We present new methods for the improvement of long-read de novo genome assembly, incorporated into a straightforward tool called Raven (https://github.com/lbcb-sci/raven). Compared with other assemblers, Raven is one of the two fastest, reconstructs the sequenced genome in the fewest fragments, has better or comparable accuracy, and maintains similar performance across various genomes. Raven takes 500 CPU hours to assemble a 44x human genome dataset in only 259 fragments.

