GSER (a Genome Size Estimator using R): a pipeline for quality assessment of sequenced genome libraries through genome size estimation

2021 ◽  
Vol 11 (4) ◽  
pp. 20200077 ◽  
Author(s):  
Braulio Valdebenito-Maturana ◽  
Gonzalo Riadi

The first step in any genome research project after obtaining the read data is quality control of the sequenced reads. In a de novo genome assembly project, the second step is to estimate two important features, the genome size and the ‘best k-mer’, before starting assembly tests with different de novo assembly software and parameters. However, quality control of the sequenced genome libraries as a whole, rather than of the reads alone, is frequently overlooked, and its importance is often recognized only when the assembly tests fail to give the expected results. We have developed GSER, a Genome Size Estimator using R, a pipeline to evaluate the relationship between k-mers and genome size as a means of quality assessment of the sequenced genome libraries. GSER generates a set of charts that allow the analyst to evaluate the library datasets before starting the assembly. The script that runs the pipeline can be downloaded from http://www.mobilomics.org/GSER/downloads or http://github.com/mobilomics/GSER.
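The relationship between k-mers and genome size mentioned above can be illustrated with a minimal sketch. It is not GSER's implementation (GSER is written in R and its internals are not described here). It only shows the common textbook estimate, in which genome size is approximated as the total number of non-error k-mer occurrences divided by the depth of the main histogram peak. The function name, the `min_depth` error cutoff and the toy histogram are assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps a depth (how many times a k-mer occurs in the reads)
    to the number of distinct k-mers observed at that depth.
    min_depth is an assumed cutoff: k-mers below it are treated as
    sequencing errors and ignored.
    """
    # Keep only depths at or above the assumed error cutoff.
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # The most populated depth approximates the mean k-mer coverage.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences contributed by the retained k-mers.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Hypothetical toy histogram: error k-mers at depth 1-2, main peak at depth 20.
toy = {1: 50000, 2: 8000, 18: 900, 19: 1500, 20: 2000, 21: 1500, 22: 900}
size, peak = estimate_genome_size(toy)
```

In a library-level quality check of the kind described above, one would expect estimates computed at different k values, or from different libraries of the same sample, to roughly agree; large disagreements would be a reason to inspect the library before assembly.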

2020 ◽  
Vol 9 (37) ◽  
Author(s):  
Samuel O’Donnell ◽  
Frederic Chaux ◽  
Gilles Fischer

Abstract: The current Chlamydomonas reinhardtii reference genome remains fragmented due to gaps stemming from large repetitive regions. To overcome the vast majority of these gaps, publicly available Oxford Nanopore Technology data were used to create a new reference-quality de novo genome assembly containing only 21 contigs, 30/34 telomeric ends, and a genome size of 111 Mb.


2021 ◽  
Author(s):  
Stephanie H Chen ◽  
Maurizio Rossetto ◽  
Marlien van der Merwe ◽  
Patricia Lu-Irving ◽  
Jia-Yee S Yap ◽  
...  

Background: Telopea speciosissima, the New South Wales waratah, is an Australian endemic woody shrub in the family Proteaceae. Waratahs have great potential as a model clade to better understand processes of speciation, introgression and adaptation, and are significant from a horticultural perspective. Findings: Here, we report the first chromosome-level reference genome for T. speciosissima. Combining Oxford Nanopore long-reads, 10x Genomics Chromium linked-reads and Hi-C data, the assembly spans 823 Mb (scaffold N50 of 69.0 Mb) with 91.2% of Embryophyta BUSCOs complete. We introduce a new method in Diploidocus (https://github.com/slimsuite/diploidocus) for classifying, curating and QC-filtering assembly scaffolds. We also present a new tool, DepthSizer (https://github.com/slimsuite/depthsizer), for genome size estimation from the read depth of single-copy orthologues and find that the assembly is 93.9% of the estimated genome size. The largest 11 scaffolds contained 94.1% of the assembly, conforming to the expected number of chromosomes (2n = 22). Genome annotation predicted 40,158 protein-coding genes, 351 rRNAs and 728 tRNAs. Our results indicate that the waratah genome is highly repetitive, with a repeat content of 62.3%. Conclusions: The T. speciosissima genome (Tspe_v1) will accelerate waratah evolutionary genomics and facilitate marker-assisted approaches for breeding. Broadly, it represents an important new genomic resource of Proteaceae to support the conservation of flora in Australia and further afield.


Author(s):  
Elverson Soares de Melo ◽  
Gabriel da Luz Wallau

Abstract: Transposable elements (TEs) are a set of mobile elements within a genome. Due to their complexity, an in-depth TE characterization is only available for a handful of model organisms. In the present study, we performed a de novo and homology-based characterization of TEs in the genomes of 24 mosquito species and investigated their mode of inheritance. More than 40% of the genome of Aedes aegypti, Aedes albopictus, and Culex quinquefasciatus is composed of TEs, whereas TE content varies substantially among Anopheles species (0.13%–19.55%). Class I TEs are the most abundant among mosquitoes, and at least 24 TE superfamilies were found. Interestingly, TEs have been continuously exchanged by horizontal transfer (212 TE families of 18 different superfamilies) among mosquitoes over the last 30 million years, representing around 6% of the genome in Aedes species and a small fraction in Anopheles genomes. Most of these horizontally transferred TEs are from the three ubiquitous LTR superfamilies: Gypsy, Bel-Pao and Copia. Searching more than 32,000 genomes, we also uncovered transfers between mosquitoes and two different phyla (Cnidaria and Nematoda) and two subphyla (Chelicerata and Crustacea), identifying a vector, the worm Wuchereria bancrofti, that enabled the horizontal spread of a Tc1-mariner element of the irritans subfamily among various Anopheles species. These data also allowed us to reconstruct the horizontal transfer network of this TE, involving more than 40 species. In summary, our results suggest that TEs are constantly exchanged by horizontal transfer among mosquitoes, influencing genome variation and contributing to genome size expansion.

Author Summary: Most eukaryotes have DNA fragments inside their genome that can multiply by inserting themselves in other regions of the genome, generating variability. These fragments are called Transposable Elements (TEs). Since they are a constituent part of eukaryote genomes, these pieces of DNA are usually inherited vertically by the offspring. To avoid damage to the genome caused by the replication and insertion of TEs, organisms usually control them, leading to their inactivation. However, TEs sometimes get out of control and invade other species through a horizontal transfer mechanism. This dynamic is not known in mosquitoes, a group of organisms that act as vectors of many human diseases. We collected mosquito genomes available in public databases and characterized their whole TE content. Using a statistically supported method, we investigated TE relations among mosquitoes and discovered that horizontal transfers of transposons are common and have occurred over the last 30 million years among these species. Although not as common as transfers among closely related species, transposon transfers to distant species also occur. We also identified a parasite, a filarial worm, that may have facilitated the transfer of TEs to many mosquitoes. Together, horizontally transferred TEs contribute to increasing mosquito genome size and variation.


2018 ◽  
Author(s):  
Jesse Kerkvliet ◽  
Arthur de Fouchier ◽  
Michiel van Wijk ◽  
Astrid T. Groot

Abstract: Transcriptome quality control is an important step in RNA-seq experiments. However, the quality of de novo assembled transcriptomes is difficult to assess, due to the lack of a reference genome to compare the assembly to. We developed a method to assess and improve the quality of de novo assembled transcriptomes by focusing on the removal of chimeric sequences. These chimeric sequences can result from misassembled contigs in which two transcripts are merged into one. The developed method is incorporated into a pipeline that we named Bellerophon, which is broadly applicable and easy to use. Bellerophon first uses the quality-assessment tool TransRate to indicate the quality, after which it uses a Transcripts Per Million (TPM) filter to remove lowly expressed contigs and CD-HIT-EST to remove highly identical contigs. To validate the quality of this method, we performed three benchmark experiments: 1) a computational creation of chimeras, 2) identification of chimeric contigs in a transcriptome assembly, 3) a simulated RNA-seq experiment using a known reference transcriptome. Overall, the Bellerophon pipeline was able to remove between 40% and 91.9% of the chimeras in transcriptome assemblies and removed more chimeric than non-chimeric contigs. Thus, the Bellerophon sequence of filtration steps is a broadly applicable solution to improve transcriptome assemblies.
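The TPM filtering step mentioned above can be sketched as follows. This is not Bellerophon's code. It only illustrates the standard TPM definition (read count divided by contig length, rescaled so that all values sum to one million) followed by a threshold filter. The function names, the contig names and the `min_tpm` threshold are hypothetical, and the threshold actually used by the pipeline is not stated in the abstract.

```python
def tpm(counts, lengths):
    """Compute Transcripts Per Million for each contig.

    counts maps contig name to mapped read count;
    lengths maps contig name to contig length in bases.
    """
    # Length-normalised read rate per contig.
    rates = {name: counts[name] / lengths[name] for name in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads mapped to any contig")
    # Rescale so that the values sum to one million.
    return {name: rate / total * 1e6 for name, rate in rates.items()}


def filter_low_expression(counts, lengths, min_tpm):
    """Return the names of contigs whose TPM is at least min_tpm."""
    values = tpm(counts, lengths)
    return [name for name in counts if values[name] >= min_tpm]


# Hypothetical toy data: three contigs of equal length.
toy_counts = {"contig_a": 900, "contig_b": 100, "contig_c": 1}
toy_lengths = {"contig_a": 1000, "contig_b": 1000, "contig_c": 1000}
toy_tpm = tpm(toy_counts, toy_lengths)
kept = filter_low_expression(toy_counts, toy_lengths, min_tpm=1000.0)
```

With only three contigs the TPM values are very large, so the toy threshold is set correspondingly high; in a real assembly with many thousands of contigs a much lower cutoff would be typical.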


2020 ◽  
Vol 15 ◽  
Author(s):  
Dicle Yalcin ◽  
Hasan H. Otu

Background: Epigenetic repression mechanisms play an important role in gene regulation, specifically in cancer development. In many cases, local DNA sequence features have been shown to contribute to a CpG island’s (CGI) susceptibility or resistance to methylation. Objective: To develop unbiased machine learning models (individually and combined for different biological features) that predict the methylation propensity of a CGI. Methods: We developed our model, consisting of CGI sequence features, on a dataset of 75 sequences (28 prone, 47 resistant) representing a genome-wide methylation structure. We tested our model on two independent datasets that are chromosome specific (132 sequences) and disease specific (70 sequences). Results: We provided improvements in prediction accuracy over previous models. Our results indicate that combined features better predict the methylation propensity of a CGI (area under the curve (AUC) ~0.81). Our global methylation classifier performs well on independent datasets, reaching an AUC of ~0.82 for the complete model and an AUC of ~0.88 for the model using select sequences that better represent their classes in the training set. We report certain de novo motifs and transcription factor binding site (TFBS) motifs that are consistently better in separating prone and resistant CGIs. Conclusion: Predictive models for the methylation propensity of CGIs lead to a better understanding of disease mechanisms and can be used to classify genes based on their tendency to contain methylation-prone CGIs, which may lead to preventative treatment strategies. MATLAB and Python™ scripts used for model building, prediction, and downstream analyses are available at https://github.com/dicleyalcin/methylProp_predictor.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 246
Author(s):  
Xiaomeng Chen ◽  
Rui Li ◽  
Yonglin Wang ◽  
Aining Li

An emerging poplar canker caused by the Gram-negative bacterium Lonsdalea populi has led to high mortality of the hybrid poplar Populus × euramericana in China and Europe. The molecular bases of pathogenicity and bark adaptation of L. populi have become a focus of recent research. This study determined the whole genome sequence and identified putative virulence factors of L. populi. A high-quality L. populi genome sequence was assembled de novo, with a genome size of 3,859,707 bp, containing approximately 3434 genes and 107 RNAs (75 tRNA, 22 rRNA, and 10 ncRNA). The L. populi genome contained 380 virulence-associated genes, mainly encoding adhesion factors, extracellular enzymes, secretory systems, and two-component transduction systems. The genome had 110 carbohydrate-active enzyme (CAZy)-coding genes and putative secreted proteins. Annotation against an antibiotic-resistance database indicated that L. populi was resistant to penicillin, fluoroquinolone, and kasugamycin. Comparative genomic analysis found that L. populi exhibited the highest homology with the L. britannica genome and that L. populi encompassed 1905 specific genes, 1769 dispensable genes, and 1381 conserved genes, suggesting high evolutionary diversity and genomic plasticity. Moreover, the pan-genome analysis revealed that the N-5-1 genome is an open genome. These findings provide important resources for understanding the molecular basis of the pathogenicity and biology of L. populi and the poplar-bacterium interaction.


Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 563
Author(s):  
Monika Rewers ◽  
Iwona Jedrzejczyk ◽  
Agnieszka Rewicz ◽  
Anna Jakubska-Busse

Orchidaceae is one of the largest and most widespread plant families, with many species threatened with extinction. However, genome sizes are known so far for only about 1.5% of orchids. The aim of this study was to estimate the genome size of 15 species and one infraspecific taxon of endangered and protected orchids growing wild in Poland, to assess their variability, and to develop an additional criterion useful in orchid species identification and characterization. Flow cytometric genome size estimation revealed that the investigated orchid species possessed intermediate, large, and very large genomes. Liparis loeselii possessed the smallest 2C DNA content (14.15 pg), while Cypripedium calceolus possessed the largest (82.10 pg). It was confirmed that genome size is characteristic of the subfamily. Additionally, for four species (Epipactis albensis, Ophrys insectifera, Orchis mascula, Orchis militaris) and one infraspecific taxon (Epipactis purpurata f. chlorophylla), the 2C DNA content has been estimated for the first time. Genome size estimation by flow cytometry proved to be a useful auxiliary method for quick orchid species identification and characterization.
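To relate the picogram values above to the base-pair units used for sequenced genomes elsewhere on this page, the following sketch applies the widely used conversion of 1 pg of DNA to approximately 978 Mbp. The conversion factor is a general convention and is an assumption here, since the abstract itself reports only picograms; the function name is hypothetical.

```python
# Assumed conversion factor: 1 pg of DNA is approximately 978 Mbp.
MBP_PER_PG = 978.0


def pg_to_mbp(picograms):
    """Convert a DNA content in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


# 2C DNA contents reported in the abstract.
smallest_2c_mbp = pg_to_mbp(14.15)   # Liparis loeselii
largest_2c_mbp = pg_to_mbp(82.10)    # Cypripedium calceolus
```

Under this assumption the two 2C values correspond to roughly 13.8 Gbp and 80.3 Gbp respectively. Note that these are 2C (whole nucleus) values, not the haploid genome sizes usually quoted for assemblies.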


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Huihui Li ◽  
Mingzhe Xie ◽  
Yan Wang ◽  
Ludong Yang ◽  
Zhi Xie ◽  
...  

Abstract: riboCIRC is a translatome data-oriented circRNA database specifically designed for hosting, exploring, analyzing, and visualizing translatable circRNAs from multiple species. The database provides a comprehensive repository of computationally predicted ribosome-associated circRNAs; a manually curated collection of experimentally verified translated circRNAs; an evaluation of cross-species conservation of translatable circRNAs; a systematic de novo annotation of putative circRNA-encoded peptides, including sequence, structure, and function; and a genome browser to visualize the context-specific occupant footprints of circRNAs. It represents a valuable resource for the circRNA research community and is publicly available at http://www.ribocirc.com.


Author(s):  
Dörte Schmidt

Abstract: The article discusses how new developments in the notation of contemporary music were negotiated within the framework of the Darmstadt Summer Courses and which interests and actors played a role in this. The first part examines the publications and publication projects that emerged in the context of the Notation conference in 1964. The focus is on the interests of institutions such as the International Music Council and the International Association of Music Libraries, in whose name the New York publisher Kurt Stone attempted to persuade the International Music Institute Darmstadt to cooperate and, following on from the debates there, to systematically record various forms of notation together. In a second step, the content of the debates at the conference is examined, with a particular focus on the different and sometimes conflicting perspectives of interpreters and composers. Numerous connections to fundamental aesthetic discussions of the time can be worked out, in particular to the relationship between the composer’s intention and interpretation, which was renegotiated in a form of notation that was individualized to the extreme. Finally, with a view to later discussions, this topic is narrowed to the question of the relationship between morphology and musical structure, exemplified by positions of Wolfgang Rihm (1982), Klaus Huber (1988) and John Cage (1990).

