reference genome
Recently Published Documents



2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Ping Lin ◽  
Kailiang Wang ◽  
Yupeng Wang ◽  
Zhikang Hu ◽  
Chao Yan ◽  
...  

Background: As a perennial crop, oil-Camellia has a long domestication history and produces high-quality seed oil that is beneficial to human health. Camellia oleifera Abel., a sister species to the tea plant, is extensively cultivated for edible oil production. However, understanding of the molecular mechanism of oil-Camellia domestication is still limited due to the lack of sufficient genomic information. Results: To elucidate the genetic and genomic basis of evolution and domestication, we report a chromosome-scale reference genome of wild oil-Camellia (2.95 Gb), together with transcriptome sequencing data of 221 cultivars. The oil-Camellia genome, assembled by an integrative approach combining multiple sequencing technologies, contains a large proportion of repetitive elements (76.1%) and shows high heterozygosity (2.52%). We construct a genetic map of high-density corrected markers by sequencing controlled-pollination hybrids. Genome-wide association studies reveal a subset of artificially selected genes that are involved in the oil biosynthesis and phytohormone pathways. In particular, we identify elite alleles of genes encoding sugar-dependent triacylglycerol lipase 1, β-ketoacyl-acyl carrier protein synthase III, and stearoyl-acyl carrier protein desaturases; these alleles play important roles in enhancing the yield and quality of seed oil during oil-Camellia domestication. Conclusions: We generate a chromosome-scale reference genome for oil-Camellia plants and demonstrate that artificial selection of elite alleles of genes involved in oil biosynthesis contributes to oil-Camellia domestication.


Author(s):  
Jingxuan Chen ◽  
David J. Garfinkel ◽  
Casey M. Bergman

Here, we report a long-read genome assembly for Saccharomyces uvarum strain CBS 7001 based on PacBio whole-genome shotgun sequence data. Our assembly provides an improved reference genome for an important yeast in the Saccharomyces sensu stricto clade.


2022 ◽  
Vol 2022 ◽  
pp. 1-12
Author(s):  
Motonori Tomita ◽  
Ryotaro Tokuyama ◽  
Shosuke Matsumoto ◽  
Kazuo Ishii

We identified the key genes controlling the late maturation of the Japonica cultivar Isehikari, which was found at Ise Jingu Shrine and matures 6 days later than Koshihikari. We took a genetics-based approach. First, the latest-maturing plants, which flowered later than Isehikari, segregated in the F2 and F3 generations of Koshihikari×Isehikari. Next, linkage of a single late-maturing gene with SSR markers on the long arm of chromosome 3 was inferred using late-maturing homozygous F2 segregants. Genetic analyses of late maturity were then carried out over six successive backcrosses, with Koshihikari as the recurrent parent and the late-maturing homozygous F3 line as the nonrecurrent parent, thus developing a late-maturing isogenic Koshihikari (BC6F2). As a result, we identified a single late-maturing gene with incomplete dominance that caused the 14-day maturation delay of Koshihikari. Whole-genome sequencing was performed on both Koshihikari and the late-maturing isogenic Koshihikari, and SNPs were called using the Koshihikari genome as the reference. Finally, a single SNP was identified in the key gene Hd16 of the late-maturing isogenic Koshihikari.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Atul Sharma ◽  
Pranjal Jain ◽  
Ashraf Mahgoub ◽  
Zihan Zhou ◽  
Kanak Mahadik ◽  
...  

Background: Sequencing technologies are prone to errors, making error correction (EC) necessary for downstream applications. EC tools need to be manually configured for optimal performance. We find that the optimal parameters (e.g., k-mer size) are both tool- and dataset-dependent. Moreover, evaluating the performance (i.e., alignment rate or gain) of a given tool usually relies on a reference genome, but quality reference genomes are not always available. We introduce Lerna for the automated configuration of k-mer-based EC tools. Lerna first creates a language model (LM) of the uncorrected genomic reads and then, based on this LM, calculates a perplexity metric to evaluate the corrected reads for different parameter choices. Next, it finds the parameter choice that produces the highest alignment rate without using a reference genome. The fundamental intuition of our approach is that the perplexity metric is inversely correlated with the quality of the assembly after error correction. Therefore, Lerna leverages the perplexity metric for automated tuning of k-mer sizes without needing a reference genome. Results: First, we show that the best k-mer value can vary for different datasets, even for the same EC tool. This motivates our design, which automates k-mer size selection without using a reference genome. Second, we show the gains of our LM using its component attention-based transformers. We show the model's estimation of the perplexity metric before and after error correction. The lower the perplexity after correction, the better the k-mer size. We also show that the alignment rate and assembly quality computed for the corrected reads are strongly negatively correlated with the perplexity, enabling the automated selection of k-mer values for better error correction and, hence, improved assembly quality. We validate our approach on both short and long reads. Additionally, we show that our attention-based models give a significant runtime improvement for the entire pipeline (18× faster than previous works), due to parallelizing the attention mechanism and the use of JIT compilation for GPU inferencing. Conclusion: Lerna improves de novo genome assembly by optimizing EC tools. Our code is made available in a public repository at: https://github.com/icanforce/lerna-genomics.
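
The perplexity idea in the abstract above can be illustrated with a much simpler model than the one the authors describe. The sketch below is not Lerna's implementation: it swaps the attention-based transformer for a fixed-order Markov model over nucleotides with add-one smoothing, and the function names `train_markov` and `perplexity` are hypothetical. It only shows how a model fitted to one set of reads can score another set, with lower perplexity meaning the scored reads look more like the training reads.

```python
import math
from collections import defaultdict


def train_markov(reads, order=3):
    """Count next-base occurrences for every context of length `order`."""
    counts = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(order, len(read)):
            counts[read[i - order:i]][read[i]] += 1
    return counts


def perplexity(reads, counts, order=3, alphabet="ACGT"):
    """Per-base perplexity of `reads` under the smoothed Markov model."""
    log_sum = 0.0
    n = 0
    for read in reads:
        for i in range(order, len(read)):
            ctx = counts.get(read[i - order:i], {})
            total = sum(ctx.values())
            # Add-one smoothing so unseen transitions keep non-zero probability.
            p = (ctx.get(read[i], 0) + 1) / (total + len(alphabet))
            log_sum += math.log(p)
            n += 1
    if n == 0:
        raise ValueError("reads are too short for the chosen order")
    return math.exp(-log_sum / n)


if __name__ == "__main__":
    model = train_markov(["ACGTACGTACGTACGT"] * 5)
    print(perplexity(["ACGTACGTACGTACGT"], model))  # read consistent with the model
    print(perplexity(["ACGTACGAACGTACGT"], model))  # read with one substituted base
```

In a tuning loop of the kind the abstract describes, one would score the corrected reads produced under each candidate k-mer size and keep the setting with the lowest perplexity; how Lerna actually does this is documented in the authors' repository.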


2022 ◽  
Author(s):  
Karl Johan Westrin ◽  
Warren W Kretzschmar ◽  
Olof Emanuelsson

Motivation: Transcriptome assembly from RNA sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate reconstruction ability of transcript isoforms. This impedes the study of alternative splicing, in particular for lowly expressed isoforms. Result: We present the de novo transcript isoform assembler ClusTrast, which clusters a set of guiding contigs by similarity, aligns short reads to the guiding contigs, and assembles each clustered set of short reads individually. We tested ClusTrast on datasets from six eukaryotic species, and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. An appreciable fraction were reconstructed to at least 95% of their length. We suggest that ClusTrast will be useful for studying alternative splicing in the absence of a reference genome. Availability and implementation: The code and usage instructions are available at https://github.com/karljohanw/clustrast.


2021 ◽  
Author(s):  
Ran Li ◽  
Mian Gong ◽  
Xinmiao Zhang ◽  
Fei Wang ◽  
Zhenyu Liu ◽  
...  

Structural variations (SVs) are a major contributor to genetic diversity and phenotypic variation; however, their prevalence and functions in domestic animals are largely unexplored. Here, we assembled 26 haplotype-resolved genome assemblies from 13 genetically diverse sheep breeds using PacBio HiFi sequencing. We then constructed an ovine graph pan-genome and demonstrated its advantage in discovering 142,593 biallelic SVs (insertions and deletions), 7,028 divergent alleles and 13,419 multiallelic variations with high accuracy and sensitivity. To link the SVs to genotypes, we genotyped the SVs in 687 resequenced individuals of domestic and wild sheep using a graph-based approach and identified numerous population-stratified variants, of which expression-associated SVs were detected by integrating RNA-seq data. Taking the varying sheep tail morphology as an example, we located a putative causative insertion in the HOXB13 gene responsible for the long tail and reported multiple large SVs associated with the fat tail. Beyond generating a benchmark resource for ovine structural variants, our study also highlights that population genetic analysis based on a graph pan-genome rather than a reference genome will greatly benefit animal genetic research.


2021 ◽  
Author(s):  
István Csabai ◽  
Krisztián Papp ◽  
Dávid Visontai ◽  
József Stéger ◽  
Norbert Solymosi

The COVID-19 pandemic has been going on for two years now and, although many hypotheses have been put forward, its origin remains obscure. We investigated whether samples in the huge public sequencing data archives that were collected earlier than the earliest known cases of the pandemic might contain traces of SARS-CoV-2. Here we report the bioinformatic analysis of a metagenome sample set collected from soil on King George Island, Antarctica between 2018-12-24 and 2019-01-13. It contains sequence fragments matching the SARS-CoV-2 reference genome, totalling more than half a million nucleotides and covering the complete genome on average 17×. Preliminary phylogenetic analysis places the sample close to the earliest known cases. The high sequence coverage rules out chance alignments from other species, but possible laboratory contamination cannot be excluded. The sequence harbours a unique combination of mutations, unseen in other samples, so whatever its origin, it can add an important piece of information to the puzzle of the ongoing pandemic.
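
The average coverage quoted in the abstract above follows from dividing the total matched nucleotides by the genome length. The check below is only an arithmetic sketch: the reference length of 29,903 nt is an assumption taken from the commonly used Wuhan-Hu-1 reference (NC_045512.2), not from the abstract, and 500,000 is a lower bound standing in for "more than half a million" nucleotides.

```python
# Assumption: length of the Wuhan-Hu-1 reference genome (NC_045512.2).
SARS_COV_2_REF_LENGTH = 29_903


def mean_coverage(total_aligned_bases, genome_length):
    """Average depth: aligned bases divided by reference length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_aligned_bases / genome_length


if __name__ == "__main__":
    # 500,000 is a lower bound on the matched nucleotides reported.
    depth = mean_coverage(500_000, SARS_COV_2_REF_LENGTH)
    print(f"approximate mean coverage: {depth:.1f}x")
```

Under these assumptions the result is roughly 17×, consistent with the figure in the abstract.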


2021 ◽  
Vol 12 ◽  
Author(s):  
Yulin Bai ◽  
Jie Gong ◽  
Zhixiong Zhou ◽  
Bijun Li ◽  
Ji Zhao ◽  
...  

The rock bream (Oplegnathus fasciatus) is an economically important rocky reef fish of the Northwest Pacific Ocean. In recent years, it has been cultivated as an important edible fish in coastal areas of China. Despite its economic importance, genome-wide adaptations of domesticated O. fasciatus are largely unknown. Here we report a chromosome-level reference genome of female O. fasciatus (from the southern population in the subtropical region) using PacBio single-molecule real-time (SMRT) sequencing and high-throughput chromosome conformation capture (Hi-C) technologies. The genome was assembled into 120 contigs with a total length of 732.95 Mb and a contig N50 length of 27.33 Mb. After chromosome-level scaffolding, 24 chromosomes with a total length of 723.22 Mb were constructed. Moreover, a total of 27,015 protein-coding genes and 5,880 ncRNAs were annotated in the reference genome. This reference genome of O. fasciatus will provide an important resource not only for basic ecological and population genetic studies but also for dissecting artificial selection mechanisms in marine aquaculture.
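
The contig N50 reported above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Toy example: total length 32, and the two longest contigs (8 + 8) reach half.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

For a real assembly, the lengths would be read from the FASTA index or the assembler's summary rather than typed in by hand.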


2021 ◽  
Author(s):  
Matej Lexa ◽  
Monika Cechova ◽  
Son Hoang Nguyen ◽  
Pavel Jedlicka ◽  
Viktor Tokan ◽  
...  

The role of repetitive DNA in the 3D organization of the interphase nucleus in plant cells is a subject of intensive study. High-throughput chromosome conformation capture (Hi-C) is a sequencing-based method that detects the proximity of DNA segments in nuclei. We combined Hi-C data, plant reference genome data and tools for the characterization of genomic repeats to build a Nextflow pipeline that identifies and quantifies the contacts of specific repeats, revealing preferential homotypic interactions of ribosomal DNA, DNA transposons and some LTR retrotransposon families. We provide a novel way to analyze the organization of repetitive elements in the 3D nucleus.


2021 ◽  
Author(s):  
Cristian Cuevas Caballe ◽  
Joan Ferrer Obiol ◽  
Joel Vizueta ◽  
Meritxell Genovart ◽  
Jacob Gonzalez-Solis ◽  
...  

The Balearic shearwater (Puffinus mauretanicus) is the most threatened seabird in Europe. The fossil record suggests that human colonisation of the Balearic Islands resulted in a sharp decrease in population size. Currently, populations continue to be decimated, mainly due to predation by introduced mammals and bycatch in longline fisheries, and some studies predict their extinction by 2070. We present the first high-quality reference genome for the species, which was obtained by a combination of short- and long-read sequencing. Our hybrid assembly includes 4,169 scaffolds, with a scaffold N50 of 2.1 Mbp, a genome length of 1.2 Gbp, and BUSCO completeness of 96%, which is amongst the highest across sequenced avian species. This reference genome allowed us to study critical aspects relevant to the conservation status of the species, such as an evaluation of overall heterozygosity levels and the reconstruction of its historical demography. Our phylogenetic analysis using whole-genome information resolves current uncertainties in the systematics of the order Procellariiformes. Comparative genomics analyses uncover a set of candidate genes that may have played an important role in the adaptation of Procellariiformes to a pelagic lifestyle, including those for the enhancement of fishing capabilities, night vision and the development of natriuresis. This reference genome will be the keystone for future developments of genetic tools in conservation efforts for this Critically Endangered species.

