high quality sequence
Recently Published Documents



Author(s):  
Lin Ma ◽  
Xiao Wang ◽  
Min Yan ◽  
Fang Liu ◽  
Shuxing Zhang ◽  
...  

Background: Common vetch (Vicia sativa L.) is an annual legume with excellent suitability in cold and dry regions. Despite its great applied potential, genomic information on common vetch remains unavailable. Methods and results: In the present study, a whole-genome survey of common vetch was performed using next-generation sequencing (NGS). A total of 79.84 Gbp of high-quality sequence data were obtained and assembled into 3,754,145 scaffolds with an N50 length of 3,556 bp. According to the K-mer analyses, the genome size, heterozygosity rate and GC content of the common vetch genome were estimated to be 1,568 Mbp, 0.4345% and 35%, respectively. In addition, a total of 76,810 putative simple sequence repeats (SSRs) were identified. Among them, dinucleotide repeats were the most abundant SSR type (44.94%), followed by tri- (35.82%), tetra- (13.22%), penta- (4.47%) and hexanucleotide repeats (1.54%). Furthermore, a total of 58,175 SSR primer pairs were designed and ten of them were validated in Chinese common vetch. Further analysis showed that Chinese common vetch harbored high genetic diversity and could be clustered into two main subgroups. Conclusion: This is the first report on the genome features of common vetch, and the information will help to design whole-genome sequencing strategies. The newly identified SSRs provide basic molecular markers for germplasm characterization, genetic diversity and QTL mapping studies in common vetch.
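The SSR classes reported in this abstract (di- through hexanucleotide) can be illustrated with a minimal sketch. The regular-expression approach, the function names `find_ssrs` and `ssr_type_percentages`, and the minimum repeat thresholds are illustrative assumptions, not the tool or parameters used in the study.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per motif length (assumed, not from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

def ssr_type_percentages(hits):
    """Percentage of SSRs per motif length (2 = dinucleotide, 3 = trinucleotide, ...)."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {k: 100.0 * c / total for k, c in counts.items()} if total else {}
```

Calling `find_ssrs` on each scaffold and pooling the hits before `ssr_type_percentages` would give a type breakdown of the kind quoted above; the exact percentages depend on the thresholds chosen.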


2021 ◽  
Vol 43 (3) ◽  
pp. 1282-1292
Author(s):  
Tianyan Yang ◽  
Xinxin Huang ◽  
Zijun Ning ◽  
Tianxiang Gao

Harpadon nehereus forms one of the most important commercial fisheries along the Bay of Bengal and the southeast coast of China. In this study, the genome-wide survey dataset first produced using next-generation sequencing (NGS) was used to provide general information on the genome size, heterozygosity and repeat sequence ratio of H. nehereus. About 68.74 GB of high-quality sequence data were obtained in total and the genome size was estimated to be 1315 Mb with the 17-mer frequency distribution. The sequence repeat ratio and heterozygosity were calculated to be 52.49% and 0.67%, respectively. A total of 1,027,651 microsatellite motifs were identified and dinucleotide repeat was the most dominant simple sequence repeat (SSR) motif with a frequency of 54.35%. As a by-product of whole genome sequencing, the mitochondrial genome is a powerful tool to investigate the evolutionary relationships between H. nehereus and its relatives. The maximum likelihood (ML) phylogenetic tree was constructed according to the concatenated matrix of amino acids translated from the 13 protein-coding genes (PCGs). Monophyly of two species of the genus Harpadon was revealed in the present study and they formed a monophyletic clade with Saurida with a high bootstrap value of 100%. The results would help to push back the frontiers of genomics and open the doors of molecular diversity as well as conservation genetics studies on this species.
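A genome size estimate from a k-mer frequency distribution, as described in this abstract, commonly rests on dividing the total number of k-mers by the depth of the main peak. The sketch below illustrates that idea only; the function names, the `min_depth` error cutoff, and the omission of reverse-complement (canonical) k-mer handling are simplifying assumptions, not the authors' pipeline.

```python
from collections import Counter

def kmer_histogram(reads, k=17):
    """Map k-mer multiplicity -> number of distinct k-mers with that multiplicity."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())

def estimate_genome_size(hist, min_depth=2):
    """Estimate genome size as total k-mers divided by the main peak depth.

    min_depth is an assumed cutoff that ignores low-multiplicity k-mers,
    which are usually dominated by sequencing errors.
    """
    usable = {d: n for d, n in hist.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The main peak is taken as the depth with the most distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth
```

For real data the histogram would normally come from a dedicated k-mer counter rather than a Python loop, and heterozygous or repeat-rich genomes can show several peaks, so the peak choice needs inspection.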


2021 ◽  
Author(s):  
Michelle Waycott ◽  
Jent Kornelis van Dijk ◽  
Ed Biffin

Novel multi-gene targeted capture probes have been developed with the objective of obtaining multi-locus high quality sequence reads across any angiosperm lineage. Using existing genomic and transcriptomic data, two independent single assay probe/bait sets have been developed, the first targeting conserved exons from 20 low copy nuclear genes (OzBaits_NR V1.0) and the second, 19 plastid gene regions (OZBaits_CP V1.0). These universal bait sets can efficiently generate DNA sequence data that are suitable for systematics and evolutionary studies of flowering plants. The bait sets can be ordered as Daicel-Arbor Sciences custom myBaits. We demonstrate the utility of the bait set in consistently recovering the targeted genomic regions across an evolutionarily broad range of angiosperm taxa.


Author(s):  
Simon Lee ◽  
Loan T Nguyen ◽  
Ben J Hayes ◽  
Elizabeth M Ross

Motivation: Trimming and filtering tools are useful in DNA sequencing analysis because they increase the accuracy of sequence alignments and thus the reliability of results. Oxford Nanopore Technologies (ONT) trimming and filtering tools are currently rudimentary, generally only filtering reads based on whole-read average quality. As a result, reads that contain regions of high-quality sequence are discarded. Here, we propose Prowler, a trimmer that uses a window-based approach inspired by algorithms used to trim short-read data. Importantly, we retain the phase and read length information by optionally replacing trimmed sections with Ns. Results: Prowler was applied to mammalian and bacterial datasets to assess its effect on alignment and assembly, respectively. Compared to data filtered with Nanofilt, alignments of data trimmed with Prowler had lower error rates and more mapped reads. Assemblies of Prowler-trimmed data had a lower error rate than those filtered with Nanofilt; however, this came at some cost to assembly contiguity. Availability and implementation: Prowler is implemented in Python and is available at https://github.com/ProwlerForNanopore/ProwlerTrimmer. Supplementary information: Supplementary data are available at Bioinformatics online.
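The window-based idea with N replacement can be sketched in a few lines. This is not Prowler's implementation: the function name `mask_low_quality_windows`, the fixed non-overlapping windows, the default window size and threshold, and the use of a plain arithmetic mean of Phred scores are all assumptions made for illustration.

```python
def mask_low_quality_windows(seq, quals, window=100, min_mean_q=7):
    """Replace each fixed-size window whose mean Phred score is below
    min_mean_q with Ns, so the read length and base positions are preserved."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    out = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        chunk_q = quals[start:start + window]
        # Arithmetic mean of Phred scores: a simplification (assumed here).
        if sum(chunk_q) / len(chunk_q) < min_mean_q:
            out.append("N" * len(chunk))
        else:
            out.append(chunk)
    return "".join(out)
```

For example, `mask_low_quality_windows("ACGTACGTAC", [30, 30, 30, 30, 2, 2, 2, 2, 20, 20], window=4)` masks only the middle window, whereas a whole-read average filter would either keep or discard the read as a unit.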


2021 ◽  
Author(s):  
Lin Ma ◽  
Xiao Wang ◽  
Min Yan ◽  
Fang Liu ◽  
Xuemin Wang

Common vetch (Vicia sativa L.) is an annual legume with excellent suitability in cold and dry regions. Despite its great applied potential, genomic information on common vetch remains unavailable. In the present study, a whole-genome survey of common vetch was performed using next-generation sequencing (NGS). A total of 79.84 Gbp of high-quality sequence data were obtained and assembled into 3,754,145 scaffolds with an N50 length of 3,556 bp. According to the K-mer analyses, the genome size, heterozygosity rate and GC content of the common vetch genome were estimated to be 1,568 Mbp, 0.4345% and 35%, respectively. In addition, a total of 76,810 putative simple sequence repeats (SSRs) were identified. Among them, dinucleotide repeats were the most abundant SSR type (44.94%), followed by tri- (35.82%), tetra- (13.22%), penta- (4.47%) and hexanucleotide repeats (1.54%). Furthermore, a total of 58,175 SSR primer pairs were designed and ten of them were validated in Chinese common vetch. Further analysis showed that Chinese common vetch harbored high genetic diversity and could be clustered into two main subgroups. This is the first report on the genome features of common vetch, and the information will help to design whole-genome sequencing strategies. The newly identified SSRs provide basic molecular markers for germplasm characterization, genetic diversity and QTL mapping studies in common vetch.
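The scaffold N50 quoted in this abstract is a standard contiguity statistic: the length L such that scaffolds of length at least L together cover at least half of the total assembly. A minimal sketch follows; the function name `n50` is an assumption for illustration, and the study's own assembly statistics tool is not specified here.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

A short N50 relative to the estimated genome size, as in a survey assembly built from short reads, indicates a highly fragmented assembly.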


2021 ◽  
Author(s):  
Bikash Shrestha ◽  
Badri Adhikari

Background: A high-quality sequence alignment (SA) is the most important input feature for accurate protein structure prediction. For a protein sequence, there are many methods to generate a SA. However, when given a choice of more than one SA for a protein sequence, there are no methods to predict which SA may lead to more accurate models without actually building the models. In this work, we describe a method to predict the quality of a protein's SA. Methods: We created our own dataset by generating a variety of SAs for a set of 1,351 representative proteins and investigated various deep learning architectures to predict the local distance difference test (lDDT) scores of distance maps predicted with SAs as the input. These lDDT scores serve as indicators of the quality of the SAs. Results: Using two independent test datasets consisting of CASP13 and CASP14 targets, we show that our method is effective for scoring and ranking SAs when a pool of SAs is available for a protein sequence. With an example, we further discuss that SA selection using our method can lead to improved structure prediction.


2021 ◽  
Author(s):  
Simon Lee ◽  
Loan T. Nguyen ◽  
Ben J. Hayes ◽  
Elizabeth M Ross

Motivation: Quality control (QC) tools are critical in DNA sequencing analysis because they increase the accuracy of sequence alignments and thus the reliability of results. Oxford Nanopore Technologies (ONT) QC is currently rudimentary, generally based on whole-read average quality. As a result, reads that contain regions of high-quality sequence are discarded. Here we propose Prowler, a multi-window approach inspired by algorithms used to QC short-read data. Importantly, we retain the phase and read length information by optionally replacing trimmed sections with Ns. Results: Prowler was applied to mammalian and bacterial datasets to assess effects on alignment and assembly, respectively. Compared to Nanofilt, alignments of data QCed with Prowler had lower error rates and more mapped reads. Assemblies of Prowler QCed data had a lower error rate than Nanofilt QCed data; however, this came at some cost to assembly contiguity. Availability and implementation: Prowler is implemented in Python and is available at https://github.com/ProwlerForNanopore/ProwlerTrimmer.


Author(s):  
Vera Pavese ◽  
Emile Cavalet Giorsa ◽  
Lorenzo Barchi ◽  
Alberto Acquadro ◽  
Daniela Torello Marinoni ◽  
...  

The European hazelnut (Corylus avellana L.; 2n=2x=22) is a worldwide economically important tree nut that is cross-pollinated due to sporophytic incompatibility. Therefore, any individual plant is highly heterozygous. Cultivars are clonally propagated using mound layering, rooted suckers and micropropagation. In recent years, the interest in this crop has increased, due to a growing demand related to the recognized health benefits of nut consumption. C. avellana cv ‘Tonda Gentile delle Langhe’ (‘TGdL’) is well-known for its high kernel quality, and the premium price paid for this cultivar is an economic benefit for producers in northern Italy. Assembly of a high-quality genome is a difficult task in many plant species because of the high level of heterozygosity. We assembled a chromosome-level genome sequence of ‘TGdL’ with a two-step approach. First, 10X Genomics Chromium Technology was used to create a high-quality sequence, which was then assembled into scaffolds with the cv ‘Tombul’ genome as the reference. Eleven pseudomolecules were obtained, corresponding to 11 chromosomes. A total of 11,046 scaffolds remained unplaced, representing 11% of the genome (46,504,161 bp). Gene prediction, performed with Maker-P software, identified 27,791 genes (AED ≤ 0.4 and 92% BUSCO completeness), whose function was analysed with BlastP and InterProScan software. To characterise ‘TGdL’ specific genetic mechanisms, Orthofinder was used to detect orthologs between hazelnut and closely related species. The ‘TGdL’ genome sequence is expected to be a powerful tool to understand hazelnut genetics and allow detection of markers/genes for important traits to be used in targeted breeding programs.


2021 ◽  
pp. 290-302
Author(s):  
Amitha Mithra V. Sevanthi ◽  
Prashant Kale ◽  
Chandra Prakash ◽  
M. K. Ramkumar ◽  
Neera Yadav ◽  
...  

The Indian initiative for creating mutant resources in rice has generated 87,000 mutants in the background of a popular drought- and heat-tolerant upland cultivar, Nagina 22 (N22), through EMS mutagenesis. So far, 541 macro-mutants from this resource have been identified, maintained in the mutant garden and characterized in detail based on 44 descriptors pertaining to distinctness, uniformity and stability (DUS) of rice and other agronomic parameters. The similarity index of the mutants was more than 0.6 for nearly 90% of the mutants with respect to DUS descriptors, further establishing the validity of the mutants. The available high-quality sequence resource of N22 has been improved by reducing the gaps by 0.02% in the coding sequence (CDS) region. This was made possible using the newly synthesized whole-genome data of N22 which helped to remove 9006 'Ns' and replace 12,746 existing nucleotides with the accurate ones. These sequence and morphological details have been updated in the mutant database 'EMSgardeN22'. Further, 1058 mutants have been identified for low-P tolerance, tolerance to sheath blight, blast, drought, heat, higher photosynthetic efficiency and agronomic and root traits from this resource. A novel herbicide-tolerant (imazethapyr) mutant earlier identified and characterized from this resource is now being used in introgressing the herbicide-tolerant trait in eight major rice varieties in India. Further, robust and simpler screening systems have been tested for studying low-P tolerance of the mutants. A grain-size mutant, heat-tolerant mutant, drought-tolerant mutant, stay-green mutant and low-P tolerant and water-use efficient high-root-volume mutants have been characterized at morphological and molecular levels. A brief account of all these mutants, the entire mutant resource and the elaborate trait-based screenings is presented in this chapter.


2020 ◽  
Vol 25 (4) ◽  
Author(s):  
Michael J. Kamdar ◽  
J. William Efcavitch

This article provides an overview of the emerging technology of enzymatic DNA synthesis, which holds the promise of making the business of writing DNA cost-effective, faster, sustainable, and more accurate compared to the traditional DNA synthesis method of phosphoramidite chemistry. Enzymatic DNA synthesis lends itself to various business models to realize the enormous opportunities across established and emerging industries that can be transformed with the reliable and affordable creation of long, high-quality, sequence specific DNA or, in the case of DNA data storage, the template-independent creation of DNA in nontoxic solutions without the need for post-synthesis processing. This review includes a discussion of potential verticals, such as life sciences – which includes gene editing, synthetic biology, precision medicine, DNA nanotechnology, and RNA vaccine development – as well as DNA data storage. Enzymatic DNA synthesis is being rapidly advanced to a commercial reality, with the first enzymatically synthesized DNA products to enter the market in the next year.

