reference sequence Latest Research Papers

Multiple Cases of Bacterial Sequence Erroneously Incorporated Into Publicly Available Chloroplast Genomes

Frontiers in Genetics ◽

10.3389/fgene.2021.821715 ◽

2022 ◽

Vol 12 ◽

Author(s):

Aaron J. Robinson ◽

Hajnalka E. Daligault ◽

Julia M. Kelliher ◽

Erick S. LeBrun ◽

Patrick S. G. Chain

Keyword(s):

Ribosomal Rna ◽

National Institutes Of Health ◽

Reference Sequence ◽

Rrna Genes ◽

High Similarity ◽

Sequencing Data ◽

Bacterial Genomes ◽

Chloroplast Genomes ◽

Sequence Similarities ◽

Bacterial Sequence

Public sequencing databases are invaluable resources to biological researchers, but assessing data veracity as well as the curation and maintenance of such large collections of data can be challenging. Genomes of eukaryotic organelles, such as chloroplasts and other plastids, are particularly susceptible to assembly errors and misrepresentations in these databases due to their close evolutionary relationships with bacteria, which may co-occur within the same environment, as can be the case when sequencing plants. Here, based on sequence similarities with bacterial genomes, we identified several suspicious chloroplast assemblies present in the National Institutes of Health (NIH) Reference Sequence (RefSeq) collection. Investigations into these chloroplast assemblies reveal examples of erroneous integration of bacterial sequences into chloroplast ribosomal RNA (rRNA) loci, often within the rRNA genes, presumably due to the high similarity between plastid and bacterial rRNAs. The bacterial lineages identified within the examined chloroplasts as the most likely source of contamination are either known associates of plants, or co-occur in the same environmental niches as the examined plants. Modifications to the methods used to process untargeted ‘raw’ shotgun sequencing data from whole genome sequencing efforts, such as the identification and removal of bacterial reads prior to plastome assembly, could eliminate similar errors in the future.

Biochemical and structural characterization of quizalofop-resistant wheat acetyl-CoA carboxylase

Scientific Reports ◽

10.1038/s41598-021-04280-x ◽

2022 ◽

Vol 12 (1) ◽

Author(s):

Raven Bough ◽

Franck E. Dayan

Keyword(s):

Amino Acid ◽

Winter Wheat ◽

Amino Acid Substitution ◽

Reference Sequence ◽

Cross Resistance ◽

Homozygous Mutation ◽

Acetyl Coa Carboxylase ◽

Acetyl Coa ◽

Nucleotide Mutation

A novel nucleotide mutation in ACC1 resulting in an alanine to valine amino acid substitution in acetyl-CoA carboxylase (ACCase) at position 2004 of the Alopecurus myosuroides reference sequence (A2004V) imparts quizalofop resistance in wheat. Genotypes endowed with the homozygous mutation in one or two ACC1 homoeologs are seven- and 68-fold more resistant to quizalofop, respectively, than a wildtype winter wheat in greenhouse experiments. In vitro ACCase activities in soluble protein extracts from these varieties are 3.8- and 39.4-fold more resistant to quizalofop with the homozygous mutation in one or two genomes, respectively, relative to the wildtype. The A2004V mutation does not alter the specific activity of wheat ACCase, suggesting that this resistance trait does not affect the catalytic functions of ACCase. Modeling of wildtype and quizalofop-resistant wheat ACCase demonstrates that the A2004V amino acid substitution causes a reduction in the volume of the binding pocket that hinders quizalofop’s interaction with ACCase. Docking studies confirm that the mutation reduces the binding affinity of quizalofop. Interestingly, the models suggest that the A2004V mutation does not affect haloxyfop binding. Follow-up in vivo and in vitro experiments reveal that the mutation in fact imparts negative cross-resistance to haloxyfop, with quizalofop-resistant varieties exhibiting higher sensitivity to haloxyfop than the wildtype winter wheat line.

Failure to detect mutations in U2AF1 due to changes in the GRCh38 reference sequence

Journal of Molecular Diagnostics ◽

10.1016/j.jmoldx.2021.10.013 ◽

2022 ◽

Author(s):

Christopher A. Miller ◽

Jason R. Walker ◽

Travis L. Jensen ◽

William F. Hooper ◽

Robert S. Fulton ◽

...

Keyword(s):

Reference Sequence

Application of Environmental DNA (eDNA) Metabarcoding Method to Identify Threatened Sulawesi Mammal Based on 12S rRNA Gene

HAYATI Journal of Biosciences ◽

10.4308/hjb.29.1.114-121 ◽

2021 ◽

Vol 29 (1) ◽

pp. 114-121

Author(s):

Bambang Suryobroto ◽

Ahmad Abdul Jabbar ◽

Puji Rianti

Keyword(s):

Species Identification ◽

High Throughput Sequencing ◽

Genetic Material ◽

Environmental Dna ◽

Reference Sequence ◽

12S Rrna ◽

Rrna Gene ◽

Mammal Species ◽

Terrestrial Mammals ◽

Salt Licks

Species detection and identification are crucial steps in biodiversity assessment. Traditional methods are often invasive and resource intensive. The number of studies demonstrating the success of the eDNA metabarcoding approach in species identification has increased rapidly in recent years. Some large terrestrial mammals reportedly use natural salt licks as a source of dietary minerals, and the genetic material they leave in the environment can be used to identify species at these sites. An eDNA metabarcoding protocol was carried out to identify Sulawesi mammals from the Adudu natural salt licks, Nantu Wildlife Reserve, Gorontalo. Environmental DNA was extracted from water samples, and amplicon libraries were prepared by PCR amplification and sequenced by Illumina MiSeq high-throughput sequencing. Read processing and taxonomic assignment were carried out with two bioinformatics packages, PipeCraft-1.0 and OBITools-2.11. Two endangered Sulawesi mammal species were identified: lowland anoa (Bubalus depressicornis) and babirusa (Babyrousa babyrussa). The accuracy of mammal species identification using eDNA metabarcoding depends on rigorous experimental procedures, DNA marker reliability, and the availability of a reference sequence database.

LABRADOR—A Computational Workflow for Virus Detection in High-Throughput Sequencing Data

Viruses ◽

10.3390/v13122541 ◽

2021 ◽

Vol 13 (12) ◽

pp. 2541

Author(s):

Izabela Fabiańska ◽

Stefan Borutzki ◽

Benjamin Richter ◽

Hon Q. Tran ◽

Andreas Neubert ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

De Novo ◽

Virus Detection ◽

Third Party ◽

Reference Sequence ◽

Clinical Samples ◽

Sequencing Data

High-throughput sequencing (HTS) allows detection of known and unknown viruses in samples of broad origin. This makes HTS a well-suited technology for determining whether biological products, such as vaccines, are free from adventitious agents, and it could support or replace extensive testing with various in vitro and in vivo assays. Due to bioinformatics complexities, there is a need for standardized and reliable methods to manage HTS-generated data in this field. Thus, we developed LABRADOR, an analysis pipeline for adventitious virus detection. The pipeline consists of several third-party programs and is divided into two major parts: (i) direct read classification based on the comparison of characteristic profiles between reads and sequences deposited in the database, supported by alignment to the best-matching reference sequence, and (ii) de novo assembly of contigs and their classification at the nucleotide and amino acid levels. To meet the requirements published in guidelines for the safety of biologicals, we generated a custom nucleotide database of viral sequences. We tested our pipeline on publicly available HTS datasets and showed that LABRADOR can reliably detect viruses in mixtures of model viruses, vaccines and clinical samples.

Reconstructing mitochondrial genomes from ancient DNA through iterative mapping: an evaluation of software, parameters, and bait reference

10.1101/2021.12.16.472923 ◽

2021 ◽

Author(s):

Michael Vincent Westbury ◽

Eline D Lorenzen

Keyword(s):

Ancient Dna ◽

Pairwise Distance ◽

Reference Sequence ◽

Sequence Length ◽

Close Relative ◽

Mitochondrial Genomes ◽

Phylogenetic Distance ◽

Extinct Species ◽

Mapping Software ◽

Dna 2

(1) Within evolutionary biology, mitochondrial genomes (mitogenomes) provide useful insights at both population and species level. Several approaches are available to assemble mitogenomes. However, most are not suitable for divergent, extinct species, because they require a reference mitogenome from a conspecific or close relative, and relatively high-quality DNA. (2) Iterative mapping can overcome the lack of a close reference sequence, and has been applied to an array of extinct species. Despite its widespread use, the accuracy of the reconstructed assemblies is yet to be comprehensively assessed. Here, we investigated the influence of mapping software (BWA or MITObim), parameters, and bait reference phylogenetic distance on the accuracy of the reconstructed assembly using two simulated datasets: (i) spotted hyena and various mammalian bait references, and (ii) southern cassowary and various avian bait references. Specifically, we assessed the accuracy of results through pairwise distance (PWD) to the reference conspecific mitogenome, number of incorrectly inserted base pairs (bp), and total length of the reconstructed assembly. (3) We found large discrepancies in the accuracy of reconstructed assemblies using different mapping software, parameters, and bait references. PWD to the reference conspecific mitogenome, which reflected the level of incorrect base calls, was consistently higher with BWA than MITObim. The same was observed for the number of incorrectly inserted bp. In contrast, the total sequence length was lower. Overall, the most accurate results were obtained with MITObim using mismatch values of 3 or 5, and the phylogenetically closest bait reference sequence. Accuracy could be further improved by combining results from multiple bait references. (4) We present the first comprehensive investigation of how mapping software, parameters, and bait reference influence mitogenome reconstruction from ancient DNA through iterative mapping. Our study provides information on how mitogenomes are best reconstructed from divergent, short-read data. By obtaining the most accurate reconstruction possible, one can be more confident as to the reliability of downstream analyses, and the evolutionary inferences made from them.
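The abstract above uses pairwise distance (PWD) to the conspecific reference as its main accuracy measure but does not spell out the calculation. The following is a minimal sketch, assuming PWD is the simple proportion of differing sites (p-distance) between two already-aligned sequences, with gap and `N` positions skipped; the function name `pairwise_distance` and the skipping rule are illustrative assumptions, not the authors' implementation.

```python
def pairwise_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a gap ('-') or an
    ambiguous base ('N') are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip uninformative sites
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


if __name__ == "__main__":
    reconstructed = "ACGT-CGT"  # toy reconstructed assembly (hypothetical)
    reference = "ACGTNCGA"      # toy conspecific reference (hypothetical)
    print(pairwise_distance(reconstructed, reference))
```

A lower value means the reconstructed assembly is closer to the reference; in the study's terms, a higher PWD reflects more incorrect base calls.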

Identification and Validation of Stable Quantitative Trait Loci for SDS-Sedimentation Volume in Common Wheat (Triticum aestivum L.)

Frontiers in Plant Science ◽

10.3389/fpls.2021.747775 ◽

2021 ◽

Vol 12 ◽

Author(s):

Shuai Tian ◽

Minghu Zhang ◽

Jinghui Li ◽

Shaozhe Wen ◽

Chan Bi ◽

...

Keyword(s):

Sodium Dodecyl Sulfate ◽

Common Wheat ◽

Quantitative Trait ◽

Positional Cloning ◽

Sodium Dodecyl ◽

Snp Markers ◽

Reference Sequence ◽

Phenotypic Variance ◽

Dodecyl Sulfate ◽

Sedimentation Volume

Sodium dodecyl sulfate (SDS)-sedimentation volume (SSV) is an important index for evaluating the gluten strength of common wheat and is closely related to baking quality. In this study, a total of 15 quantitative trait loci (QTLs) for SSV were identified using a high-density genetic map of 2,474 single-nucleotide polymorphism (SNP) markers, which was constructed with a doubled haploid (DH) population derived from the cross between Nongda3753 (ND3753) and Liangxing99 (LX99). Importantly, four environmentally stable QTLs were detected on chromosomes 1A, 2D, and 5D. Among them, the one with the largest effect was identified on chromosome 1A (designated QSsv.cau-1A.1), explaining up to 39.67% of the phenotypic variance. Subsequently, QSsv.cau-1A.1 was dissected into two QTLs, named QSsv.cau-1A.1.1 and QSsv.cau-1A.1.2, by saturating the genetic linkage map of chromosome 1A. Interestingly, the favorable alleles of these two loci came from different parents. Because the favorable allele of QSsv.cau-1A.1.1 came from the high-value parent ND3753 and showed a higher genetic effect, explaining 25.07% of the phenotypic variation, this locus was mapped using BC3F1 and BC3F2 populations. By comparison with the CS reference sequence, the physical interval of QSsv.cau-1A.1.1 was delimited to 14.9 Mb, containing 89 putative high-confidence annotated genes. SSVs of different recombinants between QSsv.cau-1A.1.1 and QSsv.cau-1A.1 detected from the DH and BC3F2 populations showed that these two loci had an obvious additive effect: the combination of two favorable loci had the highest SSV, whereas recombinants with unfavorable loci had the lowest. These results provide further insight into the genetic basis of SSV, and QSsv.cau-1A.1.1 will be an ideal target for positional cloning and wheat breeding programs.

Envisioning the next human genome reference

Disease Models & Mechanisms ◽

10.1242/dmm.049426 ◽

2021 ◽

Vol 14 (12) ◽

Author(s):

Monkol Lek ◽

Elaine R. Mardis

Keyword(s):

Precision Medicine ◽

Human Genome ◽

Reference Sequence ◽

Ethnic Representation ◽

Genomic Studies ◽

Human Genome Reference Sequence ◽

Human Genome Reference

Summary: We provide an Editorial perspective on approaches to improve ethnic representation in the human genome reference sequence, enabling its widespread use in genomic studies and precision medicine to benefit all peoples.

Assessing Bos taurus introgression in the UOA Bos indicus assembly

Genetics Selection Evolution ◽

10.1186/s12711-021-00688-1 ◽

2021 ◽

Vol 53 (1) ◽

Author(s):

Maulana M. Naji ◽

Yuri T. Utsunomiya ◽

Johann Sölkner ◽

Benjamin D. Rosen ◽

Gábor Mészáros

Keyword(s):

Bos Taurus ◽

Sequence Data ◽

Variant Calling ◽

Principal Component ◽

Reference Sequence ◽

Sequencing Analysis ◽

Single Nucleotide Variants ◽

Reference Allele ◽

Brahman Cattle ◽

Reference Genomes

Background: Reference genomes are essential in the analysis of genomic data. As the cost of sequencing decreases, multiple reference genomes are being produced within species to alleviate problems such as low mapping accuracy and reference allele bias in variant calling that can be associated with the alignment of divergent samples to a single reference individual. The latest reference sequence adopted by the scientific community for the analysis of cattle data is ARS_UCD1.2, built from the DNA of a Hereford cow (Bos taurus taurus, hereafter B. taurus). A complementary genome assembly, UOA_Brahman_1, was recently built from a Brahman cow haplotype to represent the other cattle subspecies (Bos taurus indicus, hereafter B. indicus) and to further support analysis of B. indicus data. In this study, we aligned the sequence data of 15 B. taurus and B. indicus breeds to each of these references. Results: The alignment of B. taurus individuals against UOA_Brahman_1 detected up to five million more single-nucleotide variants (SNVs) compared to that against ARS_UCD1.2. Similarly, the alignment of B. indicus individuals against ARS_UCD1.2 resulted in one and a half million more SNVs than that against UOA_Brahman_1. The number of SNVs with nearly fixed alternative alleles also increased in the cross-subspecies alignments. Interestingly, the alignment of B. taurus cattle against UOA_Brahman_1 revealed regions with a smaller than expected number of SNVs with nearly fixed alternative alleles. Since B. taurus introgression represents on average 10% of the genome of Brahman cattle, we suggest that these regions comprise taurine DNA as opposed to indicine DNA in the UOA_Brahman_1 reference genome. Principal component and admixture analyses using genotypes inferred from these regions support these taurine-introgressed loci. Overall, the flagged taurine segments represent 13.7% of the UOA_Brahman_1 assembly. The genes located within these segments were previously reported to be under positive selection in Brahman cattle, and include functional candidate genes implicated in feed efficiency, development and immunity. Conclusions: We report a list of taurine segments that are in the UOA_Brahman_1 assembly, which will be useful for the interpretation of interesting genomic features (e.g., signatures of selection, runs of homozygosity, increased mutation rate, etc.) that could appear in future re-sequencing analysis of indicine cattle.

Genetic Variability and Phylogeny of Human Papillomavirus Type 16 Based On E6, E7 and L1 Genes in Central China

10.21203/rs.3.rs-1090848/v1 ◽

2021 ◽

Author(s):

Shuizhong Han ◽

Xiaochuan Wang ◽

Xiaojing Wang ◽

Shuaijun Wang ◽

Li Ma

Keyword(s):

B Cell ◽

Phylogenetic Trees ◽

Central China ◽

Reference Sequence ◽

B Cell Epitopes ◽

Hpv16 E6 ◽

Synonymous Mutations ◽

Secondary Structure Analysis ◽

E6 And E7 ◽

E7 Proteins

In the current study, a total of 74 single-infected HPV16 samples from females attending gynecological outpatient clinics in four cities of Henan province were collected and subjected to L1, E6 and E7 sequencing. Variations of the HPV16 L1, E6 and E7 genes were characterized by comparison with the reference sequence, and secondary structure analysis was conducted. Phylogenetic trees based on the L1 and E6-E7 sequences were constructed separately. B-cell epitopes of the HPV16 E6 and E7 proteins were then predicted. A total of thirty-seven novel variations, twenty in L1 and seventeen in E6-E7, were identified. Compared with the reference sequence, twenty-eight variations (1.8%, 28/1596) were identified in L1 gene sequences, and 10/28 (35.7%) were non-synonymous mutations. For E6-E7 sequences, twenty-five gene changes were found, comprising 16 mutations (3.4%, 16/477) in the E6 gene and 9 mutations (3.0%, 9/297) in the E7 gene, and 18/25 (72.0%) were non-synonymous mutations. Phylogenetic analysis showed that 56.8% (42/74) of the samples belonged to the A1 sublineage, 37.8% (28/74) to A4, 4.1% (3/74) to A3 and 1.4% (1/74) to A2. In the prediction of B-cell epitopes, seven potent epitopes for E6 and four for E7 were identified. Amino acid mutations, including L90V, R62K, R142Q and F76L in E6, and S63F and N29S/H in E7, changed the epitope score. The HPV16 variants prevalent in central China belong mainly to the European A1 sublineage. The HPV16 L1, E6 and E7 sequences from this study may assist in the improvement of HPV vaccines.
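The abstract above separates synonymous from non-synonymous mutations and reports variation rates such as 28/1596 (1.8%). The following is a minimal sketch of both calculations, assuming the standard genetic code and single-codon comparisons; the helper names `is_synonymous` and `percent`, and the example codons, are illustrative assumptions and are not taken from the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True if both codons encode the same amino acid (or both are stops)."""
    return CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]


def percent(count: int, total: int) -> float:
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    # Hypothetical codon changes, chosen only to illustrate the two classes.
    print(is_synonymous("CTT", "CTC"))  # Leu -> Leu
    print(is_synonymous("TTA", "GTA"))  # Leu -> Val, an L-to-V type change
    # Rates quoted in the abstract for the L1 gene.
    print(percent(28, 1596), percent(10, 28))
```

In practice the codon would be read in the gene's reading frame at the variant position; that bookkeeping is omitted here to keep the sketch short.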
