Combining protein-based transcriptome assembly, and efficient MinION long read sequencing for targeted transcript sequencing in orphan species. Validation on herbicide targets and low copy number genes in Gymnosperms, Juncaceae and Pteridophyta

2020
Author(s): Dyfed Lloyd Evans

Abstract Orphan species that are evolutionarily distant from their closest sequenced or assembled neighbour pose a significant challenge for gene or transcript assembly for functional analysis. This is because, with 30% sequence divergence from the closest available reference sequence, mapping-based and reference-based approaches to gene assembly and gene identification break down, even when a complete genome or transcriptome sequence is available. A new approach is therefore required for reference-guided gene and transcript assembly in such orphan species, or in species that are evolutionarily very divergent from their closest relatives. When annotating genes, the protein sequence is often preferred because it diverges less than the DNA/RNA sequence, and meaningful homology is often easier to find at the protein level. This greater conservation of protein sequence across evolutionary time also makes proteins a prime candidate to serve as the basis for sequence assembly. A protein-based pipeline was developed for transcript assembly between distantly related species. It was tested on three evolutionarily divergent species with little available sequence information, whose closest genome representatives diverged from them at least 40 million years ago, as well as on one species (Azolla filiculoides) for which a genome assembly is available. All the species have the potential to be weeds, and herbicide targets were chosen as functional genes, whilst low copy number genes were chosen for evolutionary studies. Transcriptomic sequences were assembled using a bait-and-assemble strategy, and the final assemblies were verified by direct sequencing.
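The abstract does not describe the pipeline's implementation. As a rough illustration of the protein-level "bait" idea only, the sketch below recruits DNA reads whose six-frame translations share short peptide k-mers with a bait protein from a related species. The function names, the k-mer length (`k=5`), the hit threshold (`min_hits=2`) and the toy reads are assumptions for illustration, not the author's parameters, and the subsequent assembly step is not shown.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases ordered T, C, A, G at each position, matching the classic table layout.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate from the first base; codons containing non-ACGT symbols become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame(dna: str) -> list:
    """Translations of the three forward and three reverse-complement frames."""
    dna = dna.upper()
    rev_comp = dna.translate(_COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (dna, rev_comp) for f in range(3)]


def kmers(seq: str, k: int) -> set:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(reads: dict, bait_proteins: list, k: int = 5, min_hits: int = 2) -> list:
    """Return IDs of reads sharing at least `min_hits` peptide k-mers with the baits."""
    bait_kmers = set().union(*(kmers(p, k) for p in bait_proteins))
    recruited = []
    for read_id, dna in reads.items():
        read_kmers = set().union(*(kmers(frame, k) for frame in six_frame(dna)))
        if len(read_kmers & bait_kmers) >= min_hits:
            recruited.append(read_id)
    return recruited


# Toy example (hypothetical data): "r1" encodes MKTAYIAKQRQIS; "r2" is a poly-A read.
reads = {"r1": "ATGAAAACCGCGTATATTGCGAAACAGCGCCAGATTAGC", "r2": "A" * 45}
print(recruit_reads(reads, ["MKTAYIAKQRQISFVKSHFSRQ"]))  # ['r1']
```

Matching at the protein level is what allows reads to be recruited despite synonymous nucleotide changes that would defeat direct nucleotide mapping at high divergence.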

2020
Author(s): Michal Levin, Marion Scheibe, Falk Butter

Abstract Background The process of identifying all coding regions in a genome is crucial for any study at the level of molecular biology, ranging from single-gene cloning to genome-wide measurements using RNA-Seq or mass spectrometry. While satisfactory annotation has been made feasible for well-studied model organisms through the great efforts of large consortia, for most systems such data are either absent or not adequately precise. Results Combining in-depth transcriptome sequencing and high-resolution mass spectrometry, we here use proteotranscriptomics to improve the annotation of protein-coding genes in the Bombyx mori cell line BmN4, an increasingly used tool for the analysis of piRNA biogenesis and function. Using this approach, we provide the exact coding sequence and protein-level evidence for more than 6,200 genes. Furthermore, using spatial proteomics, we establish the subcellular localization of thousands of these proteins. We show that our approach outperforms current Bombyx mori annotation attempts in terms of accuracy and coverage. Conclusions We show that proteotranscriptomics is an efficient, cost-effective and accurate approach to improve previous annotations or generate new gene models. As this technique is based on de novo transcriptome assembly, it can be applied to any species, even in the absence of the genome sequence information without which proteogenomics would be impossible.
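As a minimal sketch of the proteotranscriptomic idea, assuming a de novo assembled transcript and a list of peptides already identified by mass spectrometry, the code below predicts the longest ATG-initiated open reading frame (ORF) of each transcript across six frames and reports which identified peptides fall within it. Real workflows search spectra against such a translated database with dedicated software; the plain substring matching, the `min_aa` cutoff, the function names and the toy data here are illustrative assumptions.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate from the first base; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def longest_orf(transcript: str, min_aa: int = 30):
    """Longest ATG-initiated ORF (as protein) over six frames, or None if too short.

    A stretch running off the 3' end without a stop codon is kept, since
    de novo assembled transcripts are often incomplete.
    """
    dna = transcript.upper()
    rev_comp = dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    best = ""
    for strand in (dna, rev_comp):
        for frame in range(3):
            # Split the translation at stop codons; an ORF starts at the first M.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best if len(best) >= min_aa else None


def peptide_evidence(orfs: dict, peptides: list) -> dict:
    """Map each transcript ID to the identified peptides contained in its predicted ORF."""
    evidence = {}
    for tx_id, protein in orfs.items():
        matched = [p for p in peptides if protein and p in protein]
        if matched:
            evidence[tx_id] = matched
    return evidence


# Toy transcript (hypothetical): 5' UTR "CC", ORF encoding MKTAYIAKQRQIS, stop TAA, 3' "GG".
tx = "CC" + "ATGAAAACCGCGTATATTGCGAAACAGCGCCAGATTAGC" + "TAA" + "GG"
orf = longest_orf(tx, min_aa=10)
print(orf)  # MKTAYIAKQRQIS
print(peptide_evidence({"tx1": orf}, ["TAYIAK", "QRQIS", "PEPTIDER"]))  # {'tx1': ['TAYIAK', 'QRQIS']}
```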


Genome, 2007, Vol 50 (9), pp. 871-875
Author(s): C.J. Coyne, M.T. McClendon, J.G. Walling, G.M. Timmerman-Vaughan, S. Murray, ...

Pea (Pisum sativum L.) has a genome of about 4 Gb that appears to share conserved synteny with model legumes having genomes of 0.2–0.4 Gb despite extensive intergenic expansion. Pea plant inventory (PI) accession 269818 has been used to introgress genetic diversity into the cultivated germplasm pool. The aim here was to develop pea bacterial artificial chromosome (BAC) libraries that would enable the isolation of genes involved in plant disease resistance or control of economically important traits. The BAC libraries encompassed about 3.2 haploid genome equivalents, consisting of partially HindIII-digested DNA fragments with a mean size of 105 kb inserted into one of two vectors. The library in the low-copy oriT-based T-DNA vector (pCLD04541) contained 55 680 clones. The library in the single-copy oriS-based vector (pIndigoBAC-5) contained 65 280 clones. Colony hybridization with a universal chloroplast probe indicated that about 1% of clones in the libraries were of chloroplast origin. The presence of about 0.1% empty vectors was inferred from white/blue colony plate counts. The usefulness of the libraries was tested by two replicated methods. First, high-density filters were probed with low copy number sequences. Second, BAC plate-pool DNA was used successfully to PCR-amplify 7 of 9 published pea resistance gene analogs (RGAs) and several other low copy number pea sequences. Individual BAC clones containing specific sequences were identified. Therefore, the HindIII BAC libraries of pea, based on germplasm accession PI 269818, will be useful for the isolation of genes underlying disease resistance and other economically important traits.
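The stated library depth can be checked with simple arithmetic. The sketch below computes genome equivalents as clones × mean insert size / genome size, and applies the Clarke-Carbon formula, P = 1 - (1 - I/G)^N, for the probability that a given locus is present in at least one of N clones with insert size I in a genome of size G. It assumes a genome size of exactly 4 Gb and ignores the roughly 1% chloroplast clones and 0.1% empty vectors; the function names are hypothetical.

```python
def genome_equivalents(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Haploid genome equivalents: total cloned bases divided by genome size."""
    return n_clones * insert_bp / genome_bp


def clarke_carbon_probability(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Probability that a given locus lies in at least one clone (Clarke-Carbon)."""
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones


# Both pea libraries combined (pCLD04541 + pIndigoBAC-5), 105 kb mean insert,
# genome size assumed to be exactly 4 Gb.
clones = 55_680 + 65_280
print(round(genome_equivalents(clones, 105_000, 4e9), 2))         # 3.18
print(round(clarke_carbon_probability(clones, 105_000, 4e9), 3))  # 0.958
```

The result agrees with the "about 3.2 haploid genome equivalents" reported above; under the formula's assumption of random, uniform cloning, roughly 96% of single-copy loci would be expected in at least one clone of the combined libraries.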


BMC Genomics, 2020, Vol 21 (1)
Author(s): Michal Levin, Marion Scheibe, Falk Butter

Abstract Background The process of identifying all coding regions in a genome is crucial for any study at the level of molecular biology, ranging from single-gene cloning to genome-wide measurements using RNA-seq or mass spectrometry. While satisfactory annotation has been made feasible for well-studied model organisms through the great efforts of large consortia, for most systems such data are either absent or not adequately precise. Results Combining in-depth transcriptome sequencing and high-resolution mass spectrometry, we here use proteotranscriptomics to improve the annotation of protein-coding genes in the Bombyx mori cell line BmN4, an increasingly used tool for the analysis of piRNA biogenesis and function. Using this approach, we provide the exact coding sequence and protein-level evidence for more than 6200 genes. Furthermore, using spatial proteomics, we establish the subcellular localization of thousands of these proteins. We show that our approach outperforms current Bombyx mori annotation attempts in terms of accuracy and coverage. Conclusions We show that proteotranscriptomics is an efficient, cost-effective and accurate approach to improve previous annotations or generate new gene models. As this technique is based on de novo transcriptome assembly, it can be applied to any species, even in the absence of the genome sequence information without which proteogenomics would be impossible.


Genes, 2021, Vol 12 (2), pp. 283
Author(s): Eyal Seroussi

Determining the relative copy numbers of mixed molecular species in nucleic acid samples is often the objective of biological experiments, including single-nucleotide polymorphism (SNP), indel and gene copy-number characterization, and quantification of CRISPR-Cas9 base editing, cytosine methylation, and RNA editing. Standard dye-terminator chromatograms are a widely accessible, cost-effective information source from which copy-number proportions can be inferred. However, the rate of incorporation of dye terminators depends on the dye type, the adjacent sequence string, and the secondary structure of the sequenced strand. These variable rates complicate inference and have driven scientists to resort to complex and costly quantification methods. Because these complex methods introduce their own biases, researchers are reconsidering whether correcting distortions in sequencing trace files and using direct sequencing for quantification could enable comparably accurate assessment. Indeed, recent software tools (e.g., TIDE, ICE, EditR, BEEP and BEAT) indicate that quantification based on direct Sanger sequencing is gaining scientific acceptance. This commentary reviews the common obstacles to quantification and the latest insights and developments relevant to estimating copy-number proportions from direct Sanger sequencing, concluding that bidirectional sequencing and sophisticated base calling are key to identifying and avoiding sequence distortions.
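For orientation only, the simplest estimate of a two-allele mixture from a Sanger trace is the fraction of the combined peak height contributed by the minor base at the variant position. The sketch below adds a per-dye correction factor derived from a control sample known to be 50:50, and averages the forward and reverse reads, echoing the point about bidirectional sequencing. It is a naive illustration under those assumptions, not the algorithm of TIDE, ICE, EditR, BEEP or BEAT, and the function names and peak heights are hypothetical.

```python
def bias_from_control(minor_peak: float, major_peak: float) -> float:
    """Relative signal of the minor dye versus the major dye in a 50:50 control."""
    return minor_peak / major_peak


def minor_fraction(minor_peak: float, major_peak: float, bias: float = 1.0) -> float:
    """Minor-allele fraction from the two peak heights at one position.

    Dividing the minor peak by `bias` rescales it as if both dyes were
    incorporated equally; bias=1.0 gives the uncorrected ratio.
    """
    corrected_minor = minor_peak / bias
    return corrected_minor / (corrected_minor + major_peak)


def bidirectional_estimate(forward: tuple, reverse: tuple,
                           bias_fwd: float = 1.0, bias_rev: float = 1.0) -> float:
    """Average of forward- and reverse-strand estimates; each tuple is (minor, major).

    The reverse read reports the complementary bases, which carry different
    dyes, so each strand gets its own correction factor.
    """
    f = minor_fraction(*forward, bias=bias_fwd)
    r = minor_fraction(*reverse, bias=bias_rev)
    return (f + r) / 2


# Hypothetical peak heights: the control shows the minor dye at half strength.
bias = bias_from_control(150, 300)
print(minor_fraction(150, 300, bias=bias))               # 0.5
print(bidirectional_estimate((100, 300), (200, 200)))    # 0.375
```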


Genes, 2021, Vol 12 (7), pp. 1065
Author(s): Reinhard Mischke, Julia Metzger, Ottmar Distl

Congenital fibrinogen disorders are very rare in dogs. Cases of afibrinogenemia have been reported in Bernese Mountain, Bichon Frise, Cocker Spaniel, Collie, Lhasa Apso, Vizsla, and St. Bernard dogs. In the present study, we examined four miniature wire-haired Dachshunds with afibrinogenemia and ascertained their pedigree. Homozygosity mapping and a genome-wide association study identified a candidate genomic region at 50,188,932–64,187,680 bp on CFA15 harboring FGB (fibrinogen beta chain), FGA (fibrinogen alpha chain), and FGG (fibrinogen gamma-B chain). Sanger sequencing of all three fibrinogen genes in two cases, followed by validation of the FGA-associated mutation (FGA:g.6296delT, NC_006597.3:g.52240694delA, rs1152388481) in pedigree members, showed perfect co-segregation with afibrinogenemia-affected phenotypes, obligate carriers, and healthy animals. In addition, the rs1152388481 variant was validated in 393 Dachshunds and in samples from 33 other dog breeds. The rs1152388481 variant is predicted to modify the protein sequence of both FGA transcripts (FGA-201:p.Ile486Met and FGA-202:p.Ile555Met), leading to proteins truncated by 306 amino acids. The present data provide evidence for a novel truncating frameshift mutation in FGA that very likely explains the cases of severe bleeding due to afibrinogenemia in a Dachshund family. This mutation had already spread among Dachshunds through carriers before cases were ascertained. Genetic testing allows selective breeding to prevent afibrinogenemia-affected puppies in the future.
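Homozygosity mapping rests on a simple idea: when a recessive disorder is inherited from a common ancestor, affected animals should share a homozygous haplotype around the causal locus. The sketch below scans one chromosome for maximal runs of consecutive markers at which every case is homozygous for the same allele. The genotype encoding, the `min_markers` threshold, the function name and the toy data are hypothetical; real analyses use dense SNP arrays, tolerate genotyping errors and missing calls, and compare cases against controls.

```python
def shared_homozygous_runs(positions, case_genotypes, min_markers=3):
    """Find runs of consecutive markers where all cases are homozygous for one allele.

    positions      -- marker positions (bp) on one chromosome, in ascending order
    case_genotypes -- one list per affected animal, aligned with `positions`;
                      each genotype is a pair of allele codes, e.g. ("A", "G")
    Returns a list of (start_bp, end_bp, n_markers) tuples.
    """
    runs = []
    start = None
    for i in range(len(positions)):
        # All cases homozygous for the same allele <=> only one allele seen at this marker.
        alleles = {allele for genotype in case_genotypes for allele in genotype[i]}
        if len(alleles) == 1:
            if start is None:
                start = i
            continue
        # Marker breaks the run: keep the run if it is long enough.
        if start is not None and i - start >= min_markers:
            runs.append((positions[start], positions[i - 1], i - start))
        start = None
    # Close a run that extends to the last marker.
    if start is not None and len(positions) - start >= min_markers:
        runs.append((positions[start], positions[-1], len(positions) - start))
    return runs


# Hypothetical toy data: two cases genotyped at seven markers.
positions = [10, 20, 30, 40, 50, 60, 70]
case1 = [("A", "G"), ("A", "A"), ("C", "C"), ("T", "T"), ("G", "G"), ("A", "G"), ("C", "C")]
case2 = [("A", "A"), ("A", "A"), ("C", "C"), ("T", "T"), ("G", "G"), ("A", "A"), ("C", "C")]
print(shared_homozygous_runs(positions, [case1, case2]))  # [(20, 50, 4)]
```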


2021, Vol 21 (1)
Author(s): Liuyang Fu, Qian Wang, Lina Li, Tao Lang, Junjia Guo, ...

Abstract Background Chromosomal variants play important roles in crop breeding and genetic research. The development of single-stranded oligonucleotide (oligo) probes simplifies the process of fluorescence in situ hybridization (FISH) and facilitates chromosome identification in many species. Genome sequencing provides rich resources for the development of oligo probes. However, little progress has been made in peanut due to the lack of efficient chromosomal markers, and the identification of chromosomal variants in peanut has so far remained a challenge. Results A total of 114 new oligo probes were developed based on the genome-wide tandem repeats (TRs) identified from the reference sequences of the peanut variety Tifrunner (AABB, 2n = 4x = 40) and the diploid species Arachis ipaensis (BB, 2n = 2x = 20). These oligo probes were classified into 28 types based on their positions and overlapping signals on chromosomes. For each type, a representative oligo was selected and modified with either the green fluorophore 6-carboxyfluorescein (FAM) or the red fluorophore 6-carboxytetramethylrhodamine (TAMRA). Two cocktails, Multiplex #3 and Multiplex #4, were developed by pooling the fluorophore-conjugated probes. Multiplex #3 included FAM-modified oligo TIF-439, oligo TIF-185-1, oligo TIF-134-3 and oligo TIF-165. Multiplex #4 included TAMRA-modified oligo Ipa-1162, oligo Ipa-1137, oligo DP-1 and oligo DP-5. Each cocktail enabled the establishment of a genome map-based karyotype after sequential FISH/genomic in situ hybridization (GISH) and in silico mapping. Furthermore, we identified 14 chromosomal variants of peanut induced by radiation exposure. A total of 28 representative probes were further mapped onto the chromosomes of the new karyotype. Among the probes, eight were mapped to the secondary constrictions, intercalary and terminal regions; four were B genome-specific; one was chromosome-specific; and the remaining 15 were extensively mapped in the pericentric regions of the chromosomes. Conclusions The newly developed oligo probes provide an effective set of tools for distinguishing the individual chromosomes of peanut. Physical mapping by FISH reveals the genomic organization of repetitive oligos in peanut chromosomes. A genome map-based karyotype was established and, by comparison with reference sequence positions, used to identify chromosome variations in peanut.
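The probes were developed from genome-wide tandem repeats. As a simplified illustration of what such a scan looks for, the sketch below reports exact tandem repeats of a short unit (period 2 to 12 bp by default) present in at least three consecutive copies, skipping units that are themselves repeats of a shorter unit. Dedicated repeat finders tolerate mismatches and indels; the exact-match rule, the thresholds, the function name and the toy sequences here are assumptions, and the selection of oligo probe sequences from the repeat units is not shown.

```python
def find_tandem_repeats(seq, min_period=2, max_period=12, min_copies=3):
    """Return sorted (start, period, copies, unit) tuples for exact tandem repeats."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for p in range(min_period, max_period + 1):
        i = 0
        while i + p < n:
            # Extend while each base equals the base one period downstream;
            # seq[i : j + p] is then periodic with period p.
            j = i
            while j + p < n and seq[j] == seq[j + p]:
                j += 1
            copies = (j - i + p) // p
            unit = seq[i:i + p]
            # Skip units such as "ATAT" that are repeats of a shorter unit.
            primitive = not any(p % q == 0 and unit == unit[:q] * (p // q)
                                for q in range(1, p))
            if copies >= min_copies and primitive:
                hits.append((i, p, copies, unit))
            # Any start inside this periodic stretch gives a shorter run, so jump past it.
            i = j + 1
    return sorted(hits)


# Hypothetical toy sequence: four copies of "ACT" flanked by unrelated bases.
print(find_tandem_repeats("GG" + "ACT" * 4 + "TTG"))  # [(2, 3, 4, 'ACT')]
```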

