More from less: Genome skimming for nuclear markers for animal phylogenomics, a case study using decapod crustaceans

2021 ◽  
Vol 41 (2) ◽  
Author(s):  
Mun Hua Tan ◽  
Han Ming Gan ◽  
Heather Bracken-Grissom ◽  
Tin-Yam Chan ◽  
Frederic Grandjean ◽  
...  

Abstract Low coverage genome sequencing is rapid and cost-effective for recovering complete mitochondrial genomes for crustacean phylogenomics. The recovery of high-copy-number nuclear genes, including histone H3, 18S and 28S ribosomal RNAs, is also possible using this approach based on our research with freshwater crayfishes (Astacidea). We explored the potential of genome skimming (GS) to recover additional nuclear genes from shallow sequencing projects using decapod crustaceans. Using an in silico-baited approach, we recovered three additional core histone genes (H2A, H2B, and H4) from our low-coverage decapod dataset (99 species, 69 genera, 38 families, 10 infraorders). Phylogenetic analyses using various combinations of mitochondrial and nuclear genes for the entire decapod dataset and a subset of 40 species of crayfishes showed that the evolutionary rates for different classes of genes varied widely. A very high level of congruence was nevertheless found between trees from the six nuclear genes and those derived from the mitogenome sequences for freshwater crayfish. These findings indicate that nuclear genes recovered from the same genome skimming datasets designed to obtain mitogenomes can be used to support more robust and comprehensive phylogenetic analyses. Further, a search for additional intron-less nuclear genes identified several high-copy-number genes across the decapod dataset, and recovery of NaK, PEPCK, and GAPDH gene fragments is possible at slightly elevated coverage, suggesting the potential and utility of GS in recovering even more nuclear genetic information for phylogenetic studies from these inexpensive and increasingly abundant datasets.

2020 ◽  
Author(s):  
Mun Hua Tan ◽  
Han Ming Gan ◽  
Heather Bracken-Grissom ◽  
Tin-Yam Chan ◽  
Frederic Grandjean ◽  
...  

Abstract Low coverage genome sequencing is rapid and cost-effective for recovering complete mitochondrial genomes for animal phylogenomics. The recovery of high copy number nuclear genes, including histone H3, 18S and 28S ribosomal RNAs, is also possible using this approach. In this study, we explore the potential of genome skimming (GS) to recover additional nuclear genes from shallow sequencing projects. Using an in silico baited approach, we recover three additional core histone genes (H2A, H2B and H4) from our existing low coverage decapod crustacean dataset (99 species, 69 genera, 38 families, 10 infraorders). Phylogenetic analyses based on various combinations of mitochondrial and nuclear genes for the entire decapod dataset and 40 species of crayfish (Infraorder Astacidea) found that the evolutionary rates for different classes of genes varied widely. The highlight is a very high level of congruence between trees from the six nuclear genes and those derived from the mitogenome sequences for freshwater crayfish. These findings indicate that nuclear genes recovered from the same genome skimming datasets designed to obtain mitogenomes can be used to support more robust and comprehensive phylogenetic analyses. Further, a search for additional intron-less nuclear genes identified several high copy number genes across the decapod dataset, and recovery of NaK, PEPCK and GAPDH gene fragments is possible at slightly elevated coverage, suggesting the potential and utility of GS in recovering even more nuclear genetic information for phylogenetic studies from these inexpensive and increasingly abundant datasets.


1987 ◽  
Vol 12 (7) ◽  
pp. 503-509 ◽  
Author(s):  
Hans Koff ◽  
Cornelia Schmidt ◽  
Gerlinde Wiesenberger ◽  
Carlo Schmelzer

2021 ◽  
Author(s):  
Bin-Bin Liu ◽  
Zhi-Yao Ma ◽  
Chen Ren ◽  
Richard G.J. Hodel ◽  
Miao Sun ◽  
...  

With decreasing sequencing costs and the availability of many newly developed bioinformatics pipelines, next-generation sequencing (NGS) has revolutionized plant systematics in recent years. Genome skimming has been widely used to obtain high-copy fractions of the genomes, including plastomes, mitochondrial DNA (mtDNA), and nuclear ribosomal DNA (nrDNA). In this study, through simulations, we evaluated the optimal (minimum) sequencing depth and performance for recovering single-copy nuclear genes (SCNs) from genome skimming data, by subsampling genome resequencing data and generating 10 datasets with different sequencing coverage in silico. We tested the performance of the four datasets (plastome, nrDNA, mtDNA, and SCNs) obtained from genome skimming based on phylogenetic analyses of the Vitis clade at the genus level and Vitaceae at the family level. Our results showed that the optimal minimum sequencing depth for high-quality SCN assembly via genome skimming was about 10x coverage. Without the steps of synthesizing baits and enrichment experiments, we show that deep genome skimming (DGS) is effective for capturing large datasets of SCNs, in addition to plastomes, mtDNA, and entire nrDNA repeats, and may serve as an economical alternative to the widely used target enrichment Hyb-Seq approach.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 283
Author(s):  
Eyal Seroussi

Determination of the relative copy numbers of mixed molecular species in nucleic acid samples is often the objective of biological experiments, including Single-Nucleotide Polymorphism (SNP), indel and gene copy-number characterization, and quantification of CRISPR-Cas9 base editing, cytosine methylation, and RNA editing. Standard dye-terminator chromatograms are a widely accessible, cost-effective information source from which copy-number proportions can be inferred. However, the rate of incorporation of dye terminators is dependent on the dye type, the adjacent sequence string, and the secondary structure of the sequenced strand. These variable rates complicate inferences and have driven scientists to resort to complex and costly quantification methods. Because these complex methods introduce their own biases, researchers are rethinking whether rectifying distortions in sequencing trace files and using direct sequencing for quantification will enable comparably accurate assessment. Indeed, recent developments in software tools (e.g., TIDE, ICE, EditR, BEEP and BEAT) indicate that quantification based on direct Sanger sequencing is gaining scientific acceptance. This commentary reviews the common obstacles in quantification and the latest insights and developments relevant to estimating copy-number proportions based on direct Sanger sequencing, concluding that bidirectional sequencing and sophisticated base calling are the keys to identifying and avoiding sequence distortions.


Animals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 899
Author(s):  
Fotis Pappas ◽  
Christos Palaiokostas

Incorporation of genomic technologies into fish breeding programs is a modern reality, promising substantial advances regarding the accuracy of selection, monitoring the genetic diversity and pedigree record verification. Single nucleotide polymorphism (SNP) arrays are the most commonly used genomic tool, but the investments required make them unsustainable for emerging species, such as Arctic charr (Salvelinus alpinus), where production volume is low. The requirement to genotype a large number of animals for breeding practices necessitates cost effective genotyping approaches. In the current study, we used double digest restriction site-associated DNA (ddRAD) sequencing of either high or low coverage to genotype Arctic charr from the Swedish national breeding program and performed analytical procedures to assess their utility in a range of tasks. SNPs were identified and used for deciphering the genetic structure of the studied population, estimating genomic relationships and implementing an association study for growth-related traits. Missing information and underestimation of heterozygosity in the low coverage set were limiting factors in genetic diversity and genomic relationship analyses, where high coverage performed notably better. On the other hand, the high coverage dataset proved to be valuable when it comes to identifying loci that are associated with phenotypic traits of interest. In general, both genotyping strategies offer sustainable alternatives to hybridization-based genotyping platforms and show potential for applications in aquaculture selective breeding.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinping Fan ◽  
Guanghao Luo ◽  
Yu S. Huang

Abstract Background: Copy number alterations (CNAs), due to their large impact on the genome, have been an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from the shallow-sequencing data of a low-purity tumor sample remains a challenging task. Results: We introduce Accucopy, a method to infer total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from challenging low-purity and low-coverage tumor samples. Accucopy adopts many robust statistical techniques such as kernel smoothing of coverage differentiation information to discern signals from noise and combines ideas from time-series analysis and the signal-processing field to derive a range of estimates for the period in a histogram of coverage differentiation information. Statistical learning models such as the tiered Gaussian mixture model, the expectation–maximization algorithm, and sparse Bayesian learning were customized and built into the model. Accucopy is implemented in C++/Rust, packaged in a docker image, and supports non-human samples; more at http://www.yfish.org/software/. Conclusions: We describe Accucopy, a method that can predict both TCNs and ASCNs from low-coverage low-purity tumor sequencing data. Through comparative analyses in both simulated and real-sequencing samples, we demonstrate that Accucopy is more accurate than Sclust, ABSOLUTE, and Sequenza.


Genetics ◽  
1994 ◽  
Vol 137 (2) ◽  
pp. 407-422 ◽  
Author(s):  
E A Vallen ◽  
W Ho ◽  
M Winey ◽  
M D Rose

Abstract KAR1 encodes an essential component of the yeast spindle pole body (SPB) that is required for karyogamy and SPB duplication. A temperature-sensitive mutation, kar1-Δ17, mapped to a region required for SPB duplication and for localization to the SPB. To identify interacting SPB proteins, we isolated 13 dominant mutations and 3 high copy number plasmids that suppressed the temperature sensitivity of kar1-Δ17. Eleven extragenic suppressor mutations mapped to two linkage groups, DSK1 and DSK2. The extragenic suppressors were specific for SPB duplication and did not suppress karyogamy-defective alleles. The major class, DSK1, consisted of mutations in CDC31. CDC31 is required for SPB duplication and encodes a calmodulin-like protein that is most closely related to caltractin/centrin, a protein associated with the Chlamydomonas basal body. The high copy number suppressor plasmids contained the wild-type CDC31 gene. One CDC31 suppressor allele conferred a temperature-sensitive defect in SPB duplication, which was counter-suppressed by recessive mutations in KAR1. In spite of the evidence for a direct interaction, the strongest CDC31 alleles, as well as both DSK2 alleles, suppressed a complete deletion of KAR1. However, the CDC31 alleles also made the cell supersensitive to KAR1 gene dosage, arguing against a simple bypass mechanism of suppression. We propose a model in which Kar1p helps localize Cdc31p to the SPB and that Cdc31p then initiates SPB duplication via interaction with a downstream effector.


Genetics ◽  
2003 ◽  
Vol 164 (2) ◽  
pp. 685-697 ◽  
Author(s):  
Edward K Kentner ◽  
Michael L Arnold ◽  
Susan R Wessler

Abstract The Louisiana iris species Iris brevicaulis and I. fulva are morphologically and karyotypically distinct yet frequently hybridize in nature. A group of high-copy-number TY3/gypsy-like retrotransposons was characterized from these species and used to develop molecular markers that take advantage of the abundance and distribution of these elements in the large iris genome. The copy number of these IRRE elements (for iris retroelement) is ∼1 × 10^5, accounting for ∼6–10% of the ∼10,000-Mb haploid Louisiana iris genome. IRRE elements are transcriptionally active in I. brevicaulis and I. fulva and their F1 and backcross hybrids. The LTRs of the elements are more variable than the coding domains and can be used to define several distinct IRRE subfamilies. Transposon display or S-SAP markers specific to two of these subfamilies have been developed and are highly polymorphic among wild-collected individuals of each species. As IRRE elements are present in each of 11 iris species tested, the marker system has the potential to provide valuable comparative data on the dynamics of retrotransposition in large plant genomes.

