HORmon: automated annotation of human centromeres

2021 ◽  
Author(s):  
Olga Kunyavskaya ◽  
Tatiana Dvorkina ◽  
Andrey V. Bzikadze ◽  
Ivan Alexandrov ◽  
Pavel A. Pevzner

Recent advances in long-read sequencing have opened up the possibility of addressing long-standing questions about the architecture and evolution of human centromeres. They have also highlighted the need for centromere annotation, that is, partitioning human centromeres into monomers and higher-order repeats (HORs). Although centromere architecture has been studied semi-manually for half a century, a rigorous centromere annotation algorithm is still lacking. Moreover, automated centromere annotation is a prerequisite both for studies of genetic diseases associated with centromeres and for evolutionary studies of centromeres across multiple species. The monomer decomposition (transforming a centromere into a monocentromere written in the monomer alphabet) and the HOR decomposition (representing a monocentromere in the alphabet of HORs) are currently viewed as two separate problems. We demonstrate that they should be integrated into a single framework in which HOR inference affects monomer inference and vice versa. We thus developed the HORmon algorithm, which integrates monomer and HOR inference and automatically generates human monomers and HORs that are largely consistent with previous semi-manual inference.
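To make the HOR decomposition step concrete, here is a toy Python sketch. It is not the HORmon algorithm. It assumes the monomer decomposition has already produced an error-free monocentromere that is an exact tandem repeat, and it recovers a single HOR as the smallest period of the monomer-label sequence using the KMP prefix function. Real centromeres contain HOR variants and diverged monomers, which is why the abstract argues that the two decompositions must inform each other. The function names and the monomer labels A to D are hypothetical.

```python
def smallest_period(monomers: list[str]) -> int:
    """Smallest p with monomers[i] == monomers[i + p] for every valid i (0 if empty)."""
    n = len(monomers)
    if n == 0:
        return 0
    # KMP prefix function: pi[i] is the length of the longest proper border
    # (prefix that is also a suffix) of monomers[: i + 1].
    pi = [0] * n
    for i in range(1, n):
        k = pi[i - 1]
        while k > 0 and monomers[i] != monomers[k]:
            k = pi[k - 1]
        if monomers[i] == monomers[k]:
            k += 1
        pi[i] = k
    # The smallest period equals the sequence length minus its longest proper border.
    return n - pi[-1]


def hor_decomposition(monomers: list[str]) -> tuple[list[str], int, list[str]]:
    """Return (HOR unit, number of complete HOR copies, trailing partial copy)."""
    p = smallest_period(monomers)
    if p == 0:
        return [], 0, []
    copies = len(monomers) // p
    return monomers[:p], copies, monomers[copies * p:]


# Toy monocentromere written in a hypothetical four-monomer alphabet.
monocentromere = list("ABCDABCDABCDAB")
print(hor_decomposition(monocentromere))  # (['A', 'B', 'C', 'D'], 3, ['A', 'B'])
```

A sequence with no internal repetition, such as `list("ABCD")`, is reported as a single HOR copy equal to the whole sequence.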

The essays collected in this book represent recent advances in our understanding of speech acts, that is, actions such as asserting, asking, and commanding that speakers perform when producing an utterance. The study of speech acts spans disciplines, embracing the theoretical and scientific concerns proper to linguistics and philosophy as well as the normative questions that speech acts raise for our politics, our societies, and our ethical lives generally. The goal of this book is to reflect the diversity of current thinking on speech acts and to bring these conversations together so that they may better inform one another. Topics explored in this book include the relationship between sentence grammar and speech act potential; the fate of traditional frameworks in speech act theory, such as the content-force distinction and the taxonomy of speech acts; and the ways in which speech act theory can illuminate the dynamics of hostile and harmful speech. The book takes stock of well over half a century of thinking about speech acts, bringing this classic work in line with recent developments in semantics and pragmatics and pointing the way forward to further debate and research.


2020 ◽  
Vol 56 (69) ◽  
pp. 9916-9936 ◽  
Author(s):  
He-Ye Zhou ◽  
Qian-Shou Zong ◽  
Ying Han ◽  
Chuan-Feng Chen

Recent advances in various types of higher order rotaxanes with precisely controlled architectures are summarized in this feature article.


2020 ◽  
Author(s):  
Yuxuan Yuan ◽  
Philipp E. Bayer ◽  
Robyn Anderson ◽  
HueyTyng Lee ◽  
Chon-Kit Kenneth Chan ◽  
...  

Recent advances in long-read sequencing have the potential to produce more complete genome assemblies using reads that can span repetitive regions. However, the overlap-based assembly methods routinely used for such data require significant computing time and resources. Here, we have developed RefKA, a reference-based approach for long-read genome assembly. This approach breaks up a closely related reference genome into bins, aligns k-mers unique to each bin with PacBio reads, assembles each bin in parallel, and finally stitches the bins together. During benchmarking, we assembled the wheat Chinese Spring (CS) genome from publicly available PacBio reads in 168 wall-clock hours on a 250-CPU system. The maximum RAM used was 300 GB and the computing time was 42,000 CPU hours. The approach opens up applications for the assembly of other large and complex genomes with much-reduced computing requirements. The RefKA pipeline is available at https://github.com/AppliedBioinformatics/RefKA
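The binning idea can be illustrated with a small Python sketch. This is a simplified stand-in, not the RefKA pipeline. It splits a reference into fixed-size bins, keeps the k-mers that occur in exactly one bin, and assigns a read to the bin with which it shares the most of those bin-unique k-mers. The bin size, k, and all function names are hypothetical. The sketch ignores k-mers that span bin boundaries, reverse complements, and sequencing errors, and it uses exact set intersection where RefKA aligns k-mers to PacBio reads.

```python
from collections import defaultdict
from typing import Optional


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reference(ref: str, bin_size: int) -> list[str]:
    """Split the reference into consecutive, non-overlapping bins."""
    return [ref[i:i + bin_size] for i in range(0, len(ref), bin_size)]


def unique_kmers_per_bin(bins: list[str], k: int) -> list[set[str]]:
    """For each bin, keep only the k-mers that occur in no other bin."""
    owners: dict[str, set[int]] = defaultdict(set)
    for idx, b in enumerate(bins):
        for km in kmers(b, k):
            owners[km].add(idx)
    unique: list[set[str]] = [set() for _ in bins]
    for km, idxs in owners.items():
        if len(idxs) == 1:
            unique[next(iter(idxs))].add(km)
    return unique


def assign_read(read: str, unique: list[set[str]], k: int) -> Optional[int]:
    """Index of the bin sharing the most bin-unique k-mers with the read, or None."""
    read_kmers = kmers(read, k)
    scores = [len(read_kmers & u) for u in unique]
    if not scores or max(scores) == 0:
        return None
    return scores.index(max(scores))


bins = bin_reference("AAAACCCCGGGGTTTT", 8)   # ["AAAACCCC", "GGGGTTTT"]
unique = unique_kmers_per_bin(bins, 4)
print(assign_read("AACCC", unique, 4))  # 0
print(assign_read("GGTTT", unique, 4))  # 1
```

Because each bin's read set is independent once reads are assigned, the per-bin assemblies can run in parallel, which matches the parallel design described in the abstract.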


2018 ◽  
Author(s):  
Satomi Mitsuhashi ◽  
Martin C Frith ◽  
Takeshi Mizuguchi ◽  
Satoko Miyatake ◽  
Tomoko Toyota ◽  
...  

Tandemly repeated sequences are highly mutable and variable features of genomes. Tandem repeat expansions are responsible for a growing list of human diseases, even though it is hard to determine tandem repeat sequences with current DNA sequencing technology. Recent long-read technologies are promising because the DNA reads are often longer than the repetitive regions, but they are hampered by high error rates. Here, we report robust detection of human repeat expansions from careful alignments of long (PacBio and nanopore) reads to a reference genome. Our method (tandem-genotypes) is robust to systematic sequencing errors, inexact repeats with fuzzy boundaries, and low sequencing coverage. By comparing with healthy controls, we can prioritize pathological expansions within the top 10 out of 700,000 tandem repeats in the genome. This may help to elucidate the many genetic diseases whose causes remain unknown.
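The control-comparison step can be illustrated with a toy Python sketch. It assumes that per-read repeat copy-number changes have already been estimated from the read alignments, and it applies a hypothetical scoring rule: how far the patient's largest expansion exceeds the largest expansion seen in controls. This rule, the function names, and the repeat identifiers are illustrative assumptions. They are not tandem-genotypes' actual interface or ranking method.

```python
def copy_number_change(read_span: int, ref_span: int, unit_len: int) -> float:
    """Estimated change in repeat-unit copies implied by one read.

    read_span: bases the read spends inside the repeat region
    ref_span:  length of the repeat region in the reference
    unit_len:  length of one repeat unit
    """
    return (read_span - ref_span) / unit_len


def prioritize(patient: dict[str, list[float]],
               controls: dict[str, list[float]],
               top: int = 10) -> list[str]:
    """Rank repeats by how far the patient's largest expansion exceeds the controls'."""
    scores = {}
    for repeat_id, changes in patient.items():
        # Repeats absent from the controls get a baseline of zero change.
        baseline = max(controls.get(repeat_id, []), default=0.0)
        scores[repeat_id] = max(changes, default=0.0) - baseline
    return sorted(scores, key=scores.get, reverse=True)[:top]


print(copy_number_change(130, 100, 3))  # 10.0
patient = {"r1": [0.5, 1.0], "r2": [40.0, 2.0], "r3": [5.0]}
controls = {"r1": [1.0], "r2": [3.0], "r3": [6.0]}
print(prioritize(patient, controls, top=2))  # ['r2', 'r1']
```

Taking the maximum over reads lets a single expanded allele in a heterozygous patient stand out. A real tool would also need to account for read errors and coverage.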


Genes ◽  
2018 ◽  
Vol 9 (12) ◽  
pp. 598 ◽  
Author(s):  
Gregory A. Taylor ◽  
Heather Kirk ◽  
Lauren Coombe ◽  
Shaun D. Jackman ◽  
Justin Chu ◽  
...  

The grizzly bear (Ursus arctos ssp. horribilis) represents the largest population of brown bears in North America. Its genome was sequenced using a microfluidic partitioning library construction technique, and these data were supplemented with sequencing from a nanopore-based long-read platform. The final assembly was 2.33 Gb with a scaffold N50 of 36.7 Mb, and the genome is of comparable size to that of its close relative the polar bear (2.30 Gb). An analysis using 4104 highly conserved mammalian genes indicated that 96.1% were complete within the assembly. An automated annotation of the genome identified 19,848 protein-coding genes. Our study shows that the combination of the two sequencing modalities that we used is sufficient for the construction of highly contiguous reference-quality mammalian genomes. The assembled genome sequence and the supporting raw sequence reads are available from the NCBI (National Center for Biotechnology Information) under the bioproject identifier PRJNA493656, and the assembly described in this paper is version QXTK01000000.
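The scaffold N50 quoted above follows the standard definition. Sort the sequences from longest to shortest, and report the length at which the running total first reaches half of the total assembly length. A minimal Python sketch of that definition follows. The input lengths are a toy example, not the grizzly bear scaffolds.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running against total to avoid floating-point halves.
        if 2 * running >= total:
            return length
    return 0  # empty input


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In this toy example the total is 54, and the three longest sequences (10 + 9 + 8 = 27) are the first to reach half of it, so the N50 is 8.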


2020 ◽  
Vol 8 (1) ◽  
pp. 71-90 ◽  
Author(s):  
Caroline B. Albertin ◽  
Oleg Simakov

Cephalopods are resourceful marine predators that have fascinated generations of researchers as well as the public owing to their advanced behavior, complex nervous system, and significance in evolutionary studies. Recent advances in genomics have accelerated the pace of cephalopod research. Many traditional areas focusing on evolution, development, behavior, and neurobiology, primarily on the morphological level, are now transitioning to molecular approaches. This review addresses the recent progress and impact of genomic and other molecular resources on research in cephalopods. We outline several key directions in which significant progress in cephalopod research is expected and discuss its impact on our understanding of the genetic background behind cephalopod biology and beyond.


1967 ◽  
Vol 42 (4) ◽  
pp. 471-513 ◽  
Author(s):  
F. Clark Howell
