Identification of essential sequence motifs in the node/notochord enhancer of Foxa2 (Hnf3β) gene that are conserved across vertebrate species

Abstract: Long non-coding RNAs (lncRNAs) are a heterogeneous class of genes that do not code for proteins. Since lncRNAs (or a fraction thereof) are expected to be functional, many efforts have been dedicated to cataloging lncRNAs in numerous organisms, but our knowledge of lncRNAs in non-vertebrate species remains very limited. Here, we annotated lncRNAs using transcriptomic data from the same larval stage of four Caenorhabditis species. The number of annotated lncRNAs in self-fertile nematodes was lower than in out-crossing species. We used a combination of approaches to identify putatively homologous lncRNAs: synteny, sequence conservation, and structural conservation. We classified a total of 1,532 out of 7,635 genes from the four species into families of lncRNAs with conserved synteny and expression at the larval stage, suggesting that a large fraction of the predicted lncRNAs may be species specific. Although both sequence and local secondary structure seem to be poorly conserved, sequences within families frequently shared BLASTn hits and short sequence motifs, which were more likely to be unpaired in the predicted structures. We provide the first multi-species catalog of lncRNAs in nematodes and identify groups of lncRNAs with conserved synteny and expression that share exposed motifs.

Faculty Opinions recommendation of Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1011310.179213 ◽

2003 ◽

Author(s):

Janet Thornton

Keyword(s):

Sequence Diversity ◽

Automatic Identification ◽

Sequence Motifs ◽

Definition Of

Assessing the Red List Index for vertebrate species in China

Biodiversity Science ◽

10.3724/sp.j.1003.2014.14085 ◽

2014 ◽

Vol 22 (5) ◽

pp. 589

Author(s):

Cui Peng ◽

Xu Haigen ◽

Wu Jun ◽

Ding Hui ◽

Cao Mingchang ◽

...

Keyword(s):

Red List ◽

Vertebrate Species

Computational Approaches to Predict the Non-canonical DNAs

Current Bioinformatics ◽

10.2174/1574893614666190126143438 ◽

2019 ◽

Vol 14 (6) ◽

pp. 470-479 ◽

Cited By ~ 3

Author(s):

Nazia Parveen ◽

Amen Shamim ◽

Seunghee Cho ◽

Kyeong Kyu Kim

Keyword(s):

Computational Methods ◽

Genetic Instability ◽

Computational Prediction ◽

Structure And Function ◽

Sequence Motifs ◽

Computational Approaches ◽

Functional Roles ◽

And Function ◽

Genetic Events ◽

Insight Into

Background: Although most nucleotides in the genome form canonical double-stranded B-DNA, many repeated sequences transiently present as non-canonical conformations (non-B DNA) such as triplexes, quadruplexes, Z-DNA, cruciforms, and slipped/hairpins. Those noncanonical DNAs (ncDNAs) are not only associated with many genetic events such as replication, transcription, and recombination, but are also related to the genetic instability that results in the predisposition to disease. Due to the crucial roles of ncDNAs in cellular and genetic functions, various computational methods have been implemented to predict sequence motifs that generate ncDNA. Objective: Here, we review strategies for the identification of ncDNA motifs across the whole genome, which is necessary for further understanding and investigation of the structure and function of ncDNAs. Conclusion: There is a great demand for computational prediction of non-canonical DNAs that play key functional roles in gene expression and genome biology. In this study, we review the currently available computational methods for predicting the non-canonical DNAs in the genome. Current studies not only provide an insight into the computational methods for predicting the secondary structures of DNA but also increase our understanding of the roles of non-canonical DNA in the genome.

Analysis of the Genetic Variability of Virulence-Related Loci in Epidemic Clones of Methicillin-Resistant Staphylococcus aureus

Antimicrobial Agents and Chemotherapy ◽

10.1128/aac.49.1.366-379.2005 ◽

2005 ◽

Vol 49 (1) ◽

pp. 366-379 ◽

Cited By ~ 41

Author(s):

A. R. Gomes ◽

S. Vinga ◽

M. Zavolan ◽

H. de Lencastre

Keyword(s):

Staphylococcus Aureus ◽

Genetic Variability ◽

Amino Acid Level ◽

Point Of View ◽

Sequence Motifs ◽

Related Factors ◽

Specific Sequence ◽

Methicillin Resistant ◽

Sequence Types ◽

R Domain

Abstract: Methicillin-resistant Staphylococcus aureus (MRSA) isolates have previously been classified into major epidemic clonal types by pulsed-field gel electrophoresis in combination with multilocus sequence typing (MLST) and staphylococcal cassette chromosome mec typing. We aimed to investigate whether genetic variability in potentially polymorphic domains of virulence-related factors could provide another level of differentiation in a diverse collection of epidemic MRSA clones. The target regions of strains representative of epidemic clones and genetically related methicillin-susceptible S. aureus isolates from the 1960s that were sequenced included the R domains of clfA and clfB; the D, W, and M regions of fnbA and fnbB; and three regions in the agr operon. Sequence variation ranged from very conserved regions, such as those for RNAIII and the agr interpromoter region, to the highly polymorphic R regions of the clf genes. The sequences of the clf R domains could be grouped into six major sequence types on the basis of the sequences in their 3′ regions. Six sequence types were also observed for the fnb sequences at the amino acid level. From an evolutionary point of view, it was interesting that a small DNA stretch at the 3′ clf R-domain sequence and the fnb sequences agreed with the results of MLST for this set of strains. In particular, clfB R-domain sequences, which had a high discriminatory capacity and distinguished types congruent with those obtained by other molecular typing methods, have potential for use in the typing of S. aureus. Clone- and strain-specific sequence motifs in the clf and fnb genes may represent useful additions to a typing methodology with a DNA array.

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Bioinformatics ◽

10.1093/bioinformatics/btab083 ◽

2021 ◽

Author(s):

Yanrong Ji ◽

Zhihan Zhou ◽

Han Liu ◽

Ramana V Davuluri

Keyword(s):

Dna Sequences ◽

Regulatory Elements ◽

Ease Of Use ◽

Fine Tuning ◽

Supplementary Information ◽

Sequence Motifs ◽

Semantic Relationship ◽

Accurate Identification ◽

Conserved Sequence ◽

Genome Wide

Abstract. Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture global and transferrable understanding of genomic DNA sequences based on up- and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory elements prediction and demonstrate its ease of use, accuracy and efficiency. We show that the single pre-trained transformers model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks. Availability and implementation: The source code and the pretrained and finetuned models for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). Supplementary information: Supplementary data are available at Bioinformatics online.

A Comprehensive Phylogenetic and Bioinformatics Survey of Lectins in the Fungal Kingdom

Journal of Fungi ◽

10.3390/jof7060453 ◽

2021 ◽

Vol 7 (6) ◽

pp. 453

Author(s):

Annie Lebreton ◽

François Bonnardel ◽

Yu-Cheng Dai ◽

Anne Imberty ◽

Francis M. Martin ◽

...

Keyword(s):

Evolutionary Relationship ◽

Large Family ◽

Gene Families ◽

Carbohydrate Binding ◽

Sequence Motifs ◽

Fungal Lectin ◽

Quaternary Structures ◽

High Degree ◽

Carbohydrate Ligand ◽

Fungal Kingdom

Fungal lectins are a large family of carbohydrate-binding proteins with no enzymatic activity. They play fundamental biological roles in the interactions of fungi with their environment and are found in many different species across the fungal kingdom. In particular, their contribution to defense against feeders has been emphasized, and when secreted, lectins may be involved in the recognition of bacteria, fungal competitors and specific host plants. Carbohydrate specificities and quaternary structures vary widely, but evidence for an evolutionary relationship within the different classes of fungal lectins is supported by a high degree of amino acid sequence identity. The UniLectin3D database contains 194 fungal lectin 3D structures, of which 129 are characterized with a carbohydrate ligand. Using the UniLectin3D lectin classification system, 109 lectin sequence motifs were defined to screen 1223 species deposited in the genomic portal MycoCosm of the Joint Genome Institute. The resulting 33,485 putative lectin sequences are organized in MycoLec, a publicly available and searchable database. These results shed light on the evolution of the lectin gene families in fungi.

Context-specific action of macrolide antibiotics on the eukaryotic ribosome

Nature Communications ◽

10.1038/s41467-021-23068-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Maxim S. Svetlov ◽

Timm O. Koller ◽

Sezen Meydan ◽

Vaishnavi Shankar ◽

Dorota Klepacki ◽

...

Keyword(s):

Amino Acid Sequences ◽

Macrolide Antibiotics ◽

Single Mutation ◽

Sequence Motifs ◽

Eukaryotic Translation ◽

Eukaryotic Ribosome ◽

Cellular Translation ◽

Distinct Sequence ◽

Exit Tunnel ◽

Context Specific

Abstract: Macrolide antibiotics bind in the nascent peptide exit tunnel of the bacterial ribosome and prevent polymerization of specific amino acid sequences, selectively inhibiting translation of a subset of proteins. Because preventing translation of individual proteins could be beneficial for the treatment of human diseases, we asked whether macrolides, if bound to the eukaryotic ribosome, would retain their context- and protein-specific action. By introducing a single mutation in rRNA, we rendered yeast Saccharomyces cerevisiae cells sensitive to macrolides. Cryo-EM structural analysis showed that the macrolide telithromycin binds in the tunnel of the engineered eukaryotic ribosome. Genome-wide analysis of cellular translation and biochemical studies demonstrated that the drug inhibits eukaryotic translation by preferentially stalling ribosomes at distinct sequence motifs. Context-specific action markedly depends on the macrolide structure. Eliminating macrolide-arrest motifs from a protein renders its translation macrolide-tolerant. Our data illuminate the prospects of adapting macrolides for protein-selective translation inhibition in eukaryotic cells.

The minimal essential sequence for a major cell type-specific adhesion site (CS1) within the alternatively spliced type III connecting segment domain of fibronectin is leucine-aspartic acid-valine

Journal of Biological Chemistry ◽

10.1016/s0021-9258(18)98588-1 ◽

1991 ◽

Vol 266 (23) ◽

pp. 15075-15079 ◽

Cited By ~ 4

Author(s):

A. Komoriya ◽

L.J. Green ◽

M. Mervic ◽

S.S. Yamada ◽

K.M. Yamada ◽

...

Keyword(s):

Aspartic Acid ◽

Cell Type ◽

Type Iii ◽

Major Cell Type ◽

Alternatively Spliced ◽

Cell Type Specific ◽

Adhesion Site ◽

Essential Sequence

Temporal and Spatial Blood Feeding Patterns of Urban Mosquitoes in the San Juan Metropolitan Area, Puerto Rico

Insects ◽

10.3390/insects12020129 ◽

2021 ◽

Vol 12 (2) ◽

pp. 129

Author(s):

Matthew W. Hopken ◽

Limarie J. Reyes-Torres ◽

Nicole Scavo ◽

Antoinette J. Piaggio ◽

Zaid Abdo ◽

...

Keyword(s):

Puerto Rico ◽

Aedes Aegypti ◽

Metropolitan Area ◽

Culex Quinquefasciatus ◽

Urban Ecosystems ◽

Life Cycles ◽

Pathogen Transmission ◽

San Juan ◽

Vertebrate Species ◽

Blood Meals

Urban ecosystems are a patchwork of habitats that host a broad diversity of animal species. Insects comprise a large portion of urban biodiversity which includes many pest species, including those that transmit pathogens. Mosquitoes (Diptera: Culicidae) inhabit urban environments and rely on sympatric vertebrate species to complete their life cycles, and in this process transmit pathogens to animals and humans. Given that mosquitoes feed upon vertebrates, they can also act as efficient samplers that facilitate detection of vertebrate species that utilize urban ecosystems. In this study, we analyzed DNA extracted from mosquito blood meals collected temporally in multiple neighborhoods of the San Juan Metropolitan Area, Puerto Rico to evaluate the presence of vertebrate fauna. DNA was collected from 604 individual mosquitoes that represented two common urban species, Culex quinquefasciatus (n = 586) and Aedes aegypti (n = 18). Culex quinquefasciatus fed on 17 avian taxa (81.2% of blood meals), seven mammalian taxa (17.9%), and one reptilian taxon (0.85%). Domestic chickens dominated these blood meals both temporally and spatially, and no statistically significant shift from birds to mammals was detected. Aedes aegypti blood meals were from a less diverse group, with two avian taxa (11.1%) and three mammalian taxa (88.9%) identified. The blood meals we identified provided a snapshot of the vertebrate community in the San Juan Metropolitan Area and have potential implications for vector-borne pathogen transmission.
