sequence information Latest Research Papers

Phylogenetic analysis of phytochrome A gene from Lablab purpureus (L.) Sweet

Journal of Genetic Engineering and Biotechnology ◽

10.1186/s43141-021-00295-z ◽

2022 ◽

Vol 20 (1) ◽

Author(s):

Stuti Krishna ◽

Kaushal Modha ◽

Vipulkumar Parekh ◽

Ritesh Patel ◽

Digvijay Chauhan

Keyword(s):

Phylogenetic Analysis ◽

Glycine Max ◽

Binding Property ◽

Sequence Information ◽

Phytochrome A ◽

Lablab Purpureus ◽

Helix Loop Helix ◽

DNA Binding Property ◽

Exon 1 ◽

Developmental Responses

Abstract. Background: Phytochromes are the best characterized photoreceptors that perceive Red (R)/Far-Red (FR) signals and mediate key developmental responses in plants. It is well established that photoperiodic control of flowering is regulated by the PHY A (phytochrome A) gene. So far, the members of the PHY A gene family remain unexplored in Lablab purpureus, and therefore their functions are still not deciphered. PHYA3 is a homologue of phytochrome A and is known to be involved in dominant suppression of flowering under long-day conditions by downregulating florigens in Glycine max. The present study is the first effort to identify and characterize any photoreceptor gene (PHYA3, in this study) in Lablab purpureus and to decipher its phylogeny with related legumes. Results: PHYA3 was amplified in Lablab purpureus cv GNIB-21 (photo-insensitive and determinate) using primers designed from the GmPHYA3 locus of Glycine max. This study was successful in partially characterizing PHYA3 in Lablab purpureus (LprPHYA3), which is 2 kb longer and belongs to the exon 1 region of the PHYA3 gene. Phylogenetic analysis of the nucleotide and protein sequences of PHYA genes through MEGA X delineated the conservation and evolution of Lablab purpureus PHYA3 (LprPHYA3), probably from PHYA genes of Vigna unguiculata, Glycine max and Vigna angularis. A conserved basic helix-loop-helix motif, bHLH69, was predicted to have a DNA binding property. Domain analysis of the GmPHYA protein and the predicted partial protein sequence corresponding to exon 1 of LprPHYA3 revealed the presence of conserved domains (GAF and PAS domains) in Lablab purpureus similar to those in Glycine max. Conclusion: Partial characterization of LprPHYA3 would facilitate the identification of the complete gene in Lablab purpureus utilizing sequence information from phylogenetically related species of Fabaceae. This would allow screening of allelic variants at the LprPHYA3 locus and of their role in photoperiod-responsive flowering. The present study could aid in modulating photoperiod-responsive flowering in Lablab purpureus and other related legumes in the near future through genome editing.


Identification of Helicobacter pylori Membrane Proteins Using Sequence-Based Features

Computational and Mathematical Methods in Medicine ◽

10.1155/2022/7493834 ◽

2022 ◽

Vol 2022 ◽

pp. 1-7

Author(s):

Mujiexin Liu ◽

Hui Chen ◽

Dong Gao ◽

Cai-Yi Ma ◽

Zhao-Yue Zhang

Keyword(s):

Helicobacter Pylori ◽

Membrane Proteins ◽

Cross Validation ◽

Cost Effective ◽

Vital Role ◽

Support Vector ◽

Sequence Information ◽

Common Risk Factor ◽

Proposed Model ◽

H Pylori

Helicobacter pylori (H. pylori) is the most common risk factor for gastric cancer worldwide. The membrane proteins of H. pylori are involved in bacterial adherence and play a vital role in the field of drug discovery. Thus, an accurate and cost-effective computational model is needed to predict the uncharacterized membrane proteins of H. pylori. In this study, a reliable benchmark dataset consisting of 114 membrane and 219 nonmembrane proteins was constructed based on UniProt. A support vector machine (SVM)-based model was developed for discriminating H. pylori membrane proteins from nonmembrane proteins by using sequence information. Cross-validation showed that our method achieved good performance, with an accuracy of 91.29%. It is anticipated that the proposed model will be useful for the annotation of H. pylori membrane proteins and the development of new anti-H. pylori agents.
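The abstract does not say which sequence-based features feed the SVM. As an illustration only, the sketch below assumes the simplest common choice, amino acid composition (the fraction of each of the 20 standard residues), and shows how a percentage accuracy relates to the dataset size given above. The function names and the toy sequence are hypothetical, and the count of 304 correct predictions is an inference from the reported figures, not a number stated by the authors.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector with the fraction of each standard residue.

    Characters outside the 20 standard residues (e.g. 'X') are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy_percent(correct, total):
    """Percentage of correctly classified samples."""
    return 100.0 * correct / total


# Hypothetical toy sequence, not taken from the benchmark dataset.
features = amino_acid_composition("MKKLLAAL")

# 114 membrane + 219 nonmembrane proteins = 333 samples in the benchmark set.
dataset_size = 114 + 219
# Assumption: accuracy was computed over all 333 proteins; 304 correct
# predictions would then round to the reported 91.29%.
reported_like = round(accuracy_percent(304, dataset_size), 2)
```

A vector such as `features` could be passed to any SVM implementation; the kernel and hyperparameters the authors used are not given in the abstract.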


Identification of Enzymatic Active Sites with Unsupervised Language Modeling

10.33774/chemrxiv-2021-m20gg-v2 ◽

2022 ◽

Author(s):

Loïc Kwate Dassi ◽

Matteo Manica ◽

Daniel Probst ◽

Philippe Schwaller ◽

Yves Gaetan Nana Teukam ◽

...

Keyword(s):

Language Processing ◽

Active Site ◽

Active Sites ◽

Functional Characterization ◽

3D Structure ◽

Ground Truth ◽

Sequence Information ◽

Docking Simulations ◽

Domain Specific ◽

Language Representation

The first decade of genome sequencing saw a surge in the characterization of proteins with unknown functionality. Even so, more than 20% of proteins in well-studied model animals have yet to be identified, making the discovery of their active sites one of biology's greatest puzzles. Herein, we apply a Transformer architecture to a language representation of bio-catalyzed chemical reactions to learn the signal at the base of the substrate-active site atomic interactions. The language representation comprises a reaction simplified molecular-input line-entry system (SMILES) for substrate and products, complemented with amino acid (AA) sequence information for the enzyme. We demonstrate that by creating a custom tokenizer and a score based on attention values, we can capture the substrate-active site interaction signal and utilize it to determine the active site position in unknown protein sequences, unraveling complicated 3D interactions using just 1D representations. This approach exhibits remarkable results and can recover, with no supervision, 31.51% of the active site when considering co-crystallized substrate-enzyme structures as ground truth, vastly outperforming approaches based on sequence similarities only. Our findings are further corroborated by docking simulations on the 3D structures of a few enzymes. This work confirms the unprecedented impact of natural language processing, and more specifically of the Transformer architecture, on domain-specific languages, paving the way to effective solutions for protein functional characterization and bio-catalysis engineering.


De novo assembly and inferred functional annotation of the transcriptome of Heterosigma akashiwo

10.22541/au.164112034.43396317/v2 ◽

2022 ◽

Author(s):

Masanao Sato ◽

Masahide Seki ◽

Yutaka Suzuki ◽

Shoko Ueki

Keyword(s):

Functional Annotation ◽

De Novo ◽

Molecular Level ◽

Transcriptome Assembly ◽

Practical Interest ◽

Heterosigma Akashiwo ◽

Sequence Information ◽

Biological Characterization ◽

Nucleotide Database

Heterosigma akashiwo is a eukaryotic, cosmopolitan, unicellular alga (class: Raphidophyceae) that produces fish-killing blooms. There is substantial scientific and practical interest in the ecophysiological characteristics that determine its bloom dynamics and its adaptation to broad climate zones. Well-annotated genomic/genetic sequence information enables researchers to characterize organisms using modern molecular technology. As of December 2021, the chloroplast and mitochondrial genome sequences and transcriptome sequence assembly (TSA) datasets of limited size for H. akashiwo were available in the NCBI nucleotide database; there is no doubt that more genetic information on the species will greatly enhance the progress of its biological characterization. Here, we conducted H. akashiwo RNA sequencing, a de novo transcriptome assembly (NCBI TSA ICRV01) of a large number of high-quality short-read sequences, and the functional annotation of predicted genes. Based on our transcriptome, we confirmed that the organism possesses genes that were predicted to function in phagocytosis, supporting the earlier observations of H. akashiwo bacterivory. Along with its capability for photosynthesis, the mixotrophy of H. akashiwo may partially explain its high adaptability to various environmental conditions. Our study will provide an important toehold for deciphering H. akashiwo ecophysiology at a molecular level.


De novo assembly and inferred functional annotation of the transcriptome of Heterosigma akashiwo

10.22541/au.164112034.43396317/v1 ◽

2022 ◽

Author(s):

Masanao Sato ◽

Masahide Seki ◽

Yutaka Suzuki ◽

Shoko Ueki

Keyword(s):

Functional Annotation ◽

De Novo ◽

Molecular Level ◽

Transcriptome Assembly ◽

Practical Interest ◽

Heterosigma Akashiwo ◽

Sequence Information ◽

Biological Characterization ◽

Nucleotide Database

Heterosigma akashiwo is a eukaryotic, cosmopolitan, unicellular alga (class: Raphidophyceae) that produces fish-killing blooms. There is substantial scientific and practical interest in the ecophysiological characteristics that determine its bloom dynamics and its adaptation to broad climate zones. Well-annotated genomic/genetic sequence information enables researchers to characterize organisms using modern molecular technology. As of December 2021, the chloroplast and mitochondrial genome sequences and transcriptome sequence assembly (TSA) datasets of limited size for H. akashiwo were available in the NCBI nucleotide database; there is no doubt that more genetic information on the species will greatly enhance the progress of its biological characterization. Here, we conducted H. akashiwo RNA sequencing, a de novo transcriptome assembly (NCBI TSA ICRV01) of a large number of high-quality short-read sequences, and the functional annotation of predicted genes. Based on our transcriptome, we confirmed that the organism possesses genes that were predicted to function in phagocytosis, supporting the earlier observations of H. akashiwo bacterivory. Along with its capability for photosynthesis, the mixotrophy of H. akashiwo may partially explain its high adaptability to various environmental conditions. Our study will provide an important toehold for deciphering H. akashiwo ecophysiology at a molecular level.


Deep learning program to predict protein functions based on sequence information

MethodsX ◽

10.1016/j.mex.2022.101622 ◽

2022 ◽

pp. 101622

Author(s):

Chang Woo Ko ◽

June Huh ◽

Jong Wan Park

Keyword(s):

Deep Learning ◽

Sequence Information ◽

Learning Program ◽

Protein Functions


Protein Subcellular Localization Based on Evolutionary Information and Segmented Distribution

Mathematical Problems in Engineering ◽

10.1155/2021/8629776 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Danyu Jin ◽

Ping Zhu

Keyword(s):

Subcellular Localization ◽

Conditional Entropy ◽

Subcellular Location ◽

New Drugs ◽

Experimental Comparison ◽

Evolutionary Information ◽

Support Vector ◽

Sequence Information ◽

Protein Subcellular Localization ◽

Protein Subcellular Location

The prediction of protein subcellular localization is important not only for the study of protein structure and function but also for the design and development of new drugs. In recent years, feature extraction methods based on protein evolutionary information have attracted much attention and made good progress. Based on the protein position-specific score matrix (PSSM) obtained by PSI-BLAST, the PSSM-GSD method is proposed according to the data distribution characteristics. In order to reflect the protein sequence information as fully as possible, the AAO, PSSM-AAO, and PSSM-GSD methods are fused together. Then, a conditional entropy-based classifier chain algorithm and a support vector machine are used to locate multilabel proteins. Finally, we test on the Gpos-mPLoc and Gneg-mPLoc datasets; considering the severe imbalance of the data, the SMOTE algorithm is selected to expand the minority-class samples. The experiments show that the AAO + PSSM* method in the paper achieved 83.1% and 86.8% overall accuracy, respectively. In an experimental comparison of different methods, AAO + PSSM* performs well and can effectively predict protein subcellular location.
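The SMOTE step mentioned in the abstract can be pictured with a minimal sketch of its core idea: a synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority-class neighbours. This is a simplified illustration, not the authors' implementation; the function name `smote_sample`, the toy points, and the choice of `k` are all hypothetical.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point from a list of minority-class feature vectors.

    A base sample is picked at random, one of its k nearest minority
    neighbours (squared Euclidean distance) is picked at random, and the new
    point is interpolated between the two. Requires at least two samples.
    """
    rng = rng or random.Random()
    base = rng.choice(minority)
    # All other minority samples, nearest first.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position along the segment, in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


# Hypothetical toy minority class with three 2-D feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote_sample(minority, k=2, rng=random.Random(0))
```

Repeating the call until the classes are balanced gives the oversampled training set; in practice a maintained library implementation would normally be preferred over a hand-written one.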


Novel Expansion of Matrix Metalloproteases in the Laboratory Axolotl (Ambystoma mexicanum) and Other Salamander Species

Frontiers in Ecology and Evolution ◽

10.3389/fevo.2021.786263 ◽

2021 ◽

Vol 9 ◽

Author(s):

Nour Al Haj Baddar ◽

Nataliya Timoshevskaya ◽

Jeramiah J. Smith ◽

Houfu Guo ◽

S. Randal Voss

Keyword(s):

Ambystoma Mexicanum ◽

Matrix Metalloproteases ◽

Sequence Information ◽

The Novel ◽

Tail Regeneration ◽

Extracellular Milieu ◽

Comprehensive Survey ◽

Protein Components ◽

Salamander Species ◽

MMP Genes

Matrix metalloprotease (MMP) genes encode endopeptidases that cleave protein components of the extracellular matrix (ECM) as well as non-ECM proteins. Here we report the results of a comprehensive survey of MMPs in the laboratory axolotl and other representative salamanders. Surprisingly, 28 MMPs were identified in salamanders and 9 MMP paralogs were identified as unique to the axolotl and other salamander taxa, with several of these presenting atypical amino acid insertions not observed in other tetrapod vertebrates. Furthermore, as assessed by sequence information, all of the novel salamander MMPs are of the secreted type, rather than cell membrane anchored. This suggests that secreted type MMPs expanded uniquely within salamanders to presumably execute catalytic activities in the extracellular milieu. To facilitate future studies of salamander-specific MMPs, we annotated transcriptional information from published studies of limb and tail regeneration. Our analysis sets the stage for comparative studies to understand why MMPs expanded uniquely within salamanders.


Compatible or incompatible? DSI, open access, and benefit-sharing

10.31235/osf.io/nw8g9 ◽

2021 ◽

Author(s):

Rodrigo Sara ◽

Andrew Lee Hufton ◽

Amber Hartman Scholz

Keyword(s):

Open Access ◽

Scientific Community ◽

Biological Diversity ◽

Benefit Sharing ◽

Convention On Biological Diversity ◽

Sequence Information ◽

Policy Options ◽

Access And Benefit Sharing ◽

Key Points ◽

Differences Of Opinion

The scientific community has a strong tradition of sharing digital sequence information (DSI) in an unrestricted manner through public databases. While this tradition of “open access” sharing has many benefits, it has created tension in the context of the Convention on Biological Diversity (CBD). Differences of opinion on open access to DSI underlie key points of divergence in ongoing negotiations. The CBD has provided a set of policy options for DSI, but they are not granular enough to assess whether they are compatible with open access principles. Here, we explain what open access to DSI means in practice, assess the CBD DSI policy options through a more granular, technical lens, and discuss which policy options best enable open access. We show that de-coupled benefit-sharing mechanisms for DSI are the most compatible with open access practices and multilateral mechanisms, in general, are the most suited for benefit-sharing if fully de-coupled mechanisms become politically unrealistic.


The ground truth of the Data-Iceberg: Correct Meta-data

10.1101/2021.12.17.473021 ◽

2021 ◽

Author(s):

Aylin Caliskan ◽

Seema Dangwal ◽

Thomas Dandekar

Keyword(s):

Data Collection ◽

Ground Truth ◽

Molecular Data ◽

Patient Data ◽

Sequence Information ◽

Timely Manner ◽

Meta Data ◽

Data Flood

Biological molecular data such as sequence information are increasing so rapidly that detailed metadata describing the process and conditions of data collection, as well as proper labelling and typing of the data, become ever more important to avoid mistakes and erroneous labelling. Starting from a striking example of wrong labelling of patient data recently published in Nature, we advocate measures to improve software metadata and controls in a timely manner so as not to rapidly lose quality in the ever-growing data flood.
