domain discovery

Author(s):  
Carlos P Cantalapiedra ◽  
Ana Hernández-Plaza ◽  
Ivica Letunic ◽  
Peer Bork ◽  
Jaime Huerta-Cepas

Even though automated functional annotation of genes represents a fundamental step in most genomic and metagenomic workflows, it remains challenging at large scales. Here, we describe a major upgrade to eggNOG-mapper, a tool for functional annotation based on precomputed orthology assignments, now optimized for vast (meta)genomic data sets. Improvements in version 2 include a full update of both the genomes and functional databases to those from eggNOG v5, as well as several efficiency enhancements and new features. Most notably, eggNOG-mapper v2 now allows for: (i) de novo gene prediction from raw contigs, (ii) built-in pairwise orthology prediction, (iii) fast protein domain discovery, and (iv) automated GFF decoration. eggNOG-mapper v2 is available as a standalone tool or as an online service at http://eggnog-mapper.embl.de.


2021 ◽  
Author(s):  
Carlos P Cantalapiedra ◽  
Ana Hernandez-Plaza ◽  
Ivica Letunic ◽  
Peer Bork ◽  
Jaime Huerta-Cepas

Even though automated functional annotation of genes represents a fundamental step in most genomic and metagenomic workflows, it remains challenging at large scales. Here, we describe a major upgrade to eggNOG-mapper, a tool for functional annotation based on precomputed orthology assignments, now optimized for vast (meta)genomic data sets. Improvements in version 2 include a full update of both the genomes and functional databases to those from eggNOG v5, as well as several efficiency enhancements and new features. Most notably, eggNOG-mapper v2 now allows: (i) de novo gene prediction from raw contigs, (ii) built-in pairwise orthology prediction, (iii) fast protein domain discovery, and (iv) automated GFF decoration. eggNOG-mapper v2 is available as a standalone tool or as an online service at http://eggnog-mapper.embl.de.


2021 ◽  
Vol 191 ◽  
pp. 107983
Author(s):  
Isaias Martinez-Yelmo ◽  
Joaquin Alvarez-Horcajo ◽  
Juan Antonio Carral ◽  
Diego Lopez-Pajares

2020 ◽  
Author(s):  
Ning Liu ◽  
Wai Yee Low ◽  
Hamid Alinejad-Rokny ◽  
Stephen Pederson ◽  
Timothy Sadlon ◽  
...  

Eukaryotic genomes are highly organised within the nucleus of a cell, allowing widely dispersed regulatory elements such as enhancers to interact with gene promoters through physical contacts in three-dimensional space. Recent chromosome conformation capture methodologies such as Hi-C have enabled the analysis of interacting regions of the genome, providing valuable insight into the three-dimensional organisation of chromatin in the nucleus, including chromosome compartmentalisation and gene expression. Complicating the analysis of Hi-C data, however, is the massive number of identified interactions, many of which do not directly drive gene function, which hinders the identification of potentially biologically functional 3D interactions. In this review, we collate and examine the downstream analysis of Hi-C data, with particular focus on methods that identify significant functional interactions. We classify the approaches into three groups: structurally associated domain discovery methods (e.g. topologically associated domains and compartments), detection of statistically significant interactions via background models, and the use of epigenomic data integration to identify functional interactions. Careful use of these three approaches is crucial to successfully identifying functional interactions within the genome.


2020 ◽  
Vol 24 (8) ◽  
pp. 1655-1659 ◽  
Author(s):  
Joaquin Alvarez-Horcajo ◽  
Elisa Rojas ◽  
Isaias Martinez-Yelmo ◽  
Marco Savi ◽  
Diego Lopez-Pajares

2020 ◽  
Author(s):  
Elena Tea Russo ◽  
Alessandro Laio ◽  
Marco Punta

As the UniProt database approaches the 200 million entry mark, the vast majority of the proteins it contains lack any experimental validation of their functions. In this context, the identification of homologous relationships between proteins remains the single most widely applicable tool for generating functional and structural hypotheses in silico. Although many databases exist that classify proteins and protein domains into homologous families, large sections of the sequence space remain unassigned. We introduce DPCfam, a new unsupervised procedure that uses sequence alignments and Density Peak Clustering to automatically classify homologous protein regions. Here, we present a proof-of-principle experiment based on the analysis of two clans from the Pfam protein family database. Our tests indicate that the clusters DPCfam generates automatically are generally evolutionarily accurate, corresponding to one or more Pfam families, and that they cover a significant fraction of known homologs. Overall, DPCfam shows potential both for assisting manual annotation efforts (domain discovery, detection of classification inconsistencies, improvement of family coverage and boosting of clan membership) and as a stand-alone tool for unsupervised classification of sparsely annotated protein datasets such as those from environmental metagenomics studies (domain discovery, analysis of domain diversity). The algorithm implementation used in this paper is available at https://gitlab.com/ETRu/dpcfam (requires Python 3 and a C++ compiler; runs on Linux systems); data are available at https://zenodo.org/record/3934399


2020 ◽  
Vol 13 (7) ◽  
pp. 953-967
Author(s):  
Masayo Ota ◽  
Heiko Müller ◽  
Juliana Freire ◽  
Divesh Srivastava

Author(s):  
Raziyeh sadat Seyyed Khorasani ◽  
Ali Fathi ◽  
Nayyer Zaki Dizachi

Relational analysis is one of the well-known types of constructivist semantics; it explains the sense of vocabulary based on the inter-relations of linguistic units. The word Ḥaqq is one of the fundamental key concepts in the Qur'anic worldview, and the discovery of its semantic domain is very important in the Qur'anic semantic network: by discovering the sense relations of Ḥaqq, its fields of synonyms, hyponyms and antonyms are drawn, and the position of this concept in the Qur'anic worldview is determined. This research, using a descriptive-analytic method based on the achievements of constructivist semantics and with the help of the semantic components of Ḥaqq, explains its sense relations and concludes that Ḥaqq in the Holy Qur'an has a hyponymy relation with words such as the Prophet, the Book, and the promise; a synonymy relation with words such as just and honesty; and a polysemy relation with words such as interest and benefit, wājib and tawḥīd. Void, false and obscene are in complementary opposition to Ḥaqq, while suspicion, rebellion and oppression are in connotational opposition to it.


2018 ◽  
Vol 22 (5) ◽  
pp. 949-981 ◽  
Author(s):  
Bilal Abu-Salih ◽  
Pornpit Wongthongtham ◽  
Kit Yan Chan

Purpose: This paper aims to obtain the domain of the textual content generated by users of online social network (OSN) platforms. Understanding a user’s domain(s) of interest is a significant step towards addressing their domain-based trustworthiness through an accurate understanding of their content in their OSNs.
Design/methodology/approach: This study uses a Twitter mining approach for domain-based classification of users and their textual content. The proposed approach incorporates machine learning modules and comprises two analysis phases. The first phase is a time-aware semantic analysis of users’ historical content incorporating five commonly used machine learning classifiers. This framework classifies users into two main categories: politics-related and non-politics-related. In the second phase, the likelihood predictions obtained in the first phase are used to predict the domain of users’ future tweets.
Findings: Experiments were conducted to validate the mechanism proposed in the study framework. They verify the applicability of the framework to an effective domain-based classification of Twitter users and their content, as evident in the results of several performance evaluation metrics.
Research limitations/implications: This study is limited to an on/off domain classification of OSN content. The politics domain was selected because of Twitter’s popularity as a rich source of political deliberations; such data abundance facilitates data aggregation and improves the results of the data analysis. Furthermore, the currently implemented machine learning approaches assume that uncertainty and incompleteness do not affect the accuracy of the Twitter classification, although data uncertainty and incompleteness may in fact exist. In the future, the authors will formulate data uncertainty and incompleteness as fuzzy numbers, which can be used to address imprecise, uncertain and vague data.
Practical implications: This study proposes a practical framework with significant implications for a variety of business-related applications, such as voice of customer/voice of market, recommendation systems, the discovery of domain-based influencers, and opinion mining through tracking and simulation. In particular, a factual grasp of the domains of interest extracted at the user level or post level enhances customer-to-business engagement. This contributes to an accurate analysis of customer reviews and opinions to improve brand loyalty, customer service, etc.
Originality/value: This paper fills a gap in the existing literature by presenting a consolidated framework for Twitter mining that aims to uncover the deficiencies of current state-of-the-art approaches to topic distillation and domain discovery. The overall approach is promising in the fortification of Twitter mining towards a better understanding of users’ domains of interest.

