Probing Multi-way Chromatin Interaction with Hypergraph Representation Learning

10.1101/2020.01.22.916171 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ruochi Zhang ◽

Jian Ma

Keyword(s):

Genome Organization ◽

Representation Learning ◽

Higher Order ◽

Computational Method ◽

Chromatin Interaction ◽

Interaction Data ◽

3D Genome ◽

Chromatin Interactions ◽

Single Nucleus ◽

And Function

Abstract: Advances in high-throughput mapping of 3D genome organization have enabled genome-wide characterization of chromatin interactions. However, proximity-ligation-based mapping approaches for pairwise chromatin interaction, such as Hi-C, cannot capture multi-way interactions, which are informative for delineating higher-order genome organization and gene regulation mechanisms at single-nucleus resolution. The recent development of ligation-free chromatin interaction mapping methods such as SPRITE and ChIA-Drop has offered new opportunities to uncover simultaneous interactions involving multiple genomic loci within the same nucleus. Unfortunately, methods for analyzing multi-way chromatin interaction data remain significantly underexplored. Here we develop a new computational method, called MATCHA, based on hypergraph representation learning, where multi-way chromatin interactions are represented as hyperedges. Applications to SPRITE and ChIA-Drop data suggest that MATCHA is effective at denoising the data and making de novo predictions of multi-way chromatin interactions, reducing the potential false positives and false negatives in the original data. We also show that MATCHA is able to distinguish between a multi-way interaction in a single nucleus and a combination of pairwise interactions in a cell population. In addition, the embeddings from MATCHA reflect 3D genome spatial localization and function. MATCHA provides a promising framework to significantly improve the analysis of multi-way chromatin interaction data and has the potential to offer unique insights into higher-order chromosome organization and function.
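The distinction the abstract draws between a multi-way interaction and a combination of pairwise interactions can be illustrated with a small sketch. This is not MATCHA's implementation; it only shows, under assumed data, why a hyperedge carries information that a pairwise projection loses. The bin names and the helper `clique_expansion` are hypothetical and introduced purely for illustration.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Project hyperedges onto the set of pairwise contacts they imply."""
    pairs = set()
    for edge in hyperedges:
        for a, b in combinations(sorted(edge), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical genomic bins, named for illustration only.
# Case 1: one three-way interaction observed in a single nucleus.
multiway = [frozenset({"bin_A", "bin_B", "bin_C"})]

# Case 2: three separate pairwise contacts, e.g. from different cells.
pairwise_only = [
    frozenset({"bin_A", "bin_B"}),
    frozenset({"bin_B", "bin_C"}),
    frozenset({"bin_A", "bin_C"}),
]

# Both cases look identical once reduced to pairwise contacts ...
same_pairs = clique_expansion(multiway) == clique_expansion(pairwise_only)
# ... but they remain distinct when kept as hyperedges.
same_hyperedges = set(multiway) == set(pairwise_only)

print(same_pairs)       # True
print(same_hyperedges)  # False
```

A pairwise contact map would therefore treat the two cases as the same observation, which is why a hyperedge-level representation is needed to tell them apart.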

Computational Inference of DNA Folding Principles: From Data Management to Machine Learning

Special Topics in Information Technology - SpringerBriefs in Applied Sciences and Technology ◽

10.1007/978-3-030-85918-3_7 ◽

2022 ◽

pp. 79-88

Author(s):

Luca Nanni

Keyword(s):

Complex Analysis ◽

Hierarchical Structures ◽

Representation Learning ◽

Research Problem ◽

Graph Representation ◽

Chromatin Interaction ◽

Biological Research ◽

Chromatin Conformation ◽

Computational Framework ◽

Computational Resources

Abstract: DNA is the molecular basis of life and would total about three meters if linearly untangled. To fit in the cell nucleus at the micrometer scale, DNA therefore has to fold itself into several layers of hierarchical structures, which are thought to be associated with functional compartmentalization of genomic features like genes and their regulatory elements. For this reason, understanding the mechanisms of genome folding is a major biological research problem. Studying chromatin conformation requires substantial computational resources and complex data analysis pipelines. In this chapter, we first present the PyGMQL software for interactive and scalable exploration of genomic data. PyGMQL allows the user to inspect genomic datasets and design complex analysis pipelines. The software is an easy-to-use Python library and interacts seamlessly with other data analysis packages. We then use the software to study chromatin conformation data. We focus on the epigenetic determinants of Topologically Associating Domains (TADs), which are regions of high chromatin self-interaction. The results of this study highlight the existence of a “grammar of genome folding,” based on the CTCF insulator protein, which dictates the formation of TADs and their boundaries. Finally, we focus on the relationship between chromatin conformation and gene expression, designing a graph representation learning model for the prediction of gene co-expression from gene topological features obtained from chromatin conformation data. We demonstrate a correlation between chromatin topology and co-expression, shedding new light on this debated topic and providing a novel computational framework for the study of co-expression networks.

MATCHA: Probing Multi-way Chromatin Interaction with Hypergraph Representation Learning

Cell Systems ◽

10.1016/j.cels.2020.04.004 ◽

2020 ◽

Vol 10 (5) ◽

pp. 397-407.e5

Author(s):

Ruochi Zhang ◽

Jian Ma

Keyword(s):

Representation Learning ◽

Chromatin Interaction

Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation

10.21437/interspeech.2020-2524 ◽

2020 ◽

Author(s):

Sung-Lin Yeh ◽

Yun-Shao Lin ◽

Chi-Chun Lee

Keyword(s):

Emotion Recognition ◽

Representation Learning ◽

Speech Representation ◽

End To End

Transfer-Representation Learning for Detecting Spoofing Attacks with Converted and Synthesized Speech in Automatic Speaker Verification System

10.21437/interspeech.2019-2014 ◽

2019 ◽

Cited By ~ 2

Author(s):

Su-Yu Chang ◽

Kai-Cheng Wu ◽

Chia-Ping Chen

Keyword(s):

Speaker Verification ◽

Representation Learning ◽

Verification System ◽

Synthesized Speech

A Drug Target Interaction Prediction Based on LINE-RF Learning

Current Bioinformatics ◽

10.2174/1574893615666191227092453 ◽

2020 ◽

Vol 15 (7) ◽

pp. 750-757

Author(s):

Jihong Wang ◽

Yue Shi ◽

Xiaodan Wang ◽

Huiyou Chang

Keyword(s):

Network Topology ◽

Drug Target ◽

Large Scale ◽

Representation Learning ◽

New Drugs ◽

Combination Method ◽

Learning Methods ◽

Network Representation ◽

On Line ◽

Clinical Experiments

Background: At present, using computational methods to predict drug-target interactions (DTIs) is a very important step in the discovery of new drugs and in drug relocation processes. The potential DTIs identified by machine learning methods can provide guidance for biochemical or clinical experiments. Objective: The goal of this article is to apply the latest network representation learning methods to drug-target prediction, improve model prediction capabilities, and promote new drug development. Methods: We use the large-scale information network embedding (LINE) method to extract network topology features of drugs, targets, diseases, etc., integrate the features obtained from heterogeneous networks, construct binary classification samples, and use the random forest (RF) method to predict DTIs. Results: The experiments in this paper compare the common classifiers RF, LR, and SVM, as well as the typical network representation learning methods LINE, Node2Vec, and DeepWalk. The combined method LINE-RF achieves the best results, reaching an AUC of 0.9349 and an AUPR of 0.9016. Conclusion: The LINE-based learning method can effectively learn hidden features of drugs, targets, diseases, and other entities from the network topology. Combining features learned from multiple networks can enhance the expressive power of the representation. RF is an effective supervised learning method. Therefore, the LINE-RF combination method is widely applicable.
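The AUC and AUPR figures reported above are ranking metrics computed from classifier scores. The sketch below shows one common way to compute them, using the pairwise-comparison form of ROC AUC and average precision as the estimate of the area under the precision-recall curve. The abstract does not say which estimator the authors used, so that choice is an assumption, and the labels and scores are made-up toy values, not data from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of positives."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example (made up): 1 = interacting drug-target pair, 0 = non-interacting.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)              # 7 of 9 pairs ranked correctly
aupr = average_precision(labels, scores)   # (1/1 + 2/3 + 3/4) / 3
print(round(auc, 4), round(aupr, 4))
```

AUPR is usually the more informative of the two when negatives heavily outnumber positives, which is typical for DTI prediction.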

Digging for the truth: the case for active annotation in evaluating the credibility of online medical information (Preprint)

10.2196/preprints.25920 ◽

2020 ◽

Author(s):

Mikołaj Morzy ◽

Bartłomiej Balcerzak ◽

Adam Wierzbicki

Keyword(s):

Machine Learning ◽

Medical Information ◽

Representation Learning ◽

Training Dataset ◽

Highly Qualified ◽

Human In The Loop ◽

Annotation Process ◽

Comprehensive Framework ◽

Online Sources ◽

The Web

BACKGROUND With the rapidly accelerating dissemination of false medical information on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites offering questionable medical information, presented as reliable and actionable suggestions with possibly harmful effects, poses an additional requirement for potential solutions: they have to scale to the size of the problem. Machine learning is one such solution which, when properly deployed, can be an effective tool in fighting medical disinformation on the Web. OBJECTIVE We present a comprehensive framework for designing and curating machine learning training datasets for online medical information credibility assessment. We show how the annotation process should be constructed and what pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models for various areas of medical information wars. METHODS The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for the curation of a high-quality training dataset, which can then be augmented and rapidly extended by employing the human-in-the-loop paradigm in machine learning training. To circumvent the cold start problem of insufficient gold standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences to accelerate the training process and optimize the human resources involved in the annotation. RESULTS We collect over 10,000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7,000, employing 9 highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims of the efficacy of the presented method. CONCLUSIONS A set of very diverse incentives is driving the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to automatically establish the credibility of online medical information. This, however, requires a thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.
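The human-in-the-loop idea described above can be sketched as a re-ranking step that decides which sentences to send to annotators next. The abstract does not specify the re-ranking criterion, so the sketch assumes plain uncertainty sampling, a standard active-learning strategy, rather than the authors' pipeline. The function `rank_for_annotation`, the sentence ids, and the probabilities are all hypothetical.

```python
def rank_for_annotation(probabilities, budget):
    """Return ids of the `budget` sentences the model is least sure about.

    `probabilities` maps a sentence id to the model's predicted probability
    that the sentence is non-credible; values near 0.5 are most uncertain.
    """
    ranked = sorted(probabilities, key=lambda sid: abs(probabilities[sid] - 0.5))
    return ranked[:budget]


# Hypothetical model outputs: sentence id -> P(non-credible).
predicted = {"s1": 0.97, "s2": 0.52, "s3": 0.10, "s4": 0.41, "s5": 0.75}

to_annotate = rank_for_annotation(predicted, budget=2)
print(to_annotate)  # ['s2', 's4']
```

Spending the annotation budget on the least certain sentences is one way to limit the number of expert labels needed, which matters when annotators are certified medical professionals.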

Scalable One-Pass Self-Representation Learning for Hyperspectral Band Selection

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2019.2890848 ◽

2019 ◽

Vol 57 (7) ◽

pp. 4360-4374 ◽

Cited By ~ 8

Author(s):

Xiaohui Wei ◽

Wen Zhu ◽

Bo Liao ◽

Lijun Cai

Keyword(s):

Representation Learning ◽

Band Selection

Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion

Proceedings of the Web Conference 2021 ◽

10.1145/3442381.3450043 ◽

2021 ◽

Author(s):

Bo Wang ◽

Tao Shen ◽

Guodong Long ◽

Tianyi Zhou ◽

Ying Wang ◽

...

Keyword(s):

Representation Learning ◽

Knowledge Graph ◽

Text Representation

Attention-based Joint Representation Learning Network for Short text Classification

Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence ◽

10.1145/3404555.3404578 ◽

2020 ◽

Author(s):

Xinyue Liu ◽

Yexuan Tang

Keyword(s):

Text Classification ◽

Representation Learning ◽

Short Text ◽

Learning Network ◽

Joint Representation
