Modeling the Three-Dimensional Chromatin Structure from Hi-C Data with Transfer Learning

2021 ◽  
Author(s):  
Tristan Meynier Georges ◽  
Maria Anna Rapsomaniki

Recent studies have revealed the importance of three-dimensional (3D) chromatin structure in the regulation of vital biological processes. Unlike for protein folding, no experimental procedure exists that can directly determine ground-truth 3D chromatin coordinates. Instead, chromatin conformation is studied indirectly using high-throughput chromosome conformation capture (Hi-C) methods that quantify the frequency of all pairwise chromatin contacts. Computational methods that infer the 3D chromatin structure from Hi-C data are thus unsupervised, and limited by the assumption that contact frequency determines Euclidean distance. Inspired by recent developments in deep learning, in this work we explore the idea of transfer learning to address the crucial lack of ground-truth data for 3D chromatin structure inference. We present a novel method, Transfer learning Encoder for CHromatin 3D structure prediction (TECH-3D), that combines transfer learning with creative data generation procedures to reconstruct chromatin structure. Our work outperforms previous deep learning attempts at chromatin structure inference and achieves results similar to those of state-of-the-art algorithms on many tests, without making any assumptions about the relationship between contact frequencies and Euclidean distances. Above all, TECH-3D presents a highly creative and novel approach, paving the way for future deep learning models.
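The assumption that this abstract says conventional methods rely on is commonly modeled as an inverse power law between contact frequency and distance, d = 1 / f^alpha. The sketch below illustrates only that conventional assumption; it is not TECH-3D, which the abstract says avoids it. The function name, the exponent `alpha`, and the toy matrix are hypothetical.

```python
def contacts_to_distances(contacts, alpha=1.0):
    """Convert a symmetric contact-frequency matrix to "wish" distances.

    Uses the inverse power law d = 1 / f**alpha (alpha is an assumed,
    tunable exponent). Pairs with zero contacts carry no distance
    information and are returned as None.
    """
    n = len(contacts)
    distances = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                distances[i][j] = 0.0
            elif contacts[i][j] > 0:
                distances[i][j] = 1.0 / (contacts[i][j] ** alpha)
    return distances


# Toy 3-bin contact matrix (made-up counts, for illustration only).
toy_contacts = [
    [0, 4, 1],
    [4, 0, 2],
    [1, 2, 0],
]
toy_distances = contacts_to_distances(toy_contacts, alpha=1.0)
```

Under this assumption, frequently contacting bins are placed close together; a downstream optimizer would then search for 3D coordinates whose pairwise distances match these targets.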

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kh Tohidul Islam ◽  
Sudanthi Wijewickrema ◽  
Stephen O’Leary

Image registration is a fundamental task in image analysis in which the transform that moves the coordinate system of one image to another is calculated. Registration of multi-modal medical images has important implications for clinical diagnosis, treatment planning, and image-guided surgery as it provides the means of bringing together complementary information obtained from different image modalities. However, since different image modalities have different properties due to their different acquisition methods, it remains a challenging task to find a fast and accurate match between multi-modal images. Furthermore, due to reasons such as ethical issues and the need for human expert intervention, it is difficult to collect a large database of labelled multi-modal medical images. In addition, manual input is required to determine the fixed and moving images as input to registration algorithms. In this paper, we address these issues and introduce a registration framework that (1) creates synthetic data to augment existing datasets, (2) generates ground truth data to be used in the training and testing of algorithms, (3) registers (using a combination of deep learning and conventional machine learning methods) multi-modal images in an accurate and fast manner, and (4) automatically classifies the image modality so that the process of registration can be fully automated. We validate the performance of the proposed framework on CT and MRI images of the head obtained from a publicly available registration database.


Algorithms ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 212
Author(s):  
Youssef Skandarani ◽  
Pierre-Marc Jodoin ◽  
Alain Lalande

Deep learning methods are the de facto solutions to a multitude of medical image analysis tasks. Cardiac MRI segmentation is one such application, which, like many others, requires a large amount of annotated data so that a trained network can generalize well. Unfortunately, having a large number of images manually curated by medical experts is both slow and very expensive. In this paper, we set out to explore whether expert knowledge is a strict requirement for the creation of annotated data sets on which machine learning can successfully be trained. To do so, we gauged the performance of three segmentation models, namely U-Net, Attention U-Net, and ENet, trained with different loss functions on expert and non-expert ground truth for cardiac cine–MRI segmentation. Evaluation was done with classic segmentation metrics (Dice index and Hausdorff distance) as well as clinical measurements, such as the ventricular ejection fractions and the myocardial mass. The results reveal that the generalization performance of a segmentation neural network trained on non-expert ground truth data is, for all practical purposes, as good as that of one trained on expert ground truth data, particularly when the non-expert receives a decent level of training, highlighting an opportunity for the efficient and cost-effective creation of annotations for cardiac data sets.
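Two of the evaluation quantities named above have simple closed forms: the Dice index, 2|A ∩ B| / (|A| + |B|), and the ejection fraction, (EDV - ESV) / EDV. A minimal sketch follows, assuming flattened binary masks and volumes in millilitres; the function names and toy values are hypothetical and are not taken from the paper.

```python
def dice_index(pred_mask, true_mask):
    """Dice overlap between two flat binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def ejection_fraction(end_diastolic_ml, end_systolic_ml):
    """Ejection fraction in percent from end-diastolic/end-systolic volumes."""
    return 100.0 * (end_diastolic_ml - end_systolic_ml) / end_diastolic_ml


# Toy example: 4-pixel masks and made-up ventricular volumes.
toy_dice = dice_index([1, 1, 0, 0], [1, 0, 0, 0])
toy_ef = ejection_fraction(120.0, 50.0)
```

The Hausdorff distance and myocardial mass are omitted here because they need contour geometry and tissue density that the abstract does not specify.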


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Benjamin Zahneisen ◽  
Matus Straka ◽  
Shalini Bammer ◽  
Greg Albers ◽  
Roland Bammer

Introduction: Ruling out hemorrhage (stroke or traumatic) prior to administration of thrombolytics is critical for Code Strokes. Triage software that identifies hemorrhages on head CTs and alerts radiologists would help to streamline patient care and increase diagnostic confidence and patient safety. ML approach: We trained a deep convolutional network with a hybrid 3D/2D architecture on unenhanced head CTs of 805 patients. Our training dataset comprised 348 positive hemorrhage cases (IPH=245, SAH=67, Sub/Epi-dural=70, IVH=83) (128 female) and 457 normal controls (217 female). Lesion outlines were drawn by experts and stored as binary masks that were used as ground truth data during the training phase (random 80/20 train/test split). Diagnostic sensitivity and specificity were defined on a per-patient study level, i.e. a single, binary decision for presence/absence of a hemorrhage on a patient’s CT scan. Final validation was performed in 380 patients (167 positive). Tool: The hemorrhage detection module was prototyped in Python/Keras. It runs on a local Linux server (4 CPUs, no GPUs) and is embedded in a larger image processing platform dedicated to stroke. Results: Processing time for a standard whole brain CT study (3-5mm slices) was around 2min. Upon completion, an instant notification (by email and/or mobile app) was sent to users to alert them about the suspected presence of a hemorrhage. Relative to gold standard reads by a neuroradiologist, the algorithm’s sensitivity and specificity were 90.4% and 92.5%, respectively (95% CI: 85%-94% for both). Detection of acute intracranial hemorrhage can be automated by deploying deep learning. It yielded very high sensitivity/specificity when compared to gold standard reads by a neuroradiologist. Volumes as small as 0.5mL could be detected reliably in the test dataset. The software can be deployed in busy practices to prioritize worklists and alert health care professionals to speed up therapeutic decision processes and interventions.
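Per-patient sensitivity and specificity, as defined above, reduce to counting one binary decision per study against the reference read. The sketch below is a minimal illustration with hypothetical names and made-up toy labels; it does not reproduce the study's data and assumes both positive and negative cases are present.

```python
def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) for per-patient binary decisions.

    predicted, reference: equal-length sequences of 0/1, one entry per
    patient study (1 = hemorrhage present). Assumes at least one positive
    and one negative reference case.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    return tp / (tp + fn), tn / (tn + fp)


# Toy reads for eight patients (illustrative only).
toy_reference = [1, 1, 1, 0, 0, 0, 0, 1]
toy_predicted = [1, 1, 0, 0, 0, 1, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_predicted, toy_reference)
```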


Sequencing ◽  
2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Amitava Moulick ◽  
Debashis Mukhopadhyay ◽  
Shonima Talapatra ◽  
Nirmalya Ghoshal ◽  
Sarmistha Sen Raychaudhuri

Plantago ovata Forsk is a medicinally important plant. Metallothioneins are cysteine-rich proteins involved in the detoxification of heavy metals. Molecular cloning and modeling of MT from P. ovata has not yet been reported. The present investigation describes the isolation, structure prediction, characterization, and expression under copper stress of type 2 metallothionein (MT2) from this species. The gene encoding the protein comprises three exons and two introns. The deduced protein sequence contains 81 amino acids with a calculated molecular weight of about 8.1 kDa and a theoretical pI value of 4.77. The transcript level of this protein increased in response to copper stress. Homology modeling was used to construct a three-dimensional structure of P. ovata MT2. The 3D structure model of P. ovata MT2 will provide a significant clue for further structural and functional study of this protein.


2019 ◽  
Author(s):  
Oluwatosin Oluwadare

Sixteen years after the sequencing of the human genome, the Human Genome Project (HGP), and 17 years after the introduction of Chromosome Conformation Capture (3C) technologies, three-dimensional (3-D) inference and big data remain problematic in the field of genomics, and specifically, in the field of 3C data analysis. Three-dimensional inference involves the reconstruction of a genome's 3D structure or, in some cases, an ensemble of structures from contact interaction frequencies extracted from a variant of the 3C technology called the Hi-C technology. Further questions remain about chromosome topology and structure; enhancer-promoter interactions; location of genes, gene clusters, and transcription factors; the relationship between gene expression and epigenetics; and chromosome visualization at a higher scale, among others. In this dissertation, four major contributions are described: first, 3DMax, a tool for chromosome and genome 3-D structure prediction from Hi-C data using an optimization algorithm; second, GSDB, a comprehensive and common repository that contains 3D structures for Hi-C datasets from novel 3D structure reconstruction tools developed over the years; third, ClusterTAD, a method for topologically associated domain (TAD) extraction from Hi-C data using an unsupervised learning algorithm. Finally, we introduce a tool called GenomeFlow, a comprehensive graphical tool to facilitate the entire process of modeling and analysis of 3D genome organization. It is worth noting that GenomeFlow and GSDB are the first of their kind in the 3D chromosome and genome research field. All the methods are available as software tools that are freely available to the scientific community.


2019 ◽  
Author(s):  
Aminur Rab Ratul ◽  
Marcel Turcotte ◽  
M. Hamed Mozaffari ◽  
WonSook Lee

Protein secondary structure is crucial for creating an information bridge between the primary structure and the tertiary (3D) structure. Precise prediction of 8-state protein secondary structure (PSS) is widely used in the structural and functional analysis of proteins in bioinformatics. Recently, deep learning techniques have been applied in this research area and have raised the Q8 accuracy remarkably. Nevertheless, from a theoretical standpoint, there is still a lot of room for improvement, specifically in 8-state (Q8) protein secondary structure prediction. In this paper, we present two deep learning architectures, namely 1D-Inception and BD-LSTM, to improve the performance of 8-class PSS prediction. The input of these two architectures is a carefully constructed feature matrix built from the sequence features and profile features of the proteins. First, 1D-Inception is a deep convolutional neural network-based approach that was inspired by the InceptionV3 model and contains three inception modules. Second, BD-LSTM is a recurrent neural network model that includes bidirectional LSTM layers. Our proposed 1D-Inception method achieved 76.65%, 71.18%, 76.86%, and 74.07% Q8 accuracy on the benchmark CullPdb6133, CB513, CASP10, and CASP11 datasets, respectively. Moreover, BD-LSTM achieved 74.71%, 69.49%, 74.07%, and 72.37% Q8 accuracy when evaluated on the CullPdb6133, CB513, CASP10, and CASP11 datasets, respectively. Both architectures enable the efficient processing of local and global interdependencies between amino acids, which is beneficial for making an accurate prediction of each class. To the best of our knowledge, the experimental results of the 1D-Inception model demonstrate that it outperformed all the state-of-the-art methods on the benchmark CullPdb6133, CB513, and CASP10 datasets.
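Q8 accuracy, the metric reported above, is the fraction of residues whose predicted 8-state label matches the reference label. The sketch below assumes labels are given as strings over the DSSP-style alphabet H, G, I, E, B, T, S, L (with L standing for coil, which is one of several common spellings); the function name and toy sequences are hypothetical.

```python
Q8_STATES = set("HGIEBTSL")  # assumed 8-state alphabet; 'L' denotes coil here


def q8_accuracy(predicted, reference):
    """Per-residue accuracy between two equal-length 8-state label strings."""
    if len(predicted) != len(reference):
        raise ValueError("label strings must have the same length")
    if not reference:
        raise ValueError("label strings must be non-empty")
    unknown = (set(predicted) | set(reference)) - Q8_STATES
    if unknown:
        raise ValueError(f"unexpected state labels: {sorted(unknown)}")
    matches = sum(1 for p, r in zip(predicted, reference) if p == r)
    return matches / len(reference)


# Toy 8-residue example (illustrative only).
toy_q8 = q8_accuracy("HHHEELTT", "HHHEELSS")
```

Benchmark figures such as those in the abstract are typically pooled over all residues in a dataset; whether the authors pool per residue or average per protein is not stated here.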


Author(s):  
Johannes Thomsen ◽  
Magnus B. Sletfjerding ◽  
Stefano Stella ◽  
Bijoya Paul ◽  
Simon Bo Jensen ◽  
...  

Single molecule Förster Resonance energy transfer (smFRET) is a mature and adaptable method for studying the structure of biomolecules and integrating their dynamics into structural biology. The development of high throughput methodologies and the growth of commercial instrumentation have outpaced the development of rapid, standardized, and fully automated methodologies to objectively analyze the wealth of produced data. Here we present DeepFRET, an automated standalone solution based on deep learning, where the only crucial human intervention in transiting from raw microscope images to histograms of biomolecule behavior is a user-adjustable quality threshold. Integrating all standard features of smFRET analysis, DeepFRET will consequently output common kinetic information metrics for biomolecules. We validated the utility of DeepFRET by performing quantitative analysis on simulated ground truth data and real smFRET data. The accuracy of classification by DeepFRET outperformed human operators and the currently common hard thresholds, and reached >95% precision accuracy while requiring only a fraction of the time (<1% as compared to human operators) on ground truth data. Its flawless and rapid operation on real data demonstrates its wide applicability. This level of classification was achieved without any preprocessing or parameter setting by human operators, demonstrating DeepFRET’s capacity to objectively quantify biomolecular dynamics. The provided standalone executable, based on open source code, capitalises on the widespread adaptation of machine learning and may contribute to the effort of benchmarking smFRET for structural biology insights.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7893 ◽  
Author(s):  
Simone Macrì ◽  
Romain J.G. Clément ◽  
Chiara Spinello ◽  
Maurizio Porfiri

Zebrafish (Danio rerio) have recently emerged as a valuable laboratory species in the field of behavioral pharmacology, where they afford rapid and precise high-throughput drug screening. Although the behavioral repertoire of this species manifests in three dimensions (3D), most of the efforts in behavioral pharmacology rely on two-dimensional (2D) projections acquired from a single overhead or front camera. We recently showed that, compared to a 3D scoring approach, 2D analyses could lead to inaccurate claims regarding individual and social behavior of drug-free experimental subjects. Here, we examined whether this conclusion extended to the field of behavioral pharmacology by phenotyping adult zebrafish, acutely exposed to citalopram (30, 50, and 100 mg/L) or ethanol (0.25%, 0.50%, and 1.00%), in the novel tank diving test over a 6-min experimental session. We observed that both compounds modulated the time course of general locomotion and anxiety-related profiles, the latter being represented by specific behaviors (erratic movements and freezing) and avoidance of anxiety-eliciting areas of the test tank (top half and distance from the side walls). We observed that 2D projections of 3D trajectories (ground truth data) may introduce a source of unwanted variation in zebrafish behavioral phenotyping. Predictably, both 2D views underestimate absolute levels of general locomotion. Additionally, while data obtained from a camera positioned on top of the experimental tank are similar to those obtained from a 3D reconstruction, 2D front view data yield false negative findings.
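The statement that both 2D views underestimate locomotion follows from geometry: dropping one coordinate can only shorten each step of a trajectory. A minimal sketch follows, assuming z is the vertical axis, the top camera sees (x, y), and the front camera sees (x, z); the axis convention, function name, and toy trajectory are hypothetical and not taken from the study.

```python
import math


def path_length(points, axes=(0, 1, 2)):
    """Total distance travelled along a trajectory, using only `axes`.

    points: sequence of (x, y, z) positions sampled over time.
    axes:   coordinate indices to keep, e.g. (0, 1) for a top view.
    """
    total = 0.0
    for start, end in zip(points, points[1:]):
        total += math.sqrt(sum((end[k] - start[k]) ** 2 for k in axes))
    return total


# Toy trajectory: a horizontal swim followed by a vertical dive.
toy_track = [(0, 0, 0), (3, 4, 0), (3, 4, 12)]
length_3d = path_length(toy_track)                   # full 3D distance
length_top = path_length(toy_track, axes=(0, 1))     # overhead camera
length_front = path_length(toy_track, axes=(0, 2))   # front camera
```

In this toy case the overhead view misses the dive entirely and the front view shortens the horizontal swim, so each projection loses different movement.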


2020 ◽  
Author(s):  
Aashish Jain ◽  
Genki Terashi ◽  
Yuki Kagaya ◽  
Sai Raghavendra Maddhuri Venkata Subramaniya ◽  
Charles Christoffer ◽  
...  

Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. The model is trained in a multi-task fashion to also predict backbone and orientation angles, further improving the inter-residue distance prediction. We show that AttentiveDist outperforms the top methods for contact prediction in the CASP13 structure prediction competition. To aid in structure modeling we also developed two new deep learning-based sidechain center distance and peptide-bond nitrogen-oxygen distance prediction models. Together these led to a 12% increase in TM-score from the best server method in CASP13 for structure prediction.

