Deep-learning with synthetic data enables automated picking of cryo-EM particle images of biological macromolecules

Author(s):  
Ruijie Yao ◽  
Jiaqiang Qian ◽  
Qiang Huang

Abstract Motivation Single-particle cryo-electron microscopy (cryo-EM) has become a powerful technique for determining 3D structures of biological macromolecules at near-atomic resolution. However, this approach requires picking huge numbers of macromolecular particle images from thousands of low-contrast, highly noisy electron micrographs. Although machine-learning methods have been developed to remove this bottleneck, universal methods that can automatically pick the noisy cryo-EM particles of various macromolecules are still lacking. Results Here, we present PARSED (PARticle SEgmentation Detector), a deep-learning segmentation model that employs fully convolutional networks trained with synthetic data of known 3D structures. Without using any experimental information, PARSED could automatically segment the cryo-EM particles in a whole micrograph at a time, enabling faster particle picking than previous template/feature-matching and particle-classification methods. Applications to six large public cryo-EM datasets clearly validated its universal ability to pick macromolecular particles of various sizes. Thus, our deep-learning method could break the particle-picking bottleneck in single-particle analysis and thereby accelerate high-resolution structure determination by cryo-EM. Availability and implementation The PARSED package and user manual for noncommercial use are available as Supplementary Material (in the compressed file: parsed_v1.zip). Supplementary information Supplementary data are available at Bioinformatics online.
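
A segmentation-based picker such as the one described above still has to turn a per-pixel mask into particle coordinates. The following is a minimal Python sketch of one common way to do that, connected-component labelling followed by centroid extraction. It is an illustrative assumption, not the post-processing used by PARSED; the function name `pick_particles`, the `min_pixels` parameter, and the toy mask are hypothetical.

```python
from collections import deque


def pick_particles(mask, min_pixels=1):
    """Return (row, col) centroids of 4-connected foreground blobs in a binary mask.

    Hypothetical helper: `mask` is a list of rows holding 0/1 values, and blobs
    smaller than `min_pixels` are discarded as noise.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    centers = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) >= min_pixels:
                # The blob centroid serves as the picked particle coordinate.
                cy = sum(p[0] for p in blob) / len(blob)
                cx = sum(p[1] for p in blob) / len(blob)
                centers.append((cy, cx))
    return centers


# Toy segmentation mask with two square "particles".
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
print(pick_particles(toy_mask))
```

In practice a size filter such as `min_pixels` would be tuned to the expected particle diameter so that small false-positive regions are dropped.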

2018 ◽  
Vol 294 (5) ◽  
pp. 1602-1608 ◽  
Author(s):  
Xiunan Yi ◽  
Eric J. Verbeke ◽  
Yiran Chang ◽  
Daniel J. Dickinson ◽  
David W. Taylor

Cryo-electron microscopy (cryo-EM) has become an indispensable tool for structural studies of biological macromolecules. Two predominant methods are available for studying the architectures of multiprotein complexes: 1) single-particle analysis of purified samples and 2) tomography of whole cells or cell sections. The former can produce high-resolution structures but is limited to highly purified samples, whereas the latter can capture proteins in their native state but has a low signal-to-noise ratio and yields lower-resolution structures. Here, we present a simple, adaptable method combining microfluidic single-cell extraction with single-particle analysis by EM to characterize protein complexes from individual Caenorhabditis elegans embryos. Using this approach, we uncover 3D structures of ribosomes directly from single embryo extracts. Moreover, we investigate structural dynamics during development by counting the number of ribosomes per polysome in early and late embryos. This approach has significant potential applications for counting protein complexes and studying protein architectures from single cells in developmental, evolutionary, and disease contexts.


Crystals ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. 580
Author(s):  
Victor R.A. Dubach ◽  
Albert Guskov

X-ray crystallography and single-particle analysis cryogenic electron microscopy are essential techniques for uncovering the three-dimensional structures of biological macromolecules. Both techniques rely on the Fourier transform to calculate experimental maps. However, one of the crucial parameters, resolution, is rather broadly defined. Here, the methods to determine the resolution in X-ray crystallography and single-particle analysis are summarized. In X-ray crystallography, it is becoming increasingly common to include reflections previously discarded by traditionally used standards, allowing for the inclusion of incomplete and anisotropic reflections in the refinement process. In general, the resolution is the smallest lattice spacing given by Bragg’s law for a particular set of X-ray diffraction intensities; typically, however, the resolution is truncated by the user during data processing based on certain parameters and is later used during refinement. At which resolution to perform such a truncation is not always clear, which is very confusing for novices entering the structural biology field. Furthermore, it is argued that the effective resolution should also be reported, as it is a more descriptive measure accounting for anisotropy and incompleteness of the data. In single-particle cryo-EM, the situation is not much better, as multiple ways exist to determine the resolution, such as Fourier shell correlation, spectral signal-to-noise ratio and the Fourier neighbor correlation. The most widely accepted is the Fourier shell correlation using a threshold of 0.143 to define the resolution (so-called “gold-standard”), although it is still debated whether this is the correct threshold. Moreover, the resolution obtained from the Fourier shell correlation is a single estimate, whereas the resolution varies across the density map. In reality, the interpretability of the map is more important than the numerical value of the resolution.
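
The two resolution definitions mentioned above can be made concrete with a short Python sketch. It assumes first-order Bragg diffraction by default and an FSC curve that has already been computed from two half-maps; the function names `bragg_spacing` and `fsc_resolution` and all sample numbers are illustrative assumptions, not values from the paper.

```python
import math


def bragg_spacing(wavelength, theta_deg, order=1):
    """Lattice spacing d from Bragg's law: order * wavelength = 2 * d * sin(theta)."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


def fsc_resolution(spatial_freqs, fsc_values, threshold=0.143):
    """Resolution (1 / frequency) where the FSC curve first drops below `threshold`.

    `spatial_freqs` are increasing shell frequencies (for example in 1/angstrom)
    and `fsc_values` the correlation in each shell. Returns None if the curve
    never crosses the threshold from above.
    """
    for i in range(1, len(fsc_values)):
        upper, lower = fsc_values[i - 1], fsc_values[i]
        if upper >= threshold > lower:
            # Linear interpolation between the two shells bracketing the threshold.
            t = (upper - threshold) / (upper - lower)
            freq = spatial_freqs[i - 1] + t * (spatial_freqs[i] - spatial_freqs[i - 1])
            return 1.0 / freq
    return None


# Made-up FSC curve sampled at five resolution shells.
freqs = [0.05, 0.10, 0.15, 0.20, 0.25]
fsc = [1.0, 0.9, 0.5, 0.2, 0.05]
print("FSC=0.143 resolution:", fsc_resolution(freqs, fsc))
print("Bragg spacing:", bragg_spacing(wavelength=1.0, theta_deg=30.0))
```

Changing `threshold` (for instance to 0.5) shows how strongly the reported number depends on the chosen criterion, which is the point of the debate summarized in the abstract.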


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Takanori Nakane ◽  
Dari Kimanius ◽  
Erik Lindahl ◽  
Sjors HW Scheres

Macromolecular complexes that exhibit continuous forms of structural flexibility pose a challenge for many existing tools in cryo-EM single-particle analysis. We describe a new tool, called multi-body refinement, which models flexible complexes as a user-defined number of rigid bodies that move independently from each other. Using separate focused refinements with iteratively improved partial signal subtraction, the new tool generates improved reconstructions for each of the defined bodies in a fully automated manner. Moreover, using principal component analysis on the relative orientations of the bodies over all particle images in the data set, we generate movies that describe the most important motions in the data. Our results on two test cases, a cytoplasmic ribosome from Plasmodium falciparum, and the spliceosomal B-complex from yeast, illustrate how multi-body refinement can be useful to gain unique insights into the structure and dynamics of large and flexible macromolecular complexes.


2020 ◽  
Vol 21 (S21) ◽  
Author(s):  
Adil Al-Azzawi ◽  
Anes Ouadou ◽  
Ye Duan ◽  
Jianlin Cheng

Abstract Background Cryo-EM data generated by electron tomography (ET) contain images of individual protein particles in different orientations and tilt angles. Individual cryo-EM particles can be aligned to reconstruct a 3D density map of a protein structure. However, low contrast and high noise in particle images make it challenging to build 3D density maps at intermediate to high resolution (1–3 Å). To overcome this problem, we propose a fully automated cryo-EM 3D density map reconstruction approach based on deep learning particle picking. Results A perfect 2D particle mask is generated fully automatically for every single particle. The approach then uses a computer vision image alignment algorithm (image registration) to align the particle masks fully automatically. It calculates the difference of the particle image orientation angles to align the original particle image. Finally, it reconstructs a localized 3D density map between every two single-particle images that have the largest number of corresponding features. The localized 3D density maps are then averaged to reconstruct a final 3D density map. The constructed 3D density maps illustrate the potential to determine the structures of the molecules using a few samples of good particles. Also, using the localized particle samples (with no background) to generate the localized 3D density maps can improve the process of resolution evaluation in experimental cryo-EM maps. Tested on two widely used datasets, Auto3DCryoMap is able to reconstruct good 3D density maps using only a few thousand protein particle images, far fewer than the hundreds of thousands of particles required by existing methods. Conclusions We design a fully automated approach for cryo-EM 3D density map reconstruction (Auto3DCryoMap). Instead of increasing the signal-to-noise ratio by using 2D class averaging, our approach uses 2D particle masks to produce locally aligned particle images. Auto3DCryoMap is able to accurately align structural particle shapes. Also, it is able to construct a decent 3D density map from only a few thousand aligned particle images, while the existing tools require hundreds of thousands of particle images. Finally, by using the pre-processed particle images, Auto3DCryoMap reconstructs a better 3D density map than when using the original particle images.


2018 ◽  
Author(s):  
M. Kazemi ◽  
C. O. S. Sorzano ◽  
A. Des Georges ◽  
J. M. Carazo ◽  
J. Vargas

Abstract Cryo-electron microscopy using single particle analysis requires the computational averaging of thousands of projection images captured from identical macromolecules. However, macromolecules usually present some degree of flexibility, showing different conformations. Computational approaches are then required to classify heterogeneous single particle images into homogeneous sets corresponding to different structural states. Nonetheless, the attainable resolution of reconstructions obtained from these smaller homogeneous sets is sometimes compromised because of a reduced number of particles or a lack of images at certain macromolecular orientations. In these situations, the current solution to improve map resolution is to return to the electron microscope and collect more data. In this work, we present a fast approach to partially overcome this limitation for heterogeneous data sets. Our method is based on deforming and then moving particles between different conformations using an optical flow approach. Particles are then merged into a unique conformation, yielding reconstructions with improved resolution, contrast and signal-to-noise ratio, and thus partially circumventing many issues that hinder high-quality reconstructions from small data sets. We present experimental results that show clear improvements in the quality of the obtained 3D maps; however, there are also limits to this approach, which we discuss in the manuscript.


Author(s):  
Ali Punjani ◽  
Haowei Zhang ◽  
David J. Fleet

Abstract Single particle cryo-EM is a powerful method for studying proteins and other biological macromolecules. Many of these molecules comprise regions with varying structural properties including disorder, flexibility, and partial occupancy. These traits make computational 3D reconstruction from 2D images challenging. Detergent micelles and lipid nanodiscs, used to keep membrane proteins in solution, are common examples of locally disordered structures that can negatively affect existing iterative refinement algorithms which assume rigidity (or spatial uniformity). We introduce a cross-validation approach to derive non-uniform refinement, an algorithm that automatically regularizes 3D density maps during iterative refinement to account for spatial variability, yielding dramatically improved resolution and 3D map quality. We find that in common iterative refinement methods, regularization using spatially uniform filtering operations can simultaneously over- and under-regularize local regions of a 3D map. In contrast, non-uniform refinement removes noise in disordered regions while retaining signal useful for aligning particle images. Our results include state-of-the-art resolution 3D reconstructions of multiple membrane proteins with molecular weight as low as 90 kDa. These results demonstrate that higher resolutions and improved 3D density map quality can be achieved even for small membrane proteins, an important use case for single particle cryo-EM, both in structural biology and drug discovery. Non-uniform refinement is implemented in the cryoSPARC software package and has already been used successfully in several notable structural studies.


2020 ◽  
Vol 36 (20) ◽  
pp. 5045-5053
Author(s):  
Moritz Hess ◽  
Maren Hackenberg ◽  
Harald Binder

Abstract Motivation Following many successful applications to image data, deep learning is now also increasingly considered for omics data. In particular, generative deep learning not only provides competitive prediction performance, but also allows for uncovering structure by generating synthetic samples. However, exploration and visualization are not as straightforward as with image applications. Results We demonstrate how log-linear models, fitted to the generated synthetic data, can be used to extract patterns from omics data that were learned by deep generative techniques. Specifically, interactions between latent representations learned by the approaches and generated synthetic data are used to determine sets of joint patterns. Distances of patterns with respect to the distribution of latent representations are then visualized in low-dimensional coordinate systems, e.g. for monitoring training progress. This is illustrated with simulated data and subsequently with cortical single-cell gene expression data. Using different kinds of deep generative techniques, specifically variational autoencoders and deep Boltzmann machines, the proposed approach highlights how the techniques uncover underlying structure. It facilitates the real-world use of such generative deep learning techniques to gain biological insights from omics data. Availability and implementation The code for the approach as well as an accompanying Jupyter notebook, which illustrates the application of our approach, is available via the GitHub repository: https://github.com/ssehztirom/Exploring-generative-deep-learning-for-omics-data-by-using-log-linear-models. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Xiunan Yi ◽  
Eric J. Verbeke ◽  
Yiran Chang ◽  
Daniel J. Dickinson ◽  
David W. Taylor

Abstract Cryo-electron microscopy has become an indispensable tool for structural studies of biological macromolecules. There are two predominant methods for studying the architectures of multi-protein complexes: (1) single particle analysis of purified samples and (2) tomography of whole cells or cell sections. The former can produce high-resolution structures but is limited to highly purified samples, while the latter can capture proteins in their native state but is hindered by a low signal-to-noise ratio and results in lower-resolution structures. Here, we present a method combining microfluidic single cell extraction with single particle analysis by electron microscopy to characterize protein complexes from individual C. elegans embryos. Using this approach, we uncover three-dimensional structures of ribosomes directly from single embryo extracts. In addition, we investigate structural dynamics during development by counting the number of ribosomes per polysome in early and late embryos. This approach has significant potential applications for counting protein complexes and studying protein architectures from single cells in developmental, evolutionary and disease contexts.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Blesson George ◽  
Anshul Assaiya ◽  
Robin J. Roy ◽  
Ajit Kembhavi ◽  
Radha Chauhan ◽  
...  

Abstract Particle identification and selection, which is a prerequisite for high-resolution structure determination of biological macromolecules via single-particle cryo-electron microscopy, poses a major bottleneck for automating the steps of structure determination. Here, we present a generalized deep learning tool, CASSPER, for the automated detection and isolation of protein particles in transmission microscope images. This deep learning tool uses Semantic Segmentation and a collection of visually prepared training samples to capture the differences in the transmission intensities of protein, ice, carbon, and other impurities found in the micrograph. CASSPER is a semantic segmentation based method that does pixel-level classification and completely eliminates the need for manual particle picking. Integration of Contrast Limited Adaptive Histogram Equalization (CLAHE) in CASSPER enables high-fidelity particle detection in micrographs with variable ice thickness and contrast. A generalized CASSPER model works with high efficiency on unseen datasets and can potentially pick particles on-the-fly, enabling data processing automation.
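
The core of CLAHE, mentioned above, is histogram equalization with a clipped histogram applied to each image tile. The sketch below shows that per-tile step only, in plain Python; full CLAHE additionally interpolates the mappings of neighbouring tiles, which is omitted here. It is not CASSPER's implementation, and the function name `clipped_equalize`, its parameters, and the toy tile are hypothetical.

```python
def clipped_equalize(tile, clip_limit, levels=256):
    """Equalize one tile of integer gray levels using a clipped histogram.

    Hypothetical helper: `tile` is a list of rows with values in [0, levels),
    and `clip_limit` is the maximum count allowed per histogram bin.
    """
    pixels = [p for row in tile for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Clip each bin and spread the clipped excess evenly over all bins,
    # which limits how strongly any single gray level is stretched.
    excess = sum(max(h - clip_limit, 0) for h in hist)
    hist = [min(h, clip_limit) + excess / levels for h in hist]
    # The cumulative distribution of the clipped histogram is the gray-level map.
    total = float(len(pixels))
    cdf, running = [], 0.0
    for h in hist:
        running += h
        cdf.append(running / total)
    return [[round(cdf[p] * (levels - 1)) for p in row] for row in tile]


# Toy 2 x 4 tile with four gray levels, dominated by level 0.
toy_tile = [[0, 0, 0, 1], [0, 0, 2, 3]]
print(clipped_equalize(toy_tile, clip_limit=2, levels=4))
print(clipped_equalize(toy_tile, clip_limit=8, levels=4))  # limit too high to clip
```

Lowering `clip_limit` keeps the dominant background level from consuming most of the output range, which is why the technique is useful on micrographs whose contrast varies with ice thickness.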

