Using Sequence and Structure Information to Annotate Gene and Protein Function

2020 ◽  
Vol 118 (3) ◽  
pp. 44a
Author(s):  
Benjamin R. Litterer ◽  
Kejue Jia ◽  
Sayane Shome ◽  
Robert L. Jernigan
Author(s):  
Amelia Villegas-Morcillo ◽  
Stavros Makrodimitris ◽  
Roeland C H J van Ham ◽  
Angel M Gomez ◽  
Victoria Sanchez ◽  
...  

Abstract. Motivation: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require large amounts of labeled training data, which are not available for this task. However, a very large number of protein sequences without functional labels are available. Results: We applied an existing deep sequence model, pretrained in an unsupervised setting, to the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. It also partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not lead to performance improvement, hinting that 3D structure is also potentially learned during the unsupervised pretraining. Availability and implementation: Implementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function. Supplementary information: Supplementary data are available at Bioinformatics online.
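The hand-crafted baselines named in this abstract (one-hot encoding of amino acids and k-mer counts) can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the 20-letter alphabet ordering, and the choice to ignore non-standard residues are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in a fixed order (an assumed convention).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Return an L x 20 matrix; rows for non-standard residues stay all-zero."""
    matrix = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    return matrix


def kmer_counts(sequence, k=2):
    """Return a fixed-length vector (20**k entries) of k-mer counts.

    k-mers containing a non-standard residue are not in the vocabulary
    and are therefore dropped.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocabulary = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [counts.get(kmer, 0) for kmer in vocabulary]
```

The one-hot matrix grows with sequence length, while the k-mer vector has a fixed size regardless of length, which is one reason k-mer counts are convenient as input to simple classifiers.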


2018 ◽  
Author(s):  
Diego Gauto ◽  
Leandro Estrozi ◽  
Charles Schwieters ◽  
Gregory Effantin ◽  
Pavel Macek ◽  
...  

Atomic-resolution structure determination is the key requirement for understanding protein function. Cryo-EM and NMR spectroscopy both provide structural information, but currently cryo-EM does not routinely give access to atomic-level structural data, and, generally, NMR structure determination is restricted to small (<30 kDa) proteins. We introduce an integrated structure determination approach that simultaneously uses NMR and EM data to overcome the limits of each of these methods. The approach enabled determination of the high-resolution structure of the 468 kDa dodecameric aminopeptidase TET2 to a precision and accuracy below 1 Angstrom by combining secondary-structure information obtained from near-complete magic-angle-spinning NMR assignments of the 39 kDa subunits, distance restraints from backbone amides and specifically labelled methyl groups, and a 4.1 Angstrom resolution EM map. The resulting structure exceeds current standards of NMR and EM structure determination in terms of molecular weight and precision. Importantly, the approach is successful even in cases where only medium-resolution (up to 8 Angstrom) cryo-EM data are available, thus paving avenues for the structure determination of challenging biological assemblies.


2014 ◽  
Vol 1004-1005 ◽  
pp. 853-856
Author(s):  
Hai Xia Long ◽  
Shu Lei Wu ◽  
Yan Lv

Protein structure prediction is a challenging field strongly associated with the determination of protein function and evolution, which is crucial for biologists. Despite significant progress made in recent years, protein structure prediction remains one of the prime unsolved problems in computational biology. In this study, we have developed a method for protein structure prediction based on a 7-state HMM, which can reduce the number of states by using secondary structure information about proteins for each fold. QPSO, an efficient optimization algorithm, is used to train the profile HMM. Experimental results show that the proposed method is reasonable.


2021 ◽  
Author(s):  
Boqiao Lai ◽  
Jinbo Xu

Experimental protein function annotation does not scale with the fast-growing sequence databases. Only a tiny fraction (<0.1%) of protein sequences in UniProtKB has experimentally determined functional annotations. Computational methods may predict protein function in a high-throughput way, but their accuracy is not very satisfactory. Based upon recent breakthroughs in protein structure prediction and protein language models, we develop GAT-GO, a graph attention network (GAT) method that may substantially improve protein function prediction by leveraging predicted inter-residue contact graphs and protein sequence embedding. Our experimental results show that GAT-GO greatly outperforms the latest sequence- and structure-based deep learning methods. On the PDB-mmseqs test set, where the training and test proteins share <15% sequence identity, GAT-GO yields Fmax (maximum F-score) 0.508, 0.416, 0.501 and AUPRC (area under the precision-recall curve) 0.427, 0.253, 0.411 for the MFO, BPO, CCO ontology domains, respectively, much better than the homology-based method BLAST (Fmax 0.117, 0.121, 0.207 and AUPRC 0.120, 0.120, 0.163). On the PDB-cdhit test set, where the training and test proteins share higher sequence identity, GAT-GO obtains Fmax 0.637, 0.501, 0.542 and AUPRC 0.662, 0.384, 0.481 for the MFO, BPO, CCO ontology domains, respectively, significantly exceeding the just-published graph convolution method DeepFRI, which has Fmax 0.542, 0.425, 0.424 and AUPRC 0.313, 0.159, 0.193.
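The Fmax metric reported above can be sketched as follows. The abstract does not state the exact evaluation protocol, so this example assumes the protein-centric definition commonly used in function-prediction benchmarks: at each score threshold, precision is averaged over proteins with at least one predicted term, recall is averaged over all annotated proteins, and Fmax is the best resulting F-score. The function name and the input layout are hypothetical.

```python
def fmax(true_terms, predicted_scores, thresholds=None):
    """Protein-centric maximum F-score (assumed definition, see text).

    true_terms:       dict mapping protein -> set of true terms
    predicted_scores: dict mapping protein -> dict of term -> score in [0, 1]
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, truth in true_terms.items():
            if not truth:
                continue  # only annotated proteins are evaluated
            scores = predicted_scores.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            if predicted:
                # precision counts only proteins with a prediction at t
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(truth))
        if not precisions or not recalls:
            continue
        precision = sum(precisions) / len(precisions)
        recall = sum(recalls) / len(recalls)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Because the maximum is taken over thresholds, Fmax rewards a method for having some operating point with a good precision-recall balance, whereas AUPRC summarizes the whole curve; this is why the two metrics can rank methods differently.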


Author(s):  
Kenneth H. Downing ◽  
Hu Meisheng ◽  
Hans-Rudolf Went ◽  
Michael A. O'Keefe

With current advances in electron microscope design, high resolution electron microscopy has become routine, and point resolutions of better than 2Å have been obtained in images of many inorganic crystals. Although this resolution is sufficient to resolve interatomic spacings, interpretation generally requires comparison of experimental images with calculations. Since the images are two-dimensional representations of projections of the full three-dimensional structure, information is invariably lost in the overlapping images of atoms at various heights. The technique of electron crystallography, in which information from several views of a crystal is combined, has been developed to obtain three-dimensional information on proteins. The resolution in images of proteins is severely limited by effects of radiation damage. In principle, atomic-resolution, 3D reconstructions should be obtainable from specimens that are resistant to damage. The most serious problem would appear to be in obtaining high-resolution images from areas that are thin enough that dynamical scattering effects can be ignored.


Author(s):  
J. Gjønnes ◽  
N. Bøe ◽  
K. Gjønnes

Structure information of high precision can be extracted from intensity details in convergent beam patterns like the one reproduced in Fig. 1. From low order reflections for small unit cell crystals, bonding charges, ionicities and atomic parameters can be derived (Zuo, Spence and O’Keefe, 1988; Zuo, Spence and Høier, 1989; Gjønnes, Matsuhata and Taftø, 1989), but extension to larger unit cells may seem difficult. The disks must then be reduced in order to avoid overlap; calculations will become more complex and intensity features often less distinct. Several avenues may then be explored: increased computational effort in order to handle the necessary many-parameter dynamical calculations; use of zone axis intensities at symmetry positions within the CBED disks, as in Figure 2; measurement of integrated intensity across K-line segments. In the last case, measurable quantities which are also well defined from a theoretical viewpoint can be related to a two-beam-like expression for the intensity profile: With as an effective Fourier potential equated to a gap at the dispersion surface, this intensity can be integrated across the line, with kinematical and dynamical limits proportional to and at low and high thickness respectively (Blackman, 1939).


Author(s):  
Kjersti Gjønnes ◽  
Jon Gjønnes

Electron diffraction intensities can be obtained at large scattering angles (sinθ/λ ≥ 2.0), and thus structure information can be collected in regions of reciprocal space that are not accessible with other diffraction methods. LACBED intensities in this range can be utilized for determination of accurate temperature factors or for refinement of coordinates. Such high index reflections can usually be treated kinematically or as a perturbed two-beam case. Application to YBa2Cu3O7 shows that a least-squares refinement based on integrated intensities can determine temperature factors or coordinates. LACBED patterns taken in the (00l) systematic row show an easily recognisable pattern of narrow bands from reflections in the range 15 < l < 40 (figure 1). Integrated intensities obtained from measured intensity profiles after subtraction of inelastic background (figure 2) were used in the least-squares fit for determination of temperature factors and refinement of z-coordinates for the Ba and Cu atoms.


Author(s):  
G. Y. Fan ◽  
J. M. Cowley

It is well known that structure information on the specimen is not always faithfully transferred through the electron microscope. Firstly, the spatial frequency spectrum is modulated by the transfer function (TF) at the focal plane. Secondly, the spectrum suffers a high-frequency cut-off by the aperture (or, effectively, by damping terms such as chromatic aberration). While these effects do not essentially affect the imaging of crystal periodicity as long as the low order Bragg spots are inside the aperture (although the contrast may be reversed), they may change the appearance of images of amorphous materials completely. Because the spectrum of amorphous materials is continuous, modulating it emphasizes some components while weakening others. In particular, the cut-off of high frequency components, which contribute to the amorphous image just as strongly as low frequency components, can have a fundamental effect. This can be illustrated through computer simulation. Imaging a white-noise object with an electron microscope without TF limitation gives Fig. 1a, which is obtained by Fourier transformation of a constant amplitude combined with computer-generated random phases.
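The simulation described in this abstract can be sketched in one dimension. This is a simplified illustration and not the authors' procedure: it assumes a 1D signal instead of a 2D image, an ideal sharp aperture with no transfer-function modulation, and a direct (slow) inverse discrete Fourier transform. All function names are hypothetical.

```python
import cmath
import math
import random


def white_noise_spectrum(n, seed=0):
    """Spectrum with constant (unit) amplitude and random phases."""
    rng = random.Random(seed)
    return [cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n)]


def apply_aperture(spectrum, cutoff):
    """Zero every component whose signed frequency index exceeds the cutoff."""
    n = len(spectrum)
    filtered = []
    for k, value in enumerate(spectrum):
        freq = k if k <= n // 2 else k - n  # signed frequency index
        filtered.append(value if abs(freq) <= cutoff else 0j)
    return filtered


def inverse_dft(spectrum):
    """Direct O(n^2) inverse DFT with 1/n normalization."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
        for m in range(n)
    ]
```

Comparing `inverse_dft(spectrum)` with `inverse_dft(apply_aperture(spectrum, cutoff))` shows the point made in the text: since every frequency carries the same amplitude, removing the high-frequency components discards a share of the signal energy proportional to the number of components cut.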

