When Homologous Sequences Meet Structural Decoys: Accurate Contact Prediction by tFold in CASP14

Author(s):  
Tao Shen ◽  
Jiaxiang Wu ◽  
Haidong Lan ◽  
Liangzhen Zheng ◽  
Jianguo Pei ◽  
...  
2019 ◽  
Vol 36 (7) ◽  
pp. 2105-2112 ◽  
Author(s):  
Chengxin Zhang ◽  
Wei Zheng ◽  
S M Mortuza ◽  
Yang Li ◽  
Yang Zhang

Abstract. Motivation: The success of genome sequencing techniques has resulted in a rapid explosion of protein sequences. Collections of multiple homologous sequences can provide critical information to the modeling of structure and function of unknown proteins. There is, however, no standard and efficient pipeline available for sensitive multiple sequence alignment (MSA) collection. This is particularly challenging when large whole-genome and metagenome databases are involved. Results: We developed DeepMSA, a new open-source method for sensitive MSA construction, which has homologous sequences and alignments created from multi-sources of whole-genome and metagenome databases through complementary hidden Markov model algorithms. The practical usefulness of the pipeline was examined in three large-scale benchmark experiments based on 614 non-redundant proteins. First, DeepMSA was utilized to generate MSAs for residue-level contact prediction by six coevolution and deep learning-based programs, which resulted in an accuracy increase in long-range contacts by up to 24.4% compared to the default programs. Next, multiple threading programs were run for homologous structure identification, where the average TM-score of the template alignments increased by over 7.5% with the use of the new DeepMSA profiles. Finally, DeepMSA was used for secondary structure prediction and resulted in statistically significant improvements in the Q3 accuracy. It is noted that all these improvements were achieved without re-training the parameters and neural-network models, demonstrating the robustness and general usefulness of DeepMSA in protein structural bioinformatics applications, especially for targets without homologous templates in the PDB library. Availability and implementation: https://zhanglab.ccmb.med.umich.edu/DeepMSA/. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Badri Adhikari

Abstract. Background: Exciting new opportunities have arisen to solve the protein contact prediction problem from the progress in neural networks and the availability of a large number of homologous sequences through high-throughput sequencing. In this work, we study how deep convolutional neural network methods (ConvNets) may be best designed and developed to solve this long-standing problem. Method: With publicly available datasets, we designed and trained various ConvNet architectures. We tested several recent deep learning techniques including wide residual networks, dropouts, and dilated convolutions. We studied the improvements in the precision of medium-range and long-range contacts, and compared the performance of our best architectures with the ones used in existing state-of-the-art methods. Results: The proposed ConvNet architectures predict contacts with significantly more precision than the architectures used in several state-of-the-art methods. When trained using the DeepCov dataset consisting of 3,456 proteins and tested on PSICOV dataset of 150 proteins, our architectures achieve up to 15% higher precision when L/2 long-range contacts are evaluated. Similarly, when trained using the DNCON2 dataset consisting of 1,426 proteins and tested on 84 protein domains in the CASP12 dataset, our single network achieves 4.8% higher precision than the ensembled DNCON2 method when top L long-range contacts are evaluated. DEEPCON will be made publicly available at https://github.com/badriadhikari/DEEPCON/.
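The "top L/2 long-range" precision quoted in this abstract can be illustrated with a minimal sketch. It is not code from DEEPCON. The function name, the `frac` parameter, and the cutoff of 24 residues for "long-range" (the usual CASP convention) are assumptions made here for illustration.

```python
import numpy as np

def top_long_range_precision(pred, true_contacts, frac=0.5, min_sep=24):
    """Precision of the top frac*L predicted long-range contacts.

    pred          : (L, L) array of predicted contact probabilities
    true_contacts : (L, L) boolean array of native contacts
    frac          : 0.5 for "top L/2", 1.0 for "top L" (hypothetical parameter)
    min_sep       : minimum sequence separation counted as long-range (assumed 24)
    """
    L = pred.shape[0]
    # Upper-triangle residue pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(L, k=min_sep)
    k = max(1, int(L * frac))
    # Indices of the k highest-scoring long-range pairs.
    order = np.argsort(-pred[i, j], kind="stable")[:k]
    # Fraction of those pairs that are true contacts.
    return float(np.asarray(true_contacts)[i[order], j[order]].mean())
```

Calling it with `frac=0.5` and `frac=1.0` gives the two evaluation settings mentioned in the abstract.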


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Wendy M. Billings ◽  
Connor J. Morris ◽  
Dennis Della Corte

Abstract. The prediction of amino acid contacts from protein sequence is an important problem, as protein contacts are a vital step towards the prediction of folded protein structures. We propose that a powerful concept from deep learning, called ensembling, can increase the accuracy of protein contact predictions by combining the outputs of different neural network models. We show that ensembling the predictions made by different groups at the recent Critical Assessment of Protein Structure Prediction (CASP13) outperforms all individual groups. Further, we show that contacts derived from the distance predictions of three additional deep neural networks (AlphaFold, trRosetta, and ProSPr) can be substantially improved by ensembling all three networks. We also show that ensembling these recent deep neural networks with the best CASP13 group creates a superior contact prediction tool. Finally, we demonstrate that two ensembled networks can successfully differentiate between the folds of two highly homologous sequences. In order to build further on these findings, we propose the creation of a better protein contact benchmark set and additional open-source contact prediction methods.
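As a rough illustration of ensembling, several contact probability maps for the same protein can be combined by a (weighted) average. Averaging is an assumption made for this sketch and is not necessarily the combination rule used by the authors. The function name `ensemble_contacts` and its `weights` parameter are hypothetical.

```python
import numpy as np

def ensemble_contacts(maps, weights=None):
    """Combine contact probability maps by (weighted) averaging.

    maps    : sequence of (L, L) arrays, one per model, same protein
    weights : optional per-model weights (hypothetical parameter)
    """
    stacked = np.stack([np.asarray(m, dtype=float) for m in maps])
    avg = np.average(stacked, axis=0, weights=weights)
    # Symmetrize, since contact between i and j does not depend on order.
    return (avg + avg.T) / 2.0
```

Passing per-model `weights` would let a stronger model count for more. With equal weights, each entry of the result is the mean of the corresponding entries in the input maps.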


Author(s):  
Kun Lee ◽  
Jingyi Si ◽  
Ricai Han ◽  
Wei Zhang ◽  
Bingbing Tan ◽  
...  

There is more support for the view that human papillomavirus (HPV) infection might be an etiological factor in the development of cervical cancer when the association of persistent condylomata is considered. Biopsies from 318 cases with squamous cell carcinoma of uterine cervix, 48 with cervical and vulvar condylomata, 14 with cervical intraepithelial neoplasia (CIN), 34 with chronic cervicitis and 24 normal cervical epithelium were collected from 5 geographic regions of China with different cervical cancer mortalities. All specimens were prepared for Dot blot, Southern blot and in situ DNA-DNA hybridizations by using HPV-11, 16, 18 DNA labelled with 32P and 3H as probes to detect viral homologous sequences in samples. Among them, 32 cases with cervical cancer, 27 with condyloma and 10 normal cervical epitheliums were randomly chosen for comparative EM observation. The results showed that: 1), 192 out of 318 (60.4%) cases of cervical cancer were positive for HPV-16 DNA probe (Table I)


Genetics ◽  
1996 ◽  
Vol 144 (1) ◽  
pp. 317-328 ◽  
Author(s):  
Sheri P Kernodle ◽  
John G Scandalios

Abstract Two highly similar cytosolic Cu/Zn Sod (Sod4 and Sod4A) genes have been isolated from maize. Sod4A contains eight exons and seven introns. The Sod4 partial sequence contains five introns. The introns in both genes are located in the same position and have highly homologous sequences in several regions. The largest intron (>1200 bp) interrupts the 5′ leader sequence. The presence of different regulatory motifs in the promoter region of each gene may indicate distinct responses to various conditions. Zymogram and RNA blot analyses show that Sod4 and Sod4A are expressed in all tissues of the maize plant. The developmental profiles of Sod4 and Sod4A mRNA accumulation differ in scutella during sporophytic development. RNA blot analysis of the respective Sod mRNAs indicates a differential, tissue-specific response of each gene to certain stressors. RNA isolated from stem tissue of ethephon-treated seedlings shows an increase in the Sod4 but not the Sod4A transcript while there is no change in transcripts of either gene in leaves or roots. There is differential mRNA accumulation between the two genes in leaf and stem tissue of paraquat-treated seedlings. Other agents that can cause oxidative stress were also tested for differential expression of the genes.


Genetics ◽  
1997 ◽  
Vol 145 (3) ◽  
pp. 563-572 ◽  
Author(s):  
Takafumi Mukaihara ◽  
Masatoshi Enomoto

Deletion formation between the 5′-mostly homologous sequences and between the 3′-homeologous sequences of the two Salmonella typhimurium flagellin genes was examined using plasmid-based deletion-detection systems in various Escherichia coli genetic backgrounds. Deletions in plasmid pLC103 occur between the 5′ sequences, but not between the 3′ sequences, in both RecA-independent and RecA-dependent ways. Because the former is predominant, deletion formation in a recA background depends on the length of homologous sequences between the two genes. Deletion rates were enhanced 30- to 50-fold by the mismatch repair defects, mutS, mutL and uvrD, and 250-fold by the ssb-3 allele, but the effect of the mismatch defects was canceled by the ΔrecA allele. Rates of the deletion between the 3′ sequences in plasmid pLC107 were enhanced 17- to 130-fold by ssb alleles, but not by other alleles. For deletions in pLC107, 96% of the endpoints in the recA+ background and 88% in ΔrecA were in the two hot spots of the 60- and 33-nucleotide (nt) homologous sequences, whereas in the ssb-3 background >50% of the endpoints were in four- to 14-nt direct repeats dispersed in the entire 3′ sequences. The deletion formation between the homeologous sequences is RecA-independent but depends on the length of consecutive homologies. The mutant ssb allele lowers this dependency and results in the increase in deletion rates. Roles of mutant SSB are discussed with relation to misalignment in replication slippage.

