Fragon: rapid high-resolution structure determination from ideal protein fragments

2018 ◽  
Vol 74 (3) ◽  
pp. 205-214 ◽  
Author(s):  
Huw T. Jenkins

Correctly positioning ideal protein fragments by molecular replacement presents an attractive method for obtaining preliminary phases when no template structure for molecular replacement is available. This has been exploited in several existing pipelines. This paper presents a new pipeline, named Fragon, in which fragments (ideal α-helices or β-strands) are placed using Phaser and the phases calculated from these coordinates are then improved by the density-modification methods provided by ACORN. The reliable scoring algorithm provided by ACORN identifies success. In these cases, the resulting phases are usually of sufficient quality to enable automated model building of the entire structure. Fragon was evaluated against two test sets comprising mixed α/β folds and all-β folds at resolutions between 1.0 and 1.7 Å. Success rates of 61% for the mixed α/β test set and 30% for the all-β test set were achieved. In almost 70% of successful runs, fragment placement and density modification took less than 30 min on relatively modest four-core desktop computers. In all successful runs the best set of phases enabled automated model building with ARP/wARP to complete the structure.
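
The overall workflow is easy to picture as a short orchestration loop. Below is a minimal sketch of a Fragon-style pipeline, assuming hypothetical wrapper callables around Phaser, ACORN and ARP/wARP and an illustrative correlation-coefficient threshold; it is not the actual Fragon code or its interface.

```python
def fragon_style_pipeline(mtz_file, sequence, fragments,
                          place_with_phaser, modify_with_acorn, build_with_arpwarp,
                          cc_threshold=0.25):
    """Sketch of a Fragon-style workflow: place ideal fragments, density-modify
    the phases, and run model building only when scoring indicates success.

    The three callables are hypothetical wrappers around the real programs
    (Phaser, ACORN, ARP/wARP) and are assumed to return dicts containing a
    correlation coefficient under the key "cc"; the threshold value is
    illustrative only.
    """
    best = None
    for fragment in fragments:                       # e.g. ideal alpha-helices or beta-strands
        for placement in place_with_phaser(mtz_file, fragment):
            phases = modify_with_acorn(mtz_file, placement)
            if best is None or phases["cc"] > best["cc"]:
                best = phases                        # keep the best-scoring phase set
    if best and best["cc"] >= cc_threshold:          # ACORN-style scoring flags success
        return build_with_arpwarp(mtz_file, sequence, best)
    return None                                      # no fragment placement succeeded
```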

2015 ◽  
Vol 71 (2) ◽  
pp. 304-312 ◽  
Author(s):  
Rojan Shrestha ◽  
Kam Y. J. Zhang

Ab initio phasing with de novo models has become a viable approach for structure solution from protein crystallographic diffraction data. This approach takes advantage of the known protein sequence information, predicts de novo models and uses them for structure determination by molecular replacement. However, even the current state-of-the-art de novo modelling methods have a limit to the accuracy of the predicted model, which is sometimes insufficient for use as a template for successful molecular replacement. A fragment-assembly phasing method has been developed that starts from an ensemble of low-accuracy de novo models, disassembles them into fragments, places them independently in the crystallographic unit cell by molecular replacement and then reassembles them into a whole structure that can provide sufficient phase information to enable complete structure determination by automated model building. Tests on ten protein targets showed that the method could solve structures for eight of them, even though the predicted de novo models could not be used as templates for successful molecular replacement, since the best model for each target was on average more than 4.0 Å away from the native structure. The method extends the applicability of the ab initio phasing by de novo models approach and can be used to solve structures when the best de novo models are still of low accuracy.
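
The disassemble/place/reassemble idea can be summarised in a few lines. The sketch below only illustrates that scheme, with hypothetical callables standing in for fragment extraction, molecular replacement, recombination and model building; it is not the authors' implementation.

```python
def fragment_assembly_phasing(models, disassemble, place_by_mr, reassemble, build):
    """Illustrative sketch of fragment-assembly phasing.

    models:  ensemble of low-accuracy de novo models
    disassemble / place_by_mr / reassemble / build: hypothetical callables for
    fragment extraction, independent molecular-replacement placement,
    recombination of placed fragments and automated model building.
    """
    placed = []
    for model in models:
        for fragment in disassemble(model):       # split each model into short fragments
            solution = place_by_mr(fragment)      # position the fragment independently
            if solution is not None:
                placed.append(solution)
    partial_structure = reassemble(placed)        # recombine the placed fragments
    return build(partial_structure)               # phases feed automated model building
```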


Crystals ◽  
2020 ◽  
Vol 10 (4) ◽  
pp. 280
Author(s):  
Maria Cristina Burla ◽  
Benedetta Carrozzini ◽  
Giovanni Luca Cascarano ◽  
Carmelo Giacovazzo ◽  
Giampiero Polidori

Obtaining high-quality models for nucleic acid structures from automated model-building (AMB) programs is still a challenge. The main reasons are the rather low resolution of the diffraction data and the large number of rotatable bonds in the main chains. The application of the most popular and best documented AMB programs (e.g., PHENIX.AUTOBUILD, NAUTILUS and ARP/wARP) may provide a good assessment of the state of the art. Quite recently, a cyclic automated model building (CAB) package was described; it is a new AMB approach that applies BUCCANEER to protein model building in a cyclic way without modifying its basic algorithms. The applications showed that CAB improves the efficiency of BUCCANEER. This success suggested an extension of CAB to nucleic acids and, in particular, a check of whether cyclically including NAUTILUS in CAB may improve its effectiveness. To accomplish this task, the CAB algorithms designed for protein model building were modified to adapt them to nucleic acid crystallochemistry. CAB was tested using 29 nucleic acids (DNA and RNA fragments). The phase estimates obtained via molecular-replacement (MR) techniques were automatically submitted to phase refinement and then used as input for CAB. The experimental results from CAB were compared with those obtained by NAUTILUS, ARP/wARP and PHENIX.AUTOBUILD.
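
The cyclic scheme itself is simple to express. The following sketch shows a CAB-style loop under the assumption of hypothetical callables for a single AMB pass (e.g. NAUTILUS), phase refinement and model scoring; the real CAB package differs in detail.

```python
def cyclic_model_building(reflections, sequence, initial_phases,
                          build_once, refine_phases, score, n_cycles=10):
    """Sketch of a CAB-style cyclic loop (assumed interfaces, not the CAB code).

    build_once    - one pass of an AMB engine (e.g. NAUTILUS for nucleic acids)
    refine_phases - phase refinement from the current partial model
    score         - model-quality metric used to keep the best cycle
    """
    phases, best_model, best_score = initial_phases, None, float("-inf")
    for _ in range(n_cycles):
        model = build_once(reflections, sequence, phases)   # unmodified AMB run
        current = score(model)
        if current > best_score:
            best_model, best_score = model, current
        phases = refine_phases(reflections, model)          # feed improved phases back in
    return best_model
```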


2012 ◽  
Vol 68 (4) ◽  
pp. 391-403 ◽  
Author(s):  
Axel T. Brunger ◽  
Debanu Das ◽  
Ashley M. Deacon ◽  
Joanna Grant ◽  
Thomas C. Terwilliger ◽  
...  

Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.


2019 ◽  
Vol 75 (8) ◽  
pp. 753-763 ◽  
Author(s):  
Grzegorz Chojnowski ◽  
Joana Pereira ◽  
Victor S. Lamzin

The performance of automated model building in crystal structure determination usually decreases with the resolution of the experimental data, and may result in fragmented models and incorrect side-chain assignment. Presented here are new methods for machine-learning-based docking of main-chain fragments to the sequence and for their sequence-independent connection using a dedicated library of protein fragments. The combined use of these new methods noticeably increases sequence coverage and reduces fragmentation of the protein models automatically built with ARP/wARP.
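
One way to picture sequence docking of a main-chain fragment is as a sliding-window alignment of predicted residue-type probabilities against the target sequence. The sketch below illustrates that idea only; the probability matrix, scoring scheme and alignment are assumptions made for illustration, not the ARP/wARP machine-learning method.

```python
import numpy as np

def dock_fragment_to_sequence(residue_type_probs, sequence,
                              alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Slide a fragment's per-residue amino-acid-type probabilities (shape:
    fragment_length x 20, e.g. from a classifier on density features) along the
    target sequence and return the best-scoring offset. Illustrative only."""
    index = {aa: i for i, aa in enumerate(alphabet)}
    logp = np.log(np.asarray(residue_type_probs) + 1e-9)
    n, m = len(sequence), logp.shape[0]
    best_offset, best_score = None, float("-inf")
    for offset in range(n - m + 1):
        window = sequence[offset:offset + m]
        score = sum(logp[i, index[aa]] for i, aa in enumerate(window))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```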


2021 ◽  
Vol 11 (5) ◽  
pp. 2039
Author(s):  
Hyunseok Shin ◽  
Sejong Oh

In machine learning applications, classification schemes have been widely used for prediction tasks. Typically, to develop a prediction model, the given dataset is divided into training and test sets; the training set is used to build the model and the test set is used to evaluate it. Furthermore, random sampling is traditionally used to divide datasets. The problem, however, is that the performance of the model is evaluated differently depending on how the training and test sets are divided. Therefore, in this study, we propose an improved sampling method for the accurate evaluation of a classification model. We first generate numerous candidate train/test splits using the R-value-based sampling method. We evaluate how similar the distribution of each candidate is to that of the whole dataset, and the candidate with the smallest distribution difference is selected as the final train/test split. Histograms and feature importance are used to evaluate the similarity of distributions. The proposed method produces more appropriate training and test sets than previous sampling methods, including random and non-random sampling.
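
The split-selection idea can be sketched directly. In the example below, random candidate splits stand in for the R-value-based generation, and a simple per-feature histogram distance stands in for the full similarity assessment (which also uses feature importance); both substitutions are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def select_best_split(X, y, n_candidates=100, test_size=0.3, bins=10, seed=0):
    """Generate many candidate train/test splits of a 2D feature matrix X and
    keep the one whose test-set feature histograms are closest to those of the
    whole dataset. Illustrative sketch, not the paper's exact procedure."""
    rng = np.random.RandomState(seed)
    best, best_dist = None, float("inf")
    for _ in range(n_candidates):
        split = train_test_split(X, y, test_size=test_size,
                                 random_state=rng.randint(1_000_000))
        X_test = split[1]
        dist = 0.0
        for j in range(X.shape[1]):                  # compare per-feature histograms
            lo, hi = X[:, j].min(), X[:, j].max()
            h_all, _ = np.histogram(X[:, j], bins=bins, range=(lo, hi), density=True)
            h_test, _ = np.histogram(X_test[:, j], bins=bins, range=(lo, hi), density=True)
            dist += np.abs(h_all - h_test).sum()
        if dist < best_dist:                         # smallest distribution difference wins
            best, best_dist = split, dist
    return best                                      # (X_train, X_test, y_train, y_test)
```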


2014 ◽  
Vol 529 ◽  
pp. 359-363
Author(s):  
Xi Lei Huang ◽  
Mao Xiang Yi ◽  
Lin Wang ◽  
Hua Guo Liang

A novel concurrent core test approach is proposed to reduce the test cost of SoCs. Before test, a novel test-set sharing strategy is used to obtain a minimum-size merged test set by merging the test sets of the cores under test (CUTs). Moreover, it can be used in conjunction with general compression/decompression techniques to further reduce test data volume (TDV). During test, a vector-separating device composed of a set of simple combinational logic circuits (CLCs) separates each vector in the merged test set and routes it to the corresponding core. This approach does not add any test vectors for the individual cores and allows the cores to be tested concurrently, reducing test application time (TAT). Experimental results on the ISCAS'89 benchmarks demonstrate the efficiency of the proposed approach.
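
The test-set sharing step can be illustrated with test cubes containing don't-care ('x') bits. The greedy merge below is a simple stand-in for the authors' minimum-size merging strategy, and the hardware vector-separating device is not modelled; it only shows how compatible vectors collapse into a shared set.

```python
def compatible(a, b):
    """Two test cubes are compatible if they never specify conflicting bits
    ('x' marks a don't-care position)."""
    return all(p == q or p == "x" or q == "x" for p, q in zip(a, b))

def merge(a, b):
    """Merge two compatible cubes by resolving don't-cares."""
    return "".join(q if p == "x" else p for p, q in zip(a, b))

def merge_test_sets(cubes):
    """Greedily merge compatible test cubes from different cores into a smaller
    shared set. Illustrative only, not the paper's merging algorithm."""
    merged = []
    for cube in cubes:
        for i, existing in enumerate(merged):
            if compatible(existing, cube):
                merged[i] = merge(existing, cube)
                break
        else:
            merged.append(cube)
    return merged

# Example: merge_test_sets(["1x0x", "110x", "0xx1"]) -> ["110x", "0xx1"]
```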


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 8536-8536
Author(s):  
Gouji Toyokawa ◽  
Fahdi Kanavati ◽  
Seiya Momosaki ◽  
Kengo Tateishi ◽  
Hiroaki Takeoka ◽  
...  

Background: Lung cancer is the leading cause of cancer-related death in many countries, and its prognosis remains unsatisfactory. Since treatment approaches differ substantially based on the subtype, such as adenocarcinoma (ADC), squamous cell carcinoma (SCC) and small cell lung cancer (SCLC), an accurate histopathological diagnosis is of great importance. However, if the specimen is solely composed of poorly differentiated cancer cells, distinguishing between histological subtypes can be difficult. The present study developed a deep learning model to classify lung cancer subtypes from whole slide images (WSIs) of transbronchial lung biopsy (TBLB) specimens, in particular with the aim of using this model to evaluate a challenging test set of indeterminate cases. Methods: Our deep learning model consisted of two separately trained components: a convolutional neural network tile classifier and a recurrent neural network tile aggregator for the WSI diagnosis. We used a training set consisting of 638 WSIs of TBLB specimens to train a deep learning model to classify lung cancer subtypes (ADC, SCC and SCLC) and non-neoplastic lesions. The training set consisted of 593 WSIs for which the diagnosis had been determined by pathologists based on the visual inspection of Hematoxylin-Eosin (HE) slides and of 45 WSIs of indeterminate cases (64 ADCs and 19 SCCs). We then evaluated the models using five independent test sets. For each test set, we computed the receiver operating characteristic (ROC) area under the curve (AUC). Results: We applied the model to an indeterminate test set of WSIs obtained from TBLB specimens that pathologists had not been able to conclusively diagnose by examining the HE-stained specimens alone. Overall, the model achieved ROC AUCs of 0.993 (confidence interval [CI] 0.971-1.0) and 0.996 (0.981-1.0) for ADC and SCC, respectively. We further evaluated the model using five independent test sets consisting of both TBLB and surgically resected lung specimens (combined total of 2490 WSIs) and obtained highly promising results with ROC AUCs ranging from 0.94 to 0.99. Conclusions: In this study, we demonstrated that a deep learning model could be trained to predict lung cancer subtypes in indeterminate TBLB specimens. The extremely promising results obtained show that if deployed in clinical practice, a deep learning model capable of aiding pathologists in diagnosing indeterminate cases would be extremely beneficial, as it would allow a diagnosis to be obtained sooner and reduce the costs that would result from further investigations.
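
The two-stage design (a tile-level CNN followed by an RNN that aggregates tile predictions into a slide-level diagnosis) can be sketched as follows. The backbone, hidden size and use of a GRU are assumptions for illustration, not the authors' architecture or training setup.

```python
import torch.nn as nn
from torchvision import models

class TileClassifier(nn.Module):
    """CNN that assigns each WSI tile a probability for ADC, SCC, SCLC or
    non-neoplastic tissue. The ResNet-34 backbone is an illustrative choice."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.backbone = models.resnet34(weights=None)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_classes)

    def forward(self, tiles):                      # tiles: (n_tiles, 3, H, W)
        return self.backbone(tiles)                # per-tile logits

class SlideAggregator(nn.Module):
    """RNN that aggregates the sequence of per-tile predictions into a single
    slide-level diagnosis, mirroring the two-stage design in the abstract."""
    def __init__(self, n_classes=4, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_classes, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, tile_logits):                # (batch, n_tiles, n_classes)
        _, h = self.rnn(tile_logits.softmax(dim=-1))
        return self.head(h[-1])                    # slide-level logits
```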


Author(s):  
Giovanni Luca Cascarano ◽  
Carmelo Giacovazzo

CAB, a recently described automated model-building (AMB) program, has been modified to work effectively with nucleic acids. To this end, several new algorithms have been introduced and the libraries have been updated. To reduce the input average phase error, ligand heavy atoms are now located before starting the CAB interpretation of the electron-density maps. Furthermore, alternative approaches are used depending on whether the ligands belong to the target or to the model chain used in the molecular-replacement step. Robust criteria are then applied to decide whether the AMB model is acceptable or whether it must be modified to fit prior information on the target structure. In the latter case, the model chains are rearranged to fit prior information on the target chains. Here, the performance of the new AMB program CAB applied to various nucleic acid structures is discussed. Other well documented programs such as Nautilus, ARP/wARP and phenix.autobuild were also applied and the experimental results are described.


Author(s):  
André Maletzke ◽  
Waqar Hassan ◽  
Denis dos Reis ◽  
Gustavo Batista

Quantification is a task similar to classification in the sense that it learns from a labeled training set. However, quantification is not interested in predicting the class of each observation, but rather in estimating the class distribution of the test set. The community has developed performance measures and experimental setups tailored to quantification tasks. Nonetheless, we argue that a critical variable, the size of the test sets, remains ignored. Such disregard has three main detrimental effects. First, it implicitly assumes that quantifiers will perform equally well for different test set sizes. Second, it increases the risk of cherry-picking by selecting a test set size for which a particular proposal performs best. Finally, it disregards the importance of designing methods that are suitable for different test set sizes. We discuss these issues with the support of one of the broadest experimental evaluations ever performed in the area, with three main outcomes. (i) We empirically demonstrate the importance of the test set size when assessing quantifiers. (ii) We show that current quantifiers generally perform poorly on the smallest test sets. (iii) We propose a metalearning scheme that selects the best quantifier based on the test-set size and can outperform the best single quantification method.
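
The metalearning scheme can be sketched as a size-aware selector: on validation samples of varying size, record which quantifier has the lowest error in each size range, then dispatch new test sets accordingly. The size bins, error measure and quantifier interface below are assumptions for illustration, not the authors' method.

```python
import numpy as np

class SizeAwareQuantifier:
    """Pick, for each test-set size range, the quantifier that performed best on
    validation samples of that size. Quantifiers are assumed to expose a
    quantify(X) method returning the estimated positive prevalence."""
    def __init__(self, quantifiers, size_bins=(10, 100, 1000)):
        self.quantifiers = quantifiers            # dict: name -> fitted quantifier
        self.size_bins = size_bins
        self.best_per_bin = {}

    def _bin(self, n):
        return sum(n > b for b in self.size_bins)

    def fit(self, validation_samples):
        # validation_samples: list of (X_sample, true_prevalence) pairs of varying size
        errors = {}
        for X, true_p in validation_samples:
            b = self._bin(len(X))
            for name, q in self.quantifiers.items():
                err = abs(q.quantify(X) - true_p)
                errors.setdefault((b, name), []).append(err)
        for (b, name), errs in errors.items():
            mean = float(np.mean(errs))           # keep the lowest-error quantifier per bin
            if b not in self.best_per_bin or mean < self.best_per_bin[b][1]:
                self.best_per_bin[b] = (name, mean)

    def quantify(self, X):
        name, _ = self.best_per_bin[self._bin(len(X))]
        return self.quantifiers[name].quantify(X)
```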


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Wei Yang ◽  
Junkai Zhou

With the advent of the era of big data, great changes have taken place in the insurance industry: it has gradually entered the field of Internet insurance, and a large amount of insurance data has been accumulated. How to realize the innovation of insurance services through these data is crucial to the development of the industry. Therefore, this paper proposes a ciphertext retrieval technology based on attribute encryption (HP-CPABKS) to enable rapid retrieval and updating of insurance data while preserving the privacy of insurance information, and puts forward an innovative insurance service based on cloud computing. The results show that 97.35% of users are successfully identified in test set A and 98.77% in test set B, with the recognition success rate above 97.00% for all four test sets; when the number of challenges is 720, the proportion of modified data blocks is less than 9%; the total number of complaints is reduced from 1300 to 249; 99.19% of users are satisfied with the innovative insurance service; and the number of insured customers increases significantly. In summary, the insurance innovation service based on cloud computing and insurance data can improve customer satisfaction, increase the number of policyholders, reduce the number of complaints, and achieve a more successful insurance service innovation. This study provides a reference for the precision marketing of insurance services.

