scholarly journals ProTstab – predictor for cellular protein stability

BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Yang Yang ◽  
Xuesong Ding ◽  
Guanchen Zhu ◽  
Abhishek Niroula ◽  
Qiang Lv ◽  
...  

Abstract Background Stability is one of the most fundamental intrinsic characteristics of proteins and can be determined with various methods. Characterization of protein properties does not keep pace with increase in new sequence data and therefore even basic properties are not known for far majority of identified proteins. There have been some attempts to develop predictors for protein stabilities; however, they have suffered from small numbers of known examples. Results We took benefit of results from a recently developed cellular stability method, which is based on limited proteolysis and mass spectrometry, and developed a machine learning method using gradient boosting of regression trees. ProTstab method has high performance and is well suited for large scale prediction of protein stabilities. Conclusions The Pearson’s correlation coefficient was 0.793 in 10-fold cross validation and 0.763 in independent blind test. The corresponding values for mean absolute error are 0.024 and 0.036, respectively. Comparison with a previously published method indicated ProTstab to have superior performance. We used the method to predict stabilities of all the remaining proteins in the entire human proteome and then correlated the predicted stabilities to protein chain lengths of isoforms and to localizations of proteins.

2021 ◽  
Vol 22 (15) ◽  
pp. 8027
Author(s):  
Yang Yang ◽  
Lianjie Zeng ◽  
Mauno Vihinen

Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein–solvent interactions, either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made again soluble by dilution. Solubility is a central protein property and when reduced can lead to diseases. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on the protein solubility. The method is a machine learning tool utilizing gradient boosting algorithm and was trained on a large dataset of variants with different outcomes after the selection of features among a large number of tested properties. The method is fast and has high performance. The normalized correct prediction rate for three states is 0.656, and the normalized GC2 score is 0.312 in 10-fold cross-validation. The corresponding numbers in the blind test were 0.545 and 0.157. The performance was superior in comparison to previous methods. The PON-Sol2 predictor is freely available. It can be used to predict the solubility effects of variants for any organism, even in large-scale projects.


Author(s):  
S. Blaser ◽  
J. Meyer ◽  
S. Nebiker ◽  
L. Fricker ◽  
D. Weber

Abstract. Advances in digitalization technologies lead to rapid and massive changes in infrastructure management. New collaborative processes and workflows require detailed, accurate and up-to-date 3D geodata. Image-based web services with 3D measurement functionality, for example, transfer dangerous and costly inspection and measurement tasks from the field to the office workplace. In this contribution, we introduced an image-based backpack mobile mapping system and new georeferencing methods for capture previously inaccessible outdoor locations. We carried out large-scale performance investigations at two different test sites located in a city centre and in a forest area. We compared the performance of direct, SLAM-based and image-based georeferencing under demanding real-world conditions. Both test sites include areas with restricted GNSS reception, poor illumination, and uniform or ambiguous geometry, which create major challenges for reliable and accurate georeferencing. In our comparison of georeferencing methods, image-based georeferencing improved the median precision of coordinate measurement over direct georeferencing by a factor of 10–15 to 3 mm. Image-based georeferencing also showed a superior performance in terms of absolute accuracies with results in the range from 4.3 cm to 13.2 cm. Our investigations showed a great potential for complementing 3D image-based geospatial web-services of cities as well as for creating such web services for forest applications. In addition, such accurately georeferenced 3D imagery has an enormous potential for future visual localization and augmented reality applications.


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Akihito Kikuchi ◽  
Toshimichi Ikemura ◽  
Takashi Abe

With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype solely dependent on oligonucleotide composition and applied to genome and metagenomic studies. BLSOM is suitable for high-performance parallel-computing and can analyze big data simultaneously, but a large-scale BLSOM needs a large computational resource. We have developed Self-Compressing BLSOM (SC-BLSOM) for reduction of computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to hierarchically construct BLSOMs according to data class, such as phylotype. The first-layer BLSOM was constructed with each of the divided input data pieces that represents the data subclass, such as phylotype division, resulting in compression of the number of data pieces. The second BLSOM was constructed with a total of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and cluster the sequences according to phylotype with high accuracy, showing the method’s suitability for efficient knowledge discovery from big sequence data.


Nanoscale ◽  
2014 ◽  
Vol 6 (6) ◽  
pp. 3268-3273 ◽  
Author(s):  
Zhongchao bai ◽  
Zhicheng Ju ◽  
Chunli Guo ◽  
Yitai Qian ◽  
Bin Tang ◽  
...  

3D hierarchical mesoporous NiO microspheres were scalably synthesized by a thermal decomposition method; they exhibit superior performance as anode materials for LIBs.


2021 ◽  
Vol 11 (22) ◽  
pp. 10803
Author(s):  
Jiagang Song ◽  
Yunwu Lin ◽  
Jiayu Song ◽  
Weiren Yu ◽  
Leyuan Zhang

Mass multimedia data with geographical information (geo-multimedia) are collected and stored on the Internet due to the wide application of location-based services (LBS). How to find the high-level semantic relationship between geo-multimedia data and construct efficient index is crucial for large-scale geo-multimedia retrieval. To combat this challenge, the paper proposes a deep cross-modal hashing framework for geo-multimedia retrieval, termed as Triplet-based Deep Cross-Modal Retrieval (TDCMR), which utilizes deep neural network and an enhanced triplet constraint to capture high-level semantics. Besides, a novel hybrid index, called TH-Quadtree, is developed by combining cross-modal binary hash codes and quadtree to support high-performance search. Extensive experiments are conducted on three common used benchmarks, and the results show the superior performance of the proposed method.


2014 ◽  
Vol 22 (2) ◽  
pp. 59-74 ◽  
Author(s):  
Alex D. Breslow ◽  
Ananta Tiwari ◽  
Martin Schulz ◽  
Laura Carrington ◽  
Lingjia Tang ◽  
...  

Co-location, where multiple jobs share compute nodes in large-scale HPC systems, has been shown to increase aggregate throughput and energy efficiency by 10–20%. However, system operators disallow co-location due to fair-pricing concerns, i.e., a pricing mechanism that considers performance interference from co-running jobs. In the current pricing model, application execution time determines the price, which results in unfair prices paid by the minority of users whose jobs suffer from co-location. This paper presents POPPA, a runtime system that enables fair pricing by delivering precise online interference detection and facilitates the adoption of supercomputers with co-locations. POPPA leverages a novel shutter mechanism – a cyclic, fine-grained interference sampling mechanism to accurately deduce the interference between co-runners – to provide unbiased pricing of jobs that share nodes. POPPA is able to quantify inter-application interference within 4% mean absolute error on a variety of co-located benchmark and real scientific workloads.


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Yanhua Li ◽  
Kui Xiao ◽  
Cong Huang ◽  
Jin Wang ◽  
Ming Gao ◽  
...  

Abstract Potassium-ion batteries (PIBs) are attractive for grid-scale energy storage due to the abundant potassium resource and high energy density. The key to achieving high-performance and large-scale energy storage technology lies in seeking eco-efficient synthetic processes to the design of suitable anode materials. Herein, a spherical sponge-like carbon superstructure (NCS) assembled by 2D nanosheets is rationally and efficiently designed for K+ storage. The optimized NCS electrode exhibits an outstanding rate capability, high reversible specific capacity (250 mAh g−1 at 200 mA g−1 after 300 cycles), and promising cycling performance (205 mAh g−1 at 1000 mA g−1 after 2000 cycles). The superior performance can be attributed to the unique robust spherical structure and 3D electrical transfer network together with nitrogen-rich nanosheets. Moreover, the regulation of the nitrogen doping types and morphology of NCS-5 is also discussed in detail based on the experiments results and density functional theory calculations. This strategy for manipulating the structure and properties of 3D materials is expected to meet the grand challenges for advanced carbon materials as high-performance PIB anodes in practical applications.


2021 ◽  
Vol 13 (24) ◽  
pp. 13782
Author(s):  
Soyoung Park ◽  
Sanghun Son ◽  
Jaegu Bae ◽  
Doi Lee ◽  
Jae-Jin Kim ◽  
...  

Particulate matter (PM) as an air pollutant is harmful to the human body as well as to the ecosystem. It is crucial to understand the spatiotemporal PM distribution in order to effectively implement reduction methods. However, ground-based air quality monitoring sites are limited in providing reliable concentration values owing to their patchy distribution. Here, we aimed to predict daily PM10 concentrations using boosting algorithms such as gradient boosting machine (GBM), extreme gradient boost (XGB), and light gradient boosting machine (LightGBM). The three models performed well in estimating the spatial contrasts and temporal variability in daily PM10 concentrations. In particular, the LightGBM model outperformed the GBM and XGM models, with an adjusted R2 of 0.84, a root mean squared error of 12.108 μg/m2, a mean absolute error of 8.543 μg/m2, and a mean absolute percentage error of 16%. Despite having high performance, the LightGBM model showed low spatial prediction accuracy near the southwest part of the study area. Additionally, temporal differences were found between the observed and predicted values at high concentrations. These outcomes indicate that such methods can provide intuitive and reliable PM10 concentration values for the management, prevention, and mitigation of air pollution. In the future, performance accuracy could be improved through consideration of different variables related to spatial and seasonal characteristics.


Author(s):  
C.K. Wu ◽  
P. Chang ◽  
N. Godinho

Recently, the use of refractory metal silicides as low resistivity, high temperature and high oxidation resistance gate materials in large scale integrated circuits (LSI) has become an important approach in advanced MOS process development (1). This research is a systematic study on the structure and properties of molybdenum silicide thin film and its applicability to high performance LSI fabrication.


Sign in / Sign up

Export Citation Format

Share Document