solPredict: Antibody apparent solubility prediction from sequence by transfer learning

2021 ◽  
Author(s):  
Jiangyan Feng ◽  
Min Jiang ◽  
James Shih ◽  
Qing Chai

There is growing interest in developing therapeutic mAbs for subcutaneous administration for several reasons, including patient convenience and compliance. This requires identifying mAbs with superior solubility that are amenable to high-concentration formulation development. However, early selection of developable antibodies with optimal high-concentration attributes remains challenging. Since experimental screening is often material and labor intensive, there is significant interest in developing robust in silico tools capable of screening thousands of molecules based on sequence information alone. In this paper, we present a strategy applying protein language modeling, named solPredict, to predict the apparent solubility of mAbs in histidine (pH 6.0) buffer. solPredict feeds embeddings, extracted from a pretrained protein language model using single sequences, into a shallow neural network. A dataset of 220 diverse, in-house mAbs, with extrapolated protein solubility data obtained by the PEG-induced precipitation method, was used for model training and hyperparameter tuning through five-fold cross-validation. An independent test set of 40 mAbs was used for model evaluation. solPredict achieves high correlation with experimental data (Spearman correlation coefficient = 0.86, Pearson correlation coefficient = 0.84, R² = 0.69, and RMSE = 4.40). The output from solPredict directly corresponds to experimental solubility measurements (PEG %) and enables quantitative interpretation of results. This approach eliminates the need for 3D structure modeling of mAbs, descriptor computation, and expert-crafted input features. The minimal computational expense of solPredict enables rapid, large-scale, and high-throughput screening of mAbs during early antibody discovery.
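The four evaluation metrics quoted in this abstract can be illustrated with a minimal sketch in plain Python. The measured and predicted values below are hypothetical, not the paper's data, and R² is assumed here to be the coefficient of determination (1 − SS_res/SS_tot); the abstract does not state which definition was used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def r_squared(y_true, y_pred):
    """Coefficient of determination (assumed definition of R²)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical solubility values in PEG %, for illustration only.
measured = [10.0, 15.0, 20.0, 25.0]
predicted = [12.0, 14.0, 21.0, 23.0]
print(spearman(measured, predicted), pearson(measured, predicted))
print(r_squared(measured, predicted), rmse(measured, predicted))
```

Because the output is on the same scale as the measurement, RMSE reads directly in PEG % units, while the two correlation coefficients describe ranking and linear agreement respectively.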

Author(s):  
Ismael Montero Fernández ◽  
Edvan Alves Chagas ◽  
Pollyana Cardoso Chagas ◽  
Selvin Antonio Saravia Maldonado ◽  
Ricardo Carvalho dos Santos ◽  
...  

In this work, nine fruits cultivated in the northern Amazon were studied: abiu, acerola, araçá, bacupari, biribá, caçarí, fruta-do-conde, graviola and taperebá, with the objective of carrying out a bromatological and nutritional study of their pulps. Energy values are reported for the pulps of graviola (76.83 ± 0.02 kcal 100 g⁻¹), bacupari (76.83 ± 0.02 kcal 100 g⁻¹) and fruta-do-conde (46.66 ± 0.02 kcal 100 g⁻¹). Among the macronutrients, the high concentration of potassium stands out, especially in graviola (541.16 ± 0.24 mg 100 g⁻¹) and biribá (468.21 ± 0.13 mg 100 g⁻¹). Among the micronutrients, iron concentrations are notable in araçá pulp (3.04 ± 0.02 mg 100 g⁻¹), while abiu is rich in zinc (3.71 ± 0.02 mg 100 g⁻¹) and manganese (6.61 ± 0.11 mg 100 g⁻¹). Cobalt is also present at trace levels in some of the pulps studied. Pearson correlation coefficients were calculated, and principal component analysis (PCA) was applied to establish the correlations between the variables studied.


Author(s):  
Ismael Montero Fernández ◽  
Edvan Alves Chagas ◽  
Antonio Alves de Melo Filho ◽  
Selvin Antonio Saravia Maldonado ◽  
Ricardo Carvalho dos Santos ◽  
...  

In this work, nine fruits cultivated in the northern Amazon were studied: abiu, acerola, araçá, bacupari, biribá, caçarí, fruta-do-conde, graviola and taperebá, with the objective of carrying out a bromatological and nutritional study of their pulps. Energy values are reported for the pulps of graviola (76.83 ± 0.02 kcal 100 g⁻¹), bacupari (76.83 ± 0.02 kcal 100 g⁻¹) and fruta-do-conde (46.66 ± 0.02 kcal 100 g⁻¹). Among the macronutrients, the high concentration of potassium stands out, especially in graviola (541.16 ± 0.24 mg 100 g⁻¹) and biribá (468.21 ± 0.13 mg 100 g⁻¹). Among the micronutrients, iron concentrations are notable in araçá pulp (3.04 ± 0.02 mg 100 g⁻¹), while abiu is rich in zinc (3.71 ± 0.02 mg 100 g⁻¹) and manganese (6.61 ± 0.11 mg 100 g⁻¹). Cobalt is also present at trace levels in some of the pulps studied. Pearson correlation coefficients were calculated, and principal component analysis (PCA) was applied to establish the correlations between the variables studied.


Author(s):  
Novikova ◽  
SP Romanenko ◽  
MA Lobkis

Introduction: In the Russian Federation, much attention is traditionally paid to military education and training. A special place in its structure is occupied by the system of cadet classes and corps. A distinctive feature of the learning mode in such institutions is the combined effect of standard and specific factors of the indoor school environment and intensive physical activity owing to sports, applied military and drill training. No evidence-based methods of establishing the nutrient requirements of children in the modern conditions of cadet corps have been developed so far, which creates the potential for nutrition to turn from a health-saving factor into a health risk factor. Our objective was to provide a scientific substantiation of the model of healthy nutrition for students of cadet-type educational establishments. Methods: The statistical significance of the correlation was evaluated using Student's t-test. Correlation and regression analyses were used to assess cause-and-effect relationships. The Pearson correlation coefficient (rxy) was used as an indicator of the strength of the relationship between quantitative indicators x and y, both having a normal distribution. Correlation coefficient (rxy) values were interpreted in accordance with the Chaddock scale. For the purpose of statistical modeling, multiple linear regression was used. Conclusions: We substantiated the innovative model of organizing healthy nutrition for students of cadet-type schools based on the correlation and regression analyses with determination of statistical significance of the studied characteristics. Its efficiency indicators include an increase in average functional capabilities of students by more than 10 % and a reduction in the probability of developmental disorders by more than 25 %.
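As a rough illustration of the two statistical steps named in the Methods, the sketch below computes the t statistic used to test a Pearson coefficient against zero, t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom, and maps |r| to a verbal label. The thresholds are the commonly cited Chaddock bands and are an assumption here, since the abstract does not list them. The function names and the sample values are hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for H0: no correlation, given Pearson r from n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def chaddock_label(r):
    """Verbal strength of a correlation (commonly cited Chaddock bands, assumed)."""
    a = abs(r)
    if a < 0.1:
        return "negligible"
    if a < 0.3:
        return "weak"
    if a < 0.5:
        return "moderate"
    if a < 0.7:
        return "noticeable"
    if a < 0.9:
        return "high"
    return "very high"

# Hypothetical example: r = 0.6 from 18 paired observations.
t = t_statistic(0.6, 18)
print(round(t, 3), chaddock_label(0.6))
```

The resulting t value is compared with the critical value of Student's distribution for n − 2 degrees of freedom at the chosen significance level.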


2020 ◽  
Vol 16 (1) ◽  
pp. 47-53
Author(s):  
Vicente Benavides-Córdoba ◽  
Mauricio Palacios Gómez

Introduction: Animal models have been used to understand the pathophysiology of pulmonary hypertension, to describe the mechanisms of action and to evaluate promising active ingredients. The monocrotaline-induced pulmonary hypertension model is the most used animal model. In this model, invasive and non-invasive hemodynamic variables that resemble human measurements have been used. Aim: To determine whether non-invasive variables can predict hemodynamic measures in the monocrotaline-induced pulmonary hypertension model. Materials and Methods: Twenty 6-week-old male Wistar rats weighing 250-300 g, from the bioterium of the Universidad del Valle (Cali, Colombia), were used to establish whether the relationships between invasive and non-invasive variables are sustained under different conditions (healthy, hypertrophy and treated). The animals were organized into three groups: a control group that was given 0.9% saline solution subcutaneously (sc), a group with pulmonary hypertension induced with a single subcutaneous dose of monocrotaline 30 mg/kg, and a group with pulmonary hypertension induced with 30 mg/kg of monocrotaline and treated with sildenafil. Right ventricle ejection fraction (RVEF), heart rate, right ventricle systolic pressure (RVSP) and the extent of hypertrophy (Fulton index) were measured. The relationship between each pair of variables was evaluated with the Pearson correlation coefficient. Results: All correlations were statistically significant (p < 0.01). The strongest correlation was the inverse one between the RVEF and the Fulton index (r = -0.82). The Fulton index also had a strong correlation with the RVSP (r = 0.79). The Pearson correlation coefficient between the RVEF and the RVSP was -0.81, meaning that the higher the systolic pressure in the right ventricle, the lower the ejection fraction. Heart rate was significantly correlated with the other three variables studied, although the correlations were relatively low. Conclusion: The correlations obtained in this study indicate that the parameters evaluated in experimental pulmonary hypertension correlate adequately and that the measurements currently made are adequate and consistent with each other, that is, they have good predictive capacity.


Author(s):  
Yu Wang ◽  
Jiantao Wang ◽  
Haiping Wang ◽  
Xinyu Yang ◽  
Liming Chang ◽  
...  

Objective: Accurate preoperative assessment of breast tumor size is important for the initial decision on the surgical approach. Therefore, we aimed to compare the efficacy of mammography and ultrasonography in ductal carcinoma in situ (DCIS) of the breast. Methods: Preoperative mammography and ultrasonography were performed on 104 women with DCIS of the breast. We compared the size measured by each imaging modality with the pathological size using Pearson correlation. For each modality, a measurement was considered concordant if the difference between the imaging assessment and the pathological measurement was less than 0.5 cm. Results: At pathological examination, tumor size ranged from 0.4 cm to 7.2 cm in largest diameter. For mammographically determined size versus pathological size, the correlation coefficient r was 0.786, and for ultrasonography it was 0.651. Grouped by breast composition, in almost entirely fatty breasts and breasts with scattered areas of fibroglandular density, r was 0.790 for mammography and 0.678 for ultrasonography; in heterogeneously dense and extremely dense breasts, r was 0.770 for mammography and 0.548 for ultrasonography. In the microcalcification-positive group, r was 0.772 for mammography and 0.570 for ultrasonography. In the microcalcification-negative group, r was 0.806 for mammography and 0.783 for ultrasonography. Conclusion: Mammography was more accurate than ultrasonography in measuring the largest cancer diameter in DCIS of the breast. The correlation coefficient improved in the group with almost entirely fatty breasts or scattered areas of fibroglandular density and in the microcalcification-negative group.
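The concordance criterion described in the Methods (imaging and pathological sizes differing by less than 0.5 cm) can be sketched as a simple rate calculation. The function name and the measurements below are hypothetical and serve only to show how the threshold is applied.

```python
def concordance_rate(imaging_cm, pathology_cm, tolerance_cm=0.5):
    """Fraction of cases whose imaging size is within tolerance_cm of pathology.

    A case counts as concordant only if the absolute difference is
    strictly less than the tolerance, as stated in the abstract.
    """
    pairs = list(zip(imaging_cm, pathology_cm))
    concordant = sum(1 for img, path in pairs if abs(img - path) < tolerance_cm)
    return concordant / len(pairs)

# Hypothetical tumor sizes in cm (not data from the study).
mammography = [1.2, 3.0, 2.5, 5.0]
pathology = [1.0, 3.8, 2.4, 5.5]
print(concordance_rate(mammography, pathology))  # two of four cases are concordant
```

A concordance rate complements the correlation coefficient: r measures how well sizes track each other across patients, while concordance counts how often an individual measurement is close enough to be clinically usable.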


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yingxi Yang ◽  
Hui Wang ◽  
Wen Li ◽  
Xiaobo Wang ◽  
Shizhao Wei ◽  
...  

Background: Protein post-translational modification (PTM) is key to investigating the mechanisms of protein function. With the rapid development of proteomics technology, a large amount of protein sequence data has been generated, which highlights the importance of in-depth study and analysis of PTMs in proteins. Method: We propose a new multi-classification machine learning pipeline, MultiLyGAN, to identify seven types of lysine-modified sites. Using eight different sequential and five structural construction methods, 1497 valid features remained after filtering by the Pearson correlation coefficient. To solve the data imbalance problem, two influential deep generative methods, the Conditional Generative Adversarial Network (CGAN) and the Conditional Wasserstein Generative Adversarial Network (CWGAN), were leveraged and compared to generate new samples for the types with fewer samples. Finally, a random forest algorithm was used to predict the seven categories. Results: In the tenfold cross-validation, accuracy (Acc) and Matthews correlation coefficient (MCC) were 0.8589 and 0.8376, respectively. In the independent test, Acc and MCC were 0.8549 and 0.8330, respectively. The results indicated that CWGAN better solved the existing data imbalance and stabilized the training error. In addition, an accumulated feature importance analysis showed that CKSAAP, PWM and structural features were the three most important feature-encoding schemes. MultiLyGAN can be found at https://github.com/Lab-Xu/MultiLyGAN. Conclusions: The CWGAN greatly improved the predictive performance in all experiments. Features derived from the CKSAAP, PWM and structure schemes are the most informative and contributed most to the prediction of PTM.
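For readers unfamiliar with the two reported scores, the sketch below computes accuracy and a multi-class Matthews correlation coefficient from a confusion matrix. It uses the standard multi-class generalization of MCC; whether the authors used exactly this form is an assumption, and the confusion matrix shown is hypothetical.

```python
import math

def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def multiclass_mcc(confusion):
    """Multi-class MCC; confusion[i][j] counts true class i predicted as class j."""
    k = len(confusion)
    s = sum(sum(row) for row in confusion)                    # total samples
    c = sum(confusion[i][i] for i in range(k))                # correctly predicted
    t = [sum(confusion[i]) for i in range(k)]                 # true count per class
    p = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt(
        (s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t))
    )
    return numerator / denominator if denominator else 0.0

# Hypothetical 3-class confusion matrix (the paper has seven classes).
cm = [
    [3, 1, 0],
    [0, 4, 0],
    [1, 0, 3],
]
print(accuracy(cm), multiclass_mcc(cm))
```

Unlike accuracy, MCC accounts for how predictions are distributed across classes, which is why it is often reported alongside accuracy for imbalanced problems such as this one.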


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 156
Author(s):  
Charles Carlson ◽  
Vanessa-Rose Turpin ◽  
Ahmad Suliman ◽  
Carl Ade ◽  
Steve Warren ◽  
...  

Background: The goal of this work was to create a sharable dataset of heart-driven signals, including ballistocardiograms (BCGs) and time-aligned electrocardiograms (ECGs), photoplethysmograms (PPGs), and blood pressure waveforms. Methods: A custom, bed-based ballistocardiographic system is described in detail. Affiliated cardiopulmonary signals are acquired using a GE Datex CardioCap 5 patient monitor (which collects ECG and PPG data) and a Finapres Medical Systems Finometer PRO (which provides continuous reconstructed brachial artery pressure waveforms and derived cardiovascular parameters). Results: Data were collected from 40 participants, 4 of whom had been or were currently diagnosed with a heart condition at the time they enrolled in the study. An investigation revealed that features extracted from a BCG could be used to track changes in systolic blood pressure (Pearson correlation coefficient of 0.54 ± 0.15), dP/dt_max (Pearson correlation coefficient of 0.51 ± 0.18), and stroke volume (Pearson correlation coefficient of 0.54 ± 0.17). Conclusion: A collection of synchronized, heart-driven signals, including BCGs, ECGs, PPGs, and blood pressure waveforms, was acquired and made publicly available. An initial study indicated that bed-based ballistocardiography can be used to track beat-to-beat changes in systolic blood pressure and stroke volume. Significance: To the best of the authors' knowledge, no other database that includes time-aligned ECG, PPG, BCG, and continuous blood pressure data is available to the public. This dataset could be used by other researchers for algorithm testing and development in this fast-growing field of health assessment, without requiring these individuals to invest considerable time and resources into hardware development and data collection.


Water ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 82
Author(s):  
Omolola M. Adisa ◽  
Muthoni Masinde ◽  
Joel O. Botai

This study examines the (dis)similarity of two commonly used drought indices: the Standardized Precipitation Index (SPI), computed over accumulation periods of 1, 3, 6, and 12 months (hereafter SPI-1, SPI-3, SPI-6, and SPI-12, respectively), and the Effective Drought Index (EDI). The analysis is based on two drought monitoring indicators (derived from SPI and EDI), namely, the Drought Duration (DD) and Drought Severity (DS), across the 93 rainfall districts delineated by the South African Weather Service over South Africa from 1980 to 2019. In the study, the Pearson correlation coefficient dissimilarity and periodogram dissimilarity estimates were used. The results indicate a positive correlation for the Pearson correlation coefficient dissimilarity and a positive value for the periodogram dissimilarity in both the DD and DS. With the Pearson correlation coefficient dissimilarity, the study demonstrates that the SPI-1/EDI and SPI-3/EDI pairs exhibit the highest similarity for DD, while the SPI-6/EDI pair shows the highest similarity for DS. Moreover, dissimilarities are more pronounced in the SPI-12/EDI pair for both DD and DS. When the periodogram dissimilarity is used, the SPI-1/EDI and SPI-6/EDI pairs exhibit the highest similarity for DD, while the SPI-1/EDI pair displays the highest similarity for DS. Overall, the two measures show that the highest similarity is obtained in the SPI-1/EDI pair for DS. The results obtained in this study contribute to an in-depth understanding of the deviation between EDI and SPI values for South Africa, indicating that the two drought indices are interchangeable in some rainfall districts of South Africa for drought monitoring and prediction; this is a step towards the selection of appropriate drought indices.
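The abstract does not define how Drought Duration and Drought Severity are derived from the index series. One common convention, assumed here purely for illustration, treats each run of consecutive periods with the index below a threshold as one drought event, with duration equal to the run length and severity equal to the sum of absolute index values over the run. The threshold of -1.0, the function name and the series below are all hypothetical.

```python
def drought_events(index_values, threshold=-1.0):
    """Return a (duration, severity) tuple for each run of values below threshold.

    Assumed convention: duration is the number of consecutive periods below
    the threshold; severity is the sum of absolute index values in that run.
    """
    events = []
    duration, severity = 0, 0.0
    for value in index_values:
        if value < threshold:
            duration += 1
            severity += abs(value)
        elif duration:
            # The run has ended: record the event and reset the counters.
            events.append((duration, severity))
            duration, severity = 0, 0.0
    if duration:
        # Record a run that extends to the end of the series.
        events.append((duration, severity))
    return events

# Hypothetical monthly index values (could be SPI or EDI).
series = [0.2, -1.5, -2.0, -0.5, -1.0, -1.25, 0.3]
print(drought_events(series))  # [(2, 3.5), (1, 1.25)]
```

Applying the same function to an SPI series and an EDI series for one district yields paired DD and DS values whose similarity can then be compared.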

