Charge and hydrophobicity are key features in sequence-trained machine learning models for predicting the biophysical properties of clinical-stage antibodies

2019 ◽  
Author(s):  
Max Hebditch ◽  
Jim Warwicker

Abstract Improved understanding of the properties that mediate protein solubility and resistance to aggregation is important for developing biopharmaceuticals, and more generally in biotechnology and synthetic biology. Recent acquisition of large datasets for antibody biophysical properties enables the search for predictive models. In this report, machine learning methods are used to derive models for 12 biophysical properties. A physicochemical perspective is maintained in analysing the models, leading to the observation that models cluster largely according to charge (cross-interaction measurements) and hydrophobicity (self-interaction methods). These two properties also overlap in some cases, for example in a new interpretation of variation in hydrophobic interaction chromatography. Since the models are developed from differences of antibody variable loops, the next stage is to extend models to more diverse protein sets. Availability: The web application for the sequence-based algorithms is available on the protein-sol webserver, at https://protein-sol.manchester.ac.uk/abpred, with models and virtualisation software available at https://protein-sol.manchester.ac.uk/software.

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e8199 ◽  
Author(s):  
Max Hebditch ◽  
Jim Warwicker

Improved understanding of the properties that mediate protein solubility and resistance to aggregation is important for developing biopharmaceuticals, and more generally in biotechnology and synthetic biology. Recent acquisition of large datasets for antibody biophysical properties enables the search for predictive models. In this report, machine learning methods are used to derive models for 12 biophysical properties. A physicochemical perspective is maintained in analysing the models, leading to the observation that models cluster largely according to charge (cross-interaction measurements) and hydrophobicity (self-interaction methods). These two properties also overlap in some cases, for example in a new interpretation of variation in hydrophobic interaction chromatography. Since the models are developed from differences of antibody variable loops, the next stage is to extend models to more diverse protein sets. Availability: The web application for the sequence-based algorithms is available on the protein-sol webserver, at https://protein-sol.manchester.ac.uk/abpred, with models and virtualisation software available at https://protein-sol.manchester.ac.uk/software.
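
As a rough illustration of the two feature families named in the abstract, the Python sketch below computes two simple sequence descriptors. It is not the authors' model or the protein-sol implementation: the function names are hypothetical, the charge estimate simply counts K/R as +1 and D/E as -1 (ignoring histidine, termini and pH), and hydrophobicity is taken as the mean Kyte-Doolittle hydropathy.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def approx_net_charge(seq: str) -> int:
    """Crude net charge: +1 per K/R, -1 per D/E (assumption: His and termini ignored)."""
    seq = seq.upper()
    positive = sum(seq.count(res) for res in "KR")
    negative = sum(seq.count(res) for res in "DE")
    return positive - negative


def mean_hydropathy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy over the standard residues in seq."""
    values = [KYTE_DOOLITTLE[res] for res in seq.upper() if res in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)
```

Descriptors of this kind could serve as inputs to a regression or classification model; which features the published models actually weight is described in the paper itself.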


2022 ◽  
Vol 2 (14) ◽  
pp. 26-34
Author(s):  
Nguyen Manh Thang ◽  
Tran Thi Luong

Abstract: Most modern applications aim to be as accessible as possible to users over the Internet. Many applications store their data in cyberspace to support work and entertainment more effectively, for example Google Docs, email, cloud storage, maps, weather and news services. Attacks on Web resources most often occur at the application level, in the form of HTTP/HTTPS requests to the site, where traditional firewalls have limited capabilities for analysing and detecting attacks. Special tools, Web Application Firewalls (WAFs), exist to protect Web resources from application-level attacks. This article presents an anomaly detection algorithm and describes how it works in the open-source web application firewall ModSecurity, using machine learning methods with 8 suggested features to detect attacks on web applications.
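
The abstract does not list its 8 features or its learning method, so the following Python sketch only illustrates the general idea of feature-based anomaly detection on HTTP requests. Everything in it is an assumption: the three features (decoded length, count of special characters, count of digits), the z-score rule, and all function names are hypothetical, and no ModSecurity API is used.

```python
import math
from urllib.parse import unquote

# Characters that are common in injection payloads (illustrative choice only).
SPECIAL_CHARS = set("'\"<>;()=%-")


def extract_features(request_line: str) -> list:
    """Map a raw request line to a small numeric feature vector."""
    decoded = unquote(request_line)
    return [
        float(len(decoded)),
        float(sum(ch in SPECIAL_CHARS for ch in decoded)),
        float(sum(ch.isdigit() for ch in decoded)),
    ]


def fit_baseline(benign_requests):
    """Estimate per-feature mean and standard deviation from benign traffic."""
    rows = [extract_features(r) for r in benign_requests]
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
        for col, m in zip(columns, means)
    ]
    return means, stds


def is_anomalous(request_line: str, baseline, threshold: float = 3.0) -> bool:
    """Flag a request if any feature lies more than `threshold` deviations from the mean."""
    means, stds = baseline
    features = extract_features(request_line)
    return any(abs(x - m) / s > threshold for x, m, s in zip(features, means, stds))
```

A real deployment would train on far more traffic and would likely use a learned model rather than a fixed z-score threshold; the sketch only shows how request text becomes features that a detector can score.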


2020 ◽  
Author(s):  
Trang T. Le ◽  
Jason H. Moore

Abstract Summary: treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree’s leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and the importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods. Availability: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Abu Sayed Chowdhury ◽  
Sarah M. Reehl ◽  
Kylene Kehn-Hall ◽  
Barney Bishop ◽  
Bobbie-Jo M. Webb-Robertson

Abstract The emergence of viral epidemics throughout the world is of concern due to the scarcity of available effective antiviral therapeutics. The discovery of new antiviral therapies is imperative to address this challenge, and antiviral peptides (AVPs) represent a valuable resource for the development of novel therapies to combat viral infection. We present a new machine learning model to distinguish AVPs from non-AVPs using the most informative features derived from the physicochemical and structural properties of their amino acid sequences. To focus on those features that are most likely to contribute to antiviral performance, we filter potential features based on their importance for classification. These feature selection analyses suggest that secondary structure is the most important peptide sequence feature for predicting AVPs. Our Feature-Informed Reduced Machine Learning for Antiviral Peptide Prediction (FIRM-AVP) approach achieves a higher accuracy than either the model with all features or current state-of-the-art single classifiers. Understanding the features that are associated with AVP activity is a core need to identify and design new AVPs in novel systems. The FIRM-AVP code and standalone software package are available at https://github.com/pmartR/FIRM-AVP with an accompanying web application at https://msc-viz.emsl.pnnl.gov/AVPR.


2020 ◽  
Vol 79 ◽  
pp. 01012
Author(s):  
Konstantin Sergeevich Nikolaev ◽  
Fail Mubarakovich Gafarov ◽  
Pavel Nikolaevich Ustin

This paper discusses the technical details of obtaining and processing data to determine a set of characteristics of social-network texts and of genre preferences in movies and music for students of Kazan Federal University with different levels of academic performance (successful, average, unsuccessful). These characteristics are selected using machine learning methods (Word2Vec, tSNE). The resulting data are used to develop a functional psychometric model of cognitive behavioral predictors of an individual’s activity within the framework of their educational activities. We also developed a web application for visualizing the obtained data using the Flask engine.


Network attacks have become one of the most important security problems in today’s world. The use of computers, mobile devices, sensors, IoT devices, Big Data, web applications and servers, clouds and other computing resources in networks has increased sharply. With the rapid growth in network traffic, hackers and malicious users are devising new ways of network intrusion. Many techniques based on data mining and machine learning have been developed to detect these intrusions. Machine learning algorithms aim to detect anomalies using supervised and unsupervised approaches. Both detection techniques have been implemented using IDS datasets such as DARPA98, KDDCUP99, NSL-KDD, ISCX and ISOT. UNSW-NB15 is the latest dataset; it contains nine different modern attack types and a wide variety of real normal activities. In this paper, a detailed survey of machine-learning-based techniques applied to the UNSW-NB15 dataset is carried out, suggesting that UNSW-NB15 is more complex than other datasets and can be regarded as a new benchmark dataset for evaluating NIDSs.


2021 ◽  
Vol 70 (11) ◽  
Author(s):  
Wenjia Liu ◽  
Nanjiao Ying ◽  
Qiusi Mo ◽  
Shanshan Li ◽  
Mengjie Shao ◽  
...  

Introduction. Klebsiella pneumoniae, a gram-negative bacterium, is a common pathogen causing nosocomial infection. The drug-resistance rate of K. pneumoniae is increasing year by year, posing a severe threat to public health worldwide. K. pneumoniae has been listed as one of the pathogens causing the global crisis of antimicrobial resistance in nosocomial infections. We need to explore the drug resistance of K. pneumoniae for clinical diagnosis. Single nucleotide polymorphisms (SNPs) are of high density and have rich genetic information in whole-genome sequencing (WGS), which can affect the structure or expression of proteins. SNPs can be used to explore mutation sites associated with bacterial resistance. Hypothesis/Gap Statement. Machine learning methods can detect genetic features associated with the drug resistance of K. pneumoniae from whole-genome SNP data. Aims. This work used Fast Feature Selection (FFS) and Codon Mutation Detection (CMD) machine learning methods to detect genetic features related to drug resistance of K. pneumoniae from whole-genome SNP data. Methods. WGS data on resistance of K. pneumoniae strains to four antibiotics (tetracycline, gentamicin, imipenem, amikacin) were downloaded from the European Nucleotide Archive (ENA). Sequence alignments were performed with MUMmer 3 to complete SNP calling using K. pneumoniae HS11286 chromosome as the reference genome. The FFS algorithm was applied to feature selection of the SNP dataset. The training set was constructed based on mutation sites with mutation frequency >0.995. Based on the original SNP training set, 70% of SNPs were randomly selected from each dataset as the test set to verify the accuracy of the training results. Finally, the resistance genes were obtained by the CMD algorithm and Venny. Results. The number of strains resistant to tetracycline, gentamicin, imipenem and amikacin was 931, 1048, 789 and 203, respectively. Machine learning algorithms were applied to the SNP training set and test set, and 28 and 23 resistance genes were predicted, respectively. The 28 resistance genes in the training set included 22 genes in the test set, which verified the accuracy of gene prediction. Among them, some genes (KPHS_35310, KPHS_18220, KPHS_35880, etc.) corresponded to known resistance genes (Eef2, lpxK, MdtC, etc.). Logistic regression classifiers were established based on the identified SNPs in the training set. The areas under the curve (AUCs) for the four antibiotics were 0.939, 0.950, 0.912 and 0.935, showing a strong ability to predict bacterial resistance. Conclusion. Machine learning methods can effectively be used to predict resistance genes and associated SNPs. The FFS and CMD algorithms have wide applicability. They can be used for the drug-resistance analysis of any microorganism with genomic variation and phenotypic data. This work lays a foundation for resistance research in clinical applications.
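
The abstract reports classifier quality as the area under the ROC curve (AUC). For readers unfamiliar with the metric, the sketch below shows one standard way to compute it from predicted scores: as the fraction of (resistant, susceptible) pairs in which the resistant sample receives the higher score, with ties counted as half. This is a generic illustration in Python, not the authors' code; the function name and the label convention (1 = resistant, 0 = susceptible) are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.912 to 0.950 should be read. The pairwise loop is quadratic in sample count; library implementations use a sort-based equivalent.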


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Chengyuan Sha ◽  
Miroslava Cuperlovic-Culf ◽  
Ting Hu

Abstract Background: The direct link between metabolism and cell and organism phenotype in health and disease makes metabolomics, a high-throughput study of small molecular metabolites, an essential methodology for understanding and diagnosing disease development and progression. Machine learning methods have seen increasing adoption in metabolomics thanks to their powerful prediction abilities. However, the “black-box” nature of many machine learning models remains a major challenge for wide acceptance and utility, as it makes the decision process difficult to interpret. This challenge is particularly pronounced in biomedical research, where understanding the underlying decision-making mechanism is essential for ensuring safety and gaining new knowledge. Results: In this article, we propose a novel computational framework, Systems Metabolomics using Interpretable Learning and Evolution (SMILE), for supervised metabolomics data analysis. Our methodology uses an evolutionary algorithm to learn interpretable predictive models and to identify the most influential metabolites and their interactions in association with disease. Moreover, we have developed a web application with a graphical user interface that can be used for easy analysis, interpretation and visualization of the results. The performance of the method and the use of the web interface are demonstrated on metabolomics data for Alzheimer’s disease (AD). Conclusions: SMILE was able to identify several metabolites influential in AD and to provide interpretable predictive models that can be further used for a better understanding of the metabolic background of AD. SMILE addresses the emerging issue of interpretability and explainability in machine learning, and contributes to more transparent and powerful applications of machine learning in bioinformatics.

