Better Performance with Transformer: CPPFormer in precise prediction of cell-Penetrating Peptides

2021 ◽  
Vol 28 ◽  
Author(s):  
Yuyang Xue ◽  
Xiucai Ye ◽  
Lesong Wei ◽  
Xin Zhang ◽  
Tetsuya Sakurai ◽  
...  

With its superior performance, the Transformer model, which is based on the 'Encoder-Decoder' paradigm, has become the mainstream approach in natural language processing. Meanwhile, bioinformatics has embraced machine learning and made great progress in drug design and protein property prediction. Cell-penetrating peptides (CPPs) are a kind of permeable protein that can conveniently serve as a 'postman' in drug penetration tasks. However, only a small number of CPPs have been discovered by research, let alone put to practical use in drug permeability. Correctly identifying CPPs therefore opens up a new way to carry macromolecules into cells without other potentially harmful materials in the drug. Most previous work uses only simple machine learning techniques and hand-crafted features to construct a simple classifier. In CPPFormer, we adopt the attention structure of the Transformer, rebuild the network around the short length characteristic of CPPs, and use an automatic feature extractor together with a few manually engineered features to co-direct the predicted results. Compared with all previous methods and other classic text classification models, the empirical results show that our proposed deep model-based method achieves the best performance, 92.16% accuracy on the CPP924 dataset, and passes various index tests.
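The abstract does not describe the CPPFormer network itself, so the following is only a minimal sketch of the generic scaled dot-product attention operation that Transformer models are built on, not the authors' architecture. The function names and the toy inputs are hypothetical and chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Generic attention: each query attends over all keys.

    queries, keys, values are lists of equal-width float vectors
    (keys and values must have the same number of rows).
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

When all keys are identical, every position receives the same weight and the output is the plain average of the value vectors, which is a convenient sanity check for the implementation.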

2019 ◽  
Vol 15 (3) ◽  
pp. 206-211 ◽  
Author(s):  
Jihui Tang ◽  
Jie Ning ◽  
Xiaoyan Liu ◽  
Baoming Wu ◽  
Rongfeng Hu

Introduction: Machine learning is a useful tool for predicting cell-penetrating compounds as drug candidates. Materials and Methods: In this study, we developed a novel method for predicting the membrane-penetrating capability of cell-penetrating peptides (CPPs). We used orthogonal encoding to encode the amino acids, treating each amino acid position as one variable. IBM SPSS Modeler software and a dataset of 533 CPPs were then used for model screening. Results: The results indicated that the Support Vector Machine (SVM) model was suitable for predicting membrane-penetrating capability. For improvement, the three CPPs with the longest lengths were used to predict CPPs. The penetration capability can be predicted with an accuracy of close to 95%. Conclusion: The results indicate that using amino acid position as a variable can be a prospective method for predicting the membrane-penetrating capability of CPPs.
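One common reading of orthogonal encoding is one-hot encoding, where each sequence position becomes a block of 20 binary values. The sketch below follows that reading; the 20-letter alphabet, the zero-padding to a fixed `max_len`, and the function name are assumptions for illustration, since the abstract does not give the exact scheme.

```python
# Assumed alphabet: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def orthogonal_encode(peptide, max_len):
    """Return a flat 0/1 vector of length max_len * 20.

    Each position contributes a 20-element block with a single 1 at the
    index of its residue. Positions beyond the end of the peptide stay
    all-zero (padding is an assumption, not taken from the source).
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    width = len(AMINO_ACIDS)
    vector = [0] * (max_len * width)
    for pos, aa in enumerate(peptide[:max_len].upper()):
        if aa not in index:
            raise ValueError(f"unknown residue {aa!r} at position {pos}")
        vector[pos * width + index[aa]] = 1
    return vector
```

A vector produced this way can be passed to any classifier that accepts fixed-length numeric input, such as an SVM.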


2018 ◽  
Vol 7 (2.32) ◽  
pp. 462
Author(s):  
G Krishna Chaitanya ◽  
Dinesh Reddy Meka ◽  
Vakalapudi Surya Vamsi ◽  
M V S Ravi Karthik

The sentiment or emotion behind a tweet from Twitter or a post from Facebook can help us understand what opinions or feedback a person has. With the growth of user-generated blogs, posts and reviews across social media and online retail sites, the need to understand this user data acts as a catalyst for building recommender systems and driving business plans. User reviews on online retail stores influence the buying behavior of customers and thus add to the ever-growing need for sentiment analysis. Machine learning helps us read between the lines of tweets by providing us with various algorithms such as Naïve Bayes, SVM, etc. Sentiment analysis uses machine learning and natural language processing (NLP) to extract, classify and analyze tweets for sentiments (emotions). There are various packages and frameworks in R and Python that aid in sentiment analysis or text mining in general.


2020 ◽  
Vol 7 (10) ◽  
pp. 380-389
Author(s):  
Asogwa D.C ◽  
Anigbogu S.O ◽  
Anigbogu G.N ◽  
Efozia F.N

Author age prediction is the task of determining an author's age by studying the texts they have written. Predicting an author's age can be enlightening about the different trends, opinions, and social and political views of an age group. Marketers use this to promote a product or service to an age group according to its expressed interests and opinions. Methodologies in natural language processing have made it possible to predict an author's age from text by examining the variation of linguistic characteristics. Many machine learning algorithms have also been used in author age prediction. However, in social networks, computational linguists are challenged with numerous issues, just as machine learning techniques are performance driven and face their own challenges in realistic scenarios. This work developed a model that can predict an author's age from text with a machine learning algorithm (Naïve Bayes) using three types of features, namely content based, style based and topic based. The trained model gave a prediction accuracy of 80%.


Online discussion forums and blogs are vibrant platforms for cancer patients to express their views in the form of stories. These stories sometimes become a source of inspiration for patients who are anxiously searching for similar cases. This paper proposes a method using natural language processing and machine learning to analyze unstructured texts accumulated from patients' reviews and stories. The proposed methodology aims to identify behavior, emotions, side-effects, decisions and demographics associated with cancer victims. The pre-processing phase of our work involves extraction of web text, followed by text cleaning in which some special characters and symbols are omitted, and finally tagging the texts using the NLTK (Natural Language Toolkit) POS (part-of-speech) tagger. The post-processing phase trains seven machine learning classifiers (refer to Table 6). The Decision Tree classifier shows the highest precision (0.83) among the classifiers, while the area under the receiver operating characteristic curve (AUC) for the Support Vector Machine (SVM) classifier is the highest (0.98).
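The two metrics quoted in this abstract, precision and AUC, can be computed as in the following minimal sketch. It assumes binary labels coded as 0 and 1; the function names and toy data are hypothetical, and the paper's own evaluation code is not shown in the source.

```python
def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Precision is computed from hard predictions, while AUC is computed from ranking scores, which is why one classifier can lead on the first metric and another on the second.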


Author(s):  
Rashida Ali ◽  
Ibrahim Rampurawala ◽  
Mayuri Wandhe ◽  
Ruchika Shrikhande ◽  
Arpita Bhatkar

The Internet provides a medium to connect with individuals of similar or different interests, creating a hub. Since a huge number of people participate on these platforms, a user can receive a high volume of messages from different individuals, creating chaos and unwanted messages. These messages sometimes contain true information and sometimes false, which leads to confusion in the minds of users and is the first step towards spam messaging. A spam message is an irrelevant and unsolicited message sent by a known or unknown user, which may lead to a sense of insecurity among users. In this paper, different machine learning algorithms were trained and tested with natural language processing (NLP) to classify whether messages are spam or ham.
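The abstract does not name the algorithms that were compared. As one illustrative possibility, the sketch below shows a multinomial Naïve Bayes classifier with add-one smoothing and whitespace tokenization. The function names, the tokenization, and the toy messages are all assumptions, and the code expects both classes to appear in the training data.

```python
import math
from collections import Counter

def train_naive_bayes(messages, labels):
    """Count words per class; labels must be 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab

def classify(text, model):
    """Return the class with the higher log-posterior score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("spam", "ham"):
        score = math.log(class_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A usage example with made-up training messages:

```python
model = train_naive_bayes(
    ["win free prize now", "free cash offer",
     "see you at lunch", "meeting at noon tomorrow"],
    ["spam", "spam", "ham", "ham"],
)
label = classify("free prize", model)
```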


2018 ◽  
Vol 17 (8) ◽  
pp. 2715-2726 ◽  
Author(s):  
Balachandran Manavalan ◽  
Sathiyamoorthy Subramaniyam ◽  
Tae Hwan Shin ◽  
Myeong Ok Kim ◽  
Gwang Lee

2012 ◽  
pp. 13-22 ◽  
Author(s):  
João Gama ◽  
André C.P.L.F. de Carvalho

Machine learning techniques have been successfully applied to several real-world problems in areas as diverse as image analysis, Semantic Web, bioinformatics, text processing, natural language processing, telecommunications, finance, medical diagnosis, and so forth. A particular application where machine learning plays a key role is data mining, where machine learning techniques have been extensively used for the extraction of association, clustering, prediction, diagnosis, and regression models. This text presents our personal view of the main aspects, major tasks, frequently used algorithms, current research, and future directions of machine learning research. It is organized as follows: Background information concerning machine learning is presented in the second section. The third section discusses different definitions of machine learning. Common tasks faced by machine learning systems are described in the fourth section. Popular machine learning algorithms and the importance of the loss function are discussed in the fifth section. The sixth and seventh sections present the current trends and future research directions, respectively.




Author(s):  
Marina Sokolova ◽  
Stan Szpakowicz

This chapter presents applications of machine learning techniques to problems in natural language processing that require work with very large amounts of text. Such problems came into focus after the Internet and other computer-based environments acquired the status of the prime medium for text delivery and exchange. In all cases which the authors discuss, an algorithm has ensured a meaningful result, be it the knowledge of consumer opinions, the protection of personal information or the selection of news reports. The chapter covers elements of opinion mining, news monitoring and privacy protection, and, in parallel, discusses text representation, feature selection, and word category and text classification problems. The applications presented here combine scientific interest and significant economic potential.

