Transformer protein language models are unsupervised structure learners

2020
Author(s): Roshan M Rao, Joshua Meier, Tom Sercu, Sergey Ovchinnikov, Alexander Rives

Unsupervised contact prediction is central to uncovering physical, structural, and functional constraints for protein structure determination and design. For decades, the predominant approach has been to infer evolutionary constraints from a set of related sequences. In the past year, protein language models have emerged as a potential alternative, but performance has fallen short of state-of-the-art approaches in bioinformatics. In this paper we demonstrate that Transformer attention maps learn contacts from the unsupervised language modeling objective. We find the highest capacity models that have been trained to date already outperform a state-of-the-art unsupervised contact prediction pipeline, suggesting these pipelines can be replaced with a single forward pass of an end-to-end model.
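
The recipe this implies — symmetrize each head's attention map, apply average product correction (APC), and combine heads with a small learned regression — can be sketched in a few lines. The following is a minimal, illustrative version assuming you already have the stacked per-head attention maps for one sequence; the tensor layout, function names, and learned head weights are our assumptions, not the authors' code.

```python
import torch

def apc(x: torch.Tensor) -> torch.Tensor:
    """Average product correction: subtracts the rank-one background
    signal that otherwise dominates covariation-style scores."""
    a1 = x.sum(-1, keepdim=True)
    a2 = x.sum(-2, keepdim=True)
    a12 = x.sum((-1, -2), keepdim=True)
    return x - a1 * a2 / a12

def attention_to_contacts(attn: torch.Tensor,
                          head_weights: torch.Tensor) -> torch.Tensor:
    # attn: (H, L, L) attention maps stacked over all layers and heads
    # head_weights: (H,) weights of a small logistic regression over heads
    sym = attn + attn.transpose(-1, -2)    # contact maps are symmetric
    sym = apc(sym)                         # correct each head's map
    logits = torch.einsum("h,hij->ij", head_weights, sym)
    return torch.sigmoid(logits)           # (L, L) contact probabilities
```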

2021
Author(s): Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, ...

Unsupervised protein language models trained across millions of diverse sequences learn structure and function of proteins. Protein language models studied to date have been trained to perform inference from individual sequences. The longstanding approach in computational biology has been to make inferences from a family of evolutionarily related sequences by fitting a model to each family independently. In this work we combine the two paradigms. We introduce a protein language model which takes as input a set of sequences in the form of a multiple sequence alignment. The model interleaves row and column attention across the input sequences and is trained with a variant of the masked language modeling objective across many protein families. The performance of the model surpasses current state-of-the-art unsupervised structure learning methods by a wide margin, with far greater parameter efficiency than prior state-of-the-art protein language models.
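
As a rough illustration of the interleaved row and column attention described above, here is a minimal PyTorch block that operates on a single MSA tensor. It uses stock (untied) multi-head attention and omits the feed-forward sublayer, masking, and the tied row attention of the actual model; shapes and names are our assumptions.

```python
import torch
import torch.nn as nn

class RowColumnBlock(nn.Module):
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (rows, cols, dim) — rows are aligned sequences, cols are positions
        h = self.norm1(x)
        h, _ = self.row_attn(h, h, h)          # mix positions within each sequence
        x = x + h
        h = self.norm2(x).transpose(0, 1)      # (cols, rows, dim)
        h, _ = self.col_attn(h, h, h)          # mix sequences at each position
        return x + h.transpose(0, 1)

# e.g. a 16-sequence, 128-column MSA embedded in 64 dimensions:
# out = RowColumnBlock(64, 4)(torch.randn(16, 128, 64))
```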


2016, Vol 4, pp. 329-342
Author(s): Joris Pelemans, Noam Shazeer, Ciprian Chelba

We present Sparse Non-negative Matrix (SNM) estimation, a novel probability estimation technique for language modeling that can efficiently incorporate arbitrary features. We evaluate SNM language models on two corpora: the One Billion Word Benchmark and a subset of the LDC English Gigaword corpus. Results show that SNM language models trained with n-gram features are a close match for the well-established Kneser-Ney models. The addition of skip-gram features yields a model that is in the same league as the state-of-the-art recurrent neural network language models, as well as complementary: combining the two modeling techniques yields the best known result on the One Billion Word Benchmark. On the Gigaword corpus further improvements are observed using features that cross sentence boundaries. The computational advantages of SNM estimation over both maximum entropy and neural network estimation are probably its main strength, promising an approach that has large flexibility in combining arbitrary features and yet scales gracefully to large amounts of data.
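
The scoring rule at the core of SNM is easy to state: the probability of a word given its history is a normalized sum of non-negative weights over the (feature, word) pairs that the history activates. Below is a toy sketch under our own simplifying assumptions (n-gram suffix features only, brute-force normalization over the vocabulary); the real estimator is vastly more efficient and learns the weights from data.

```python
from collections import defaultdict

class SparseNonNegativeLM:
    def __init__(self, vocab, order=3):
        self.vocab = list(vocab)
        self.order = order
        # weights[feature][word] >= 0, stored sparsely
        self.weights = defaultdict(lambda: defaultdict(float))

    def features(self, history):
        # one feature per suffix of the truncated history, including the
        # empty context so every word always receives some mass
        hist = tuple(history[-(self.order - 1):])
        return [hist[i:] for i in range(len(hist) + 1)]

    def prob(self, word, history):
        feats = self.features(history)
        score = sum(self.weights[f][word] for f in feats)
        norm = sum(self.weights[f][w] for f in feats for w in self.vocab)
        return score / norm if norm > 0 else 1.0 / len(self.vocab)
```

Skip-gram features, which the abstract credits with closing the gap to recurrent models, would slot in as additional tuples returned by features().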


2021
Author(s): Aman Pathak

Natural language processing (NLP) has witnessed substantial advances in the past three years. With the introduction of the Transformer and the self-attention mechanism, language models are now able to learn better representations of natural language. These attention-based models have achieved exceptional state-of-the-art results on various NLP benchmarks. One of the contributing factors is the growing use of transfer learning: models are pre-trained on unsupervised objectives using rich datasets, developing fundamental language abilities that are then fine-tuned on supervised data for downstream tasks. Surprisingly, recent research has led to a novel era of powerful models that no longer require fine-tuning. The objective of this paper is to present a comparative analysis of some of the most influential language models. The comparison covers problem-solving methodologies, model architectures, compute requirements, accuracy on standard NLP benchmarks, and shortcomings.
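
For concreteness, the unsupervised pre-training objective behind most of the surveyed models is masked language modeling: corrupt a random fraction of tokens and train the network to recover them. A minimal sketch follows, where the model is a stand-in for any token-level Transformer; real pipelines also spare special tokens and sometimes keep or randomly replace tokens instead of masking.

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(model, tokens: torch.Tensor, mask_id: int, p: float = 0.15):
    # tokens: (batch, seq_len) integer token ids
    mask = torch.rand(tokens.shape) < p              # choose positions to hide
    corrupted = tokens.masked_fill(mask, mask_id)    # replace with [MASK]
    logits = model(corrupted)                        # (batch, seq_len, vocab)
    # the loss is computed only at the masked positions
    return F.cross_entropy(logits[mask], tokens[mask])
```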


Information, 2021, Vol 12 (10), 409
Author(s): Panagiotis Kasnesis, Lazaros Toumanidis, Charalampos Z. Patrikakis

The widespread use of social networks has brought to the foreground a very important issue: the veracity of the information circulating within them. Many natural language processing methods have been proposed in the past to assess a post’s content with respect to its reliability; however, end-to-end approaches still fall short of human ability. To overcome this, we propose a more modular approach that produces indicators of a post’s subjectivity and of the stance taken by the replies it has received to date, letting users decide for themselves whether to trust the provided information. To this end, we fine-tuned state-of-the-art transformer-based language models and compared their performance with previous related work on stance detection and subjectivity analysis. Finally, we discuss the obtained results.
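
A minimal sketch of the fine-tuning setup this describes, using the Hugging Face Transformers library; the checkpoint, the common support/deny/query/comment stance scheme, and the example pair are our illustrative assumptions, not the paper’s exact configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["support", "deny", "query", "comment"]   # assumed stance scheme
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

# encode a (source post, reply) pair as a sentence pair
inputs = tokenizer("The dam has collapsed.",
                   "Source? I can't find this reported anywhere.",
                   return_tensors="pt", truncation=True)

# one fine-tuning step on this labeled pair (gold label: "query")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**inputs, labels=torch.tensor([LABELS.index("query")])).loss
loss.backward()
optimizer.step()

# after training on the full dataset, predict the stance of new replies
with torch.no_grad():
    pred = model(**inputs).logits.argmax(-1).item()
print(LABELS[pred])
```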


Author(s): Carl E. Henderson

Over the past few years it has become apparent in our multi-user facility that the computer system and software supplied in 1985 with our CAMECA CAMEBAX-MICRO electron microprobe analyzer has the greatest potential for improvement and updating of any component of the instrument. While the standard CAMECA software running on a DEC PDP-11/23+ computer under the RSX-11M operating system can perform almost any task required of the instrument, the commands are not always intuitive and can be difficult to remember for the casual user (of which our laboratory has many). Given the widespread and growing use of other microcomputers (such as PCs and Macintoshes) by users of the microprobe, the PDP has become the “oddball” and has also fallen behind the state of the art in processing speed and disk storage capabilities. Upgrade paths within products available from DEC are considered too expensive for the benefits received. After using a Macintosh for other tasks in the laboratory, such as instrument use and billing records, word processing, and graphics display, its unique and “friendly” user interface suggested an easier-to-use system for computer control of the electron microprobe automation. Specifically, a Macintosh IIx was chosen for its capacity for third-party add-on cards used in instrument control.

