A zero-delay sequential quantizer for individual sequences

Author(s):  
T. Linder ◽  
G. Lugosi
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Suman Pokhrel ◽  
Benjamin R. Kraemer ◽  
Scott Burkholz ◽  
Daria Mochly-Rosen

Abstract In December 2019, a novel coronavirus, termed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified as the cause of pneumonia with severe respiratory distress and outbreaks in Wuhan, China. The rapid global spread of SARS-CoV-2 resulted in the coronavirus disease 2019 (COVID-19) pandemic. Early in the pandemic, there was limited genetic variation in the virus. As millions of people became infected, multiple single amino acid substitutions emerged. Many of these substitutions have no consequences. However, some of the new variants show a greater infection rate, cause more severe disease, and have reduced sensitivity to current prophylaxes and treatments. Of particular importance in SARS-CoV-2 transmission are mutations that occur in the Spike (S) protein, the protein on the viral outer envelope that binds to the human angiotensin-converting enzyme 2 receptor (hACE2). Here, we conducted a comprehensive analysis of 441,168 individual virus sequences isolated from humans throughout the world. From these sequences, we identified 3540 unique amino acid substitutions in the S protein. Analysis of these S protein variants pinpointed important functional and structural sites in the protein. This information may guide the development of effective vaccines and therapeutics to help arrest the spread of the COVID-19 pandemic.
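The abstract does not say how substitutions were called from the 441,168 sequences. As a rough illustration only, the sketch below assumes each S protein sequence has already been aligned to a reference of the same length with no gaps, and records substitutions in the common notation of reference residue, 1-based position and new residue (for example D614G). The function names, and the choice to skip `X` (unknown residue), are hypothetical and are not the authors' pipeline.

```python
from collections import Counter
from typing import Iterable, List


def substitutions(reference: str, variant: str) -> List[str]:
    """List substitutions of `variant` relative to `reference`, e.g. 'D614G'.

    Assumption (hypothetical): both sequences are pre-aligned, equal in length
    and ungapped; 'X' (unknown residue) in the variant is ignored.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=1)
        if alt != ref and alt != "X"
    ]


def count_substitutions(reference: str, variants: Iterable[str]) -> Counter:
    """Count how many variant sequences carry each distinct substitution."""
    counts: Counter = Counter()
    for variant in variants:
        # A set, so each sequence contributes at most once per substitution.
        counts.update(set(substitutions(reference, variant)))
    return counts
```

With this sketch, the number of unique substitutions in a collection is simply `len(count_substitutions(reference, variants))`; the per-substitution counts show how widespread each one is.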


2019 ◽  
Vol 41 (1) ◽  
pp. 69-76
Author(s):  
Teresa Jakubczyk

Abstract The paper presents the results of an analysis of the duration of precipitation sequences and the amounts of precipitation in individual sequences in Legnica. The study aimed to analyse potential trends and regularities in atmospheric precipitation over the period 1966–2015. On this basis, an attempt was made to predict trends in subsequent years. The data were fitted to suitable distributions (the Weibull distribution for daily sums in sequences and the Pascal distribution for sequence durations), and the variation of particular indices, such as the mean value, variance and quartiles, was then analysed. The analysis was performed for five six-week periods in a year, from spring to late autumn, each analysed across consecutive five-year periods. The trends of the analysed indices observed over the fifty-year period are not statistically significant, which indicates stable precipitation conditions over the last half-century.
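The abstract names the fitted distributions and indices but not the estimators. The sketch below is a minimal illustration under stated assumptions: it fits the Pascal (negative binomial) distribution by the method of moments, using a parameterization that counts failures before the r-th success (an assumption, since the paper's parameterization is not given here), summarizes one index per five-year period, and estimates a linear trend by ordinary least squares. The function names are hypothetical, the Weibull fit is omitted, and the significance test the paper reports is not included.

```python
import statistics
from typing import Dict, Sequence, Tuple


def pascal_moments(durations: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimates (r, p) for a Pascal (negative binomial) law.

    Assumed parameterization: X counts failures before the r-th success, so
    E[X] = r(1 - p)/p and Var[X] = r(1 - p)/p**2, giving p = E/Var and
    r = E**2 / (Var - E).
    """
    m = statistics.mean(durations)
    v = statistics.variance(durations)  # sample variance
    if v <= m:
        raise ValueError("negative binomial fit requires variance > mean")
    return m * m / (v - m), m / v


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Mean, variance and quartiles of one index within one period."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
    }


def trend_slope(per_period: Sequence[float]) -> float:
    """Ordinary least-squares slope of an index against period number 0, 1, 2, ..."""
    n = len(per_period)
    if n < 2:
        raise ValueError("need at least two periods")
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(per_period)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(per_period))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den
```

Applied to ten consecutive five-year periods, `trend_slope` would give one slope per index; deciding whether such a slope is statistically significant requires a separate test.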


2021 ◽  
Author(s):  
Roshan Rao ◽  
Jason Liu ◽  
Robert Verkuil ◽  
Joshua Meier ◽  
John F. Canny ◽  
...  

Abstract Unsupervised protein language models trained across millions of diverse sequences learn structure and function of proteins. Protein language models studied to date have been trained to perform inference from individual sequences. The longstanding approach in computational biology has been to make inferences from a family of evolutionarily related sequences by fitting a model to each family independently. In this work we combine the two paradigms. We introduce a protein language model which takes as input a set of sequences in the form of a multiple sequence alignment. The model interleaves row and column attention across the input sequences and is trained with a variant of the masked language modeling objective across many protein families. The performance of the model surpasses current state-of-the-art unsupervised structure learning methods by a wide margin, with far greater parameter efficiency than prior state-of-the-art protein language models.
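To make "interleaves row and column attention" concrete, the sketch below treats a multiple sequence alignment as a grid of embedding vectors indexed by [sequence][position]. Row attention mixes information across positions within one sequence; column attention mixes information across sequences at one alignment position. This is a deliberately simplified, single-head illustration: the identity query/key/value projections and the plain residual connections are assumptions, and the published model includes further components (such as learned projections, multiple heads and normalization) that are not shown.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # [sequence (row)][alignment position (column)][embedding dim]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the inputs themselves
    (identity projections); a trained model would use learned weights.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def row_attention(msa: Grid) -> Grid:
    # Each row is one aligned sequence: attend across its positions.
    return [self_attention(row) for row in msa]


def column_attention(msa: Grid) -> Grid:
    # Each column is one alignment position: attend across the sequences.
    n_rows, n_cols = len(msa), len(msa[0])
    columns = [[msa[i][j] for i in range(n_rows)] for j in range(n_cols)]
    attended = [self_attention(col) for col in columns]
    # Transpose back to [row][column] layout.
    return [[attended[j][i] for j in range(n_cols)] for i in range(n_rows)]


def add(a: Grid, b: Grid) -> Grid:
    # Elementwise sum, used here as a residual connection.
    return [[[x + y for x, y in zip(u, v)] for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def axial_layer(msa: Grid) -> Grid:
    """One interleaved step: row attention, then column attention, each with a residual."""
    h = add(msa, row_attention(msa))
    return add(h, column_attention(h))
```

Stacking several `axial_layer` calls alternates the two attention directions, which is the interleaving the abstract describes; the output keeps the input's [sequence][position][dim] shape.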


2007 ◽  
Vol 53 (5) ◽  
pp. 1860-1866 ◽  
Author(s):  
Jacob Ziv ◽  
Neri Merhav

Author(s):  
Jorja G. Henikoff

A block is an ungapped local multiple alignment of amino acid sequences from a group of related proteins. Ideally, the contiguous stretch of residues represented by a block is conserved for biological function. Blocks have depth (the number of sequences) and width (the number of aligned positions). Several useful programs exist for finding blocks in a group of related sequences; I do not discuss them in detail here. Among these, Motif (Smith et al., 1990) and Asset (Neuwald and Green, 1994) both align blocks on occurrences of certain types of patterns found in the sequences; Gibbs (Lawrence et al., 1993; Neuwald et al., 1995) and MEME (Bailey and Elkan, 1994) both look for statistically optimal local alignments; and Macaw (Schuler et al., 1991) and Somap (Parry-Smith and Attwood, 1992) both give the user assistance in finding blocks interactively. After candidate blocks are identified by a block-finding method, they can be evaluated and assembled into a set representing the protein group, resulting in a multiple alignment consisting of ungapped regions separated by unaligned regions of variable length. The block assembly process is the subject of this chapter. Both the Blocks (Henikoff and Henikoff, 1996a) and Prints (Attwood and Beck, 1994) databases consist of such sets of blocks and between them currently represent 1,163 different protein groups. These collections of blocks are more sensitive and efficient for classifying new sequences into known protein groups than are collections of individual sequences, as demonstrated by comprehensive evaluations (Henikoff and Henikoff, 1994b, 1997), by genomic studies (Green et al., 1993), and by individual studies (Posfai et al., 1988; Henikoff, 1992, 1993; Attwood and Findlay, 1993; Pietrokovski, 1994; Brown, 1995). 
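The definition above can be expressed as a small data structure: one equal-length, gap-free segment per sequence, with depth and width read off directly. The sketch below is illustrative only; the `Block` class and its per-column conservation measure (fraction of sequences sharing the most common residue) are hypothetical and are not the scoring used by the Blocks or Prints databases.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """An ungapped local multiple alignment: one equal-length segment per sequence."""

    segments: List[str]

    def __post_init__(self) -> None:
        if not self.segments:
            raise ValueError("a block needs at least one segment")
        if len({len(s) for s in self.segments}) != 1:
            raise ValueError("all segments must have the same width")
        if any("-" in s for s in self.segments):
            raise ValueError("blocks are ungapped")

    @property
    def depth(self) -> int:
        # Number of sequences in the block.
        return len(self.segments)

    @property
    def width(self) -> int:
        # Number of aligned positions.
        return len(self.segments[0])

    def column_conservation(self) -> List[float]:
        """Hypothetical score: fraction of sequences sharing the most common residue per column."""
        return [
            Counter(s[j] for s in self.segments).most_common(1)[0][1] / self.depth
            for j in range(self.width)
        ]
```

For a toy block `Block(["ACDE", "ACDF", "GCDE"])`, the depth is 3, the width is 4, and the two fully conserved middle columns score 1.0.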
Issues that must be addressed during block assembly include the number of blocks provided to the assembly module by the block finders, block width, the number of times a block occurs in each sequence (zero to many), overlap of blocks, and the order of multiple blocks within each sequence. Once these issues are decided, it is necessary to score individual competing blocks and then competing sets of blocks.
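Two of the assembly issues listed above, overlap and the order of multiple blocks within each sequence, can be checked mechanically once each block's start position in each sequence is known. The sketch below is a minimal illustration under a simplifying assumption: each block occurs at most once per sequence (`None` marks an absent block), whereas the text notes that a block may occur zero to many times. The function name and input layout are hypothetical.

```python
from typing import Dict, List, Optional


def blocks_consistent(widths: List[int], starts: Dict[str, List[Optional[int]]]) -> bool:
    """Check that, in every sequence, blocks appear in list order without overlapping.

    `widths[k]` is the width of block k; `starts[seq_id][k]` is its 0-based start
    in that sequence, or None if the block is absent (assumed at most one occurrence).
    """
    for seq_id, offsets in starts.items():
        if len(offsets) != len(widths):
            raise ValueError(f"{seq_id}: expected one offset per block")
        prev_end = 0  # exclusive end of the previous block present in this sequence
        for width, start in zip(widths, offsets):
            if start is None:
                continue  # block missing from this sequence
            if start < prev_end:
                return False  # overlaps, or is out of order with, an earlier block
            prev_end = start + width
    return True
```

A set of blocks that passes this check could then be scored as a whole; how competing blocks and block sets are scored is a separate decision not modeled here.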

