table entry
Recently Published Documents


TOTAL DOCUMENTS

10
(FIVE YEARS 3)

H-INDEX

2
(FIVE YEARS 1)

Symmetry ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 2385
Author(s):  
Xue Sun ◽  
Chao-Chin Wu ◽  
Yan-Fang Liu

In computational biology, sequence alignment is a fundamental methodology. BLAST, provided by the National Center for Biotechnology Information (NCBI) in the USA, is one of the most widely used sequence alignment tools in bioinformatics. The BLAST server receives tens of thousands of queries every day on average. Among the stages of BLAST, hit detection, whose core data structure is a lookup table, is the most time-consuming. The most recent prior work proposed a lightweight BLASTP on CUDA GPUs with a hybrid query-index table for query sequences shorter than 512, which effectively improved query efficiency. According to the reported protein sequence length distribution, about 90% of sequences have a length of 1024 or less. In this paper, we propose an improved lightweight BLASTP that speeds up hit detection for longer query sequences. The maximum supported query length is increased from 512 to 1024, so one more bit is required to encode each sequence position. To meet this requirement, an extended hybrid query-index table (EHQIT) is proposed that accommodates three sequence positions in a four-byte table entry, so a single memory access is sufficient to retrieve all the position information as long as the number of hits is three or fewer. Moreover, if there are more than three hits for a possible word, all the position information is stored in contiguous table entries, which eliminates branch divergence and reduces the memory needed for pointers to an overflow buffer. A square symmetric scoring matrix, Blosum62, is used to determine the relative score of matching two characters in a sequence alignment. The experimental results show that for queries shorter than 512 our improved lightweight BLASTP outperforms the original lightweight BLASTP with an average speedup of 1.2. When the number of hit overflows increases, the speedup can be as high as two. For queries shorter than 1024, our improved lightweight BLASTP provides speedups ranging from 1.56 to 3.08 over CUDA-BLAST. In short, the improved lightweight BLASTP can replace the original one because it supports longer query sequences and provides better performance.
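
The packing idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bit layout is an assumption made here for illustration (10 bits per position, since 2^10 = 1024, with the remaining 2 bits of the 32-bit entry holding the hit count), and the names `pack_entry` and `unpack_entry` are hypothetical. The abstract does not specify how the actual EHQIT arranges its bits or how it marks overflow.

```python
# Illustrative sketch only: the field layout below is an assumption,
# not the layout used by the EHQIT in the paper.
POS_BITS = 10                         # 2**10 = 1024 distinct query positions
POS_MASK = (1 << POS_BITS) - 1        # 0x3FF
MAX_INLINE = 3                        # positions that fit in one 4-byte entry
COUNT_SHIFT = POS_BITS * MAX_INLINE   # hit count kept in the top 2 bits


def pack_entry(positions):
    """Pack up to three query positions (0..1023) into one 32-bit integer."""
    if len(positions) > MAX_INLINE:
        raise ValueError("more than three hits do not fit in one entry")
    entry = len(positions) << COUNT_SHIFT
    for slot, pos in enumerate(positions):
        if not 0 <= pos <= POS_MASK:
            raise ValueError(f"position {pos} does not fit in {POS_BITS} bits")
        entry |= pos << (slot * POS_BITS)
    return entry


def unpack_entry(entry):
    """Recover the list of positions stored by pack_entry."""
    count = entry >> COUNT_SHIFT
    return [(entry >> (slot * POS_BITS)) & POS_MASK for slot in range(count)]


if __name__ == "__main__":
    word_hits = [5, 1023, 0]
    entry = pack_entry(word_hits)
    print(hex(entry), unpack_entry(entry))
```

Under this assumed layout, three 10-bit positions use 30 bits, so one 32-bit read returns every hit for a word with three or fewer hits. With 9-bit positions (queries shorter than 512) the same three positions would need only 27 bits, which is why the longer query limit forces a tighter entry format.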


2020 ◽  
Vol 38 (2) ◽  
pp. 377-388 ◽  
Author(s):  
Hemin Yang ◽  
George F. Riley ◽  
Douglas M. Blough
10.29007/f89j ◽  
2018 ◽  
Author(s):  
Vivek Nigam ◽  
Limin Jia ◽  
Anduo Wang ◽  
Boon Thau Loo ◽  
Andre Scedrov

Network Datalog (NDlog) is a recursive query language that extends Datalog by allowing programs to be distributed in a network. In our initial efforts to formally specify NDlog's operational semantics, we have found several problems with the current evaluation algorithm being used, including unsound results, unintended multiple derivations of the same table entry, and divergence. In this paper, we make a first step towards correcting these problems by formally specifying a new operational semantics for NDlog and proving its correctness for the fragment of non-recursive programs. We also argue that if termination is guaranteed, then the results also extend to recursive programs. Finally, we identify a number of potential implementation improvements to NDlog.


2013 ◽  
Vol 46 (3) ◽  
pp. 316-334 ◽  
Author(s):  
Alison Wray

Creating a timeline for formulaic language is far from simple, because several partially independent lines of research have contributed to the emerging picture. Each exhibits cycles of innovation and consolidation over time: domains take a leading role in developing new knowledge and then fall back, while another area comes to the fore. Thus, some of the first observations about formulaic language, back in the nineteenth century, were in the clinical domain of aphasia studies. By the early to mid twentieth century it was theories of language structure that had most to say, until eclipsed by the Chomskian model, which saw little significance in lexicalised units larger than the word (an issue discussed by Jackendoff 2002; see table entry). Meanwhile, changes in language teaching methodology in the mid to late twentieth century increasingly urged teachers to ask how adult learners could best master multiword strings to improve fluency and idiomaticity – a question still asked today. By the end of the twentieth century, new technological advances revealed frequency in usage as a probable agent of formulaicity, and these chimed with new models of lexical knowledge based on neural pathways and networks that could be strengthened by repeated exposure. Drawing on these models, we have seen, as we move into the twenty-first century, the development of new approaches to modelling language as a system – emergent grammars, including Construction Grammar – that are more accommodating of large, internally complex units. And finally, as we gradually understand more about how the brain accesses and retrieves linguistic material, we are seeing a resurgence of interest in formulaic language in neurological and clinical contexts.


Author(s):  
Martin H. Weik