Representation learning applications in biological sequence analysis

Author(s):  
Hitoshi Iuchi ◽  
Taro Matsutani ◽  
Keisuke Yamada ◽  
Natsuki Iwano ◽  
Shunsuke Sumi ◽  
...  
2021 ◽  
Author(s):  
Hitoshi Iuchi ◽  
Taro Matsutani ◽  
Keisuke Yamada ◽  
Shunsuke Sumi ◽  
Shion Hosoda ◽  
...  

Remarkable advances in high-throughput sequencing have resulted in rapid data accumulation, and analyzing biological (DNA/RNA/protein) sequences to gain new biological insights has become both more critical and more challenging. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention, because biological sequences can be regarded as sentences and the k-mers within them as words. Embedding, which converts words into vectors, is an essential step in NLP. This transformation is called representation learning and can be applied to biological sequences. Vectorized biological sequences can be used for function and structure estimation, or as inputs for other probabilistic models. Given the importance of representation learning in biology and its growing use, we review here the existing knowledge on representation learning for biological sequence analysis.
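
The sentence/word analogy can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the review: the function names, the choice of k = 3, and the toy sequences are all assumptions. It splits each sequence into overlapping k-mers and turns it into a vector of k-mer counts. A count vector is a fixed, hand-built representation rather than a learned embedding; it stands in here for the vectorization step that a trained model would perform.

```python
from collections import Counter


def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers (the 'words' of the 'sentence')."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocabulary(sequences, k=3):
    """Map every k-mer seen in the corpus to a fixed column index."""
    vocab = sorted({tok for seq in sequences for tok in kmer_tokens(seq, k)})
    return {tok: idx for idx, tok in enumerate(vocab)}


def count_vector(sequence, vocab, k=3):
    """Represent one sequence as k-mer counts over the vocabulary.

    Assumption: a simple count vector, not a learned embedding.
    """
    counts = Counter(kmer_tokens(sequence, k))
    return [counts.get(tok, 0) for tok in vocab]


# Toy DNA sequences, made up for illustration.
corpus = ["ATGCGT", "ATGCGA"]
vocab = build_vocabulary(corpus)
vectors = [count_vector(seq, vocab) for seq in corpus]

for seq, vec in zip(corpus, vectors):
    print(seq, kmer_tokens(seq), vec)
```

In a representation learning setting, the token lists produced by `kmer_tokens` would instead be fed to a model that learns a dense vector per k-mer or per sequence; the resulting vectors could then serve as inputs to downstream predictors in the way the abstract describes.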

