character sequence
Recently Published Documents


TOTAL DOCUMENTS

25
(FIVE YEARS 9)

H-INDEX

5
(FIVE YEARS 2)

Author(s):  
B. Premjith ◽  
K. P. Soman

Morphological synthesis is one of the main components of Machine Translation (MT) frameworks, especially when one or both of the source and target languages are morphologically rich. Morphological synthesis is the process of combining two words or two morphemes according to the Sandhi rules of the morphologically rich language. Malayalam and Tamil are two Indian languages that are both morphologically rich and agglutinative. Morphological synthesis of a word in these two languages is challenging mainly for the following reasons: (1) abundance in morphology; (2) complex Sandhi rules; (3) the possibility in Malayalam of forming words by combining words that belong to different syntactic categories (for example, noun and verb); and (4) the construction of a sentence by combining multiple words. We formulated the task of the morphological generation of nouns and verbs of Malayalam and Tamil as a character-to-character sequence tagging problem. In this article, we used deep learning architectures such as the Recurrent Neural Network (RNN), Long Short-Term Memory network (LSTM), and Gated Recurrent Unit (GRU), along with their stacked and bidirectional versions, to implement morphological synthesis at the character level. In addition, we investigated the performance of combining these deep learning architectures with a Conditional Random Field (CRF) in the morphological synthesis of nouns and verbs in Malayalam and Tamil. We observed that adding a CRF to the Bidirectional LSTM/GRU architecture achieved more than 99% accuracy in the morphological synthesis of Malayalam and Tamil nouns and verbs.
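The character-to-character tagging framing described above can be illustrated with a small sketch. This is my own simplified tag scheme and a made-up Sandhi-like rule, not the authors' exact formulation: each character of the concatenated input receives a tag, and applying the tags yields the joined surface form.

```python
# Illustrative sketch of morphological synthesis as character-level
# sequence tagging (assumed tag scheme, not the paper's exact one).
def apply_tags(chars, tags):
    """K = keep the character, D = delete it, S:<c> = substitute with <c>."""
    out = []
    for ch, tag in zip(chars, tags):
        if tag == "K":
            out.append(ch)
        elif tag == "D":
            continue  # boundary character removed by the Sandhi rule
        elif tag.startswith("S:"):
            out.append(tag[2:])
    return "".join(out)

# Hypothetical example: joining "mara" + "illa", where the word-final
# vowel is deleted at the boundary (an invented Sandhi-like rule).
chars = list("mara" + "illa")
tags  = ["K", "K", "K", "D", "K", "K", "K", "K"]
print(apply_tags(chars, tags))  # marilla
```

In the paper's setting, a neural tagger (e.g., BiLSTM-CRF) would predict the tag sequence; the deterministic step above only shows how predicted tags map back to a surface form.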


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Pengfei Meng ◽  
Shuangcheng Jia ◽  
Qian Li

Sequence recognition of natural scene images has long been an important research topic in the field of computer vision. CRNN has proven to be a popular end-to-end character sequence recognition network. However, CRNN does not account for wide characters and is less effective at recognizing long, dense sequences of small characters. To address these shortcomings, we propose an improved CRNN network, named CRNN-RES, based on BiLSTM and multiple receptive fields. Specifically, on the one hand, CRNN-RES uses a dual pooling kernel to enhance the CNN's ability to extract features. On the other hand, the last RNN layer is changed from a BiLSTM to a shared-parameter BiLSTM network using recursive residuals, which reduces the number of network parameters and improves accuracy. In addition, we designed a structure, called the CRFC layer, that can flexibly configure the length of the input data sequence in the RNN layer. Extensive experiments comparing the proposed CRNN-RES network with the original CRNN show that, when recognizing English characters and digits, CRNN-RES has 8,197,549 parameters, 133,752 fewer than CRNN. On the public datasets ICDAR 2003 (IC03), ICDAR 2013 (IC13), IIIT 5K-Word (IIIT5K), and Street View Text (SVT), CRNN-RES obtains accuracies of 96.90%, 89.85%, 83.63%, and 82.96%, higher than CRNN by 1.40%, 3.15%, 5.43%, and 2.16%, respectively.
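The "shared parameter BiLSTM using recursive residuals" idea, where the same recurrent block is reapplied with residual connections so added depth costs no extra parameters, can be sketched abstractly. This is an assumption about the mechanism, with a toy stand-in for the BiLSTM pass, not the paper's code:

```python
# Sketch of a recursive-residual, shared-parameter layer: the SAME
# `step` function is applied `depth` times with residual additions,
# so increasing depth adds no new parameters.
def recursive_residual(step, x, depth):
    """Apply step() recursively with residual connections: h <- h + step(h)."""
    h = x
    for _ in range(depth):
        h = [a + b for a, b in zip(h, step(h))]
    return h

# Toy step: scale every feature by 0.5 (stands in for one BiLSTM pass).
out = recursive_residual(lambda h: [0.5 * v for v in h], [1.0, 2.0], depth=2)
print(out)  # [2.25, 4.5]
```

A conventional stack of two distinct BiLSTM layers would double the recurrent parameter count; reusing one layer recursively is what yields the reported 133,752-parameter saving.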


2020 ◽  
Author(s):  
Jaroslaw Roman Lelonkiewicz ◽  
Maria Ktori ◽  
Davide Crepaldi

During visual word processing, readers identify chunks of co-occurring letters and code for their typical position within words. Using an artificial script, we examined whether these phenomena can be explained by the ability to extract visual regularities from the environment. Participants were first familiarized with a lexicon of pseudoletter strings, each comprising an affix-like chunk that either followed (Experiment 1) or preceded (Experiment 2) a random character sequence. In the absence of any linguistic information, chunks could be defined only by their statistical properties: similarly to affixes in real language, chunks occurred frequently and assumed a specific position within strings. In a later testing phase, we found that participants were more likely to attribute a previously unseen string to the familiarization lexicon if it contained an affix, and if the affix appeared in its typical position. Importantly, these findings suggest that readers may chunk words using a general, language-agnostic cognitive mechanism that captures statistical regularities in the learning materials. [NOTE: Please cite this paper as: Lelonkiewicz, J. R., Ktori, M., & Crepaldi, D. (2020). Morphemes as letter chunks: Discovering affixes through visual regularities. Journal of Memory and Language, 115, 104152. https://doi.org/10.1016/j.jml.2020.104152]
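The two statistical properties that define a chunk here, high frequency and a fixed position within strings, are easy to make concrete. The following is my own illustrative computation over an invented mini-lexicon, not the authors' stimuli or analysis:

```python
# Illustrative sketch: an affix-like chunk is identifiable purely from
# statistics if it occurs often AND at a consistent string position.
from collections import Counter

# Invented mini-lexicon: "XY" behaves like a suffix in three of four strings.
strings = ["qzXY", "rtXY", "pmXY", "koqz"]
chunk = "XY"

positions = Counter(s.find(chunk) for s in strings if chunk in s)
frequency = sum(positions.values()) / len(strings)

print(frequency)                 # 0.75 -> the chunk is frequent
print(positions.most_common(1))  # [(2, 3)] -> and positionally fixed
```

A chunk scoring high on both measures is exactly what the familiarization lexicon offered participants in place of any linguistic cue.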


JURNAL IQRA ◽  
2020 ◽  
Vol 5 (1) ◽  
pp. 171-182
Author(s):  
Dede Ramdani ◽  
Deasy Nurma Hidayat ◽  
Asep Sumarna ◽  
Icmiati Santika

This article aimed to identify the character traits Muslims prioritize in facing the Industrial Revolution 4.0 and Society 5.0 eras. The data analysis technique in this study uses categorical statistics on questionnaires distributed in the Bandung area and its surroundings. The questionnaire results showed the ideal character sequence in Muslim children, namely: honest, disciplined, responsible, polite, confident, hardworking, tolerant, creative and innovative, caring, productive, and religious. It can be concluded that these characters can be the foundation for Muslim children facing the development of the Industrial Revolution 4.0 and Society 5.0 eras. Keywords: Ideal Character, Muslim Generation, Industrial Revolution 4.0


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 35185-35199 ◽  
Author(s):  
Chris Henry ◽  
Sung Yoon Ahn ◽  
Sang-Woong Lee

Electronics ◽  
2019 ◽  
Vol 8 (9) ◽  
pp. 971 ◽  
Author(s):  
Min Zhang ◽  
Yujin Yan ◽  
Hai Wang ◽  
Wei Zhao

Irregular text has widespread applications in multiple areas. Unlike regular text, irregular text is difficult to recognize because of its various shapes and distorted patterns. In this paper, we develop a multidirectional convolutional neural network (MCN) to extract four direction features that fully describe the textual information. Meanwhile, the character placement possibility is extracted as the weight of the four direction features. Building on these components, we propose an encoder that fuses the four direction features to generate a feature code for predicting the character sequence. The whole network is end-to-end trainable, requiring only images and word-level labels. Experiments on standard benchmarks, including the IIIT-5K, SVT, CUTE80, and ICDAR datasets, demonstrate the superiority of the proposed method on both regular and irregular datasets. The developed method shows an accuracy increase of 1.2% on the CUTE80 dataset and 1.5% on the SVT dataset, and has fewer parameters than most existing methods.
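The weighted fusion of the four direction features can be sketched as a normalized weighted sum. This is an assumed formulation of the encoder's fusion step, with made-up feature vectors, not the paper's implementation:

```python
# Sketch: fuse four direction feature vectors, weighting each direction
# by its character placement possibility (weights normalized to sum to 1).
def fuse(features, weights):
    """Weighted sum of the four direction feature vectors."""
    total = sum(weights)
    norm = [w / total for w in weights]
    dim = len(features[0])
    return [sum(norm[d] * features[d][i] for d in range(4)) for i in range(dim)]

# Toy 2-dimensional features for the four directions, with the first
# direction weighted most heavily.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
w = [2.0, 1.0, 1.0, 0.0]
print(fuse(feats, w))  # [0.75, 0.5]
```

In the actual network both the features and the placement weights come from learned convolutional branches; the fusion itself reduces to this kind of weighted combination.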


2019 ◽  
Vol 89 (19-20) ◽  
pp. 4148-4161 ◽  
Author(s):  
Pengpeng Hu ◽  
Edmond SL Ho ◽  
Nauman Aslam ◽  
Taku Komura ◽  
Hubert PH Shum

With the development of e-shopping, there is significant growth in clothing purchases online. However, virtual clothing fit evaluation is still under-researched. In the literature, the thickness of the air layer between the human body and clothes is the dominant geometric indicator for evaluating clothing fit. However, such an approach has only been applied to stationary poses of the mannequin/human body. Physical indicators, such as the pressure/tension of a virtual garment fitted on a virtual body in continuous motion, have also been proposed for clothing fit evaluation. Neither geometric nor physical evaluations consider the interaction of the garment with the body, e.g., the sliding of the garment along the human body. In this study, a new framework was proposed to automatically determine the dynamic air gap thickness. First, the dynamic dressed character sequence was simulated in three-dimensional (3D) clothing software by importing the body parameters, cloth parameters, and a walking motion. Second, a cost function was defined to convert the garment in the previous frame to the local coordinate system of the next frame, and the dynamic air gap thickness between clothes and the human body was determined. Third, a new metric, the 3D garment vector field, was proposed to represent the movement flow of the dynamic virtual garment, whose directional changes are calculated by cosine similarity. Experimental results show that our method is more sensitive to small air gap thickness changes than state-of-the-art methods, allowing it to more effectively evaluate clothing fit in a virtual environment.
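The directional-change measure used on the 3D garment vector field is standard cosine similarity between displacement vectors. A minimal sketch with invented toy vectors (not data from the paper):

```python
# Sketch: directional change between two frames of a garment vertex's
# displacement, measured by cosine similarity (1 = same direction,
# 0 = orthogonal, -1 = opposite).
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy displacement vectors of one garment vertex in consecutive frames:
# the direction rotates 45 degrees in the xy-plane.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 1.0, 0.0]))  # ~0.7071
```

Aggregating this quantity over all vertices of the garment mesh gives the movement-flow description the abstract refers to.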


Cladistics ◽  
2019 ◽  
Vol 35 (5) ◽  
pp. 573-575
Author(s):  
Ward C. Wheeler ◽  
Alexander J. Washburn

Author(s):  
Jason Lee ◽  
Kyunghyun Cho ◽  
Thomas Hofmann

Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline with a subword-level encoder on WMT’15 DE-EN and CS-EN, and gives comparable performance on FI-EN and RU-EN. We then demonstrate that it is possible to share a single character-level encoder across multiple languages by training a model on a many-to-one translation task. In this multilingual setting, the character-level encoder significantly outperforms the subword-level encoder on all the language pairs. We observe that on CS-EN, FI-EN and RU-EN, the quality of the multilingual character-level translation even surpasses the models specifically trained on that language pair alone, both in terms of the BLEU score and human judgment.
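The key mechanism the abstract names, max-pooling at the encoder to shorten the source character representation, can be illustrated in isolation. This is a simplified 1D sketch with scalar features (the real model pools learned convolutional feature vectors), offered as an assumption about the mechanism rather than the paper's code:

```python
# Sketch: non-overlapping max-pooling along the sequence dimension.
# Pooling a length-L character sequence with width w yields roughly
# L/w positions, which is what lets a character-level encoder train
# at a speed comparable to subword-level models.
def max_pool_1d(seq, width):
    """Max over consecutive non-overlapping windows of `width` items."""
    return [max(seq[i:i + width]) for i in range(0, len(seq), width)]

# A 12-step character feature sequence pooled by width 3 -> length 4.
feats = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
print(max_pool_1d(feats, 3))  # [4, 9, 6, 8]
```

The convolution before the pooling captures the "local regularities" mentioned in the abstract; pooling then discards position-level redundancy while keeping the strongest local responses.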

