word level
Recently Published Documents

Parita Shah ◽  
Priya Swaminarayan ◽  
Maitri Patel

<span>Sentiment analysis is among the most fundamental areas of natural language processing. It deals with representing text data so as to determine the intent of its source, which may express either appreciation (positive) or criticism (negative). This paper compares the results achieved by applying classification algorithms with different classifiers, namely K-nearest neighbors and multinomial naive Bayes. These techniques are used to evaluate a review as carrying either a positive or a negative sentiment. The data were drawn from the polarity movie datasets, and a comparison with previously reported results was made for a thorough assessment. This paper investigates the influence of a word-level count vectorizer and term frequency-inverse document frequency (TF-IDF) on movie sentiment analysis. We conclude that the multinomial naive Bayes (MNB) classifier generates more accurate results with the TF-IDF vectorizer than with the count vectorizer, whereas the K-nearest-neighbors (KNN) classifier achieves the same accuracy with both.</span>
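The difference between the two vectorizers discussed above can be sketched in a few lines. The following is a minimal, pure-Python illustration (not the paper's code) of raw term counts versus TF-IDF weighting, using the plain idf = ln(N/df) variant with no smoothing; it shows how a word that appears in every review (here "movie") gets a TF-IDF weight of zero, which is one reason a classifier such as MNB can benefit from TF-IDF over raw counts.

```python
import math
from collections import Counter

def count_vectors(docs):
    """Raw term counts per document (a word-level count-vectorizer analogue)."""
    vocab = sorted({w for d in docs for w in d.split()})
    return vocab, [[Counter(d.split())[w] for w in vocab] for d in docs]

def tfidf_vectors(docs):
    """TF-IDF with the plain idf = ln(N / df) variant (no smoothing)."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n / df[j]) for j in range(len(vocab))]
    return vocab, [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]

reviews = [
    "great movie great acting",   # a positive review
    "boring movie dull plot",     # a negative review
]
vocab, tfidf = tfidf_vectors(reviews)
# "movie" occurs in every review, so its idf (and hence TF-IDF weight) is 0,
# while discriminative words like "great" keep a positive weight:
print(dict(zip(vocab, tfidf[0])))
```

Production code would typically use library implementations (with smoothing and normalization options) rather than this bare-bones variant.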

2022 ◽  
Vol 9 (1) ◽  
Jin Wang ◽  
Marisa N. Lytle ◽  
Yael Weiss ◽  
Brianna L. Yamasaki ◽  
James R. Booth

Abstract: This dataset examines language development with a longitudinal design and includes diffusion- and T1-weighted structural magnetic resonance imaging (MRI), task-based functional MRI (fMRI), and a battery of psycho-educational assessments and parental questionnaires. We collected data from 5.5–6.5-year-old children (ses-5) and followed them up when they were 7–8 years old (ses-7) and then again at 8.5–10 years old (ses-9). To increase the sample size at the older time points, another cohort of 7–8-year-old children (ses-7) was recruited and followed up when they were 8.5–10 years old (ses-9). In total, 322 children who completed at least one structural and functional scan were included. Children performed four fMRI tasks consisting of two word-level tasks examining phonological and semantic processing and two sentence-level tasks investigating semantic and syntactic processing. The MRI data are valuable for examining changes over time in interactive specialization due to the use of multiple imaging modalities and tasks in this longitudinal design. In addition, the extensive psycho-educational assessments and questionnaires provide opportunities to explore brain-behavior and brain-environment associations.

2022 ◽  
Vol 12 ◽  
Sietske van Viersen ◽  
Athanassios Protopapas ◽  
Peter F. de Jong

In this study, we investigated how word- and text-level processes contribute to different types of reading fluency measures. We aimed to increase our understanding of the underlying processes necessary for fluent reading. The sample included 73 Dutch Grade 3 children, who were assessed on serial word reading rate (familiar words), word-list reading fluency (increasingly difficult words), and sentence reading fluency. Word-level processes were individual word recognition speed (discrete word reading) and sequential processing efficiency (serial digit naming). Text-level processes were receptive vocabulary and syntactic skills. The results showed that word- and text-level processes combined accounted for a comparable amount of variance in all fluency outcomes. Both word-level processes were moderate predictors of all fluency outcomes. However, vocabulary only moderately predicted sentence reading fluency, and syntactic skills contributed to sentence reading fluency only indirectly, through vocabulary. The findings indicate that, besides individual word recognition speed, sequential processing efficiency plays a crucial role in reading fluency across various measures. Additionally, text-level processes come into play as the complexity and context availability of fluency measures increase, but the exact timing requires further study. Findings are discussed in terms of future directions and their possible value for diagnostic assessment and intervention of reading difficulties.

2022 ◽  
Vol 12 (2) ◽  
pp. 594
Jianjie Shao ◽  
Jiwei Qin ◽  
Wei Zeng ◽  
Jiong Zheng

Recently, interaction information from reviews has been modeled to learn representations of users and items and to alleviate the sparsity problem in recommendation systems. Reviews are rich in information about users’ preferences for the different aspects and attributes of items. However, how best to construct the representations of users (items) still requires further research. Inspired by the interaction information from reviews, auxiliary ID embedding information is used to further enrich the word-level representation in the proposed model, named MPCAR. In this paper, first, a multi-pointer learning scheme is adopted to extract the most informative reviews from user and item reviews and to represent users (items) in a word-by-word manner. Then, users and items are embedded to extract the ID embeddings that reveal the identity of users (items). Finally, the review features and ID embeddings are input to a gated neural network for effective fusion to obtain richer representations of users and items. We randomly selected ten subcategory datasets from the Amazon dataset to evaluate our algorithm. The experimental results show that our algorithm achieves the best results compared to other recommendation approaches.
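The gated-fusion step described above can be illustrated with a toy example. The sketch below is not MPCAR itself but a minimal pure-Python analogue of the idea: an elementwise sigmoid gate decides, per dimension, how much of the review-based feature versus the ID embedding flows into the fused representation. The weights `w_r`, `w_i`, and bias `b` are illustrative scalars; a real model learns such parameters by backpropagation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(review_feat, id_emb, w_r, w_i, b):
    """Elementwise gate g = sigmoid(w_r*r + w_i*e + b); fused = g*r + (1-g)*e.

    A minimal analogue of fusing review features with ID embeddings through
    a gate; here the gate parameters are fixed scalars, not learned weights.
    """
    assert len(review_feat) == len(id_emb)
    fused = []
    for r, e in zip(review_feat, id_emb):
        g = sigmoid(w_r * r + w_i * e + b)
        fused.append(g * r + (1 - g) * e)
    return fused

# With a strongly positive bias the gate saturates toward 1, so the fused
# vector is dominated by the review features:
print(gated_fusion([1.0, 2.0], [0.0, 0.0], 0.0, 0.0, 50.0))
```

The attraction of a gate over plain concatenation is that the model can learn, per user-item pair, when the review text is informative and when to fall back on the ID embedding.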

Shrinidhi Kanchi ◽  
Alain Pagani ◽  
Hamam Mokayed ◽  
Marcus Liwicki ◽  
Didier Stricker ◽  

Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches to document classification, known as image-based and multimodal approaches. Image-based document classification approaches rely solely on the inherent visual cues of the document images. In contrast, the multimodal approach co-learns visual and textual features and has proved to be more effective. Nonetheless, these approaches require a huge amount of data. This paper presents a novel approach to document classification that works with a small amount of data and outperforms other approaches. The proposed approach incorporates a hierarchical attention network (HAN) for the textual stream and EfficientNet-B0 for the image stream. The hierarchical attention network in the textual stream uses dynamic word embeddings through fine-tuned BERT and incorporates both word-level and sentence-level features. While earlier approaches rely on training on a large corpus (RVL-CDIP), we show that our approach works with a small amount of data (Tobacco-3482). To this end, we trained the neural network on Tobacco-3482 from scratch, outperforming the state of the art with an accuracy of 90.3%. This corresponds to a relative error reduction rate of 7.9%.
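The relative error reduction rate quoted above is a standard way of comparing classifiers: it measures the fraction of the baseline's error that the new model removes, rather than the raw accuracy gap. A small helper makes the computation explicit; the accuracy values in the example are illustrative only and are not taken from the paper.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Fractional reduction in error rate: (e_base - e_new) / e_base."""
    e_base, e_new = 1.0 - baseline_acc, 1.0 - new_acc
    return (e_base - e_new) / e_base

# Illustrative values only: going from 90% to 95% accuracy halves the error
# rate, i.e. a 50% relative error reduction:
print(round(relative_error_reduction(0.90, 0.95), 3))  # → 0.5
```

Note the asymmetry this metric captures: the same 5-point accuracy gain is far more impressive on top of a 90%-accurate baseline than on top of a 60%-accurate one.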

2022 ◽  
Vol 5 (1) ◽  
pp. 9
Junjie Liu ◽  
Yong Yang ◽  
Xiaochao Fan ◽  
Ge Ren ◽  
Liang Yang ◽  

The rapid identification of offensive language in social media is of great significance for preventing viral spread and reducing the dissemination of malicious information, such as cyberbullying and content related to self-harm. In existing research, the public datasets of offensive language are small, the label quality is uneven, and the performance of pre-trained models is not satisfactory. To overcome these problems, we propose a multi-semantic fusion model based on data augmentation (MSF). Data augmentation was carried out by back-translation, reducing the impact of small datasets on performance. At the same time, we used a novel fusion mechanism that combines word-level semantic features with character n-gram features. The experimental results on two datasets showed that the proposed model can effectively extract the semantic information of offensive language and achieves state-of-the-art performance on both datasets.
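The combination of word-level and character n-gram features mentioned above can be sketched simply. The snippet below is an illustrative pure-Python feature extractor, not the MSF model: it builds the union of word tokens and per-word character trigrams, the kind of joint feature set that makes a model robust to the spelling obfuscations common in offensive text (e.g. "id1ot" still shares trigrams with "idiot").

```python
def char_ngrams(text, n=3):
    """All character n-grams of a string (trigrams by default)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_plus_char_features(text, n=3):
    """Union of word-level tokens and per-word character n-grams, a simple
    analogue of fusing word-level semantics with character n-gram features."""
    words = text.lower().split()
    grams = [g for w in words for g in char_ngrams(w, n)]
    return words + grams

print(word_plus_char_features("get lost"))
# → ['get', 'lost', 'get', 'los', 'ost']
```

In the actual model the two feature types are fused inside the network rather than concatenated as raw token lists, but the intuition is the same: word features carry semantics, character n-grams carry surface-form robustness.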

2022 ◽  
pp. 101973
Abdullah I. Alharbi ◽  
Phillip Smith ◽  
Mark Lee

2021 ◽  
Vol 14 (4) ◽  
pp. 1-24
Sushant Kafle ◽  
Becca Dingman ◽  
Matt Huenerfauth

There are style guidelines for authors who highlight important words in static text, e.g., bolded words in student textbooks, yet little research has investigated highlighting in dynamic texts, e.g., captions during educational videos for Deaf or Hard of Hearing (DHH) users. In our experimental study, DHH participants subjectively compared design parameters for caption highlighting, including: decoration (underlining vs. italicizing vs. boldfacing), granularity (sentence level vs. word level), and whether to highlight only the first occurrence of a repeating keyword. In partial contrast to recommendations in prior research, which had not been based on experimental studies with DHH users, we found that DHH participants preferred boldface, word-level highlighting in captions. Our empirical results provide guidance for the design of keyword highlighting during captioned videos for DHH users, especially in educational video genres.

2021 ◽  
Vol 2 (4) ◽  
pp. 27-35
Lok Raj Sharma

Error analysis in linguistics is a systematic process of collecting, identifying, describing, explaining and evaluating unaccepted linguistic forms committed by learners in their writing or speech. This article attempts to assess the errors committed by 128 bachelor first-year education students studying English as a foreign language at Makawanpur Multiple Campus, Hetauda, Nepal in the year 2021. Every student was assigned to write an essay on ‘The Impact of Corona Pandemic on Students’ in about 500 words as a written language sample in a free mode. The 128 essays were selected from 190 essays through the lottery method of simple random sampling. All the errors in the essays were identified, described, classified, explained and analyzed. The results revealed that most of the students committed errors of omission at the sentence level, caused by intralingual transfer, whereas the most frequent word-level errors involved prepositions, resulting from mother-tongue transfer and overgeneralization.
