Research on text summarization classification based on crowdfunding projects

2021 ◽  
Vol 336 ◽  
pp. 06020
Author(s):  
Gang Zhou

In recent years, artificial intelligence technologies represented by deep learning and natural language processing have made major breakthroughs and have begun to emerge in the field of crowdfunding project analysis. Natural language processing technology enables machines to understand and analyze the text of crowdfunding projects and classify them based on the summary description of the project, which can help companies and individuals improve project approval rates, so it has received widespread attention. However, most current research applies these techniques to topic modeling of project texts; few studies have proposed effective solutions for classification prediction based on the abstracts of crowdfunding projects. Therefore, this paper proposes a sequence-enhanced capsule network model for this problem. Specifically, building on prior capsule network work, we propose connecting a BiGRU with CapsNet so that the model considers both the sequential semantic information and the spatial location information of the text. We apply the proposed method to the kickstarter-NLP dataset, and the experimental results show that our model achieves a good classification effect in this setting.
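A distinguishing component of the capsule network mentioned above is the "squash" nonlinearity, which rescales a capsule's output vector so its length lies in (0, 1) while its direction is preserved. A minimal pure-Python sketch of that function alone (illustrative; the paper's full BiGRU-CapsNet architecture with dynamic routing is not reproduced here):

```python
import math

def squash(s, eps=1e-9):
    """Capsule 'squash' nonlinearity: scale vector s so its length lies
    in (0, 1) while its direction stays unchanged."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm) + eps       # eps avoids division by zero
    scale = sq_norm / (1.0 + sq_norm)     # maps length into (0, 1)
    return [scale * x / norm for x in s]

v = squash([3.0, 4.0])   # input length 5 -> output length ~25/26, same direction
```

Because the output length stays strictly below 1, capsule lengths can be read as "presence" probabilities while the vector orientation still encodes the detected features.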

2018 ◽  
Vol 54 (3A) ◽  
pp. 64
Author(s):  
Nguyen Chi Hieu

Exact tagging of the words in a text is a very important task in natural language processing. It can support parsing the text, contribute to resolving polysemous words, help in accessing semantic information, etc. One crucial factor in POS (Part-of-Speech) tagging approaches based on statistical methods is the processing time. In this paper, we propose an approach to calculating a pruning threshold that can be applied within the Viterbi algorithm of a Hidden Markov Model for tagging texts in natural language processing. Experiments on 1,000,000 tagged words of the Wall Street Journal corpus showed that our proposed solution is satisfactory.
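The pruning idea can be sketched as a threshold-pruned Viterbi pass: at each time step, states whose path probability falls below a fraction of the current best are discarded before transitions are expanded, trading a little accuracy for speed. A hedged toy sketch with a two-tag HMM (the tags, probabilities, and `beam` value are illustrative, not those produced by the paper's threshold calculation):

```python
def pruned_viterbi(obs, states, start_p, trans_p, emit_p, beam=0.01):
    """Viterbi decoding with threshold pruning: at each step, drop any
    state whose path probability is below beam * (best probability)."""
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 1e-12), [s]) for s in states}]
    for t in range(1, len(obs)):
        best = max(p for p, _ in V[-1].values())
        # pruning step: keep only states close enough to the current best
        alive = {s: v for s, v in V[-1].items() if v[0] >= beam * best}
        cur = {}
        for s in states:
            p, path = max(
                (pp * trans_p[ps][s] * emit_p[s].get(obs[t], 1e-12), path + [s])
                for ps, (pp, path) in alive.items()
            )
            cur[s] = (p, path)
        V.append(cur)
    return max(V[-1].values())

states = ("N", "V")
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"dogs": 0.5, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.6}}
prob, tags = pruned_viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
```

With a tighter threshold (e.g. `beam=0.5`), the low-probability "V" state is pruned after the first word, so fewer transitions are evaluated; choosing that threshold well is exactly the problem the paper addresses.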


2021 ◽  
pp. 1-10
Author(s):  
Shuai Zhao ◽  
Fucheng You ◽  
Wen Chang ◽  
Tianyu Zhang ◽  
Man Hu

The BERT pre-trained language model has achieved good results in various subtasks of natural language processing, but its performance in generating Chinese summaries is not ideal. The most intuitive reason is that the BERT model is based on character-level composition, while Chinese text is mostly composed of phrases, so directly fine-tuning the BERT model cannot achieve the expected effect. This paper proposes a novel summary generation model in which BERT is augmented with a pooling layer. In our model, we perform an average pooling operation on token embeddings to improve the model's ability to capture phrase-level semantic information. We use the LCSTS and NLPCC2017 datasets to verify the proposed method. Experimental results show that introducing the average pooling layer effectively improves the quality of the generated summaries. Furthermore, comparative analysis shows that different datasets require different pooling kernel sizes to achieve the best results. In addition, our proposed method generalizes well: it can be applied not only to summary generation but also to other natural language processing tasks.
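The pooling idea can be illustrated in isolation: slide a window of size k over the sequence of token embeddings and average within each window, so adjacent character embeddings blend into phrase-like features. A minimal pure-Python sketch (the kernel size, stride, and vectors are illustrative; the paper's model applies this inside a BERT-based architecture):

```python
def avg_pool_tokens(embeddings, k):
    """Average-pool a sequence of token embedding vectors with window
    size k and stride 1 (no padding): each output vector is the mean of
    k adjacent token embeddings, approximating a phrase-level feature."""
    dim = len(embeddings[0])
    pooled = []
    for i in range(len(embeddings) - k + 1):
        window = embeddings[i:i + k]
        pooled.append([sum(v[d] for v in window) / k for d in range(dim)])
    return pooled

# three 2-d "character" embeddings pooled pairwise into two "phrase" vectors
toks = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
phrases = avg_pool_tokens(toks, k=2)   # -> [[2.0, 1.0], [4.0, 3.0]]
```

The kernel size k controls how many characters are blended into one phrase-level vector, which is why, as the abstract notes, different datasets favor different kernel sizes.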


Author(s):  
Tianlin Liu ◽  
Lyle Ungar ◽  
João Sedoc

Word vectors are at the core of many natural language processing tasks. Recently, there has been interest in post-processing word vectors to enrich their semantic information. In this paper, we introduce a novel word vector post-processing technique based on matrix conceptors (Jaeger 2014), a family of regularized identity maps. More concretely, we propose to use conceptors to suppress those latent features of word vectors having high variances. The proposed method is purely unsupervised: it does not rely on any corpus or external linguistic database. We evaluate the post-processed word vectors on a battery of intrinsic lexical evaluation tasks, showing that the proposed method consistently outperforms existing state-of-the-art alternatives. We also show that post-processed word vectors can be used for the downstream natural language processing task of dialogue state tracking, yielding improved results in different dialogue domains.
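A conceptor for a correlation matrix R is C = R (R + alpha^-2 I)^-1, and post-processing applies the negated map I - C, which damps high-variance directions most strongly. In the simplified case where feature dimensions are uncorrelated, C is diagonal with entries v_i / (v_i + alpha^-2), where v_i is the variance of dimension i. A hedged pure-Python sketch of that diagonal special case (the paper operates on the full correlation matrix; alpha and the vectors here are illustrative):

```python
def negate_conceptor_diag(vectors, alpha=2.0):
    """Suppress high-variance dimensions of word vectors using the
    diagonal special case of conceptor negation: coordinate i is scaled
    by 1 - v_i / (v_i + alpha**-2), so large-variance directions are
    damped toward zero while low-variance directions pass through."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n for d in range(dim)]
    gain = [1.0 - var[d] / (var[d] + alpha ** -2) for d in range(dim)]
    return [[v[d] * gain[d] for d in range(dim)] for v in vectors]
```

Note the method needs only the vectors themselves, which matches the abstract's claim that the post-processing is purely unsupervised, relying on no corpus or external linguistic database.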


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to filling this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation architecture leads to consistently better BLEU scores than recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and usually correspond to the source-language input.
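For reference, BLEU, the metric used above, combines clipped n-gram precisions with a brevity penalty. A minimal sentence-level sketch (illustrative only; real evaluations typically use a tested implementation such as sacreBLEU, which also smooths zero counts for short sentences):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    for n = 1..max_n, times a brevity penalty for short candidates."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = max(sum(cand.values()), 1)
        if overlap == 0:
            return 0.0          # any zero precision drives unsmoothed BLEU to 0
        log_precisions.append(math.log(overlap / total))
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)
```

An identical candidate and reference score 1.0; a candidate sharing no words with the reference scores 0.0, with translation quality falling in between.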

