Chinese Word Segmentation
Recently Published Documents


TOTAL DOCUMENTS

341
(FIVE YEARS 77)

H-INDEX

18
(FIVE YEARS 2)

2021 ◽  
pp. 1-13
Author(s):  
Jiawen Shi ◽  
Hong Li ◽  
Chiyu Wang ◽  
Zhicheng Pang ◽  
Jiale Zhou

Short text matching is one of the fundamental technologies in natural language processing. In previous studies, most text matching networks were originally designed for English text; the common approach to applying them to Chinese is to segment each sentence into words and then take those words as input. However, this method often results in word segmentation errors. Chinese short text matching faces the challenges of constructing effective features and understanding the semantic relationship between two sentences. In this work, we propose a novel lexicon-based pseudo-siamese model (CL2N) that can fully mine the information expressed in Chinese text. Instead of relying on a character sequence or a single word sequence, CL2N augments the text representation with multi-granularity information from characters and lexicons. Additionally, it integrates sentence-level features through single-sentence features as well as interactive features. Experiments on two Chinese text matching datasets show that our model performs better than state-of-the-art short text matching models, and that the proposed method can solve the error propagation problem of Chinese word segmentation. In particular, the incorporation of single-sentence and interactive features allows the network to capture contextual semantics and co-attentive lexical information, which contributes to our best result.
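
As a rough illustration of the idea, the PyTorch sketch below builds a pseudo-siamese matcher that fuses character-level and lexicon (word)-level views of each sentence and combines single-sentence features with simple interactive features. The layer types, sizes, and fusion scheme are illustrative assumptions, not the actual CL2N architecture.

```python
# Minimal sketch of a multi-granularity pseudo-siamese matcher.
# All hyper-parameters and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn

class MultiGranularityEncoder(nn.Module):
    """Encodes a sentence from both its character and word (lexicon) views."""
    def __init__(self, char_vocab, word_vocab, dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, dim, padding_idx=0)
        self.word_emb = nn.Embedding(word_vocab, dim, padding_idx=0)
        self.char_rnn = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.word_rnn = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)

    def forward(self, chars, words):
        c, _ = self.char_rnn(self.char_emb(chars))   # (B, Lc, 2*dim)
        w, _ = self.word_rnn(self.word_emb(words))   # (B, Lw, 2*dim)
        # Max-pool each granularity and concatenate them.
        return torch.cat([c.max(dim=1).values, w.max(dim=1).values], dim=-1)

class PseudoSiameseMatcher(nn.Module):
    """Two encoders with un-tied weights (hence 'pseudo'-siamese)."""
    def __init__(self, char_vocab, word_vocab, dim=128):
        super().__init__()
        self.enc_a = MultiGranularityEncoder(char_vocab, word_vocab, dim)
        self.enc_b = MultiGranularityEncoder(char_vocab, word_vocab, dim)
        # Single-sentence features plus simple interactive features
        # (absolute difference and element-wise product) feed the classifier.
        self.cls = nn.Sequential(nn.Linear(16 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, 2))

    def forward(self, chars_a, words_a, chars_b, words_b):
        a = self.enc_a(chars_a, words_a)             # (B, 4*dim)
        b = self.enc_b(chars_b, words_b)             # (B, 4*dim)
        feats = torch.cat([a, b, torch.abs(a - b), a * b], dim=-1)
        return self.cls(feats)                       # match / no-match logits
```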


Symmetry ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1742
Author(s):  
Yiwei Lu ◽  
Ruopeng Yang ◽  
Xuping Jiang ◽  
Dan Zhou ◽  
Changshen Yin ◽  
...  

A great deal of operational information exists in the form of text, so extracting operational information from unstructured military text is of great significance for assisting command decision making and operations. Military relation extraction is one of the main tasks of military information extraction; it aims at identifying the relation between two named entities in unstructured military texts. However, traditional methods of extracting military relations struggle with problems such as inadequate manual features and inaccurate Chinese word segmentation in the military domain, and they fail to make full use of symmetrical entity relations in military texts. We present a Chinese military relation extraction method based on a pre-trained language model that combines a bidirectional gated recurrent unit (BiGRU) with a multi-head attention mechanism (MHATT). More specifically, we construct an embedding layer that combines word embeddings with position embeddings from the pre-trained language model; the output vectors of the BiGRU network are symmetrically spliced to learn contextual semantic features, and the multi-head attention mechanism is fused in to strengthen the expression of semantic information. We conduct extensive experiments on a military text corpus that we built and demonstrate the superiority of our method over the traditional non-attention model, attention model, and improved attention model; the comprehensive evaluation metric, F1-score, improves by about 4%.
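
As a rough illustration of this pipeline, the sketch below wires a BiGRU and multi-head attention into a sentence-level relation classifier in PyTorch. Plain word and position embeddings stand in for the pre-trained language model, and all layer sizes and the pooling step are assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a BiGRU + multi-head attention relation classifier.
# Embeddings here replace the pre-trained language model for brevity.
import torch
import torch.nn as nn

class BiGRUMHATTRelationClassifier(nn.Module):
    def __init__(self, vocab_size, num_relations, emb_dim=128, hidden=128,
                 max_len=512, heads=8):
        super().__init__()
        # Word embedding plus learned position embedding, summed per token.
        self.word_emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.pos_emb = nn.Embedding(max_len, emb_dim)
        self.bigru = nn.GRU(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        # Multi-head self-attention over the BiGRU outputs.
        self.mhatt = nn.MultiheadAttention(2 * hidden, heads,
                                           batch_first=True)
        self.out = nn.Linear(2 * hidden, num_relations)

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.word_emb(token_ids) + self.pos_emb(positions)
        h, _ = self.bigru(x)                    # (B, L, 2*hidden)
        attn, _ = self.mhatt(h, h, h)           # contextual re-weighting
        sent = attn.mean(dim=1)                 # simple mean pooling
        return self.out(sent)                   # relation logits
```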


2021 ◽  
Vol 11 (18) ◽  
pp. 8682
Author(s):  
Ching-Sheng Lin ◽  
Jung-Sing Jwo ◽  
Cheng-Hsiung Lee

Clinical Named Entity Recognition (CNER) focuses on locating named entities in electronic medical records (EMRs), and the results play an important role in the development of intelligent biomedical systems. In addition to research on alphabetic languages, the study of non-alphabetic languages has attracted considerable attention as well. In this paper, a neural model is proposed to extract entities from EMRs written in Chinese. To avoid the noise caused by erroneous Chinese word segmentation, we employ character embeddings as the only feature, without extra resources. In our model, concatenated n-gram character embeddings are used to represent context semantics. A self-attention mechanism is then applied to model long-range dependencies among the embeddings. The concatenation of the new representations produced by the attention module is taken as the input to a bidirectional long short-term memory (BiLSTM) network, followed by a conditional random field (CRF) layer to extract entities. An empirical study on the CCKS-2017 Shared Task 2 dataset shows that our model outperforms other approaches.
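
As a rough sketch of the representation pipeline described above, the PyTorch snippet below approximates the concatenated n-gram character features with parallel width-1/2/3 convolutions over character embeddings, applies self-attention, and feeds a BiLSTM that emits per-character tag scores; a CRF layer (e.g. from the pytorch-crf package) would be stacked on top of the emissions. All sizes and the convolutional approximation are assumptions, not the paper's exact design.

```python
# Minimal sketch: n-gram-style character context + self-attention + BiLSTM
# producing per-character tag emissions for sequence labelling.
import torch
import torch.nn as nn

class CharNgramAttnBiLSTM(nn.Module):
    def __init__(self, vocab_size, num_tags, dim=100, hidden=128, heads=4):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, dim, padding_idx=0)
        # Parallel convolutions stand in for concatenated 1/2/3-gram features.
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, k, padding=k // 2) for k in (1, 2, 3))
        self.attn = nn.MultiheadAttention(3 * dim, heads, batch_first=True)
        self.bilstm = nn.LSTM(6 * dim, hidden, batch_first=True,
                              bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_tags)

    def forward(self, char_ids):
        x = self.char_emb(char_ids)                     # (B, L, dim)
        L = x.size(1)
        xc = x.transpose(1, 2)                          # (B, dim, L)
        grams = [conv(xc)[:, :, :L] for conv in self.convs]
        g = torch.cat(grams, dim=1).transpose(1, 2)     # (B, L, 3*dim)
        attn, _ = self.attn(g, g, g)                    # long-range context
        h, _ = self.bilstm(torch.cat([g, attn], dim=-1))
        return self.emit(h)                             # (B, L, num_tags)
```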


CONVERTER ◽  
2021 ◽  
pp. 100-112
Author(s):  
Ziyu Liu ◽  
Mengying Yao

In blended learning, it is difficult for college students to find learning resources related to the courses they are taking quickly and accurately. To solve this problem, this paper proposes an online course resource recommendation method based on text similarity. First, course resource data are collected from the online learning platform with web crawler technology. Second, the data are preprocessed, which includes deleting noisy data and performing Chinese word segmentation; course similarity is then calculated with cosine similarity, and course recommendations are generated according to the similarity ranking. Third, the recommendation results are evaluated and the similarity calculation method is optimized according to the evaluation results. Finally, curriculum resources are recommended to learners according to the similarity ranking results. Based on the courses learned on the Superstar platform, the experiment recommends similar course resources from the XueYin Online platform. The results show that the text-similarity-based recommendation method can recommend relevant online course resources for learners quickly and accurately, and it has reference significance and application value for online course resource recommendation.
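
The core ranking step can be illustrated in a few lines of Python: segment course texts with a Chinese word segmenter, build TF-IDF vectors, and rank candidate resources by cosine similarity. The use of jieba and scikit-learn, the function names, and the interface are illustrative assumptions, not the system described in the paper.

```python
# Minimal sketch of text-similarity ranking for course resources,
# assuming jieba for segmentation and scikit-learn for TF-IDF / cosine.
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def segment(text):
    # Chinese word segmentation; tokens are joined with spaces so the
    # vectorizer can split on whitespace.
    return " ".join(jieba.lcut(text))

def rank_resources(query_course, candidate_courses):
    docs = [segment(query_course)] + [segment(c) for c in candidate_courses]
    # Keep single-character tokens, which the default token pattern drops.
    tfidf = TfidfVectorizer(token_pattern=r"(?u)\S+").fit_transform(docs)
    sims = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
    order = sims.argsort()[::-1]                 # most similar first
    return [(candidate_courses[i], float(sims[i])) for i in order]
```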


2021 ◽  
Author(s):  
Fei Shen ◽  
Wenting Yu ◽  
Chen Min ◽  
Qianying Ye ◽  
Chuanli Xia ◽  
...  

Text mining has been a dominant approach to extracting useful information from massive amounts of unstructured data online. However, existing tools for Chinese word segmentation are not ideal for processing Cantonese social media text. This project developed CyberCan (https://github.com/shenfei1010/CyberCan), a lexicon of contemporary Cantonese built from more than 100 million pieces of internet text. We compared CyberCan with existing Mandarin and Cantonese lexicons in terms of word segmentation performance. Findings suggest that CyberCan outperforms all existing lexicons by a considerable margin.
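
In practice, a lexicon like CyberCan is typically plugged into a dictionary-based segmenter. The sketch below shows one way to do that with jieba, assuming the lexicon is available as a file in jieba's user-dictionary format; the file name and sample sentence are illustrative, not taken from the CyberCan repository.

```python
# Minimal sketch: load a custom Cantonese lexicon into jieba and segment
# a Cantonese sentence. "CyberCan.txt" is a hypothetical file name; each
# line of a jieba user dictionary is "word [frequency] [POS-tag]".
import jieba

jieba.load_userdict("CyberCan.txt")          # hypothetical lexicon file

print(jieba.lcut("今日得閒，不如一齊睇戲啦"))  # sample Cantonese sentence
```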

