token sequence
Recently Published Documents

TOTAL DOCUMENTS: 7 (FIVE YEARS: 1)
H-INDEX: 1 (FIVE YEARS: 0)

Sensors, 2021, Vol 21 (19), pp. 6417
Author(s): Yizhou Chen, Heng Dai, Xiao Yu, Wenhua Hu, Zhiwen Xie, ...

With the development of blockchain technologies, many Ponzi schemes disguise themselves under the veil of smart contracts. These Ponzi scheme contracts cause serious financial losses and damage the credibility of the blockchain. Existing Ponzi scheme contract detection studies have mainly focused on extracting hand-crafted features and training a machine learning classifier; however, hand-crafted features cannot capture the structural and semantic features of the source code. Therefore, this study proposes a Ponzi scheme contract detection method called MTCformer (Multi-channel Text Convolutional Neural Networks and Transformer). To preserve the structural information of the source code, the MTCformer first converts the Abstract Syntax Tree (AST) of the smart contract code into a specially formatted code token sequence via the Structure-Based Traversal (SBT) method. The MTCformer then uses a multi-channel TextCNN (Text Convolutional Neural Network) to learn local structural and semantic features from the code token sequence, and employs a Transformer to capture long-range dependencies between code tokens. Finally, a fully connected neural network with a cost-sensitive loss function performs the classification. The experimental results show that the MTCformer outperforms the state-of-the-art methods and its own variants in Ponzi scheme contract detection.
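
Below is a minimal, hypothetical PyTorch sketch of an MTCformer-style pipeline; the layer sizes, vocabulary size, class weights, and pooling choice are assumptions for illustration, not the authors' implementation, and the SBT step is assumed to have already produced integer token IDs.

```python
# Hypothetical sketch of an MTCformer-style classifier (sizes/names assumed).
import torch
import torch.nn as nn

class MTCformerSketch(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, kernel_sizes=(3, 5, 7),
                 n_filters=100, n_heads=4, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Multi-channel TextCNN: one 1-D convolution per kernel size, learning
        # local structural/semantic patterns in the SBT code token sequence.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k, padding=k // 2) for k in kernel_sizes]
        )
        d_model = n_filters * len(kernel_sizes)
        # Transformer encoder captures long-range dependencies between tokens.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, token_ids):                         # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)         # (batch, emb_dim, seq_len)
        x = torch.cat([torch.relu(conv(x)) for conv in self.convs], dim=1)
        x = x.transpose(1, 2)                             # (batch, seq_len, d_model)
        return self.classifier(self.encoder(x).mean(dim=1))

# Cost-sensitive loss: up-weight the (rare) Ponzi class; the weights are assumed.
model = MTCformerSketch()
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 5.0]))
logits = model(torch.randint(1, 5000, (8, 256)))
loss = loss_fn(logits, torch.randint(0, 2, (8,)))
```

The odd kernel sizes with same-padding keep all convolution outputs at the same sequence length, so the channels can be concatenated before the Transformer encoder.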



Author(s): Tao Gui, Jiacheng Ye, Qi Zhang, Yaqian Zhou, Yeyun Gong, ...

Document-level label consistency is an effective indicator that different occurrences of a particular token sequence are very likely to have the same entity type. Previous work focused on better context representations and used a CRF for label decoding; however, CRF-based methods are inadequate for modeling document-level label consistency. This work introduces a novel two-stage label refinement approach: a key-value memory network first records the draft labels predicted by the base model, and a multi-channel Transformer then refines these draft predictions based on the explicit co-occurrence relationships derived from the memory network. In addition, to mitigate the side effects of incorrect draft labels, Bayesian neural networks are used to flag the labels with a high probability of being wrong, which greatly helps to prevent the incorrect refinement of correct draft labels. Experimental results on three named entity recognition benchmarks demonstrate that the proposed method significantly outperforms the state-of-the-art methods.
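
The two-stage idea can be illustrated with a deliberately simplified sketch: a key-value memory records every draft label predicted for a token sequence in a document, and uncertain drafts are revised toward the document-level consensus. This only illustrates the consistency mechanism; the actual method refines with a multi-channel Transformer and estimates uncertainty with Bayesian neural networks, and the names and threshold below are hypothetical.

```python
# Simplified sketch of document-level label refinement via a key-value memory
# (hypothetical data structures; the actual model refines with a Transformer).
from collections import defaultdict, Counter

def build_memory(spans):
    """Record every draft label predicted for each token sequence in a document.
    `spans` is a list of (token_tuple, draft_label, uncertainty) triples."""
    memory = defaultdict(list)
    for tokens, label, uncertainty in spans:
        memory[tokens].append((label, uncertainty))
    return memory

def refine(spans, memory, uncertainty_threshold=0.5):
    """Second stage (simplified): only draft labels the model is unsure about
    are revised, here toward the label that co-occurring mentions agree on."""
    refined = []
    for tokens, label, uncertainty in spans:
        if uncertainty > uncertainty_threshold:
            votes = Counter(l for l, _ in memory[tokens])
            label = votes.most_common(1)[0][0]   # document-level consensus
        refined.append((tokens, label))
    return refined

# Example: three occurrences of "Apple Inc" in one document; the uncertain
# draft ("LOC") is corrected to the consistent document-level label "ORG".
doc = [(("Apple", "Inc"), "ORG", 0.1),
       (("Apple", "Inc"), "ORG", 0.2),
       (("Apple", "Inc"), "LOC", 0.8)]
print(refine(doc, build_memory(doc)))
```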



Author(s): Eddie A. Santos, Joshua C. Campbell, Abram Hindle, José Nelson Amaral

Minor syntax errors are made by novice and experienced programmers alike; however, novice programmers lack the years of intuition that would help them resolve these tiny errors. Standard LR parsers typically do a poor job of resolving syntax errors and pinpointing their precise location. We propose a methodology that not only helps locate where syntax errors occur, but also suggests possible changes to the token stream that can fix the identified error. The methodology finds syntax errors by checking whether two language models “agree” on each token; if the models disagree, a possible syntax error is indicated, and the methodology suggests a fix by finding an alternative token sequence obtained from the models. We trained two LSTM (long short-term memory) language models on a large corpus of JavaScript code collected from GitHub. The dual LSTM neural network model predicts the correct location of the syntax error within its top four suggestions 54.74% of the time and produces an exact fix up to 35.50% of the time. The results show that this tool and methodology can locate and suggest corrections for syntax errors. Our methodology is of practical use to all programmers, but will be especially useful to novices frustrated by incomprehensible syntax errors.
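
A rough sketch of the agreement check might look like the following, under one plausible interpretation: the two models "disagree" with the source when both assign the written token a low probability. Here `forward_lm` and `backward_lm` are placeholder callables standing in for the trained forward and backward LSTM language models, and the threshold and ranking scheme are invented for illustration.

```python
# Toy sketch of a dual-language-model agreement check (hypothetical scoring
# callables; the actual models are LSTMs trained on JavaScript from GitHub).
def find_suspects(tokens, forward_lm, backward_lm, threshold=0.01):
    """Flag positions where both models assign the written token low probability."""
    suspects = []
    for i, tok in enumerate(tokens):
        p_fwd = forward_lm(tokens[:i], tok)       # P(tok | left context)
        p_bwd = backward_lm(tokens[i + 1:], tok)  # P(tok | right context)
        if p_fwd < threshold and p_bwd < threshold:
            suspects.append(i)                    # both models "disagree" with the source
    return suspects

def suggest_fixes(tokens, i, forward_lm, backward_lm, vocab, top_k=4):
    """Rank replacement tokens by how strongly both models prefer them."""
    scores = {t: forward_lm(tokens[:i], t) * backward_lm(tokens[i + 1:], t)
              for t in vocab}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In practice the candidate tokens and their scores would come from the trained models' output distributions rather than an explicit loop over a fixed vocabulary.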



2017, Vol 41 (1), pp. 105-123
Author(s): Faqih Salban Rabbani, Oscar Karnalim

Even though there are various source code plagiarism detection approaches, only a few focus on low-level representation for detecting similarity; most rely only on the lexical token sequence extracted from the source code. From our point of view, a low-level representation is more beneficial than lexical tokens, since its form is more compact than the source code itself: it considers only semantics-preserving instructions and ignores many source code delimiter tokens. This paper proposes a source code plagiarism detection approach that relies on low-level representation. As a case study, we focus on .NET programming languages, using the Common Intermediate Language as the low-level representation. In addition, we incorporate Adaptive Local Alignment for detecting similarity; according to Lim et al., this algorithm outperforms the state-of-the-art code similarity algorithm (i.e., Greedy String Tiling) in terms of effectiveness. According to our evaluation, which involves various plagiarism attacks, our approach is more effective and efficient than the standard lexical-token approach.
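
As a rough illustration of aligning low-level instruction sequences, the sketch below computes a plain Smith-Waterman local alignment score over CIL-like opcode streams; the adaptive scoring scheme of the cited approach is not reproduced, and the opcode sequences are invented examples.

```python
# Minimal local-alignment sketch over CIL opcode sequences (Smith-Waterman
# style; the adaptive weighting of the actual approach is omitted).
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Highest-scoring local alignment between two instruction sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best

# Example: two CIL-like opcode streams that share a copied region.
original   = ["ldarg.0", "ldarg.1", "add", "stloc.0", "ldloc.0", "ret"]
suspicious = ["nop", "ldarg.0", "ldarg.1", "add", "stloc.0", "ret"]
print(local_alignment_score(original, suspicious))  # high score -> similar fragment
```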



Author(s): Tomonari Masada

This paper introduces a new approach for large-scale unsupervised segmentation of bibliographic elements. The problem is to segment a citation, given as an untagged word token sequence, into subsequences so that each subsequence corresponds to a different bibliographic element (e.g., authors, paper title, journal name, publication year, etc.). The same bibliographic element should be referred to by contiguous word tokens; this is called the contiguity constraint. The author meets this constraint by using generalized Mallows models, which were effectively applied to document structure learning by Chen, Branavan, Barzilay, and Karger (2009). However, that method works for this problem only after modification, so the author proposes strategies to make it applicable.
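
The contiguity constraint itself is easy to state in code: a valid labeling assigns each bibliographic element to exactly one contiguous run of tokens. The sketch below only checks that property on an invented example labeling; the generalized Mallows model that actually drives the unsupervised segmentation is not shown.

```python
# Minimal check of the contiguity constraint on a citation's token labels
# (illustrative labels only; the generalized Mallows model is omitted).
def satisfies_contiguity(labels):
    """Each bibliographic element may appear as exactly one contiguous run."""
    closed = set()          # elements whose run has already ended
    prev = None
    for lab in labels:
        if lab != prev:
            if lab in closed:            # element reappears after a different one
                return False
            if prev is not None:
                closed.add(prev)
            prev = lab
    return True

# "author author title title title journal year" is contiguous ...
print(satisfies_contiguity(["author"] * 2 + ["title"] * 3 + ["journal", "year"]))  # True
# ... but an author token after the title violates the constraint.
print(satisfies_contiguity(["author", "title", "title", "author", "year"]))        # False
```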


