token sequence
Recently Published Documents

TOTAL DOCUMENTS: 7 (FIVE YEARS: 1)
H-INDEX: 1 (FIVE YEARS: 0)

Sensors, 2021, Vol 21 (19), pp. 6417
Author(s): Yizhou Chen, Heng Dai, Xiao Yu, Wenhua Hu, Zhiwen Xie, ...

With the development of blockchain technologies, many Ponzi schemes disguise themselves under the veil of smart contracts. These Ponzi scheme contracts cause serious financial losses and damage the credibility of the blockchain. Existing Ponzi scheme contract detection studies have mainly focused on extracting hand-crafted features and training a machine learning classifier; however, hand-crafted features cannot capture the structural and semantic features of the source code. Therefore, this study proposes a Ponzi scheme contract detection method called MTCformer (Multi-channel Text Convolutional Neural Networks and Transformer). To preserve the structural information of the source code, the MTCformer first converts the Abstract Syntax Tree (AST) of the smart contract code into a specially formatted code token sequence via the Structure-Based Traversal (SBT) method. The MTCformer then uses a multi-channel TextCNN (Text Convolutional Neural Network) to learn local structural and semantic features from the code token sequence, and employs a Transformer to capture long-range dependencies between code tokens. Finally, a fully connected neural network with a cost-sensitive loss function performs the classification. The experimental results show that the MTCformer outperforms the state-of-the-art methods and its own variants in Ponzi scheme contract detection.
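
Below is a minimal, hypothetical PyTorch sketch of an MTCformer-style pipeline; the layer sizes, vocabulary size, class weights, and pooling choice are assumptions for illustration, not the authors' implementation, and the SBT step is assumed to have already produced integer token IDs.

```python
# Hypothetical sketch of an MTCformer-style classifier (sizes/names assumed).
import torch
import torch.nn as nn

class MTCformerSketch(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, kernel_sizes=(3, 5, 7),
                 n_filters=100, n_heads=4, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Multi-channel TextCNN: one 1-D convolution per kernel size, learning
        # local structural/semantic patterns in the SBT code token sequence.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k, padding=k // 2) for k in kernel_sizes]
        )
        d_model = n_filters * len(kernel_sizes)
        # Transformer encoder captures long-range dependencies between tokens.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, token_ids):                         # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)         # (batch, emb_dim, seq_len)
        x = torch.cat([torch.relu(conv(x)) for conv in self.convs], dim=1)
        x = x.transpose(1, 2)                             # (batch, seq_len, d_model)
        return self.classifier(self.encoder(x).mean(dim=1))

# Cost-sensitive loss: up-weight the (rare) Ponzi class; the weights are assumed.
model = MTCformerSketch()
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 5.0]))
logits = model(torch.randint(1, 5000, (8, 256)))
loss = loss_fn(logits, torch.randint(0, 2, (8,)))
```

The odd kernel sizes with same-padding keep all convolution outputs at the same sequence length, so the channels can be concatenated before the Transformer encoder.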



Author(s): Tao Gui, Jiacheng Ye, Qi Zhang, Yaqian Zhou, Yeyun Gong, ...

Document-level label consistency is an effective indicator that different occurrences of a particular token sequence are very likely to have the same entity type. Previous work focused on better context representations and used a CRF for label decoding; however, CRF-based methods are inadequate for modeling document-level label consistency. This work introduces a novel two-stage label refinement approach: a key-value memory network first records the draft labels predicted by the base model, and a multi-channel Transformer then refines these draft predictions based on the explicit co-occurrence relationships derived from the memory network. In addition, to mitigate the side effects of incorrect draft labels, Bayesian neural networks are used to flag the labels with a high probability of being wrong, which greatly helps to prevent the incorrect refinement of correct draft labels. Experimental results on three named entity recognition benchmarks demonstrate that the proposed method significantly outperforms the state-of-the-art methods.
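
The two-stage idea can be illustrated with a deliberately simplified sketch: a key-value memory records every draft label predicted for a token sequence in a document, and uncertain drafts are revised toward the document-level consensus. This only illustrates the consistency mechanism; the actual method refines with a multi-channel Transformer and estimates uncertainty with Bayesian neural networks, and the names and threshold below are hypothetical.

```python
# Simplified sketch of document-level label refinement via a key-value memory
# (hypothetical data structures; the actual model refines with a Transformer).
from collections import defaultdict, Counter

def build_memory(spans):
    """Record every draft label predicted for each token sequence in a document.
    `spans` is a list of (token_tuple, draft_label, uncertainty) triples."""
    memory = defaultdict(list)
    for tokens, label, uncertainty in spans:
        memory[tokens].append((label, uncertainty))
    return memory

def refine(spans, memory, uncertainty_threshold=0.5):
    """Second stage (simplified): only draft labels the model is unsure about
    are revised, here toward the label that co-occurring mentions agree on."""
    refined = []
    for tokens, label, uncertainty in spans:
        if uncertainty > uncertainty_threshold:
            votes = Counter(l for l, _ in memory[tokens])
            label = votes.most_common(1)[0][0]   # document-level consensus
        refined.append((tokens, label))
    return refined

# Example: three occurrences of "Apple Inc" in one document; the uncertain
# draft ("LOC") is corrected to the consistent document-level label "ORG".
doc = [(("Apple", "Inc"), "ORG", 0.1),
       (("Apple", "Inc"), "ORG", 0.2),
       (("Apple", "Inc"), "LOC", 0.8)]
print(refine(doc, build_memory(doc)))
```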



Author(s): Eddie A. Santos, Joshua C. Campbell, Abram Hindle, José Nelson Amaral

Minor syntax errors are made by novice and experienced programmers alike; however, novice programmers lack the years of intuition that would help them resolve these tiny errors. Standard LR parsers typically do a poor job of resolving syntax errors and pinpointing their precise location. We propose a methodology that not only helps locate where syntax errors occur, but also suggests possible changes to the token stream that can fix the identified error. The methodology finds syntax errors by checking whether two language models “agree” on each token; if the models disagree, a possible syntax error is indicated, and the methodology suggests a fix by finding an alternative token sequence obtained from the models. We trained two LSTM (long short-term memory) language models on a large corpus of JavaScript code collected from GitHub. The dual LSTM neural network model predicts the correct location of the syntax error within its top four suggestions 54.74% of the time and produces an exact fix up to 35.50% of the time. The results show that this tool and methodology can locate and suggest corrections for syntax errors. Our methodology is of practical use to all programmers, but will be especially useful to novices frustrated by incomprehensible syntax errors.
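
A rough sketch of the agreement check might look like the following, under one plausible interpretation: the two models "disagree" with the source when both assign the written token a low probability. Here `forward_lm` and `backward_lm` are placeholder callables standing in for the trained forward and backward LSTM language models, and the threshold and ranking scheme are invented for illustration.

```python
# Toy sketch of a dual-language-model agreement check (hypothetical scoring
# callables; the actual models are LSTMs trained on JavaScript from GitHub).
def find_suspects(tokens, forward_lm, backward_lm, threshold=0.01):
    """Flag positions where both models assign the written token low probability."""
    suspects = []
    for i, tok in enumerate(tokens):
        p_fwd = forward_lm(tokens[:i], tok)       # P(tok | left context)
        p_bwd = backward_lm(tokens[i + 1:], tok)  # P(tok | right context)
        if p_fwd < threshold and p_bwd < threshold:
            suspects.append(i)                    # both models "disagree" with the source
    return suspects

def suggest_fixes(tokens, i, forward_lm, backward_lm, vocab, top_k=4):
    """Rank replacement tokens by how strongly both models prefer them."""
    scores = {t: forward_lm(tokens[:i], t) * backward_lm(tokens[i + 1:], t)
              for t in vocab}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In practice the candidate tokens and their scores would come from the trained models' output distributions rather than an explicit loop over a fixed vocabulary.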



2017, Vol 41 (1), pp. 105-123
Author(s): Faqih Salban Rabbani, Oscar Karnalim

Even though there are various source code plagiarism detection approaches, only a few focus on low-level representation for detecting similarity; most rely only on the lexical token sequence extracted from the source code. From our point of view, a low-level representation is more beneficial than lexical tokens, since its form is more compact than the source code itself: it considers only semantics-preserving instructions and ignores many source code delimiter tokens. This paper proposes a source code plagiarism detection approach that relies on low-level representation. As a case study, we focus on .NET programming languages, using the Common Intermediate Language as the low-level representation. In addition, we incorporate Adaptive Local Alignment for detecting similarity; according to Lim et al., this algorithm outperforms the state-of-the-art code similarity algorithm (i.e., Greedy String Tiling) in terms of effectiveness. According to our evaluation, which involves various plagiarism attacks, our approach is more effective and efficient than the standard lexical-token approach.
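
As a rough illustration of aligning low-level instruction sequences, the sketch below computes a plain Smith-Waterman local alignment score over CIL-like opcode streams; the adaptive scoring scheme of the cited approach is not reproduced, and the opcode sequences are invented examples.

```python
# Minimal local-alignment sketch over CIL opcode sequences (Smith-Waterman
# style; the adaptive weighting of the actual approach is omitted).
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Highest-scoring local alignment between two instruction sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best

# Example: two CIL-like opcode streams that share a copied region.
original   = ["ldarg.0", "ldarg.1", "add", "stloc.0", "ldloc.0", "ret"]
suspicious = ["nop", "ldarg.0", "ldarg.1", "add", "stloc.0", "ret"]
print(local_alignment_score(original, suspicious))  # high score -> similar fragment
```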



Author(s): Tomonari Masada

This paper introduces a new approach for large-scale unsupervised segmentation of bibliographic elements. The problem is to segment a citation, given as an untagged word token sequence, into subsequences so that each subsequence corresponds to a different bibliographic element (e.g., authors, paper title, journal name, publication year, etc.). The same bibliographic element should be referred to by contiguous word tokens; this is called the contiguity constraint. The author meets this constraint by using generalized Mallows models, which were effectively applied to document structure learning by Chen, Branavan, Barzilay, and Karger (2009). However, that method works for this problem only after modification, so the author proposes strategies to make it applicable.
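
The contiguity constraint itself is easy to state in code: a valid labeling assigns each bibliographic element to exactly one contiguous run of tokens. The sketch below only checks that property on an invented example labeling; the generalized Mallows model that actually drives the unsupervised segmentation is not shown.

```python
# Minimal check of the contiguity constraint on a citation's token labels
# (illustrative labels only; the generalized Mallows model is omitted).
def satisfies_contiguity(labels):
    """Each bibliographic element may appear as exactly one contiguous run."""
    closed = set()          # elements whose run has already ended
    prev = None
    for lab in labels:
        if lab != prev:
            if lab in closed:            # element reappears after a different one
                return False
            if prev is not None:
                closed.add(prev)
            prev = lab
    return True

# "author author title title title journal year" is contiguous ...
print(satisfies_contiguity(["author"] * 2 + ["title"] * 3 + ["journal", "year"]))  # True
# ... but an author token after the title violates the constraint.
print(satisfies_contiguity(["author", "title", "title", "author", "year"]))        # False
```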


