Summarizing Source Code with Transferred API Knowledge

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/314 ◽

2018 ◽

Cited By ~ 14

Author(s):

Xing Hu ◽

Ge Li ◽

Xin Xia ◽

David Lo ◽

Shuai Lu ◽

...

Keyword(s):

Real World ◽

Software Maintenance ◽

Large Scale ◽

State Of The Art ◽

Source Code ◽

Code Search ◽

Novel Approach ◽

Software Maintenance And Evolution ◽

World Industry ◽

Similar Code

Code summarization, which aims to generate succinct natural language descriptions of source code, is extremely useful for code search and code comprehension. It has played an important role in software maintenance and evolution. Previous approaches generate summaries by retrieving summaries from similar code snippets. However, these approaches rely heavily on whether similar code snippets can be retrieved and on how similar the snippets are, and they fail to capture the API knowledge in the source code, which carries vital information about the functionality of the source code. In this paper, we propose a novel approach, named TL-CodeSum, which successfully applies API knowledge learned in a different but related task to code summarization. Experiments on large-scale real-world industry Java projects indicate that our approach is effective and outperforms the state-of-the-art in code summarization.


Extrinsic Camera Calibration with Line-Laser Projection

Sensors ◽

10.3390/s21041091 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1091

Author(s):

Izaak Van Crombrugge ◽

Rudi Penne ◽

Steve Vanlanduit

Keyword(s):

Camera Calibration ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

Bundle Adjustment ◽

Field Of View ◽

Extrinsic Calibration ◽

Practical Procedure ◽

Partial Overlap

Knowledge of precise camera poses is vital for multi-camera setups. Camera intrinsics can be obtained for each camera separately in lab conditions. For fixed multi-camera setups, the extrinsic calibration can only be done in situ. Usually, some markers are used, like checkerboards, requiring some level of overlap between cameras. In this work, we propose a method for cases with little or no overlap. Laser lines are projected on a plane (e.g., floor or wall) using a laser line projector. The pose of the plane and cameras is then optimized using bundle adjustment to match the lines seen by the cameras. To find the extrinsic calibration, only a partial overlap between the laser lines and the field of view of the cameras is needed. Real-world experiments were conducted both with and without overlapping fields of view, resulting in rotation errors below 0.5°. We show that the accuracy is comparable to other state-of-the-art methods while offering a more practical procedure. The method can also be used in large-scale applications and can be fully automated.


Augmenting Bug Localization with Part-of-Speech and Invocation

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194017500346 ◽

2017 ◽

Vol 27 (06) ◽

pp. 925-949 ◽

Cited By ~ 5

Author(s):

Yu Zhou ◽

Yanxiang Tong ◽

Taolue Chen ◽

Jin Han

Keyword(s):

Software Maintenance ◽

Large Scale ◽

Bug Localization ◽

Bug Reports ◽

Part Of Speech ◽

Adaptive Technique ◽

Bug Report ◽

Software Maintenance And Evolution ◽

Speech Features ◽

Localization Approach

Bug localization represents one of the most expensive, as well as time-consuming, activities during software maintenance and evolution. To alleviate the workload of developers, numerous methods have been proposed to automate this process and narrow down the scope of reviewing buggy files. In this paper, we present a novel buggy source-file localization approach, using the information from both the bug reports and the source files. We leverage the part-of-speech features of bug reports and the invocation relationship among source files. We also integrate an adaptive technique to further optimize the performance of the approach. The adaptive technique discriminates between Top 1 and Top N recommendations for a given bug report and consists of two modules. One module maximizes the accuracy of the first recommended file, and the other aims at improving the accuracy of the fixed defect file list. We evaluate our approach on six large-scale open source projects, i.e., AspectJ, Eclipse, SWT, ZXing, Birt and Tomcat. Compared to the previous work, empirical results show that our approach can improve the overall prediction performance in all of these cases. Particularly, in terms of the Top 1 recommendation accuracy, our approach achieves an enhancement from 22.73% to 39.86% for AspectJ, from 24.36% to 30.76% for Eclipse, from 31.63% to 46.94% for SWT, from 40% to 55% for ZXing, from 7.97% to 21.99% for Birt, and from 33.37% to 38.90% for Tomcat.
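
The Top 1 and Top N accuracy figures quoted in this abstract can be illustrated with a small sketch. It assumes the common convention that a bug report counts as a hit when at least one of its actually fixed files appears among the first N recommended files. The function name `top_n_accuracy` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def top_n_accuracy(
    rankings: Dict[str, List[str]],
    fixed_files: Dict[str, Set[str]],
    n: int,
) -> float:
    """Fraction of bug reports with at least one fixed file in the top n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not rankings:
        return 0.0
    hits = 0
    for bug_id, ranked in rankings.items():
        truth = fixed_files.get(bug_id, set())
        # A report is a hit if any of its fixed files is ranked in the first n.
        if any(path in truth for path in ranked[:n]):
            hits += 1
    return hits / len(rankings)


# Hypothetical data: three bug reports, each with a ranked list of files.
rankings = {
    "bug-1": ["A.java", "B.java", "C.java"],
    "bug-2": ["D.java", "E.java", "F.java"],
    "bug-3": ["G.java", "H.java", "I.java"],
}
fixed_files = {
    "bug-1": {"A.java"},  # fixed file ranked first
    "bug-2": {"F.java"},  # fixed file ranked third
    "bug-3": {"Z.java"},  # fixed file never recommended
}

top1 = top_n_accuracy(rankings, fixed_files, 1)
top3 = top_n_accuracy(rankings, fixed_files, 3)
```

With this toy data only one of the three reports is a hit at N = 1, while two are hits at N = 3, which is why Top 1 accuracy is the stricter of the two measures.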


Legal Judgment Prediction Based on Multiclass Information Fusion

Complexity ◽

10.1155/2020/3089189 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Kongfan Zhu ◽

Rundong Guo ◽

Weifeng Hu ◽

Zeqiang Li ◽

Yujun Li

Keyword(s):

Information Fusion ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

External Information ◽

Criminal Cases ◽

Law System ◽

Large Scale Dataset ◽

Assistant Systems ◽

Civil Law System

Legal judgment prediction (LJP), as an effective and critical application in legal assistant systems, aims to determine the judgment results according to the information based on the fact determination. In real-world scenarios, to deal with criminal cases, judges not only take advantage of the fact description, but also consider external information, such as the basic information of the defendant and the court view. However, most existing works take the fact description as the sole input for LJP and ignore the external information. We propose a Transformer-Hierarchical-Attention-Multi-Extra (THME) Network to make full use of the information based on the fact determination. We conduct experiments on a real-world large-scale dataset of criminal cases in the civil law system. Experimental results show that our method outperforms state-of-the-art LJP methods on all judgment prediction tasks.


Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5329 ◽

2020 ◽

Vol 34 (01) ◽

pp. 19-26 ◽

Cited By ~ 5

Author(s):

Chong Chen ◽

Min Zhang ◽

Yongfeng Zhang ◽

Weizhi Ma ◽

Yiqun Liu ◽

...

Keyword(s):

Collaborative Filtering ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

Heterogeneous Data ◽

Model Parameters ◽

Online Systems ◽

Practical Applications ◽

Real World Datasets ◽

Primary Type

Recent studies on recommendation have largely focused on exploring state-of-the-art neural networks to improve the expressiveness of models, while typically applying the Negative Sampling (NS) strategy for efficient learning. Despite their effectiveness, two important issues have not been well considered in existing methods: 1) NS suffers from dramatic fluctuation, making it difficult for sampling-based methods to achieve the optimal ranking performance in practical applications; 2) although heterogeneous feedback (e.g., view, click, and purchase) is widespread in many online systems, most existing methods leverage only one primary type of user feedback such as purchase. In this work, we propose a novel non-sampling transfer learning solution, named Efficient Heterogeneous Collaborative Filtering (EHCF), for Top-N recommendation. It can not only model fine-grained user-item relations, but also efficiently learn model parameters from the whole heterogeneous data (including all unlabeled data) with a rather low time complexity. Extensive experiments on three real-world datasets show that EHCF significantly outperforms state-of-the-art recommendation methods in both traditional (single-behavior) and heterogeneous scenarios. Moreover, EHCF shows significant improvements in training efficiency, making it more applicable to real-world large-scale systems. Our implementation has been released to facilitate further developments on efficient whole-data based neural methods.


Large-Scale Multi-View Subspace Clustering in Linear Time

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5867 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4412-4419 ◽

Cited By ~ 3

Author(s):

Zhao Kang ◽

Wangtao Zhou ◽

Zhitong Zhao ◽

Junming Shao ◽

Meng Han ◽

...

Keyword(s):

Large Scale ◽

State Of The Art ◽

Linear Time ◽

Subspace Clustering ◽

Data Sets ◽

Clustering Methods ◽

Single View ◽

Novel Approach ◽

Points Of View ◽

Effectiveness And Efficiency

A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years. Researchers manage to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms, which typically have quadratic or even cubic complexity, are inefficient and inherently difficult to apply at large scales. In the era of big data, the computational issue becomes critical. To fill this gap, we propose a large-scale MVSC (LMVSC) algorithm with linear order complexity. Inspired by the idea of the anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that we can implement spectral clustering on a smaller graph. Interestingly, it turns out that our model also applies to the single-view scenario. Extensive experiments on various large-scale benchmark data sets validate the effectiveness and efficiency of our approach with respect to state-of-the-art clustering methods.


Scene text removal via cascaded text stroke detection and erasing

Computational Visual Media ◽

10.1007/s41095-021-0242-8 ◽

2021 ◽

Vol 8 (2) ◽

pp. 273-287

Author(s):

Xuewei Bian ◽

Chaoqun Wang ◽

Weize Quan ◽

Juntao Ye ◽

Xiaopeng Zhang ◽

...

Keyword(s):

Performance Improvement ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Processing Unit ◽

Final Model ◽

Scene Text ◽

End To End

Recent learning-based approaches show promising performance improvement for the scene text removal task but usually leave several remnants of text and provide visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state-of-the-art for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic and so cannot properly measure the performance of different methods.


Syntax-Guided Controlled Generation of Paraphrases

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00318 ◽

2020 ◽

Vol 8 ◽

pp. 330-345

Author(s):

Ashutosh Kumar ◽

Kabir Ahuja ◽

Raghuram Vadapalli ◽

Partha Talukdar

Keyword(s):

Real World ◽

English Language ◽

State Of The Art ◽

Source Code ◽

Future Research ◽

Text Generation ◽

Syntactic Information ◽

Input Sentence ◽

World English ◽

Paraphrase Generation

Given a sentence (e.g., “I like mangoes”) and a constraint (e.g., sentiment flip), the goal of controlled text generation is to produce a sentence that adapts the input sentence to meet the requirements of the constraint (e.g., “I hate mangoes”). Going beyond such simple constraints, recent work has started exploring the incorporation of complex syntactic guidance as constraints in the task of controlled paraphrase generation. In these methods, syntactic guidance is sourced from a separate exemplar sentence. However, these prior works have only utilized limited syntactic information available in the parse tree of the exemplar sentence. We address this limitation in the paper and propose Syntax Guided Controlled Paraphraser (SGCP), an end-to-end framework for syntactic paraphrase generation. We find that SGCP can generate syntax-conforming sentences while not compromising on relevance. We perform extensive automated and human evaluations over multiple real-world English language datasets to demonstrate the efficacy of SGCP over state-of-the-art baselines. To drive future research, we have made SGCP’s source code available.


DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/514 ◽

2017 ◽

Cited By ~ 8

Author(s):

Xiaodong Gu ◽

Hongyu Zhang ◽

Dongmei Zhang ◽

Sunghun Kim

Keyword(s):

Sequence Learning ◽

Large Scale ◽

Intelligent System ◽

State Of The Art ◽

Source Code ◽

Computer Programs ◽

Experimental Results ◽

Multiple Devices ◽

Application Programming ◽

Programming Interfaces

Computer programs written in one language are often required to be ported to other languages to support multiple devices and environments. When programs use language-specific APIs (Application Programming Interfaces), it is very challenging to migrate these APIs to the corresponding APIs written in other languages. Existing approaches mine API mappings from projects that have corresponding versions in two languages. They rely on the sparse availability of bilingual projects, thus producing a limited number of API mappings. In this paper, we propose an intelligent system called DeepAM for automatically mining API mappings from a large-scale code corpus without bilingual projects. The key component of DeepAM is based on the multi-modal sequence-to-sequence learning architecture that aims to learn joint semantic representations of bilingual API sequences from big source code data. Experimental results indicate that DeepAM significantly increases the accuracy of API mappings as well as the number of API mappings when compared with the state-of-the-art approaches.


Real-world longitudinal data collected from the SleepHealth mobile app study

Scientific Data ◽

10.1038/s41597-020-00753-2 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Sean Deering ◽

Abhishek Pratap ◽

Christine Suver ◽

A. Joseph Borelli ◽

Adam Amdur ◽

...

Keyword(s):

Real World ◽

Large Scale ◽

Digital Health ◽

The United States ◽

Mobile App ◽

Environmental Data ◽

Sleep Habits ◽

Real World Data ◽

Disease Manifestation ◽

Novel Approach

Conducting biomedical research using smartphones is a novel approach to studying health and disease that is only beginning to be meaningfully explored. Gathering large-scale, real-world data to track disease manifestation and long-term trajectory in this manner is quite practical and largely untapped. Researchers can assess large study cohorts using surveys and sensor-based activities that can be interspersed with participants’ daily routines. In addition, this approach offers a medium for researchers to collect contextual and environmental data via device-based sensors, data aggregator frameworks, and connected wearable devices. The main aim of the SleepHealth Mobile App Study (SHMAS) was to gain a better understanding of the relationship between sleep habits and daytime functioning utilizing a novel digital health approach. Secondary goals included assessing the feasibility of a fully-remote approach to obtaining clinical characteristics of participants, evaluating data validity, and examining user retention patterns and data-sharing preferences. Here, we provide a description of data collected from 7,250 participants living in the United States who chose to share their data broadly with the study team and qualified researchers worldwide.


Search for Compatible Source Code

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021500169 ◽

2021 ◽

Vol 31 (03) ◽

pp. 477-502

Author(s):

Fuqi Cai ◽

Changjing Wang ◽

Qing Huang ◽

Zhengkang Zuo ◽

Yunyan Liao

Keyword(s):

Programming Language ◽

State Of The Art ◽

Source Code ◽

The State ◽

Third Party ◽

Search Model ◽

Code Search ◽

Art Methods ◽

Local Programming ◽

Cosine Distance

Third-party libraries constantly evolve and produce multiple versions. Lucene, for example, released ten new versions (from version 7.7.0 to 8.4.0) in 2019. These versions confuse existing code search methods, which then retrieve source code that is not compatible with the local programming language. To solve this issue, we propose DCSE, a deep code search model based on evolving information (i.e., evolved code tokens and evolution description). DCSE first deeply excavates evolved code tokens and evolution description in the code evolution process; then it takes evolved code tokens and evolution description as one feature of source code and code description, respectively. With such a fuller representation, DCSE embeds source code and its code description into a high-dimensional shared vector space and reduces the cosine distance between their vectors. For ever-evolving third-party libraries like Lucene, the experimental results show that DCSE can retrieve source code that is compatible with the local programming language, and it outperforms the state-of-the-art methods (e.g., CODEnn) by 56.9–60.9% in RFVersion. For rarely-evolving third-party libraries, DCSE outperforms the state-of-the-art methods (e.g., CODEnn) by 4–11% in Precision.
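
The retrieval step this abstract describes, ranking code by cosine closeness to a description in a shared vector space, can be sketched as follows. This is only an illustration: the vectors are hand-written stand-ins for learned embeddings, the names `cosine_similarity` and `rank_snippets` are hypothetical, and DCSE's actual encoders are not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def rank_snippets(
    query_vec: Sequence[float],
    code_vecs: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Sort snippets by descending cosine similarity to the query."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in code_vecs.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical embeddings (assumed, not produced by any real model).
query = [1.0, 0.0]
snippets = {
    "snippet_old_api": [1.0, 1.0],
    "snippet_new_api": [2.0, 0.0],
    "snippet_unrelated": [0.0, 3.0],
}

ranked = rank_snippets(query, snippets)
best_name, best_score = ranked[0]
```

Cosine distance is one minus this similarity, so a model that pulls matching code and description vectors together raises the similarity score used for ranking here.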
