DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning

Author(s):  
Xiaodong Gu ◽  
Hongyu Zhang ◽  
Dongmei Zhang ◽  
Sunghun Kim

Computer programs written in one language often need to be ported to other languages to support multiple devices and environments. When programs use language-specific APIs (Application Programming Interfaces), it is very challenging to migrate these APIs to the corresponding APIs in other languages. Existing approaches mine API mappings from projects that have corresponding versions in two languages. Because they rely on the sparse availability of bilingual projects, they produce only a limited number of API mappings. In this paper, we propose an intelligent system called DeepAM for automatically mining API mappings from a large-scale code corpus without bilingual projects. The key component of DeepAM is a multi-modal sequence-to-sequence learning architecture that learns joint semantic representations of bilingual API sequences from big source code data. Experimental results indicate that DeepAM significantly increases both the accuracy and the number of API mappings compared with state-of-the-art approaches.
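Once API sequences from two languages share a joint semantic space, mining a mapping reduces to nearest-neighbour search. The sketch below illustrates only that final matching step with hypothetical, hand-made embedding vectors; the learned seq2seq representations are the paper's actual contribution and are not reproduced here.

```python
import math

# Toy illustration (not DeepAM itself): given embeddings of API sequences
# from two languages in a shared space, mine mappings by cosine similarity.
# The API names and vectors below are hypothetical stand-ins.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mine_mappings(java_embeddings, csharp_embeddings):
    """For each Java API, return the closest C# API in the shared space."""
    mappings = {}
    for j_api, j_vec in java_embeddings.items():
        best = max(csharp_embeddings,
                   key=lambda c: cosine(j_vec, csharp_embeddings[c]))
        mappings[j_api] = best
    return mappings

java = {"FileReader.read": [0.9, 0.1, 0.0],
        "HashMap.put":     [0.0, 0.8, 0.3]}
csharp = {"StreamReader.Read": [0.85, 0.15, 0.05],
          "Dictionary.Add":    [0.05, 0.90, 0.20]}

print(mine_mappings(java, csharp))
```

With these toy vectors, each Java API pairs with its semantically closest C# counterpart.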

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Mehdi Srifi ◽  
Ahmed Oussous ◽  
Ayoub Ait Lahcen ◽  
Salma Mouline

Various recommender systems (RSs) have been developed in recent years, and many of them concentrate on English content; thus, most RSs in the literature have been compared on English content. However, research on RSs for content in other languages, such as Arabic, remains minimal, and the field of Arabic RSs is still neglected. In this study we aim to fill this gap by leveraging recent advances in the English RS field. Our main goal is to investigate recent RSs in an Arabic context. To that end, we first selected five state-of-the-art RSs originally devoted to English content and then empirically evaluated their performance on Arabic content. As a result of this work, we first built four publicly available large-scale Arabic datasets for recommendation purposes. Second, we provide various text preprocessing techniques for preparing the constructed datasets. Third, our investigation draws well-argued conclusions about the use of modern RSs in the Arabic context. The experimental results show that these systems achieve high performance when applied to Arabic content.
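Preparing Arabic text typically involves normalization steps that have no English counterpart. The sketch below shows a few common ones (stripping diacritics, unifying alef variants, alef maqsura, and ta marbuta); it is an assumed, generic pipeline for illustration, not necessarily the exact preprocessing used in the paper.

```python
import re

# Common Arabic normalization steps (assumed generic pipeline):
DIACRITICS = re.compile(r'[\u064B-\u0652]')  # fathatan .. sukun

def normalize_arabic(text: str) -> str:
    text = DIACRITICS.sub('', text)                        # strip short vowels / tanwin / shadda
    text = re.sub('[\u0623\u0625\u0622]', '\u0627', text)  # alef variants -> bare alef
    text = text.replace('\u0649', '\u064A')                # alef maqsura -> ya
    text = text.replace('\u0629', '\u0647')                # ta marbuta -> ha
    return text

print(normalize_arabic('أُحِبُّ الكتابَ'))  # -> 'احب الكتاب'
```

Normalization like this collapses surface variants of the same word, which reduces vocabulary sparsity before feeding ratings or reviews to a recommender.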


2018 ◽  
Author(s):  
Jianfeng Li ◽  
Bowen Cui ◽  
Yuting Dai ◽  
Ling Bai ◽  
Jinyan Huang

The number of bioinformatics resources, such as tools, scripts, and databases, is growing exponentially. This poses a great challenge for users to access, manage, and integrate these resources. To address this need, we propose a comprehensive R package, BioInstaller, which includes R functions, a Shiny application, and HTTP representational state transfer (REST) application programming interfaces (APIs). We also established a community-based configuration pool to collect, access, and share bioinformatics resources. The source code of BioInstaller is freely available at our lab website http://bioinfo.rjh.com.cn/labs/jhuang/tools/bioinstaller and on GitHub at https://github.com/JhuangLab/BioInstaller. A Docker image can also be downloaded from DockerHub (https://hub.docker.com/r/bioinstaller).


Author(s):  
Jun Zhou ◽  
Longfei Li ◽  
Ziqi Liu ◽  
Chaochao Chen

Recently, the Factorization Machine (FM) has become increasingly popular in recommendation systems due to its effectiveness in finding informative interactions between features. Usually, the weights for the interactions are learned as a low-rank weight matrix, formulated as an inner product of two low-rank matrices. This low-rank structure helps improve the generalization ability of the Factorization Machine. However, choosing the rank properly usually requires running the algorithm many times with different ranks, which is clearly inefficient for large-scale datasets. To alleviate this issue, we propose an Adaptive Boosting framework for the Factorization Machine (AdaFM), which can adaptively search for the proper rank for different datasets without re-training. Instead of using a fixed rank, the proposed algorithm gradually increases its rank according to its performance until the performance stops improving. Extensive experiments on multiple large-scale datasets validate the proposed method. The experimental results demonstrate that the proposed method is more effective than state-of-the-art Factorization Machines.
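The stopping rule described above can be sketched as a simple control loop. Note this is an assumed simplification: AdaFM's boosting framework grows the rank within a single training run, whereas this sketch abstracts training behind a pluggable `train_and_eval` callback and only illustrates the grow-until-no-improvement criterion.

```python
# Sketch of the adaptive-rank idea (assumed form, not the authors' exact
# algorithm): grow the FM rank step by step and stop once validation
# performance no longer improves.

def adaptive_rank_search(train_and_eval, start_rank=2, step=2, max_rank=64):
    """train_and_eval(rank) -> validation score (higher is better)."""
    best_rank, best_score = start_rank, train_and_eval(start_rank)
    rank = start_rank + step
    while rank <= max_rank:
        score = train_and_eval(rank)
        if score <= best_score:        # performance stopped growing
            break
        best_rank, best_score = rank, score
        rank += step
    return best_rank, best_score

# Hypothetical validation curve that saturates at rank 8:
curve = {2: 0.70, 4: 0.78, 6: 0.81, 8: 0.83, 10: 0.83, 12: 0.82}
print(adaptive_rank_search(lambda r: curve[r]))  # -> (8, 0.83)
```

The search stops at the first rank whose score fails to beat the best seen so far, avoiding a full grid over candidate ranks.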


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Jiaxi Ye ◽  
Ruilin Li ◽  
Bin Zhang

Directed fuzzing is a practical technique that concentrates its testing energy on the path toward the target code areas while spending little on other, unconcerned components. It is a promising way to make better use of available resources, especially when testing large-scale programs. However, by observing the state-of-the-art directed fuzzing engine (AFLGo), we identify two universal limitations: the balance problem between exploration and exploitation, and the blindness of mutation toward the target code areas. In this paper, we present a new prototype, RDFuzz, to address these two limitations. In RDFuzz, we first introduce a frequency-guided strategy in the exploration stage and improve its accuracy by adopting branch-level instead of path-level frequency. Then, we introduce an input-distance-based evaluation strategy in the exploitation stage and present an optimized mutation to distinguish and protect distance-sensitive input content. Moreover, an intertwined testing schedule performs exploration and exploitation in turn. We test RDFuzz on 7 benchmarks, and the experimental results demonstrate that RDFuzz is skilled at driving the program toward the target code areas and is not easily stuck by the balance problem between exploration and exploitation.
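The branch-level frequency guidance mentioned above can be sketched as a seed-scoring rule: seeds that exercise rarely hit branches score higher, steering exploration away from over-tested code. This is an assumed simplification of the idea, not RDFuzz's exact scoring function.

```python
from collections import Counter

# Simplified frequency-guided seed selection (assumed form): score each
# seed by summing the inverse global hit count of the branches it covers,
# so rare branches dominate the score.

def seed_score(seed_branches, global_hits: Counter) -> float:
    return sum(1.0 / global_hits[b] for b in seed_branches)

def pick_next_seed(seeds, global_hits):
    """seeds: mapping of seed name -> list of covered branch ids."""
    return max(seeds, key=lambda s: seed_score(seeds[s], global_hits))

global_hits = Counter({"b1": 100, "b2": 3, "b3": 50})
seeds = {"seed_a": ["b1", "b3"],   # hot branches only
         "seed_b": ["b1", "b2"]}   # reaches the rare branch b2
print(pick_next_seed(seeds, global_hits))  # -> 'seed_b'
```

Tracking frequency at branch level rather than path level keeps the counters small and credits a seed for each individual rare edge it reaches.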


2021 ◽  
Vol 8 (2) ◽  
pp. 273-287
Author(s):  
Xuewei Bian ◽  
Chaoqun Wang ◽  
Weize Quan ◽  
Juntao Ye ◽  
Xiaopeng Zhang ◽  
...  

Recent learning-based approaches show promising performance improvements on the scene text removal task but usually leave behind remnants of text and produce visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state of the art in locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic and so cannot properly measure the performance of different methods.


2021 ◽  
Vol 7 ◽  
pp. e824
Author(s):  
Yiren Li ◽  
Tieke Li ◽  
Pei Shen ◽  
Liang Hao ◽  
Wenjing Liu ◽  
...  

Microservice-based Web Systems (MWS), which provide a fundamental infrastructure for constructing large-scale cloud-based Web applications, are designed as a set of independent, small, and modular microservices that implement individual tasks and communicate via messages. This microservice-based architecture offers great application scalability but incurs complex and reactive autoscaling actions that are performed dynamically and periodically based on current workloads; resource scheduling for such systems has thus far remained largely unexplored. In this paper, we formulate the problem of Dynamic Resource Scheduling for Microservice-based Web Systems (DRS-MWS) and propose a similarity-based heuristic scheduling algorithm that aims to quickly find viable scheduling schemes by utilizing solutions to similar problems. Experimental results obtained with a well-known microservice benchmark on disparate computing nodes in public clouds illustrate the superiority of the proposed scheduling solution over three state-of-the-art algorithms.
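The core of a similarity-based heuristic of this kind can be sketched as an archive lookup: keep previously solved workload-to-schedule pairs and seed the new search with the scheme of the most similar past workload. The workload features, service names, and distance metric below are assumptions for illustration, not the paper's actual formulation.

```python
import math

# Assumed sketch of similarity-based scheduling: reuse the scheduling
# scheme of the most similar previously solved workload as a warm start.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_scheme(workload, archive):
    """archive: list of (workload_vector, scheduling_scheme) pairs."""
    _, scheme = min(archive, key=lambda e: euclidean(workload, e[0]))
    return scheme

# Hypothetical archive: workload vectors (req/s, CPU%, queue depth)
# mapped to per-service replica counts.
archive = [([100, 20, 5],  {"svcA": 2, "svcB": 1}),
           ([800, 90, 40], {"svcA": 8, "svcB": 6})]
print(closest_scheme([750, 85, 38], archive))  # -> {'svcA': 8, 'svcB': 6}
```

Starting from a near-neighbour's scheme narrows the search space, which is what lets such heuristics find viable schedules quickly.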


2013 ◽  
Vol 10 (4) ◽  
pp. 82-101 ◽  
Author(s):  
Buqing Cao ◽  
Jianxun Liu ◽  
Mingdong Tang ◽  
Zibin Zheng ◽  
Guangrong Wang

With the rapid development of Web 2.0 and its related technologies, Mashup services (i.e., Web applications created by combining two or more Web APIs) are becoming a hot research topic. The explosion of Mashup services, especially functionally similar or equivalent ones, however, makes service discovery more difficult than ever. In this paper, we present an approach to recommending Mashup services to users based on usage history and a service network. The approach first extracts users' interests from their Mashup service usage history and builds a service network based on social relationship information among Mashup services, Web application programming interfaces (APIs), and their tags. It then leverages the target user's interests and the service social relationships to perform Mashup service recommendation. Large-scale experiments on a real-world Mashup service dataset show that the proposed approach can effectively recommend Mashup services to users with excellent performance. Moreover, a Mashup service recommendation prototype system has been developed.
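A toy version of the interest-extraction-plus-matching step can be sketched with tag sets: derive a user's interest tags from the services they have used, then rank unused services by tag overlap. This is an assumed, far simpler scheme than the paper's service network, and the service names and tags are hypothetical.

```python
# Toy sketch (assumed): interests come from usage history; candidates are
# ranked by Jaccard similarity between interest tags and service tags.

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(history, catalog, top_k=1):
    """history: services the user already used; catalog: service -> tags."""
    interests = set().union(*(catalog[s] for s in history))
    candidates = [s for s in catalog if s not in history]
    ranked = sorted(candidates,
                    key=lambda s: jaccard(interests, catalog[s]),
                    reverse=True)
    return ranked[:top_k]

catalog = {"MapMash":   {"map", "geo"},
           "GeoPhotos": {"geo", "photo"},
           "NewsFeed":  {"news", "rss"}}
print(recommend({"MapMash"}, catalog))  # -> ['GeoPhotos']
```

The full approach additionally exploits relationships among services and APIs, which helps exactly where raw tag overlap ties or is too sparse.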


Author(s):  
Michael Adeyeye

The cloud is becoming an environment in which to store huge volumes of data and deploy massive applications. Using virtualization technologies, it is economical and feasible to provide testbeds in the cloud. The convergence of Next Generation (NG) networks and Internet-based applications may result in the deployment of future rich Internet applications and services in the cloud. This chapter shows the migration of mobility-enabled services to the cloud. It presents a SIP-based hybrid architecture for Web session mobility that offers content sharing and session handoff between Web browsers. The implemented system has recently evolved into a framework for developing different kinds of converged services over the Internet, similar to services offered by Google Wave and existing telephony Application Programming Interfaces (APIs). The work in this chapter is also compared with those similar technologies. Lastly, the authors show efforts to migrate the SIP/HTTP application server to the cloud, necessitated by the need to include more functionality (i.e., QoS and rich media support) and to provide large-scale deployment in a multi-domain scenario.


2013 ◽  
Vol 21 (1) ◽  
pp. 113-138 ◽  
Author(s):  
MUHUA ZHU ◽  
JINGBO ZHU ◽  
HUIZHEN WANG

Shift-reduce parsing has been studied extensively for diverse grammars due to its simplicity and running efficiency. However, in the field of constituency parsing, shift-reduce parsers lag behind state-of-the-art parsers. In this paper we propose a semi-supervised approach for advancing shift-reduce constituency parsing. First, we apply the uptraining approach (Petrov, S. et al. 2010. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, pp. 705–713) to improve part-of-speech taggers, which provide better part-of-speech tags to subsequent shift-reduce parsers. Second, we enhance shift-reduce parsing models with novel features defined on lexical dependency information. Both stages depend on the use of large-scale unlabeled data. Experimental results show that the approach achieves overall improvements of 1.5 percent and 2.1 percent on English and Chinese data, respectively. Moreover, the final parsing accuracies reach 90.9 percent and 82.2 percent, respectively, which are comparable with those of state-of-the-art parsers.
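The shift-reduce mechanism the paper builds on is a stack-and-buffer loop. The sketch below drives that loop with a hand-written gold action sequence instead of a learned classifier (the classifier is what the paper's features and uptraining improve); the action naming scheme is an assumption for illustration.

```python
# Bare-bones shift-reduce constituency building: SHIFT moves a token from
# the buffer to the stack; REDUCE-<LABEL>-<k> pops k items and pushes a
# bracketed constituent.

def shift_reduce(tokens, actions):
    stack, buf = [], list(tokens)
    for act in actions:
        if act == "SHIFT":
            stack.append(buf.pop(0))
        else:
            _, label, arity = act.split("-")
            k = int(arity)
            children = stack[-k:]
            del stack[-k:]
            stack.append(f"({label} {' '.join(children)})")
    return stack[0]

tree = shift_reduce(
    ["the", "cat", "sleeps"],
    ["SHIFT", "SHIFT", "REDUCE-NP-2", "SHIFT", "REDUCE-VP-1", "REDUCE-S-2"])
print(tree)  # -> '(S (NP the cat) (VP sleeps))'
```

In a real parser the action at each step is chosen by a classifier over stack/buffer features, which is why better part-of-speech tags and dependency-based features translate directly into better trees.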


2020 ◽  
Author(s):  
Huan Liu ◽  
Zhiliang Qiu ◽  
Weitao Pan ◽  
Jun Li ◽  
Ling Zheng ◽  
...  

Cyclic redundancy check (CRC) is a well-known error-detection code that is widely used in Ethernet, PCIe, and other transmission protocols. Existing FPGA-based implementations face the problem of excessive resource utilization in high-performance scenarios; the padding-zeros problem and the introduction of programmability further exacerbate it. In this brief, the stride-by-5 algorithm is proposed to achieve optimal utilization of FPGA resources, the pipelining go-back algorithm is proposed to solve the padding-zeros problem, and reprogramming via HWICAP is proposed to realize programmability with a small, constant resource utilization. The experimental results show that the resource utilization of the proposed non-segmented architecture is 80.7%–87.5% and 25.1%–46.2% lower than those of two state-of-the-art FPGA-based CRC implementations, and the proposed segmented architecture lowers resource utilization by 81.7%–85.9% and 2.9%–20.8% compared with the two state-of-the-art architectures, while throughput and programmability are guaranteed. The source code is available on GitHub.
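For readers unfamiliar with CRC itself, the computation being accelerated is the following bit-serial loop (here the Ethernet CRC-32 polynomial in reflected form). The paper's contribution is processing many bytes per clock cycle on FPGA, which this reference software loop makes no attempt to do.

```python
import zlib

# Bit-serial CRC-32 (Ethernet): reflected polynomial 0xEDB88320,
# initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.

def crc32_bitwise(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

msg = b"123456789"
print(hex(crc32_bitwise(msg)))                # -> '0xcbf43926' (standard CRC-32 check value)
print(crc32_bitwise(msg) == zlib.crc32(msg))  # -> True
```

Hardware implementations unroll this dependency chain; handling several input bytes per clock while keeping the feedback loop short is exactly where the resource-utilization trade-offs discussed in the abstract arise.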

