TLGP: a flexible transfer learning algorithm for gene prioritization based on heterogeneous source domain

2021 ◽  
Vol 22 (S9) ◽  
Author(s):  
Yan Wang ◽  
Zuheng Xia ◽  
Jingjing Deng ◽  
Xianghua Xie ◽  
Maoguo Gong ◽  
...  

Abstract Background Gene prioritization (gene ranking) aims to obtain the centrality of genes, which is critical for cancer diagnosis and therapy since key genes correspond to the biomarkers or targets of drugs. Great efforts have been devoted to the gene ranking problem by exploring the similarity between candidate and known disease-causing genes. However, when the number of disease-causing genes is limited, these approaches are largely inapplicable due to their low accuracy. In practice, the number of known disease-causing genes for cancers, particularly for rare cancers, is very limited. Therefore, there is a critical need to design effective and efficient algorithms for gene ranking with limited prior disease-causing genes. Results In this study, we propose a transfer learning based algorithm for gene prioritization (called TLGP) in a cancer (target domain) without disease-causing genes by transferring knowledge from other cancers (source domain). The underlying assumption is that knowledge shared by similar cancers improves the accuracy of gene prioritization. Specifically, TLGP first quantifies the similarity between the target and source domains by calculating an affinity matrix for genes. Then, TLGP automatically learns a fusion network for the target cancer by fusing the affinity matrix, pathogenic genes, and genomic data of source cancers. Finally, genes in the target cancer are prioritized. The experimental results indicate that the learnt fusion network is more reliable than the gene co-expression network, implying that transferring knowledge from other cancers improves the accuracy of network construction. Moreover, TLGP outperforms state-of-the-art approaches in terms of accuracy, improving it by at least 5%. Conclusion The proposed model and method provide an effective and efficient strategy for gene ranking by integrating genomic data from various cancers.

2018 ◽  
Vol 27 (06) ◽  
pp. 1850022
Author(s):  
Karl R. Weiss ◽  
Taghi M. Khoshgoftaar

A transfer learning environment is characterized by not having sufficient labeled training data from the domain of interest (target domain) to build a high-performing machine learner. Transfer learning algorithms use labeled data from an alternate domain (source domain) that is similar to the target domain to build high-performing learners. The design of a transfer learning algorithm typically comprises a domain adaptation step followed by a learning step. The domain adaptation step attempts to align the distribution differences between the source domain and the target domain. The aligned data from the domain adaptation step is then used in the learning step, which is typically implemented with a traditional machine learning algorithm. Our research studies the impact of the learning step on the performance of various transfer learning algorithms. In our experiment, we couple five unique domain adaptation methods with seven different traditional machine learning methods to create 35 different transfer learning algorithms. We perform comparative performance analyses of the 35 transfer learning algorithms, along with the seven stand-alone traditional machine learning methods. This research will aid machine learning practitioners in the algorithm selection process for a transfer learning environment in the absence of reliable validation techniques.
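The two-step pipeline described above can be sketched as follows. The mean-shift adaptation and nearest-centroid learner below are illustrative stand-ins, not any of the specific methods among the paper's 35 combinations:

```python
import numpy as np

def adapt_mean_shift(source_X, target_X):
    """Toy domain adaptation step: shift source features so their mean
    matches the target mean (a crude first-moment alignment)."""
    return source_X - source_X.mean(axis=0) + target_X.mean(axis=0)

class NearestCentroid:
    """Illustrative traditional learner standing in for the learning step."""
    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = {c: X[np.asarray(y) == c].mean(axis=0)
                           for c in self.classes_}
        return self

    def predict(self, X):
        return [min(self.classes_, key=lambda c: np.linalg.norm(x - self.centroids_[c]))
                for x in X]

# Source domain is labeled; the target domain differs by a covariate shift.
source_X = np.array([[0.0], [1.0], [10.0], [11.0]])
source_y = [0, 0, 1, 1]
target_X = source_X + 5.0  # same class structure, shifted distribution

adapted_X = adapt_mean_shift(source_X, target_X)  # adaptation step
clf = NearestCentroid().fit(adapted_X, source_y)  # learning step
predictions = clf.predict(target_X)
```

Swapping either component independently, as the paper does across its 5 × 7 grid, is what isolates the learning step's contribution.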


2016 ◽  
Vol 2016 ◽  
pp. 1-9
Author(s):  
Haijun Zhang ◽  
Bo Zhang ◽  
Zhoujun Li ◽  
Guicheng Shen ◽  
Liping Tian

In a real e-commerce website, usually only a small number of users will give ratings to the items they purchased, which leads to very sparse user-item rating data. This data sparsity issue greatly limits the recommendation performance of most recommendation algorithms. However, a user may register accounts on many e-commerce websites. If such users’ historical purchasing data on these websites could be integrated, recommendation performance could be improved. But it is difficult to align the users and items between these websites, so how to effectively borrow the users’ rating data of one website (source domain) to help improve the recommendation performance of another website (target domain) is very challenging. To this end, this paper extends the traditional one-dimensional psychometrics model to multiple dimensions. The extended model can effectively capture users’ multiple interests. Based on this multidimensional psychometrics model, we further propose a novel transfer learning algorithm that can effectively transfer users’ rating preferences from the source domain to the target domain. Experimental results show that the proposed method can significantly improve recommendation performance.


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Jun He ◽  
Xiang Li ◽  
Yong Chen ◽  
Danfeng Chen ◽  
Jing Guo ◽  
...  

In mechanical fault diagnosis, it is impossible to collect massive labeled samples with the same distribution in real industry. Transfer learning, a promising method, is usually used to address this critical problem. However, as the number of samples increases, the inter-domain distribution discrepancy measurement of existing methods grows in computational complexity, which may worsen the method's generalization ability. To solve this problem, we propose a deep transfer learning method based on a 1D-CNN for rolling bearing fault diagnosis. First, a one-dimensional convolutional neural network (1D-CNN), as the basic framework, is used to extract features from the vibration signal. CORrelation ALignment (CORAL) is employed to minimize the marginal distribution discrepancy between the source domain and target domain. Then, the cross-entropy loss function and the Adam optimizer are used to minimize the classification errors and the second-order statistics of the feature distance between the source domain and target domain, respectively. Finally, based on the bearing datasets of Case Western Reserve University and Jiangnan University, seven transfer fault diagnosis comparison experiments are carried out. The results show that our method has better performance.
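The CORAL term mentioned above aligns second-order statistics: it penalizes the squared Frobenius distance between the feature covariance matrices of the two domains. A minimal NumPy sketch of the loss itself, outside any network and purely for illustration:

```python
import numpy as np

def coral_loss(source, target):
    """CORAL loss: squared Frobenius distance between the feature
    covariance matrices of a source batch and a target batch,
    scaled by 1 / (4 d^2) where d is the feature dimension."""
    d = source.shape[1]
    cs = np.cov(source, rowvar=False)  # source feature covariance
    ct = np.cov(target, rowvar=False)  # target feature covariance
    return float(np.sum((cs - ct) ** 2)) / (4.0 * d * d)
```

Because the loss depends only on covariances, a pure translation of the target features leaves it unchanged, while a rescaling does not.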


2020 ◽  
Author(s):  
Eliseu Guimarães ◽  
Jonnathan Carvalho ◽  
Aline Paes ◽  
Alexandre Plastino

Sentiment analysis on social media data can be a challenging task, among other reasons, because labeled data for training is not always available. Transfer learning approaches address this problem by leveraging a labeled source domain to obtain a model for a target domain that is different but related to the source domain. However, the question that arises is how to choose proper source data for training the target classifier, which can be done by measuring the similarity between source and target data with distance metrics. This article investigates the relation between these distance metrics and the classifiers’ performance. For this purpose, we propose to evaluate four metrics combined with distinct dataset representations. Computational experiments, conducted in the Twitter sentiment analysis scenario, showed that the cosine similarity metric combined with bag-of-words normalized with term frequency-inverse document frequency presented the best results in terms of predictive power, outperforming even the classifiers trained with the target dataset in many cases.
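The best-performing recipe the article reports, representing each dataset as a TF-IDF bag-of-words vector and ranking source candidates by cosine similarity to the target, can be sketched with a toy corpus. The corpora, tokenization, and names below are invented for illustration:

```python
import math
from collections import Counter

def tfidf(docs):
    """Smoothed TF-IDF vectors (as dicts) for a pre-tokenized corpus."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
             for t, c in Counter(d).items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented toy corpora: two candidate source datasets and one target.
source_domains = {
    "movies": "great plot great acting loved it".split(),
    "airlines": "flight delayed lost luggage awful service".split(),
}
target = "loved the acting plot was great".split()

vecs = tfidf(list(source_domains.values()) + [target])
scores = {name: cosine(v, vecs[-1]) for name, v in zip(source_domains, vecs)}
best_source = max(scores, key=scores.get)  # source chosen for training
```

Here the movie reviews share vocabulary with the target and would be picked as the source; the airline corpus shares no tokens and scores zero.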


2020 ◽  
Vol 34 (05) ◽  
pp. 7830-7838 ◽  
Author(s):  
Han Guo ◽  
Ramakanth Pasunuru ◽  
Mohit Bansal

Domain adaptation performance of a learning algorithm on a target domain is a function of its source domain error and a divergence measure between the data distributions of the two domains. We present a study of various distance-based measures in the context of NLP tasks that characterize the dissimilarity between domains based on sample estimates. We first conduct analysis experiments to show which of these distance measures can best differentiate samples from same versus different domains, and are correlated with empirical results. Next, we develop a DistanceNet model which uses these distance measures, or a mixture of these distance measures, as an additional loss function to be minimized jointly with the task's loss function, so as to achieve better unsupervised domain adaptation. Finally, we extend this model to a novel DistanceNet-Bandit model, which employs a multi-armed bandit controller to dynamically switch between multiple source domains and allow the model to learn an optimal trajectory and mixture of domains for transfer to the low-resource target domain. We conduct experiments on popular sentiment analysis datasets with several diverse domains and show that our DistanceNet model, as well as its dynamic bandit variant, can outperform competitive baselines in the context of unsupervised domain adaptation.
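A minimal sketch of a DistanceNet-style objective, using a linear-kernel MMD estimate as the domain distance added to the task loss. The paper studies several such measures; the function names and the fixed weight here are assumptions for illustration:

```python
import numpy as np

def mmd_linear(xs, xt):
    """Linear-kernel MMD estimate: squared Euclidean distance between
    the mean feature vectors of the two domains."""
    delta = xs.mean(axis=0) - xt.mean(axis=0)
    return float(delta @ delta)

def joint_loss(task_loss, source_feats, target_feats, lam=0.1):
    """Task loss plus a weighted domain-distance penalty, minimized
    jointly so the features both solve the task and match across domains."""
    return task_loss + lam * mmd_linear(source_feats, target_feats)
```

When source and target features coincide the penalty vanishes and the objective reduces to the plain task loss.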


Sensors ◽  
2019 ◽  
Vol 19 (18) ◽  
pp. 3992 ◽  
Author(s):  
Jingmei Li ◽  
Weifei Wu ◽  
Di Xue ◽  
Peng Gao

Transfer learning can enhance the classification performance of a target domain with insufficient training data by utilizing knowledge related to the target domain from a source domain. Nowadays, it is common to see two or more source domains available for knowledge transfer, which can improve the performance of learning tasks in the target domain. However, the classification performance of the target domain decreases due to the mismatching of probability distributions. Recent studies have shown that deep learning can build deep structures by extracting more effective features to resist this mismatching. In this paper, we propose a new multi-source deep transfer neural network algorithm, MultiDTNN, based on convolutional neural networks and multi-source transfer learning. In MultiDTNN, joint probability distribution adaptation (JPDA) is used to reduce the mismatching between source and target domains and enhance the transferability of source-domain features in deep neural networks. Then, a convolutional neural network is trained on the datasets of each source and the target domain to obtain a set of classifiers. Finally, the designed selection strategy selects the classifier with the smallest classification error on the target domain from this set to assemble the MultiDTNN framework. The effectiveness of the proposed MultiDTNN is verified by comparing it with other state-of-the-art deep transfer learning methods on three datasets.


2019 ◽  
Vol 16 (2) ◽  
pp. 172988141984086 ◽  
Author(s):  
Chuanqi Tan ◽  
Fuchun Sun ◽  
Bin Fang ◽  
Tao Kong ◽  
Wenchang Zhang

The brain–computer interface-based rehabilitation robot has quickly become a very important research area due to its natural interaction. One of the most important problems in brain–computer interfaces is that the large-scale annotated electroencephalography datasets required by advanced classifiers are almost impossible to acquire, because biological data acquisition is challenging and quality annotation is costly. Transfer learning relaxes the hypothesis that the training data must be independent and identically distributed with the test data, and can be considered a powerful tool for solving the problem of insufficient training data. There are two basic issues in transfer learning: under-transfer and negative transfer. We propose a novel brain–computer interface framework based on autoencoder transfer learning, which includes three main components: an autoencoder framework, a joint adversarial network, and a regularized manifold constraint. The autoencoder framework automatically encodes and reconstructs data from the source and target domains and forces the neural network to learn to represent these domains reliably. The joint adversarial network forces the network to encode the source domain and target domain appropriately at the same time, thereby overcoming under-transfer. The regularized manifold constraint avoids negative transfer by preventing the geometric manifold structure of the target domain from being destroyed by the source domain. Experiments show that our proposed brain–computer interface framework achieves better results than state-of-the-art approaches on electroencephalography signal classification tasks. This helps our rehabilitation robot understand the intention of patients and can help patients carry out rehabilitation exercises effectively.


2020 ◽  
Vol 12 (1) ◽  
pp. 8
Author(s):  
Peng (Edward) Wang ◽  
Matthew Russell

Given its demonstrated ability in analyzing and revealing patterns underlying data, Deep Learning (DL) has been increasingly investigated to complement physics-based models in various aspects of smart manufacturing, such as machine condition monitoring and fault diagnosis, complex manufacturing process modeling, and quality inspection. However, successful implementation of DL techniques relies greatly on the amount, variety, and veracity of data for robust network training. Also, the distributions of data used for network training and application should be identical to avoid the internal covariate shift problem that reduces the network's performance and applicability. As a promising solution to address these challenges, Transfer Learning (TL) enables DL networks trained on a source domain and task to be applied to a separate target domain and task. This paper presents a domain adversarial TL approach, based upon the concepts of generative adversarial networks. In this method, the optimizer seeks to minimize the loss (i.e., regression or classification error) across the labeled training examples from the source domain while maximizing the loss of the domain classifier across the source and target data sets (i.e., maximizing the similarity of source and target features). The developed domain adversarial TL method has been implemented on a 1-D CNN backbone network and evaluated for prediction of tool wear propagation, using NASA's milling dataset. Performance has been compared to other TL techniques, and the results indicate that domain adversarial TL can successfully allow DL models trained on certain scenarios to be applied to new target tasks.
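The minimax objective described above can be sketched with a single manual update step: the domain classifier descends its loss while the feature extractor ascends it (the gradient-reversal trick). This toy NumPy version uses a one-layer logistic domain classifier; all names and the setup are invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def domain_confusion_step(feats, domains, w, lr=0.1, lam=1.0):
    """One step of the adversarial game with a 1-layer logistic domain
    classifier (weights w): the classifier descends the domain loss,
    while the features ascend it (reversed gradient), pushing source
    and target features to look alike."""
    p = sigmoid(feats @ w)                              # P(sample is target)
    grad_w = feats.T @ (p - domains) / len(domains)     # dL/dw
    grad_f = np.outer(p - domains, w) / len(domains)    # dL/dfeats
    w_new = w - lr * grad_w                 # classifier: minimize domain loss
    feats_new = feats + lr * lam * grad_f   # features: maximize it (reversed sign)
    return feats_new, w_new

# Toy setup: one source sample (domain 0) and one target sample (domain 1).
feats = np.array([[1.0, 0.0], [0.0, 1.0]])
domains = np.array([0.0, 1.0])
w = np.array([1.0, -1.0])
new_feats, new_w = domain_confusion_step(feats, domains, w)
```

In a full implementation the task loss on labeled source examples is minimized alongside this game; only the adversarial half is shown here.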


2020 ◽  
Author(s):  
Rodrigo Azevedo Santos ◽  
Aline Paes ◽  
Gerson Zaverucha

Statistical machine learning algorithms usually assume that there is a considerable amount of data to train the models. However, they fail in domains where data is difficult or expensive to obtain. Transfer learning has emerged to address this problem of learning from scarce data by relying on a model learned in a source domain, where data is easy to obtain, as a starting point for the target domain. On the other hand, real-world data contains objects and their relations, usually gathered from noisy environments. Finding patterns in such uncertain relational data has been the focus of the Statistical Relational Learning (SRL) area. Thus, to address domains with scarce, relational, and uncertain data, in this paper we propose TreeBoostler, an algorithm that transfers the state-of-the-art SRL model Boosted Relational Dependency Networks learned in a source domain to the target domain. TreeBoostler first finds a mapping between pairs of predicates to accommodate the additive trees into the target vocabulary. Afterwards, it employs two theory revision operators devised to handle incorrect relational regression trees, aiming at improving the performance of the mapped trees. In the experiments presented in this paper, TreeBoostler has successfully transferred knowledge among several distinct domains. Moreover, it performs comparably to or better than learning-from-scratch methods in terms of accuracy and outperforms a transfer learning approach in terms of accuracy and runtime.


Author(s):  
Wenhao Jiang ◽  
Cheng Deng ◽  
Wei Liu ◽  
Feiping Nie ◽  
Fu-lai Chung ◽  
...  

Domain adaptation problems arise in a variety of applications, where a training dataset from the source domain and a test dataset from the target domain typically follow different distributions. The primary difficulty in designing effective learning models for such problems lies in how to bridge the gap between the source and target distributions. In this paper, we provide a comprehensive analysis of feature learning algorithms used in conjunction with linear classifiers for domain adaptation. Our analysis shows that, in order to achieve good adaptation performance, the second moments of the source domain distribution and target domain distribution should be similar. Based on this new analysis, we propose a novel and extremely simple feature learning algorithm for domain adaptation. Furthermore, we extend the algorithm by leveraging multiple layers, leading to another feature learning algorithm. We evaluate the effectiveness of the proposed algorithms on domain adaptation tasks over the Amazon review and spam datasets from the ECML/PKDD 2006 discovery challenge.

