FairMixRep: Self-supervised Robust Representation Learning for Heterogeneous Data with Fairness constraints

Author(s):  
Souradip Chakraborty ◽  
Ekansh Verma ◽  
Saswata Sahoo ◽  
Jyotishka Datta
2020 ◽  
Author(s):  
Zhenyan Ji ◽  
Chun Yang ◽  
Huihui Wang ◽  
José Enrique Armendáriz-Iñigo ◽  
Marta Arce-Urriza

Abstract Recommendation systems are often used to address information overload on the Internet. Many types of data can be used for recommendation, and fusing different types of data can make recommendations more accurate. Most existing fusion recommendation models simply combine the recommendation results obtained from different data sources instead of fully fusing multi-source heterogeneous data. Furthermore, users' choices are usually affected by the preferences of their direct and even indirect friends. This paper proposes a hybrid recommendation model, BRScS (an acronym for BPR-Review-Score-Social). It fully fuses social data, scores, and reviews, uses an improved BPR model to optimize the ranking, and trains them in a joint representation learning framework to produce top-N recommendations. A user trust model is used to introduce social relationships into the rating and review data, the PV-DBOW model is used to process the review data, and a fully connected neural network is used to process the rating data. Experiments on the public Yelp dataset show that the proposed BRScS algorithm outperforms other recommendation algorithms such as BRSc, UserCF, and HRSc. The BRScS model is also scalable and can easily fuse new types of data.
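
The abstract above names BPR (Bayesian Personalized Ranking) as the ranking objective. As a rough illustration only, and not the BRScS implementation, the generic pairwise BPR loss for one (user, preferred item, non-preferred item) triple can be sketched in plain Python. The latent vectors, learning rate, and regularization strength are made-up values, and all function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bpr_loss(user, pos, neg):
    """Pairwise BPR loss: -ln(sigmoid(score(user, pos) - score(user, neg)))."""
    return -math.log(sigmoid(dot(user, pos) - dot(user, neg)))


def bpr_sgd_step(user, pos, neg, lr=0.05, reg=0.01):
    """One in-place gradient step on the L2-regularized BPR loss."""
    # Magnitude of the loss gradient with respect to the score difference.
    g = 1.0 - sigmoid(dot(user, pos) - dot(user, neg))
    for k in range(len(user)):
        u, i, j = user[k], pos[k], neg[k]
        user[k] += lr * (g * (i - j) - reg * u)
        pos[k] += lr * (g * u - reg * i)
        neg[k] += lr * (-g * u - reg * j)


# Toy latent vectors (illustrative values, not taken from the paper).
user = [0.1, -0.2, 0.05]
pos = [0.0, 0.1, -0.1]   # item the user is assumed to prefer
neg = [0.2, -0.1, 0.1]   # item the user is assumed not to prefer

loss_before = bpr_loss(user, pos, neg)
for _ in range(500):
    bpr_sgd_step(user, pos, neg)
loss_after = bpr_loss(user, pos, neg)
```

After training on this single triple, the preferred item scores higher than the non-preferred one and the loss is lower than at the start. A real recommender would sample many such triples from interaction data.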


2021 ◽  
Author(s):  
Sajit Kumar ◽  
Alicia Nanelia Tan Li Shi ◽  
Ragunathan Mariappan ◽  
Adithya Rajagopal ◽  
Vaibhav Rajan

BACKGROUND Patient Representation Learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images or graphs. Most previous techniques have used neural network based autoencoders to learn patient representations, primarily from clinical notes in Electronic Medical Records (EMR). Knowledge Graphs (KG), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature, and provide complementary information to EMR data that have been found to provide valuable predictive signals. OBJECTIVE We evaluate the efficacy of Collective Matrix Factorization (CMF) - both classical variants and a recent neural architecture called Deep CMF (DCMF) - in integrating heterogeneous data sources from EMR and KG to obtain patient representations for Clinical Decision Support Tasks. METHODS Using a recent formulation of obtaining graph representations through matrix factorization, within the context of CMF, we infuse auxiliary information during patient representation learning. We also extend the DCMF architecture to create a task-specific end-to-end model that learns to simultaneously find effective patient representations and predict. We compare the efficacy of such a model to that of first learning unsupervised representations and then independently learning a predictive model. We evaluate patient representation learning using CMF-based methods and autoencoders for two clinical decision support tasks on a large EMR dataset. RESULTS Our experiments show that DCMF provides a seamless way to integrate multiple sources of data to obtain patient representations, both in unsupervised and supervised settings. Its performance in single-source settings is comparable to that of previous autoencoder-based representation learning methods. When DCMF is used to obtain representations from a combination of EMR and KG, where most previous autoencoder-based methods cannot be used directly, its performance is superior to that of previous non-neural methods for CMF. Infusing information from KGs into patient representations using DCMF was found to improve downstream predictive performance. CONCLUSIONS Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single and multiple data sources, and to combine information from EMR data and Knowledge Graphs. Further, DCMF can be used to learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations.
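
Collective matrix factorization, as evaluated in the abstract above, factorizes several matrices that share an entity type using a common factor matrix. The plain-Python sketch below shows the classical, non-neural idea on toy data. It is not DCMF and not the authors' code. The matrices X and Y, the rank, the learning rate, and all function names are assumptions chosen for illustration.

```python
import random


def matmul_t(a, b):
    """Return a @ b^T for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def residual(target, approx):
    """Element-wise difference approx - target."""
    return [[p - t for t, p in zip(rt, rp)] for rt, rp in zip(target, approx)]


def cmf_loss(X, Y, U, V, W):
    """Joint squared error ||X - U V^T||^2 + ||Y - U W^T||^2."""
    rx = residual(X, matmul_t(U, V))
    ry = residual(Y, matmul_t(U, W))
    return sum(e * e for row in rx + ry for e in row)


def cmf_step(X, Y, U, V, W, lr=0.01):
    """One full-batch gradient step on the joint loss; updates U, V, W in place."""
    rx = residual(X, matmul_t(U, V))
    ry = residual(Y, matmul_t(U, W))
    k = len(U[0])
    # The shared factor U receives gradient signal from both matrices.
    # (The constant factor 2 of the squared error is absorbed into lr.)
    g_u = [[sum(rx[i][j] * V[j][f] for j in range(len(V)))
            + sum(ry[i][j] * W[j][f] for j in range(len(W)))
            for f in range(k)] for i in range(len(U))]
    g_v = [[sum(rx[i][j] * U[i][f] for i in range(len(U))) for f in range(k)]
           for j in range(len(V))]
    g_w = [[sum(ry[i][j] * U[i][f] for i in range(len(U))) for f in range(k)]
           for j in range(len(W))]
    for matrix, grad in ((U, g_u), (V, g_v), (W, g_w)):
        for row, grad_row in zip(matrix, grad):
            for f in range(k):
                row[f] -= lr * grad_row[f]


def init(rows, k, rng):
    """Small random initial factors."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(rows)]


# Toy data: 4 shared entities (e.g. patients) described by two sources.
X = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]  # 4 x 3, first source
Y = [[1, 0], [0, 1], [1, 1], [0, 0]]              # 4 x 2, second source

rng = random.Random(0)
U, V, W = init(4, 2, rng), init(3, 2, rng), init(2, 2, rng)

loss_before = cmf_loss(X, Y, U, V, W)
for _ in range(500):
    cmf_step(X, Y, U, V, W)
loss_after = cmf_loss(X, Y, U, V, W)
```

The rows of U are the shared representations: each is fitted to reconstruct both X and Y, which is how information from one source can inform the representation used for the other.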


2020 ◽  
Vol 36 (15) ◽  
pp. 4248-4254 ◽  
Author(s):  
Nazia Fatima ◽  
Luis Rueda

Abstract Motivation One of the main challenges in applying graph convolutional neural networks (CNNs) to gene-interaction data is the lack of understanding of the vector space to which they belong, and also the inherent difficulties involved in representing those interactions in a significantly lower dimension, viz., Euclidean spaces. The challenge becomes more prevalent when dealing with various types of heterogeneous data. We introduce a systematic, generalized method, called iSOM-GSN, used to transform 'multi-omic' data with higher dimensions onto a 2D grid. Afterwards, we apply a CNN to predict disease states of various types. Based on the idea of Kohonen's self-organizing map, we generate a 2D grid for each sample for a given set of genes that represent a gene similarity network. Results We have tested the model to predict breast and prostate cancer using gene expression, DNA methylation and copy number alteration. Prediction accuracies in the 94–98% range were obtained for tumor stages of breast cancer and calculated Gleason scores of prostate cancer with just 14 input genes for both cases. The scheme not only outputs nearly perfect classification accuracy, but also provides an enhanced scheme for representation learning, visualization, dimensionality reduction and interpretation of multi-omic data. Availability and implementation The source code and sample data are available via a GitHub project at https://github.com/NaziaFatima/iSOM_GSN. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Minh C. Phan ◽  
Aixin Sun ◽  
Yi Tay
