Con&Net: A Cross-Network Anchor Link Discovery Method Based on Embedding Representation

Xueyuan Wang; Hongpo Zhang; Zongmin Wang; Yaqiong Qiao; Jiangtao Ma; Honghua Dai

doi:10.1145/3469083

ACM Transactions on Knowledge Discovery from Data; doi:10.1145/3469083; 2022; Vol 16 (2); pp. 1-18

Author(s): Xueyuan Wang; Hongpo Zhang; Zongmin Wang; Yaqiong Qiao; Jiangtao Ma; ...

Keyword(s): Area Under The Curve; Research Problem; Cosine Similarity; Baseline Method; Latent Space; Link Discovery; Cross Platform; AUC Value; The Stability; Discovery Method

Cross-network anchor link discovery is an important research problem with many applications in heterogeneous social networks. Existing schemes for cross-network anchor link discovery can provide reasonable link discovery results, but the quality of these results depends on the features of the platform, so there is no theoretical guarantee of stability. This article employs user embedding features to model the relationship between cross-platform accounts: the more similar the user embedding features are, the more similar the two accounts are. The similarity of user embedding features is determined by the distance between the user features in the latent space. Based on the user embedding features, this article proposes an embedding representation-based method, Con&Net (Content and Network), to solve the cross-network anchor link discovery problem. Con&Net combines the user’s profile features, user-generated content (UGC) features, and the user’s social structure features to measure the similarity of two user accounts. Con&Net first trains on the user’s profile features to get a profile embedding. Then it trains on the network structure of the nodes to get a structure embedding. It joins the two embeddings by vector concatenation and calculates the cosine similarity of the resulting embedding vectors. This cosine similarity is used to measure the similarity of the user accounts. Finally, Con&Net predicts links based on this similarity for account pairs across the two networks. Extensive experiments on the Sina Weibo and Twitter networks show that Con&Net outperforms state-of-the-art methods. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve for anchor link prediction is 11% higher than that of the baseline method, and Precision@30 is 25% higher.
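
The concatenate-then-compare step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional embeddings, and the account labels are all hypothetical, and the training of the embeddings themselves is not shown.

```python
import math

def concat(profile_vec, structure_vec):
    """Join a profile embedding and a structure embedding into one vector."""
    return list(profile_vec) + list(structure_vec)

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_candidates(source_vec, candidates):
    """Sort candidate accounts by similarity to the source account, best first."""
    scored = [(name, cosine_similarity(source_vec, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data (assumed): one account on network A, two candidate accounts on network B.
weibo_user = concat([0.9, 0.1], [0.2, 0.8])
twitter_candidates = {
    "tw_a": concat([0.8, 0.2], [0.1, 0.9]),
    "tw_b": concat([0.1, 0.9], [0.9, 0.1]),
}
ranking = rank_candidates(weibo_user, twitter_candidates)
```

In this toy setup the highest-ranked candidate would be proposed as the anchor link; how a threshold or top-k rule is chosen in the paper is not stated in the abstract.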

Link Prediction via Sparse Gaussian Graphical Model

Mathematical Problems in Engineering; doi:10.1155/2016/7213432; 2016; Vol 2016; pp. 1-11

Author(s): Liangliang Zhang; Longqi Yang; Guyu Hu; Zhisong Pan; Zhen Li

Keyword(s): Link Prediction; Graphical Model; Area Under The Curve; Superior Performance; Gaussian Graphical Model; Training Set; Baseline Method; Inverse Covariance Matrix; AUC Value; Real World Datasets

Link prediction is an important task in complex network analysis. Traditional link prediction methods are limited by network topology and a lack of node property information, which makes predicting links challenging. In this study, we address link prediction using a sparse Gaussian graphical model and demonstrate its theoretical and practical effectiveness. In theory, link prediction is executed by estimating the inverse covariance matrix of samples to overcome information limits. The proposed method was evaluated with four small and four large real-world datasets. The experimental results show that the area under the curve (AUC) value obtained by the proposed method improved by an average of 3% and 12.5%, respectively, compared to 13 mainstream similarity methods. This method outperforms the baseline method, and its prediction accuracy is superior to that of mainstream methods even when using only 80% of the training set. The method also provides significantly higher AUC values when using only 60% of the training set on the Dolphin and Taro datasets. Furthermore, the error rate of the proposed method is lower than that of mainstream methods on all datasets.
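
For readers unfamiliar with the AUC figures quoted here, the following sketch shows one common way AUC is computed in link prediction: as the probability that a held-out true link receives a higher score than a non-existent link, with ties counted as one half. This is a generic illustration, not the paper's evaluation code; the function name and the toy scores are assumptions.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (true link, non-link) pairs in which the true link scores higher.

    Ties contribute 0.5, so a scorer that cannot tell the classes apart gets 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy scores (assumed): two held-out true links and two non-links.
example_auc = auc([0.9, 0.3], [0.5, 0.1])
```

Here three of the four pairs are ordered correctly, so `example_auc` is 0.75.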

External Validation and Test-Retest Reliability of Postpartum Bonding Questionnaire in Spanish Mothers

The Spanish Journal of Psychology; doi:10.1017/sjp.2021.44; 2021; Vol 24

Author(s): Anna Torres-Giménez; Alba Roca-Lecumberri; Bàrbara Sureda; Susana Andrés-Perpiña; Bruma Palacios-Hernández; ...

Keyword(s): Mental Health; Maternal Mental Health; Characteristic Curve; External Validation; Area Under The Curve; Retest Reliability; Outpatient Unit; AUC Value; Postpartum Bonding; Test Retest Reliability

The aim of the present study was to validate the Spanish Postpartum Bonding Questionnaire (PBQ) against external criteria of bonding disorder, as well as to establish its test-retest reliability. One hundred fifty-six postpartum women consecutively recruited from a perinatal mental health outpatient unit completed the PBQ at 4–6 weeks postpartum. Four weeks later, all mothers completed the PBQ again and were interviewed using the Birmingham Interview for Maternal Mental Health to establish the presence of a bonding disorder. Receiver operating characteristic curve analysis revealed an area under the curve (AUC) value for the PBQ total score of 0.93, 95% CI [0.88, 0.98], with an optimal cut-off of 13 for detecting bonding disorders (sensitivity: 92%, specificity: 87%). Optimal cut-off scores for each scale were also obtained. The test-retest reliability coefficients were moderate to good. Our data confirm the validity of the PBQ for detecting bonding disorders in the Spanish population.
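
The relationship between a cut-off, sensitivity, and specificity can be sketched as follows. The abstract does not say how the optimal cut-off was selected; this example assumes Youden's J (sensitivity + specificity - 1), which is one common criterion, and assumes that higher scores indicate the condition. The data are invented and unrelated to the study.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' is read as a positive test."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) maximising Youden's J (an assumed criterion)."""
    best = None
    for c in sorted(set(scores)):
        se, sp = sens_spec(scores, labels, c)
        j = se + sp - 1
        if best is None or j > best[1]:
            best = (c, j, se, sp)
    return best

# Toy questionnaire totals and interview-based diagnoses (1 = disorder present); assumed data.
toy_scores = [3, 5, 8, 12, 14, 15, 20, 25]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff, j, se, sp = best_cutoff(toy_scores, toy_labels)
```

On this toy data the selected cut-off is 14, where all three positive cases are detected and four of the five negative cases are correctly excluded.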

Mean Received Resources Meet Machine Learning Algorithms to Improve Link Prediction Methods

Information; doi:10.3390/info13010035; 2022; Vol 13 (1); pp. 35

Author(s): Jibouni Ayoub; Dounia Lotfi; Ahmed Hammouch

Keyword(s): Machine Learning; Link Prediction; Learning Algorithms; Area Under The Curve; Machine Learning Algorithms; Actual State; The Future; AUC Value; The Mean; Analysis Of Social Networks

The analysis of social networks has attracted a lot of attention during the last two decades. These networks are dynamic: links appear and disappear. Link prediction is the problem of inferring links that will appear in the future from the current state of the network. We use information from nodes and edges to calculate the similarity between users. The more similar two users are, the higher the probability that they will connect in the future. Similarity metrics therefore play an important role in the link prediction field. Because such metrics are simple and flexible, many authors have proposed them, including Jaccard, Adamic-Adar (AA), and Katz, and evaluated them using the area under the curve (AUC). In this paper, we propose a new parameterized method to enhance the AUC value of the link prediction metrics by combining them with the mean received resources (MRRs). Experiments show that the proposed method improves the performance of the state-of-the-art metrics. Moreover, we used machine learning algorithms to classify links and confirm the efficiency of the proposed combination.
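
Two of the neighbourhood-based similarity metrics named in this abstract, Jaccard and Adamic-Adar, can be sketched in a few lines using their standard definitions. The toy graph and function names are assumptions for illustration, and the paper's MRR combination is not reproduced here because the abstract does not define it.

```python
import math

def jaccard(adj, u, v):
    """Shared neighbours divided by all neighbours of u or v."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbours; rare neighbours count more."""
    return sum(1.0 / math.log(len(adj[z]))
               for z in adj[u] & adj[v]
               if len(adj[z]) > 1)  # guard against log(1) = 0

# Toy undirected graph (assumed), stored as node -> set of neighbours.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

score_jaccard = jaccard(adj, "a", "d")   # shared {b, c}, union {b, c, e}
score_aa = adamic_adar(adj, "a", "d")    # b and c each have degree 3
```

Scores like these are computed for every unconnected node pair, and the highest-scoring pairs are predicted as future links.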

Stability of MDS-UPDRS Motor Subtypes Over Three Years in Early Parkinson's Disease

Frontiers in Neurology; doi:10.3389/fneur.2021.704906; 2021; Vol 12

Author(s): Abhijeet K. Kohat; Samuel Y. E. Ng; Aidan S. Y. Wong; Nicole S. Y. Chia; Xinyi Choi; ...

Keyword(s): Parkinson's Disease; Rating Scale; Area Under The Curve; Baseline Factors; Prospective Cohorts; The Mean; The Stability; Early Parkinson's Disease; Levodopa Equivalent Daily Dose

Background: Various classifications have been proposed to subtype Parkinson's disease (PD) based on motor phenotype. However, the stability of these subtypes has not been properly evaluated. Objective: The goal of this study was to understand the distribution of PD motor subtypes, their stability over time, and baseline factors that predicted subtype stability. Methods: Participants (n = 170) from two prospective cohorts were included: the Early PD Longitudinal Singapore (PALS) study and the National Neuroscience Institute Movement Disorders Database. Early PD patients were classified into tremor-dominant (TD), postural instability and gait difficulty (PIGD), and indeterminate subtypes according to the Movement Disorder Society's Unified PD Rating Scale (MDS-UPDRS) criteria and clinically evaluated for three consecutive years. Results: At baseline, 60.6% of patients were TD, 12.4% were indeterminate, and 27.1% were PIGD subtypes (p < 0.05). After 3 years, only 62% of patients in the TD subtype and 50% of patients in the PIGD subtype remained stable. The mean levodopa equivalent daily dose (LEDD) was higher in the PIGD subtype (276.92 ± 232.91 mg; p = 0.01). Lower LEDD [p < 0.05, odds ratio (OR) 0.99, 95% confidence interval (CI): 0.98–0.99] and higher TD/PIGD ratios (p < 0.05, OR 1.77, 95% CI: 1.29–2.43) were independent predictors of stability of the TD subtype, with an area under the curve (AUC) of 0.787 (95% CI: 0.669–0.876), sensitivity = 57.8%, and specificity = 89.7%. Conclusion: Only 50–62% of PD motor subtypes as defined by MDS-UPDRS remained stable over 3 years. TD/PIGD ratio and baseline LEDD were independent predictors of TD subtype stability over 3 years.

Peritoneal Effluent MicroRNA Profile in Encapsulating Peritoneal Sclerosis

doi:10.21203/rs.3.rs-832435/v2; 2021

Author(s): Kun-Lin Wu; Che-Yi Chou; An-Lun Li; Chien-Lung Chen; Jen-chieh Tsai; ...

Keyword(s): Real Time; miRNA Expression; Real Time PCR; Clinical Characteristics; Area Under The Curve; Noninvasive Test; Candidate miRNAs; MicroRNA Profile; AUC Value; Diagnostic Technologies

Encapsulating peritoneal sclerosis (EPS) is a catastrophic complication of chronic peritoneal dialysis (PD). Late diagnosis is associated with high mortality. With the advancement of new diagnostic technologies, such as microRNA (miRNA), we attempted to develop a noninvasive test to assist in the diagnosis of EPS. Eight-hour PD effluents were collected from 71 non-EPS and 56 EPS patients. The screening set included 28 samples (20 non-EPS vs. 8 EPS). After analyzing the ratio values of two miRNA expression levels from the high-throughput real-time PCR array of 377 miRNAs, eight candidate miRNAs were selected. The prediction model was built using 127 samples (71 non-EPS vs. 56 EPS) to produce an area under the curve (AUC) value for the miRNA classifier. Candidate miRNAs were also verified by single real-time PCR. The ratios of the five miRNAs with the top five ROC values were selected to calculate the combined AUC by multiple logistic regression. The AUC value for detecting EPS with the five miRNA ratios was 0.8929, with an accuracy of 78.7%. The accuracy of the EPS diagnosis was further improved to 94.1% after considering clinical characteristics (AUC value 0.9931). A signature-based model of clinical characteristics and miRNA expression in PD effluents can efficiently assist in the diagnosis of EPS, thus helping to prevent a catastrophic prognosis.

Histopathological analysis of retrieved thrombi from patients with acute ischemic stroke with malignant tumors

Journal of NeuroInterventional Surgery; doi:10.1136/neurintsurg-2020-017195; 2021; pp. neurintsurg-2020-017195

Author(s): Yuko Kataoka; Kazutaka Sonoda; Jun C Takahashi; Hatsue Ishibashi-Ueda; Kazunori Toyoda; ...

Keyword(s): Ischemic Stroke; Malignant Tumors; Characteristic Curve; Cerebral Arteries; Treatment Strategies; Area Under The Curve; Histopathological Analysis; Thrombotic Risk; Patients With Cancer; AUC Value

Background: The procoagulant state in cancer increases the thrombotic risk, and underlying cancer could affect treatment strategies and outcomes in patients with ischemic stroke. However, the histopathological characteristics of retrieved thrombi in patients with cancer have not been well studied. This study aimed to assess the histopathological difference between thrombi in patients with and without cancer. Methods: We studied consecutive patients with acute major cerebral artery occlusion who were treated with endovascular therapy between October 2010 and December 2016 in our single-center registry. The retrieved thrombi were histopathologically investigated with hematoxylin and eosin and Masson’s trichrome staining. The organization and proportions of erythrocyte and fibrin/platelet components were studied using a lattice composed of 10×10 squares. Results: Of the 180 patients studied, 17 (8 women, age 76.5±11.5 years) had cancer and 163 (69 women, age 74.1±11.2 years) did not. Those with cancer had a higher proportion of fibrin/platelets (56.6±27.4% vs 40.1±23.9%, p=0.008), a smaller proportion of erythrocytes (42.1±28.3% vs 57.5±25.1%, p=0.019), and higher serum D-dimer levels (5.9±8.2 vs 2.4±4.3 mg/dL, p=0.005) compared with the non-cancer cases. Receiver operating characteristic curve analysis showed the cut-off ratio of fibrin/platelet components related to cancer was 55.7% with a sensitivity of 74.8%, specificity 58.8% and area under the curve (AUC) value of 0.67 (95% CI 0.53 to 0.81), and the cut-off ratio of erythrocyte components was 44.7% with a sensitivity of 71.2%, specificity 58.9% and AUC value of 0.66 (95% CI 0.51 to 0.80). Conclusions: Thromboemboli of major cerebral arteries in patients with cancer were mainly composed of fibrin/platelet-rich components.

NMR-Based Metabolomic Profiling of Urine: Evaluation for Application in Prostate Cancer Detection

Natural Product Communications; doi:10.1177/1934578x19849978; 2019; Vol 14 (5); pp. 1934578X1984997; Cited By ~ 2

Author(s): Neil MacKinnon; Wencheng Ge; Peisong Han; Javed Siddiqui; John T. Wei; ...

Keyword(s): Prostate Cancer; Specific Antigen; Area Under The Curve; Clinical Test; H NMR; Metabolomic Profiling; Potential Biomarker; Noninvasive Biomarker; AUC Value; Selection Of

Detection of prostate cancer (PCa) and distinguishing indolent from aggressive forms of the disease are critical clinical challenges. The current clinical test measures circulating prostate-specific antigen levels, which faces particular challenges in cancer diagnosis in the range of 4 to 10 ng/mL. Thus, a concerted effort toward building a noninvasive biomarker panel has developed. In this report, the hypothesis that nuclear magnetic resonance (NMR)-derived metabolomic profiles measured in the urine of biopsy-negative versus biopsy-positive individuals would nominate a selection of potential biomarker signals was investigated. 1H NMR spectra of urine samples from 317 individuals (111 biopsy-negative, 206 biopsy-positive) were analyzed. A double cross-validation partial least squares-discriminant analysis modeling technique was utilized to nominate signals capable of distinguishing the two classes. It was observed that after variable selection protocols were applied, a subset of 29 variables produced an area under the curve (AUC) value of 0.94 after logistic regression analysis, whereas a “master list” of 18 variables produced a receiver operating characteristic (ROC) AUC of 0.80. As proof of principle, this study demonstrates the utility of NMR-based metabolomic profiling of urine biospecimens in the nomination of PCa-specific biomarker signals and suggests that further investigation is certainly warranted.

Prediction of Sudden Cardiac Death Risk with a Support Vector Machine Based on Heart Rate Variability and Heartprint Indices

Sensors; doi:10.3390/s20195483; 2020; Vol 20 (19); pp. 5483

Author(s): Marisol Martinez-Alanis; Erik Bojorges-Valdez; Niels Wessel; Claudia Lerma

Keyword(s): Heart Rate; Support Vector Machine; Heart Rate Variability; Sudden Cardiac Death; Cardiac Death; Area Under The Curve; Premature Ventricular Complex; Support Vector; AUC Value; Sudden Cardiac Death Risk

Most methods for sudden cardiac death (SCD) prediction require long-term (24 h) electrocardiogram recordings to measure heart rate variability (HRV) indices or premature ventricular complex indices (with the heartprint method). This work aimed to identify the best combinations of HRV and heartprint indices for predicting SCD based on short-term recordings (1000 heartbeats) through a support vector machine (SVM). Eleven HRV indices and five heartprint indices were measured in 135 pairs of recordings (one before an SCD episode and another without SCD as control). SVMs (defined with a radial basis function kernel with hyperparameter optimization) were trained with this dataset to identify the 13 best combinations of indices systematically. Through 10-fold cross-validation, the best area under the curve (AUC) value as a function of γ (gamma) and cost was identified. The predictive value of the identified combinations had AUCs between 0.80 and 0.86 and accuracies between 80 and 86%. Further SVM performance tests on a different dataset of 68 recordings (33 before SCD and 35 as control) showed AUC = 0.68 and accuracy = 67% for the best combination. The developed SVM may be useful for preventing imminent SCD through early warning based on electrocardiogram (ECG) or heart rate monitoring.

Predictive value of relative fat mass algorithm for incident hypertension: a 6-year prospective study in Chinese population

BMJ Open; doi:10.1136/bmjopen-2020-038420; 2020; Vol 10 (10); pp. e038420

Author(s): Peng Yu; Teng Huang; Senlin Hu; Xuefeng Yu

Keyword(s): Prospective Study; Fat Mass; Chinese Population; Predictive Power; ROC Analysis; Statistical Significance; Area Under The Curve; Predictive Ability; Incident Hypertension; AUC Value

Objectives: Individuals with obesity, especially excessive visceral adiposity, have a high risk of incident hypertension. Recently, a new algorithm named relative fat mass (RFM) was introduced to define obesity. Our aim was to investigate whether it can predict hypertension in the Chinese population and to compare its predictive power with traditional indices including body mass index (BMI), waist circumference (WC) and waist-to-height ratio (WHtR). Design: A 6-year prospective study. Setting: Nine provinces (Hei Long Jiang, Liao Ning, Jiang Su, Shan Dong, He Nan, Hu Bei, Hu Nan, Guang Xi and Gui Zhou) in China. Participants: Those without hypertension in the 2009 survey who responded to the 2015 survey. Intervention: Logistic regression was performed to investigate the association between RFM and incident hypertension. Receiver operating characteristic (ROC) analysis was performed to compare the predictive ability of these indices and define their optimal cut-off values. Main outcome measures: Incident hypertension in 2015. Results: The prevalence of incident hypertension in 2015 across RFM quartiles was 14.8%, 21.2%, 26.8% and 35.2%, respectively (p for trend <0.001). In the overall population, the OR for the highest quartile compared with the lowest quartile of RFM was 2.032 (1.567–2.634) in the fully adjusted model. In ROC analysis, RFM and WHtR had the highest area under the curve (AUC) values in both sexes, but the differences were not statistically significant when compared with the AUC values of BMI and WC in men and the AUC value of WC in women. The performance of the prediction model based on RFM was comparable to that of BMI, WC or WHtR. Conclusions: RFM can be a powerful indicator for predicting incident hypertension in the Chinese population, but it does not show superiority over BMI, WC and WHtR in predictive power.

From Hotel Reviews to City Similarities: A Unified Latent-Space Model

Electronics; doi:10.3390/electronics9010197; 2020; Vol 9 (1); pp. 197; Cited By ~ 1

Author(s): Luca Cagliero; Moreno La Quatra; Daniele Apiletti

Keyword(s): Research Problem; Hospitality Management; User Generated Content; Urban Context; Large Cities; Effective Strategies; Point Of Interest; Latent Space; Challenging Research; Textual Form

A large portion of user-generated content published on the Web consists of opinions and reviews on products, services, and places in textual form. Many travellers and tourists routinely rely on such content to drive their choices, shaping trips and visits to any place on earth, and specifically to select hotels in large cities. In the context of hospitality management, a challenging research problem is to identify effective strategies to explain hotel reviews and ratings and their correlation with the urban context. Under this umbrella, the paper investigates the use of sentence-based embedding models to explore in depth the similarities and dissimilarities between cities in terms of the corresponding hotel reviews and the surrounding points of interest. Reviews and point of interest (POI) descriptions are jointly modelled in a unified latent space, allowing us to investigate in depth the dependencies between guest feedback and the hotel neighborhood at different aggregation levels. The experiments performed on public TripAdvisor hotel-review datasets confirm the applicability and effectiveness of the proposed approach.