EXTENSIONS TO THE K-AMH ALGORITHM FOR NUMERICAL CLUSTERING

Journal of Information and Communication Technology, 2018

DOI: 10.32890/jict2018.17.4.8272

Author(s): Ali Seman, Azizian Mohd Sapawi

Keyword(s): Real World, Confidence Level, Categorical Data, Euclidean Distance, P Value, Accuracy Score, Original Algorithm, Real World Datasets, The Cost, Numerical Clustering

The k-AMH algorithm has been proven efficient in clustering categorical datasets. It can also be used to cluster numerical values with minimum modification to the original algorithm. In this paper, we present two algorithms that extend the k-AMH algorithm to the clustering of numerical values. The original k-AMH algorithm for categorical values uses a simple matching dissimilarity measure, but for numerical values it uses Euclidean distance. The first extension to the k-AMH algorithm, denoted k-AMH Numeric I, enables it to cluster numerical values in a fashion similar to k-AMH for categorical data. The second extension, k-AMH Numeric II, adopts the cost function of the fuzzy k-Means algorithm together with Euclidean distance, and has demonstrated performance similar to that of k-AMH Numeric I. The clustering performance of the two algorithms was evaluated on six real-world datasets against a benchmark algorithm, the fuzzy k-Means algorithm. The results obtained indicate that the two algorithms are as efficient as the fuzzy k-Means algorithm when clustering numerical values. Further, on an ANOVA test, k-AMH Numeric I obtained the highest accuracy score of 0.69 for the six datasets combined with p-value less than 0.01, indicating a 95% confidence level. The experimental results prove that the k-AMH Numeric I and k-AMH Numeric II algorithms can be effectively used for numerical clustering. The significance of this study lies in that the k-AMH numeric algorithms have been demonstrated as potential solutions for clustering numerical objects.
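As a rough illustration of the fuzzy k-Means cost function with Euclidean distance mentioned in the abstract, the sketch below uses the textbook objective J = sum_i sum_j u_ij^m * d(x_i, c_j)^2 and the standard membership update. It is not the k-AMH Numeric II algorithm itself; the function names and the fuzzifier default `m=2.0` are assumptions made for this example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def memberships(points, centers, m=2.0):
    """Standard fuzzy k-Means update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for p in points:
        dists = [euclidean(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on one or more centers:
            # share full membership among those centers only.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            table.append([h / sum(hits) for h in hits])
            continue
        table.append(
            [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
        )
    return table

def fuzzy_cost(points, centers, u, m=2.0):
    """Textbook fuzzy k-Means objective with squared Euclidean distance."""
    return sum(
        (u[i][j] ** m) * euclidean(p, c) ** 2
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    ctr = [(0.0, 0.5), (10.0, 10.5)]
    u = memberships(pts, ctr)
    print(u, fuzzy_cost(pts, ctr, u))
```

Each row of the membership table sums to 1, so a point near one center receives almost all of its membership from that center.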


Toward Fair Recommendation in Two-sided Platforms

ACM Transactions on the Web, 2022, Vol 16 (2), pp. 1-34

DOI: 10.1145/3503624

Author(s): Arpita Biswas, Gourab K. Patro, Niloy Ganguly, Krishna P. Gummadi, Abhijnan Chakraborty

Keyword(s): Customer Satisfaction, Real World, Computation Time, Well Being, Personalized Recommendation, Indivisible Goods, Goods And Services, Online Platforms, Real World Datasets, The Cost

Many online platforms today (such as Amazon, Netflix, Spotify, LinkedIn, and AirBnB) can be thought of as two-sided markets with producers and customers of goods and services. Traditionally, recommendation services in these platforms have focused on maximizing customer satisfaction by tailoring the results according to the personalized preferences of individual customers. However, our investigation reinforces the fact that such customer-centric design of these services may lead to unfair distribution of exposure to the producers, which may adversely impact their well-being. Conversely, a pure producer-centric design might become unfair to the customers. As more and more people are depending on such platforms to earn a living, it is important to ensure fairness to both producers and customers. In this work, by mapping a fair personalized recommendation problem to a constrained version of the problem of fairly allocating indivisible goods, we propose to provide fairness guarantees for both sides. Formally, our proposed FairRec algorithm guarantees Maxi-Min Share of exposure for the producers, and Envy-Free up to One Item fairness for the customers. Extensive evaluations over multiple real-world datasets show the effectiveness of FairRec in ensuring two-sided fairness while incurring a marginal loss in overall recommendation quality. Finally, we present a modification of FairRec (named FairRecPlus) that, at the cost of additional computation time, improves the recommendation performance for the customers while maintaining the same fairness guarantees.
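To make the "Envy-Free up to One Item" (EF1) notion concrete, here is a minimal sketch for the plain indivisible-goods setting with additive valuations. It is not the FairRec algorithm: it uses a generic round-robin allocation, which is known to satisfy EF1 for additive valuations, and the names `round_robin` and `is_ef1` as well as the data layout (one dict of item values per agent) are assumptions for illustration.

```python
def round_robin(valuations, items):
    """Agents take turns picking their most valued remaining item."""
    n = len(valuations)
    remaining = set(items)
    bundles = [[] for _ in range(n)]
    turn = 0
    while remaining:
        agent = turn % n
        # sorted() makes tie-breaking deterministic (first item in sort order wins).
        best = max(sorted(remaining), key=lambda g: valuations[agent][g])
        bundles[agent].append(best)
        remaining.remove(best)
        turn += 1
    return bundles

def is_ef1(valuations, bundles):
    """True if no agent envies another after removing one item from the other's bundle."""
    for a, own in enumerate(bundles):
        own_value = sum(valuations[a][g] for g in own)
        for b, other in enumerate(bundles):
            if a == b or not other:
                continue
            other_value = sum(valuations[a][g] for g in other)
            best_removed = max(valuations[a][g] for g in other)
            if own_value < other_value - best_removed:
                return False
    return True

if __name__ == "__main__":
    vals = [
        {"a": 5, "b": 3, "c": 1, "d": 1},
        {"a": 4, "b": 4, "c": 2, "d": 0},
    ]
    alloc = round_robin(vals, ["a", "b", "c", "d"])
    print(alloc, is_ef1(vals, alloc))
```

An allocation that hands every item to one agent fails the check, since the empty-handed agent still envies the other after any single item is removed.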


A Bandit Approach to Maximum Inner Product Search

Proceedings of the AAAI Conference on Artificial Intelligence, 2019, Vol 33, pp. 4376-4383

DOI: 10.1609/aaai.v33i01.33014376

Author(s): Rui Liu, Tianyi Wu, Barzan Mozafari

Keyword(s): Real World, State Of The Art, Linear Time, Identification Problem, Approximate Algorithm, Inner Product, Approximate Algorithms, Product Search, Real World Datasets, The Cost

There has been substantial research on sub-linear time approximate algorithms for Maximum Inner Product Search (MIPS). To achieve fast query time, state-of-the-art techniques require significant preprocessing, which can be a burden when the number of subsequent queries is not sufficiently large to amortize the cost. Furthermore, existing methods do not have the ability to directly control the suboptimality of their approximate results with theoretical guarantees. In this paper, we propose the first approximate algorithm for MIPS that does not require any preprocessing, and allows users to control and bound the suboptimality of the results. We cast MIPS as a Best Arm Identification problem, and introduce a new bandit setting that can fully exploit the special structure of MIPS. Our approach outperforms state-of-the-art methods on both synthetic and real-world datasets.
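The idea of treating MIPS as a Best Arm Identification problem can be sketched as follows: each candidate vector is an arm, and one "pull" samples a random coordinate j and observes the partial product q_j * x_j. The code below is a simplified successive-halving heuristic written for illustration only. It is not the paper's algorithm and carries none of its suboptimality guarantees; `bandit_mips`, `samples_per_round`, and `seed` are hypothetical names, and a non-empty list of vectors is assumed.

```python
import random

def exact_mips(query, vectors):
    """Brute-force baseline: index of the vector with the largest inner product."""
    return max(
        range(len(vectors)),
        key=lambda i: sum(q * x for q, x in zip(query, vectors[i])),
    )

def bandit_mips(query, vectors, samples_per_round=8, seed=0):
    """Successive halving over arms; no preprocessing of the vectors is needed."""
    rng = random.Random(seed)
    dim = len(query)
    alive = list(range(len(vectors)))
    totals = {i: 0.0 for i in alive}
    while len(alive) > 1:
        # All surviving arms are evaluated on the same random coordinates,
        # so their running totals are directly comparable.
        coords = [rng.randrange(dim) for _ in range(samples_per_round)]
        for i in alive:
            totals[i] += sum(query[j] * vectors[i][j] for j in coords)
        # Keep the better half of the arms.
        alive.sort(key=lambda i: totals[i], reverse=True)
        alive = alive[: max(1, len(alive) // 2)]
    return alive[0]

if __name__ == "__main__":
    q = [1.0] * 16
    vecs = [[0.5] * 16, [1.0] * 16, [0.1] * 16, [0.0] * 16]
    print(exact_mips(q, vecs), bandit_mips(q, vecs))
```

With few samples per round the heuristic can discard the true maximizer; bounding that risk is the kind of control the abstract describes.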


Efficient Heuristic Hypothesis Ranking

Journal of Artificial Intelligence Research, 1999, Vol 10, pp. 375-397, Cited By ~ 3

DOI: 10.1613/jair.615

Author(s): S. Chien, A. Stechert, D. Mutz

Keyword(s): Incomplete Information, Real World, Expected Loss, Ranking Problem, Additional Information, Decision Cycle, Ranking Procedures, Probably Approximately Correct, Real World Datasets, The Cost

This paper considers the problem of learning the ranking of a set of stochastic alternatives based upon incomplete information (i.e., a limited number of samples). We describe a system that, at each decision cycle, outputs either a complete ordering on the hypotheses or decides to gather additional information (i.e., observations) at some cost. The ranking problem is a generalization of the previously studied hypothesis selection problem - in selection, an algorithm must select the single best hypothesis, while in ranking, an algorithm must order all the hypotheses. The central problem we address is achieving the desired ranking quality while minimizing the cost of acquiring additional samples. We describe two algorithms for hypothesis ranking and their application for the probably approximately correct (PAC) and expected loss (EL) learning criteria. Empirical results are provided to demonstrate the effectiveness of these ranking procedures on both synthetic and real-world datasets.
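The decision cycle described above (either output a complete ordering or pay for more samples) can be illustrated with a small sketch. It assumes utilities bounded in [0, 1] and uses a Hoeffding interval with a union bound over the hypotheses as the stopping test. This is a generic construction, not the authors' PAC or expected-loss procedures, and it ignores the correction needed for checking the stopping rule repeatedly. All names and defaults (`rank_hypotheses`, `delta`, `batch`, `max_samples`) are hypothetical.

```python
import math

def rank_hypotheses(samplers, delta=0.05, batch=10, max_samples=1000):
    """samplers: zero-argument callables, each returning a utility in [0, 1].

    Returns (ordering best-first, samples drawn per hypothesis, separated flag).
    """
    k = len(samplers)
    sums = [0.0] * k
    n = 0
    while True:
        # Gather additional information: one batch of samples per hypothesis.
        for i, draw in enumerate(samplers):
            sums[i] += sum(draw() for _ in range(batch))
        n += batch
        means = [s / n for s in sums]
        order = sorted(range(k), key=lambda i: means[i], reverse=True)
        # Hoeffding half-width for n samples in [0, 1], union bound over k hypotheses.
        half_width = math.sqrt(math.log(2 * k / delta) / (2 * n))
        separated = all(
            means[a] - means[b] > 2 * half_width for a, b in zip(order, order[1:])
        )
        # Stop when every adjacent pair is separated, or the sampling budget is spent.
        if separated or n >= max_samples:
            return order, n, separated

if __name__ == "__main__":
    order, n, ok = rank_hypotheses([lambda: 0.9, lambda: 0.5, lambda: 0.1])
    print(order, n, ok)
```

The trade-off from the abstract shows up directly: a smaller `delta` or closer utilities require more samples before the ordering is accepted.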


The Cost-Effectiveness of Pre-school Peanut Oral Immunotherapy in the Real World Setting

The Journal of Allergy and Clinical Immunology In Practice, 2021

DOI: 10.1016/j.jaip.2021.02.058

Author(s): Marcus Shaker, Edmond S. Chan, Jennifer LP. Protudjer, Lianne Soller, Elissa M. Abrams, ...

Keyword(s): Cost Effectiveness, Real World, Oral Immunotherapy, The Real, Real World Setting, The Cost


Time-Efficient Ensemble Learning with Sample Exchange for Edge Computing

ACM Transactions on Internet Technology, 2021, Vol 21 (3), pp. 1-17

DOI: 10.1145/3409265

Author(s): Wu Chen, Yong Yu, Keke Gai, Jiamou Liu, Kim-Kwang Raymond Choo

Keyword(s): Ensemble Learning, Real World, Interaction Mechanism, Training Model, Edge Computing, Learning Techniques, Multi Agent, Real World Datasets, Entire Dataset, Exchange Data

In existing ensemble learning algorithms (e.g., random forest), each base learner’s model needs the entire dataset for sampling and training. However, this may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework: Multi-Agent Ensemble. The framework leverages edge computing to facilitate ensemble learning techniques by focusing on the balancing of access restrictions (small sub-dataset) and accuracy enhancement. Specifically, network edge nodes (learners) are utilized to model classifications and predictions in our framework. Data is then distributed to multiple base learners who exchange data via an interaction mechanism to achieve improved prediction. The proposed approach relies on a training model rather than conventional centralized learning. Findings from the experimental evaluations using 20 real-world datasets suggest that Multi-Agent Ensemble outperforms other ensemble approaches in terms of accuracy even though the base learners require fewer samples (i.e., significant reduction in computation costs).


OFCOD: On the Fly Clustering Based Outlier Detection Framework

Data, 2020, Vol 6 (1), pp. 1

DOI: 10.3390/data6010001

Author(s): Ahmed Elmogy, Hamada Rizk, Amany M. Sarhan

Keyword(s): Data Mining, Image Processing, Intrusion Detection, Real Time, Outlier Detection, Real World, Medical Data, Experimental Results, Real Time Applications, Real World Datasets

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers promptly on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.
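For readers unfamiliar with clustering-based outlier detection in general, the following sketch shows one common pattern: cluster the data, then flag points that lie unusually far from their nearest centroid. The abstract does not describe OFCOD's internals, so this is a generic illustration rather than the proposed framework; the plain k-means loop, the median-based threshold, and the `factor=3.0` default are all assumptions.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means (Lloyd's iterations) starting from the given centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its previous center.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def outliers(points, centers, factor=3.0):
    """Flag points whose nearest-centroid distance exceeds factor * median distance."""
    dists = [min(math.dist(p, c) for c in centers) for p in points]
    median = sorted(dists)[len(dists) // 2]
    return [p for p, d in zip(points, dists) if d > factor * median]

if __name__ == "__main__":
    data = [
        (0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (5, 30),
    ]
    found = kmeans(data, [(0, 0), (10, 10)])
    print(outliers(data, found))
```

Re-running the clustering for every request is the cost that makes such approaches slow on large datasets, which is the limitation the abstract highlights.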


AB0799 REAL-WORLD EXPERIENCE OF SECUKINUMAB FOR PSORIATIC ARTHRITIS WITH AXIAL INVOLVEMENT.

Annals of the Rheumatic Diseases, 2020, Vol 79 (Suppl 1), pp. 1699.3-1699

DOI: 10.1136/annrheumdis-2020-eular.6564

Author(s): M. Martin Lopez, B. Joven-Ibáñez, J. L. Pablos

Keyword(s): Psoriatic Arthritis, Adverse Events, Real World, Spinal Pain, Tender Joint Count, P Value, X Ray, Activity Assessment, Real World Setting, Axial Involvement

Background: Evidence on the efficacy of biologics in the treatment of psoriatic arthritis (PsA) patients with axial manifestations, which affect 30-70% of PsA patients, is limited. Secukinumab (SEC) has provided significant and sustained improvement in the signs and symptoms of active PsA and ankylosing spondylitis.

Objectives: This study aims to analyze the experience of using SEC for PsA patients with axial involvement in a real-world setting.

Methods: Multicentric observational, longitudinal, retrospective study conducted in a tertiary hospital between January 2016 and December 2019. Patients with PsA (CASPAR criteria) and a clinical and/or imaging diagnosis of axial involvement who received at least one dose of SEC were included. Patients with non-pathological sacroiliac x-ray and MRI had to have spinal pain VAS ≥4/10 after failure of NSAIDs, prior to the onset of SEC, to be included. Medical records were reviewed to collect demographic and clinical data and features of PsA (manifestations, treatments and activity assessment). Descriptive statistics were computed, followed by a comparative analysis with Student's t-test to assess the effectiveness of SEC.

Results: Of 98 PsA patients treated with SEC, 58 (59.2%) had axial involvement, of whom 41 (71%) were female. Mean age was 54 years (SD 10) and average disease duration was 10 years (SD 8). All 58 patients had peripheral disease (33% with joint erosions), 55 (95%) had psoriasis, 20 (34%) showed dactylitis and 39 (67%) had enthesitis. Sacroiliac x-ray showed damage in 38 (66%) patients (grade I-IV), 23 (40%) had a pathological MRI, and 8 (14%) were HLA-B27 positive. Average BMI was 29 (SD 8), with an obesity rate of 33% (19 patients). Observed comorbidities were hypertension (27 patients, 47%), diabetes mellitus (6 patients, 10%), dyslipidemia (23 patients, 40%), active smoking (18 patients, 31%) and malignancy (6 patients, 10%). Regarding previous treatments, 90% had received cDMARDs, particularly methotrexate (86%), and 40 (69%) had been exposed to at least one bDMARD (15 patients to one, 9 to two, 6 to three and 10 to four or more). Seven patients were on the 300 mg dose and 51 patients on the 150 mg dose (dose escalation to 300 mg was performed in 16 patients, and 44% responded and remained on SEC). Average drug survival time was 1.4 (SD 1) years. At 6 months of SEC therapy, tender and swollen joint count, spinal pain VAS, CRP, ASDAS-CRP and DAPSA had significantly decreased (Table 1). 29 (50%) patients discontinued SEC during follow-up due to primary ineffectiveness (8), secondary ineffectiveness (16), adverse events (3), latex allergy (1) and remission (1). Adverse events did not differ from those reported in clinical trials.

Table 1. Disease activity assessment at 6 months of secukinumab therapy.

Measure | Baseline | 6 months after SEC | Mean difference (95% CI) | P value
SJC | 4.8±5.4 | 1.9±3.1 | -2.8 (-3.9 to -1.7) | p<0.0001
TJC | 7.7±5.8 | 3.9±4.1 | -3.8 (-5.1 to -2.4) | p<0.0001
Spinal pVAS | 6.1±3.2 | 4.2±2.9 | -1.9 (-2.4 to -1.4) | p<0.0001
CRP (mg/L) | 7.7±9.9 | 4.9±5.9 | -2.9 (-4.5 to -1.2) | p=0.0009
ASDAS-CRP | 2.5±1.9 | 1.8±1.3 | -0.7 (-0.9 to -0.4) | p<0.0001
DAPSA | 27.7±12.1 | 16.7±10.4 | -11 (-15.3 to -6.8) | p<0.0001

SJC: swollen joint count, TJC: tender joint count, Spinal pVAS: spinal pain visual analog scale, CRP: C-reactive protein, SEC: secukinumab.

Conclusion: Secukinumab in a real-world setting provided improvements in the axial and peripheral manifestations of PsA, using both the 150 mg and 300 mg doses.

Disclosure of Interests: Maria Martin Lopez: None declared; Beatriz Joven-Ibáñez: Speakers bureau: Abbvie, Celgene, Janssen, Merck Sharp & Dohme, Novartis, Pfizer; José Luis Pablos: None declared.


Utility of routinely collected electronic health records data to support effectiveness evaluations in inflammatory bowel disease: a pilot study of tofacitinib

BMJ Health & Care Informatics, 2021, Vol 28 (1), pp. e100337

DOI: 10.1136/bmjhci-2021-100337

Author(s): Vivek Ashok Rudrapatna, Benjamin Scott Glicksberg, Atul Janardhan Butte

Keyword(s): Inflammatory Bowel Disease, Pilot Study, Disease Activity, Real World, Bowel Disease, P Value, Health Records, Real World Evidence, Inflammatory Bowel

Objectives: Electronic health records (EHR) are receiving growing attention from regulators, biopharmaceuticals and payors as a potential source of real-world evidence. However, their suitability for the study of diseases with complex activity measures is unclear. We sought to evaluate the use of EHR data for estimating treatment effectiveness in inflammatory bowel disease (IBD), using tofacitinib as a use case.

Methods: Records from the University of California, San Francisco (6/2012 to 4/2019) were queried to identify tofacitinib-treated IBD patients. Disease activity variables at baseline and follow-up were manually abstracted according to a preregistered protocol. The proportion of patients meeting the endpoints of recent randomised trials in ulcerative colitis (UC) and Crohn’s disease (CD) was assessed.

Results: 86 patients initiated tofacitinib. Baseline characteristics of the real-world and trial cohorts were similar, except for universal failure of tumour necrosis factor inhibitors in the former. 54% (UC) and 62% (CD) of patients had complete capture of disease activity at baseline (month −6 to 0), while only 32% (UC) and 69% (CD) of patients had complete follow-up data (month 2 to 8). Using data imputation, we estimated the proportion achieving the trial primary endpoints as being similar to the published estimates for both UC (16%, p value=0.5) and CD (38%, p value=0.8).

Discussion/Conclusion: This pilot study reproduced trial-based estimates of tofacitinib efficacy despite its use in a different cohort but revealed substantial missingness in routinely collected data. Future work is needed to strengthen EHR data and enable real-world evidence in complex diseases like IBD.


POS0619 MODELLING OF DISEASE ACTIVITY IN PATIENTS WITH INFLAMMATORY ARTHROPATHIES TREATED WITH ETANERCEPT ORIGINATOR OR BIOSIMILAR AS FIRST-LINE BIOLOGIC IN AN AUSTRALIAN REAL-WORLD DATASET

Annals of the Rheumatic Diseases, 2021, Vol 80 (Suppl 1), pp. 547.1-547

DOI: 10.1136/annrheumdis-2021-eular.2265

Author(s): C. Deakin, G. Littlejohn, H. Griffiths, T. Smith, C. Osullivan, ...

Keyword(s): Disease Activity, Clinical Data, Real World, Rank Test, P Value, Continuous Variables, Linear Quadratic, First Line, Modest Improvement, Over Time

Background: The availability of biosimilars as non-proprietary versions of established biologic disease-modifying anti-rheumatic drugs (bDMARDs) is enabling greater access for patients with rheumatic diseases to effective medications at a lower cost. Since April 2017 both the originator and a biosimilar for etanercept (trade names Enbrel and Brenzys, respectively) have been available for use in Australia.

Objectives:
[1] To model the effectiveness of etanercept originator or biosimilar in reducing Disease Activity Score 28-joint count C reactive protein (DAS28CRP) in patients with rheumatoid arthritis (RA), psoriatic arthritis (PsA) or ankylosing spondylitis (AS) treated with either drug as first-line bDMARD.
[2] To describe persistence on etanercept originator or biosimilar as first-line bDMARD in patients with RA, PsA or AS.

Methods: Clinical data were obtained from the Optimising Patient outcomes in Australian rheumatoLogy (OPAL) dataset, derived from electronic medical records. Eligible patients with RA, PsA or AS who initiated etanercept originator (n=856) or biosimilar (n=477) as first-line bDMARD between 1 April 2017 and 31 December 2020 were identified. Propensity score matching was performed to select patients on originator (n=230) or biosimilar (n=136) with similar characteristics in terms of diagnosis, disease duration, joint count, age, sex and concomitant medications. Data on clinical outcomes were recorded at 3 months after baseline, and then at 6-monthly intervals. Outcomes data that were missing at a recorded visit were imputed. Effectiveness of the originator, relative to the biosimilar, for reducing DAS28CRP over time was modelled in the matched population using linear mixed models with both random intercepts and slopes to allow for individual heterogeneity, and weighting of individuals by inverse probability of treatment weights to ensure comparability between treatment groups. Time was modelled as a combination of linear, quadratic and cubic continuous variables. Persistence on the originator or biosimilar was analysed using survival analysis (log-rank test).

Results: Reduction in DAS28CRP was associated with both time and etanercept originator treatment (Table 1). The conditional R-squared for the model was 0.31. The average predicted DAS28CRP values at baseline, 3 months, 6 months, 9 months and 12 months were 4.0 and 4.4, 3.1 and 3.4, 2.6 and 2.8, 2.3 and 2.6, and 2.2 and 2.4 for the originator and biosimilar, respectively, indicating a clinically meaningful effect of time for patients on either drug and an additional modest improvement for patients on the originator. Median time to 50% of patients stopping treatment was 25.5 months for the originator and 24.1 months for the biosimilar (p=0.53). An adverse event was the reason for discontinuing treatment in 33 patients (14.5%) on the originator and 18 patients (12.9%) on the biosimilar.

Conclusion: Analysis using a large national real-world dataset showed treatment with either the etanercept originator or the biosimilar was associated with a reduction in DAS28CRP over time, with the originator being associated with a further modest reduction in DAS28CRP that was not clinically significant. Persistence on treatment was not different between the two drugs.

Table 1. Respondent characteristics.

Fixed Effect | Estimate | 95% Confidence Interval | p-value
Time (linear) | 0.90 | 0.89, 0.91 | 1.5e-63
Time (quadratic) | 1.01 | 1.00, 1.01 | 1.3e-33
Time (cubic) | 1.00 | 1.00, 1.00 | 7.1e-23
Originator | 0.91 | 0.86, 0.96 | 0.0013

Acknowledgements: The authors acknowledge the members of OPAL Rheumatology Ltd and their patients for providing clinical data for this study, and Software4Specialists Pty Ltd for providing the Audit4 platform. Supported in part by a research grant from Investigator-Initiated Studies Program of Merck & Co Inc, Kenilworth, NJ, USA. The opinions expressed in this paper are those of the authors and do not necessarily represent those of Merck & Co Inc, Kenilworth, NJ, USA.

Disclosure of Interests: Claire Deakin: None declared; Geoff Littlejohn: Consultant of: Over the last 5 years Geoffrey Littlejohn has received educational grants and consulting fees from AbbVie, Bristol Myers Squibb, Eli Lilly, Gilead, Novartis, Pfizer, Janssen, Sandoz, Sanofi and Seqirus; Hedley Griffiths: Consultant of: AbbVie, Gilead, Novartis and Lilly; Tegan Smith: None declared; Catherine OSullivan: None declared; Paul Bird: Speakers bureau: Eli Lilly, AbbVie, Pfizer, BMS, UCB, Gilead, Novartis.


Overlapping Community Detection Based on Attribute Augmented Graph

Entropy, 2021, Vol 23 (6), pp. 680

DOI: 10.3390/e23060680

Author(s): Hanyang Lin, Yongzhao Zhan, Zizheng Zhao, Yuzhong Chen, Chen Dong

Keyword(s): Community Detection, Real World, Detection Algorithm, Overlapping Community Detection, Overlapping Communities, Adjustment Strategy, Topology Information, Overlapping Community, Real World Datasets, Community Detection Algorithm

There is a wealth of information in real-world social networks. In addition to the topology information, the vertices or edges of a social network often have attributes, with many of the overlapping vertices belonging to several communities simultaneously. It is challenging to fully utilize the additional attribute information to detect overlapping communities. In this paper, we first propose an overlapping community detection algorithm based on an augmented attribute graph. An improved weight adjustment strategy for attributes is embedded in the algorithm to help detect overlapping communities more accurately. Second, we enhance the algorithm to automatically determine the number of communities by a node-density-based fuzzy k-medoids process. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed algorithms can effectively detect overlapping communities with fewer parameters compared to the baseline methods.
