PrivateRide: A Privacy-Enhanced Ride-Hailing Service

2017 ◽  
Vol 2017 (2) ◽  
pp. 38-56 ◽  
Author(s):  
Anh Pham ◽  
Italo Dacosta ◽  
Bastien Jacot-Guillarmod ◽  
Kévin Huguenin ◽  
Taha Hajar ◽  
...  

Abstract: In the past few years, we have witnessed a rise in the popularity of ride-hailing services (RHSs), an online marketplace that enables accredited drivers to use their own cars to drive ride-hailing users. Unlike other transportation services, RHSs raise significant privacy concerns, as providers are able to track the precise mobility patterns of millions of riders worldwide. We present the first survey and analysis of the privacy threats in RHSs. Our analysis exposes high-risk privacy threats that do not occur in conventional taxi services. Therefore, we propose PrivateRide, a privacy-enhancing and practical solution that offers anonymity and location privacy for riders, and protects drivers’ information from harvesting attacks. PrivateRide lowers the high-risk privacy threats in RHSs to a level that is at least as low as that of many taxi services. Using real data sets from Uber and taxi rides, we show that PrivateRide significantly enhances riders’ privacy, while preserving tangible accuracy in ride matching and fare calculation, with only negligible effects on convenience. Moreover, by using our Android implementation for experimental evaluations, we show that PrivateRide’s overhead during ride setup is negligible. In short, we enable privacy-conscious riders to achieve levels of privacy that are not possible in current RHSs and even in some conventional taxi services, thereby offering a potential business differentiator.
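The abstract does not spell out PrivateRide's mechanism, but a common building block for rider location privacy is spatial generalization: snapping precise pickup coordinates to a coarse grid cell before they leave the device, so the provider never sees the exact fix. The sketch below is a minimal illustration of that generic idea under assumed parameters (cell size, function name), not the paper's actual protocol.

```python
import math

def cloak_location(lat: float, lon: float, cell_km: float = 1.0) -> tuple[float, float]:
    """Snap a GPS fix to the center of a roughly cell_km-wide grid cell.

    Hypothetical helper for illustration: the provider only ever sees the
    cell center, so all riders inside one cell are indistinguishable.
    """
    cell_lat = cell_km / 111.32                        # km per degree of latitude
    snapped_lat = math.floor(lat / cell_lat) * cell_lat + cell_lat / 2
    # Longitude degrees shrink with latitude; scale by the snapped latitude
    # so every fix inside a cell uses the same scale.
    cell_lon = cell_km / (111.32 * math.cos(math.radians(snapped_lat)))
    snapped_lon = math.floor(lon / cell_lon) * cell_lon + cell_lon / 2
    return snapped_lat, snapped_lon

# Two nearby riders map to the same cloaked point:
print(cloak_location(46.5191, 6.5668))
print(cloak_location(46.5215, 6.5702))
```

Coarser cells give stronger anonymity but hurt ride-matching accuracy, which is the privacy/utility trade-off the abstract's evaluation quantifies.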

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Shah Imran Alam ◽  
Ihtiram Raza Khan ◽  
Syed Imtiyaz Hassan ◽  
Farheen Siddiqui ◽  
M. Afshar Alam ◽  
...  

The benefits of open data have been recognised worldwide over the past decades, and efforts to release more data under open licenses have intensified, producing a steep rise of open data in government repositories. In our study, we point out that privacy is one of the most consistent and prominent barriers: strong privacy laws restrict data owners from opening their data freely. In this paper, we study the applied solutions and, to the best of our knowledge, find that anonymity-preserving algorithms do a substantial job of protecting privacy in the release of structured microdata. Such algorithms compete on the objective that the released anonymized data should not only preserve privacy but also retain the required level of quality. The k-anonymity algorithm is the foundation of many successor privacy-preserving algorithms; l-diversity adds another dimension of privacy protection. Used together, the two are known to provide a good balance between privacy and quality control of the data set as a whole. In this research, we apply the k-anonymity algorithm and compare the results with the addition of l-diversity. We discuss the gap and report the benefits and losses for various combinations of k and l values, considered together with the released data quality from an analyst’s perspective. We first use fictitious data to explain the general expectations, and then contrast these findings with real data from the food-technology domain. The work contradicts the general assumptions under a specific set of evaluation parameters for data-quality assessment. It also argues in favour of further research contributions in anonymity preservation, given its importance and potential to benefit people.
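As a concrete reference point for these definitions: a table is k-anonymous if every combination of quasi-identifier values is shared by at least k records, and l-diverse if each such group contains at least l distinct values of the sensitive attribute. The following is a minimal sketch of both checks with pandas; the column names and toy records are made up for illustration and are not the paper's data.

```python
import pandas as pd

def k_anonymity(df: pd.DataFrame, quasi_ids: list[str]) -> int:
    """Smallest group size over the quasi-identifier combinations (= k)."""
    return int(df.groupby(quasi_ids).size().min())

def l_diversity(df: pd.DataFrame, quasi_ids: list[str], sensitive: str) -> int:
    """Smallest number of distinct sensitive values within any group (= l)."""
    return int(df.groupby(quasi_ids)[sensitive].nunique().min())

# Toy generalized microdata (age already bucketed, ZIP code truncated).
data = pd.DataFrame({
    "age_band": ["20-30", "20-30", "20-30", "30-40", "30-40", "30-40"],
    "zip3":     ["130**", "130**", "130**", "148**", "148**", "148**"],
    "disease":  ["flu", "flu", "cold", "cancer", "flu", "cold"],
})
print(k_anonymity(data, ["age_band", "zip3"]))             # 3 -> 3-anonymous
print(l_diversity(data, ["age_band", "zip3"], "disease"))  # 2 -> 2-diverse
```

Raising k or l forces coarser generalization of the quasi-identifiers, which is exactly where the quality loss studied in the paper comes from.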


2013 ◽  
Vol 11 (02) ◽  
pp. 1250014 ◽  
Author(s):  
MARÍA M. ABAD-GRAU ◽  
NURIA MEDINA-MEDINA ◽  
SERAFÍN MORAL ◽  
ROSANA MONTES-SOLDADO ◽  
SERGIO TORRES-SÁNCHEZ ◽  
...  

It is already known that the power of multimarker transmission/disequilibrium tests may improve with the number of markers, as some associations require several markers to be captured. However, a mechanism such as haplotype grouping must be used to avoid complexity that grows with the number of markers. 2G, a state-of-the-art transmission/disequilibrium test, implements this mechanism to its maximum extent by grouping haplotypes into only two groups, high- and low-risk haplotypes, so that the test has only one degree of freedom regardless of the number of markers. The test checks whether the haplotypes more often transmitted from parents to offspring are truly high-risk haplotypes. In this paper we use haplotype similarity as prior knowledge to classify haplotypes as high or low risk, starting with those haplotypes on which the prior has the least impact, i.e., those with the largest differences between transmission and non-transmission counts. If these counts are very different, the prior knowledge has little effect and haplotypes are classified as low or high risk just as 2G does. We show a substantial gain in power achieved by this approach, on both simulated and real data sets.
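A minimal sketch of the core classification step described above: rank haplotypes by how decisive their own transmission evidence is, label the decisive ones directly from their counts (as 2G would), and fall back on similarity to already-labelled haplotypes otherwise. The names, the decisiveness threshold, and the similarity rule are assumptions for illustration, not the authors' exact procedure.

```python
def classify_haplotypes(counts, similarity):
    """counts: {haplotype: (transmitted, untransmitted)}.
    similarity: function(h1, h2) -> float in [0, 1].
    Returns {haplotype: 'high' or 'low'} risk labels.
    """
    # Process haplotypes with the most decisive evidence first,
    # where the similarity prior matters least.
    order = sorted(counts, key=lambda h: abs(counts[h][0] - counts[h][1]),
                   reverse=True)
    labels = {}
    for h in order:
        t, u = counts[h]
        if abs(t - u) >= 5 or not labels:      # assumed decisiveness threshold
            labels[h] = "high" if t > u else "low"
        else:                                  # ambiguous: borrow the label of
            nearest = max(labels, key=lambda g: similarity(h, g))
            labels[h] = labels[nearest]        # the most similar labelled one
    return labels

# Toy example: haplotypes as 0/1 marker strings, similarity = shared alleles.
sim = lambda a, b: sum(x == y for x, y in zip(a, b)) / len(a)
counts = {"0011": (30, 10), "0010": (12, 11), "1100": (8, 25)}
print(classify_haplotypes(counts, sim))
# "0010" has nearly balanced counts, so it inherits "high" from "0011".
```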


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 3140-3140
Author(s):  
Sergey M. Kulikov ◽  
Irina A. Tischenko ◽  
Ekaterina Yu. Chelysheva ◽  
Olga V. Lazareva ◽  
Anna G. Turkina

Abstract: Introduction: Most surrogate endpoints are based on periodic measurement, so the assessment of event times uses data censored on both sides. Kaplan-Meier (KM) estimators are calculated from right-censored data, and as a result they are biased and sensitive to irregularity in measurements. A high rate of missing data and irregular measurement is a common problem for registries. Interval Censorship Estimators (ICE) are relatively complex but more reliable and robust than KM. Cytogenetic response is a major prognostic factor for the long-term results of therapy for chronic myeloid leukemia (CML) and is often used as a surrogate endpoint. The case for using ICE instead of KM for cytogenetic response estimation is illustrated with real data from a registry of patients with CML. Methods and data source: We compare data from two studies of similar populations of CML patients, distinguished by design and completeness of data: the first is retrospective, the second a prospective controlled population-based registry study. For the evaluation of time-to-event characteristics, two estimators were used and compared: classical KM estimators and ICE estimators based on Turnbull's algorithm, implemented as a SAS macro [1]. The EUropean Treatment Outcome Study (EUTOS) is a registry-based international investigation started in June 2007 and running for 3 years, with the aim of studying the epidemiology of CML. The first part of the study is the Out-Study section (EUTOS-OSP), with retrospectively collected data from patients not included in local or international clinical trials. The second part is an online registry, the so-called Population-Based Section (EUTOS PBS), aimed at estimating the incidence of CML in Europe. Results: For the analysis we built two data sets. The first includes 508 patients diagnosed in 2005-2008 from the EUTOS-OSP study, collected retrospectively from 36 regions of Russia; median age at diagnosis was 49.3 years (range 18 to 82), 47.6% men, 6.7% in AC/BC phase, 29.3% at high risk by Sokal. The second includes all 200 patients of the EUTOS PBS study, collected prospectively from 6 regions of Russia in 2008-2012; median age at diagnosis was 50.4 years (range 16 to 82), 50.8% men, 6.0% in AC/BC phase, 31.7% at high risk by Sokal. Progression-free survival (PFS) and complete cytogenetic response (CCyR) probabilities were calculated by the traditional KM method in the OSP and PBS data sets; ICE estimates of CCyR were also computed for these data sets. The 3-year PFS was estimated as 89.6% in the OSP set and 88.8% in the PBS set (fig. 1). KM estimation gives a median time to CCyR of 17.5 months in the OSP group and 12 months in the PBS group, delta = 5.5 months (fig. 2); with ICE estimation, the median time to CCyR is 12 months in the OSP group and 8.5 months in the PBS group, delta = 3.5 months (fig. 3). The difference in CCyR between OSP and PBS is much smaller in fig. 3 than in fig. 2. The completeness of cytogenetic data differed considerably between the studies; the percentages of patients with reported cytogenetic tests were: at 6±3 months, 333/497 (67%) in OSP and 151/174 (87%) in PBS; at 24±3 months, 254/470 (54%) in OSP and 52/79 (66%) in PBS.
Figure 1. KM estimates of PFS in the OSP (dashed line) and PBS (solid line) sets.
Figure 2. KM estimates of CCyR in the OSP (dashed line) and PBS (solid line) sets.
Figure 3. ICE estimates of CCyR in the OSP (dashed line) and PBS (solid line) sets.
The long-term results of the studies are almost identical (p=0.3, fig. 1), although the KM estimates of response rates are essentially different (p<0.001, fig. 2). The difference in CCyR estimates almost disappears (p=0.04, fig. 3) when the more robust ICE technique is used. Thus, the apparent difference in CCyR is caused by the completeness of data and not by clinical reasons. Conclusions: Irregular assessment times for surrogate endpoints and missing data may bias classical estimates and lead to wrong interpretations. The case for using ICE instead of KM is illustrated with real data from CML registries. ICE estimates proved reliable and robust in comparison with classical right-censored estimates. References: 1. So Y., Johnston G., Kim S.H. Analyzing Interval-Censored Survival Data with SAS® Software. SAS Global Forum 2010. Disclosures: No relevant conflicts of interest to declare.
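To make the KM-vs-ICE contrast concrete: with interval-censored data, each response time is known only to lie between two visits, and the nonparametric MLE is obtained by Turnbull's self-consistency (EM) iteration over candidate mass points. The abstract's analysis used a SAS macro [1]; the following is a small, self-contained Python sketch of the same idea with a simplified choice of candidate points, not a reimplementation of that macro.

```python
import numpy as np

def turnbull_em(left, right, max_iter=500, tol=1e-9):
    """Self-consistency (EM) estimate for interval-censored event times.

    Observation i is known only to lie in the interval (left[i], right[i]].
    Simplification: candidate mass points are the unique finite right
    endpoints plus infinity (the full algorithm derives Turnbull intervals).
    Returns (points, mass) with mass summing to 1.
    """
    left, right = np.asarray(left, float), np.asarray(right, float)
    points = np.unique(right[np.isfinite(right)])
    points = np.append(points, np.inf)   # mass beyond the last visit
    # alpha[i, j] = True if candidate point j is a possible value for obs i.
    alpha = (points[None, :] > left[:, None]) & (points[None, :] <= right[:, None])
    p = np.full(len(points), 1.0 / len(points))
    for _ in range(max_iter):
        w = alpha * p                              # E-step: expected membership
        w /= w.sum(axis=1, keepdims=True)
        p_new = w.mean(axis=0)                     # M-step: update the masses
        if np.abs(p_new - p).max() < tol:
            break
        p = p_new
    return points, p

# Visit-based data: "no response at month 6, response seen at month 12"
# becomes the interval (6, 12]; inf means the response was never observed.
left  = [0, 6, 6, 12, 12, 24]
right = [6, 12, 24, 24, np.inf, np.inf]
pts, mass = turnbull_em(left, right)
median = pts[np.cumsum(mass) >= 0.5][0]
print(dict(zip(pts, mass.round(3))), "median ~", median)
```

A KM analysis of the same data would have to pretend each response occurred exactly at the visit where it was first seen, which is how sparser monitoring inflates the apparent time to response in the OSP set.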


2005 ◽  
Vol 173 (4S) ◽  
pp. 436-436
Author(s):  
Christopher J. Kane ◽  
Martha K. Terris ◽  
William J. Aronson ◽  
Joseph C. Presti ◽  
Christopher L. Amling ◽  
...  

2021 ◽  
Author(s):  
Jakob Raymaekers ◽  
Peter J. Rousseeuw

Abstract: Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
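A quick way to see the sensitivity the authors address is to fit the classical maximum-likelihood transformation parameter with and without a few outliers. The sketch below uses SciPy's standard (non-robust) Yeo–Johnson fit on synthetic data; the robust estimator proposed in the paper is not implemented here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
clean = rng.lognormal(mean=0.0, sigma=0.6, size=500)   # skewed but well-behaved

# Classical ML fit of the Yeo-Johnson parameter on the clean data...
_, lam_clean = stats.yeojohnson(clean)

# ...and after planting a handful of far-out outliers.
contaminated = np.concatenate([clean, [80.0, 90.0, 100.0]])
_, lam_contam = stats.yeojohnson(contaminated)

print(f"lambda (clean):        {lam_clean:.3f}")
print(f"lambda (contaminated): {lam_contam:.3f}")
# A few outliers shift the fitted lambda: the ML criterion bends the
# transformation to pull them inward, at the cost of normality in the
# central part of the data -- the failure mode the paper targets.
```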


Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 62
Author(s):  
Zhengwei Liu ◽  
Fukang Zhu

Thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is binomial thinning. Inspired by the theory of extended Pascal triangles, a new thinning operator named extended binomial is introduced, which generalizes binomial thinning. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersion features of count time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case, and conditional maximum likelihood estimation is also discussed. We also obtain the asymptotic properties of the two-step CLS estimator. Finally, three overdispersed or underdispersed real data sets are considered to illustrate the superior performance of the proposed model.
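For readers new to thinning operators: binomial thinning defines α∘X as a Binomial(X, α) draw, so an INAR(1) process evolves as X_t = α∘X_{t−1} + ε_t with integer innovations. The sketch below simulates this standard binomial-thinning INAR(1) with Poisson innovations; the paper's two-parameter extended binomial operator is not reproduced here.

```python
import numpy as np

def simulate_inar1(n: int, alpha: float, lam: float, seed: int = 0) -> np.ndarray:
    """Simulate X_t = alpha o X_{t-1} + eps_t, eps_t ~ Poisson(lam),
    where 'o' is binomial thinning: alpha o X ~ Binomial(X, alpha)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))          # start near the stationary mean
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # each unit survives w.p. alpha
        x[t] = survivors + rng.poisson(lam)        # plus new arrivals
    return x

series = simulate_inar1(n=1000, alpha=0.5, lam=2.0)
# The Poisson INAR(1) is equidispersed: mean ~ var ~ lam / (1 - alpha) = 4.
print(series.mean(), series.var())
```

Because this classical model ties the variance to the mean, capturing over- or underdispersed counts requires a more flexible operator, which is the gap the extended binomial thinning is designed to fill.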


Econometrics ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 10
Author(s):  
Šárka Hudecová ◽  
Marie Hušková ◽  
Simos G. Meintanis

This article considers goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregression models. The test statistics are based on an L2-type distance between two estimators of the probability generating function of the observations: one entirely nonparametric, and the other semiparametric, computed under the corresponding null hypothesis. The asymptotic distribution of the proposed test statistics, both under the null hypotheses and under alternatives, is derived, and consistency is proved. The case of testing bivariate generalized Poisson autoregression and the extension of the methods to dimensions higher than two are also discussed. The finite-sample performance of a parametric bootstrap version of the tests is illustrated via a series of Monte Carlo experiments. The article concludes with applications to real data sets and discussion.
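The nonparametric ingredient of such a test is simple: the empirical probability generating function of bivariate counts is ĝ(u,v) = n⁻¹ Σᵢ u^{X_{i1}} v^{X_{i2}}, and the statistic integrates the squared gap to a model-implied PGF over [0,1]². The sketch below computes the empirical PGF and a grid approximation of the L2 distance; the model PGF is passed in as a callable, since the semiparametric estimator under the null depends on the model being tested.

```python
import numpy as np

def empirical_pgf(x, u, v):
    """Empirical PGF of bivariate counts x (shape (n, 2)) at points (u, v)."""
    return np.mean(u[..., None] ** x[:, 0] * v[..., None] ** x[:, 1], axis=-1)

def l2_distance(x, model_pgf, grid_size=32):
    """Riemann-grid approximation of the L2 distance on [0,1]^2 between the
    empirical PGF and a model PGF (a callable (u, v) -> value)."""
    u, v = np.meshgrid(np.linspace(0, 1, grid_size), np.linspace(0, 1, grid_size))
    gap = empirical_pgf(x, u, v) - model_pgf(u, v)
    return (gap ** 2).mean()   # mean over the grid = integral, since area is 1

# Sanity check: independent Poisson(1) counts against their exact PGF
# g(u, v) = exp(u - 1) * exp(v - 1); the distance should be near zero.
rng = np.random.default_rng(1)
x = rng.poisson(1.0, size=(1000, 2))
print(l2_distance(x, lambda u, v: np.exp(u - 1) * np.exp(v - 1)))
```

In the actual tests, the null distribution of this statistic is approximated by a parametric bootstrap, resampling from the fitted null model and recomputing the distance.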


Information ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 202
Author(s):  
Louai Alarabi ◽  
Saleh Basalamah ◽  
Abdeltawab Hendawi ◽  
Mohammed Abdalla

The rapid spread of infectious diseases is a major public-health problem. Recent developments in fighting these diseases have heightened the need for a contact tracing process, which can be considered an ideal method for controlling the transmission of infectious diseases. The outcome of contact tracing is diagnostic testing, self-isolation or treatment of suspected cases, and treatment of infected persons, which eventually limits the spread of disease. This paper proposes a technique named TraceAll that traces all contacts exposed to an infected patient and produces a list of these contacts as potentially infected patients. Initially, it treats the infected patient as the querying user and fetches the contacts exposed to that patient. Secondly, it obtains all the trajectories of objects that moved near the querying user. Next, it investigates these trajectories, considering social distance and exposure period, to identify whether these objects have become infected. The experimental evaluation of the proposed technique on real data sets illustrates the effectiveness of this solution. Comparative experiments confirm that TraceAll outperforms baseline methods by 40% in the efficiency of answering contact tracing queries.
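The core filtering step described above can be illustrated as follows: given the infected user's trajectory and candidate trajectories, flag a contact when another object stays within a distance threshold for at least a minimum exposure duration. This is a hedged sketch of that generic spatio-temporal join, with made-up thresholds and a simple short-range distance approximation; it is not the TraceAll index or query algorithm itself.

```python
import math

# A trajectory is a list of (timestamp_seconds, lat, lon) samples,
# assumed here to be time-aligned across objects for simplicity.
def distance_m(p, q):
    """Approximate ground distance in meters (equirectangular, short ranges)."""
    dlat = math.radians(q[1] - p[1])
    dlon = math.radians(q[2] - p[2]) * math.cos(math.radians((p[1] + q[1]) / 2))
    return 6_371_000 * math.hypot(dlat, dlon)

def exposed(patient_traj, other_traj, radius_m=2.0, min_exposure_s=900):
    """True if the other object stays within radius_m of the patient for a
    contiguous min_exposure_s (assumed thresholds: 2 m, 15 minutes)."""
    exposure = 0
    for p, q in zip(patient_traj, other_traj):
        if p[0] == q[0] and distance_m(p, q) <= radius_m:
            exposure += 60                 # assumed sampling interval: 60 s
            if exposure >= min_exposure_s:
                return True
        else:
            exposure = 0                   # contact must be contiguous
    return False

# contacts = [oid for oid, traj in trajectories.items()
#             if exposed(patient_traj, traj)]
```

A naive scan like this is quadratic in the number of objects; the paper's contribution is answering such queries efficiently at scale.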


2021 ◽  
Vol 1 (1) ◽  
pp. 86-92
Author(s):  
Stuart Jon Spechler ◽  
Rhonda F. Souza

During the past several decades, while the incidence of esophageal adenocarcinoma (EAC) has risen dramatically, our primary EAC-prevention strategies have been endoscopic screening of individuals with GERD symptoms for Barrett’s esophagus (BE), and endoscopic surveillance for those found to have BE. Unfortunately, current screening practices have failed to identify most patients who develop EAC, and the efficacy of surveillance remains highly questionable. We review potential reasons for the failure of these practices, including recent evidence that most EACs develop through a rapid genomic doubling pathway, and recent data suggesting that many EACs develop from segments of esophageal intestinal metaplasia too short to be recognized as BE. We highlight the need for a biomarker to identify BE patients at high risk for neoplasia (who would benefit from early therapeutic intervention) and BE patients at low risk (who would not benefit from surveillance). Promising recent efforts to identify such a biomarker are reviewed herein.

