partitioned data
Recently Published Documents

TOTAL DOCUMENTS: 243 (five years: 33)
H-INDEX: 25 (five years: 2)

2021 ◽  
Author(s):  
Tiong Goh ◽  
MengJun Liu

The ability to predict the severity (death or survival) of COVID-19 patients enables clinicians to prioritise treatment. Recently, an interpretable machine learning model using three blood biomarkers was developed to predict the mortality of COVID-19 patients. The method was reported to suffer from performance instability because the identified biomarkers are not consistent predictors over an extended duration. To sustain performance, the proposed method partitioned the data into three time windows, and a front-classifier, a mid-classifier and an end-classifier were designed for the respective windows using the XGBoost single-tree approach. These time-window classifiers were integrated into a majority-vote classifier and tested on an isolated test data set. The voting classifier extends the 90% cumulative-accuracy horizon from a 14-day window to a 21-day prediction window; an additional 7 days of prediction window can have a considerable impact on a patient's chance of survival. This study validated the feasibility of the time-window voting classifier and further supports the selection of the biomarker feature set for the early prognosis of patients at higher risk of mortality.
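A minimal sketch of the time-window majority-vote idea described above, assuming the xgboost Python package; the synthetic placeholder data stands in for the three blood biomarkers, and this is an illustration of the technique, not the authors' implementation.

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)

def make_window(n=200, n_biomarkers=3):
    # Placeholder data standing in for the three blood biomarkers in one window.
    X = rng.normal(size=(n, n_biomarkers))
    y = (X.sum(axis=1) + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

# One "single tree" XGBoost classifier per time window (front, mid, end).
classifiers = []
for _ in range(3):
    X, y = make_window()
    clf = XGBClassifier(n_estimators=1, max_depth=3, eval_metric="logloss")
    clf.fit(X, y)
    classifiers.append(clf)

def majority_vote(x):
    # Each window classifier casts one vote on a patient's biomarker vector.
    votes = sum(int(clf.predict(x.reshape(1, -1))[0]) for clf in classifiers)
    return int(votes >= 2)   # two of three votes decide the predicted outcome

print(majority_vote(rng.normal(size=3)))
```

Using a single shallow tree per window keeps each classifier interpretable, while the vote across windows is what stabilises performance over the longer horizon.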


2021 ◽  
Author(s):  
Bart Kamphorst ◽  
Thomas Rooijakkers ◽  
Thijs Veugen ◽  
Matteo Cellamare ◽  
Daan Knoors

Background: Analysing distributed medical data is challenging because of data sensitivity and the various regulations governing access to and combination of data. Some privacy-preserving methods are known for analysing horizontally-partitioned data, where different organisations hold similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, where organisations hold different data attributes on overlapping sets of people. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-partitioned data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained.

Methods: We use a Newton-Raphson solver to securely train the CPH model with MPC, jointly with all data holders, without revealing any sensitive data. Securely computing the log-partial likelihood in each iteration poses several technical challenges for preserving the efficiency and security of our solution. To tackle them, we generalise a cryptographic protocol for securely computing the inverse of the Hessian matrix and develop a new method for securely computing exponentiations. A theoretical complexity estimate is given to provide insight into the computational and communication effort required.

Results: Our secure solution is implemented in a setting with three different machines, each representing a different data holder, which can communicate over the internet. The MPyC platform is used to implement this privacy-preserving solution for obtaining the CPH model. We test the accuracy and computation time of our methods on three standard benchmark survival datasets, and identify future work to make our solution more efficient.

Conclusions: Our secure solution is comparable with the standard, non-secure solver in terms of accuracy and convergence speed. The computation time is considerably larger, although the theoretical complexity is still cubic in the number of covariates and quadratic in the number of subjects. We conclude that this is a promising way of performing survival analysis on vertically-partitioned medical data while realising a high level of security and privacy.
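The secure protocol rests on MPC primitives, but the solver it protects is ordinary Newton-Raphson on the Cox partial likelihood. Below is a minimal, non-secure plaintext sketch of that solver (Breslow handling of ties), assuming NumPy and synthetic input; in the paper, every quantity here, including the exponentiations and the Hessian inverse, would be computed under MPC via MPyC.

```python
import numpy as np

def cox_newton_raphson(X, time, event, n_iter=10):
    """Plaintext Newton-Raphson for the Cox partial likelihood (Breslow ties).

    X: (n, p) covariates; time: (n,) follow-up times; event: (n,) 1 = event.
    """
    n, p = X.shape
    order = np.argsort(-time)          # descending time: risk set i is rows 0..i
    X, event = X[order], event[order]
    beta = np.zeros(p)
    for _ in range(n_iter):
        w = np.exp(X @ beta)                                # exp(x_j . beta)
        s0 = np.cumsum(w)                                   # risk-set weight sums
        s1 = np.cumsum(w[:, None] * X, axis=0)              # weighted feature sums
        s2 = np.cumsum(w[:, None, None] * (X[:, :, None] * X[:, None, :]), axis=0)
        grad = np.zeros(p)
        hess = np.zeros((p, p))
        for i in np.flatnonzero(event):                     # each observed event
            xbar = s1[i] / s0[i]
            grad += X[i] - xbar
            hess -= s2[i] / s0[i] - np.outer(xbar, xbar)
        beta -= np.linalg.solve(hess, grad)                 # Newton step
    return beta

# Hypothetical usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
time = rng.exponential(scale=np.exp(-X @ np.array([0.5, -0.5, 0.2])))
event = rng.integers(0, 2, size=50)
print(cox_newton_raphson(X, time, event))
```

The per-iteration exponentiation `np.exp(X @ beta)` and the linear solve against the Hessian are exactly the two operations the paper reports as the hard steps to perform securely.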


10.2196/26598 ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. e26598
Author(s):  
Dongchul Cha ◽  
MinDong Sung ◽  
Yu-Rang Park

Background: Machine learning (ML) is now widely deployed in our everyday lives. Building robust ML models requires massive amounts of training data. Traditional ML algorithms require training data to be centralized, which raises privacy and data-governance issues. Federated learning (FL) is an approach to overcoming this issue. We focused on applying FL to vertically partitioned data, in which an individual's record is scattered among different sites.

Objective: The aim of this study was to perform FL on vertically partitioned data and achieve performance comparable to that of centralized models without exposing the raw data.

Methods: We used three different datasets (the Adult income, Schwannoma, and eICU datasets) and vertically divided each into different pieces. After the vertical division, an overcomplete autoencoder was trained at each site. Each site's data were then transformed into latent representations, which were aggregated for central training. A tabular neural network model with categorical embeddings was used for training. A centrally trained model served as the baseline, against which the FL models were compared in terms of accuracy and area under the receiver operating characteristic curve (AUROC).

Results: The autoencoder-based network successfully transformed the original data into latent representations with no domain knowledge applied. These transformed data differed from the original data in both feature space and data distribution, indicating appropriate data security. The loss of performance was minimal when using an overcomplete autoencoder: accuracy loss was 1.2%, 8.89%, and 1.23%, and AUROC loss was 1.1%, 0%, and 1.12% on the Adult income, Schwannoma, and eICU datasets, respectively.

Conclusions: We proposed an autoencoder-based ML model for vertically partitioned data. Because our model is based on unsupervised learning, no domain-specific knowledge is required at individual sites. When direct data sharing is not possible, our approach may be a practical solution that enables both data protection and the building of a robust model.
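The per-site step can be sketched as follows, assuming PyTorch; the layer sizes, training schedule, and synthetic inputs are illustrative assumptions, and the paper's tabular classifier with categorical embeddings is only indicated by a comment.

```python
import torch
import torch.nn as nn

class OvercompleteAutoencoder(nn.Module):
    """Autoencoder whose latent dimension exceeds the input dimension."""
    def __init__(self, n_features, latent_dim):
        super().__init__()
        assert latent_dim > n_features          # "overcomplete"
        self.encoder = nn.Sequential(nn.Linear(n_features, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_site(X, latent_dim=16, epochs=200, lr=1e-3):
    """Train one site's autoencoder on its vertical slice, fully unsupervised."""
    model = OvercompleteAutoencoder(X.shape[1], latent_dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), X)   # reconstruction loss only
        loss.backward()
        opt.step()
    with torch.no_grad():
        return model.encoder(X)       # latent codes are all that leaves the site

# Hypothetical vertical split: two sites hold different columns of the same patients.
X_site_a = torch.randn(500, 5)
X_site_b = torch.randn(500, 7)
latent = torch.cat([train_site(X_site_a), train_site(X_site_b)], dim=1)
# `latent` (500 x 32) would then feed the central tabular classifier.
```

Because training is purely reconstructive, no labels or domain knowledge are needed at the sites, which is the property the conclusions emphasise.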


10.2196/21459 ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. e21459
Author(s):  
Qoua Her ◽  
Thomas Kent ◽  
Yuji Samizo ◽  
Aleksandra Slavkovic ◽  
Yury Vilk ◽  
...  

Background: In clinical research, important variables may be collected from multiple data sources. Physically pooling patient-level data from multiple sources often raises several challenges, including proper protection of patient privacy and proprietary interests. We previously developed an SAS-based package to perform distributed regression (a suite of privacy-protecting methods that perform multivariable-adjusted regression analysis using only summary-level information) with horizontally partitioned data, a setting where distinct cohorts of patients are available from different data sources. We integrated the package with PopMedNet, open-source file-transfer software, to facilitate secure file transfer between the analysis center and the data-contributing sites. The feasibility of using PopMedNet to facilitate distributed regression analysis (DRA) with vertically partitioned data, a setting where the data attributes of a single cohort of patients are available from different data sources, was unknown.

Objective: The objective of this study was to assess the feasibility of using PopMedNet, with enhancements, to facilitate automatable vertical DRA (vDRA) in real-world settings.

Methods: We gathered the statistical and informatics requirements for using PopMedNet to facilitate automatable vDRA, and enhanced PopMedNet based on these requirements to improve its technical capability to support vDRA.

Results: PopMedNet can enable automatable vDRA. We identified and implemented two enhancements that improved its technical capability to perform automatable vDRA in real-world settings: the ability to simultaneously upload and download multiple files, and the ability to transfer summary-level information directly between data-contributing sites without a third-party analysis center.

Conclusions: PopMedNet can be used to facilitate automatable vDRA that protects patient privacy and supports clinical research in real-world settings.
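The horizontally partitioned distributed regression described in the background has a compact closed form that shows why only summary-level information needs to travel. The sketch below, with hypothetical synthetic data, illustrates that general DRA idea; it is not PopMedNet, the SAS package, or the paper's vDRA protocol.

```python
import numpy as np

# Each site holds the same covariates for a distinct cohort of patients
# and shares only its local Gram matrix X'X and cross-product X'y,
# never patient-level rows.
rng = np.random.default_rng(1)
sites = [(rng.normal(size=(100, 4)), rng.normal(size=100)) for _ in range(3)]

xtx = sum(X.T @ X for X, y in sites)   # each term is computed locally
xty = sum(X.T @ y for X, y in sites)
beta = np.linalg.solve(xtx, xty)       # analysis center solves the normal equations
print(beta)
```

In the vertical setting studied here, sites hold different columns of the same patients, so the cross-site blocks of the Gram matrix cannot be formed locally; that is what makes vDRA iterative and what the PopMedNet enhancements (multi-file transfer, direct site-to-site exchange) support.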


2021 ◽  
Vol 235 ◽  
pp. 03029
Author(s):  
Zhengwei Zhao

As one of the most popular short-video platforms, Douyin counts more than half of Chinese netizens among its daily active users. Many users spend large amounts of time viewing Douyin short videos, making Douyin addiction a widespread phenomenon. In this paper, the author analyzes the algorithm principles used by Douyin, combining the perspectives of mass-media communication and algorithm technology to explain how they contribute to Douyin addiction. On one hand, the recommendation algorithm caters to users by fully meeting their needs, using a hierarchical interest-label tree, user personas, and a partitioned-data-bucket strategy to recommend more accurate and personalized content. On the other hand, the algorithm uses collaborative filtering and a low-cost interaction design mechanism to set traps for users. The author also finds a closed-loop relationship between Douyin addiction and algorithm optimization: the algorithm principles positively affect users' continuance intention, while the more frequently a user uses Douyin, the more accurate the algorithm becomes. Without intervention, the addiction may be severely exacerbated. The author therefore offers several suggestions for Douyin developers and users, aiming to break this closed loop.
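Douyin's production system is proprietary, so the collaborative-filtering principle the paper discusses can only be illustrated generically. A textbook user-based sketch on a hypothetical interaction matrix (rows = users, columns = videos, entries = watch/like signals):

```python
import numpy as np

# Hypothetical user-video interaction matrix.
R = np.array([[1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0]], dtype=float)

# Cosine similarity between users.
norms = np.linalg.norm(R, axis=1, keepdims=True)
sim = (R / norms) @ (R / norms).T
np.fill_diagonal(sim, 0)

# Score unseen videos for user 0 via similarity-weighted neighbour signals.
scores = sim[0] @ R
scores[R[0] > 0] = -np.inf             # mask videos already watched
print(int(np.argmax(scores)))          # index of the recommended video
```

The closed loop the paper describes is visible even here: every new interaction enlarges `R`, sharpening the similarities and making the next recommendation more engaging.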


2021 ◽  
Vol 12 (3) ◽  
pp. 412-423
Author(s):  
Hirofumi Miyajima ◽  
Noritaka Shigei ◽  
Hiromi Miyajima ◽  
Norio Shiratori

Author(s):  
Ayşenur Budak ◽  
Alp Ustundag ◽  
Bülent Güloğlu

The impact of optimal pricing is rarely explored in truckload transportation. In this study, the question of what price should be offered to which customer, and for what service characteristics, is investigated for truckload transportation. Accordingly, customers' attitudes and responses to the bid price must be modeled, and their sensitivity to price must be analyzed. A bid response function is developed based on a logit model. The bid response function is examined from two perspectives: the first is a general model using all data, and the second fits the logit model to partitioned data obtained by clustering customers. A sensitivity analysis of the logit model is performed. After developing the bid response functions, a non-linear optimization model is developed to determine the bid price. The developed model will contribute to logistics companies' profit margins in the long term.
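A sketch of the logit bid-response and price-optimization idea, assuming SciPy; the coefficients, cost, and bounds below are hypothetical, whereas in the study they would be estimated per customer cluster from historical bid outcomes.

```python
import numpy as np
from scipy.optimize import minimize_scalar

a, b = 6.0, -0.004        # hypothetical intercept and (negative) price sensitivity
cost = 900.0              # hypothetical marginal cost of serving the load

def accept_prob(price):
    # Logit bid-response function: probability the customer accepts the bid.
    return 1.0 / (1.0 + np.exp(-(a + b * price)))

def neg_expected_profit(price):
    # Expected profit = margin times acceptance probability (negated to minimize).
    return -(price - cost) * accept_prob(price)

res = minimize_scalar(neg_expected_profit, bounds=(cost, 5000), method="bounded")
print(f"optimal bid: {res.x:.0f}, acceptance prob: {accept_prob(res.x):.2f}")
```

Fitting separate `(a, b)` pairs per customer cluster, as the partitioned-data variant does, lets the optimizer quote higher prices to price-insensitive segments and sharper prices to price-sensitive ones.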

