Federated Learning in a Medical Context: A Systematic Literature Review

2021 ◽  
Vol 21 (2) ◽  
pp. 1-31
Author(s):  
Bjarne Pfitzner ◽  
Nico Steckhan ◽  
Bert Arnrich

Data privacy is a critical issue. Especially in fields like medicine, it is paramount to abide by the existing privacy regulations to preserve patients’ anonymity. However, data is required for research and for training machine learning models that could help uncover complex correlations or personalised treatments that might otherwise stay undiscovered. Those models generally scale with the amount of data available, but the current situation often prohibits building large databases across sites. It would therefore be beneficial to combine similar or related data from different sites all over the world while still preserving data privacy. Federated learning has been proposed as a solution, because it relies on sharing machine learning models instead of the raw data itself; private data never leaves the site or device it was collected on. Federated learning is an emerging research area, and many domains have been identified for the application of its methods. This systematic literature review provides an extensive look at the concept of and research into federated learning and its applicability to confidential healthcare datasets.
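The weight-sharing idea behind federated learning can be pictured with a toy federated-averaging loop; the linear model, function names, and data below are illustrative assumptions, not drawn from any reviewed study:

```python
# Hypothetical sketch of federated averaging: each site trains locally and
# shares only model weights, never raw patient data. The one-feature linear
# model and learning rate are illustrative choices.

def local_update(weights, features, labels, lr=0.1):
    """One gradient step of a linear model on a site's private data."""
    n = len(labels)
    new_w = list(weights)
    for j in range(len(weights)):
        grad = sum(
            (sum(w * x for w, x in zip(weights, xs)) - y) * xs[j]
            for xs, y in zip(features, labels)
        ) / n
        new_w[j] = weights[j] - lr * grad
    return new_w

def federated_average(site_weights):
    """Server aggregates site models by coordinate-wise averaging."""
    n_sites = len(site_weights)
    return [sum(ws[j] for ws in site_weights) / n_sites
            for j in range(len(site_weights[0]))]

# Two hospitals fit y = 2*x locally; only the weights travel to the server.
site_a = local_update([0.0], [[1.0], [2.0]], [2.0, 4.0])
site_b = local_update([0.0], [[3.0]], [6.0])
global_w = federated_average([site_a, site_b])
```

After one round the averaged weight already moves toward the shared signal without either hospital's data leaving its site.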

2019 ◽  
Author(s):  
Edward W Huang ◽  
Ameya Bhope ◽  
Jing Lim ◽  
Saurabh Sinha ◽  
Amin Emad

Abstract
Prediction of clinical drug response (CDR) of cancer patients, based on their clinical and molecular profiles obtained prior to administration of the drug, can play a significant role in individualized medicine. Machine learning models have the potential to address this issue, but training them requires data from a large number of patients treated with each drug, limiting their feasibility. While large databases of drug response and molecular profiles of preclinical in-vitro cancer cell lines (CCLs) exist for many drugs, it is unclear whether preclinical samples can be used to predict the CDR of real patients. We designed a systematic approach to evaluate how well different algorithms, trained on gene expression and drug response of CCLs, can predict the CDR of patients. Using data from two large databases, we evaluated various linear and non-linear algorithms, some of which utilized information on gene interactions. We then developed a new algorithm called TG-LASSO that explicitly integrates information on samples’ tissue of origin with gene expression profiles to improve prediction performance. Our results showed that regularized regression methods provide significantly accurate predictions. However, neither the network information nor common methods of incorporating tissue of origin improved the results. TG-LASSO, on the other hand, improved the predictions and distinguished resistant and sensitive patients for 7 out of 13 drugs. Additionally, TG-LASSO identified genes associated with drug response, including known targets and pathways involved in the drugs’ mechanism of action. Moreover, genes identified by TG-LASSO for multiple drugs in a tissue were associated with patient survival.
In summary, our analysis suggests that preclinical samples can be used to predict the CDR of patients and to identify biomarkers of drug sensitivity and survival.
Author Summary
Cancer is among the leading causes of death globally, and prediction of the drug response of patients to different treatments, based on their clinical and molecular profiles, can enable individualized cancer medicine. Machine learning algorithms have the potential to play a significant role in this task; however, these algorithms are designed on the premise that a large number of labeled training samples are available and that these samples are an accurate representation of the profiles of real tumors. Due to ethical and technical reasons, it is not possible to screen humans for many drugs, significantly limiting the size of training data. To overcome this data scarcity problem, machine learning models can be trained using large databases of preclinical samples (e.g. cancer cell line cultures). However, due to the major differences between preclinical samples and real tumors, it is unclear how accurately such preclinical-to-clinical computational models can predict the clinical drug response of cancer patients. Here, we first systematically evaluate a variety of linear and nonlinear machine learning algorithms for this task using two large databases of preclinical (GDSC) and tumor (TCGA) samples. We then present a novel method called TG-LASSO that utilizes a new approach for explicitly incorporating the tissue of origin of samples in the prediction task. Our results show that TG-LASSO outperforms all other algorithms and can accurately distinguish resistant and sensitive patients for the majority of the tested drugs. Follow-up analyses reveal that this method can also identify biomarkers of drug sensitivity in each cancer type.
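As a rough illustration of the regularized-regression family the study evaluates (not TG-LASSO itself, whose tissue-aware grouping is not reproduced here), a one-feature ridge regression on synthetic "cell line" data could look like:

```python
# Illustrative sketch only: one-feature ridge regression stands in for the
# regularized regression family evaluated in the study. All data below are
# synthetic, not drawn from GDSC or TCGA.

def ridge_1d(xs, ys, alpha=1.0):
    """Closed-form ridge for a single feature: w = sum(x*y) / (sum(x^2) + alpha).
    The alpha penalty shrinks the slope toward zero to resist overfitting."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

# "Cell line" expression of one gene vs. measured drug response (y ≈ 2x).
expr = [0.5, 1.0, 1.5, 2.0]
resp = [1.0, 2.0, 3.0, 4.0]
w = ridge_1d(expr, resp, alpha=0.5)   # slope shrunk slightly below 2.0
pred = w * 1.2                        # predicted response for a new "tumor" profile
```

The same shrinkage idea, with an L1 penalty and tissue-specific penalty weights, underlies LASSO-style methods such as TG-LASSO.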


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Roberto Salazar-Reyna ◽  
Fernando Gonzalez-Aleu ◽  
Edgar M.A. Granda-Gutierrez ◽  
Jenny Diaz-Ramirez ◽  
Jose Arturo Garza-Reyes ◽  
...  

Purpose
The objective of this paper is to assess and synthesize the published literature related to the application of data analytics, big data, data mining and machine learning to healthcare engineering systems.
Design/methodology/approach
A systematic literature review (SLR) was conducted to obtain the most relevant papers related to the research study from three different platforms: EBSCOhost, ProQuest and Scopus. The literature was assessed and synthesized, conducting analysis associated with the publications, authors and content.
Findings
From the SLR, 576 publications were identified and analyzed. The research area shows the characteristics of a growing field, with new research areas evolving and applications being explored. In addition, the main authors and collaboration groups publishing in this research area were identified through a social network analysis. This could lead new and current authors to identify researchers with common interests in the field.
Research limitations/implications
The use of the SLR methodology does not guarantee that all relevant publications related to the research are covered and analyzed. However, the authors' previous knowledge and the nature of the publications were used to select the different platforms.
Originality/value
To the best of the authors' knowledge, this paper represents the most comprehensive literature-based study on the fields of data analytics, big data, data mining and machine learning applied to healthcare engineering systems.


2021 ◽  
Author(s):  
Luc Thomès ◽  
Rebekka Burkholz ◽  
Daniel Bojar

Abstract
As biological sequences, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences in standard workflows. Here, we present glycowork, an open-source Python package designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.
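To give a flavor of the motif annotation glycowork automates (this is a stand-alone sketch, not glycowork's actual API), counting monosaccharides in an IUPAC-condensed glycan string could be done as:

```python
import re
from collections import Counter

# Stand-alone sketch of glycan tokenization; glycowork's real annotation
# functions are richer and have a different interface.

def count_monosaccharides(glycan):
    """Split an IUPAC-condensed glycan into monosaccharide counts,
    discarding linkage information in parentheses and branch brackets."""
    cleaned = re.sub(r"\([^)]*\)", " ", glycan)          # drop linkages like (b1-4)
    cleaned = cleaned.replace("[", " ").replace("]", " ")  # drop branch brackets
    return Counter(tok for tok in cleaned.split() if tok)

counts = count_monosaccharides("Gal(b1-4)GlcNAc(b1-2)Man(a1-3)Man(b1-4)GlcNAc")
```

Counts like these are the raw material for the heatmaps and enrichment statistics the abstract describes.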


2021 ◽  
Author(s):  
Eren Asena ◽  
Henk Cremers

Introduction. Biological psychiatry has yet to find clinically useful biomarkers despite much effort. Is this because the field needs better methods and more data, or are current conceptualizations of mental disorders too reductionistic? Although this is an important question, there seems to be no consensus on what it means to be a “reductionist”. Aims. This paper aims to: a) clarify the views of researchers on different types of reductionism; b) examine the relationship between these views and the degree to which researchers believe mental disorders can be predicted from biomarkers; c) compare these predictability estimates with the performance of machine learning models that have used biomarkers to distinguish cases from controls. Methods. We created a survey on reductionism and the predictability of mental disorders from biomarkers and shared it with researchers in biological psychiatry. Furthermore, a literature review was conducted on the performance of machine learning models in predicting mental disorders from biomarkers. Results. The survey results showed that 9% of the sample were dualists and 57% were explanatory reductionists. There was no relationship between reductionism and perceived predictability. The estimated predictability of 11 mental disorders using currently available methods ranged between 65% and 80%, which was comparable to the results from the literature review. However, the participants were highly optimistic about the ability of future methods to distinguish cases from controls. Moreover, although behavioral data were rated as the most effective data type in predicting mental disorders, the participants expected biomarkers to play a significant role in not just predicting, but also defining mental disorders in the future.


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 452 ◽  
Author(s):  
Farida Habib Semantha ◽  
Sami Azam ◽  
Kheng Cher Yeo ◽  
Bharanidharan Shanmugam

In this digital age, we are observing an exponential proliferation of sophisticated hardware- and software-based solutions that interact with users at almost every sensitive aspect of our lives, collecting and analysing a range of data about us. These data, and the information derived from them, are often too personal to fall into unwanted hands, and thus users are almost always wary of the privacy of the private data continuously collected through these digital mediums. To further complicate the issue, infringement cases involving such databanks are on a sharp rise. Several frameworks have been devised in various parts of the globe to safeguard data privacy; in parallel, constant research is being conducted on closing the loopholes within these frameworks. This study aimed to analyse contemporary privacy by design frameworks to identify their key limitations. Seven contemporary privacy by design frameworks were examined in depth in this research, which was based on a systematic literature review. The result, targeted at the healthcare sector, is expected to produce a high degree of fortification against data breaches in the personal information domain.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2726
Author(s):  
Ryan Moore ◽  
Kristin R. Archer ◽  
Leena Choi

Accelerometers are increasingly being used in biomedical research, but the analysis of accelerometry data is often complicated by both the massive size of the datasets and the collection of unwanted data during delivery to study participants. Current methods for removing delivery data involve arduous manual review of dense datasets. We aimed to develop models that classify days in accelerometry data as activity from human wear or from the delivery process. These models can be used to automate the cleaning of accelerometry datasets that are adulterated with delivery activity. We developed statistical and machine learning models for the classification of accelerometry data in a supervised learning context, using a large accelerometry dataset labeled with human activity and delivery. Model performances were assessed and compared using Monte Carlo cross-validation. We found that a hybrid convolutional recurrent neural network performed best in the classification task, with an F1 score of 0.960, but simpler models such as logistic regression and random forest also performed excellently, with F1 scores of 0.951 and 0.957, respectively. The best performing models and related data processing techniques are made publicly available in the R package, Physical Activity.
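The F1 score used to compare these classifiers is the harmonic mean of precision and recall; a minimal sketch with made-up wear (1) vs. delivery (0) day labels:

```python
# Sketch of the F1 metric used to compare the classifiers; the labels below
# are invented, standing in for per-day wear (1) vs. delivery (0) calls.

def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

truth = [1, 1, 1, 0, 0, 1]
preds = [1, 1, 0, 0, 1, 1]
score = f1_score(truth, preds)
```

Because delivery days are rare relative to wear days, F1 on the positive class is a more informative comparison metric here than raw accuracy.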


Author(s):  
Alberto Purpura ◽  
Karolina Buchner ◽  
Gianmaria Silvello ◽  
Gian Antonio Susto

Abstract
LEarning TO Rank (LETOR) is a research area in the field of Information Retrieval (IR) where machine learning models are employed to rank a set of items. In the past few years, neural LETOR approaches have become a competitive alternative to traditional ones like LambdaMART. However, the performance of neural architectures has grown in proportion to their complexity and size. This can be an obstacle to their adoption in large-scale search systems, where model size impacts latency and update time. For this reason, we propose an architecture-agnostic approach, based on a neural LETOR model, to reduce the size of its input by up to 60% without affecting system performance. This approach also reduces a LETOR model's complexity and, therefore, its training and inference time by up to 50%.
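One hedged way to picture input-size reduction for a ranker: prune features by an importance score, then rank with the smaller vectors. The toy linear scorer and importance values below are illustrative assumptions, not the paper's actual method:

```python
# Toy sketch: importance-based feature pruning for a ranking model.
# The importance scores and linear scorer are made up for illustration.

def prune_features(importance, k):
    """Indices of the k most important features."""
    return sorted(range(len(importance)), key=lambda j: -importance[j])[:k]

def rank(items, weights, keep):
    """Score each item using only the kept feature subset; best first."""
    scored = [(sum(item[j] * weights[j] for j in keep), i)
              for i, item in enumerate(items)]
    return [i for _, i in sorted(scored, reverse=True)]

importance = [0.9, 0.05, 0.6, 0.01]    # e.g. from some feature attribution
keep = prune_features(importance, 2)    # halve the input size
docs = [[1.0, 5.0, 0.0, 9.0],
        [0.5, 0.0, 2.0, 0.0]]
order = rank(docs, [1.0, 1.0, 1.0, 1.0], keep)
```

Smaller input vectors shrink the first layer of a neural ranker, which is where the abstract's latency and update-time savings would come from.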


2021 ◽  

Background: The course of diseases and their clinical parameters change over time. Accordingly, in some cases, general medical science and clinical statistics based on physicians' experience do not provide sufficient tools for an early and accurate diagnosis. Medical science therefore increasingly seeks to use unconventional methods and machine learning techniques. Diagnostic error is among the main challenges in the care of patients, and in recent years artificial intelligence tools have been used to help physicians. However, one of the main problems is that the effectiveness of machine learning tools has not been studied much. Due to the sensitivity and high prevalence of diseases, especially gastrointestinal cancer, a systematic review is needed to identify machine learning and artificial intelligence methods and to compare their impact on the diagnosis of lower gastrointestinal cancers. Objectives: This systematic review aimed to identify the machine learning methods used for the diagnosis of lower gastrointestinal cancers, to classify the presented methods, and to compare their effectiveness and evaluation indicators. Methods: This systematic review was conducted using six databases and follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. The search strategy consisted of four expressions: “machine learning algorithm”, “lower gastrointestinal”, “cancer”, and “diagnosis and screening”, in that order. Studies based on treatment were excluded from this review.
Similarly, studies that presented guidelines, protocols, and instructions were excluded, since they only require the focus of clinicians and do not provide progression along an active chain of reasoning. Finally, studies were excluded if they had not undergone a peer-review process. The following aspects were extracted from each article: authors, year, country, machine learning model and algorithm, sample size, type of data, and the results of the model. The selected studies were classified based on three criteria: 1) machine learning model, 2) cancer type, and 3) effect of machine learning on cancer diagnosis. Results: In total, 44 studies were included in this systematic literature review. The earliest article was published in 2010 and the most recent in 2019. Among the reviewed studies, one was on the rectum (rectal cancer), one on the small bowel (small bowel cancer), and 42 on the colon (colon cancer, colorectal cancer, and colonic polyps). In total, 19 of the 44 articles (43%) presented a deep learning model, and 25 (57%) used classic machine learning. The models worked mostly on image data, and all were supervised learning models. All studies with deep learning models used Convolutional Neural Networks and were published between 2016 and 2019. The studies with classic machine learning models used diverse methods, mostly Support Vector Machines, K-Nearest Neighbors, and Artificial Neural Networks. Conclusion: Machine learning methods are suitable tools in the field of cancer diagnosis, especially for the lower gastrointestinal tract. These methods can not only increase the accuracy of diagnosis and help the doctor make the right decision, but also aid the early diagnosis of cancer and reduce treatment costs.
The methods presented so far have focused mostly on image data and, more than anything else, have helped increase physicians' accuracy in making the correct diagnosis. Arriving at the right method for early diagnosis requires more accurate datasets and analyses.
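As an example of the classic models the review found in use, a toy k-nearest-neighbors classifier; the feature vectors and labels are synthetic placeholders, not clinical data:

```python
import math
from collections import Counter

# Toy k-nearest-neighbors classifier, one of the classic machine learning
# models the review identified. All features and labels are synthetic.

def knn_predict(train, labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

features = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]]
classes = ["benign", "benign", "malignant", "malignant", "malignant"]
pred = knn_predict(features, classes, [0.82, 0.88], k=3)
```

In the reviewed studies such classifiers would operate on features extracted from endoscopic images rather than raw two-dimensional points.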


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.

