Dementias Platform UK (DPUK) Data Portal - World-leading infrastructure facilitating innovative multi-modal research

Author(s):  
Christopher Orton ◽  
John Gallacher ◽  
Ronan Lyons ◽  
David Ford ◽  
Simon Thompson ◽  
...  

Introduction: Modern team science requires effective sharing of data and skills. The DPUK Data Portal is a collection of tools, datasets and networks that allows epidemiologists and specialist researchers alike to access, analyse and investigate cohort data and multiple modalities of routine data across UK and international sources.

Objectives and Approach: The Portal is housed on an instance of the UK Secure eResearch Platform (UKSeRP), which provides customisable infrastructure for multi-modal research (thus far live for genetic, imaging and clinical data) by researchers across the world, using remote access technology whilst allowing governance to remain with the data provider. A central team at Swansea University is responsible for data curation and processing, and operates an access procedure through which researchers apply to use data from multiple sources within a central analysis environment. Other modalities are similarly hosted, with input from partner sites in Cardiff and Oxford.

Results: DPUK facilitates data access and research on 49 cohorts: 40 UK-based and 9 international. The centralised repository model, with remote access and the ability to store and make available different modalities of data (phenotypic, genetic and imaging), has allowed DPUK to support research on varied topics, from studies of cognitive decline and dementia as a disease to the maturation of analytical models. By providing access to data platforms specialising in genetic, imaging and routine clinical data, as well as to specialists in disease and biology to aid understanding, DPUK has realised a large-scale research exercise combining major data modalities on a central platform, and allows access to such rich data across the world under an umbrella of robust governance.

Conclusion/Implications: Globally, cohorts are pooling data, expertise and ambition to enrich their own aims in partnership with a federated research community, enabling in-depth scrutiny of the biological origins of dementia and the development and evaluation of novel approaches to disease prevention and cure.

2021 ◽  
Author(s):  
Enrico Moiso ◽  
Paolo Provero

Alteration of metabolic pathways in cancer has been investigated for many years, beginning well before the discovery of the role of oncogenes and tumor suppressors, and the last few years have witnessed a renewed interest in this topic. Large-scale molecular and clinical data on tens of thousands of samples allow us today to tackle the problem from a general point of view. Here we show that transcriptomic profiles of tumors can be exploited to define metabolic cancer subtypes, which can be systematically investigated for association with other molecular and clinical data. We find thousands of significant associations between metabolic subtypes and molecular features such as somatic mutations, structural variants, epigenetic modifications, protein abundance and activation; and with clinical/phenotypic data including survival probability, tumor grade, and histological types. Our work provides a methodological framework and a rich database of statistical associations, accessible from https://metaminer.unito.it, that will contribute to the understanding of the role of metabolic alterations in cancer and to the development of precision therapeutic strategies.
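The abstract does not spell out the subtyping procedure; the following is a generic, conceptual sketch of the approach it describes (cluster per-sample metabolic pathway activity into subtypes, then test subtype membership for association with a molecular feature), using synthetic stand-in data and hypothetical variable names:

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.cluster import KMeans

# Synthetic stand-ins: per-sample metabolic pathway activity scores
# (samples x pathways) and a boolean somatic-mutation flag per sample.
rng = np.random.default_rng(0)
pathway_scores = rng.normal(size=(500, 20))
mutated = rng.random(500) < 0.3

# Define metabolic subtypes by clustering the pathway profiles.
subtypes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pathway_scores)

# Association test: subtype membership vs. mutation status.
table = np.array([[np.sum((subtypes == k) & mutated),
                   np.sum((subtypes == k) & ~mutated)] for k in range(4)])
chi2, p, _, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.3g}")
```

The actual study may use a different clustering method and pathway scoring; this only illustrates the subtype-then-associate pattern.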


2000 ◽  
Vol 09 (03) ◽  
pp. 293-297 ◽  
Author(s):  
D. BUSKULIC ◽  
L. DEROME ◽  
R. FLAMINIO ◽  
F. MARION ◽  
L. MASSONET ◽  
...  

A new generation of large-scale, complex gravitational wave detectors is being built. They will produce large amounts of data and will require intensive, specialized interactive and batch data analysis. We present VEGA, a framework for such data analysis, based on ROOT. VEGA uses the Frame format defined as a standard by GW groups around the world. Furthermore, new tools have been developed to facilitate data access and manipulation, as well as to interface with existing algorithms. VEGA is currently being evaluated by the VIRGO experiment.
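VEGA itself is a C++ framework built on ROOT; as a rough modern illustration of working with the same Frame (.gwf) format, here is a minimal Python sketch using the gwpy library (a different toolkit than VEGA; the file name and channel are placeholders, not VIRGO specifics):

```python
from gwpy.timeseries import TimeSeries

# Read one channel from a GW Frame (.gwf) file; file and channel
# names here are hypothetical.
strain = TimeSeries.read("example.gwf", "V1:STRAIN")

# A typical interactive analysis step: estimate the amplitude
# spectral density over 4-second FFTs.
asd = strain.asd(fftlength=4)
print(asd)
```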


2018 ◽  
Vol 1 (1) ◽  
pp. 263-274 ◽  
Author(s):  
Marylyn D. Ritchie

Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Alongside these molecular data, dense, rich phenotypic data are also available in comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis, along with complex phenomics, will be discussed. This field is changing and adapting to the novel data types being made available, as well as to technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.


2019 ◽  
Author(s):  
S Bauermeister ◽  
C Orton ◽  
S Thompson ◽  
R A Barker ◽  
J R Bauermeister ◽  
...  

Abstract
The Dementias Platform UK (DPUK) Data Portal is a data repository facilitating access to data on 3,370,929 individuals across 42 cohorts. The Data Portal is an end-to-end data management solution providing a secure, fully auditable, remote-access environment for the analysis of cohort data. All projects utilising the data are by default collaborations with the cohort research teams that generated the data.

The Data Portal uses UK Secure eResearch Platform (UKSeRP) infrastructure to provide three core utilities: data discovery, access, and analysis. These are delivered using a seven-layered architecture comprising data ingestion, data curation, platform interoperability, data discovery, access brokerage, data analysis and knowledge preservation. Automated, streamlined, and standardised procedures reduce the administrative burden for all stakeholders, particularly for requests involving multiple independent datasets, where a single request may be forwarded to multiple data controllers. Researchers are provided with their own secure 'lab' using VMware, accessed with two-factor authentication.

Over the last two years, 160 project proposals involving 579 individual cohort data access requests were received. These came from 268 applicants spanning 72 institutions (56 academic, 13 commercial, 3 government) in 16 countries, with 84 requests involving multiple cohorts. Projects are varied, including multi-modal, machine learning, and Mendelian randomisation analyses. Data access is usually free at the point of use, although a small number of cohorts require a data access fee.
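As a purely illustrative sketch of the access-brokerage fan-out described above (a single project request forwarded to multiple data controllers), using hypothetical names and no DPUK internals:

```python
from dataclasses import dataclass, field

@dataclass
class AccessRequest:
    """Hypothetical model of a single project proposal."""
    project: str
    cohorts: list[str]
    decisions: dict[str, str] = field(default_factory=dict)

def broker(request: AccessRequest, controllers: dict[str, callable]) -> bool:
    """Forward one request to each cohort's data controller and
    grant access only if every controller approves."""
    for cohort in request.cohorts:
        request.decisions[cohort] = controllers[cohort](request.project)
    return all(d == "approved" for d in request.decisions.values())

# Toy usage: two cohorts, each applying its own approval rule.
controllers = {
    "cohort_a": lambda project: "approved",
    "cohort_b": lambda project: "approved" if "dementia" in project else "rejected",
}
req = AccessRequest("dementia-imaging-study", ["cohort_a", "cohort_b"])
print(broker(req, controllers), req.decisions)
```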


2020 ◽  
Author(s):  
Elena Pavlenko ◽  
Daniel Strech ◽  
Holger Langhof

Abstract
Background: The promise of improved health care and health research through data-intensive applications relies on a growing amount of health data. At the core of large-scale data integration efforts, clinical data warehouses (CDW) are also responsible for data governance, managing data access and (re)use. As the complexity of the data flow increases, greater transparency and standardization of criteria and procedures are required in order to maintain objective oversight and control. This study assessed the spectrum of data access and use criteria and procedures in clinical data warehouse governance internationally.

Methods: We performed a systematic review of (a) the published scientific literature on CDW and (b) publicly available information on CDW data access, e.g., data access policies. A qualitative thematic analysis was applied to all included literature and policies.

Results: Twenty-three scientific publications and one policy document were included in the final analysis. The qualitative analysis led to a final set of three main thematic categories: (1) requirements, including recipient requirements, reuse requirements, and formal requirements; (2) structures and processes, including review bodies and review values; and (3) access, including access limitations.

Conclusions: The description of data access and use governance in the scientific literature is characterized by a high level of heterogeneity and ambiguity. In practice, this might limit the effective data sharing needed to fulfil the high expectations of data-intensive approaches in medical research and health care. The lack of publicly available information on access policies conflicts with ethical requirements linked to principles of transparency and accountability. CDW should publicly disclose by whom and under which conditions data can be accessed, and should provide designated governance structures and policies to increase transparency on data access. The results of this review may contribute to the development of practice-oriented minimal standards for the governance of data access, which could in turn strengthen the harmonization, efficiency, and effectiveness of CDW.


2016 ◽  
Author(s):  
Eleanor Williams ◽  
Josh Moore ◽  
Simon W. Li ◽  
Gabriella Rustici ◽  
Aleksandra Tarkowska ◽  
...  

Abstract
Access to primary research data is vital for the advancement of science. To extend the data types supported by community repositories, we built a prototype Image Data Resource (IDR) that collects and integrates imaging data acquired across many different imaging modalities. IDR links high-content screening, super-resolution microscopy, time-lapse and digital pathology imaging experiments to public genetic or chemical databases, and to cell and tissue phenotypes expressed using controlled ontologies. Using this integration, IDR facilitates the analysis of gene networks and reveals functional interactions that are inaccessible to individual studies. To enable re-analysis, we also established a computational resource based on IPython notebooks that allows remote access to the entire IDR. IDR is also an open-source platform that others can use to publish their own image data. Thus, IDR provides both a novel online resource and a software infrastructure that promotes and extends publication and re-analysis of scientific image data.
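The paper's supported route for programmatic re-analysis is the hosted IPython notebook resource; as a rough sketch of querying IDR over HTTP instead, assuming the OMERO-style JSON API exposed by the IDR website (the endpoint layout is an assumption, not taken from the paper):

```python
import requests

# Assumed entry point of the IDR JSON API; not specified in the abstract.
INDEX = "https://idr.openmicroscopy.org/api/"

with requests.Session() as session:
    # The index advertises available API versions with their base URLs.
    base = session.get(INDEX).json()["data"][0]["url:base"]
    # List a few top-level projects (each maps to a published study).
    projects = session.get(base + "m/projects/").json()["data"]
    for project in projects[:5]:
        print(project["@id"], project["Name"])
```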


2020 ◽  
pp. 5-23
Author(s):  
M. V. Ershov

A slowdown in global economic growth was already evident in 2019, and in early 2020 it intensified due to the coronavirus pandemic. The large-scale decline in economic activity caused by the shutdown of many leading economies affected all countries, disrupting established ways of life and business mechanisms. Regulators and governments were forced to urgently implement a wide range of support measures. At the same time, the root causes of the current crisis lie outside the economic and financial spheres, which distinguishes it from previous crises. The uncertainty is compounded by the fact that traditional relations in the economy and in society are fundamentally changing: established global supply chains have been disrupted, the nature of labor relations is beginning to change, and remote working is increasingly practiced. Economies are losing the ability to function as self-sufficient systems and require ever more support measures. As of the end of 2020, the likelihood of a second wave of the pandemic is quite high, while its duration and scale remain unclear; the prospects for further development of the world economy are becoming more and more uncertain, and the scale and mechanisms of stabilization are becoming more diverse and non-standard.


2021 ◽  
Vol 15 ◽  
Author(s):  
Hossein Mohammadian Foroushani ◽  
Rajat Dhar ◽  
Yasheng Chen ◽  
Jenny Gurney ◽  
Ali Hamzehloo ◽  
...  

Stroke is one of the leading causes of death and disability worldwide. Reducing this disease burden through drug discovery and evaluation of stroke patient outcomes requires broader characterization of stroke pathophysiology, yet the underlying biological and genetic factors contributing to outcomes are largely unknown. Remedying this critical knowledge gap requires deeper phenotyping, including large-scale integration of demographic, clinical, genomic, and imaging features. Such big-data approaches will be facilitated by developing and running processing pipelines to extract stroke-related phenotypes at large scale. Millions of stroke patients undergo routine brain imaging each year, capturing a rich set of data on stroke-related injury and outcomes. The Stroke Neuroimaging Phenotype Repository (SNIPR) was developed as a multi-center centralized imaging repository of clinical computed tomography (CT) and magnetic resonance imaging (MRI) scans from stroke patients worldwide, based on the open-source XNAT imaging informatics platform. The aims of this repository are to: (i) store, manage, process, and facilitate sharing of high-value stroke imaging data sets; (ii) implement containerized, automated computational methods to extract image characteristics and disease-specific features from contributed images; (iii) facilitate integration of imaging, genomic, and clinical data to perform large-scale analysis of complications after stroke; and (iv) develop SNIPR as a collaborative platform aimed at both data scientists and clinical investigators. Currently, SNIPR hosts research projects encompassing ischemic and hemorrhagic stroke, with data from 2,246 subjects and 6,149 imaging sessions from Washington University's clinical image archive, as well as contributions from collaborators in different countries, including Finland, Poland, and Spain. Moreover, we have extended the XNAT data model to include relevant clinical features, including subject demographics, stroke severity (NIH Stroke Scale), stroke subtype (using TOAST classification), and outcome (modified Rankin Scale, mRS). Image processing pipelines are deployed on SNIPR using containerized modules, which facilitate replicability at large scale. The first such pipeline identifies axial brain CT scans from DICOM header data and image data using a meta deep-learning scan classifier, registers serial scans to an atlas, segments tissue compartments, and calculates CSF volume. The resulting volume can be used to quantify the progression of cerebral edema after ischemic stroke. SNIPR thus enables the development and validation of pipelines to automatically extract imaging phenotypes and couple them with clinical data, with the overarching aim of enabling a broad understanding of stroke progression and outcomes.
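The scan classifier itself is a deep-learning model, but the header-based portion of that first pipeline step can be sketched with pydicom; this is an illustrative heuristic, not SNIPR code, and the orientation tolerance is an assumption:

```python
import numpy as np
import pydicom

def is_axial_ct(path, tol=0.9):
    """Heuristic header-based check that a DICOM slice is an axial CT image.

    Illustrative only; the actual SNIPR pipeline combines header data
    with an image-based deep-learning classifier.
    """
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    if getattr(ds, "Modality", None) != "CT":
        return False
    # ImageOrientationPatient stores the row and column direction cosines;
    # their cross product is the slice normal, which for axial slices is
    # closely aligned with the patient z-axis.
    iop = np.array(ds.ImageOrientationPatient, dtype=float)
    normal = np.cross(iop[:3], iop[3:])
    return abs(normal[2]) > tol
```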


2020 ◽  
Vol 10 (2) ◽  
pp. 103-106
Author(s):  
ASTEMIR ZHURTOV

Cruel and inhumane acts that harm human life and health, as well as degrade human dignity, are prohibited in most countries of the world, and Russia is no exception. The article presents an analysis of the institution of responsibility for torture in the Russian Federation. The author concludes that the current criminal law of Russia regulates liability for torture only superficially and in a fragmentary manner, and accordingly formulates proposals to define such acts as an independent crime. In the context of modern globalization, the world community pays special attention to the protection of human rights, and to that end large-scale international standards were established long ago. The Universal Declaration of Human Rights and other international acts enshrine prohibitions of cruel and inhumane acts that harm human life and health, as well as degrade human dignity. Drawing on historical experience, these standards focus on the prohibition of any kind of torture, regardless of the purpose for which it is carried out.


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Ariel Rokem ◽  
Kendrick Kay

Abstract
Background: Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates. However, efficient and appropriate selection of α can be challenging. This becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and the correlations across predictors, it is also not straightforwardly interpretable.

Results: The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. In brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and compare across models and datasets.

Conclusion: Fractional ridge regression has several benefits: the solutions obtained for different γ are guaranteed to vary, guarding against wasted calculations, and they automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. These properties make fractional ridge regression particularly suitable for the analysis of large, complex datasets.
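The authors' reference implementation is the fracridge package linked above; as a compact sketch of the core idea, one can compute ridge solutions over an α grid via the SVD, measure γ = ||β(α)|| / ||β_OLS||, and interpolate to the α that achieves a requested fraction:

```python
import numpy as np

def fractional_ridge(X, y, frac, alphas=np.logspace(-6, 6, 200)):
    """Ridge coefficients whose L2-norm is `frac` times the OLS norm.

    A minimal sketch of the reparameterization described above, not the
    fracridge package itself. Assumes X has full column rank and 0 < frac < 1.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    # Ridge shrinks each singular direction by s / (s**2 + alpha), so the
    # coefficient norm is cheap to evaluate for every alpha on the grid.
    norms = np.array([np.linalg.norm((s / (s**2 + a)) * uty) for a in alphas])
    gammas = norms / np.linalg.norm((1.0 / s) * uty)  # shrinks as alpha grows
    # Interpolate to the alpha achieving the requested fraction gamma.
    alpha = np.interp(frac, gammas[::-1], alphas[::-1])
    coef = Vt.T @ ((s / (s**2 + alpha)) * uty)
    return coef, alpha

# Toy usage: coefficients regularized to half the OLS norm.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 10)), rng.normal(size=100)
coef, alpha = fractional_ridge(X, y, frac=0.5)
print(alpha, np.linalg.norm(coef))
```

Because γ decreases monotonically as α grows, a single interpolation over the grid replaces the usual trial-and-error search for α.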

