Learnings from Multi-Source Enduring Linked Data Assets (MELDAS)

Author(s):  
Geoff Neideck

Introduction
Demand continues to grow for accessible, large-scale linked data assets that can answer complex cross-sector and cross-jurisdiction research questions. To meet this demand, a number of Multi-source, Enduring Linked Data Assets (MELDAs) have emerged, including the National Integrated Health Service Infrastructure (NIHSI), the National Disability Data Asset (NDDA) and the Multi-Agency Data Integration Project (MADIP). Using these MELDAs has proven far more efficient than project-specific linkages and provides consistent national data assets for multiple uses. However, developing these assets raises new challenges, including complex data models, governance and access arrangements, and the need for new approaches to analysis.

Objectives and Approach
Through developing the NIHSI Analytical Asset in collaboration with state/territory and federal government partners, the Australian Institute of Health and Welfare (AIHW) has identified challenges in traditional linkage approaches that call for innovative methods to ensure high-quality linkage. As the AIHW commences scoping of new MELDAs, we are taking lessons from building the NIHSI and applying them to future designs.

Results
The AIHW's development of MELDAs across jurisdictions and portfolios provides new learnings on addressing advanced real-world data integration issues. This review focuses on lessons learnt at the AIHW in working with new data sharing arrangements, applying new technologies, and streamlining MELDA processes through innovative approaches.

Conclusion / Implications
The learnings from the AIHW's development of MELDAs will assist others developing enduring assets to establish effective sharing arrangements, governance and technical solutions for efficient management. These learnings will save time and resources, and should prompt further discussion on a gold standard for building MELDAs going forward.
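The abstract does not spell out the linkage machinery, but the "traditional linkage approaches" it refers to are typically Fellegi-Sunter-style probabilistic matching. Below is a minimal sketch of that core idea in Python; the field names, m/u probabilities and example records are illustrative assumptions, not taken from the NIHSI design.

import math

# m = P(field agrees | records truly match), u = P(field agrees | non-match).
# These values are illustrative assumptions only.
FIELD_WEIGHTS = {
    "surname":    (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "postcode":   (0.90, 0.05),
}

def match_weight(rec_a, rec_b):
    """Sum of log-likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_WEIGHTS.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total

a = {"surname": "NGUYEN", "birth_date": "1980-04-02", "postcode": "2600"}
b = {"surname": "NGUYEN", "birth_date": "1980-04-02", "postcode": "2612"}
print(match_weight(a, b))  # above a chosen threshold => declare a link

Production linkage at MELDA scale layers blocking, string-similarity comparators and clerical review around this core score, but the weight calculation is the common starting point.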

Author(s):  
Bindi Kindermann ◽  
Sarah Hinde ◽  
Michael Abbondante

Objectives
The Australian Government's new public sector data management agenda is initiating a national system for integrating public data and opening up access for policy makers and researchers. The Multi-agency Data Integration Project ('the project') is central to achieving these goals: it brings together nationally significant population datasets with the aim of streamlining the safe sharing of integrated government data. The project gives policy makers and researchers safe access to linked, longitudinal information about the delivery of the Australian tax and transfer system and health services, along with rich demographic information. It has been an essential step towards enabling the Australian Government and the research community to develop evidence-based policy and target services effectively within a tight fiscal environment, and it has prompted government agencies to find new, more streamlined ways to collaborate in sharing and making best use of public data.

Approach
The first step of the project was to link a 2011 snapshot of four national administrative datasets with the 2011 Census. A cross-agency team of data analysts from five government agencies collaborated to evaluate the datasets and test whether the linked data could answer policy questions. The linkage work included experimentation with different linking methodologies, linking strategies and information models for structuring the linkage. The evaluation tested whether the linked data was representative of key population groups of interest and explored the validity of the content variables for measuring outcomes of interest.

Results
High linkage rates (between 80% and 95%) were achieved for the two-way linkages, and many population groups of interest were well represented. The work confirmed the value of the linkage for answering policy questions that had been difficult to address using existing approaches. The project also developed ways of describing linkage quality to policy users and approaches to addressing linkage bias for different policy uses.

Conclusion
Public sector data held by government has the power to improve life-course outcomes for Australian people, households and businesses. The project has generated confidence and support for the continued development of a central, streamlined integrated data system, along with valuable insights about governance and how to scale up the linkage and dissemination system to support additional datasets and longitudinal data. This will maximise the value and utility of public data for policy and research, in order to achieve a better understanding of, and deliver better outcomes for, the Australian community.
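As a rough illustration of the representativeness check described above, one could compute linkage rates per population group and flag groups falling well below the overall rate. A minimal sketch; the group labels and data are hypothetical.

from collections import defaultdict

def linkage_rates(records):
    """records: iterable of (population_group, was_linked) pairs."""
    totals = defaultdict(int)
    linked = defaultdict(int)
    for group, was_linked in records:
        totals[group] += 1
        linked[group] += int(was_linked)
    return {g: linked[g] / totals[g] for g in totals}

sample = [("urban", True), ("urban", True), ("urban", False),
          ("remote", True), ("remote", False), ("remote", False)]
for group, rate in sorted(linkage_rates(sample).items()):
    print(f"{group}: {rate:.0%}")  # groups far below the 80-95% range suggest linkage bias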


GigaScience ◽  
2021 ◽  
Vol 10 (9) ◽  
Author(s):  
Jaclyn Smith ◽  
Yao Shi ◽  
Michael Benedikt ◽  
Milos Nikolic

Background
Targeted diagnosis and treatment options depend on insights drawn from multi-modal analysis of large-scale biomedical datasets. Advances in genomics sequencing, image processing, and medical data management have supported data collection and management within medical institutions. These efforts have produced large-scale datasets and have enabled integrative analyses that provide a more thorough view of a disease's impact on the underlying system. Integrating large-scale biomedical data commonly involves several complex transformation steps, such as combining datasets to build feature vectors for learning analysis. Scalable data integration solutions therefore play a key role in the future of targeted medicine. Though large-scale data processing frameworks have shown promising performance for many domains, they fail to support scalable processing of complex datatypes.

Solution
To address these issues and achieve scalable processing of multi-modal biomedical data, we present TraNCE, a framework that automates the difficult parts of designing distributed analyses over complex biomedical data types.

Performance
We outline research and clinical applications for the platform, including data integration support for building feature sets for classification. We show that the system outperforms the common alternative, based on "flattening" complex data structures, and runs efficiently when alternative approaches are unable to perform at all.
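A toy Python sketch (not the TraNCE API) of the "flattening" alternative the abstract benchmarks against: nested patient records are exploded into flat rows and then regrouped into per-patient feature vectors. The record shape and field names are assumptions for illustration; TraNCE's contribution is compiling queries over the nested form directly.

# Nested, multi-modal records: one patient holds a collection of variants.
patients = [
    {"id": "p1", "variants": [{"gene": "TP53", "impact": 0.9},
                              {"gene": "BRCA1", "impact": 0.4}]},
    {"id": "p2", "variants": [{"gene": "TP53", "impact": 0.2}]},
]

# Flattening: one row per (patient, variant) pair...
flat = [(p["id"], v["gene"], v["impact"])
        for p in patients for v in p["variants"]]

# ...then regrouping into per-patient feature vectors (max impact per gene).
features = {}
for pid, gene, impact in flat:
    features.setdefault(pid, {}).setdefault(gene, 0.0)
    features[pid][gene] = max(features[pid][gene], impact)

print(features)  # {'p1': {'TP53': 0.9, 'BRCA1': 0.4}, 'p2': {'TP53': 0.2}}

At scale, the flattened intermediate can blow up in size, which is the cost the nested-processing approach aims to avoid.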


2021 ◽  
pp. 1-4
Author(s):  
Michalis Mountantonakis

Michalis Mountantonakis is a Postdoctoral Researcher in the Information Systems Laboratory at FORTH-ICS (Greece) and a Visiting Lecturer in the Computer Science Department at the University of Crete (CSD-UoC), Greece. He obtained his PhD degree from the CSD-UoC in 2020. His research interests fall in the areas of large-scale semantic data integration, linked data and semantic data management, and the results of his research have been published in more than 20 research papers. For his dissertation he was awarded a) the prestigious SWSA Distinguished Dissertation Award 2020, given to the PhD dissertation from the previous year with the highest originality, significance and impact in the area of the semantic web, and b) the Maria Michael Manasaki Legacy's fellowship, awarded once a year to the best graduate student of the CSD-UoC. In his dissertation, supervised by Associate Professor Yannis Tzitzikas (Computer Science Department, University of Crete), he tackled the highly challenging problem of Linked Data integration at large scale. He factorized the integration process along several dimensions, to better understand the overall problem and identify the open challenges, and proposed novel indexes and algorithms providing core services that can be exploited for several data integration tasks: finding all the URIs and all the available information for an entity, producing connectivity analytics, discovering the most relevant datasets for a given task, dataset enrichment, and many others.
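A deliberately simplified illustration of the core idea behind such services: an inverted index from each URI to the datasets mentioning it supports both entity lookup and pairwise connectivity analytics. The dataset names and URIs below are invented, and this is not the actual index or algorithm from the dissertation.

from collections import defaultdict
from itertools import combinations

# Hypothetical datasets, each a set of entity URIs it describes.
datasets = {
    "dbpedia":  {"ex:Crete", "ex:Heraklion", "ex:Greece"},
    "wikidata": {"ex:Crete", "ex:Greece", "ex:Athens"},
    "geonames": {"ex:Crete", "ex:Heraklion"},
}

# Inverted index: URI -> datasets that mention it.
uri_index = defaultdict(set)
for name, uris in datasets.items():
    for uri in uris:
        uri_index[uri].add(name)

# Service 1: find every source describing an entity.
print(sorted(uri_index["ex:Crete"]))  # ['dbpedia', 'geonames', 'wikidata']

# Service 2: connectivity analytics (shared URIs per dataset pair).
for a, b in combinations(datasets, 2):
    print(a, b, len(datasets[a] & datasets[b]))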


2013 ◽  
Vol 22 (5) ◽  
pp. 665-687 ◽  
Author(s):  
Gianluca Demartini ◽  
Djellel Eddine Difallah ◽  
Philippe Cudré-Mauroux


1996 ◽  
Vol 33 (9) ◽  
pp. 237-244 ◽  
Author(s):  
Ghassan Chebbo ◽  
Dominique Laplace ◽  
André Bachoc ◽  
Yves Sanchez ◽  
Benoit Le Guennec

Solids in combined sewer networks raise two important technical questions: the clogging of man-entry sewers, and pollution in urban wet weather discharges, whose main vectors are generally suspended solids. In this paper, we first present curative technical solutions that avoid or remove deposits in man-entry sewers: partial extraction of the largest solids, selective trapping of the bed-load solids that form deposits, and displacement of deposits using dry weather flow flushing waves. We then examine technical solutions for controlling pollution in urban wet weather discharges, showing that decantation is an efficient means of fighting this pollution. It is not always feasible, however, because it involves large-scale investments. Complementary methods should therefore be developed and used at different points along the water's passage through an urban drainage area.
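For a rough feel of why decantation removes suspended solids, Stokes' law gives the laminar settling velocity of a small particle; the particle density and sizes below are textbook assumptions, not values from this paper.

def stokes_settling_velocity(d_m, rho_p=2650.0, rho_f=1000.0,
                             mu=1.0e-3, g=9.81):
    """Terminal settling velocity (m/s) of a sphere of diameter d_m (m).

    Valid at low Reynolds number; coarser grains settle slower than
    this formula predicts.
    """
    return g * (rho_p - rho_f) * d_m ** 2 / (18.0 * mu)

for d_um in (10, 50, 200):  # silt to fine sand
    v = stokes_settling_velocity(d_um * 1e-6)
    print(f"{d_um:>4} um -> {v * 3600:.2f} m/h")

The strong dependence on diameter (velocity grows with d squared) is why the coarsest solids settle readily in a decantation basin while the finest fractions need complementary treatment.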


Author(s):  
Charles Miller ◽  
Lucas Lecheler ◽  
Bradford Hosack ◽  
Aaron Doering ◽  
Simon Hooper

Information visualization involves the visual, and sometimes interactive, presentation and organization of complex data in clear, compelling representations. It is an essential element of daily life, especially for people in data-driven professions such as online educators. Although information visualization research and methods are prevalent in fields as diverse as healthcare, statistics, economics, information technology, computer science, and politics, few examples of successful information visualization design or integration exist in online learning. The authors provide a background on information visualization in education, explore a set of potential roles for information visualization in the future design and integration of online learning environments, provide examples of contemporary interactive visualizations in education, and discuss opportunities to move forward with design and research in this emerging area.


2021 ◽  
Vol 22 (5) ◽  
pp. 2659
Author(s):  
Gianluca Costamagna ◽  
Giacomo Pietro Comi ◽  
Stefania Corti

In the last decade, different research groups in the academic setting have developed induced pluripotent stem cell-based protocols to generate three-dimensional, multicellular, neural organoids. Their use to model brain biology, early neural development, and human diseases has provided new insights into the pathophysiology of neuropsychiatric and neurological disorders, including microcephaly, autism, Parkinson’s disease, and Alzheimer’s disease. However, the adoption of organoid technology for large-scale drug screening in the industry has been hampered by challenges with reproducibility, scalability, and translatability to human disease. Potential technical solutions to expand their use in drug discovery pipelines include Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) to create isogenic models, single-cell RNA sequencing to characterize the model at a cellular level, and machine learning to analyze complex data sets. In addition, high-content imaging, automated liquid handling, and standardized assays represent other valuable tools toward this goal. Though several open issues still hamper the full implementation of the organoid technology outside academia, rapid progress in this field will help to prompt its translation toward large-scale drug screening for neurological disorders.


2021 ◽  
Vol 15 (5) ◽  
pp. 1-52
Author(s):  
Lorenzo De Stefani ◽  
Erisa Terolli ◽  
Eli Upfal

We introduce Tiered Sampling, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses a memory of fixed size M, which can be orders of magnitude smaller than the number of edges. Our methods address the challenging task of counting sparse motifs (sub-graph patterns) that have a low probability of appearing in a sample of M edges, the maximum amount of data available to the algorithms at each step. To obtain an unbiased, low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base tier is a standard reservoir sample of edges, the other tiers are reservoir samples of sub-structures of the desired motif. By storing the motif's more frequent sub-structures, we increase the probability of detecting an occurrence of the sparse motif being counted, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method that generalizes Tiered Sampling to obtain high-quality estimates of the number of occurrences of any sub-graph of interest, while reducing the analysis effort through specific properties of the pattern of interest. We present a complete analytical analysis and an extensive experimental evaluation of the proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations of the number of 4- and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single-edge-sample approach for counting sparse motifs in large-scale graphs.
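A structural sketch of the two-tier idea, specialized to triangles for brevity: the paper treats 4- and 5-cliques, and its estimator's probability bookkeeping is more careful than the simplified inverse-probability weight used here (which ignores wedge-reservoir survival, so it is exact only when the reservoirs do not overflow).

import random

class TieredTriangleSketch:
    """Tier 1: reservoir of edges. Tier 2: reservoir of wedges (open
    two-edge paths). An arriving edge that closes a stored wedge adds
    an inverse-probability weight to the running triangle estimate."""

    def __init__(self, edge_cap, wedge_cap):
        self.edges, self.edge_cap = [], edge_cap
        self.wedges, self.wedge_cap = [], wedge_cap
        self.t = 0          # edges seen so far
        self.w = 0          # wedges seen so far
        self.estimate = 0.0

    def _reservoir(self, pool, cap, seen, item):
        # Standard reservoir sampling (algorithm R).
        if len(pool) < cap:
            pool.append(item)
        elif random.random() < cap / seen:
            pool[random.randrange(cap)] = item

    def add_edge(self, u, v):
        self.t += 1
        # Probability that any given earlier edge is in tier 1 right now.
        p_edge = min(1.0, self.edge_cap / max(self.t - 1, 1))
        # 1) Does (u, v) close a stored wedge into a triangle?
        for a, b, p_wedge in self.wedges:
            if {a, b} == {u, v}:
                self.estimate += 1.0 / p_wedge
        # 2) Form new wedges with tier-1 edges sharing one endpoint.
        for x, y in self.edges:
            shared = {x, y} & {u, v}
            if len(shared) == 1:
                a, b = sorted(({x, y} | {u, v}) - shared)  # open endpoints
                self.w += 1
                self._reservoir(self.wedges, self.wedge_cap,
                                self.w, (a, b, p_edge))
        # 3) Offer (u, v) to tier 1.
        self._reservoir(self.edges, self.edge_cap, self.t, (u, v))

random.seed(0)
sk = TieredTriangleSketch(edge_cap=200, wedge_cap=200)
for e in [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]:  # two triangles
    sk.add_edge(*e)
print(sk.estimate)  # 2.0: exact here because no reservoir overflows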

