Big Data and Biomedical Informatics: A Challenging Opportunity

2014 ◽  
Vol 23 (01) ◽  
pp. 08-13 ◽  
Author(s):  
Riccardo Bellazzi

Summary: Big data are receiving increasing attention in biomedicine and healthcare. It is therefore important to understand why big data are assuming a crucial role for the biomedical informatics community. The capability of handling big data is becoming an enabler of unprecedented research studies and of new models of healthcare delivery. It is therefore first necessary to understand the four elements that constitute big data, namely Volume, Variety, Velocity, and Veracity, and their meaning in practice. It is then necessary to understand where big data are present and where they can be beneficially collected. Research fields such as translational bioinformatics need big data technologies to withstand the shock wave of data generated every day. Other areas, ranging from epidemiology to clinical care, can benefit from exploiting the large amounts of data now available, from personal monitoring to primary care. However, building big data-enabled systems carries significant implications for the reproducibility of research studies and the management of privacy and data access, and proper actions should be taken to address these issues. An interesting consequence of the big data scenario is the availability of new software, methods, and tools, such as map-reduce, cloud computing, and concept-drift machine learning algorithms, which will not only contribute to big data research but may also benefit many biomedical informatics applications. The way forward with the big data opportunity will require properly applied engineering principles to design studies and applications, to avoid preconceptions and over-enthusiasm, to fully exploit the available technologies, and to improve data processing and data management regulations.
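The abstract lists map-reduce among the enabling tools. As a rough illustration of that pattern only, the sketch below counts diagnosis codes across batches of hypothetical patient records in plain Python; the field names and data are invented, and no Hadoop cluster or real map-reduce framework is assumed.

```python
# Minimal sketch of the map-reduce pattern in plain Python: each batch is
# "mapped" to a partial count, and the partial counts are "reduced" by merging.
# The records and field names below are hypothetical, for illustration only.
from collections import Counter
from functools import reduce

def map_phase(record_batch):
    """Map: emit a partial count of diagnosis codes for one batch of records."""
    return Counter(rec["dx_code"] for rec in record_batch)

def reduce_phase(partial_a, partial_b):
    """Reduce: merge two partial counts into one."""
    return partial_a + partial_b

batches = [
    [{"dx_code": "E11"}, {"dx_code": "I10"}],
    [{"dx_code": "I10"}, {"dx_code": "I10"}, {"dx_code": "J45"}],
]

partials = [map_phase(b) for b in batches]          # map step (parallelizable)
totals = reduce(reduce_phase, partials, Counter())  # reduce step
print(totals)  # Counter({'I10': 3, 'E11': 1, 'J45': 1})
```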

2017 ◽  
Vol 26 (01) ◽  
pp. 96-102 ◽  
Author(s):  
S. Murphy ◽  
V. Castro ◽  
K. Mandl

Summary Objectives: Although patients may have a wealth of imaging, genomic, monitoring, and personal device data, these data have yet to be fully integrated into clinical care. Methods: We identify three reasons for the lack of integration. The first is that “Big Data” is poorly managed by most Electronic Medical Record systems (EMRs). The data mostly reside on “cloud-native” platforms outside the scope of most EMRs, and even checking whether such data exist for a patient often must be done outside the EMR. The second reason is that extracting clinically relevant features from Big Data often requires complex machine learning algorithms, such as determining whether a genomic variant is protein-altering. The third reason is that applications that present Big Data need to be modified constantly to reflect the current state of knowledge, such as indicating when to order a new set of genomic tests; in some cases, applications need to be updated nightly. Results: A new EMR architecture is evolving that could unite Big Data, machine learning, and clinical care through microservices hosting applications focused on quite specific aspects of clinical care, such as managing cancer immunotherapy. Conclusion: Informatics innovation, medical research, and clinical care go hand in hand as we look to infuse science-based practice into healthcare. Innovative methods will lead to a new ecosystem of applications (apps) interacting with healthcare providers to fulfill a promise that is still to be determined.
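As a hedged sketch of the microservice-based architecture the authors describe (not their implementation), the snippet below exposes a minimal endpoint that an EMR-hosted app could query to check whether cloud-hosted genomic data exist for a patient; the route, patient identifiers, and in-memory index are all hypothetical.

```python
# Minimal illustrative microservice: reports whether genomic data are on file
# for a given patient. The endpoint, IDs, and index below are hypothetical.
from fastapi import FastAPI

app = FastAPI()

# Stand-in for an index of cloud-hosted genomic results, keyed by patient ID.
GENOMIC_INDEX = {
    "patient-001": {"panels": ["tumor-ngs-v2"], "last_updated": "2017-03-01"},
}

@app.get("/patients/{patient_id}/genomics")
def genomics_available(patient_id: str):
    """Return availability and basic metadata of genomic data for a patient."""
    entry = GENOMIC_INDEX.get(patient_id)
    return {"patient_id": patient_id, "available": entry is not None, "detail": entry}

# Run (for example) with: uvicorn this_module:app --reload
```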


2021 ◽  
Author(s):  
Sarah Gregory ◽  
Lewis Killin ◽  
Hannah Pullen ◽  
Clare Dolan ◽  
Matthew Hunter ◽  
...  

Background: Harnessing the power of big data has unexplored potential in the field of dementia and brain health research. However, as interest in big data increases, it is important to learn what the public understands about the use of their routinely collected healthcare data for research purposes, and their attitudes to such use. Participants’ data are increasingly collected in studies with open data access processes in place, and through informed consent participants indicate their willingness to share their data in this way. There remains an inherent flaw in research studies whereby participants may not reflect the population at large, introducing sampling bias; access to medical records allows research studies to include a wholly representative sample. Objective: This study aimed to explore attitudes held by members of the public on the use of their healthcare data for dementia research. Methods: Data were collected in a series of focus groups with semi-structured discussions. Transcripts from the focus groups were analysed using thematic analysis. Results: Participants reported a willingness for their anonymised healthcare data to be accessed and used for research purposes, with some caveats for identifiable or highly sensitive data. Participants were more comfortable with trusted organisations, such as the UK’s National Health Service and universities, accessing their data than with pharmaceutical companies doing so. Clear and transparent communication, both about the use of healthcare data in research studies and about study results, was important to participants. There was a general misunderstanding of what healthcare data include and how researchers use them. Conclusions: Overall, our findings underline the importance of clear communication in building public trust and understanding of how healthcare data can be used to support high-quality dementia and brain health focussed research.


2021 ◽  
Author(s):  
Karyna Rodriguez ◽  
Neil Hodgson

Seismic data has been and continues to be the main tool for hydrocarbon exploration. Storing very large quantities of seismic data, and making it easily accessible with machine learning functionality, is the way forward to gain regional and local understanding of petroleum systems. Seismic data has been made available as a streamed service through a web-based platform, allowing on-the-spot access to large datasets stored in the cloud. A data lake can be defined as transformed data used for tasks such as reporting, visualization, advanced analytics, and machine learning. The global library of data has been deconstructed from the rigid flat-file format traditionally associated with seismic data and transformed into a distributed, scalable, big data store. This allows rapid access, complex queries, and efficient use of computing power, fundamental criteria for enabling Big Data technologies such as deep learning.

This data lake concept is already changing the way we access seismic data, enhancing the efficiency of gaining insights into any hydrocarbon basin. Examples include the identification of potentially prolific mixed turbidite/contourite systems in the Trujillo Basin offshore Peru, together with important implications of BSR-derived geothermal gradients, which are much higher than expected in a fore-arc setting, opening new exploration opportunities. Another example is the de-risking and ranking of offshore Malvinas Basin blocks by gaining new insights into areas until very recently considered non-prospective. Further de-risking was achieved by carrying out an in-depth source rock analysis in the Malvinas and conjugate southern South Africa Basins. Additionally, the data lake enabled the development of machine learning algorithms for channel recognition, which were successfully applied to data offshore Australia and Norway.

“On demand” regional seismic dataset access is proving invaluable in our efforts to make hydrocarbon exploration more efficient and successful. Machine learning algorithms are helping to automate the more mechanical tasks, leaving time for the more valuable task of analysing the results. The geological insights gained by combining these two aspects confirm the value of seismic data lakes.
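As an illustrative sketch only (not the streamed platform’s own API), the snippet below shows the kind of ranged read against cloud object storage that makes “on demand” access practical: only the requested slice of a large seismic file is transferred. The bucket, key, and offsets are hypothetical.

```python
# Illustrative only: fetch a byte range of a large seismic file from cloud
# object storage instead of downloading the whole flat file.
import boto3

s3 = boto3.client("s3")

def read_trace_block(bucket: str, key: str, offset: int, length: int) -> bytes:
    """Ranged GET: pull only the requested slice of an object."""
    resp = s3.get_object(
        Bucket=bucket,
        Key=key,
        Range=f"bytes={offset}-{offset + length - 1}",
    )
    return resp["Body"].read()

# Hypothetical usage: first 4 KB of a SEG-Y volume stored in a cloud bucket.
# block = read_trace_block("seismic-data-lake", "malvinas/line_042.sgy", 0, 4096)
```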


10.2196/18044 ◽  
2020 ◽  
Vol 22 (8) ◽  
pp. e18044
Author(s):  
Eli M Cahan ◽  
Purvesh Khatri

Up to 95% of novel interventions demonstrating significant effects at the bench fail to translate to the bedside. In recent years, the windfalls of “big data” have afforded investigators more substrate for research than ever before. However, issues with translation have persisted: although countless biomarkers for diagnostic and therapeutic targeting have been proposed, few generalize effectively. We assert that inadequate heterogeneity in the datasets used for discovery and validation leaves them unrepresentative of the diversity observed in real-world patient populations. We contrast this nonrepresentativeness with the advantages of deliberately soliciting and using heterogeneous data for multisystemic disease modeling. Accordingly, we describe the potential of models premised on heterogeneity to advance the Institute for Healthcare Improvement’s Triple Aim. In an era of personalized medicine, these models can confer higher-quality clinical care for individuals, increased access to effective care across all populations, and lower costs for the health care system.
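One hedged way to make the heterogeneity argument concrete is leave-one-cohort-out validation, in which a biomarker model is always evaluated on a source dataset it never saw during training. The sketch below uses synthetic data and scikit-learn and illustrates the general idea, not the authors’ method.

```python
# Leave-one-cohort-out validation on synthetic data: each score comes from a
# cohort (source dataset) that was entirely held out during training.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 5))                                # 5 synthetic "biomarkers"
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)   # synthetic outcome
cohort = rng.integers(0, 3, size=n)                        # 3 source cohorts

logo = LeaveOneGroupOut()
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, groups=cohort, cv=logo
)
print(scores)  # one accuracy score per held-out cohort
```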


2016 ◽  
Vol 25 (01) ◽  
pp. 211-218 ◽  
Author(s):  
M.G. Kahn ◽  
C. Weng

Summary Objectives: To reflect on the notable events and significant developments in Clinical Research Informatics (CRI) in 2015 and to discuss near-term trends affecting CRI. Methods: We selected key publications that highlight not only important recent advances in CRI but also notable events likely to have a significant impact on CRI activities over the next few years or longer, and we consulted the discussions in relevant scientific communities and an online living textbook for modern clinical trials. We also related new concepts to long-standing problems to improve the continuity of CRI research. Results: The highlights in CRI in 2015 include the growing adoption of electronic health records (EHRs) and the rapid development of regional, national, and global clinical data research networks that use EHR data to integrate scalable clinical research with clinical care and to generate robust medical evidence. Data quality, integration, and fusion; data access by researchers; study transparency; results reproducibility; and infrastructure sustainability remain persistent challenges. Conclusion: Advances in Big Data analytics and Internet technologies, together with the engagement of citizens in science, are shaping the global clinical research enterprise, which is becoming more open and increasingly stakeholder-centered, where stakeholders include patients, clinicians, researchers, and sponsors.


Author(s):  
Xabier Rodríguez-Martínez ◽  
Enrique Pascual-San-José ◽  
Mariano Campoy-Quiles

This review article presents the state of the art in high-throughput computational and experimental screening routines with applications in organic solar cells, including materials discovery, device optimization, and machine-learning algorithms.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Mahdi Torabzadehkashi ◽  
Siavash Rezaei ◽  
Ali HeydariGorji ◽  
Hosein Bobarshad ◽  
Vladimir Alves ◽  
...  

Abstract: In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data originally reside in storage systems; to process them, application servers must fetch them from storage devices, which imposes the cost of moving data across the system. This cost grows with the distance between the processing engines and the data, and it is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundary by deploying embedded processing engines inside storage devices. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment for in-place data processing. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications, so a vast spectrum of applications can be ported to run on Catalina CSDs. Thanks to these unique features, to the best of our knowledge, Catalina is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in place, without any modification to the underlying distributed processing framework. As a proof of concept, we built a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks and to investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption when running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms improve by up to 5.4× and 8.9×, respectively.
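For readers unfamiliar with the DFT kernel behind the reported Neon SIMD gains, the sketch below (not Catalina’s firmware) shows the dense multiply-accumulate structure that such engines accelerate, checked against NumPy’s FFT on a small random signal.

```python
# Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N). The matrix-vector
# multiply-accumulate structure is what SIMD engines are good at.
import numpy as np

def naive_dft(x: np.ndarray) -> np.ndarray:
    """Compute the DFT of a 1-D signal by direct summation."""
    n = np.arange(x.size)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / x.size) @ x

x = np.random.default_rng(1).normal(size=64)
assert np.allclose(naive_dft(x), np.fft.fft(x))  # sanity check against FFT
```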


2015 ◽  
Vol 2015 ◽  
pp. 1-16 ◽  
Author(s):  
Ashwin Belle ◽  
Raghuram Thiagarajan ◽  
S. M. Reza Soroushmehr ◽  
Fatemeh Navidi ◽  
Daniel A. Beard ◽  
...  

The rapidly expanding field of big data analytics has started to play a pivotal role in the evolution of healthcare practices and research. It has provided tools to accumulate, manage, analyze, and assimilate large volumes of disparate, structured, and unstructured data produced by current healthcare systems. Big data analytics has recently been applied to aiding the process of care delivery and disease exploration. However, the adoption rate and research development in this space are still hindered by some fundamental problems inherent in the big data paradigm. In this paper, we discuss some of these major challenges with a focus on three upcoming and promising areas of medical research: image-, signal-, and genomics-based analytics. Recent research that targets the utilization of large volumes of medical data while combining multimodal data from disparate sources is discussed. Potential areas of research within this field that could provide meaningful impact on healthcare delivery are also examined.

