Big Data for disease prevention and precision medicine ◽  
2016 ◽  
Author(s):  
Marco Moscatelli ◽  
Matteo Gnocchi ◽  
Andrea Manconi ◽  
Luciano Milanesi

Motivation: Advances in technology have resulted in a huge amount of data in both biomedical research and healthcare systems. This growing amount of data gives rise to the need for new research methods and analysis techniques. Analysis of these data offers new opportunities to define novel diagnostic processes, so a closer integration of healthcare and biomedical data is essential to devise novel predictive models for biomedical diagnosis. In this context, the digitalization of clinical exams and medical records is becoming essential for collecting heterogeneous information. Analysing these data by means of big data technologies will allow a more in-depth understanding of the mechanisms leading to disease and, at the same time, will facilitate the development of novel diagnostics and personalized therapeutics. The recent application of big data technologies in the medical field offers new opportunities to integrate enormous amounts of medical and clinical information from population studies. It is therefore essential to devise new strategies for storing and accessing the data in a standardized way, and to provide suitable methods for managing these heterogeneous data.

Methods: In this work, we present a new information technology infrastructure devised to efficiently manage huge amounts of heterogeneous data for disease prevention and precision medicine. A test set based on data produced by a clinical and diagnostic laboratory was built to set up the infrastructure. When working with clinical data, it is essential to ensure the confidentiality of sensitive patient information; therefore, the set-up phase was carried out using anonymized data. To this end, specific techniques were adopted to ensure a high level of privacy when correlating the medical records with important secondary information (e.g., date of birth, place of residence). The rigidity of relational databases does not lend itself to the nature of these data; in our opinion, better results can be obtained with non-relational (NoSQL) databases. Starting from these considerations, the infrastructure was developed on a NoSQL database with the aim of combining scalability and flexibility. In particular, MongoDB [1] was used because it is well suited to managing different types of data at large scale. In this way, the infrastructure provides optimized management of huge amounts of heterogeneous data while ensuring high analysis speed.

Results: The presented infrastructure exploits big data technologies to overcome the limitations of relational databases when working with large and heterogeneous data. The infrastructure implements a set of interface procedures that prepare the metadata for importing the data into the NoSQL database.
Abstract truncated at 3,000 characters; the full version is available in the PDF file.
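
As an illustration of the kind of document-oriented storage described above, the following minimal Python sketch (not taken from the paper; the database, collection and field names are hypothetical) inserts an anonymized, schema-free clinical record into MongoDB with pymongo, coarsening the quasi-identifiers mentioned in the abstract (date of birth, place of residence):

# Minimal sketch, not the authors' implementation: anonymized clinical
# records loaded into MongoDB. Database, collection and field names are
# hypothetical.
import hashlib
from datetime import date

from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["precision_medicine"]["clinical_records"]

def anonymize(record: dict) -> dict:
    """Replace the direct identifier with a salted hash and coarsen
    quasi-identifiers (date of birth -> birth year, residence -> area)."""
    raw_id = record.pop("patient_id")
    record["pseudo_id"] = hashlib.sha256(("local-salt:" + raw_id).encode()).hexdigest()
    dob = record.pop("date_of_birth", None)
    record["birth_year"] = dob.year if dob else None
    record["residence_area"] = record.pop("place_of_residence", "")[:3]
    return record

# Heterogeneous exams are stored as free-form sub-documents: no fixed schema.
raw_record = {
    "patient_id": "P-000123",
    "date_of_birth": date(1970, 5, 1),
    "place_of_residence": "Milano",
    "exam": {"type": "blood_panel", "hb_g_dl": 13.9, "glucose_mg_dl": 92},
}
collection.insert_one(anonymize(raw_record))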


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255562
Author(s):  
Eman Khashan ◽  
Ali Eldesouky ◽  
Sally Elghamrawy

The growing popularity of big data analysis and cloud computing has created new big data management standards. Programmers may have to interact with a number of heterogeneous data stores, both SQL and NoSQL, depending on the information they are responsible for. Interacting with heterogeneous data models via numerous APIs and query languages imposes challenging tasks on developers of multi-store applications. Indeed, complex queries over such data structures cannot currently be performed in a declarative manner when the data are held in single-data-store applications, and therefore require additional development effort. Many models have been presented to address complex queries via multistore applications. Some of these models implement a complex, unified and fast model, while others are not efficient enough to handle this type of complex database query. This paper provides CQNS, an automated, fast and easy-to-use unified architecture for solving simple and complex SQL and NoSQL queries over heterogeneous data stores. The proposed framework can be used in cloud environments or in any big data application to automatically help developers manage basic and complicated database queries. CQNS consists of three layers: a matching selector layer, a processing layer, and a query execution layer. The matching selector layer is the heart of the architecture, in which five user queries are examined to check whether they match another five queries stored for a single engine in the architecture library; a proposed algorithm then directs each query to the right SQL or NoSQL database engine. Furthermore, CQNS deals with many NoSQL databases, such as MongoDB, Cassandra, Riak, CouchDB, and Neo4j. The paper presents a Spark framework that can handle both SQL and NoSQL databases. Benchmark datasets from four scenarios are used to evaluate the proposed CQNS for querying different NoSQL databases in terms of optimization performance and query execution time. The results show that CQNS achieves the best latency and throughput, in less time, among the compared systems.
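
Purely as an illustration of the routing idea behind a matching-selector layer (this is not the CQNS implementation; the engine choices, connection details and matching rule below are hypothetical), a unified entry point in Python might dispatch each query to the engine that can answer it:

# Hypothetical sketch of a unified query entry point that routes a request
# either to a relational engine (sqlite3 here) or to MongoDB.
import re
import sqlite3

from pymongo import MongoClient

sql_conn = sqlite3.connect("warehouse.db")        # relational store
mongo = MongoClient("mongodb://localhost:27017")  # document store
sql_conn.execute("CREATE TABLE IF NOT EXISTS patients (name TEXT, age INTEGER)")

def execute_unified(query):
    """Send SQL text to the relational engine, anything else to MongoDB."""
    if isinstance(query, str) and re.match(r"\s*select\b", query, re.IGNORECASE):
        return sql_conn.execute(query).fetchall()
    collection, mongo_filter = query              # e.g. ("exams", {...})
    return list(mongo["app"][collection].find(mongo_filter))

# The caller does not need to know which engine holds the data.
rows = execute_unified("SELECT name FROM patients WHERE age > 60")
docs = execute_unified(("exams", {"type": "blood_panel"}))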


Author(s):  
Muhammad Mazhar Ullah Rathore ◽  
Awais Ahmad ◽  
Anand Paul

Geosocial network data provide rich information on current trends in human behavior and lifestyles, incidents and events, disasters, current medical infections, and much more, with respect to location. Hence, current geosocial media can serve as a data asset for supporting the nation and the government itself by analyzing geosocial data in real time. However, there are millions of geosocial network users, who generate terabytes of heterogeneous data carrying a variety of information every day at high speed, termed Big Data. Analyzing such a large amount of data and making real-time decisions is a challenging task. Therefore, this book chapter discusses the exploration of geosocial networks. A system architecture is discussed and implemented in a real-time environment in order to process the abundant amount of varied social network data, to monitor earth events, incidents, medical diseases, and user trends and thoughts, and to support real-time decision making as well as future planning.


2017 ◽  
Vol 12 (01) ◽  
Author(s):  
Shweta Kaushik

The Internet plays an essential role in providing diverse learning resources to the world, which enables many applications to deliver quality services to their customers. Over the years the web has become overloaded with information, and it has become difficult to extract the relevant information from it. This has paved the way for the development of Big Data, and the volume of data keeps growing rapidly day by day. Big Data has attracted much attention from academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds storage and processing capacity. Data mining techniques are used to find the hidden information within this huge amount of data; they are used to store, manage and analyse high-velocity data, which may be in structured or unstructured form. It is hard to handle large volumes of data using traditional database techniques such as RDBMS. On the one hand, Big Data is extremely valuable for increasing productivity in businesses and enabling transformative breakthroughs in scientific disciplines, offering many opportunities for great advances in many fields; there is little doubt that future competition in business productivity and technology will converge on Big Data analytics. On the other hand, Big Data also raises many challenges, such as difficulties in data capture, data storage, data analysis and data visualization. In this paper we focus on a review of Big Data, its data classification methods, and the ways it can be mined using various mining techniques.


2019 ◽  
Vol 16 (8) ◽  
pp. 3419-3427
Author(s):  
Shishir K. Shandilya ◽  
S. Sountharrajan ◽  
Smita Shandilya ◽  
E. Suganya

Big Data technologies have become well accepted in recent years in biomedical and genome informatics. They are capable of processing gigantic and heterogeneous genome information with good precision and recall. With the quick advancements in computation and storage technologies, the cost of acquiring and processing genomic data has decreased significantly. Upcoming sequencing platforms will produce vast amounts of data, which will imperatively require high-performance systems for on-demand analysis with time-bound efficiency. Recent bioinformatics tools are capable of utilizing the novel features of Hadoop in a more flexible way. In particular, big data technologies such as MapReduce and Hive are able to provide a high-speed computational environment for the analysis of petabyte-scale datasets. This has attracted the focus of bio-scientists towards using big data applications to automate the entire genome analysis. The proposed framework is designed over MapReduce and Java on an extended Hadoop platform to achieve parallelism in big data analysis. It will assist the bioinformatics community by providing a comprehensive solution for descriptive, comparative, exploratory, inferential, predictive and causal analysis on genome data. The proposed framework is user-friendly, fully customizable, scalable and fit for comprehensive real-time genome analysis, from data acquisition to predictive sequence analysis.
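
The framework itself is built with MapReduce and Java over Hadoop; purely to illustrate the MapReduce pattern it builds on, the following hypothetical Hadoop Streaming mapper/reducer pair in Python counts k-mers in raw sequence lines (the value of k and the input format are assumptions, not taken from the paper):

# kmer_count.py -- illustration only, not the authors' framework.
# A Hadoop Streaming style mapper/reducer pair that counts k-mers.
import sys
from itertools import groupby

K = 8  # k-mer length (hypothetical choice)

def run_mapper(lines):
    # Emit "<kmer>\t1" for every k-mer in each input sequence line.
    for line in lines:
        seq = line.strip().upper()
        for i in range(len(seq) - K + 1):
            print(f"{seq[i:i + K]}\t1")

def run_reducer(lines):
    # Hadoop delivers mapper output sorted by key, so counts for the same
    # k-mer arrive contiguously and can be summed with groupby.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for kmer, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{kmer}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    run_mapper(sys.stdin) if sys.argv[1] == "map" else run_reducer(sys.stdin)

Such a script would typically be launched through the standard hadoop-streaming jar, passing it once as the mapper ("map" mode) and once as the reducer, so that Hadoop handles the distribution, sorting and shuffling across the cluster.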


Author(s):  
Michele Ianni ◽  
Elio Masciari ◽  
Giancarlo Sperlí

Abstract The pervasive diffusion of Social Networks (SN) has produced an unprecedented amount of heterogeneous data. Thus, traditional approaches quickly became impractical for real-life applications due to their intrinsic properties: large amounts of user-generated data (text, video, image and audio), data heterogeneity and a high generation rate. More in detail, the analysis of user-generated data on popular social networks (i.e., Facebook (https://www.facebook.com/), Twitter (https://www.twitter.com/), Instagram (https://www.instagram.com/), LinkedIn (https://www.linkedin.com/)) poses quite intriguing challenges for both the research and industry communities in the task of analyzing user behavior, user interactions, link evolution, opinion spreading and several other important aspects. This survey focuses on the analyses performed over the last two decades on these kinds of data with respect to the dimensions defined for the Big Data paradigm (the so-called Big Data 6 V's).


2020 ◽  
Vol 69 (1) ◽  
pp. 323-326
Author(s):  
N.B. Zhapsarbek ◽  

In the modern world, specialists and the information systems they create are increasingly faced with the need to store, process and move huge amounts of data. The term Big Data is used to denote technologies for storing and analyzing large volumes of data that require high speed and real-time decision making during processing. Such data are characterized by large volume, a high accumulation rate and the lack of a strict internal structure, which also means that classic relational databases are not well suited for storing them. In this article, we present solutions for processing large amounts of data for pharmacy chains using NoSQL. The paper presents technologies for modeling large amounts of data using NoSQL, including MongoDB, and also analyzes possible solutions and the limitations that prevent this from being done effectively. The article provides an overview of three modern approaches to working with big data: NoSQL, data mining and real-time processing of event streams. As an implementation of the studied methods and technologies, we consider a database of pharmacies used for processing, searching, analyzing and forecasting over big data. Using NoSQL, we also show how structured and poorly structured data can be handled in parallel in different respects, and we present a comparative analysis of the newly developed application for pharmacy workers.
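
As a minimal sketch of the kind of NoSQL analysis described above (this is not the article's application; the collection layout and field names are hypothetical), a MongoDB aggregation pipeline can summarize monthly sales per drug as a starting point for demand forecasting:

# Hypothetical sketch: aggregate a pharmacy chain's "sales" collection,
# summing units sold per drug per month.
from pymongo import MongoClient

sales = MongoClient("mongodb://localhost:27017")["pharmacy"]["sales"]

pipeline = [
    {"$match": {"sold_at": {"$exists": True}}},
    {"$group": {
        "_id": {
            "drug": "$drug_name",
            "month": {"$dateToString": {"format": "%Y-%m", "date": "$sold_at"}},
        },
        "units": {"$sum": "$quantity"},
    }},
    {"$sort": {"_id.drug": 1, "_id.month": 1}},
]

for row in sales.aggregate(pipeline):
    print(row["_id"]["drug"], row["_id"]["month"], row["units"])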


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 409
Author(s):  
Balázs Bohár ◽  
David Fazekas ◽  
Matthew Madgwick ◽  
Luca Csabai ◽  
Marton Olbei ◽  
...  

In the era of Big Data, data collection underpins biological research more than ever before. In many cases it can be as time-consuming as the analysis itself, requiring the download of multiple public databases with different data structures and, in general, days of work before any biological question can be answered. To solve this problem, we introduce an open-source, cloud-based big data platform called Sherlock (https://earlham-sherlock.github.io/). Sherlock provides a gap-filling way for biologists to store, convert, query, share and generate biological data, while ultimately streamlining bioinformatics data management. The Sherlock platform provides a simple interface to leverage big data technologies, such as Docker and PrestoDB. Sherlock is designed to analyse, process, query and extract information from extremely complex and large data sets. Furthermore, Sherlock is capable of handling differently structured data (interaction, localization, or genomic sequence) from several sources and converting them to a common optimized storage format, for example the Optimized Row Columnar (ORC) format. This format facilitates Sherlock's ability to quickly and easily execute distributed analytical queries on extremely large data files, as well as to share datasets between teams. The Sherlock platform is freely available on GitHub and contains specific loader scripts for structured data sources of genomics, interaction and expression databases. With these loader scripts, users can easily and quickly create and work with the specific file formats, such as JavaScript Object Notation (JSON) or ORC. For computational biology and large-scale bioinformatics projects, Sherlock provides an open-source platform empowering data management, data analytics, data integration and collaboration through modern big data technologies.
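
As a generic illustration of the JSON-to-ORC conversion step described above (this is not one of Sherlock's loader scripts; the file name and columns are hypothetical, and a reasonably recent pyarrow is assumed):

# Generic sketch: convert newline-delimited JSON records into ORC with pyarrow.
import pyarrow.json as pj
import pyarrow.orc as orc

# interactions.jsonl holds one JSON object per line, e.g.
# {"source": "TP53", "target": "MDM2", "score": 0.92}
table = pj.read_json("interactions.jsonl")

# ORC stores the table column-wise, which is what makes large distributed
# analytical queries (e.g. through PrestoDB) fast on very big files.
orc.write_table(table, "interactions.orc")

# Read it back to verify the conversion round-trips.
print(orc.ORCFile("interactions.orc").read().num_rows)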


In the current scenario, a huge amount of data is being generated at high speed from various heterogeneous sources such as social networks, business applications, the government sector, marketing, healthcare systems, sensors and machine log data. Big Data has been chosen as one of the upcoming areas of research by several industries. In this paper, the author presents a wide collection of literature that has been reviewed and analyzed. The paper emphasizes Big Data technologies, applications and challenges, and presents a comparative study of architectures, methodologies, tools and survey results proposed by various researchers.


2021 ◽  
Vol 2 ◽  
pp. 68-69
Author(s):  
E.A. Kirillova ◽  

The study analyzes the legal status and principles of Big Data technology and considers the role, features and significance of these technologies. The relevance of the research is dictated by the large-scale use of Big Data technologies in many areas and the weak legal regulation of the use of Big Data involving personal data. The purpose of this study is to determine the legal status of Big Data technology and to differentiate the concepts of 'personal data' and 'Big Data technologies'. The study offers the author's definitions of 'Big Data technology' and 'personal data in electronic form' and develops principles for the use of Big Data technologies.

