An Optimized Generic Client Service API for Managing Large Datasets within a Data Repository

Author(s):
Ajinkya Prabhune, Rainer Stotzka, Thomas Jejkal, Volker Hartmann, Margund Bach, ...

Author(s):
Javier Quinteros, Jerry A. Carter, Jonathan Schaeffer, Chad Trabant, Helle A. Pedersen

Abstract: New data acquisition techniques are generating data at much finer temporal and spatial resolution than traditional seismic experiments, which poses a challenge for data centers and users alike. As the amount of data potentially flowing into data centers increases by one or two orders of magnitude, data management challenges arise throughout all stages of the data flow. The Incorporated Research Institutions for Seismology, the Réseau sismologique et géodésique français, and the GEOForschungsNetz data centers carried out a survey and conducted interviews of users working with very large datasets to understand their needs and expectations. One conclusion is that existing data formats and services are not well suited to users of large datasets. Data centers are therefore exploring storage solutions, data formats, and data delivery options to meet the needs of these users. New approaches will need to be discussed within the community to establish standards and best practices for large datasets, perhaps through the participation of stakeholders and users in discussion groups and forums.


2017, Vol 25 (2), pp. 927-960
Author(s):  
Jarod Jacobs

In this article, I discuss three statistical tools that have proven pivotal in linguistic research, particularly in studies that evaluate large datasets: the Gaussian curve, significance tests, and hierarchical clustering. I present a brief description of these tools and their general uses. I then apply them to an analysis of the variations between the “biblical” Dead Sea Scrolls (DSS) and our other witnesses, focusing on variations involving particles. Finally, I engage the recent debate surrounding the diachronic study of Biblical Hebrew. This article serves a dual function. First, it presents statistical tools that are useful for many linguistic studies. Second, it develops an analysis of the he-locale as it is used in the “biblical” Dead Sea Scrolls, the Masoretic Text, and the Samaritan Pentateuch. Through that analysis, the article highlights the value of inferential statistical tools as we attempt to better understand the Hebrew of our ancient witnesses.
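The three tools named above can be illustrated with a minimal, self-contained Python sketch. The particle frequencies below are invented for demonstration and are not the article's DSS data: the Gaussian curve is summarized by a sample mean and standard deviation, a z-score threshold serves as a simple significance screen, and a greedy single-linkage agglomeration stands in for hierarchical clustering.

```python
import statistics as st

# Hypothetical particle frequencies (per 1,000 words) in five text
# witnesses. These numbers are illustrative, not from the article.
freqs = {"W1": 4.1, "W2": 3.9, "W3": 4.3, "W4": 7.8, "W5": 8.1}

# 1. Gaussian curve: summarize the sample by mean and standard deviation.
values = list(freqs.values())
mu, sigma = st.mean(values), st.stdev(values)

# 2. Significance screen: a z-score flags witnesses deviating strongly
#    from the group mean (|z| > 1 here, a deliberately loose cutoff).
outliers = {w: (v - mu) / sigma for w, v in freqs.items()
            if abs((v - mu) / sigma) > 1}

# 3. Hierarchical clustering: greedy single-linkage agglomeration on the
#    one-dimensional frequencies until two clusters remain.
clusters = [[w] for w in freqs]

def dist(a, b):
    return min(abs(freqs[x] - freqs[y]) for x in a for y in b)

while len(clusters) > 2:
    i, j = min(((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
               key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
    clusters[i] += clusters.pop(j)

print(outliers)   # witnesses that deviate from the Gaussian bulk
print(clusters)   # two groups of witnesses by frequency similarity
```

On this toy data the z-score screen and the clustering agree: witnesses W4 and W5 separate from the other three, which is the kind of convergent evidence the article draws on when comparing witnesses.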


2018
Author(s):  
Andrew Dalke ◽  
Jerome Hert ◽  
Christian Kramer

We present mmpdb, an open source Matched Molecular Pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large datasets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit. mmpdb is freely available.
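mmpdb itself fragments molecules with RDKit and stores rules in a database; as a rough illustration of the underlying matched-molecular-pair idea only, the following Python sketch groups hand-written (core, fragment) records by their constant core and emits a transformation rule for every pair that shares a core. All molecule identifiers and fragment strings are invented stand-ins, not mmpdb's actual API or data model.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative pre-fragmented records: (molecule id, constant core,
# variable fragment). mmpdb derives such cuts with RDKit; these are
# hand-written SMILES-like stand-ins.
records = [
    ("mol1", "c1ccccc1[*]", "Cl"),
    ("mol2", "c1ccccc1[*]", "Br"),
    ("mol3", "c1ccccc1[*]", "OC"),
    ("mol4", "CCN[*]", "Cl"),
]

# Group molecules by their constant core.
by_core = defaultdict(list)
for mol, core, frag in records:
    by_core[core].append((mol, frag))

# Emit one MMP rule (fragment_a >> fragment_b) per pair sharing a core.
rules = []
for core, members in by_core.items():
    for (m1, f1), (m2, f2) in combinations(members, 2):
        rules.append((f"{f1}>>{f2}", m1, m2))

for rule in rules:
    print(rule)
```

Note that mol4 yields no rule because no other molecule shares its core; a real MMP platform additionally canonicalizes fragments and handles stereochemistry, which is exactly where mmpdb's new algorithms come in.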


2012, Vol 38 (11), pp. 1831
Author(s):  
Wen-Jun HU, Shi-Tong WANG, Juan WANG, Wen-Hao YING

Author(s):  
Sadaf Qazi ◽  
Muhammad Usman

Background: Immunization is a significant public health intervention to reduce child mortality and morbidity. However, despite being freely accessible, its coverage is still very low in developing countries. One of the primary reasons for this low coverage is the lack of analysis and proper utilization of immunization data at healthcare facilities. Purpose: In this paper, we critically review existing machine learning based data analytics techniques to highlight the gaps where this high-potential data could be exploited in a meaningful manner. Results: Our review reveals that existing approaches apply data analytics techniques without considering the full complexity of the Expanded Program on Immunization, which includes the maintenance of cold chain systems, the proper distribution of vaccines, and the quality of data captured at healthcare facilities. Moreover, in developing countries there is no centralized data repository where all immunization-related data are gathered to support analytics at various levels of granularity. Conclusion: We believe that the existing non-centralized immunization data, combined with the right set of machine learning and artificial intelligence techniques, will not only improve vaccination coverage but will also help in predicting future trends and patterns of coverage at different geographical locations.
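As a toy example of the kind of trend prediction the conclusion envisions, the following Python sketch fits an ordinary least-squares line to yearly coverage figures and projects one year ahead. The data are made up for illustration, not from any real immunization registry, and a real system would use richer models over disaggregated facility-level data.

```python
# Illustrative yearly immunization-coverage percentages for one district.
years = [2014, 2015, 2016, 2017, 2018]
coverage = [54.0, 57.5, 59.0, 62.5, 65.0]

# Closed-form ordinary least-squares fit of coverage on year.
n = len(years)
mx = sum(years) / n
my = sum(coverage) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(years, coverage))
         / sum((x - mx) ** 2 for x in years))
intercept = my - slope * mx

# Project coverage one year ahead as a crude trend forecast.
forecast_2019 = slope * 2019 + intercept
print(round(forecast_2019, 1))
```

Even this crude linear trend makes the review's point concrete: routinely collected coverage numbers, once centralized, support forward-looking planning rather than purely retrospective reporting.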

