A query language for analyzing networks

Author(s):  
Anton Dries ◽  
Siegfried Nijssen ◽  
Luc De Raedt
2020 ◽  
Vol 4 (s1) ◽  
pp. 50-50
Author(s):  
Robert Edward Freundlich ◽  
Gen Li ◽  
Jonathan P Wanderer ◽  
Frederic T Billings ◽  
Henry Domenico ◽  
...  

OBJECTIVES/GOALS: We modeled risk of reintubation within 48 hours of cardiac surgery using variables available in the electronic health record (EHR). This model will guide recruitment for a prospective, pragmatic clinical trial entirely embedded within the EHR among those at high risk of reintubation. METHODS/STUDY POPULATION: All adult patients admitted to the cardiac intensive care unit following cardiac surgery involving thoracotomy or sternotomy were eligible for inclusion. Data were obtained from operational and analytical databases integrated into the Epic EHR, as well as institutional and departmental-derived data warehouses, using structured query language. Variables were screened for inclusion in the model based on clinical relevance, availability in the EHR as structured data, and likelihood of timely documentation during routine clinical care, in the hopes of obtaining a maximally-pragmatic model. RESULTS/ANTICIPATED RESULTS: A total of 2325 patients met inclusion criteria between November 2, 2017 and November 2, 2019. Of these patients, 68.4% were male. Median age was 63.0. The primary outcome of reintubation occurred in 112/2325 (4.8%) of patients within 48 hours and 177/2325 (7.6%) at any point in the subsequent hospital encounter. Univariate screening and iterative model development revealed numerous strong candidate predictors (ANOVA plot, figure 1), resulting in a model with acceptable calibration (calibration plot, figure 2), c = 0.666. DISCUSSION/SIGNIFICANCE OF IMPACT: Reintubation is common after cardiac surgery. Risk factors are available in the EHR. We are integrating this model into the EHR to support real-time risk estimation and to recruit and randomize high-risk patients into a clinical trial comparing post-extubation high flow nasal cannula with usual care. CONFLICT OF INTEREST DESCRIPTION: REF has received grant funding and consulting fees from Medtronic for research on inpatient monitoring.


Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 149
Author(s):  
Petros Zervoudakis ◽  
Haridimos Kondylakis ◽  
Nicolas Spyratos ◽  
Dimitris Plexousakis

HIFUN is a high-level query language for expressing analytic queries of big datasets, offering a clear separation between the conceptual layer, where analytic queries are defined independently of the nature and location of data, and the physical layer, where queries are evaluated. In this paper, we present a methodology based on the HIFUN language, and the corresponding algorithms for the incremental evaluation of continuous queries. In essence, our approach is able to process the most recent data batch by exploiting already computed information, without requiring the evaluation of the query over the complete dataset. We present the generic algorithm which we translated to both SQL and MapReduce using SPARK; it implements various query rewriting methods. We demonstrate the effectiveness of our approach in terms of query answering efficiency. Finally, we show that by exploiting the formal query rewriting methods of HIFUN, we can further reduce the computational cost, adding another layer of query optimization to our implementation.
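The idea of answering a continuous query from the newest batch plus previously computed results can be illustrated with a minimal Python sketch. It is not HIFUN syntax and not the authors' algorithm; the function names, the record fields (`branch`, `quantity`), and the choice of a grouped sum are assumptions made for illustration only. The sketch relies on the aggregate being one that can be merged across batches, as a sum can.

```python
from collections import defaultdict


def evaluate_batch(batch, group_key, measure):
    """Compute a grouped sum over a single batch of records."""
    totals = defaultdict(int)
    for record in batch:
        totals[record[group_key]] += record[measure]
    return dict(totals)


def merge_results(previous, delta):
    """Combine the stored answer with the answer for the newest batch."""
    merged = dict(previous)
    for key, value in delta.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Hypothetical data: two batches arriving one after the other.
batch_1 = [{"branch": "A", "quantity": 10}, {"branch": "B", "quantity": 5}]
batch_2 = [{"branch": "A", "quantity": 3}, {"branch": "C", "quantity": 7}]

# Incremental evaluation: only the newest batch is scanned at each step.
running = evaluate_batch(batch_1, "branch", "quantity")
running = merge_results(running, evaluate_batch(batch_2, "branch", "quantity"))

# Reference answer: re-evaluating the query over the complete dataset.
full = evaluate_batch(batch_1 + batch_2, "branch", "quantity")
```

Here `running` and `full` hold the same grouped totals, but the incremental path never rescans `batch_1` once its result has been stored.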


1997 ◽  
Vol 26 (3) ◽  
pp. 4-11
Author(s):  
Mary Fernandez ◽  
Daniela Florescu ◽  
Alon Levy ◽  
Dan Suciu

2021 ◽  
Vol 11 (5) ◽  
pp. 2405
Author(s):  
Yuxiang Sun ◽  
Tianyi Zhao ◽  
Seulgi Yoon ◽  
Yongju Lee

The Semantic Web has recently gained traction with the use of Linked Open Data (LOD) on the Web. Although numerous state-of-the-art methodologies, standards, and technologies are applicable to the LOD cloud, many issues persist. Because the LOD cloud is based on graph-based resource description framework (RDF) triples and the SPARQL query language, we cannot directly adopt traditional techniques employed for database management systems or distributed computing systems. This paper addresses how the LOD cloud can be efficiently organized, retrieved, and evaluated. We propose a novel hybrid approach that combines the index and live exploration approaches for improved LOD join query performance. Using a two-step index structure combining a disk-based 3D R*-tree with the extended multidimensional histogram and flash memory-based k-d trees, we can efficiently discover interlinked data distributed across multiple resources. Because this method rapidly prunes numerous false hits, the performance of join query processing is remarkably improved. We also propose a hot-cold segment identification algorithm to identify regions of high interest. The proposed method is compared with existing popular methods on real RDF datasets. Results indicate that our method outperforms the existing methods because it can quickly obtain target results by reducing unnecessary data scanning and reducing the amount of main memory required to load filtering results.


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 621
Author(s):  
Giuseppe Psaila ◽  
Paolo Fosci

Internet technology and mobile technology have enabled producing and diffusing massive data sets concerning almost every aspect of day-by-day life. Remarkable examples are social media and apps for volunteered information production, as well as Open Data portals on which public administrations publish authoritative and (often) geo-referenced data sets. In this context, JSON has become the most popular standard for representing and exchanging possibly geo-referenced data sets over the Internet. Analysts wishing to manage, integrate and cross-analyze such data sets need a framework that allows them to access possibly remote storage systems for JSON data sets, to retrieve and query data sets by means of a unique query language (independent of the specific storage technology), and to exploit possibly remote computational resources (such as cloud servers), while comfortably working on their PC in their office, more or less unaware of the real location of resources. In this paper, we present the current state of the J-CO Framework, a platform-independent and analyst-oriented software framework to manipulate and cross-analyze possibly geo-tagged JSON data sets. The paper presents the general approach behind the J-CO Framework, illustrating the query language by means of a simple, yet non-trivial, example of geographical cross-analysis. The paper also presents the novel features introduced by the re-engineered version of the execution engine and the most recent components, i.e., the storage service for large single JSON documents and the user interface that allows analysts to comfortably share data sets and computational resources with other analysts possibly working in different places around the globe. Finally, the paper reports the results of an experimental campaign, which show that the execution engine performs in a more than satisfactory way, proving that our framework can actually be used by analysts to process JSON data sets.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Peter Baumann ◽  
Dimitar Misev ◽  
Vlad Merticariu ◽  
Bang Pham Huu

Multi-dimensional arrays (also known as raster data or gridded data) play a key role in many, if not all, science and engineering domains where they typically represent spatio-temporal sensor, image, simulation output, or statistics “datacubes”. As classic database technology does not support arrays adequately, such data today are maintained mostly in silo solutions, with architectures that tend to erode and not keep up with the increasing requirements on performance and service quality. Array Database systems attempt to close this gap by providing declarative query support for flexible ad-hoc analytics on large n-D arrays, similar to what SQL offers on set-oriented data, XQuery on hierarchical data, and SPARQL and CIPHER on graph data. Today, Petascale Array Database installations exist, employing massive parallelism and distributed processing. Hence, questions arise about technology and standards available, usability, and overall maturity. Several papers have compared models and formalisms, and benchmarks have been undertaken as well, typically comparing two systems against each other. While each of these represents valuable research, to the best of our knowledge there is no comprehensive survey combining model, query language, architecture, practical usability, and performance aspects. The size of this comparison differentiates our study as well, with 19 systems compared and four benchmarked to an extent and depth clearly exceeding previous papers in the field; for example, subsetting tests were designed in a way that systems cannot be tuned specifically to these queries. It is hoped that this gives a representative overview to all who want to immerse themselves in the field as well as clear guidance to those who need to choose the best-suited datacube tool for their application. This article presents results of the Research Data Alliance (RDA) Array Database Assessment Working Group (ADA:WG), a subgroup of the Big Data Interest Group. It has elicited the state of the art in Array Databases, technically supported by IEEE GRSS and CODATA Germany, to answer the question: how can data scientists and engineers benefit from Array Database technology? As it turns out, Array Databases can offer significant advantages in terms of flexibility, functionality, extensibility, as well as performance and scalability. In total, the database approach of offering analysis-ready “datacubes” heralds a new level of service quality. Investigation shows that there is a lively ecosystem of technology with increasing uptake, and proven array analytics standards are in place. Consequently, such approaches have to be considered a serious option for datacube services in science, engineering and beyond. Tools, though, vary greatly in functionality and performance as it turns out.

