PATSQL

2021 ◽  
Vol 14 (11) ◽  
pp. 1937-1949
Author(s):  
Keita Takenouchi ◽  
Takashi Ishio ◽  
Joji Okada ◽  
Yuji Sakata

SQL is one of the most popular tools for data analysis, and it is now used by an increasing number of users without expertise in databases. Several studies have proposed programming-by-example approaches to help such non-experts write correct SQL queries. While existing methods support a variety of SQL features such as aggregation and nested queries, they suffer from a significant increase in computational cost as the scale of the example tables grows. In this paper, we propose an efficient algorithm that exploits properties known in relational algebra to synthesize SQL queries from input and output tables. Our key insight is that a projection operator in a program sketch can be lifted above other operators by applying transformation rules of relational algebra while preserving the semantics of the program. This enables quick inference of the appropriate columns in the projection operator, an essential component of synthesis that causes combinatorial explosion in prior work. We also introduce a novel form of constraint and a top-down propagation mechanism for efficient sketch completion. We implemented this algorithm in our tool PATSQL and evaluated it on 226 queries from prior benchmarks and Kaggle's tutorials. As a result, PATSQL solved 68% of the benchmarks and found 89% of the solutions within a second. Our tool is available at https://naist-se.github.io/patsql/.
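As a toy illustration of the projection-lifting idea described in the abstract (not PATSQL's actual implementation; the table and column names are hypothetical), the following pandas sketch shows that a projection pushed below a join is equivalent to a single projection lifted above it, so the projected columns can be read off the output example once the projection sits at the top of the sketch.

```python
# Toy illustration, not PATSQL's implementation.
import pandas as pd

emp = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"], "dept": [10, 20]})
dept = pd.DataFrame({"dept": [10, 20], "city": ["Oslo", "Kyoto"]})

# Projection applied before the join ...
below = emp[["name", "dept"]].merge(dept, on="dept")[["name", "city"]]
# ... is equivalent to one projection lifted above the join, so the output
# columns {name, city} determine the projection directly.
lifted = emp.merge(dept, on="dept")[["name", "city"]]

assert below.equals(lifted)  # same result either way
```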

2010 ◽  
Vol 14 (2) ◽  
pp. 369-382 ◽  
Author(s):  
M. G. Kleinhans ◽  
M. F. P. Bierkens ◽  
M. van der Perk

Abstract. From an outsider's perspective, hydrology combines fieldwork with modelling but mostly ignores the potential for gaining understanding and conceiving new hypotheses from controlled laboratory experiments. Sivapalan (2009) pleaded for a question- and hypothesis-driven hydrology in which data analysis and top-down modelling approaches lead to general explanations and an understanding of general trends and patterns. We discuss why and how such understanding is gained very effectively from controlled experimentation in comparison to fieldwork and modelling. We argue that many major issues in hydrology are open to experimental investigation. Though experiments may have scale problems, these are of similar gravity to the well-known problems of fieldwork and modelling and have not impeded spectacular progress through experimentation in other geosciences.


2018 ◽  
Vol 13 (2) ◽  
pp. 188
Author(s):  
Tota Suhendrata

Abstract: One way to increase the productivity of lowland rice is to set the right plant spacing. Rice seedling transplanters (rice transplanters) are now being developed that introduce plant spacings ranging from narrow to wide, for both the legowo row planting system and the tile (tegel) planting system. Given the introduction of these technologies, further research is needed on the effect of plant spacing on the growth, productivity (grain yield), and income of lowland rice farmers. The assessment was carried out on paddy fields of the Rukun Tani Sulur farmer group, Blimbing Village, Sambirejo District, Sragen Regency, Central Java, during the third planting season of 2014 (July–October). It consisted of three plant-spacing treatments under the legowo row 2:1 planting system, namely 20 x 10 x 40 cm, 20 x 13 x 15 cm, and 20 x 15 x 40 cm, each treatment repeated 7 times, with each treatment plot covering about 0.33 ha. The assessment involved 7 farmers, each carrying out all 3 treatments. Seedlings were planted with a 4-row rice transplanter for the legowo 2:1 system, which offers three plant-spacing combinations: 20 x 10 x 40 cm, 20 x 13 x 15 cm, and 20 x 15 x 40 cm. The data collected included the number of productive tillers, productivity, and farm inputs and outputs. The three treatments were compared using paired t-tests (SPSS Statistics 17.0), while the financial feasibility of the rice farming technology was assessed with partial budget analysis. The results showed that the legowo row 2:1 planting system with the wide plant spacing (20 x 15 x 40 cm) produced more productive tillers, higher productivity, and higher income than the legowo 2:1 system with the narrower plant spacings (20 x 10 x 40 cm and 20 x 13 x 40 cm).
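As a hedged illustration of the statistical comparison mentioned above (the yield values below are invented, not the study's data), a paired t-test between two spacing treatments can be run as follows; the pairing reflects that each farmer carried out every treatment.

```python
# Illustrative only: paired t-test for two spacing treatments (made-up yields).
from scipy import stats

yield_wide   = [7.1, 6.8, 7.4, 7.0, 6.9, 7.3, 7.2]  # t/ha, 7 farmers, 20 x 15 x 40 cm
yield_narrow = [6.5, 6.4, 6.9, 6.6, 6.3, 6.8, 6.7]  # t/ha, same farmers, 20 x 10 x 40 cm

t_stat, p_value = stats.ttest_rel(yield_wide, yield_narrow)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # paired because each farmer ran both treatments
```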


2013 ◽  
pp. 1494-1521
Author(s):  
Jose M. Garcia-Manteiga

Metabolomics represents the new ‘omics’ approach of the functional genomics era. It consists of the identification and quantification of all small molecules, namely metabolites, in a given biological system. While metabolomics refers to the analysis of any possible biological system, metabonomics is specifically applied to disease and physiopathological situations. The data collected with these approaches are highly integrative of the other, higher levels and are hence amenable to exploration from a top-down systems biology point of view. The aim of this chapter is to give a global view of the state of the art in metabolomics, describing the two analytical techniques usually used to generate this kind of data, nuclear magnetic resonance (NMR) and mass spectrometry. In addition, the author focuses on the different data analysis tools that can be applied in such studies to extract information, with special attention to attempts to integrate metabolomics with other ‘omics’ approaches and its relevance to systems biology modeling.
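As a generic, hedged sketch of the kind of data analysis tools such a chapter surveys (not taken from the chapter itself; the intensity matrix is simulated), unsupervised exploration of a metabolite table with PCA might look like this.

```python
# Illustrative sketch: unsupervised exploration of a simulated metabolite matrix.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: 20 samples x 50 metabolite features (e.g., NMR bins or MS peaks)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(20, 50))

X_scaled = StandardScaler().fit_transform(np.log(X))  # log-transform and autoscale
scores = PCA(n_components=2).fit_transform(X_scaled)  # project samples onto 2 components
print(scores.shape)  # (20, 2): coordinates used to look for grouping among samples
```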


Author(s):  
Joseph Fong ◽  
Kamalakar Karlapalem ◽  
Qing Li ◽  
Irene Kwan

A practitioner’s approach to integrating databases and evolving them to support new database applications is presented. The approach consists of a joint bottom-up and top-down methodology: the bottom-up approach is taken to integrate existing databases using standard schema integration techniques (B-Schema), while the top-down approach is used to develop a database schema for the new applications (T-Schema). The T-Schema uses a joint functional-data analysis. The B-Schema is evolved by comparing it with the generated T-Schema. This facilitates an evolutionary approach to integrating existing databases to support new applications as and when needed. The mutual completeness check of the T-Schema against the B-Schema derives the schema modification steps to be performed on the B-Schema to meet the requirements of the new database applications. A case study is presented to illustrate the methodology.
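A toy sketch of the completeness-check idea (not the authors' algorithm; the schema contents are hypothetical): comparing the required T-Schema against the available B-Schema yields the entities and attributes that the schema modification steps must add.

```python
# Toy sketch, not the authors' algorithm: compare a top-down schema (T-Schema)
# with a bottom-up schema (B-Schema) to list what must be added to the B-Schema.
b_schema = {"Customer": {"id", "name"}, "Order": {"id", "customer_id"}}
t_schema = {"Customer": {"id", "name", "email"}, "Invoice": {"id", "order_id", "total"}}

missing_entities = set(t_schema) - set(b_schema)
missing_attributes = {
    entity: attrs - b_schema.get(entity, set())
    for entity, attrs in t_schema.items()
    if attrs - b_schema.get(entity, set())
}
print(missing_entities)    # e.g. {'Invoice'}: entities the B-Schema must gain
print(missing_attributes)  # e.g. {'Customer': {'email'}, ...}: attributes to add
```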


Water ◽  
2018 ◽  
Vol 10 (9) ◽  
pp. 1175 ◽  
Author(s):  
Pavel Praks ◽  
Dejan Brkić

Widely used in hydraulics, the Colebrook equation for flow friction relates the output parameter, the flow friction factor λ, implicitly to the input parameters, the Reynolds number Re and the relative roughness of the inner pipe surface ε/D: λ = f(λ, Re, ε/D). In this paper, a few explicit approximations to the Colebrook equation, λ ≈ f(Re, ε/D), are generated using the ability of artificial intelligence to build inner patterns that connect input and output parameters in an explicit way, without knowing their nature or the physical law that connects them, but only the raw numbers {Re, ε/D}→{λ}. The fact that the genetic programming tool used does not know the structure of the Colebrook equation, which is based on a computationally expensive logarithmic law, is exploited to obtain approximations whose structure is less demanding to compute yet sufficiently accurate. All generated approximations have low computational cost because they contain only a limited number of logarithmic forms, used for normalization of the input parameters or for acceleration. In the best case, the relative error in the friction factor λ is up to 0.13% with only two logarithmic forms used. Since the second logarithm can be accurately approximated by a Padé approximation, practically the same error is obtained using only one logarithm.
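For context, the implicit Colebrook equation that such explicit approximations replace can be solved by fixed-point iteration; the sketch below is a standard illustration of that baseline, not one of the paper's AI-generated approximations.

```python
# Standard fixed-point solution of the implicit Colebrook equation (illustration only).
import math

def colebrook_friction(reynolds: float, rel_roughness: float, tol: float = 1e-12) -> float:
    """Darcy friction factor λ from 1/sqrt(λ) = -2*log10(ε/D / 3.7 + 2.51 / (Re*sqrt(λ)))."""
    x = 6.0  # initial guess for 1/sqrt(λ)
    while True:
        x_new = -2.0 * math.log10(rel_roughness / 3.7 + 2.51 * x / reynolds)
        if abs(x_new - x) < tol:
            return 1.0 / (x_new * x_new)
        x = x_new

print(colebrook_friction(reynolds=1e5, rel_roughness=1e-4))  # ≈ 0.0185 for this turbulent case
```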


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Kyle Ellrott ◽  
Alex Buchanan ◽  
Allison Creason ◽  
Michael Mason ◽  
Thomas Schaffter ◽  
...  

Abstract Challenges are achieving broad acceptance for addressing many biomedical questions and enabling tool assessment. But ensuring that the methods evaluated are reproducible and reusable is complicated by the diversity of software architectures, input and output file formats, and computing environments. To mitigate these problems, some challenges have leveraged new virtualization and compute methods, requiring participants to submit cloud-ready software packages. We review recent data challenges with innovative approaches to model reproducibility and data sharing, and outline key lessons for improving quantitative biomedical data analysis through crowd-sourced benchmarking challenges.


2002 ◽  
Vol 96 (1) ◽  
pp. 260-262
Author(s):  
Gallya Lahav

Joel Fetzer is to be congratulated for a serious attempt to bring a public opinion approach to comparative immigration politics. His book represents an ambitious step toward bridging the gap between policy input and output in the immigration equation of advanced industrialized democracies. Its occasionally choppy organization and underdeveloped data analysis tend to distract from the import of the work and leave the reader yearning for a deeper and more substantive discussion.


Mathematics ◽  
2020 ◽  
Vol 8 (5) ◽  
pp. 736 ◽  
Author(s):  
Chao Fu ◽  
Guojin Feng ◽  
Jiaojiao Ma ◽  
Kuan Lu ◽  
Yongfeng Yang ◽  
...  

In this paper, the non-probabilistic steady-state dynamics of a dual-rotor system with parametric uncertainties under two-frequency excitations are investigated using the non-intrusive simplex-form mathematical metamodel. The Lagrangian formulation is employed to derive the equations of motion (EOM) of the system. The simplex-form metamodel, which requires no distribution functions for the interval uncertainties, is formulated in a non-intrusive way. In cases with multiple uncertainties, strategies aimed at reducing the computational cost are incorporated. In numerical simulations for different interval parametric uncertainties, a special propagation mechanism is observed that cannot be found in single-rotor systems. Validations of the metamodel in terms of efficiency and accuracy are also carried out by comparison with the scanning method. The results will be helpful for understanding the dynamic behaviors of dual-rotor systems subject to uncertainties and provide guidance for robust design and analysis.
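As a generic, hedged illustration of the scanning method used above as the validation reference (not the paper's dual-rotor model; the response function below is a made-up single-degree-of-freedom surrogate), an interval uncertainty can be propagated by sweeping a grid over the interval and recording the response envelope.

```python
# Generic scanning-method sketch for interval uncertainty propagation.
import numpy as np

def response(stiffness: float, omega: float = 1.0) -> float:
    # Hypothetical steady-state amplitude of a 1-DOF surrogate, for illustration only
    return 1.0 / abs(stiffness - omega**2)

stiffness_interval = (1.8, 2.2)            # interval-valued uncertain parameter
grid = np.linspace(*stiffness_interval, 201)
amplitudes = np.array([response(k) for k in grid])
print(amplitudes.min(), amplitudes.max())  # lower/upper bounds of the response envelope
```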


Author(s):  
Artur Boronat

Abstract When model transformations are used to implement consistency relations between very large models, incrementality plays a cornerstone role in detecting and resolving inconsistencies efficiently when models are updated. Given a directed consistency relation between two models, the problem studied in this work consists in propagating model changes from a source model to a target model in order to ensure consistency while minimizing computational costs. The mechanism that enforces such consistency is called a consistency maintainer and, in this context, its scalability is a key non-functional requirement. State-of-the-art model transformation engines with support for incrementality normally rely on an observer pattern for linking model changes, also known as deltas, to the application of model transformation rules, in so-called dependencies, at run time. These model changes can then be propagated along an already executed model transformation. Only a few approaches to model transformation provide domain-specific languages for representing and storing model changes in order to enable their use in asynchronous, event-based execution environments. The principal contribution of this work is the design of a forward change propagation mechanism for incremental execution of model transformations, which decouples dependency tracking from change propagation using two innovations. First, the observer pattern-based model is replaced with dependency injection, decoupling domain models from consistency maintainers. Second, a standardized representation of model changes is reused, enabling interoperability with EMF-compliant tools, both for defining model changes and for processing them asynchronously. This procedure has been implemented in a model transformation engine whose performance has been evaluated experimentally using the VIATRA CPS benchmark. In the experiments performed, the new transformation engine shows gains of several orders of magnitude in the initial phase of the incremental execution of the benchmark model transformation; change propagation is performed in real time for model sizes that are processable by other tools, and the engine is, in addition, able to process much larger models.
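A hedged sketch of the dependency-injection idea described above (not the paper's engine; all names and the propagation rule are hypothetical): the propagation strategy is injected into the consistency maintainer, so domain models never have to register observers or know about the maintainer at all.

```python
# Hedged sketch: change propagation injected into a consistency maintainer.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Delta:
    element: str
    feature: str
    new_value: object

@dataclass
class ConsistencyMaintainer:
    # The propagation strategy is injected, not hard-wired via observers.
    propagate: Callable[[Delta, dict], None]
    target: dict = field(default_factory=dict)

    def on_change(self, delta: Delta) -> None:
        self.propagate(delta, self.target)

def forward_rule(delta: Delta, target: dict) -> None:
    # Toy forward-propagation rule: mirror the changed feature in the target model.
    target[(delta.element, delta.feature)] = delta.new_value

maintainer = ConsistencyMaintainer(propagate=forward_rule)
maintainer.on_change(Delta("Task1", "name", "deploy"))
print(maintainer.target)  # {('Task1', 'name'): 'deploy'}
```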

