Causal Inference without Balance Checking: Coarsened Exact Matching

We discuss a method for improving causal inferences called “Coarsened Exact Matching” (CEM), and the new “Monotonic Imbalance Bounding” (MIB) class of matching methods from which CEM is derived. We summarize what is known about CEM and MIB, derive and illustrate several new desirable statistical properties of CEM, and then propose a variety of useful extensions. We show that CEM possesses a wide range of statistical properties not available in most other matching methods but is at the same time exceptionally easy to comprehend and use. We focus on the connection between theoretical properties and practical applications. We also make available easy-to-use open source software for R, Stata, and SPSS that implements all our suggestions.
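To make the idea concrete, here is a minimal sketch of coarsened exact matching in plain Python. It is an illustration only, not the authors' R, Stata, or SPSS software: the function names, the covariates `age` and `educ`, and the cutpoints are all hypothetical, and the control weights follow the commonly described CEM stratum-weighting rule, which is stated here as an assumption.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= c for c in cutpoints)


def cem_weights(units, cutpoints):
    """Sketch of CEM: coarsen, exact-match on bins, prune, weight.

    units: list of dicts with a boolean 'treated' key plus covariates.
    cutpoints: dict mapping covariate name -> sorted bin boundaries.
    Returns one weight per unit; pruned units get 0.0.
    """
    # 1. Group units by their coarsened covariate signature.
    strata = defaultdict(list)
    for i, unit in enumerate(units):
        key = tuple(coarsen(unit[name], cuts)
                    for name, cuts in sorted(cutpoints.items()))
        strata[key].append(i)

    # 2. Keep only strata containing both treated and control units.
    matched = []
    for members in strata.values():
        n_t = sum(1 for i in members if units[i]["treated"])
        n_c = len(members) - n_t
        if n_t > 0 and n_c > 0:
            matched.append((members, n_t, n_c))

    total_t = sum(n_t for _, n_t, _ in matched)
    total_c = sum(n_c for _, _, n_c in matched)

    # 3. Treated units get weight 1; controls are reweighted so each
    #    stratum reflects the treated distribution (assumed rule).
    weights = [0.0] * len(units)
    for members, n_t, n_c in matched:
        for i in members:
            if units[i]["treated"]:
                weights[i] = 1.0
            else:
                weights[i] = (total_c / total_t) * (n_t / n_c)
    return weights


# Hypothetical toy data.
units = [
    {"treated": True, "age": 23, "educ": 12},
    {"treated": False, "age": 27, "educ": 11},
    {"treated": False, "age": 29, "educ": 12},
    {"treated": True, "age": 45, "educ": 16},
    {"treated": False, "age": 41, "educ": 17},
    {"treated": True, "age": 62, "educ": 9},
]
cutpoints = {"age": [30, 50], "educ": [10, 13]}
weights = cem_weights(units, cutpoints)
```

In this toy data the last unit has no control in its stratum, so it is pruned with weight 0. The remaining weights could then be passed to any weighted estimator.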

Monitoring Cropping Systems: From Data Collection to Cloud Database Storage Using Open Source Software

Proceedings ◽

10.3390/proceedings2019030079 ◽

2020 ◽

Vol 30 (1) ◽

pp. 79

Author(s):

Ioanna Panagea ◽

Dangol Anuja ◽

Marc Olijslagers ◽

Jan Diels ◽

Guido Wyseure

Keyword(s):

Open Source ◽

Open Source Software ◽

Web Application ◽

Cropping Systems ◽

Cropping System ◽

Query Languages ◽

Climatic Conditions ◽

Raw Data ◽

Wide Range ◽

Study Sites

Agricultural cropping systems and experiments include complex interactions of processes and various management practices and/or treatments under a wide range of environmental and climatic conditions. The use of standardized formats to monitor and document these systems and experiments can help researchers and stakeholders to efficiently exchange data, promote interdisciplinary collaborations, and simplify modelling and analysis procedures. In the scope of the SoilCare Horizon 2020 project monitoring and assessment work package, an integrated scheme was created to collect, validate, store, and access cropping system information and experimental data from 16 study sites. The aim of the scheme is to make the data readily available in a way that the information is useful, easy to access and download, and safe, relying only on open source software. The database design considers the data and metadata required to properly and easily monitor, process, and analyse cropping systems and/or agricultural experiments. The scheme allows for the storage of data and metadata regarding the experimental set-up, associated people and institutions, field management operations and experimental procedures (which are clearly separated to make analysis faster), links between system components, and the environmental and climatic conditions. Raw data are entered by the users into a structured spreadsheet, and their quality is checked before they are stored in the database. Providing raw data allows each user to process and analyse them as needed. A desktop import application has been created to upload the information from the spreadsheet to the database; it includes automated error checks of relationship tables, data types, data constraints, etc.
The final component of the scheme is the database web application interface, which enables users to access and query the database across the study sites without knowledge of query languages and to download the required data. For this system design, PostgreSQL is used for storing the data, pgAdmin 4 for database management and administration, MongoDB for user management and authentication, Python for the development of the import application, Angular and Node.js/Express for the web application, and spreadsheets compatible with LibreOffice Calc for data entry. The system is currently being tested with data provided by the SoilCare study sites. Preliminary testing indicated that extended quality control of the spreadsheets by the system’s administrator was required to meet the standards and restrictions of the import application. Initial comments from the users indicate that the database scheme, even if it initially seems complicated, includes all the variables and details required for complete monitoring and modelling of an agricultural cropping system.
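The kinds of automated checks mentioned above (relationships, data types, constraints) might look roughly like the following Python sketch. This is not the SoilCare import application: the function name and the column names `plot_id` and `yield_t_ha` are hypothetical, chosen only for illustration.

```python
def validate_rows(rows, known_plot_ids):
    """Return a list of (row_number, column, message) error tuples.

    rows: list of dicts as read from a spreadsheet (values as strings).
    known_plot_ids: set of plot identifiers already in the database.
    """
    errors = []
    for number, row in enumerate(rows, start=1):
        # Relationship check: the row must reference an existing plot.
        if row.get("plot_id") not in known_plot_ids:
            errors.append((number, "plot_id", "unknown plot"))

        # Data-type check: the yield must parse as a number.
        try:
            value = float(row.get("yield_t_ha", ""))
        except (TypeError, ValueError):
            errors.append((number, "yield_t_ha", "not a number"))
            continue

        # Constraint check: a yield cannot be negative.
        if value < 0:
            errors.append((number, "yield_t_ha", "must be non-negative"))
    return errors


# Hypothetical spreadsheet rows.
rows = [
    {"plot_id": "P1", "yield_t_ha": "5.2"},
    {"plot_id": "P9", "yield_t_ha": "4.0"},
    {"plot_id": "P2", "yield_t_ha": "n/a"},
    {"plot_id": "P2", "yield_t_ha": "-1"},
]
errors = validate_rows(rows, known_plot_ids={"P1", "P2"})
```

Collecting every error instead of stopping at the first one lets a user correct the whole spreadsheet in a single pass before resubmitting it.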


Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference

Political Analysis ◽

10.1093/pan/mpl013 ◽

2007 ◽

Vol 15 (3) ◽

pp. 199-236 ◽

Cited By ~ 1797

Author(s):

Daniel E. Ho ◽

Kosuke Imai ◽

Gary King ◽

Elizabeth A. Stuart

Keyword(s):

Causal Inference ◽

Mean Squared Error ◽

Statistical Properties ◽

Parametric Models ◽

Unified Approach ◽

Causal Inferences ◽

Fast Growing ◽

Squared Error ◽

Unique Model ◽

Functional Forms

Although published works rarely include causal estimates from more than a few model specifications, authors usually choose the presented estimates from numerous trial runs readers never see. Given the often large variation in estimates across choices of control variables, functional forms, and other modeling assumptions, how can researchers ensure that the few estimates presented are accurate or representative? How do readers know that publications are not merely demonstrations that it is possible to find a specification that fits the author's favorite hypothesis? And how do we evaluate or even define statistical properties like unbiasedness or mean squared error when no unique model or estimator even exists? Matching methods, which offer the promise of causal inference with fewer assumptions, constitute one possible way forward, but crucial results in this fast-growing methodological literature are often grossly misinterpreted. We explain how to avoid these misinterpretations and propose a unified approach that makes it possible for researchers to preprocess data with matching (such as with the easy-to-use software we offer) and then to apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.


Free and Open Source Software for Computational Chemistry Education

10.33774/chemrxiv-2021-hr1r0-v2 ◽

2021 ◽

Author(s):

Susi Lehtola ◽

Antti Karttunen

Keyword(s):

Computational Chemistry ◽

Open Source ◽

Open Source Software ◽

Density Functional ◽

Chemistry Education ◽

Tight Binding ◽

Industrial Applications ◽

Bring Your Own Device ◽

Wide Range ◽

The Masses

Long in the making, computational chemistry for the masses [J. Chem. Educ. 1996, 73, 104] is finally here. Our brief review on various free and open source software (FOSS) quantum chemistry packages points out the existence of software offering a wide range of functionality, all the way from approximate semiempirical calculations with tight-binding density functional theory to sophisticated ab initio wave function methods such as coupled-cluster theory, both for molecular and for solid-state systems. Combined with the remarkable increase in the computing power of personal devices, which now rivals that of the fastest supercomputers in the world of the 1990s, we demonstrate that a decentralized model for teaching computational chemistry is now possible thanks to FOSS computational chemistry packages, enabling students to perform reasonable modeling on their own computing devices, in the bring your own device (BYOD) scheme. FOSS software can be made trivially simple to install and keep up to date, eliminating the need for departmental support, and also enables comprehensive teaching strategies, as various algorithms' actual implementations can be used in teaching. We exemplify what kinds of calculations are feasible with four FOSS electronic structure programs, assuming only extremely modest computational resources, to illustrate how FOSS packages enable decentralized approaches to computational chemistry education within the BYOD scheme. FOSS also has further benefits: the open access to the source code of FOSS packages democratizes the science of computational chemistry, and FOSS packages can be used without limitation also beyond education, in academic and industrial applications, for example. For these reasons, we believe FOSS will become ever more pervasive in computational chemistry.


FRAME-Sim: A Free-Software, Multibody-Based, Pilot in the Loop Rotorcraft Flight Simulator

Volume 6: 14th International Conference on Multibody Systems, Nonlinear Dynamics, and Control ◽

10.1115/detc2018-85289 ◽

2018 ◽

Author(s):

Andrea Zanoni ◽

Luca Conti ◽

Pierangelo Masarati

Keyword(s):

Open Source ◽

Open Source Software ◽

Development Process ◽

Free Software ◽

Design Stage ◽

Careful Planning ◽

Modern Approach ◽

Handling Qualities ◽

Wide Range ◽

Representative Model

In the context of a modern approach to the design of rotorcraft, handling qualities should be the result of careful planning, rather than the output of a multitude of other choices, made primarily focusing on more immediate constraints. For a wide range of flight conditions and mission task elements, the test pilot feedback is the essential measure upon which the design choices are made. Thus, it is becoming of fundamental importance to be able to simulate a representative model of the vehicle in a pilot-in-the-loop environment as early as possible in the design stage. This work is intended to document the development process of one such system currently being realized at the facilities belonging to the Aerospace Science and Technology Department of Politecnico di Milano. Particular attention is given to the software architecture, based on the free and open-source multibody solver MBDyn. The development of a module specifically designed to exploit the environment visualization capabilities of FlightGear, also a free and open-source software, is presented.


A Longitudinal Study of Fan-In and Fan-Out Coupling in Open-Source Systems

International Journal of Information System Modeling and Design ◽

10.4018/jismd.2011100101 ◽

2011 ◽

Vol 2 (4) ◽

pp. 1-26 ◽

Cited By ~ 2

Author(s):

Asma Mubarak ◽

Steve Counsell ◽

Robert M. Hierons

Keyword(s):

Open Source ◽

Open Source Software ◽

Large Classes ◽

Future Problem ◽

Wide Range ◽

Type Classes ◽

Low Levels ◽

The Relationship ◽

Open Source Systems ◽

Maintenance Problem

Excessive coupling between object-oriented classes is widely acknowledged as a maintenance problem that can result in a higher propensity for faults in systems and a ‘stored up’ future problem. This paper explores the relationship between ‘fan-in’ and ‘fan-out’ coupling metrics over multiple versions of open-source software. More specifically, the relationship between the two metrics is explored to determine patterns of growth in each over the course of time. The JHawk tool was used to extract the two metrics from five open-source systems. Results show a wide range of traits in the classes to explain both high and low levels of fan-in and fan-out. Evidence was also found of certain ‘key’ classes (with both high fan-in and fan-out) and ‘client’ and ‘server’-type classes with high fan-out and fan-in, respectively. This paper provides an explanation of the composition and existence of such classes as well as for disproportionate increases in each of the two metrics over time. Finally, it was found that high fan-in class values tended to be associated with small classes; classes with high fan-out on the other hand tended to be relatively large classes.
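As a rough illustration of the two metrics, the sketch below counts fan-in and fan-out from a class dependency map. It is not the JHawk tool, whose exact metric definitions may differ. It assumes fan-out is the number of distinct other classes a class references and fan-in is the number of distinct other classes that reference it. The class names are hypothetical.

```python
from collections import defaultdict


def fan_metrics(dependencies):
    """Return {class_name: (fan_in, fan_out)}.

    dependencies: dict mapping a class name to the set of class names
    it references. Self-references are ignored.
    """
    fan_in = defaultdict(int)
    classes = set(dependencies)
    for source, targets in dependencies.items():
        for target in set(targets) - {source}:
            fan_in[target] += 1
            classes.add(target)

    return {
        name: (fan_in[name],
               len(set(dependencies.get(name, ())) - {name}))
        for name in sorted(classes)
    }


# Hypothetical system: one 'client'-type class and one 'server'-type class.
dependencies = {
    "Controller": {"Parser", "Logger", "Store"},
    "Parser": {"Logger"},
    "Store": {"Logger"},
    "Logger": set(),
}
metrics = fan_metrics(dependencies)
```

Here `Controller` has high fan-out and no fan-in, so it behaves like a 'client'-type class, and `Logger` has high fan-in and no fan-out, so it behaves like a 'server'-type class. Running the same count over successive versions of a system would give the kind of longitudinal series the abstract describes.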


MOSQITO: an open-source and free toolbox for sound quality metrics in the industry and education

INTER-NOISE and NOISE-CON Congress and Conference Proceedings ◽

10.3397/in-2021-1767 ◽

2021 ◽

Vol 263 (5) ◽

pp. 1164-1175

Author(s):

Roberto San Millán-Castillo ◽

Eduardo Latorre-Iglesias ◽

Martin Glesser ◽

Salomé Wanty ◽

Daniel Jiménez-Caminero ◽

...

Keyword(s):

Open Source ◽

Programming Languages ◽

Open Source Software ◽

Objective Assessment ◽

Sound Quality ◽

Quality Metrics ◽

Point Of View ◽

Wide Range ◽

Efficient Learning ◽

High Level

Sound quality metrics provide an objective assessment of the psychoacoustics of sounds. A wide range of metrics has already been standardised, while others remain active research topics. Calculation algorithms are available in commercial equipment or Matlab scripts. However, they may not make data on general documentation and validation procedures available. Moreover, the use of these tools might be unaffordable for some students and independent researchers. In recent years, the scientific and technical community has been developing countless open-source software projects in several knowledge fields. The permission to use, study, modify, improve and distribute open-source software makes it extremely valuable. It encourages collaboration and sharing, and thus transparency and continuous improvement of the code. The Modular Sound Quality Integrated Toolbox (MOSQITO) project relies on one of the most popular high-level and free programming languages: Python. The main objective of MOSQITO is to provide a unified and modular framework of key sound quality and psychoacoustics metrics, free and open-source, which supports reproducible testing. Moreover, open-source projects can be efficient learning tools in university degree programmes. This paper presents the current structure of the toolbox from a technical point of view. It also discusses the contribution of open-source development to the training of graduates.


Open Source Software Development Challenges

Research Anthology on Usage and Development of Open Source Software ◽

10.4018/978-1-7998-9158-1.ch003 ◽

2021 ◽

pp. 33-62

Author(s):

Abdulkadir Seker ◽

Banu Diri ◽

Halil Arslan ◽

Mehmet Fatih Amasyalı

Keyword(s):

Literature Review ◽

Software Development ◽

Open Source ◽

Systematic Literature Review ◽

Open Source Software ◽

Digital Libraries ◽

Common Code ◽

Wide Range ◽

Data Source ◽

Source Study

GitHub is the most common code hosting and repository service for open-source software (OSS) projects. Thanks to its great variety of features, researchers use GitHub to address a wide range of OSS development challenges. In this context, the authors thought that it was important to conduct a literature review of studies that used GitHub data. To reach these studies, they based the review on a GitHub dataset source study instead of a keyword-based search in digital libraries. Since GHTorrent is the most widely known GitHub dataset according to the literature, they considered the studies that cite this dataset for the systematic literature review. They reviewed the 172 selected studies that used the dataset as a data source according to a set of criteria. They classified them within the scope of OSS development challenges using information extracted from the metadata of the studies. They raise some issues about the dataset and present the fields that have attracted the most attention, along with open challenges that they encourage researchers to investigate.


The Role of Free/Libre and Open Source Software in Learning Health Systems

Yearbook of Medical Informatics ◽

10.1055/s-0037-1606527 ◽

2017 ◽

Vol 26 (01) ◽

pp. 53-58

Author(s):

C. Paton ◽

T. Karopka

Keyword(s):

Open Source ◽

Health Systems ◽

Open Source Software ◽

Grey Literature ◽

Critical Role ◽

Patient Data ◽

It Infrastructure ◽

Management Tools ◽

Wide Range

Objective: To give an overview of the role of Free/Libre and Open Source Software (FLOSS) in the context of secondary use of patient data to enable Learning Health Systems (LHSs). Methods: We conducted an environmental scan of the academic and grey literature utilising the MedFLOSS database of open source systems in healthcare to inform a discussion of the role of open source in developing LHSs that reuse patient data for research and quality improvement. Results: A wide range of FLOSS is identified that contributes to the information technology (IT) infrastructure of LHSs including operating systems, databases, frameworks, interoperability software, and mobile and web apps. The recent literature around the development and use of key clinical data management tools is also reviewed. Conclusions: FLOSS already plays a critical role in modern health IT infrastructure for the collection, storage, and analysis of patient data. The nature of FLOSS systems to be collaborative, modular, and modifiable may make open source approaches appropriate for building the digital infrastructure for an LHS.


The Neyman-Rubin Model of Causal Inference and Estimation Via Matching Methods

10.1093/oxfordhb/9780199286546.003.0011 ◽

2009 ◽

Cited By ~ 11

Author(s):

Jasjeet Sekhon

Keyword(s):

Machine Learning ◽

Causal Inference ◽

Observational Data ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Conditional Probabilities ◽

Causal Inferences ◽

Matching Problem ◽

Matching Methods ◽

Causal Mechanisms

This article presents a detailed discussion of the Neyman-Rubin model of causal inference. Additionally, it describes under what conditions ‘matching’ approaches can lead to valid inferences, and what kinds of compromises sometimes have to be made with respect to generalizability to ensure valid causal inferences. Moreover, the article summarizes Mill's first three canons and shows the importance of taking chance into account and comparing conditional probabilities when chance variations cannot be ignored. The significance of searching for causal mechanisms is often overestimated by political scientists and this sometimes leads to an underestimate of the importance of comparing conditional probabilities. The search for causal mechanisms is probably especially useful when working with observational data. Machine learning algorithms can be applied to the matching problem.


Karawun: assisting evaluation of advances in multimodal imaging for neurosurgical planning and intraoperative neuronavigation

10.1101/2021.09.09.21262253 ◽

2021 ◽

Author(s):

Richard Beare ◽

Bonnie Alexander ◽

Aaron Warren ◽

Michael Kean ◽

Marc Seal ◽

...

Keyword(s):

Open Source ◽

Open Source Software ◽

Image Guidance ◽

Image Data ◽

Software Tool ◽

Neurosurgical Planning ◽

Wide Range ◽

Imaging Research ◽

The Impact ◽

Integrate Research

Submitted to Magnetic Resonance in Medicine. Purpose: To introduce a tool allowing neurosurgeons to evaluate the results of research tractography workflows for presurgical planning and intraoperative image-guidance, using standard neurosurgical navigation platforms. Theory and Methods: Improving communication between neurosurgeons and researchers developing new image acquisition and processing methods is critical for rapid translation of research to surgical practice. Presenting research outputs within existing clinical workflows is one approach that can assist such interdisciplinary communication. Neurosurgical navigation platforms can display and manipulate a wide range of medical image data and associated delineations and thus allow clinicians to evaluate the impact of new imaging research on their work. Currently, it is extremely difficult to integrate research-based image processing outputs into standard neurosurgical navigation platforms. Results: In this note we introduce Karawun, an open-source software tool for converting outputs from research imaging pipelines, especially diffusion MRI tractography reconstructions using advanced methodologies currently unavailable on commercial navigation platforms, into forms that can be imported into the Brainlab neurosurgical navigation platform (Brainlab AG, Munich, Germany). The externally created tractography images and delineations can be viewed and manipulated as if they were created by Brainlab. We illustrate how two surgical workups, created using open-source tools and different processing choices, can be presented to the neurosurgeon who can evaluate the impact of the differences between the two workups on surgical decisions. Conclusion: Karawun allows researchers developing novel imaging methodologies to display their results in environments that are familiar to clinical end-users, especially neurosurgeons, thus assisting translation of research into clinical practice.
