An Introduction to Bayesian Inference via Variational Approximations

2011 ◽  
Vol 19 (1) ◽  
pp. 32-47 ◽  
Author(s):  
Justin Grimmer

Markov chain Monte Carlo (MCMC) methods have facilitated an explosion of interest in Bayesian methods. MCMC is an incredibly useful and important tool but can face difficulties when used to estimate complex posteriors or models applied to large data sets. In this paper, we show how a recently developed tool in computer science for fitting Bayesian models, variational approximations, can be used to facilitate the application of Bayesian models to political science data. Variational approximations are often much faster than MCMC for fully Bayesian inference, and in some instances they make it feasible to estimate models that would otherwise be impossible to estimate. As a deterministic posterior approximation method, variational approximations are guaranteed to converge, and convergence is easily assessed. But variational approximations do have some limitations, which we detail below. Therefore, variational approximations are best suited to problems where fully Bayesian inference would otherwise be impossible. Through a series of examples, we demonstrate how variational approximations are useful for a variety of political science research, including models to describe legislative voting blocs and statistical models for political texts. The code that implements the models in this paper is available in the supplementary material.
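As a minimal illustration of the deterministic updates behind variational approximations (not the voting-bloc or text models from the paper), the sketch below runs coordinate-ascent mean-field variational inference (CAVI) on a toy conjugate model: a Gaussian with unknown mean and precision. The priors and variable names are illustrative assumptions.

```python
import numpy as np

# Mean-field VI (CAVI) for a Gaussian with unknown mean mu and precision tau.
# Model (a standard conjugate setup, not the paper's models):
#   x_i ~ N(mu, 1/tau),  mu | tau ~ N(mu0, 1/(lam0*tau)),  tau ~ Gamma(a0, b0)
# Factorized posterior q(mu, tau) = q(mu) q(tau) with closed-form updates.

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)     # synthetic data
N, xbar, x2sum = len(x), x.mean(), np.sum(x**2)

mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0           # weak priors (assumed)
E_tau = a0 / b0                                  # initial guess for E[tau]

for _ in range(50):                              # CAVI iterations
    # q(mu) = N(mu_N, 1/lam_N)
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    E_mu, E_mu2 = mu_N, mu_N**2 + 1.0 / lam_N
    # q(tau) = Gamma(a_N, b_N)
    a_N = a0 + (N + 1) / 2
    b_N = b0 + 0.5 * (x2sum - 2 * xbar * N * E_mu + N * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_N / b_N

print(f"E[mu] = {mu_N:.3f}, approx posterior sd = {np.sqrt(b_N / a_N):.3f}")
```

Because every update is deterministic, convergence can be assessed simply by checking that the variational parameters have stopped changing, which is the property the abstract highlights.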

BioScience ◽  
2020 ◽  
Author(s):  
Corey T Callaghan ◽  
Alistair G B Poore ◽  
Thomas Mesaglio ◽  
Angela T Moles ◽  
Shinichi Nakagawa ◽  
...  

Abstract Citizen science is fundamentally shifting the future of biodiversity research. Yet although citizen science observations contribute an increasingly large proportion of biodiversity data, they feature in only a relatively small percentage of research papers on biodiversity. We provide our perspective on three frontiers of citizen science research, areas that have to date received minimal scientific exploration but that we believe deserve greater attention because they present substantial opportunities for the future of biodiversity research: sampling the undersampled, capitalizing on citizen science's unique ability to sample poorly sampled taxa and regions of the world and thereby reducing taxonomic and spatial biases in global biodiversity data sets; estimating abundance and density in space and time, developing techniques to derive taxon-specific densities from presence/absence and presence-only data; and capitalizing on secondary data collection, moving beyond data on the occurrence of single species to gain further understanding of ecological interactions among species or habitats. The contribution of citizen science to understanding the important biodiversity questions of our time should be more fully realized.


1990 ◽  
Vol 2 ◽  
pp. 153-171 ◽  
Author(s):  
Michael S. Lewis-Beck ◽  
Andrew Skalaban

In political science research these days, the R² is out of fashion. A chorus of our best methodologists sounds notes of caution, at varying degrees of pitch. Berry and Feldman (1985, 15) remark in their popular regression monograph: "A researcher should be careful to recognize the limitations of R² as a measure of goodness of fit." In their more general statistics text, Hanushek and Jackson (1977, 59) claim that "one must be extremely cautious in interpreting the R² value for an estimation and particularly in comparing R² values for models that have been estimated with different data sets." Perhaps the most pointed attack comes from Achen (1982, 61), who argues that the R² "measures nothing of serious importance." His contention is that it should be abandoned and the standard error of the regression (SEE) substituted as a goodness-of-fit measure. Developing these lines of inquiry further, King (1986) provides the latest set of criticisms. Accordingly, "In most practical political science situations, it makes little sense to use [the R²]" (King 1986, 669). And, concerning the "proportion of variance explained" definition more particularly, "it is not clear how this interpretation adds meaning to political analyses" (King 1986, 678).
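To make the two goodness-of-fit measures under debate concrete, here is a minimal sketch (synthetic data, not from the article) computing both R² and the SEE for an ordinary least squares fit. Note that the SEE is expressed in the units of the dependent variable, which underlies Achen's preference for it.

```python
import numpy as np

# Fit y = b0 + b1*x by OLS on synthetic data, then compute the two
# competing goodness-of-fit measures discussed above.

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 0.5 * x + rng.normal(scale=2.0, size=100)

X = np.column_stack([np.ones_like(x), x])        # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS estimates
resid = y - X @ beta

n, k = X.shape
r2 = 1 - resid @ resid / np.sum((y - y.mean())**2)  # proportion of variance "explained"
see = np.sqrt(resid @ resid / (n - k))              # typical prediction error, in units of y

print(f"R^2 = {r2:.3f}, SEE = {see:.3f}")
```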


2012 ◽  
Vol 24 (6) ◽  
pp. 1462-1486 ◽  
Author(s):  
Ke Yuan ◽  
Mark Girolami ◽  
Mahesan Niranjan

This letter considers how a number of modern Markov chain Monte Carlo (MCMC) methods can be applied for parameter estimation and inference in state-space models with point process observations. We quantified the efficiencies of these MCMC methods on synthetic data, and our results suggest that the Riemannian manifold Hamiltonian Monte Carlo method offers the best performance. We further compared this method with a previously tested variational Bayes method on two experimental data sets. Results indicate similar performance on the large data sets and superior performance on small ones. The work offers an extensive suite of MCMC algorithms evaluated on an important class of models for physiological signal analysis.
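For readers unfamiliar with the family of samplers compared in the letter, the sketch below implements plain Hamiltonian Monte Carlo with a leapfrog integrator on a toy Gaussian target. It is a deliberate simplification: the Riemannian-manifold variant the letter favors additionally adapts a position-dependent metric (mass matrix) to the local geometry of the posterior.

```python
import numpy as np

def hmc(log_prob, grad_log_prob, x0, n_samples=1000, eps=0.1, n_leap=20, seed=0):
    """Basic HMC: leapfrog trajectory plus a Metropolis accept/reject step."""
    rng = np.random.default_rng(seed)
    x, samples = np.asarray(x0, float), []
    for _ in range(n_samples):
        p = rng.normal(size=x.shape)                 # resample momentum
        x_new, p_new = x.copy(), p.copy()
        p_new += 0.5 * eps * grad_log_prob(x_new)    # leapfrog: half momentum step
        for _ in range(n_leap - 1):
            x_new += eps * p_new
            p_new += eps * grad_log_prob(x_new)
        x_new += eps * p_new
        p_new += 0.5 * eps * grad_log_prob(x_new)    # final half momentum step
        # accept/reject on the joint (position, momentum) energy
        log_accept = (log_prob(x_new) - 0.5 * p_new @ p_new) \
                   - (log_prob(x) - 0.5 * p @ p)
        if np.log(rng.uniform()) < log_accept:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Toy target: standard bivariate Gaussian
log_prob = lambda x: -0.5 * x @ x
grad_log_prob = lambda x: -x
draws = hmc(log_prob, grad_log_prob, x0=np.zeros(2))
print(draws.mean(axis=0), draws.std(axis=0))
```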


2021 ◽  
Author(s):  
Ijay Ushaka

Theory in Information Systems (IS) is very important to the development of the field. Theory building and theory testing seek to accumulate knowledge about the relationships between people and technology. Testing theory can be difficult to accomplish, especially when it involves humans, a diversity of methods and sources, multiple experiments, large data sets, and careful tuning of conditions and instruments. Crowdsourcing is a strategy for distributing activities to crowd workers, which suggests that it may be used to support theory testing. This exploratory study seeks to analyse the adoption of crowdsourcing in theory testing and to develop guidance for researchers to instantiate the strategy in their research projects. The study adopts the design science research paradigm to explore incorporating the crowdsourcing strategy in theory testing and to evaluate its viability and utility. Following the principles of design science research, the study is structured around the construction of several interconnected IS artefacts: 1) a conceptual framework articulating the main principles of theory testing; 2) a pattern model of theory testing, which codifies existing research approaches to theory testing; and 3) a decision tool, which codifies guidelines for researchers deciding which research activities to crowdsource. To build the conceptual framework and pattern model, the study conducts a systematic review of theory testing in the IS domain. Both the conceptual framework and the pattern model are then operationalized in the decision tool. The utility of the various artefacts is then assessed with the participation of research practitioners. This study is relevant because it synthesizes knowledge about theory testing, builds innovative artefacts supporting the adoption of crowdsourcing in theory testing, helps academic researchers understand the theory testing process, and enables them to adopt crowdsourcing for theory testing.


Author(s):  
Anthony Scime ◽  
Gregg R. Murray

Social scientists address some of the most pressing issues of society, such as health and wellness, government processes and citizen reactions, individual and collective knowledge, working conditions and socio-economic processes, and societal peace and violence. In an effort to understand these and many other consequential issues, social scientists invest substantial resources to collect large quantities of data, much of which are not fully explored. This chapter proffers the argument that privacy protection and responsible use are not the only ethical considerations related to mining social data. Given (1) the substantial resources allocated and (2) the leverage these "big data" give on such weighty issues, this chapter suggests that social scientists are ethically obligated to conduct comprehensive analyses of their data. Data mining techniques provide pertinent tools for identifying attributes in large data sets that may be useful for addressing important issues in the social sciences. By using these comprehensive analytical processes, a researcher may discover a set of attributes that is useful for making behavioral predictions, validating social science theories, and creating rules for understanding behavior in social domains. Taken together, these attributes and their values often reveal previously unknown knowledge that may have important applied and theoretical consequences for a domain, social scientific or otherwise. The chapter concludes with examples of important social problems studied using various data mining methodologies, along with a discussion of the ethical concerns they raise.
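As one hedged illustration of the attribute-level exploration the chapter advocates (not a method taken from the chapter itself), the following sketch assumes scikit-learn and ranks which attributes of a synthetic data set carry predictive signal using a tree ensemble. The data set and importance threshold are invented for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Fit a tree ensemble to a synthetic "behavioral" data set and rank the
# attributes by predictive importance, as a starting point for theory work.
X, y = make_classification(n_samples=2000, n_features=25,
                           n_informative=5, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

ranked = sorted(enumerate(model.feature_importances_),
                key=lambda t: t[1], reverse=True)
for idx, imp in ranked[:5]:                      # top 5 candidate attributes
    print(f"attribute {idx}: importance {imp:.3f}")
```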


2014 ◽  
Vol 22 (2) ◽  
pp. 224-242 ◽  
Author(s):  
Vito D'Orazio ◽  
Steven T. Landis ◽  
Glenn Palmer ◽  
Philip Schrodt

Due in large part to the proliferation of digitized text, much of it available for little or no cost from the Internet, political science research has experienced a substantial increase in the number of data sets and large-n research initiatives. As the ability to collect detailed information on events of interest expands, so does the need to efficiently sort through the volumes of available information. Automated document classification presents a particularly attractive methodology for accomplishing this task: it is efficient, widely applicable to a variety of data collection efforts, and considerably flexible in tailoring its application to specific research needs. This article offers a holistic review of the application of automated document classification for data collection in political science research by discussing the process in its entirety. We argue that a two-stage support vector machine (SVM) classification process offers advantages over other well-known alternatives, because SVMs are discriminative classifiers that effectively address two primary attributes of textual data: high dimensionality and extreme sparseness. Evidence for this claim is presented through a discussion of the efficiency gains derived from using automated document classification on the Militarized Interstate Dispute 4 (MID4) data collection project.
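Below is a minimal sketch of a two-stage SVM classifier in the spirit described above, assuming scikit-learn; the tiny corpus, labels, and category names are invented for illustration and are not MID4 data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Stage 1 separates relevant from irrelevant documents; stage 2 assigns
# the relevant documents to finer-grained categories.
docs = ["troops exchanged fire at the border",
        "parliament passed a new budget",
        "naval vessels seized a fishing boat",
        "the minister opened a trade fair"]
relevant = [1, 0, 1, 0]                          # stage-1 labels: dispute-related?

stage1 = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(docs, relevant)

rel_docs = [d for d, r in zip(docs, relevant) if r]
dispute_type = ["use of force", "seizure"]       # stage-2 labels (hypothetical)
stage2 = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(rel_docs, dispute_type)

new = ["troops exchanged fire near the frontier"]
if stage1.predict(new)[0]:                       # only classify relevant documents
    print("dispute type:", stage2.predict(new)[0])
```

Training the second stage only on documents the first stage deems relevant is what makes the pipeline efficient: the expensive fine-grained coding is never applied to the (typically much larger) irrelevant pool.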


2020 ◽  
Vol 30 (5) ◽  
pp. 374-381 ◽  
Author(s):  
Benjamin J. Narang ◽  
Greg Atkinson ◽  
Javier T. Gonzalez ◽  
James A. Betts

The analysis of time series data is common in nutrition and metabolism research for quantifying the physiological responses to various stimuli. Reducing the many data points of a time series to one or more summary statistics can help quantify and communicate the overall response in a more straightforward way and in line with a specific hypothesis. Nevertheless, researchers have adopted many different summary statistics, and some approaches remain complex. The time-intensive nature of such calculations can be a burden, especially for large data sets, and may therefore introduce computational errors that are difficult to recognize and correct. In this short commentary, the authors introduce a newly developed tool that automates many of the processes commonly used by researchers for discrete time series analysis, with particular emphasis on how the tool may be implemented within nutrition and exercise science research.
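As a concrete example of the kind of discrete time series reduction being automated, the sketch below computes total and incremental area under the curve (AUC and iAUC) for a hypothetical postprandial glucose response using the trapezoid rule. The data and the pointwise-clipping convention for iAUC are illustrative assumptions, not the tool's actual implementation.

```python
import numpy as np

def trapz(y, x):
    """Trapezoid-rule area under y(x)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

t = np.array([0, 15, 30, 45, 60, 90, 120])                # minutes after a meal
glucose = np.array([5.0, 6.8, 7.9, 7.4, 6.6, 5.9, 5.2])   # mmol/L (hypothetical)

auc = trapz(glucose, t)                                   # total AUC
iauc = trapz(np.clip(glucose - glucose[0], 0, None), t)   # area above baseline only

print(f"AUC = {auc:.1f}, iAUC = {iauc:.1f} mmol/L*min")
```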


Author(s):  
John A. Hunt

Spectrum-imaging is a useful technique for comparing different processing methods on very large data sets that are identical for each method. This paper is concerned with comparing methods of electron energy-loss spectroscopy (EELS) quantitative analysis on the Al-Li system. The spectrum-image analyzed here was obtained from an Al-10 at% Li foil aged to produce δ' precipitates that can span the foil thickness. Two 1024-channel EELS spectra offset in energy by 1 eV were recorded and stored at each pixel in the 80x80 spectrum-image (25 Mbytes). An energy range of 39-89 eV (20 channels/eV) is represented. During processing, the spectra are either subtracted to create an artifact-corrected difference spectrum, or the energy offset is numerically removed and the spectra are added to create a normal spectrum. The spectrum-images are processed into 2D floating-point images using methods and software described in [1].
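The two per-pixel reductions described above can be sketched in a few lines of NumPy. The arrays below are synthetic and smaller than the original 80x80 spectrum-image, but the subtraction and shift-and-add steps follow the description (a 1 eV offset corresponds to 20 channels at 20 channels/eV).

```python
import numpy as np

# Synthetic spectrum-image: each pixel holds two 1024-channel EELS spectra
# offset in energy by 1 eV (= 20 channels). Shapes are illustrative only.
nx, ny, nch, shift = 8, 8, 1024, 20
rng = np.random.default_rng(0)
spec_a = rng.poisson(100.0, size=(nx, ny, nch)).astype(float)
spec_b = rng.poisson(100.0, size=(nx, ny, nch)).astype(float)

# (1) artifact-corrected difference spectrum: subtract the offset pair so
# that fixed-pattern channel-to-channel artifacts cancel
diff_image = spec_a - spec_b

# (2) normal spectrum: numerically remove the 1 eV offset, then add
normal_image = spec_a[..., :-shift] + spec_b[..., shift:]

print(diff_image.shape, normal_image.shape)
```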

