Data Generation Process Modeling for Activity Recognition

This article examines what scholars can learn about civilian killings from newswire data in situations of non-random missingness. It contributes to this understanding by offering a unique view of the data-generation process in the South Sudanese civil war. Drawing on 40 hours of interviews with 32 human rights advocates, humanitarian workers, and journalists who produce the source data for ACLED and UCDP-GED, the article illustrates how non-random missingness leads to biases of inconsistent magnitude and direction. The article finds that newswire data for contexts like South Sudan suffer from a self-fulfilling narrative bias, where journalists select stories and human rights investigators target incidents that conform to international views of what a conflict is about. This is compounded by the way agencies allocate resources to monitor specific locations and types of violence to fit strategic priorities. These biases have two implications: first, in the most volatile conflicts, point estimates about violence using newswire data may be impossible, and most claims of precision may be false; second, body counts reveal little if divorced from circumstance. The article presents a challenge to political methodologists by asking whether social scientists can build better cross-national fatality measures given the biases inherent in the data-generation process.


Towards Characterization of the Data Generation Process

Innovative Applications in Data Mining - Studies in Computational Intelligence ◽

10.1007/978-3-540-88045-5_5 ◽

2009 ◽

pp. 83-105

Author(s):

Vasudha Bhatnagar ◽

Sarabjeet Kochhar

Keyword(s):

Generation Process ◽

Data Generation ◽

Data Generation Process


Beyond Mining: Characterizing the Data Generation Process

Seventh International Conference on Intelligent Systems Design and Applications (ISDA 2007) ◽

10.1109/isda.2007.140 ◽

2007 ◽

Cited By ~ 1

Author(s):

Vasudha Bhatnagar ◽

Sarabjeet Kochhar

Keyword(s):

Generation Process ◽

Data Generation ◽

Data Generation Process


Beyond Mining: Characterizing the Data Generation Process

Seventh International Conference on Intelligent Systems Design and Applications (ISDA 2007) ◽

10.1109/isda.2007.4389656 ◽

2007 ◽

Author(s):

Vasudha Bhatnagar ◽

Sarabjeet Kochhar

Keyword(s):

Generation Process ◽

Data Generation ◽

Data Generation Process


A survey of methodologies on causal inference methods in meta-analyses of randomized controlled trials

Systematic Reviews ◽

10.1186/s13643-021-01726-1 ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Georgios Markozannes ◽

Georgia Vourli ◽

Evangelia Ntzani

Keyword(s):

Causal Inference ◽

Randomized Controlled Trials ◽

Meta Analysis ◽

Controlled Trials ◽

Generation Process ◽

Data Generation ◽

Randomized Controlled ◽

Data Generation Process ◽

Meta Analyses ◽

Inference Methods

Abstract. Background: Meta-analyses of randomized controlled trials (RCTs) have been considered the highest level of evidence in the pyramid of evidence-based medicine. However, the causal interpretation of such results is seldom studied. Methods: We systematically searched for methodologies pertaining to the implementation of a causally explicit framework for meta-analysis of randomized controlled trials and discussed the interpretation and scientific relevance of such causal estimands. We performed a systematic search in four databases to identify relevant methodologies, supplemented with hand-search. We included methodologies that described causality under the counterfactual and potential outcomes framework. Results: We identified only three efforts explicitly describing a causal framework for meta-analysis of RCTs. Two approaches required individual participant data, while the third required only summary data. All three approaches presented a sufficient framework under which a meta-analytical estimate is identifiable and estimable. However, several conceptual limitations remain, mainly in regard to the data generation process under which the selected RCTs arise. Conclusions: We undertook a review of methodologies on causal inference methods in meta-analyses. Although all identified methodologies provide valid causal estimates, there are limitations in the assumptions regarding the data generation process and the sampling of the potential RCTs to be included in the meta-analysis, which pose challenges to the interpretation and scientific relevance of the identified causal effects. Although both causal inference and meta-analysis have been extensively studied in the literature, few efforts have combined the two frameworks.


A living journals approach for the remote study of young children’s digital practices in Azerbaijan

Global Studies of Childhood ◽

10.1177/20436106211034179 ◽

2021 ◽

pp. 204361062110341

Author(s):

Sabina Savadova

Keyword(s):

Digital Media ◽

Digital Technologies ◽

Smartphone Application ◽

Generation Process ◽

Data Generation ◽

Data Generation Process ◽

Personal Nature ◽

Mothers And Children ◽

Rich Data ◽

Empirical Contribution

This article proposes the living journals method for remotely studying participants, elevating participant agency in the data generation process and minimising or completely removing the need for a researcher to be physically present in the field. Employing this method, the paper describes how the method was used to explore 5-year-old children’s digital practices in five families in Azerbaijan. Mothers were assigned as ‘proxy’ researchers to generate the data following prompts sent through a smartphone application. Mothers’ answers were used to create journals, and subsequently, fathers separately, and mothers and children together were requested to interpret their own journals and those of other participant children. Allowing other families to comment on one another’s journals further revealed their attitudes towards using digital technologies and enriched the data, emphasising its multivocality and metatextuality. The article describes the living journals method in detail, highlighting its affordances for researchers to generate data from a distance in other contexts. The article also discusses the methodological and empirical contribution of the method to this study about young children’s engagements with digital media at home. By decentring the researcher in the data generation process, the method allows researchers to generate both visually and textually complex and rich data. The visual and personal nature of the method goes beyond text-based research accounts to bring the data to life, allowing the researcher to generate multimodal, multivocal, metatextual and multifunctional data.


PORTUGUESE STOCK MARKET: A LONG-MEMORY PROCESS?

Verslas teorija ir praktika ◽

10.3846/btp.2011.08 ◽

2011 ◽

Vol 12 (1) ◽

pp. 75-84 ◽

Cited By ~ 2

Author(s):

Sameer Rege ◽

Samuel Gil Martín

Keyword(s):

Stock Market ◽

Stock Prices ◽

Long Memory ◽

Hurst Exponent ◽

Linear Trend ◽

Fluctuation Analysis ◽

Generation Process ◽

Data Generation ◽

Data Generation Process ◽

Daily Returns

This paper gives a basic overview of the various attempts at modelling stochastic processes for stock markets, with a specific application to Portuguese stock market data. Long-memory dependence in stock prices would completely alter the data generation process, and econometric models that do not consider long-range dependence would exhibit poor forecasting ability. The Hurst exponent is used to identify the presence of long-memory or fractal behaviour in the data generation process for the daily returns, to ascertain whether the process follows a fractional Brownian motion. Detrended fluctuation analysis (DFA) using linear and quadratic trends and the Geweke-Porter-Hudak method are applied to detect the presence of long memory or persistence. We find that the daily returns exhibit a small amount of long memory and that the quadratic trend used in the DFA overestimates the value of the Hurst exponent. These findings are corroborated by the Geweke-Porter-Hudak method, wherein the Hurst exponent is close to the DFA estimate using the linear trend.
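
The detrended fluctuation analysis described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `dfa_exponent`, the choice of window sizes, and the synthetic white-noise input are all hypothetical, and the Portuguese market data are not used. The `order` argument switches between a linear (1) and a quadratic (2) local trend, mirroring the two variants the abstract compares. For uncorrelated returns the estimated exponent is expected to lie near 0.5, and values above 0.5 would suggest persistence.

```python
import numpy as np


def dfa_exponent(returns, scales, order=1):
    """Estimate the DFA scaling exponent of a series of returns.

    returns: 1-D sequence of returns (assumed stationary).
    scales:  window sizes in samples (an assumption; the abstract does not state them).
    order:   degree of the local polynomial trend (1 = linear, 2 = quadratic).
    """
    x = np.asarray(returns, dtype=float)
    # Step 1: integrate the mean-centred series to obtain the "profile".
    profile = np.cumsum(x - x.mean())

    fluctuations = []
    for n in scales:
        # Step 2: cut the profile into non-overlapping windows of length n.
        n_windows = len(profile) // n
        segments = profile[: n_windows * n].reshape(n_windows, n).T  # shape (n, n_windows)
        t = np.arange(n)
        # Step 3: fit a polynomial trend in every window at once
        # (np.polyfit fits each column of a 2-D y independently).
        coeffs = np.polyfit(t, segments, order)      # shape (order + 1, n_windows)
        trend = np.vander(t, order + 1) @ coeffs     # shape (n, n_windows)
        # Step 4: root-mean-square residual around the local trends.
        fluctuations.append(np.sqrt(np.mean((segments - trend) ** 2)))

    # Step 5: the exponent is the slope of log F(n) against log n.
    slope, _intercept = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return slope


# Usage on synthetic, uncorrelated "returns" (illustrative data only).
rng = np.random.default_rng(0)
synthetic_returns = rng.standard_normal(4096)
scales = [16, 32, 64, 128, 256]
print("linear trend:   ", dfa_exponent(synthetic_returns, scales, order=1))
print("quadratic trend:", dfa_exponent(synthetic_returns, scales, order=2))
```

Running both orders on the same series gives a simple way to see how sensitive the estimate is to the detrending choice, which is the comparison the abstract reports.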


AN ATTEMPT TO AUTOMATION OF TLS DATA ACQUISITION AND PROCESSING IN THE VECTOR DATA GENERATION PROCESS

16th International Multidisciplinary Scientific GeoConference SGEM2016, Informatics, Geoinformatics and Remote Sensing ◽

10.5593/sgem2016/b22/s10.103 ◽

2016 ◽

Author(s):

Anna Adamek

Keyword(s):

Data Acquisition ◽

Generation Process ◽

Data Generation ◽

Vector Data ◽

Data Generation Process ◽

Data Acquisition And Processing


Doing Research With Children: Making Choices on Ethics and Methodology That Encourage Children’s Participation

Journal of Childhood Studies ◽

10.18357/jcs442201919059 ◽

2019 ◽

pp. 39-50 ◽

Cited By ~ 1

Author(s):

Krystallia Kyritsi

Keyword(s):

Informed Consent ◽

Research Process ◽

Generation Process ◽

Data Generation ◽

Cultural Artifacts ◽

Methodological Choices ◽

Interview Process ◽

Data Generation Process ◽

Research With Children ◽

Making Choices

The aim of this paper is to discuss examples of ethical and methodological choices that respect children’s rights to participation by encouraging them to be actively involved in the data generation process. The paper introduces the boxes, a model for confidentially obtaining ongoing and informed consent. It also discusses the use of cultural artifacts, chosen by the children themselves, to communicate with the researcher during the interview process. This paper concludes by emphasizing the need to design and cocreate open, flexible approaches in research that encourage children to obtain control and ownership of the research process.


Human Activity Recognition: A Dynamic Inductive Bias Selection Perspective

Sensors ◽

10.3390/s21217278 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7278

Author(s):

Massinissa Hamidi ◽

Aomar Osmani

Keyword(s):

Activity Recognition ◽

Sensor Data ◽

Transmission Protocol ◽

Generation Process ◽

Data Generation ◽

Stochastic Effects ◽

Substantial Impact ◽

Trade Offs ◽

Bias Selection

In this article, we study activity recognition in the context of sensor-rich environments. In these environments, many different constraints arise at various levels during the data generation process, such as the intrinsic characteristics of the sensing devices, their energy and computational constraints, and their collective (collaborative) dimension. These constraints have a fundamental impact on the final activity recognition models, as the quality of the data, its availability, and its reliability, among other things, are not ensured during model deployment in real-world configurations. Current approaches for activity recognition rely on the activity recognition chain, which defines several steps that the sensed data undergo. This is an inductive process that involves exploring a hypothesis space to find a theory able to explain the observations. For activity recognition to be effective and robust, this inductive process must consider the constraints at all levels and model them explicitly. Whether a bias is related to sensor measurement, transmission protocol, sensor deployment topology, heterogeneity, dynamicity, or stochastic effects, it is essential to understand its substantial impact on the quality of the data and ultimately on activity recognition models. This study highlights the need to exhibit the different types of biases arising in real situations so that machine learning models can, for example, adapt to the dynamicity of these environments, resist sensor failures, and follow the evolution of the sensors’ topology. We propose a metamodeling approach in which these biases are specified as hyperparameters that can control the structure of the activity recognition models. Via these hyperparameters, it becomes easier to optimize the inductive processes, reason about them, and incorporate additional knowledge. The approach also provides a principled strategy to adapt the models to the evolutions of the environment. We illustrate our approach on the SHL dataset, which features motion sensor data for a set of human activities collected in real conditions. The obtained results make a case for the proposed metamodeling approach, notably the robustness gains achieved when the deployed models are confronted with the evolution of the initial sensing configurations. The trade-offs exhibited and the broader implications of the proposed approach are discussed along with alternative techniques to encode and incorporate knowledge into activity recognition models.
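
One way to picture biases acting as hyperparameters that control model structure is the toy sketch below. It is only an illustration under loud assumptions and does not reproduce the authors' metamodel: the names `SensingBias`, `build_model_spec`, and `drop_sensor`, the sensor labels, the sampling rate, and the two fusion modes are all hypothetical. The sketch derives a model layout (which inputs feed which branch, and the window length) from a description of the sensing configuration, and re-derives it when a sensor fails.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SensingBias:
    """Hypothetical hyperparameters describing the sensing configuration."""
    sensors: tuple                 # names of the sensors currently available
    sampling_hz: int = 100         # assumed sampling rate
    window_seconds: float = 2.0    # assumed segmentation window
    fusion: str = "late"           # "early": one shared branch, "late": one branch per sensor


def build_model_spec(bias):
    """Derive a model structure (as a plain dict) from the bias hyperparameters."""
    window = int(bias.sampling_hz * bias.window_seconds)  # samples per window
    if bias.fusion == "early":
        branches = [{"inputs": list(bias.sensors), "window": window}]
    elif bias.fusion == "late":
        branches = [{"inputs": [name], "window": window} for name in bias.sensors]
    else:
        raise ValueError(f"unknown fusion mode: {bias.fusion!r}")
    return {"branches": branches, "n_branches": len(branches)}


def drop_sensor(bias, failed_sensor):
    """Return a new configuration without the failed sensor."""
    remaining = tuple(name for name in bias.sensors if name != failed_sensor)
    return replace(bias, sensors=remaining)


# Usage: the structure follows the configuration as it evolves.
initial = SensingBias(sensors=("hand", "hips", "torso"))  # placeholder sensor names
print(build_model_spec(initial)["n_branches"])
after_failure = drop_sensor(initial, "hips")
print(build_model_spec(after_failure)["branches"])
```

Keeping the configuration immutable and rebuilding the specification from it makes each structural change traceable to a change in one named hyperparameter, which is the property the abstract attributes to the metamodeling view.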
