A Two-Stage Joint Model for Nonlinear Longitudinal Response and a Time-to-Event with Application in Transplantation Studies

2012 ◽  
Vol 2012 ◽  
pp. 1-18 ◽  
Author(s):  
Magdalena Murawska ◽  
Dimitris Rizopoulos ◽  
Emmanuel Lesaffre

In transplantation studies, longitudinal measurements of important markers are often collected prior to the actual transplantation. Using only the last available measurement as a baseline covariate in a survival model for the time to graft failure discards the whole longitudinal evolution. We propose a two-stage approach that handles this type of data set using all available information. At the first stage, we summarize the longitudinal information with a nonlinear mixed-effects model, and at the second stage, we include the Empirical Bayes estimates of the subject-specific parameters as predictors in the Cox model for the time to allograft failure. To account for the fact that estimated subject-specific parameters are included in the model, we use a Monte Carlo approach and sample from the posterior distribution of the random effects given the observed data. Our proposal is exemplified by a study of the impact of renal resistance evolution on graft survival.
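The first-stage Empirical Bayes idea can be illustrated with a minimal sketch. For simplicity this assumes a linear random-intercept model rather than the paper's nonlinear one, and all variance values are hypothetical:

```python
import random
import statistics

# Hypothetical variance components for a random-intercept model:
# y_ij = mu + b_i + e_ij,  b_i ~ N(0, sigma_b^2),  e_ij ~ N(0, sigma_e^2)
MU, SIGMA_B, SIGMA_E = 10.0, 2.0, 1.0

def empirical_bayes_intercept(y_i, mu=MU, sigma_b=SIGMA_B, sigma_e=SIGMA_E):
    """Empirical Bayes estimate of b_i: the subject's raw mean deviation
    shrunk toward zero by the reliability ratio."""
    n_i = len(y_i)
    shrink = sigma_b**2 / (sigma_b**2 + sigma_e**2 / n_i)  # in (0, 1)
    return shrink * (statistics.mean(y_i) - mu)

random.seed(1)
# Simulate one subject whose true random intercept is b_i = 3
y_i = [MU + 3.0 + random.gauss(0.0, SIGMA_E) for _ in range(5)]
raw = statistics.mean(y_i) - MU
b_hat = empirical_bayes_intercept(y_i)
# The EB estimate is pulled toward zero relative to the raw deviation
print(raw, b_hat)
```

In the second stage, such estimates (or, as the paper proposes, Monte Carlo draws from the random effects' posterior, which propagate the first-stage uncertainty) would enter the Cox model as covariates.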

2013 ◽  
Vol 1 (2) ◽  
pp. 209-234 ◽  
Author(s):  
Pengyuan Wang ◽  
Mikhail Traskin ◽  
Dylan S. Small

Abstract: The before-and-after study with multiple unaffected control groups is widely applied to study treatment effects. Current methods usually assume that the control groups’ differences between the before and after periods, i.e. the group time effects, follow a normal distribution. However, there is usually no strong a priori evidence for the normality assumption, and there are not enough control groups to check it. We propose to model group time effects with a flexible skew-t distribution family and to consider a range of plausible skew-t distributions. Based on the skew-t assumption, we propose a robust-t method that guarantees the nominal significance level under a wide range of skew-t distributions, making the inference robust to misspecification of the distribution of group time effects. We also propose a two-stage approach, which has lower power than the robust-t method but provides an opportunity to conduct sensitivity analysis. The overall method of analysis is therefore to use the robust-t method to test over the hypothesized range of shapes of group variation; if the test fails to reject, use the two-stage method to conduct a sensitivity analysis to see whether there is a subset of group variation parameters for which we can be confident that there is a treatment effect. We apply the proposed methods to two datasets. One is from the Current Population Survey (CPS), used to study the impact of the Mariel Boatlift on Miami unemployment rates between 1979 and 1982. The other contains student enrollment and grade-repeating data from West Germany in the 1960s, with which we study the impact of the short school year of 1966–1967 on grade-repeating rates.
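The normal-theory baseline that the paper's robust-t method generalizes can be sketched minimally: compare the treated group's before-after change against the distribution of the control groups' changes. All numbers below are hypothetical:

```python
import math
import statistics

def normal_theory_test(treated_change, control_changes):
    """t-type statistic for whether the treated group's before-after change
    is unusual relative to the control groups' changes, assumed i.i.d. normal
    (the assumption the robust-t method relaxes via skew-t families)."""
    m = statistics.mean(control_changes)
    s = statistics.stdev(control_changes)
    n = len(control_changes)
    # Prediction-interval scaling: the treated change is a new draw
    return (treated_change - m) / (s * math.sqrt(1 + 1 / n))

# Hypothetical group time effects (before-to-after changes) for 8 controls
controls = [0.2, -0.1, 0.3, 0.0, 0.1, -0.2, 0.25, 0.05]
t_stat = normal_theory_test(2.0, controls)
print(round(t_stat, 2))
```

With few control groups, the normality of the control changes cannot be checked, which is exactly the motivation for testing over a whole range of skew-t shapes instead.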


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

Abstract: This paper presents a comprehensive evaluation of data pre-processing and word-embedding techniques in the context of Arabic document classification, in the domain of health-related communication on social media. We evaluate 26 text pre-processing techniques applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four of the 26 pre-processing techniques improve classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model, with an F1 score of 75.2% and accuracy of 90.7%, compared to an F1 score of 90.8% but lower accuracy of 70.89% achieved by Mazajak CBOW with the same architecture. Our results also show that the performance of the best traditional classifier we trained is comparable to that of the deep learning methods on the first dataset, but significantly worse on the second.
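Two of the commonly evaluated Arabic pre-processing steps, diacritics removal and orthographic normalization, can be sketched with the standard library. This is a simplified illustration, not the paper's exact pipeline of 26 techniques:

```python
import re

# Arabic tashkeel (diacritic) marks occupy U+064B..U+0652
DIACRITICS = re.compile(r"[\u064B-\u0652]")

def normalize_arabic(text):
    """Simplified Arabic normalization: strip diacritics and unify
    common orthographic variants."""
    text = DIACRITICS.sub("", text)                        # remove tashkeel
    text = re.sub("[\u0623\u0625\u0622]", "\u0627", text)  # hamza alefs -> bare alef
    text = text.replace("\u0649", "\u064A")                # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")                # ta marbuta -> ha
    return text

# "ahlan" with hamza alef and diacritics, reduced to its normalized form
print(normalize_arabic("\u0623\u064E\u0647\u0652\u0644\u0627\u064B"))
```

Normalizations like these reduce sparsity in tweet vocabulary before either TF-based classifiers or embedding lookups, which is why the choice of pre-processing interacts with the choice of pre-trained embedding.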


2021 ◽  
pp. 000276422110216
Author(s):  
Kazimierz M. Slomczynski ◽  
Irina Tomescu-Dubrow ◽  
Ilona Wysmulek

This article proposes a new approach to analyzing protest participation measured in surveys of uneven quality. Because single international survey projects cover only a fraction of the world’s nations in specific periods, researchers increasingly turn to ex-post harmonization of different survey data sets not a priori designed to be comparable. However, very few scholars systematically examine the impact of survey data quality on substantive results. We argue that variation in the source data, especially deviations from standards of survey documentation, data processing, and computer files—proposed by methodologists of Total Survey Error, Survey Quality Monitoring, and Fitness for Intended Use—is important for analyzing protest behavior. In particular, we apply the Survey Data Recycling framework to investigate the extent to which indicators of attending demonstrations and signing petitions in 1,184 national survey projects are associated with measures of data quality, controlling for variability in the questionnaire items. We demonstrate that the null hypothesis of no impact of survey quality measures on indicators of protest participation must be rejected. Measures of survey documentation, data processing, and computer records, taken together, explain over 5% of the intersurvey variance in the proportions of the population attending demonstrations or signing petitions.
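The "variance explained" quantity at the heart of the finding is an R² from regressing survey-level protest proportions on quality indicators. A minimal single-predictor sketch with hypothetical survey-level data (the paper uses multiple quality measures across 1,184 surveys):

```python
import statistics

def r_squared(x, y):
    """Share of variance in y explained by a single predictor x under
    simple OLS; equals the squared Pearson correlation."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy ** 2) / (sxx * syy)

# Hypothetical survey-level data: a documentation-quality score and the
# proportion of respondents reporting demonstration attendance
quality = [0.9, 0.8, 0.6, 0.7, 0.4, 0.5]
protest = [0.15, 0.14, 0.10, 0.12, 0.08, 0.09]
print(round(r_squared(quality, protest), 3))
```

With several quality measures, the same logic extends to the multiple-regression R², which is the form of the "over 5% of intersurvey variance" result.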


1994 ◽  
Vol 33 (04) ◽  
pp. 390-396 ◽  
Author(s):  
J. G. Stewart ◽  
W. G. Cole

Abstract: Metaphor graphics are data displays designed to look like corresponding variables in the real world, but in a non-literal sense of “look like”. The impact of these graphics on human problem solving has been evaluated twice, with conflicting results. The present experiment attempted to clarify the discrepancy between these findings by using a complex task in which expert subjects interpreted respiratory data. The metaphor graphic display led to interpretations twice as fast as a tabular (flowsheet) format, suggesting that the conflict between the earlier studies is due either to differences in training or to differences in goodness of metaphor. Findings to date indicate that metaphor graphics work with complex as well as simple data sets, with pattern-detection as well as single-number reporting tasks, and with expert as well as novice subjects.


2015 ◽  
Vol 8 (1) ◽  
pp. 421-434 ◽  
Author(s):  
M. P. Jensen ◽  
T. Toto ◽  
D. Troyan ◽  
P. E. Ciesielski ◽  
D. Holdridge ◽  
...  

Abstract. The Midlatitude Continental Convective Clouds Experiment (MC3E) took place during the spring of 2011, centered in north-central Oklahoma, USA. The main goal of this field campaign was to capture the dynamical and microphysical characteristics of precipitating convective systems in the US Central Plains. A major component of the campaign was a six-site radiosonde array designed to capture the large-scale variability of the atmospheric state, with the intent of deriving model forcing data sets. Over the course of the 46-day MC3E campaign, a total of 1362 radiosondes were launched from the enhanced sonde network. This manuscript provides details on the instrumentation used as part of the sounding array and the data processing activities, including quality checks and humidity bias corrections, along with an analysis of the impacts of bias correction and algorithm assumptions on the determination of convective levels and indices. It is found that corrections for known radiosonde humidity biases, and assumptions regarding the characteristics of the surface convective parcel, result in significant differences in the derived values of convective levels and indices in many soundings. In addition, the impact of including the humidity corrections and quality controls on the thermodynamic profiles used in the derivation of a large-scale model forcing data set is investigated. The results show a significant impact on the derived large-scale vertical velocity field, illustrating the importance of addressing these humidity biases.
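The sensitivity of derived convective levels to humidity bias can be sketched with Espy's well-known approximation of roughly 125 m of lifting condensation level (LCL) height per degree Celsius of dewpoint depression. The campaign's actual processing uses full parcel computations, and the bias factor below is hypothetical:

```python
import math

def dewpoint_c(temp_c, rh_percent):
    """Dewpoint from temperature and relative humidity (Magnus formula)."""
    a, b = 17.625, 243.04
    gamma = math.log(rh_percent / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

def lcl_height_m(temp_c, rh_percent):
    """Espy's approximation: ~125 m per degree C of dewpoint depression."""
    return 125.0 * (temp_c - dewpoint_c(temp_c, rh_percent))

# A hypothetical 4% multiplicative dry bias in reported RH, then corrected
t, rh_raw = 30.0, 60.0
rh_corrected = rh_raw * 1.04
print(round(lcl_height_m(t, rh_raw)), round(lcl_height_m(t, rh_corrected)))
```

Even a few percent of relative-humidity correction moves the LCL by tens of meters, which is how documented sonde humidity biases propagate into convective levels and indices.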


Radiocarbon ◽  
2012 ◽  
Vol 54 (3-4) ◽  
pp. 449-474 ◽  
Author(s):  
Sturt W Manning ◽  
Bernd Kromer

The debate over the dating of the Santorini (Thera) volcanic eruption has seen sustained efforts to criticize or challenge the radiocarbon dating of this time horizon. We consider some of the relevant areas of possible movement in the 14C dating—and, in particular, any plausible mechanisms to support as late (most recent) a date as possible. First, we report and analyze data investigating the scale of apparent possible 14C offsets (growing season related) in the Aegean-Anatolia-east Mediterranean region (excluding the southern Levant and especially pre-modern, pre-dam Egypt, which is a distinct case), and find no evidence for more than very small possible offsets from several cases. This topic is thus not an explanation for current differences in dating in the Aegean and at best provides only a few years of latitude. Second, we consider some aspects of the accuracy and precision of 14C dating with respect to the Santorini case. While the existing data appear robust, we nonetheless speculate that examination of the frequency distribution of the 14C data on short-lived samples from the volcanic destruction level at Akrotiri on Santorini (Thera) may indicate that the average value of the overall data sets is not necessarily the most appropriate 14C age to use for dating this time horizon. We note the recent paper of Soter (2011), which suggests that in such a volcanic context some (small) age increment may be possible from diffuse CO2 emissions (the effect is hypothetical at this stage and has not been observed in the field), and that "if short-lived samples from the same stratigraphic horizon yield a wide range of 14C ages, the lower values may be the least altered by old CO2."
In this context, it might be argued that a substantive “low” grouping of 14C ages observable within the overall 14C data sets on short-lived samples from the Thera volcanic destruction level, centered about 3326–3328 BP, is perhaps more representative of the contemporary atmospheric 14C age (without any volcanic CO2 contamination). This is a subjective argument (since, in statistical terms, the existing studies using the weighted average remain valid) that looks to support as late a date as reasonable from the 14C data. The impact of employing this revised 14C age is discussed. In general, a late 17th century BC date range is found (to remain) to be most likely even if such a late-dating strategy is followed—a late 17th century BC date range is thus a robust finding from the 14C evidence even allowing for various possible variation factors. However, the possibility of a mid-16th century BC date (within ∼1593–1530 cal BC) is increased when compared against previous analyses if the Santorini data are considered in isolation.
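The standard pooling the abstract refers to is the inverse-variance weighted average of the individual 14C determinations. A minimal sketch with hypothetical determinations (not the actual Akrotiri measurements):

```python
import math

def weighted_mean_14c(ages_bp, sigmas):
    """Inverse-variance weighted mean of 14C determinations (BP) and its
    1-sigma error, the conventional way to pool same-event dates."""
    weights = [1.0 / s**2 for s in sigmas]
    mean = sum(w * a for w, a in zip(weights, ages_bp)) / sum(weights)
    err = math.sqrt(1.0 / sum(weights))
    return mean, err

# Hypothetical short-lived-sample determinations with 1-sigma errors
ages = [3345, 3326, 3328, 3350, 3327]
sigmas = [12, 10, 10, 15, 11]
mean, err = weighted_mean_14c(ages, sigmas)
print(round(mean, 1), round(err, 1))
```

Preferring a "low" subgroup of determinations (e.g. values near 3326–3328 BP) over the weighted mean of all data is precisely the subjective, late-dating step the authors examine.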


2012 ◽  
Vol 98 (4) ◽  
pp. 428-433 ◽  
Author(s):  
Mahmood Reza Gohari ◽  
Reza Khodabakhshi ◽  
Javad Shahidi ◽  
Zeinab Moghadami Fard ◽  
Hossein Foadzi ◽  
...  

Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 621
Author(s):  
Elaheh Talebi ◽  
W. Pratt Rogers ◽  
Tyler Morgan ◽  
Frank A. Drews

Mine workers operate heavy equipment while experiencing varying psychological and physiological impacts of fatigue. These impacts vary in scope and severity across operators and across unique mine operations. Previous studies show the impact of fatigue on individuals, raising substantial concerns about operational safety. Unfortunately, while data exist to illustrate the risks, the mechanisms and complex pattern of contributors to fatigue are not sufficiently understood, illustrating the need for new methods to model and manage the severity of fatigue’s impact on performance and safety. Modern technology and computational intelligence can provide tools to improve practitioners’ understanding of workforce fatigue. Many mines have invested in fatigue monitoring technology (PERCLOS, EEG caps, etc.) as part of their health and safety control systems. Unfortunately, these systems provide “lagging indicators” of fatigue and, in many instances, issue fatigue alerts too late in the worker fatigue cycle. Thus, the following question arises: can other operational technology systems provide leading indicators that managers and front-line supervisors can use to help their operators cope with fatigue? This paper explores the data sets commonly available at modern mines and how these operational data sets can be used to model fatigue. The available data sets include operational, health and safety, equipment health, fatigue monitoring and weather data. A machine learning (ML) algorithm is presented as a tool to process and model a complex issue such as fatigue; ML is used in this study to identify potential leading indicators that can help management make better decisions. Initial findings confirm existing knowledge tying fatigue to time of day and hours worked. These are first-generation models; refined models will follow.
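A first screening step for leading indicators is to rank candidate operational features by their association with fatigue-alert outcomes. A minimal sketch using the point-biserial correlation; the feature names and shift records below are hypothetical, and the paper's actual models are ML-based rather than this simple screen:

```python
import math
import statistics

def point_biserial(feature, events):
    """Point-biserial correlation between a continuous operational feature
    and a binary fatigue-alert outcome (1 = alert fired that shift)."""
    ones = [f for f, e in zip(feature, events) if e == 1]
    zeros = [f for f, e in zip(feature, events) if e == 0]
    p = len(ones) / len(feature)
    s = statistics.pstdev(feature)
    return (statistics.mean(ones) - statistics.mean(zeros)) / s * math.sqrt(p * (1 - p))

# Hypothetical shift records: hours worked, haul-cycle time variance,
# and whether a fatigue alert fired that shift
hours = [6, 8, 10, 12, 7, 11, 9, 12, 6, 10]
cycle_var = [1.1, 0.9, 1.0, 1.2, 1.0, 1.1, 0.9, 1.3, 1.0, 1.1]
alerts = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]

candidates = {"hours_worked": point_biserial(hours, alerts),
              "cycle_time_variance": point_biserial(cycle_var, alerts)}
ranked = sorted(candidates.items(), key=lambda kv: -abs(kv[1]))
print(ranked)
```

In this toy data, hours worked ranks first, consistent with the abstract's finding that fatigue ties to hours worked and time of day.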


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ying Li ◽  
Yung-Ho Chiu ◽  
Tai-Yu Lin ◽  
Hongyi Cen

Purpose: As more women are being appointed to senior and top management positions and invited to sit on boards of directors, they now participate directly in strategic company decision-making. Because female directors have been found to provide new ideas, to increase company competitiveness, efficiency and performance, and to bring more external resources to a company than male directors, this paper puts female directors as a variable into data envelopment analysis (DEA) and statistical models to explore their effect on operating performance. The DEA first quantifies and measures company efficiencies, after which the statistical model analyzes the correlations between the variables to identify the impact of female decision makers on operating efficiencies in state-owned and private enterprises. Design/methodology/approach: A novel two-stage, meta-hybrid dynamic DEA was developed to explore Chinese cultural media company efficiencies under optimal input and output resource allocations, after which Tobit regression was applied to determine the effect of female executives on these efficiencies. Findings: From 2012 to 2016, the overall efficiencies of Chinese state-owned cultural media enterprises were better than those of the private cultural media enterprises, as were their overall technology gaps (TGs). Originality/value: Previous research has tended to focus on the causal relationships between female senior executives and business performance; there have been few studies on the relationship between female executives and company performance from an efficiency perspective (optimal resource allocation). This paper is therefore the first to develop a novel two-stage, meta-hybrid dynamic DEA to examine Chinese cultural media enterprise efficiencies, and the first to apply Tobit regression to assess the effect of female executives on those efficiencies.
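The DEA efficiency concept can be illustrated in its simplest form: with one input and one output under constant returns to scale, a unit's CCR efficiency is just its output/input ratio relative to the best unit. The paper's meta-hybrid dynamic model generalizes this to multiple inputs, outputs and periods via linear programming; the company figures below are hypothetical:

```python
def dea_efficiency(inputs, outputs):
    """CCR (constant returns to scale) efficiency scores for the
    single-input, single-output case: each unit's output/input ratio
    divided by the best ratio in the sample."""
    ratios = [o / i for i, o in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical media companies: operating cost in, revenue out
cost = [100.0, 80.0, 120.0, 90.0]
revenue = [200.0, 180.0, 210.0, 150.0]
print([round(e, 3) for e in dea_efficiency(cost, revenue)])
```

Because such efficiency scores are bounded in (0, 1], an ordinary regression of scores on explanatory variables (such as female-director presence) is censored, which is why the second stage uses Tobit regression.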

