Leveraging auxiliary information on marginal distributions in nonignorable models for item and unit nonresponse

Abstract In the presence of nonresponse, unadjusted estimators are vulnerable to nonresponse bias when the characteristics of the respondents differ from those of the nonrespondents. To reduce the bias, it is common practice to postulate a nonresponse model linking the response indicators and a set of fully observed variables. Response probabilities are estimated by fitting the selected model and are then used to adjust the base weights. The resulting estimator, referred to as the propensity score-adjusted estimator, is consistent provided the nonresponse model is correctly specified. In this article, we propose a weighting procedure that may improve the efficiency of propensity score estimators for survey variables identified as key variables by making more extensive use of the auxiliary information available at the nonresponse treatment stage. Results from a simulation study suggest that the proposed procedure performs well in terms of efficiency when the data are missing at random and also achieves an efficient bias reduction when the data are not missing at random. We further apply our proposed methods to the 2017–2018 National Health and Nutrition Examination Survey.
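The propensity score adjustment described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible nonresponse model, in which the response probability is constant within cells of one fully observed categorical variable; the function name `propensity_adjusted_mean` and the record keys `w`, `cell`, `r`, and `y` are hypothetical, and this is not the weighting procedure the article proposes.

```python
from collections import defaultdict


def propensity_adjusted_mean(units):
    """Propensity score-adjusted mean under a weighting-class model (illustrative).

    Each unit is a dict with hypothetical keys:
      'w'    base (design) weight
      'cell' fully observed category used by the nonresponse model
      'r'    1 if the unit responded, 0 otherwise
      'y'    survey variable (only used when r == 1)
    """
    weight_all = defaultdict(float)   # sum of base weights per cell, all sampled units
    weight_resp = defaultdict(float)  # sum of base weights per cell, respondents only
    for u in units:
        weight_all[u["cell"]] += u["w"]
        if u["r"]:
            weight_resp[u["cell"]] += u["w"]

    # Estimated response probability: weighted response rate within each cell.
    p_hat = {c: weight_resp[c] / weight_all[c] for c in weight_all}

    # Divide each respondent's base weight by the estimated probability.
    num = den = 0.0
    for u in units:
        if u["r"]:
            adjusted = u["w"] / p_hat[u["cell"]]
            num += adjusted * u["y"]
            den += adjusted
    return num / den


# Toy data: cell A has a 50% response rate, cell B a 100% response rate.
units = (
    [{"w": 1.0, "cell": "A", "r": 1, "y": 10.0}] * 2
    + [{"w": 1.0, "cell": "A", "r": 0, "y": None}] * 2
    + [{"w": 1.0, "cell": "B", "r": 1, "y": 20.0}] * 4
)
print(propensity_adjusted_mean(units))  # 15.0; the unadjusted respondent mean is about 16.67
```

In this toy example the adjustment restores the weight that cell A lost through nonresponse. A cell with no respondents at all would silently drop out of the estimate, so real applications typically collapse such cells or use a richer model.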

Sequentially additive nonignorable missing data modelling using auxiliary marginal information

Biometrika ◽

10.1093/biomet/asz054 ◽

2019 ◽

Vol 106 (4) ◽

pp. 889-911

Author(s):

Mauricio Sadinle ◽

Jerome P Reiter

Keyword(s):

Missing Data ◽

Auxiliary Information ◽

Categorical Variables ◽

Item Nonresponse ◽

Data Modelling ◽

Nonignorable Missing Data ◽

Marginal Distributions ◽

Nonignorable Missingness ◽

Nonignorable Missing ◽

Multivariate Categorical

Summary We study a class of missingness mechanisms, referred to as sequentially additive nonignorable, for modelling multivariate data with item nonresponse. These mechanisms explicitly allow the probability of nonresponse for each variable to depend on the value of that variable, thereby representing nonignorable missingness mechanisms. These missing data models are identified by making use of auxiliary information on marginal distributions, such as marginal probabilities for multivariate categorical variables or moments for numeric variables. We prove identification results and illustrate the use of these mechanisms in an application.

Recalibration Estimation for Unit Nonresponse at the Two Levels Auxiliary Information

Communications for Statistical Applications and Methods ◽

10.5351/ckss.2003.10.3.665 ◽

2003 ◽

Vol 10 (3) ◽

pp. 665-678

Author(s):

Joon Keun Yum ◽

Chang Kyoon Son ◽

Young Mee Jeung

Keyword(s):

Auxiliary Information ◽

Unit Nonresponse

The Impact of Missing and Error-Prone Auxiliary Information on Sparse-Matrix Sub-Population Parameter Estimates

Methodology ◽

10.1027/1614-2241/a000095 ◽

2015 ◽

Vol 11 (3) ◽

pp. 89-99 ◽

Cited By ~ 1

Author(s):

Leslie Rutkowski ◽

Yan Zhou

Keyword(s):

Sparse Matrix ◽

Small Body ◽

Auxiliary Information ◽

Poor Quality ◽

Quality Data ◽

Estimation Methods ◽

Parameter Estimates ◽

Population Parameter ◽

Conditioning Model ◽

The Impact

Abstract. Given a consistent interest in comparing achievement across sub-populations in international assessments such as TIMSS, PIRLS, and PISA, it is critical that sub-population achievement is estimated reliably and with sufficient precision. As such, we systematically examine the limitations of the estimation methods currently used by these programs. Using a simulation study along with empirical results from the 2007 cycle of TIMSS, we show that a combination of missing and misclassified data in the conditioning model induces biases in sub-population achievement estimates, the magnitude of which can be readily explained by data quality. Importantly, estimated biases in sub-population achievement are limited to the conditioning variable with poor-quality data, while other sub-population achievement estimates are unaffected. Findings are generally in line with theory on missing and error-prone covariates. The current research adds to a small body of literature that has noted some of the limitations of sub-population estimation.

GENERALISED SYNTHETIC ESTIMATOR USING DOUBLE SAMPLING SCHEME AND AUXILIARY INFORMATION

Mathematical Journal of Interdisciplinary Sciences ◽

10.15415/mjis.2015.41002 ◽

2015 ◽

Vol 4 (1) ◽

pp. 15-21

Author(s):

Shashi Bahl ◽

Sangeeta .

Keyword(s):

Auxiliary Information ◽

Sampling Scheme ◽

Double Sampling

Privacy-preserving Collaborative Training for Medical Image Analysis Based on Multi-Blockchain

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207323666201022110616 ◽

2020 ◽

Vol 23 ◽

Author(s):

Wanlu Zhang ◽

Qigang Wang ◽

Mei Li

Keyword(s):

Medical Image ◽

Data Privacy ◽

Medical Image Analysis ◽

Auxiliary Information ◽

Training Process ◽

Private Data ◽

Medical Institutions ◽

Model Training ◽

Collaborative Training ◽

Similar Task

Background: As artificial intelligence and big data analysis develop rapidly, data privacy, especially patient medical data privacy, is receiving increasing attention. Objective: To strengthen the protection of private data while still supporting the model training process, this article introduces a multi-Blockchain-based decentralized collaborative machine learning training method for medical image analysis. In this way, researchers from different medical institutions are able to collaborate to train models without exchanging sensitive patient data. Method: A partial parameter update method is applied to prevent indirect privacy leakage during model propagation. With the peer-to-peer communication in the multi-Blockchain system, a machine learning task can leverage auxiliary information from a similar task in another Blockchain. In addition, after the collaborative training process, personalized models of different medical institutions will be trained. Results: The experimental results show that our method achieves performance similar to that of the centralized model-training method, which collects the data sets of all participants, while preventing private data leakage. Transferring auxiliary information from a similar task on another Blockchain has also been shown to effectively accelerate model convergence and improve model accuracy, especially when data are lacking. The personalization training process further improves model performance. Conclusion: Our approach can effectively help researchers from different organizations to achieve collaborative training without disclosing their private data.

A Survey of Network Embedding for Drug Analysis and Prediction

Current Protein and Peptide Science ◽

10.2174/1389203721666200702145701 ◽

2020 ◽

Vol 21 ◽

Author(s):

Zhixian Liu ◽

Qingfeng Chen ◽

Wei Lan ◽

Jiahai Liang ◽

Yiping Pheobe Chen ◽

...

Keyword(s):

Deep Learning ◽

Protein Function ◽

Dimensional Space ◽

Auxiliary Information ◽

Matrix Decomposition ◽

Drug Analysis ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Network Embedding ◽

Similarity Estimation

Traditional network-based computational methods have shown good results in drug analysis and prediction. However, these methods are time consuming and lack universality, and it is difficult to exploit the auxiliary information of nodes and edges. Network embedding provides a promising way of alleviating these problems by transforming a network into a low-dimensional space while preserving network structure and auxiliary information. This facilitates the application of machine learning algorithms for subsequent processing. Network embedding has been introduced into drug analysis and prediction in the last few years, and has shown superior performance over traditional methods. However, there is no systematic review of this issue. This article offers a comprehensive survey of the primary network embedding methods and their applications in drug analysis and prediction. The network embedding technologies applied in homogeneous and heterogeneous networks are investigated and compared, including matrix decomposition, random walk, and deep learning. In particular, the graph neural network (GNN) methods in deep learning are highlighted. Further, the applications of network embedding in drug similarity estimation, drug-target interaction prediction, adverse drug reaction prediction, and protein function and therapeutic peptide prediction are discussed. Several potential future research directions are also discussed.

A simulated annealing-based algorithm for selecting balanced samples

Computational Statistics ◽

10.1007/s00180-021-01113-3 ◽

2021 ◽

Author(s):

Roberto Benedetti ◽

Maria Michela Dickson ◽

Giuseppe Espa ◽

Francesco Pantalone ◽

Federica Piersimoni

Keyword(s):

Simulated Annealing ◽

Optimization Problem ◽

Sample Selection ◽

Auxiliary Information ◽

Real Data ◽

Simulation Experiments ◽

Balanced Sampling ◽

Inclusion Probabilities ◽

Random Method ◽

Annealing Algorithms

Abstract Balanced sampling is a random method for sample selection, the use of which is preferable when auxiliary information is available for all units of a population. However, implementing balanced sampling can be a challenging task, and this is due in part to the computational efforts required and the necessity to respect balancing constraints and inclusion probabilities. In the present paper, a new algorithm for selecting balanced samples is proposed. This method is inspired by simulated annealing algorithms, as a balanced sample selection can be interpreted as an optimization problem. A set of simulation experiments and an example using real data show the efficiency and the accuracy of the proposed algorithm.
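To make the optimization view of balanced sampling concrete, here is a generic simulated annealing sketch. It is not the algorithm proposed in the paper. It assumes a fixed sample size and equal inclusion probabilities n/N, and it minimizes the squared gap between the Horvitz-Thompson estimates of the auxiliary totals and the true totals. The names `imbalance` and `anneal_balanced_sample`, the swap move, and the cooling schedule are all illustrative choices.

```python
import math
import random


def imbalance(sample, x, totals):
    """Squared gap between expanded sample totals and the known auxiliary totals."""
    expansion = len(x) / len(sample)  # 1 / (n/N) under equal inclusion probabilities
    gap = 0.0
    for j, true_total in enumerate(totals):
        estimate = expansion * sum(x[i][j] for i in sample)
        gap += (estimate - true_total) ** 2
    return gap


def anneal_balanced_sample(x, n, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Search for a size-n sample (n < N) with small imbalance via swap moves."""
    rng = random.Random(seed)
    pop_size = len(x)
    totals = [sum(row[j] for row in x) for j in range(len(x[0]))]

    current = rng.sample(range(pop_size), n)
    chosen = set(current)
    outside = [i for i in range(pop_size) if i not in chosen]
    cost = imbalance(current, x, totals)
    best, best_cost = list(current), cost

    temperature = t0
    for _ in range(steps):
        a = rng.randrange(n)
        b = rng.randrange(pop_size - n)
        # Propose swapping one sampled unit with one unsampled unit.
        current[a], outside[b] = outside[b], current[a]
        new_cost = imbalance(current, x, totals)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temperature):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        else:
            current[a], outside[b] = outside[b], current[a]  # undo the swap
        temperature *= cooling
    return sorted(best), best_cost


# Toy population: one auxiliary variable taking the values 1..20.
x = [[float(v)] for v in range(1, 21)]
sample, cost = anneal_balanced_sample(x, n=4)
print(sample, cost)
```

Swapping units keeps the sample size fixed, which is one simple way to respect a size constraint. Handling unequal inclusion probabilities, as the abstract requires, would need a different move set or objective.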

Short-term Traffic Flow Prediction Based on Multi-Auxiliary Information

2020 the 4th International Conference on Big Data Research (ICBDR'20) ◽

10.1145/3445945.3445951 ◽

2020 ◽

Author(s):

Kai Zhang ◽

Buliao Jia ◽

Yuhan Dong

Keyword(s):

Traffic Flow ◽

Auxiliary Information ◽

Short Term ◽

Traffic Flow Prediction ◽

Flow Prediction

Practical Wavelet Tree Construction

Journal of Experimental Algorithmics ◽

10.1145/3457197 ◽

2021 ◽

Vol 26 ◽

pp. 1-67

Author(s):

Patrick Dinklage ◽

Jonas Ellert ◽

Johannes Fischer ◽

Florian Kurpicz ◽

Marvin Löbel

Keyword(s):

Parallel Algorithms ◽

Shared Memory ◽

Distributed Memory ◽

Auxiliary Information ◽

Parallel Computers ◽

External Memory ◽

Sequential Algorithms ◽

Bottom Up ◽

Memory Efficiency ◽

Tree Construction

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. This technique makes use of the structure of the wavelet trees (refining the characters represented in a node of the tree with increasing depth) in an opposite way, by first computing the leaves (most refined), and then propagating this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and external memory. Based on these results, we adapt these algorithms to parallel computers, where we address both shared memory and distributed memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because we can compute all auxiliary information solely based on the information we obtained from computing the leaves. Most of our algorithms are also adapted to the wavelet matrix, a variant that is particularly suited for large alphabets.
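The bottom-up idea in this abstract can be sketched for a level-wise wavelet tree over integer symbols. This is a simplified illustration written for this listing, not the authors' implementation. It computes the symbol histogram at the leaves first, then merges sibling counts to obtain each shallower level's node sizes, and uses those sizes as the auxiliary information that says where each node's bits start. The function name `wavelet_tree_levels` and the list-of-lists bit vector representation are assumptions.

```python
def wavelet_tree_levels(text, bits):
    """Level-wise wavelet tree bit vectors for symbols in range(2**bits).

    levels[0] is the root level; levels[d] stores bit d (counted from the most
    significant bit) of every symbol, grouped by the symbol's d-bit prefix.
    """
    n = len(text)

    # Leaves first: histogram of the full symbols (the most refined level).
    hist = [0] * (1 << bits)
    for c in text:
        hist[c] += 1

    levels = [None] * bits
    for level in range(bits - 1, -1, -1):
        # Propagate upwards: merge sibling counts to get the sizes of the
        # nodes at this level (one node per `level`-bit prefix).
        hist = [hist[2 * p] + hist[2 * p + 1] for p in range(1 << level)]

        # Exclusive prefix sums give the start of each node's interval.
        borders = [0] * (1 << level)
        for p in range(1, 1 << level):
            borders[p] = borders[p - 1] + hist[p - 1]

        # One scan of the text writes each symbol's bit into its node.
        bit_vector = [0] * n
        for c in text:
            prefix = c >> (bits - level)
            bit_vector[borders[prefix]] = (c >> (bits - 1 - level)) & 1
            borders[prefix] += 1
        levels[level] = bit_vector
    return levels


levels = wavelet_tree_levels([0, 1, 3, 2, 1, 0], bits=2)
print(levels)  # [[0, 0, 1, 1, 0, 0], [0, 1, 1, 0, 1, 0]]
```

Because node boundaries come from the histograms alone, each level can be filled with one independent scan of the text. This sketch makes no attempt at the memory or parallel efficiency the paper reports.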