Generating Alternative Item Types Using Auxiliary Information

Author(s):  
Mark J. Gierl ◽  
Hollis Lai ◽  
Vasily Tanygin
Methodology ◽  
2015 ◽  
Vol 11 (3) ◽  
pp. 89-99 ◽  
Author(s):  
Leslie Rutkowski ◽  
Yan Zhou

Abstract. Given a consistent interest in comparing achievement across sub-populations in international assessments such as TIMSS, PIRLS, and PISA, it is critical that sub-population achievement is estimated reliably and with sufficient precision. As such, we systematically examine the limitations to current estimation methods used by these programs. Using a simulation study along with empirical results from the 2007 cycle of TIMSS, we show that a combination of missing and misclassified data in the conditioning model induces biases in sub-population achievement estimates, the magnitude and degree to which can be readily explained by data quality. Importantly, estimated biases in sub-population achievement are limited to the conditioning variable with poor-quality data while other sub-population achievement estimates are unaffected. Findings are generally in line with theory on missing and error-prone covariates. The current research adds to a small body of literature that has noted some of the limitations to sub-population estimation.


Author(s):  
Wanlu Zhang ◽  
Qigang Wang ◽  
Mei Li

Background: As artificial intelligence and big data analysis develop rapidly, data privacy, especially patient medical data privacy, is getting more and more attention. Objective: To strengthen the protection of private data while ensuring the model training process, this article introduces a multi-Blockchain-based decentralized collaborative machine learning training method for medical image analysis. In this way, researchers from different medical institutions are able to collaborate to train models without exchanging sensitive patient data. Method: Partial parameter update method is applied to prevent indirect privacy leakage during model propagation. With the peer-to-peer communication in the multi-Blockchain system, a machine learning task can leverage auxiliary information from another similar task in another Blockchain. In addition, after the collaborative training process, personalized models of different medical institutions will be trained. Results: The experimental results show that our method achieves similar performance with the centralized model-training method by collecting data sets of all participants and prevents private data leakage at the same time. Transferring auxiliary information from similar task on another Blockchain has also been proven to effectively accelerate model convergence and improve model accuracy, especially in the scenario of absence of data. Personalization training process further improves model performance. Conclusion: Our approach can effectively help researchers from different organizations to achieve collaborative training without disclosing their private data.


Author(s):  
Zhixian Liu ◽  
Qingfeng Chen ◽  
Wei Lan ◽  
Jiahai Liang ◽  
Yiping Pheobe Chen ◽  
...  

: Traditional network-based computational methods have shown good results in drug analysis and prediction. However, these methods are time consuming and lack universality, and it is difficult to exploit the auxiliary information of nodes and edges. Network embedding provides a promising way for alleviating the above problems by transforming network into a low-dimensional space while preserving network structure and auxiliary information. This thus facilitates the application of machine learning algorithms for subsequent processing. Network embedding has been introduced into drug analysis and prediction in the last few years, and has shown superior performance over traditional methods. However, there is no systematic review of this issue. This article offers a comprehensive survey of the primary network embedding methods and their applications in drug analysis and prediction. The network embedding technologies applied in homogeneous network and heterogeneous network are investigated and compared, including matrix decomposition, random walk, and deep learning. Especially, the Graph neural network (GNN) methods in deep learning are highlighted. Further, the applications of network embedding in drug similarity estimation, drug-target interaction prediction, adverse drug reactions prediction, protein function and therapeutic peptides prediction are discussed. Several future potential research directions are also discussed.


Author(s):  
Roberto Benedetti ◽  
Maria Michela Dickson ◽  
Giuseppe Espa ◽  
Francesco Pantalone ◽  
Federica Piersimoni

AbstractBalanced sampling is a random method for sample selection, the use of which is preferable when auxiliary information is available for all units of a population. However, implementing balanced sampling can be a challenging task, and this is due in part to the computational efforts required and the necessity to respect balancing constraints and inclusion probabilities. In the present paper, a new algorithm for selecting balanced samples is proposed. This method is inspired by simulated annealing algorithms, as a balanced sample selection can be interpreted as an optimization problem. A set of simulation experiments and an example using real data shows the efficiency and the accuracy of the proposed algorithm.


2021 ◽  
Vol 26 ◽  
pp. 1-67
Author(s):  
Patrick Dinklage ◽  
Jonas Ellert ◽  
Johannes Fischer ◽  
Florian Kurpicz ◽  
Marvin Löbel

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. This technique makes use of the structure of the wavelet trees—refining the characters represented in a node of the tree with increasing depth—in an opposite way, by first computing the leaves (most refined), and then propagating this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and external memory. Based on these results, we adapt these algorithms to parallel computers, where we address both shared memory and distributed memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because we can compute all auxiliary information solely based on the information we obtained from computing the leaves. Most of our algorithms are also adapted to the wavelet matrix , a variant that is particularly suited for large alphabets.


2020 ◽  
Vol 36 (4) ◽  
pp. 955-961
Author(s):  
Rizky Zulkarnain ◽  
Dwi Jayanti ◽  
Tri Listianingrum

The increasing needs for more disaggregated data motivates National Statistical Offices (NSOs) to develop efficient methods for producing official statistics without compromising on quality. In Indonesia, regional autonomy requires that Sustainable Development Goals (SDGs) indicators are available up to the district level. However, several surveys such as the Indonesian Demographic and Health Survey produce estimates up to the provincial level only. This generates gaps in support for district level policies. Small area estimation (SAE) techniques are often considered as alternatives for overcoming this issue. SAE enables more reliable estimation of the small areas by utilizing auxiliary information from other sources. However, the standard SAE approach has limitations in estimating non-sampled areas. This paper introduces an approach to estimating the non-sampled area random effect by utilizing cluster information. This model is demonstrated via the estimation of contraception prevalence rates at district levels in North Sumatera province. The results showed that small area estimates considering cluster information (SAE-cluster) produce more precise estimates than the direct method. The SAE-cluster approach revises the direct estimates upward or downward. This approach has important implications for improving the quality of disaggregated SDGs indicators without increasing cost. The paper was prepared under the kind mentorship of Professor James J. Cochran, Associate Dean for Research, Prof. of Statistics and Operations Research, University of Alabama.


Sign in / Sign up

Export Citation Format

Share Document