target data
Recently Published Documents


TOTAL DOCUMENTS: 270 (FIVE YEARS: 128)

H-INDEX: 13 (FIVE YEARS: 6)

Author(s):  
Aibo Guo ◽  
Xinyi Li ◽  
Ning Pang ◽  
Xiang Zhao

A community Q&A forum is a special type of social media that provides a platform where participants both raise questions and answer them, facilitating online information sharing. Community Q&A forums in professional domains have recently attracted large numbers of users by offering professional knowledge. To support information access and save users the effort of raising new questions, these forums usually include a question retrieval function, which retrieves existing questions (and their answers) similar to a user's query. However, it is difficult for community Q&A forums to cover all domains, especially those that have emerged recently with little labeled data and great discrepancy from existing domains. We refer to this scenario as cross-domain question retrieval. To handle its unique challenges, we design a model based on adversarial training, namely X-QR, which consists of two modules—a domain discriminator and a sentence matcher. The domain discriminator aims to align the source and target data distributions and unify the feature space through domain-adversarial training. With the assistance of the domain discriminator, the sentence matcher learns domain-consistent knowledge for the final matching prediction. To the best of our knowledge, this work is among the first to investigate the domain adaptation problem of sentence matching for community Q&A question retrieval. The experimental results suggest that the proposed X-QR model outperforms conventional sentence matching methods on cross-domain community Q&A tasks.
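
The domain-adversarial setup described above is commonly realized with a gradient-reversal layer between a shared encoder and the domain discriminator. The following is a minimal PyTorch sketch of that idea, not the authors' X-QR implementation; module names, dimensions, and the use of a gradient-reversal layer are assumptions for illustration.

```python
# Minimal sketch of domain-adversarial training for cross-domain sentence matching
# (illustrative only; not the authors' X-QR code).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the shared encoder.
        return -ctx.lambd * grad_output, None

class XQRSketch(nn.Module):
    def __init__(self, emb_dim=768, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())  # shared sentence-pair encoder
        self.matcher = nn.Linear(hidden, 2)      # similar / not similar, trained on labeled source pairs
        self.domain_clf = nn.Linear(hidden, 2)   # source vs. target domain, trained on both domains

    def forward(self, sent_pair_emb, lambd=1.0):
        h = self.encoder(sent_pair_emb)
        match_logits = self.matcher(h)
        domain_logits = self.domain_clf(GradReverse.apply(h, lambd))
        return match_logits, domain_logits
```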


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
James J. Yang ◽  
Xi Luo ◽  
Elisa M. Trucco ◽  
Anne Buu

Abstract
Background/aim: The polygenic risk score (PRS) shows promise as a potentially effective approach to summarizing genetic risk for complex diseases such as alcohol use disorder, which is influenced by a combination of multiple variants, each with a very small effect. Yet conventional PRS methods tend to over-adjust for confounding factors in the discovery sample and thus have low power to predict the phenotype in the target sample. This study aims to address this important methodological issue.
Methods: This study proposes a new method to construct the PRS by (1) approximating the polygenic model using a few principal components selected on the basis of their eigen-correlation in the discovery data, and (2) projecting the target data onto those principal components. A secondary data analysis was conducted on two large-scale databases, the Study of Addiction: Genetics and Environment (SAGE; discovery data) and the National Longitudinal Study of Adolescent to Adult Health (Add Health; target data), to compare the performance of the conventional and proposed methods.
Results and conclusion: The results show that the proposed method has higher prediction power and can handle participants from different ancestry backgrounds. We also provide practical recommendations for setting the linkage disequilibrium (LD) and p-value thresholds.
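
As a rough illustration of the two steps described above (not the authors' code), the sketch below selects discovery-data principal components by their correlation with the phenotype and projects the target genotypes onto the same loadings; the variable names and the simple least-squares weighting are assumptions.

```python
# Illustrative sketch of a PC-based PRS: select phenotype-correlated principal
# components in the discovery data, then project the target genotypes onto them.
import numpy as np

def pc_based_prs(G_disc, y_disc, G_target, n_keep=5):
    # Center genotypes with discovery-sample means (reused for the target projection).
    mu = G_disc.mean(axis=0)
    Xd = G_disc - mu
    # Principal components of the discovery genotype matrix via SVD.
    U, S, Vt = np.linalg.svd(Xd, full_matrices=False)
    scores_disc = U * S                                   # PC scores in the discovery sample
    # Keep the components most correlated with the phenotype ("eigen-correlation").
    corr = np.array([abs(np.corrcoef(scores_disc[:, k], y_disc)[0, 1])
                     for k in range(scores_disc.shape[1])])
    keep = np.argsort(corr)[::-1][:n_keep]
    # Weights from a least-squares fit of the phenotype on the selected PCs.
    w, *_ = np.linalg.lstsq(scores_disc[:, keep], y_disc, rcond=None)
    # Project target genotypes onto the same PC loadings, then apply the weights.
    scores_target = (G_target - mu) @ Vt.T[:, keep]
    return scores_target @ w
```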


2022 ◽  
pp. 108285
Author(s):  
Robin Strickstrock ◽  
Marco Hülsmann ◽  
Dirk Reith ◽  
Karl N. Kirschner

Author(s):  
М. A. Gorbachev ◽  
V. V. Svistov ◽  
E. A. Ulyanova

Based on the determined ground-surface clutter spectrum, we analyse the specific features of active homing head (AHH) operation for different types of radiated signals. With respect to AHH operation, the paper gives recommendations on the use of target data acquired against the Earth's background by monopulse synthetic aperture radars.


2021 ◽  
Author(s):  
Ethan Weinberger ◽  
Chris Lin ◽  
Su-In Lee

Single-cell RNA sequencing (scRNA-seq) technologies enable a better understanding of previously unexplored biological diversity. Researchers are often specifically interested in modeling the latent structures and variations enriched in one target scRNA-seq dataset relative to another, background dataset generated from sources of variation irrelevant to the task at hand. For example, we may wish to isolate factors of variation present only in measurements from patients with a given disease, as opposed to those shared with data from healthy control subjects. Here we introduce Contrastive Variational Inference (contrastiveVI; https://github.com/suinleelab/contrastiveVI), a framework for end-to-end analysis of target scRNA-seq datasets that decomposes their variation into shared and target-specific factors. On three target-background dataset pairs, we demonstrate that contrastiveVI learns latent representations that recover known subgroups of target data points better than previous methods and finds differentially expressed genes that agree with known ground truths.
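
The shared versus target-specific decomposition can be pictured with a simplified two-encoder autoencoder in which background cells have their salient factors zeroed out. The sketch below is a deterministic simplification for illustration only, not the released contrastiveVI implementation (see the linked repository for that).

```python
# Simplified sketch of a contrastive latent decomposition: target cells get shared (z)
# and salient (s) factors, background cells are reconstructed from shared factors alone.
import torch
import torch.nn as nn

class ContrastiveAESketch(nn.Module):
    def __init__(self, n_genes, z_dim=10, s_dim=10):
        super().__init__()
        self.enc_z = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU(), nn.Linear(128, z_dim))
        self.enc_s = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU(), nn.Linear(128, s_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim + s_dim, 128), nn.ReLU(), nn.Linear(128, n_genes))

    def forward(self, x, is_target):
        z = self.enc_z(x)
        s = self.enc_s(x)
        if not is_target:
            s = torch.zeros_like(s)   # background cells carry no salient variation
        return self.dec(torch.cat([z, s], dim=-1))
```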


2021 ◽  
Author(s):  
Shing Wan Choi ◽  
Timothy Shin Heng Mak ◽  
Clive J. Hoggart ◽  
Paul F. O'Reilly

Background: Polygenic risk score (PRS) analyses are now routinely applied in biomedical research, with great hope that they will aid our understanding of disease aetiology and contribute to personalized medicine. The continued growth of multi-cohort genome-wide association studies (GWASs) and large-scale biobank projects has provided researchers with a wealth of GWAS summary statistics and individual-level data suitable for performing PRS analyses. However, as the size of these studies increases, so does the risk of inter-cohort sample overlap and close relatedness. Ideally, sample overlap would be identified and removed directly, but this is typically not possible due to privacy laws or consent agreements. This sample overlap, whether known or not, is a major problem in PRS analyses because it can inflate type 1 error and thus lead to erroneous conclusions in published work.
Results: Here, for the first time, we report the scale of the sample overlap problem for PRS analyses by generating known sample overlap across sub-samples of the UK Biobank data, which we then use to produce GWAS and target data that mimic the effects of inter-cohort sample overlap. We demonstrate that inter-cohort overlap results in a significant and often substantial inflation of the observed PRS-trait association, coefficient of determination (R2) and false-positive rate. This inflation can be high even when the absolute number of overlapping individuals is small, if these individuals make up a notable fraction of the target sample. We develop and introduce EraSOR (Erase Sample Overlap and Relatedness), a software tool for adjusting inflation in PRS prediction and association statistics in the presence of sample overlap or close relatedness between the GWAS and target samples. A key component of the EraSOR approach is inference of the degree of sample overlap from the intercept of a bivariate LD score regression applied to the GWAS and target data, making it well-powered in settings where both have sample sizes over 1,000 individuals. Through extensive benchmarking using UK Biobank and HapGen2 simulated genotype-phenotype data, we demonstrate that PRSs calculated from EraSOR-adjusted GWAS summary statistics are robust to inter-cohort overlap in a wide range of realistic scenarios and even to high levels of residual genetic and environmental stratification.
Conclusion: The results of all PRS analyses for which sample overlap cannot be definitively ruled out should be interpreted with caution, given the high type 1 error observed in the presence of even low overlap between base and target cohorts. Given the strong performance of EraSOR in eliminating inflation caused by sample overlap in PRS studies with large (>5k) target samples, we recommend that EraSOR be used in all future such studies to mitigate the potential effects of inter-cohort overlap and close relatedness.
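
For intuition, the quantity EraSOR builds on, the intercept of a bivariate LD score regression, can be estimated by regressing the product of the GWAS and target z-scores on per-SNP LD scores. The sketch below is a simplified, unweighted version for illustration and is not the EraSOR software; real LD score regression uses heteroskedasticity-aware weights.

```python
# Simplified (unweighted) bivariate LD score regression: for SNP j, the expected
# product of z-scores is roughly intercept + slope * l_j, and the intercept grows
# with the fraction of samples shared between the two studies.
import numpy as np

def bivariate_ldsc_intercept(z_gwas, z_target, ld_scores):
    """Least-squares fit of z_gwas * z_target on LD scores; returns (intercept, slope)."""
    y = z_gwas * z_target
    X = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)
    return intercept, slope
```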


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8258
Author(s):  
Seokwon Lee ◽  
Inmo Ban ◽  
Myeongjin Lee ◽  
Yunho Jung ◽  
Wookyung Lee

This paper explores novel architectures for fast backprojection-based video synthetic aperture radar (BP-VISAR) processing with multiple GPUs. The video SAR frame rate is analyzed for non-overlapped and overlapped aperture modes. For the parallelization of the backprojection process, a processing data unit is defined as the phase history data or range profile data from partial synthetic apertures divided from the full-resolution target data. Depending on whether full-aperture processing is performed and whether range compression or backprojection is parallelized on a GPU basis, we propose six distinct architectures, each with a single-stream pipeline on a single GPU. The performance of these architectures is evaluated in both non-overlapped and overlapped modes. The efficiency of the BP-VISAR architecture with sub-aperture processing in the overlapped mode is further improved by filling the processing gap left by idling GPU resources with multi-stream backprojection on multiple GPUs. The frame rate of the proposed BP-VISAR architecture with sub-aperture processing scales with the number of GPU devices for large pixel resolutions. It can generate 4096 × 4096 video SAR frames of 0.5 m cross-range resolution at 23.0 Hz on a single GPU and 73.5 Hz on four GPUs.
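
For readers unfamiliar with time-domain backprojection, the NumPy sketch below shows the per-pulse accumulation that such a pipeline parallelizes across GPUs; it is a schematic single-threaded illustration with simplified geometry and variable names, not the paper's GPU architecture.

```python
# Schematic time-domain backprojection of range-compressed pulses onto an image grid.
# A video SAR frame would be formed from the coherent sum over a (sub-)aperture.
import numpy as np

def backproject(range_profiles, platform_pos, pixel_xyz, r0, dr, fc, c=3e8):
    """range_profiles: (n_pulses, n_bins) complex; platform_pos: (n_pulses, 3);
    pixel_xyz: (n_pixels, 3). Returns a complex image of shape (n_pixels,)."""
    image = np.zeros(pixel_xyz.shape[0], dtype=complex)
    for p in range(range_profiles.shape[0]):
        R = np.linalg.norm(pixel_xyz - platform_pos[p], axis=1)         # pulse-to-pixel range
        bins = np.clip(((R - r0) / dr).astype(int), 0, range_profiles.shape[1] - 1)
        # Nearest-bin lookup plus matched-phase correction for the two-way delay.
        image += range_profiles[p, bins] * np.exp(4j * np.pi * fc * R / c)
    return image
```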


Author(s):  
Ali Ozdagli ◽  
Xenofon Koutsoukos

In the last decade, interest in machine learning (ML) has grown significantly within the structural health monitoring (SHM) community. Traditional supervised ML approaches for detecting faults assume that the training and test data come from similar distributions. However, real-world applications, where an ML model is trained, for example, on numerical simulation data and tested on experimental data, are prone to fail at detecting damage. The deterioration in prediction performance is mainly due to the fact that the numerical and experimental data are collected under different conditions and do not share the same underlying features. This paper proposes a domain adaptation approach for ML-based damage detection and localization problems in which the classifier has access to labeled training (source) data and unlabeled test (target) data, but the source and target domains are statistically different. The proposed domain adaptation method seeks to form a feature space capable of representing both the source and target domains by implementing a domain-adversarial neural network. This neural network uses the H-divergence criterion to minimize the discrepancy between the source and target domains in a latent feature space. To evaluate the performance, we present two case studies in which we design a neural network model for classifying the health condition of a variety of systems. The effectiveness of the domain adaptation is shown by computing the classification accuracy on the unlabeled target data with and without domain adaptation. Furthermore, the performance gain of the domain adaptation over a well-known knowledge transfer approach called Transfer Component Analysis is also demonstrated. Overall, the results demonstrate that domain adaptation is a valid approach for damage detection applications where access to labeled experimental data is limited.
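
A common diagnostic related to the H-divergence between source and target features is the proxy A-distance, computed from the error of a classifier trained to tell the two domains apart. The sketch below illustrates that computation on extracted feature vectors; it is a generic, assumed example and not the authors' code.

```python
# Proxy A-distance: train a simple domain classifier on source-vs-target features;
# d_A = 2 * (1 - 2 * err) is near 0 when domains are indistinguishable, near 2 otherwise.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(feats_source, feats_target):
    X = np.vstack([feats_source, feats_target])
    y = np.concatenate([np.zeros(len(feats_source)), np.ones(len(feats_target))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    err = 1.0 - LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    return 2.0 * (1.0 - 2.0 * err)
```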


2021 ◽  
Author(s):  
◽  
Muhammad Ghifary

Machine learning has achieved great successes in the area of computer vision, especially in object recognition and classification. One of the core factors behind these successes is the availability of massive labeled image or video data for training, collected manually by humans. Labeling source training data, however, can be expensive and time-consuming. Furthermore, a large amount of labeled source data does not always guarantee that traditional machine learning techniques will generalize well; there may be a bias or mismatch in the data, i.e., the training data do not represent the target environment.

To mitigate this dataset bias/mismatch, one can consider domain adaptation: utilizing labeled training data and unlabeled target data to develop a classifier that performs well on the target environment. In some cases, however, no unlabeled target data exist, but multiple labeled sources of data do. Such situations can be addressed by domain generalization: using multiple source training sets to produce a classifier that generalizes to an unseen target domain. Although several domain adaptation and generalization approaches have been proposed, the domain mismatch in object recognition remains a challenging, open problem; model performance has not yet reached a satisfactory level in real-world applications.

The overall goal of this thesis is to progress towards solving dataset bias in visual object recognition through representation learning in the context of domain adaptation and domain generalization. Representation learning is concerned with finding proper data representations or features via learning rather than via engineering by human experts. This thesis proposes several representation learning solutions based on deep learning and kernel methods.

This thesis introduces a robust-to-noise deep neural network for handwritten digit classification trained on “clean” images only, which we name the Deep Hybrid Network (DHN). DHNs are based on a particular combination of sparse autoencoders and restricted Boltzmann machines. The results show that the DHN performs better than a standard deep neural network in recognizing digits with Gaussian and impulse noise, and with block and border occlusions.

This thesis proposes the Domain Adaptive Neural Network (DaNN), a neural-network-based domain adaptation algorithm that minimizes both the classification error and the domain discrepancy between the source and target data representations. The experiments show the competitiveness of DaNN against several state-of-the-art methods on a benchmark object dataset.

This thesis develops the Multi-task Autoencoder (MTAE), a domain generalization algorithm based on autoencoders trained via multi-task learning. MTAE learns to transform an original image into its analogs in multiple related domains simultaneously. The results show that MTAE's representations provide better classification performance than several alternative autoencoder-based models as well as the current state-of-the-art domain generalization algorithms.

This thesis proposes a fast kernel-based representation learning algorithm for both domain adaptation and domain generalization, Scatter Component Analysis (SCA). SCA finds a data representation that trades off maximizing the separability of classes, minimizing the mismatch between domains, and maximizing the separability of the data as a whole. The results show that SCA runs much faster than some competitive algorithms, while providing state-of-the-art accuracy in both domain adaptation and domain generalization.

Finally, this thesis presents the Deep Reconstruction-Classification Network (DRCN), a deep convolutional network for domain adaptation. DRCN learns to classify labeled source data and to reconstruct unlabeled target data via a shared encoding representation. The results show that DRCN provides competitive or better performance than the prior state-of-the-art models on several cross-domain object datasets.
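
As a rough sketch of the DRCN idea summarized above (not the thesis implementation), a shared encoder feeds both a label classifier trained on labeled source images and a decoder trained to reconstruct unlabeled target images; the fully connected layers and sizes below are placeholder assumptions, whereas the real model is convolutional.

```python
# Minimal DRCN-style sketch: shared encoder, source-supervised classifier head,
# and a decoder head trained with a reconstruction loss on unlabeled target images.
import torch.nn as nn

class DRCNSketch(nn.Module):
    def __init__(self, n_pixels=784, n_classes=10, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_pixels, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, n_classes)   # supervised loss on source data
        self.decoder = nn.Linear(hidden, n_pixels)       # reconstruction loss on target data

    def forward(self, x):
        h = self.encoder(x)
        return self.classifier(h), self.decoder(h)
```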

