scholarly journals Grouping of genomic markers in populations with family structure

2021 ◽  
Author(s):  
Dörte Wittenburg ◽  
Michael Doschoris ◽  
Jan Klosa

Abstract Background : Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crop, the standard measures of population LD such as r 2 may be biased. Grouping of markers according to their interdependence needs to account for the actual population structure in order to allow proper inference in genome-based evaluations. Results : Given a matrix reflecting the strength of association between markers, groups are built successively using a greedy algorithm; largest groups are built at first. As an option, a representative marker is selected for each group. We provide an implementation of the grouping approach as a new function to the R package hscovar. This package enables the calculation of the theoretical covariance between biallelic markers for half- or full-sib families and the calculation of representative markers. In case studies, we have shown that the number of groups comprising dependent markers was smaller and representative SNPs were spread more uniformly over the investigated chromosome region when the family stratification was respected compared to a population-LD approach. In a simulation study, we observed that sensitivity and specificity of a genome-based association study improved if selection of representative markers took family structure into account. Conclusions : Chromosome segments which frequently recombine in the underlying population can be identified from the matrix of pairwise dependence between markers. Representative markers can be exploited, for instance, for dimension reduction prior to a genome-based association study or the grouping structure itself can be employed in a grouped penalization approach.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dörte Wittenburg ◽  
Michael Doschoris ◽  
Jan Klosa

Abstract Background Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crop, the standard measures of population LD such as $$r^2$$ r 2 may be biased. Grouping of markers according to their interdependence needs to account for the actual population structure in order to allow proper inference in genome-based evaluations. Results Given a matrix reflecting the strength of association between markers, groups are built successively using a greedy algorithm; largest groups are built at first. As an option, a representative marker is selected for each group. We provide an implementation of the grouping approach as a new function to the R package . This package enables the calculation of the theoretical covariance between biallelic markers for half- or full-sib families and the derivation of representative markers. In case studies, we have shown that the number of groups comprising dependent markers was smaller and representative SNPs were spread more uniformly over the investigated chromosome region when the family stratification was respected compared to a population-LD approach. In a simulation study, we observed that sensitivity and specificity of a genome-based association study improved if selection of representative markers took family structure into account. Conclusions Chromosome segments which frequently recombine in the underlying population can be identified from the matrix of pairwise dependence between markers. Representative markers can be exploited, for instance, for dimension reduction prior to a genome-based association study or the grouping structure itself can be employed in a grouped penalization approach.


2020 ◽  
Author(s):  
Dörte Wittenburg ◽  
Michael Doschoris ◽  
Jan Klosa

Abstract Background: Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crop, the standard measures of population LD such as r 2 may be biased. Grouping of markers according to their interdependence needs to account for the actual population structure in order to allow proper inference in genome-based evaluations. Results: Given a matrix reflecting the strength of association between markers, groups are built successively using a greedy algorithm; largest groups are built at first. As an option, a representative marker is selected for each group. We provide an implementation of the grouping approach as a new function to the R package hscovar. This package enables the calculation of the theoretical covariance between biallelic markers for half- or full-sib families and the calculation of representative markers. In case studies, we have shown that the number of groups comprising dependent markers was smaller and representative SNPs were spread more uniformly over the investigated chromosome region when the family stratification was respected compared to a population-LD approach. In a simulation study, we observed that sensitivity and specificity of a genome-based association study improved if selection of representative markers took family structure into account. Conclusions: Chromosome segments which frequently recombine in the underlying population can be identified from the matrix of pairwise dependence between markers. Representative markers can be exploited, for instance, for dimension reduction prior to a genome-based association study or the grouping structure itself can be employed in a grouped penalization approach.


2020 ◽  
Author(s):  
Dörte Wittenburg ◽  
Michael Doschoris ◽  
Jan Klosa

Abstract Background : Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crop, the standard measures of population LD such as $r^2$ may be biased. Grouping of markers according to their interdependence needs to account for the actual population structure in order to allow proper inference in genome-based evaluations. Methods : he derivation of the covariance between markers in a population consisting of half- or full-sib families is described: it requires a genetic map and haplotype information of the common parent(s). A strategy, available in the literature, for grouping of markers based on a measure of population LD has been adapted to account for the dependence between markers if family stratification is present. Groups are built depending on the strength of association between markers; largest groups are built at first. As an option, a representative marker is selected for each group. Results : We provide an implementation of the theoretical covariance between biallelic markers for half- or full-sib families and the calculation of representative markers. In case studies, we have shown that the number of groups comprising dependent markers was smaller and representative SNPs were spread more uniformly over the investigated chromosome region when the family stratification was respected compared to a population-LD approach. In a simulation study, we observed that sensitivity and specificity of a genome-based association study improved if selection of representative markers took family structure into account. Conclusions : We offer an R~package for calculating the pairwise dependence between markers when family stratification is present. Chromosome segments which frequently recombine in the underlying population can be identified from the matrix of pairwise dependence between markers. Furthermore, grouping of genomic markers according to their interdependence is available. Representative markers can be exploited, for instance, for dimension reduction prior to a genome-based association study or the grouping structure itself can be employed in a grouped penalization approach.


Sign in / Sign up

Export Citation Format

Share Document