Cleaning genotype data from Diversity Outbred mice

2019 ◽  
Author(s):  
Karl W. Broman ◽  
Daniel M. Gatti ◽  
Karen L. Svenson ◽  
Śaunak Sen ◽  
Gary A. Churchill

Abstract: Data cleaning is an important first step in most statistical analyses, including efforts to map the genetic loci that contribute to variation in quantitative traits. Here we illustrate approaches to quality control and cleaning of array-based genotyping data for multiparent populations (experimental crosses derived from more than two founder strains), using MegaMUGA array data from a set of 291 Diversity Outbred (DO) mice. Our approach employs data visualizations that can reveal problems at the level of individual mice or with individual SNP markers. We find that the proportion of missing genotypes for each mouse is an effective indicator of sample quality. We use microarray probe intensities for SNPs on the X and Y chromosomes to confirm the sex of each mouse, and we use the proportion of matching SNP genotypes between pairs of mice to detect sample duplicates. We use a hidden Markov model (HMM) reconstruction of the founder haplotype mosaic across each mouse genome to estimate the number of crossovers and to identify potential genotyping errors. To evaluate marker quality, we find that missing data and genotyping error rates are the most effective diagnostics. We also examine the SNP genotype frequencies with markers grouped according to their minor allele frequency in the founder strains. For markers with high apparent error rates, a scatterplot of the allele-specific probe intensities can reveal the underlying cause of incorrect genotype calls. The decision to include or exclude low-quality samples can have a significant impact on the mapping results for a given study. We find that the impact of low-quality markers on a given study is often minimal, but reporting problematic markers can improve the utility of the genotyping array across many studies.
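As a rough illustration of two of the sample-level diagnostics this abstract describes (the proportion of missing genotypes per mouse, and the proportion of matching genotypes between pairs of mice), here is a minimal Python sketch. The data layout (one list of genotype calls per sample, with `None` for a missing call), the function names, and the 0.95 duplicate threshold are all assumptions made for illustration; the abstract does not show the authors' own workflow or cutoffs.

```python
from itertools import combinations


def missing_proportion(calls):
    """Fraction of markers with no genotype call (None) for one sample."""
    if not calls:
        return 0.0
    return sum(c is None for c in calls) / len(calls)


def match_proportion(calls_a, calls_b):
    """Fraction of matching calls among markers called in both samples."""
    both = [(a, b) for a, b in zip(calls_a, calls_b)
            if a is not None and b is not None]
    if not both:
        return float("nan")  # no shared markers to compare
    return sum(a == b for a, b in both) / len(both)


def likely_duplicates(samples, threshold=0.95):
    """Pairs of sample IDs whose match proportion meets the threshold.

    The default threshold is a hypothetical value, not one from the paper.
    """
    return [(i, j) for i, j in combinations(sorted(samples), 2)
            if match_proportion(samples[i], samples[j]) >= threshold]


# Toy data (hypothetical): genotypes coded "AA", "AB", "BB"; None = no call.
samples = {
    "m1": ["AA", "AB", "BB", None, "AA"],
    "m2": ["AA", "AB", "BB", "AB", "AA"],
    "m3": ["BB", "AA", None, None, "AB"],
}
```

In this toy data, `m1` and `m2` agree at every marker called in both, so they would be flagged as a possible duplicate pair, while `m3` has the highest missing-call proportion.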


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 1307.1-1308
Author(s):  
E. Siniauskaya ◽  
T. Kuzhir ◽  
V. Yagur ◽  
R. Goncharova

Background: Rheumatoid arthritis (RA) is a chronic systemic disorder of the connective tissue of still unknown aetiology and complex autoimmune pathogenesis that primarily affects small joints. HLA alleles account for 11-37% of the RA heritability, suggesting a substantial role of non-HLA loci in genetic predisposition to RA. Among non-HLA loci, the IL6, IL6R and STAT4 genes attract attention; however, the data concerning their influence on RA risk are somewhat contradictory. Objectives: The aim of the study was to analyze the involvement of four SNPs (STAT4 rs7574865, IL6 rs1800795, IL6R rs2228145 and rs4845618) in RA susceptibility. Methods: 187 patients diagnosed with RA (mean age 58.2 ± 11.9 years) and 380 healthy blood donors (mean age 37.18 ± 10.69 years) were included in the study. DNA extraction from peripheral blood samples was performed using the phenol-chloroform method. SNPs were genotyped using real-time PCR with fluorescent probes. The allele and genotype frequencies were compared using the χ2 test. Odds ratios (ORs) and 95% confidence intervals (95% CIs) were calculated using the VassarStats online tool. Results: Using a recessive genetic model, we found an association between the TT genotype of STAT4 rs7574865 (OR = 2.362; 95%CI [1.0378 – 5.376], p = 0.038) and RA. For IL6 rs1800795, the CC genotype had a significantly higher frequency among patients with rheumatoid arthritis than in controls (OR = 1.52; 95%CI [1.02 – 2.27], p = 0.0456). No associations of the IL6R rs2228145 and rs4845618 SNPs with risk of RA were found in the total group of patients vs. controls. It was also shown that the IL6 rs1800795 CC genotype frequency was significantly higher among the patients with RF-negative status (p = 0.0019). Conclusion: Thus, we provide evidence for association of the STAT4 rs7574865 and IL6 rs1800795 variants with risk of RA in the Belarusian population, with some features of interplay revealed between the gene polymorphisms analyzed and RA antibody status. The abovementioned SNPs may contribute to RA genetic susceptibility in the Belarusian population. Disclosure of Interests: None declared
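To make the odds-ratio figures above easier to interpret, the following is a minimal sketch of how an OR and a 95% confidence interval can be computed from a 2x2 table of genotype counts. It uses the standard log-based (Woolf) interval; whether the VassarStats tool used in the study applies exactly this method is an assumption, and the counts in the example are hypothetical, since the abstract does not report the underlying table.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a: cases with the genotype,    b: cases without it
    c: controls with the genotype, d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lower, upper = odds_ratio_ci(20, 80, 10, 90)
```

Under a recessive model, the "with the genotype" cells would hold the homozygous carriers (for example TT) and the "without" cells everyone else. An interval whose lower bound sits close to 1, as in the reported STAT4 result, indicates an association that is only just significant at the 5% level.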



Cancers ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 529
Author(s):  
Silvia Selene Moreno-Guerrero ◽  
Arturo Ramírez-Pacheco ◽  
Luz María Rocha-Ramírez ◽  
Gabriela Hernández-Pliego ◽  
Pilar Eguía-Aguilar ◽  
...  

There is evidence that high circulating levels of IL-6 and IL-8 are markers of a poor prognosis in various types of cancer, including NB. The participation of these cytokines in the tumor microenvironment has been described to promote progression and metastasis. Our objective was to evaluate the prognostic role of genetic polymorphisms and serum levels of IL-6 and IL-8 in a cohort of Mexican pediatric patients with NB. The detection of the SNPs rs1800795 IL-6 and rs4073 and rs2227306 IL-8 was carried out by PCR-RFLP and the levels of cytokines were determined by the ELISA method. We found elevated circulating levels of IL-8 and IL-6 in NB patients compared to the control group. The genotype frequencies of the rs1800795 IL-6 and rs4073 IL-8 variants were different between the patients with NB and the control group. Likewise, the survival analysis showed that the GG genotypes of rs1800795 IL-6 (p = 0.014) and AA genotypes of rs4073 IL-8 (p = 0.002), as well as high levels of IL-6 (p = 0.009) and IL-8 (p = 0.046), were associated with lower overall survival. We confirmed the impact on an adverse prognosis in a multivariate model. This study suggests that the SNPs rs1800795 IL-6 and rs4073 IL-8 and their serum levels could be promising biomarkers of a poor prognosis, associated with overall survival, metastasis, and a high risk in Mexican children with NB.



2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Zhengjie Chen ◽  
Dengguo Tang ◽  
Jixing Ni ◽  
Peng Li ◽  
Le Wang ◽  
...  

Background: Maize is one of the most important field crops in the world. Most of the key agronomic traits, including yield traits and plant architecture traits, are quantitative. Fine mapping of genes/quantitative trait loci (QTL) influencing a key trait is essential for marker-assisted selection (MAS) in maize breeding. However, SNP markers with high density and high polymorphism are lacking, especially kompetitive allele specific PCR (KASP) SNP markers that can be used for automatic genotyping. To date, a large volume of sequencing data has been produced by next generation sequencing technology, which provides a good pool of SNP loci for development of SNP markers. In this study, we carried out a multi-step screening method to identify KASP SNP markers based on the RNA-Seq data sets of 368 maize inbred lines. Results: A total of 2,948,985 SNPs were identified in the high-throughput RNA-Seq data sets, with an average density of 1.4 SNP/kb. Of these, 71,311 KASP SNP markers (average density of 34 KASP SNP/Mb) were developed based on strict criteria (unique genomic region, bi-allelic, polymorphism information content (PIC) value ≥0.4, and conserved primer sequences) and were mapped on 16,161 genes. These 16,161 genes were annotated to 52 gene ontology (GO) terms, including most of the primary and secondary metabolic pathways. Subsequently, 50 KASP SNP markers with PIC values ranging from 0.14 to 0.5 in the 368 RNA-Seq data sets, and with polymorphism between the maize inbred lines 1212 and B73 in in silico analysis, were selected to experimentally validate the accuracy and polymorphism of the SNPs; 46 of them (92.00%) showed polymorphism between the maize inbred lines 1212 and B73. Moreover, these 46 polymorphic SNPs were used to genotype another 20 maize inbred lines, with all 46 SNPs showing polymorphism in the 20 lines; the PIC value of each SNP ranged from 0.11 to 0.50, with an average of 0.35. The results suggested that the KASP SNP markers developed in this study were accurate and polymorphic. Conclusions: These high-density polymorphic KASP SNP markers will be a valuable resource for map-based cloning of QTL/genes and marker-assisted selection in maize. Furthermore, the method used to develop SNP markers in maize can also be applied in other species.
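Two of the screening criteria listed above (bi-allelic markers and a minimum PIC value) can be sketched as a simple filter. This is only an illustration under stated assumptions: the data layout (one allele call per inbred line, `None` for missing) and the function names are hypothetical, and PIC is computed here as 1 − Σp². That definition is an inference rather than something the abstract states: the reported values reach 0.5, which the Botstein PIC formula cannot reach for a bi-allelic marker (its maximum is 0.375).

```python
from collections import Counter


def allele_frequencies(alleles):
    """Allele frequencies from one allele call per inbred line (None = missing)."""
    called = [a for a in alleles if a is not None]
    if not called:
        return {}
    counts = Counter(called)
    return {allele: n / len(called) for allele, n in counts.items()}


def simple_pic(freqs):
    """1 - sum(p_i^2); an assumed definition, see the note above."""
    return 1.0 - sum(p * p for p in freqs.values())


def passes_filter(alleles, min_pic=0.4):
    """True if the SNP is bi-allelic and its PIC meets the minimum."""
    freqs = allele_frequencies(alleles)
    return len(freqs) == 2 and simple_pic(freqs) >= min_pic
```

For example, a SNP split evenly between two alleles has a value of 0.5 and passes, while a SNP with a 9:1 split has a value of 0.18 and is rejected. The uniqueness and primer-conservation criteria need sequence context and are not covered by this sketch.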



2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Asia Mendelevich ◽  
Svetlana Vinogradova ◽  
Saumya Gupta ◽  
Andrey A. Mironov ◽  
Shamil R. Sunyaev ◽  
...  

Abstract: A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library is insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in observed AI signal and conclude by discussing design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.
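For context, the kind of single-library analysis the abstract cautions against can be sketched as follows: a point estimate of AI from allele-specific read counts, plus an exact binomial test against balanced expression. This is not Qllelic and does not reproduce its method; it is a naive baseline that assumes pure binomial sampling and therefore ignores the technical overdispersion the authors describe. The function names are hypothetical.

```python
from math import comb


def allelic_imbalance(ref_count, alt_count):
    """Point estimate of AI: fraction of allele-specific reads from the reference allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no allele-specific reads")
    return ref_count / total


def binomial_two_sided_p(ref_count, alt_count):
    """Exact two-sided binomial test of AI = 0.5, using integer arithmetic.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. With p = 0.5 every outcome k has probability
    comb(n, k) / 2**n, so comparing binomial coefficients is enough.
    """
    n = ref_count + alt_count
    observed = comb(n, ref_count)
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n
```

With 30 reference and 10 alternative reads, the estimate is 0.75 and the binomial p-value is small. If technical noise inflates the variance beyond binomial, such a p-value overstates the evidence, which is consistent with the abstract's argument that replicate libraries are needed to calibrate the test.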



Author(s):  
A. Raja ◽  
R. Rajendran ◽  
P. Ganapathi

Background: Many genetic variants of beta-casein have been reported in different breeds of cattle. A1 and A2 are the most common variants. Zebu cattle breeds have a high frequency of the A2 allele or are monomorphic for it. The current study aimed to screen two Indian Zebu cattle breeds, Bargur and Umblachery, for the A1 and A2 alleles at the beta-casein locus. Methods: A total of 48 Bargur and 42 Umblachery cattle were genotyped for the β-casein (CSN2) gene using allele-specific PCR. The gene and genotype frequencies were estimated. The theoretical heterozygosity (Heexp), experimental heterozygosity (Heobs), polymorphism information content (PIC), expected homozygosity (E), effective number of alleles (ENA) and level of possible variability realization (V%) were calculated. Result: The investigation revealed the presence of both A1 and A2 alleles at the beta-casein locus in both the Bargur and Umblachery breeds. The A1A1 genotype was not observed in either breed. The frequencies of the A1A2 and A2A2 genotypes were 0.125 and 0.875 respectively in Bargur and 0.050 and 0.950 respectively in Umblachery. The study indicated the predominance of the A2 variant in both breeds. The frequencies of the A1 and A2 alleles were 0.063 and 0.937 respectively in Bargur and 0.02 and 0.98 respectively in Umblachery. The values of experimental heterozygosity (Heobs), theoretical heterozygosity (Heexp), polymorphism information content (PIC), expected homozygosity (E), effective number of alleles (ENA) and level of possible variability realization (V%) were 0.125, 0.1163, 0.1095, 0.8837, 1.131 and 11.88 respectively in the Bargur breed. These values were 0.048, 0.0468, 0.0458, 0.9532, 1.049 and 4.79 respectively in the Umblachery population. The observed heterozygosity and PIC values revealed very low genetic variability in the tested populations. The present work contributes to the study of the beta-casein locus in Indian zebu cattle.
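The diversity measures listed in this abstract can be derived from genotype counts with textbook formulas for a bi-allelic locus. The sketch below is one plausible way to do so, not necessarily the authors' exact procedure: it assumes expected heterozygosity 2pq, the Botstein PIC (2pq − 2p²q²), expected homozygosity 1 − 2pq, and effective number of alleles 1/(1 − 2pq). V% is omitted because the abstract does not define it. The function name and dictionary keys are hypothetical.

```python
def locus_diversity(n_a1a1, n_a1a2, n_a2a2):
    """Diversity measures for a bi-allelic locus from genotype counts."""
    n = n_a1a1 + n_a1a2 + n_a2a2
    if n == 0:
        raise ValueError("no genotyped animals")
    p = (2 * n_a1a1 + n_a1a2) / (2 * n)  # frequency of allele A1
    q = 1.0 - p                          # frequency of allele A2
    he_exp = 2 * p * q                   # expected (theoretical) heterozygosity
    homozygosity = 1.0 - he_exp          # expected homozygosity
    return {
        "p_A1": p,
        "q_A2": q,
        "he_obs": n_a1a2 / n,            # observed heterozygosity
        "he_exp": he_exp,
        "pic": he_exp - 2 * p * p * q * q,
        "homozygosity": homozygosity,
        "ena": 1.0 / homozygosity,       # effective number of alleles
    }


# Bargur counts implied by the reported genotype frequencies:
# 48 animals, 0.125 A1A2 (6 animals) and 0.875 A2A2 (42 animals).
bargur = locus_diversity(0, 6, 42)
```

For these counts the sketch gives an A1 frequency of 0.0625 and an expected heterozygosity of about 0.117, close to but not identical with the reported 0.063 and 0.1163. The small differences presumably reflect rounding or a different estimator, which the abstract does not specify.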



2021 ◽  
Vol 39 (3_suppl) ◽  
pp. 118-118
Author(s):  
Jingyuan Wang ◽  
Joshua Millstein ◽  
Fotios Loupakis ◽  
Sebastian Stintzing ◽  
Hiroyuki Arai ◽  
...  

118 Background: Antiangiogenic drug (AAD)-triggered oxygen and nutrient depletion through suppression of angiogenesis switches the glucose-dependent metabolism to lipid-dependent metabolism. Blocking fatty acid oxidation can enhance AAD-mediated anti-tumor effects in colorectal cancer. Previous reports suggested that polymorphisms of the lipid metabolism-related genes are associated with the increased risk of CRC and poor clinical outcome in CRC. Therefore, we hypothesized that genetic variants in the lipid metabolism pathway may predict first-line treatment outcome in mCRC pts. Methods: Genomic DNA from blood samples of pts enrolled in two independent randomized trials, FIRE-3 and MAVERICC, was genotyped through the OncoArray, a customized array manufactured by Illumina including approximately 530K SNP markers. The impact on outcome of 25 selected SNPs in 10 genes involved in the lipid metabolism pathway (CD36, FABP4, LPCAT1, LPCAT2, PPARG, CPT1A, ACSS2, SREBF1, FASN, ACACA) was analyzed. Those treated with FOLFIRI/ bevacizumab (bev) in FIRE-3 (n = 107) and MAVERICC (n = 163) served as discovery and validation cohorts respectively, while FIRE-3 FOLFIRI/ cetuximab (cet) (n = 129) arm was used as the control. Interaction between each SNP and treatment was evaluated in FIRE-3 (FOLFIRI/bev arm vs. FOLFIRI/cet arm). Results: In the discovery (FIRE-3 bev) cohort, pts with FASN rs4485435 any C allele (N = 21) showed significantly shorter progression-free survival (PFS) (8.69 vs 13.48 months) compared to carriers of G/G (N = 62) in both univariate (hazard ratio [HR] = 2.88; 95% confidence interval [CI]: 1.57-5.29; p = 0.00037) and multivariate (HR = 2.87; 95%CI 1.4-5.9; p = 0.00675) analyses. These data were validated in the MAVERICC bev cohort in multivariate analysis (11.17 vs 14.06 months; HR = 2.07; 95%CI: 1.15-3.74; p = 0.02). 
Pts carrying any T allele in PPARG rs3856806 (N = 36) showed significantly longer overall survival (OS) (not reached vs 42 months) than carriers of C/C (N = 93) in the FIRE-3 cet cohort in both univariate (HR = 0.4; 95%CI 0.17-0.92; p = 0.03) and multivariate (HR = 0.37; 95%CI 0.15-0.93; p = 0.02) analyses, but the association was not observed in the bev cohorts of MAVERICC and FIRE-3. In the comparison of the bev arm vs the cet arm in FIRE-3, interactions were shown with FASN rs4485435 (p = 0.017) on PFS and PPARG rs3856806 (p = 0.059) on OS. Conclusions: Our study demonstrates for the first time that FASN polymorphism could predict outcomes of bev-based treatment in mCRC patients, while PPARG polymorphism could predict outcomes of cet-based treatment in mCRC patients. These findings support a possible role of the lipid metabolism pathway in contributing to resistance to anti-VEGF/EGFR treatment.



SLEEP ◽  
2020 ◽  
Vol 43 (Supplement_1) ◽  
pp. A117-A117
Author(s):  
T J Cunningham ◽  
R M Bottary ◽  
E A Kensinger ◽  
R Stickgold

Introduction: The ability to perceive emotions is a socially relevant skill critical for healthy interpersonal functioning, while deficits in this ability are associated with psychopathology. Total sleep deprivation (TSD) has been shown to have deleterious effects on emotion perception, yet the extent to which these impairments persist across the day with continued wakefulness, or whether brief periods of recovery sleep can restore emotion perception abilities, remains unexplored. Methods: Participants viewed slideshows of faces ranging in emotional expression and were asked to categorize (Happy, Sad, Angry, Neutral) and rate the emotional intensity (1-9) of each face at baseline (2100; Session 1), at 0900 (Session 2) following a night of sleep or TSD, and at 1400 (Session 3) following either continued wakefulness (wake group) or a 90-minute nap opportunity (nap group). Results: Emotion categorization ability marginally improved from Session 1 to Session 2 following overnight sleep; however, no changes in emotion intensity ratings or vigilance were observed. TSD led to an increase in error rates during vigilance testing [t(46)=2.9, p=0.005] and impairment in emotion categorization ability [t(46)=5.5, p<0.001] from Session 1 to Session 2, although by Session 3 performance levels on both measures returned to baseline for all TSD participants. TSD also led to a decrease in emotional intensity ratings from Session 1 to Session 2, particularly for the highest tertile of emotional faces [6-9; t(46)=6.1, p<0.001]. These ratings remained suppressed at Session 3 in both the wake [t(25)=7.8, p<0.001] and nap [t(18)=3.1, p=0.006] groups. Conclusion: These results indicate that time of day effects, with or without any additional benefit of a nap, can restore the impairments in vigilance and emotional categorization caused by TSD. The ability to discriminate levels of emotional intensity, however, is not restored by time of day or napping, suggesting that this ability is more sensitive to the impact of TSD.



2019 ◽  
Vol 49 (2) ◽  
pp. 179-189 ◽  
Author(s):  
François Sarrazin ◽  
Luc LeBel ◽  
Nadia Lehoux

The challenges faced recently by the North American forest products industry have forced it to review many of its key operations. Implementing logistics centers for such a context may therefore help in allocating the wood fibre more efficiently and in reducing sorting and transportation costs. This paper aims to better understand the interaction between a forest logistics center and a complex forest network while exploring the business environment favoring the use of such a structure. A profit maximization model is proposed and applied to a real case in the Mauricie region in Quebec, Canada. A total of 18 groups of scenarios are tested, based on the use of a sort yard and of backhauling. Results show that a logistics center already in operation adds $0.52 in profits for each cubic metre of wood available for harvest (over 2 580 411 m3 per year) for the network under study ($1.4 million annually). A sensitivity analysis also highlights that higher prices and sorting error rates have the greatest impact on the logistics center’s profitability.



2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 51-51
Author(s):  
Sajjad Toghiani ◽  
Ling-Yun Chang ◽  
El H Hay ◽  
Andrew J Roberts ◽  
Samuel E Aggrey ◽  
...  

Abstract: The dramatic advancement in genotyping technology has greatly reduced the complexity and cost of genotyping. The continuous increase in the density of marker panels is resulting in little to no improvement in the accuracy of genomic selection. Direct inversion of the genomic relationship matrix is infeasible for some livestock populations due to the excessive computational cost. In addition, most animals in genetic evaluation programs are non-genotyped. Including these animals in a genomic evaluation requires the imputation of the missing genotypes when using regression methods. To overcome these challenges, a hybrid approach is proposed. This approach fits a subset of SNP markers selected based on FST scores and a classical polygenic effect. The method was first tested using only genotyped animals and then extended to accommodate non-genotyped animals. The proposed approach was evaluated using simulated data for a trait with heritability of 0.1 and 0.4 and weaning weight in a crossbred beef cattle population. When all animals were genotyped, the hybrid approach using only 2.5% of prioritized SNPs exceeded the prediction accuracies of BayesB, BayesC, and GBLUP by more than 7%. When non-genotyped animals were incorporated, the proposed approach significantly outperformed the ss-GBLUP method in terms of prediction accuracy under both simulated heritability scenarios. Although the results seem to depend on the genetic complexity of the trait, the proposed approach resulted in higher prediction accuracies than current methods. Furthermore, its computational costs in terms of CPU time and peak memory are substantially lower than those of the current methods.
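The SNP prioritization step mentioned above (selecting a subset of markers by FST score) might look roughly like the following sketch. The abstract does not say which FST estimator was used or how subpopulations were defined, so this assumes the simple heterozygosity-based form FST = (HT − HS) / HT for a bi-allelic SNP with equally weighted subpopulations. The function names and the input layout (one allele frequency per subpopulation) are hypothetical.

```python
import math


def fst_biallelic(subpop_freqs):
    """FST = (HT - HS) / HT for one bi-allelic SNP, equal subpopulation weights."""
    k = len(subpop_freqs)
    if k == 0:
        raise ValueError("need at least one subpopulation")
    p_bar = sum(subpop_freqs) / k
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # monomorphic across all subpopulations
    h_within = sum(2 * p * (1 - p) for p in subpop_freqs) / k
    return (h_total - h_within) / h_total


def top_fraction(scores, fraction):
    """Indices of the highest-scoring SNPs, keeping ceil(fraction * n) of them."""
    keep = math.ceil(fraction * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Hypothetical allele frequencies in two subpopulations for four SNPs.
scores = [fst_biallelic(f) for f in ([0.1, 0.9], [0.5, 0.5], [0.4, 0.6], [0.0, 1.0])]
selected = top_fraction(scores, 0.25)
```

Calling `top_fraction(scores, 0.025)` on a full marker panel would mirror the 2.5% of prioritized SNPs mentioned in the abstract; the selected markers would then be fitted alongside a polygenic effect, a step this sketch does not attempt.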



2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Fuyou Fu ◽  
Xunjia Liu ◽  
Rui Wang ◽  
Chun Zhai ◽  
Gary Peng ◽  
...  

Abstract: The fungal pathogen Leptosphaeria maculans causes blackleg disease on canola and rapeseed (Brassica napus) in many parts of the world. A B. napus cultivar, ‘Quinta’, has been widely used for the classification of L. maculans into pathogenicity groups. In this study, we confirmed the presence of Rlm1 in a DH line (DH24288) derived from B. napus cultivar ‘Quinta’. Rlm1 was located on chromosome A07, between 13.07 and 22.11 Mb, using a BC1 population made from crosses of F1 plants of DH16516 (a susceptible line) x DH24288 with bulked segregant RNA sequencing (BSR-Seq). Rlm1 was further fine mapped in a 100 kb region from 19.92 to 20.03 Mb in the BC1 population consisting of 1247 plants and an F2 population consisting of 3000 plants, using SNP markers identified from BSR-Seq through Kompetitive Allele-Specific PCR (KASP). A potential resistance gene, BnA07G27460D, was identified in this Rlm1 region. BnA07G27460D encodes a serine/threonine dual specificity protein kinase, catalytic domain and is homologous to STN7 in predicted genes of B. rapa, B. oleracea, and A. thaliana. Robust SNP markers associated with Rlm1 were developed, which can assist in introgression of Rlm1 and confirm the presence of the Rlm1 gene in canola breeding programs.


