Limitations of lymphoblastoid cell lines for establishing genetic reference datasets in the immunoglobulin loci

2021 ◽  
Author(s):  
Oscar L Rodriguez ◽  
Andrew J Sharp ◽  
Corey T Watson

Lymphoblastoid cell lines (LCLs) have been critical to establishing genetic resources for biomedical science. They have been used extensively to study human genetic diversity and genome function, and to inform the development of tools and methodologies for augmenting disease genetics research. While the validity of variant callsets from LCLs has been demonstrated for most of the genome, previous work has shown that DNA extracted from LCLs is modified by V(D)J recombination within the immunoglobulin (IG) loci, regions that harbor antibody genes critical to immune system function. However, the impacts of V(D)J recombination on data generated from LCLs have not been extensively investigated. In this study, we used LCL-derived short read sequencing data from the 1000 Genomes Project (n = 2,504) to identify signatures of V(D)J recombination. Our analyses revealed sample-level impacts of V(D)J recombination that varied depending on the degree of inferred monoclonality. We showed that V(D)J-associated somatic deletions impacted genotyping accuracy, leading to adulterated population-level estimates of allele frequency and linkage disequilibrium. These findings illuminate limitations of using LCLs for building genetic resources in the IG loci, with implications for interpreting previous disease association studies in these regions.

PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0261374
Author(s):  
Oscar L. Rodriguez ◽  
Andrew J. Sharp ◽  
Corey T. Watson

Lymphoblastoid cell lines (LCLs) have been critical to establishing genetic resources for biomedical science. They have been used extensively to study human genetic diversity and genome function, and to inform the development of tools and methodologies for augmenting disease genetics research. While the validity of variant callsets from LCLs has been demonstrated for most of the genome, previous work has shown that DNA extracted from LCLs is modified by V(D)J recombination within the immunoglobulin (IG) loci, regions that harbor antibody genes critical to immune system function. However, the impacts of V(D)J recombination on short read sequencing data generated from LCLs have not been extensively investigated. In this study, we used LCL-derived short read sequencing data from the 1000 Genomes Project (n = 2,504) to identify signatures of V(D)J recombination. Our analyses revealed sample-level impacts of V(D)J recombination that varied depending on the degree of inferred monoclonality. We showed that V(D)J-associated somatic deletions impacted genotyping accuracy, leading to adulterated population-level estimates of allele frequency and linkage disequilibrium. These findings illuminate limitations of using LCLs and short read data for building genetic resources in the IG loci, with implications for interpreting previous disease association studies in these regions.


2018 ◽  
Author(s):  
Inken Wohlers ◽  
Colin Schulz ◽  
Fabian Kilpert ◽  
Lars Bertram

The role of microRNAs (miRNAs) in the pathogenesis of Alzheimer’s disease (AD) is currently under extensive investigation. In this study, we assessed the potential impact of AD genetic risk variants on miRNA expression by performing large-scale bioinformatic data integration. Our analysis was based on genetic variants from three AD genome-wide association studies (GWAS). Association with miRNA expression was tested by expression quantitative trait loci (eQTL) analysis using next-generation miRNA sequencing data generated in lymphoblastoid cell lines (LCLs). While, overall, we did not identify a strong effect of AD GWAS variants on miRNA expression in this cell type, we highlight two notable outliers, i.e. miR-29c-5p and miR-6840-5p. MiR-29c-5p was recently reported to be involved in the regulation of BACE1 and SORL1 expression. In conclusion, despite two exceptions, our large-scale assessment provides only limited support for the hypothesis that AD GWAS variants act as miRNA eQTLs.


2020 ◽  
Author(s):  
Christine H. O’Connor ◽  
Yinjie Qiu ◽  
Rafael Della Coletta ◽  
Jonathan S. Renk ◽  
Patrick J. Monnahan ◽  
...  

Intact transposable elements (TEs) account for 65% of the maize genome and can impact gene function and regulation. Although TEs comprise the majority of the maize genome and affect important phenotypes, genome-wide patterns of TE polymorphisms in maize have only been studied in a handful of maize genotypes, due to the challenging nature of assessing highly repetitive sequences. We implemented a method to use short read sequencing data from 509 diverse inbred lines to classify the presence/absence of 494,564 non-redundant TEs that were previously annotated in four genome assemblies including B73, Mo17, PH207, and W22. Different orders of TEs (i.e. LTRs, Helitrons, TIRs) had different frequency distributions within the population. Older LTRs were generally more frequent in the population than younger LTRs, though high-frequency, very young TEs were observed. Age and frequency estimates of nested elements and the outer elements in which they insert revealed that most nesting events occurred very near the timing of the outer element insertion. TEs within genes were at higher frequency than those outside of genes, and this was particularly true for those not inserted into introns. Many TE insertional polymorphisms observed in this population were not tagged by SNP markers and were therefore not captured in previous SNP-based marker-trait association studies. This study provides a population-scale, genome-wide assessment of TE variation in maize, and provides valuable insight into variation in TEs in maize and the factors that contribute to this variation.


2012 ◽  
Vol 22 (3) ◽  
pp. 189-196 ◽  
Author(s):  
Sung-Mi Shim ◽  
Hye-Young Nam ◽  
Jae-Eun Lee ◽  
Jun-Woo Kim ◽  
Bok-Ghee Han ◽  
...  

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Danqing Xu ◽  
Chen Wang ◽  
Atlas Khan ◽  
Ning Shang ◽  
Zihuai He ◽  
...  

Labeling clinical data from electronic health records (EHR) in health systems requires extensive human expert knowledge and painstaking review by clinicians. Furthermore, existing phenotyping algorithms are not uniformly applied across large datasets and can suffer from inconsistencies in case definitions across different algorithms. We describe here quantitative disease risk scores based on almost unsupervised methods that require minimal input from clinicians, can be applied to large datasets, and alleviate some of the main weaknesses of existing phenotyping algorithms. We show applications to phenotypic data on approximately 100,000 individuals in eMERGE, and focus on several complex diseases, including Chronic Kidney Disease, Coronary Artery Disease, Type 2 Diabetes, Heart Failure, and a few others. We demonstrate that, relative to existing approaches, the proposed methods have higher prediction accuracy, can better identify phenotypic features relevant to the disease under consideration, can perform better at clinical risk stratification, and can identify undiagnosed cases based on phenotypic features available in the EHR. Using genetic data from the eMERGE-seq panel that includes sequencing data for 109 genes on 21,363 individuals from multiple ethnicities, we also show how the new quantitative disease risk scores help improve the power of genetic association studies relative to the standard use of disease phenotypes. The results demonstrate the effectiveness of quantitative disease risk scores derived from rich phenotypic EHR databases to provide a more meaningful characterization of clinical risk for diseases of interest beyond the prevalent binary (case-control) classification.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jiawei Zhou ◽  
Shuo Zhang ◽  
Jie Wang ◽  
Hongmei Shen ◽  
Bin Ai ◽  
...  

The chloroplast is one of two organelles containing a separate genome that codes for essential and distinct cellular functions such as photosynthesis. Given the importance of chloroplasts in plant metabolism, the genomic architecture and gene content have been strongly conserved through long periods of time and as such are useful molecular tools for evolutionary inferences. At present, complete chloroplast genomes from over 4000 species have been deposited into publicly accessible databases. Despite the large number of complete chloroplast genomes, comprehensive analyses regarding genome architecture and gene content have not been conducted for many lineages with complete species sampling. In this study, we employed the genus Populus to assess how more comprehensively sampled chloroplast genome analyses can be used in understanding chloroplast evolution in a broadly studied lineage of angiosperms. We conducted comparative analyses across Populus in order to elucidate variation in key genome features such as genome size, gene number, gene content, repeat type and number, SSR (Simple Sequence Repeat) abundance, and boundary positioning between the four main units of the genome. We found that some genome annotations were variable across the genus, owing in part to errors in assembly or data checking, and we provided corrected annotations. We also employed complete chloroplast genomes for phylogenetic analyses, including the dating of divergence times throughout the genus. Lastly, we utilized re-sequencing data to describe the variations of pan-chloroplast genomes at the population level for P. euphratica. The analyses used in this paper provide a blueprint for the types of analyses that can be conducted with publicly available chloroplast genomes as well as methods for building upon existing datasets to improve evolutionary inference.


Genes ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 615
Author(s):  
Achala Fernando ◽  
Chamikara Liyanage ◽  
Afshin Moradi ◽  
Panchadsaram Janaththani ◽  
Jyotsna Batra

Alternative splicing (AS) is tightly regulated to maintain genomic stability in humans. However, tumor growth, metastasis and therapy resistance benefit from aberrant RNA splicing. Iroquois-class homeodomain protein 4 (IRX4) is a TALE homeobox transcription factor which has been implicated in prostate cancer (PCa) as a tumor suppressor through genome-wide association studies (GWAS) and functional follow-up studies. In the current study, we characterized 12 IRX4 transcripts in PCa cell lines, including seven novel transcripts, by RT-PCR and sequencing. They demonstrate unique expression profiles between androgen-responsive and nonresponsive cell lines. These transcripts were significantly overexpressed in PCa cell lines and The Cancer Genome Atlas (TCGA) program PCa clinical specimens, suggesting their probable involvement in PCa progression. Moreover, the GG genotype of the PCa risk-associated SNP rs12653946 was correlated with lower IRX4 transcript levels. Using mass spectrometry analysis, we identified two IRX4 protein isoforms (54.4 kDa, 57 kDa) comprising all the functional domains and two novel isoforms (40 kDa, 8.7 kDa) lacking functional domains. These IRX4 isoforms might induce distinct functional programming that could contribute to PCa hallmarks, thus providing novel insights into diagnostic, prognostic and therapeutic significance in PCa management.

