A Tool to Build Up-To-Date Gene Annotations for Affymetrix Microarrays

2017 ◽  
Vol 3 (2) ◽  
pp. 38 ◽  
Author(s):  
Vladislava Milchevskaya ◽  
Grischa Tödt ◽  
Toby James Gibson

Genome-wide expression profiling and genotyping are widely applied in functional genomics research, ranging from stem cell studies to cancer, in drug response studies, and in clinical diagnostics. The Affymetrix GeneChip microarrays represent the most popular platform for such assays. Nevertheless, due to rapid and continuous improvement of knowledge about the genome, the definitions of many genes and transcripts change, and new genes are discovered. Thus the original probe information is outdated for a number of Affymetrix platforms and needs to be redefined. It has been demonstrated that accurate probe set definition improves both the coverage of gene expression analysis and its statistical power. Therefore, we developed a method that incorporates the most recent genome annotations into the annotation of the microarray probe sets, using tools from next-generation sequencing. Additionally, our method allows project-specific gene annotation models to be built quickly and supports comparison of microarray and RNA-seq data.

PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0260709
Author(s):  
Shaimaa Mahmoud Ahmed ◽  
Alsamman Mahmoud Alsamman ◽  
Abdulqader Jighly ◽  
Mohamed Hassan Mubarak ◽  
Khaled Al-Shamaa ◽  
...  

Soil salinity is a significant abiotic stress that severely limits global crop production. Chickpea (Cicer arietinum L.) is an important grain legume that plays a substantial role in nutritional food security, especially in the developing world. This study used a chickpea population collected from the International Center for Agricultural Research in the Dry Areas (ICARDA) genebank using the focused identification of germplasm strategy. The germplasm included 186 genotypes with broad Asian and African origins that were genotyped with 1856 DArTseq markers. We conducted phenotyping for salinity in the field (Arish, Sinai, Egypt) and in greenhouse hydroponic experiments at 100 mM NaCl concentration. Based on the performance in both hydroponic and field experiments, we identified seven genotypes from Azerbaijan and Pakistan (IGs: 70782, 70430, 70764, 117703, 6057, 8447, and 70249) as potential sources for high salinity tolerance. Multi-trait genome-wide association analysis (mtGWAS) detected one locus on chromosome Ca4 at 10618070 bp associated with salinity tolerance under hydroponic and field conditions. In addition, we located another locus specific to the hydroponic system on chromosome Ca2 at 30537619 bp. Gene annotation analysis revealed the location of rs5825813 within the Embryogenesis-associated protein (EMB8-like), while the location of rs5825939 is within the Ribosomal Protein Large P0 (RPLP0). Utilizing such markers in practical breeding programs can effectively improve the adaptability of current chickpea cultivars in saline soil. Moreover, researchers can use our markers to facilitate the incorporation of new genes into commercial cultivars.


2020 ◽  
Author(s):  
Kyoko Watanabe ◽  
Philip R. Jansen ◽  
Jeanne E. Savage ◽  
Priyanka Nandakumar ◽  
Xin Wang ◽  
...  

Abstract Insomnia is a heritable, highly prevalent sleep disorder, for which no sufficient treatment currently exists. Previous genome-wide association studies (GWASs) with up to 1.3 million subjects identified over 200 associated loci. This extreme polygenicity suggested many more loci to be discovered. The current study almost doubled the sample size to over 2.3 million individuals, thereby increasing statistical power. We identified 554 risk loci (confirming 190 previously associated loci and detecting 364 novel), and capitalizing on this large number of loci, we propose a novel strategy to prioritize genes using external biological resources and information on functional interactions between genes across risk loci. Of all 3,898 genes naively implicated from the risk loci, we prioritize 289. For these, we find brain-tissue expression specificity and enrichment in specific gene-sets of synaptic signaling functions and neuronal differentiation. We show that the novel gene prioritization strategy yields specific hypotheses on causal mechanisms underlying insomnia, which would not fully have been detected using traditional approaches.


Marketing ZFP ◽  
2019 ◽  
Vol 41 (4) ◽  
pp. 21-32
Author(s):  
Dirk Temme ◽  
Sarah Jensen

Missing values are ubiquitous in empirical marketing research. If missing data are not dealt with properly, this can lead to a loss of statistical power and distorted parameter estimates. While traditional approaches for handling missing data (e.g., listwise deletion) are still widely used, researchers can nowadays choose among various advanced techniques such as multiple imputation analysis or full-information maximum likelihood estimation. Due to the available software, using these modern missing data methods does not pose a major obstacle. Still, their application requires a sound understanding of the prerequisites and limitations of these methods as well as a deeper understanding of the processes that have led to missing values in an empirical study. This article is Part 1 and first introduces Rubin’s classical definition of missing data mechanisms and an alternative, variable-based taxonomy, which provides a graphical representation. Secondly, a selection of visualization tools available in different R packages for the description and exploration of missing data structures is presented.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tejaswi Iyyanki ◽  
Baozhen Zhang ◽  
Qixuan Wang ◽  
Ye Hou ◽  
Qiushi Jin ◽  
...  

Abstract Muscle-invasive bladder cancers are characterized by their distinct expression of luminal and basal genes, which could be used to predict key clinical features such as disease progression and overall survival. Transcriptionally, FOXA1, GATA3, and PPARG are shown to be essential for luminal subtype-specific gene regulation and subtype switching, while TP63, STAT3, and TFAP2 family members are critical for regulation of basal subtype-specific genes. Despite these advances, the underlying epigenetic mechanisms and 3D chromatin architecture responsible for subtype-specific regulation in bladder cancer remain unknown. Result: We determine the genome-wide transcriptome, enhancer landscape, and transcription factor binding profiles of FOXA1 and GATA3 in luminal and basal subtypes of bladder cancer. Furthermore, we report the first-ever mapping of genome-wide chromatin interactions by Hi-C in both bladder cancer cell lines and primary patient tumors. We show that subtype-specific transcription is accompanied by specific open chromatin and epigenomic marks, at least partially driven by distinct transcription factor binding at distal enhancers of luminal and basal bladder cancers. Finally, we identify a novel clinically relevant transcription factor, Neuronal PAS Domain Protein 2 (NPAS2), in luminal bladder cancers that regulates other subtype-specific genes and influences cancer cell proliferation and migration. Conclusion: In summary, our work identifies unique epigenomic signatures and 3D genome structures in luminal and basal urinary bladder cancers and suggests a novel link between the circadian transcription factor NPAS2 and a clinical bladder cancer subtype.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Michael F. Z. Wang ◽  
Madhav Mantri ◽  
Shao-Pei Chou ◽  
Gaetano J. Scuderi ◽  
David W. McKellar ◽  
...  

Abstract Conventional scRNA-seq expression analyses rely on the availability of a high-quality genome annotation. Yet, as we show here with scRNA-seq experiments and analyses spanning human, mouse, chicken, mole rat, lemur and sea urchin, genome annotations are often incomplete, in particular for organisms that are not routinely studied. To overcome this hurdle, we created a scRNA-seq analysis routine that recovers biologically relevant transcriptional activity beyond the scope of the best available genome annotation by performing scRNA-seq analysis on any region in the genome for which transcriptional products are detected. Our tool generates a single-cell expression matrix for all transcriptionally active regions (TARs), performs single-cell TAR expression analysis to identify biologically significant TARs, and then annotates TARs using gene homology analysis. This procedure uses single-cell expression analyses as a filter to direct annotation efforts to biologically significant transcripts and thereby uncovers biology to which scRNA-seq would otherwise be in the dark.


2021 ◽  
Author(s):  
Robin N Beaumont ◽  
Isabelle K Mayne ◽  
Rachel M Freathy ◽  
Caroline F Wright

Abstract Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected both when the DD gene is the nearest gene to the birth weight SNP and also when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting the closest gene to the SNP not necessarily being the functionally relevant gene. This is the first application of this approach to birth weight, which has helped identify GWAS loci likely to have direct fetal effects on birth weight, which could not previously be classified as fetal or maternal owing to insufficient statistical power.
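The proximity test described in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' pipeline: it assumes a single chromosome, represents each gene by one coordinate, and builds the null distribution by drawing random gene sets of the same size as the DD gene set. The function names and the idea of sampling gene sets (rather than, say, shifting SNP positions) are assumptions made for illustration.

```python
import random


def nearest_gene(snp_pos, gene_pos):
    """Return the name of the gene whose coordinate is closest to the SNP."""
    return min(gene_pos, key=lambda g: abs(gene_pos[g] - snp_pos))


def count_dd_nearest(snps, gene_pos, gene_set):
    """Count SNPs whose nearest gene belongs to gene_set."""
    return sum(nearest_gene(s, gene_pos) in gene_set for s in snps)


def enrichment_p_value(snps, gene_pos, dd_genes, n_sim=1000, seed=0):
    """Empirical one-sided p-value for 'SNPs sit next to DD genes more
    often than next to a random gene set of the same size' (assumed null)."""
    rng = random.Random(seed)
    all_genes = sorted(gene_pos)
    observed = count_dd_nearest(snps, gene_pos, dd_genes)
    at_least = 0
    for _ in range(n_sim):
        null_set = set(rng.sample(all_genes, len(dd_genes)))
        if count_dd_nearest(snps, gene_pos, null_set) >= observed:
            at_least += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (at_least + 1) / (n_sim + 1)
```

A real analysis would work per chromosome with gene intervals rather than single coordinates, and could count all genes within a fixed window of each SNP instead of only the nearest one.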


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Massimiliano Rossi ◽  
Leena Salmela ◽  
Christina Boucher

Abstract Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly available non-proprietary method for assembly and one proprietary software that is available via an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method was able to successfully run on all three genomes. The method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) only successfully ran on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.


Biostatistics ◽  
2017 ◽  
Vol 18 (3) ◽  
pp. 477-494 ◽  
Author(s):  
Jakub Pecanka ◽  
Marianne A. Jonker ◽  
Zoltan Bochdanovits ◽  
Aad W. Van Der Vaart ◽  

Summary For over a decade functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the “missing heritability” of complex traits. However, searching for epistasis on the genome-wide scale has been challenging due to the prohibitively large number of tests which result in a serious loss of statistical power as well as computational challenges. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci while the more demanding score tests are restricted to the most promising pairs, genome-wide association study (GWAS) for epistasis becomes feasible. By design our method provides strong control of the type I error. Its favourable power properties especially under the practically relevant misspecification of the interaction model are illustrated. Ready-to-use software is available. Using the method we analyzed Parkinson’s disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
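The first stage of the two-stage idea described in this abstract, screening locus pairs by a genotype independence test in cases only, can be sketched as follows. This is a hedged illustration and not the authors' software: the abstract does not say which independence statistic is used, so a Pearson chi-square on the two-locus genotype table is assumed, and the function names and threshold are hypothetical. The second stage (a logistic regression score test on the surviving pairs) is not shown.

```python
from collections import Counter
from itertools import combinations

# Upper 5% point of the chi-square distribution with 4 degrees of freedom,
# appropriate when all three genotypes (0/1/2) occur at both loci.
CHI2_CRIT_DF4_ALPHA05 = 9.488


def chi2_independence(geno_a, geno_b):
    """Pearson chi-square statistic for independence of two genotype vectors."""
    n = len(geno_a)
    cells = Counter(zip(geno_a, geno_b))
    rows = Counter(geno_a)
    cols = Counter(geno_b)
    stat = 0.0
    # Only observed genotype values are iterated, so expected counts are > 0.
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat


def prescreen_pairs(case_genotypes, threshold=CHI2_CRIT_DF4_ALPHA05):
    """Return locus pairs whose genotypes look dependent among cases.

    case_genotypes maps a locus name to its list of genotypes (0/1/2),
    with one entry per case in the same order for every locus.
    """
    kept = []
    for a, b in combinations(sorted(case_genotypes), 2):
        if chi2_independence(case_genotypes[a], case_genotypes[b]) > threshold:
            kept.append((a, b))
    return kept
```

Only the pairs returned by `prescreen_pairs` would go on to the more expensive interaction test, which is what reduces the multiple-testing burden.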


PLoS ONE ◽  
2018 ◽  
Vol 13 (3) ◽  
pp. e0193256 ◽  
Author(s):  
Zhaozhong Zhu ◽  
Verneri Anttila ◽  
Jordan W. Smoller ◽  
Phil H. Lee

2002 ◽  
Vol 184 (12) ◽  
pp. 3287-3295 ◽  
Author(s):  
Elaine O. Davis ◽  
Edith M. Dullaghan ◽  
Lucinda Rand

ABSTRACT The bases of the mycobacterial SOS box important for LexA binding were determined by replacing each base with every other and examining the effect on the induction of a reporter gene following DNA damage. This analysis revealed that the SOS box was longer than originally thought by 2 bp in each half of the palindromic site. A search of the Mycobacterium tuberculosis genome sequence with the new consensus, TCGAAC(N)4GTTCGA, identified 4 sites which were perfect matches and 12 sites with a single mismatch which were predicted to bind LexA. Genes which could potentially be regulated by these SOS boxes were ascertained from their positions relative to the sites. Examination of expression data for these genes following DNA damage identified 12 new genes which are most likely regulated by LexA as well as the known M. tuberculosis DNA damage-inducible genes recA, lexA, and ruvC. Of these 12 genes, only 2 have a predicted function: dnaE2, a component of DNA polymerase III, and linB, which is similar to 1,3,4,6-tetrachloro-1,4-cyclohexadiene hydrolase. Curiously, of the remaining 10 genes predicted to be LexA regulated, 7 are members of the M. tuberculosis 13E12 repeat family, which has some of the characteristics of mobile elements.
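A genome scan with the consensus TCGAAC(N)4GTTCGA, allowing at most one mismatch in the twelve fixed positions, might look like the sketch below. It is an illustration under stated assumptions rather than the procedure used in the paper: the function name `find_sos_boxes` is hypothetical, the input is a plain DNA string, and the four spacer positions are treated as fully unconstrained.

```python
LEFT_HALF = "TCGAAC"
RIGHT_HALF = "GTTCGA"
SPACER_LEN = 4  # the (N)4 in the consensus


def find_sos_boxes(seq, max_mismatches=1):
    """Return (start, site, mismatches) for every candidate SOS box.

    Mismatches are counted only in the 12 fixed positions. Because the
    fixed part of the consensus equals its own reverse complement, a site
    has the same mismatch count on either strand, so one scan suffices.
    """
    seq = seq.upper()
    fixed = LEFT_HALF + RIGHT_HALF
    right_start = len(LEFT_HALF) + SPACER_LEN
    site_len = right_start + len(RIGHT_HALF)
    hits = []
    for start in range(len(seq) - site_len + 1):
        site = seq[start:start + site_len]
        observed = site[:len(LEFT_HALF)] + site[right_start:]
        mismatches = sum(a != b for a, b in zip(observed, fixed))
        if mismatches <= max_mismatches:
            hits.append((start, site, mismatches))
    return hits
```

For example, `find_sos_boxes("AAATCGAACACGTGTTCGATTT", max_mismatches=0)` reports a single perfect site starting at index 3.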

