gene length
Recently Published Documents



Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 2367-2367
Author(s):  
Xiaosu Zhou ◽  
Daijing Nie ◽  
Yang Zhang ◽  
Zhixiu Liu ◽  
Jianping Zhang ◽  
...  

Abstract DNTT encodes TdT, a template-independent DNA polymerase. The canonical function of TdT is to boost the diversity of immunoglobulins and T cell receptors by incorporating non-templated nucleotides (NTN) into their variable regions during RAG1/2-mediated DNA breakage and non-homologous end joining (NHEJ) rearrangement. This study aimed to investigate the relationship between aberrant DNTT expression, illegitimate TdT-aided microhomology-mediated replication-dependent recombination (MMRDR), and the mutagenesis of gene length mutations (LMs) in acute myeloid leukemia (AML), as well as their prognostic relevance. A cohort of 578 AML cases was enrolled. Fifty healthy donors for allogeneic hematopoietic stem cell transplantation (allo-HSCT), 393 B cell acute lymphoblastic leukemia (B-ALL) cases, 78 T-ALL cases, and 25 mixed-phenotype acute leukemia (MPAL) cases were used as controls. Next-generation sequencing was performed for mutation analysis of 86 leukemia driver genes. RNA-seq was used to analyze the expression of DNTT and other NHEJ-associated genes. Prognosis was investigated in a subset of 239 AML cases who underwent allo-HSCT with an anti-thymocyte globulin (ATG) or anti-lymphocyte globulin (ALG) based regimen. Based on sequence anatomy that considers the MMRDR mechanism and the nucleotide characteristics of TdT-mediated NTN incorporation, we formulated a classification algorithm for LMs and divided them into four subtypes (type I-IV). Type-I indicates pure duplicated/triplicated germline sequences with identifiable ≥2bp canonical triple microhomology (MH) sequences; type-II indicates pure duplicated/triplicated germline sequences without ≥2bp canonical MH sequences; type-III indicates any LMs with NTN insertions; type-IV indicates any other LMs, mainly deletions. FLT3-LMs have the highest overall incidence and occur across multiple leukemia lineages. 
We observed a significant bias in the distribution of FLT3-LM subtypes among acute leukemia subtypes (Figure 1A). Type-I FLT3 LMs were observed only in AML; T-ALL and MPAL carried mainly type-III FLT3 LMs; type-II, III, and IV FLT3 LMs were predominant in B-ALL. The overall DNTT expression was significantly lower in AML than in other leukemia subtypes and control groups (P < 0.001). This supports the idea that the bias in FLT3-LM subtype distribution might be attributable to differences in overall DNTT expression among leukemia lineages. A total of 458 LM events were observed in 295 cases (51.0%) within 25 genes (FLT3, NPM1, CEBPA, RUNX1, KIT, etc.) in our AML cohort. The incidences of type-II and type-III LMs, whose mutagenesis is theorized to rely on TdT-aided MMRDR, were 31.2% and 47.8%, respectively. Type-I and type-II LMs, which manifest as pure germline sequence duplications, account for 43.6% of the FLT3 LMs; type-III LMs, which carry additional inserted NTN sequences, account for as much as 50.4% of the FLT3 LMs. We analyzed the G/C nucleotide content adjacent to LM junctions. A significantly high G/C bias was observed at the +1 nucleotide position in the type-II and type-III subsets (Figure 1B), suggesting that the TdT-aided MMRDR mechanism plays a role in the mutagenesis in these cases. We also observed a strong positive correlation between fragment length and G/C content of the inserted NTN sequences (P < 0.001) within the type-III subset of the 25 LM genes (Figure 1C), suggesting that higher TdT activity mediates longer inserted sequences. DNTT expression levels in type-III LM cases were significantly higher than those in type-I, II, and IV LM cases and in cases without LMs, both across all 25 LM genes (Figure 1D) and in the FLT3 LM subset. Similar expression signatures were also observed for the other NHEJ-associated genes RAG2, XRCC4, and XRCC6. 
In the survival analysis of the AML subset that underwent allo-HSCT with an ATG/ALG based regimen, we observed significantly better overall survival (P = 0.024) in cases positive for type-III FLT3-LMs than in cases with type-I or type-II FLT3-LMs (Figure 1E). In this study, we proposed a subclassification algorithm for LMs (type I-IV) in AML. Both DNTT gene expression and sequence characteristics suggest that TdT-aided MMRDR plays a role in the mutagenesis of type-III and type-II LMs. We also observed that AML cases with type-III FLT3 LMs benefit more from ATG/ALG based regimen allo-HSCT than cases with other FLT3 LM types, which may be attributed to aberrant lymphoid lineage antigen expression. Disclosures: No relevant conflicts of interest to declare.
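The four-way subtype rule described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the `LengthMutation` record and its fields are hypothetical, the abstract does not say how microhomology or NTN insertions are detected from sequence, and the precedence (NTN insertion checked first) is inferred from the phrase "any LMs with NTN insertions".

```python
from dataclasses import dataclass


@dataclass
class LengthMutation:
    # Hypothetical record; field names are assumptions, not the authors' schema.
    is_germline_duplication: bool  # pure duplicated/triplicated germline sequence
    ntn_insert: str = ""           # non-templated nucleotides found at the junction
    microhomology_bp: int = 0      # length of identifiable microhomology (bp)


def classify_lm(lm: LengthMutation) -> str:
    """Return the subtype label ("I".."IV") following the abstract's definitions."""
    if lm.ntn_insert:                      # type-III: any LM with NTN insertions
        return "III"
    if lm.is_germline_duplication:         # pure duplication: split on microhomology
        return "I" if lm.microhomology_bp >= 2 else "II"
    return "IV"                            # everything else, mainly deletions
```

For example, `classify_lm(LengthMutation(True, ntn_insert="GGC"))` falls into type-III even though the event is a duplication, because the NTN check takes precedence in this sketch.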


2021 ◽  
Author(s):  
Dyfed Lloyd Evans

Much of the work on the normalization of RNA-seq data has been performed on human tissue, notably cancer tissue. Little work has been done in plants, particularly polyploids and those species with incomplete or no genomes. We present a novel implementation of GeTMM (Gene Length Corrected TMM) that accounts for GC bias and works at the transcript level. The algorithm also employs transcript length as a factor, allowing for incomplete transcripts and alternate transcripts. This significantly improves overall normalization. The GCGeTMM methodology also allows for simultaneous determination of differentially expressed transcripts (and by extension genes) and stably expressed genes to act as references for qRT-PCR and microarray analyses.
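The length-correction step that gives GeTMM its name can be illustrated as follows. This is a minimal sketch, not the GCGeTMM implementation described above: it shows only reads per kilobase (RPK), and it substitutes a simple per-million rescaling for the TMM step, which in practice is usually delegated to a dedicated package. The GC-bias correction is not shown, and both function names are hypothetical.

```python
def reads_per_kilobase(counts, lengths_bp):
    """Divide each raw read count by the feature length in kilobases."""
    return [count / (length / 1000) for count, length in zip(counts, lengths_bp)]


def scale_per_million(rpk):
    """Rescale RPK values so they sum to one million.

    NOTE: this is a TPM-style stand-in for the TMM scaling factor,
    used here only to keep the sketch self-contained.
    """
    total = sum(rpk)
    return [value / total * 1_000_000 for value in rpk]


# Two transcripts in one sample: raw counts and lengths in base pairs.
rpk = reads_per_kilobase([100, 200], [1000, 4000])
scaled = scale_per_million(rpk)
```

The point of the first step is that the longer transcript, despite having twice the raw count, ends up with the lower length-corrected value.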


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Jay C. Brown

This study was carried out to pursue the observation that the level of gene expression is affected by gene length in the human genome. As transcription is a time-dependent process, it is expected that gene expression will be inversely related to gene length, and this is found to be the case. Here, I describe the results of studies performed to test whether the gene length/gene expression linkage is affected by two factors, the chromosome where the gene is located and the tissue where it is expressed. Studies were performed with a database of 3538 human genes that were divided into short, midlength, and long groups. Chromosome groups were then compared in the expression level of genes with the same length. A similar analysis was performed with 19 human tissues. Tissue-specific groups were compared in the expression level of genes with the same length. Both chromosome and tissue studies revealed new information about the role of gene length in control of gene expression. Chromosome studies led to the identification of two chromosome populations that differ in the expression level of short genes. A high level of expression was observed in chromosomes 2-10, 12-15, and 18 and a low level in 1, 11, 16-17, 19-20, 22, and 24. Studies with tissue-specific genes led to the identification of two tissues, brain and liver, which differ in the expression level of short genes. The results are interpreted to support the view that the level of a gene’s expression can be affected by the chromosome and the tissue where the gene is transcribed.
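The comparison described above (split genes into short, midlength, and long groups, then compare expression within a group) might be sketched like this. The length cutoffs are placeholders chosen for illustration; the abstract does not state the thresholds used, and the function names are hypothetical.

```python
from statistics import mean


def length_group(length_bp, short_max=10_000, long_min=100_000):
    """Assign a gene to a length group; the cutoffs are assumed, not from the study."""
    if length_bp <= short_max:
        return "short"
    if length_bp >= long_min:
        return "long"
    return "midlength"


def mean_expression_by_group(genes):
    """genes: iterable of (length_bp, expression) pairs; returns mean per group."""
    groups = {}
    for length_bp, expression in genes:
        groups.setdefault(length_group(length_bp), []).append(expression)
    return {name: mean(values) for name, values in groups.items()}
```

Running the same function separately on the genes of each chromosome, or of each tissue, would give the per-group means that the study compares.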


2021 ◽  
Vol 12 ◽  
Author(s):  
Inês Lopes ◽  
Gulam Altab ◽  
Priyanka Raina ◽  
João Pedro de Magalhães

While it is expected for gene length to be associated with factors such as intron number and evolutionary conservation, we are yet to understand the connections between gene length and function in the human genome. In this study, we show that, as expected, there is a strong positive correlation between gene length, transcript length, and protein size as well as a correlation with the number of genetic variants and introns. Among tissue-specific genes, we find that the longest transcripts tend to be expressed in the blood vessels, nerves, thyroid, cervix uteri, and the brain, while the smallest transcripts tend to be expressed in the pancreas, skin, stomach, vagina, and testis. We report, as shown previously, that natural selection suppresses changes for genes with longer transcripts and promotes changes for genes with smaller transcripts. We also observe that genes with longer transcripts tend to have a higher number of co-expressed genes and protein-protein interactions, as well as more associated publications. In the functional analysis, we show that bigger transcripts are often associated with neuronal development, while smaller transcripts tend to play roles in skin development and in the immune system. Furthermore, pathways related to cancer, neurons, and heart diseases tend to have genes with longer transcripts, with smaller transcripts being present in pathways related to immune responses and neurodegenerative diseases. Based on our results, we hypothesize that longer genes tend to be associated with functions that are important in the early development stages, while smaller genes tend to play a role in functions that are important throughout the whole life, like the immune system, which requires fast responses.
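A correlation such as the one reported above between gene length and transcript length could be checked with a rank correlation. The abstract does not say which coefficient was used, so Spearman's rho is an assumption here; the sketch is dependency-free and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A rank-based measure is a reasonable default for gene lengths because their distribution is heavily right-skewed, so a few very long genes would otherwise dominate a linear correlation.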


2021 ◽  
Author(s):  
Sivarajan Karunanithi

In the last two decades, our understanding of human gene regulation has improved tremendously. There are plentiful computational methods that focus on integrative data analysis of humans and model organisms such as mouse and Drosophila. However, these tools are not directly employable by researchers working on non-model organisms to answer fundamental biological and evolutionary questions. We aimed to develop new tools and adapt existing software for the analysis of transcriptomic and epigenomic data of one such non-model organism, Paramecium tetraurelia, a unicellular eukaryote. Paramecium contains two diploid (2n) germline micronuclei (MIC) and a polyploid (800n) somatic macronucleus (MAC). The transcriptomic and epigenomic regulatory landscape of the MAC genome, which has 80% protein-coding genes and short intergenic regions, is poorly understood. We developed a generic automated eukaryotic short interfering RNA (siRNA) analysis tool, called RAPID. Our tool captures diverse siRNA characteristics from small RNA sequencing data and provides easily navigable visualisations. We also introduced a normalisation technique to facilitate comparison of multiple siRNA-based gene knockdown studies. Further, we developed a pipeline to characterise novel genome-wide endogenous short interfering RNAs (endo-siRNAs). In contrast to many organisms, we found that the endo-siRNAs do not act in cis to silence their parent mRNA. We also predicted phasing of siRNAs, which are regulated by the RNA interference (RNAi) pathway. Further, using RAPID, we investigated the aberrations of endo-siRNAs, and their respective transcriptomic alterations, caused by an RNAi pathway triggered by feeding small RNAs against a target gene. We find that the small RNA transcriptome is altered even if a gene unrelated to the RNAi pathway is targeted. This is important in the context of investigations of genetically modified organisms (GMOs). 
We suggest that future studies need to distinguish transcriptomic changes caused by RNAi-inducing techniques from actual regulatory changes. Subsequently, we adapted existing epigenomics analysis tools to conduct the first comprehensive epigenomic characterisation of nucleosome positioning and histone modifications of the Paramecium MAC. We identified well-positioned nucleosomes shifted downstream of the transcription start site. GC content seems to dictate, in cis, the positioning of nucleosomes, histone marks (H3K4me3, H3K9ac, and H3K27me3), and Pol II in the AT-rich Paramecium genome. We employed a chromatin state segmentation approach on nucleosomes and histone marks, which revealed genes with active, repressive, and bivalent chromatin states. Further, we constructed a regulatory association network of all the aforementioned data using the sparse partial correlation network technique. Our analysis revealed subsets of genes whose expression is positively associated with H3K27me3, in contrast to the negative association with gene expression reported in many other organisms. Further, we developed a Random Forests classifier to predict gene expression using genic (gene length, intron frequency, etc.) and epigenetic features. Our model has a test performance (PR-AUC) of 0.83. Upon evaluating different feature sets, we found that genic features are as predictive of gene expression as the epigenetic features. We used Shapley local feature explanation values to suggest that high H3K4me3, high intron frequency, low gene length, high sRNA, and high GC content are the most important elements for determining gene expression status. In this thesis, we developed novel tools and employed several bioinformatics and machine learning methods to characterise the regulatory landscape of the Paramecium (epi)genome.


2020 ◽  
Vol 117 (34) ◽  
pp. 20662-20671 ◽  
Author(s):  
Jessica A. Weber ◽  
Seung Gu Park ◽  
Victor Luria ◽  
Sungwon Jeon ◽  
Hak-Min Kim ◽  
...  

The endangered whale shark (Rhincodon typus) is the largest fish on Earth and a long-lived member of the ancient Elasmobranchii clade. To characterize the relationship between genome features and biological traits, we sequenced and assembled the genome of the whale shark and compared its genomic and physiological features to those of 83 animals and yeast. We examined the scaling relationships between body size, temperature, metabolic rates, and genomic features and found both general correlations across the animal kingdom and features specific to the whale shark genome. Among animals, increased lifespan is positively correlated to body size and metabolic rate. Several genomic traits also significantly correlated with body size, including intron and gene length. Our large-scale comparative genomic analysis uncovered general features of metazoan genome architecture: Guanine and cytosine (GC) content and codon adaptation index are negatively correlated, and neural connectivity genes are longer than average genes in most genomes. Focusing on the whale shark genome, we identified multiple features that significantly correlate with lifespan. Among these were very long gene length, due to introns being highly enriched in repetitive elements such as CR1-like long interspersed nuclear elements, and considerably longer neural genes of several types, including connectivity, activity, and neurodegeneration genes. The whale shark genome also has the second slowest evolutionary rate observed in vertebrates to date. Our comparative genomics approach uncovered multiple genetic features associated with body size, metabolic rate, and lifespan and showed that the whale shark is a promising model for studies of neural architecture and lifespan.


2020 ◽  
Vol 48 (16) ◽  
pp. e95-e95 ◽  
Author(s):  
Angus M Sidore ◽  
Calin Plesa ◽  
Joyce A Samson ◽  
Nathan B Lubock ◽  
Sriram Kosuri

Abstract Multiplexed assays allow functional testing of large synthetic libraries of genetic elements, but are limited by the designability, length, fidelity and scale of the input DNA. Here, we improve DropSynth, a low-cost, multiplexed method that builds gene libraries by compartmentalizing and assembling microarray-derived oligonucleotides in vortexed emulsions. By optimizing enzyme choice, adding enzymatic error correction and increasing scale, we show that DropSynth can build thousands of gene-length fragments at >20% fidelity.


2020 ◽  
Author(s):  
Jay C. Brown

Abstract
Background: This study was carried out to pursue the observation that the level of gene expression is affected by gene length in the genomes of higher vertebrates. As transcription is a time-dependent process, it is expected that gene expression will be inversely related to gene length, and this is found to be the case. Here I describe the results of studies performed with the human genome to test whether the gene length/gene expression linkage is affected by two factors, the chromosome where the gene is located and the tissue where it is expressed.
Experimental design: Studies were carried out with a database of 2413 human genes that were divided into short, mid-length and long groups. Each of the 24 human chromosomes was then characterized according to the proportion of each gene length group present. A similar analysis was performed with 19 human tissues. The proportion of short, mid-length and long genes was noted for each tissue.
Results: Both chromosome and tissue studies revealed new information about the role of gene length in control of gene expression. Chromosome studies led to the identification of two chromosome populations that differ in the level of short gene expression. Tissue studies support the conclusion that short, highly expressed genes are enriched in tissues that produce protein products that are exported from the host cell.


2020 ◽  
Author(s):  
Inês Lopes ◽  
Gulam Altab ◽  
Priyanka Raina ◽  
João Pedro de Magalhães

Abstract While it is expected for gene length to be influenced by factors such as intron number and evolutionary conservation, we have yet to fully understand the connection between gene length and function in the human genome. In this study, we show that, as expected, there is a strong positive correlation between gene length and the number of SNPs, introns and protein size. Amongst tissue-specific genes, we find that the longest genes are expressed in blood vessels, nerve, thyroid, cervix uteri and brain, while the smallest genes are expressed within the pancreas, skin, stomach, vagina and testis. We report, as shown previously, that natural selection suppresses changes for longer genes and promotes changes for smaller genes. We also observed that longer genes have a significantly higher number of co-expressed genes and protein-protein interactions. In the functional analysis, we show that bigger genes are often associated with neuronal development, while smaller genes tend to play roles in skin development and in the immune system. Furthermore, pathways related to cancer, neurons and heart diseases tend to have longer genes, with smaller genes being present in pathways related to immune response and neurodegenerative diseases. We hypothesise that longer genes tend to be associated with functions that are important early in life, while smaller genes play a role in functions that are important throughout the organism's whole life, like the immune system, which requires fast responses.
Author Summary: Even though the human genome has been fully sequenced, we still do not fully grasp all of its nuances. One such nuance is the length of the genes themselves. Why are certain genes longer than others? Is there a common function shared by longer/smaller genes? What exactly makes a gene longer? We tried answering these questions using a variety of analyses. 
We found that, while no single gene characteristic had a particularly strong influence on gene size, several characteristics together could influence the length of a gene. We also found that longer genes are linked with the development of neurons, cancer, heart diseases and muscle cells, while smaller genes seem to be mostly related to the immune system and the development of the skin. This led us to believe that a gene's size is influenced by whether its function is important early in life or throughout our whole lives, and by whether that function requires a rapid response.


PLoS Biology ◽  
2019 ◽  
Vol 17 (11) ◽  
pp. e3000481 ◽  
Author(s):  
Shir Mandelboum ◽  
Zohar Manber ◽  
Orna Elroy-Stein ◽  
Ran Elkon
