Defining the functional significance of intergenic transcribed regions

2017 ◽  
Author(s):  
John P. Lloyd ◽  
Zing Tsung-Yeh Tsai ◽  
Rosalie P. Sowers ◽  
Nicholas L. Panchy ◽  
Shin-Han Shiu

ABSTRACT With advances in transcript profiling, the presence of transcriptional activities in intergenic regions has been well established. However, whether intergenic expression reflects transcriptional noise or activity of novel genes remains unclear. We identified intergenic transcribed regions (ITRs) in 15 diverse flowering plant species and found that the amount of intergenic expression correlates with genome size, a pattern that could be expected if intergenic expression is largely nonfunctional. To further assess the functionality of ITRs, we first built machine learning classifiers using Arabidopsis thaliana as a model that accurately distinguish functional sequences (phenotype genes) and likely nonfunctional ones (pseudogenes and unexpressed intergenic regions) by integrating 93 biochemical, evolutionary, and sequence-structure features. Next, by applying the models genome-wide, we found that 4,427 ITRs (38%) and 796 annotated ncRNAs (44%) had features significantly similar to benchmark protein-coding or RNA genes and thus were likely parts of functional genes. Approximately 60% of ITRs and ncRNAs were more similar to nonfunctional sequences and were likely transcriptional noise. The predictive framework established here provides not only a comprehensive look at how functional, genic sequences are distinct from likely nonfunctional ones, but also a new way to differentiate novel genes from genomic regions with noisy transcriptional activities.

2021 ◽  
Author(s):  
Tatiana Kulikova ◽  
Antonina Maslova ◽  
Polina Starshova ◽  
Juan Sebastian Rodriguez ◽  
Alla Krasikova

In diplotene oocyte nuclei of all vertebrate species, except mammals, chromosomes lack interchromosomal contacts and chromatin is linearly compartmentalized into distinct chromomere-loop complexes forming lampbrush chromosomes. However, the mechanisms underlying the formation of chromomere-loop complexes remain unexplored. Here we aimed to juxtapose somatic topologically associating domains (TADs), recently identified in chicken embryonic fibroblasts, with chromomere-loop complexes in lampbrush meiotic chromosomes. By measuring 3D-distances and colocalization between linear equidistantly located genomic loci, positioned within one TAD or separated by a TAD border, we confirmed the presence of predicted TADs in chicken embryonic fibroblast nuclei. Using three-colored FISH with BAC probes we mapped equidistant genomic regions included in several sequential somatic TADs on isolated chicken lampbrush chromosomes. Eight genomic regions, each comprising two or three somatic TADs, were mapped to non-overlapping neighboring lampbrush chromatin domains - lateral loops, chromomeres or chromomere-loop complexes. Genomic loci from the neighboring somatic TADs could localize in one lampbrush chromomere-loop complex, while genomic loci belonging to the same somatic TAD could be localized in neighboring lampbrush chromomere-loop domains. In addition, FISH-mapping of BAC probes to the nascent transcripts on the lateral loops indicates transcription of at least 17 protein-coding genes and 2 non-coding RNA genes during the lampbrush stage of chicken oogenesis, including genes involved in oocyte maturation and early embryo development.


2015 ◽  
Vol 28 (11) ◽  
pp. 1198-1215 ◽  
Author(s):  
Lida Derevnina ◽  
Sebastian Chin-Wo-Reyes ◽  
Frank Martin ◽  
Kelsey Wood ◽  
Lutz Froenicke ◽  
...  

Peronospora tabacina is an obligate biotrophic oomycete that causes blue mold or downy mildew on tobacco (Nicotiana tabacum). It is an economically important disease occurring frequently in tobacco-growing regions worldwide. We sequenced and characterized the genomes of two P. tabacina isolates and mined them for pathogenicity-related proteins and effector-encoding genes. De novo assembly of the genomes using Illumina reads resulted in 4,016 (63.1 Mb, N50 = 79 kb) and 3,245 (55.3 Mb, N50 = 61 kb) scaffolds for isolates 968-J2 and 968-S26, respectively, with an estimated genome size of 68 Mb. The mitochondrial genome has a similar size (approximately 43 kb) and structure to those of other oomycetes, plus several minor unique features. Repetitive elements, primarily retrotransposons, make up approximately 24% of the nuclear genome. Approximately 18,000 protein-coding gene models were predicted. Mining the secretome revealed approximately 120 candidate RxLR, six CRN (candidate effectors that elicit crinkling and necrosis), and 61 WY domain–containing proteins. Candidate RxLR effectors were shown to be predominantly undergoing diversifying selection, with approximately 57% located in variable gene-sparse regions of the genome. Aligning the P. tabacina genome to Hyaloperonospora arabidopsidis and Phytophthora spp. revealed a high level of synteny. Blocks of synteny show gene inversions and instances of expansion in intergenic regions. Extensive rearrangements of the gene-rich genomic regions do not appear to have occurred during the evolution of these highly variable pathogens. These assemblies provide the basis for studies of virulence in this and other downy mildew pathogens.


2016 ◽  
Author(s):  
Valentina Iotchkova ◽  
Graham R.S. Ritchie ◽  
Matthias Geihs ◽  
Sandro Morganella ◽  
Josine L. Min ◽  
...  

Loci discovered by genome-wide association studies (GWAS) predominantly map outside protein-coding genes. The interpretation of functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages GWAS findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding that current methods do not address. We further assess enrichment statistics for 27 GWAS traits within regulatory regions from the ENCODE and Roadmap projects. We characterise unique enrichment patterns for traits and annotations, driving novel biological insights. The method is implemented as standalone software and an R package to facilitate its application by the research community.


Heredity ◽  
2022 ◽  
Author(s):  
Vikas Singh ◽  
Pallavi Sinha ◽  
Jimmy Obala ◽  
Aamir W. Khan ◽  
Annapurna Chitikineni ◽  
...  

Abstract To identify genomic segments associated with days to flowering (DF) and leaf shape in pigeonpea, the QTL-seq approach was used in the present study. Genome-wide SNP profiling of extreme phenotypic bulks was conducted for both traits in the segregating population (F2) derived from the cross ICP 5529 × ICP 11605. A total of 126.63 million paired-end (PE) whole-genome resequencing data were generated for five samples, including one parent, ICP 5529 (obcordate leaf and late-flowering plant), early and late flowering pools (EF and LF), and obcordate and lanceolate leaf shape pools (OLF and LLS). QTL-seq identified two significant genomic regions: one on CcLG03 (a 1.58 Mb region spanning 19.22 to 20.80 Mb) for days to flowering (LF and EF pools) and another on CcLG08 (a 2.19 Mb region spanning 6.69 to 8.88 Mb) for leaf shape (OLF and LLF pools). Analysis of SNPs in the genomic regions associated with days to flowering and leaf shape revealed 5 genic SNPs present in the unique regions. The identified genomic regions for days to flowering were also validated with the genotyping-by-sequencing based classical QTL mapping method. A comparative analysis of the seven identified genes associated with days to flowering across 12 Fabaceae genomes showed synteny with 9 genomes. A total of 153 genes were identified through the synteny analysis ranging from 13 to 36. This study demonstrates the usefulness of the QTL-seq approach in the precise identification of candidate gene(s) for days to flowering and leaf shape, which can be deployed for pigeonpea improvement.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yan Liang ◽  
Wanchao Zhu ◽  
Sijia Chen ◽  
Jia Qian ◽  
Lin Li

Small peptides (sPeptides), <100 amino acids (aa) long, are encoded by small open reading frames (sORFs) often found in the 5′ and 3′ untranslated regions (or other parts) of mRNAs, in long non-coding RNAs, or transcripts from introns and intergenic regions; various sPeptides play important roles in multiple biological processes. In this study, we conducted a comprehensive analysis of maize (Zea mays) sPeptides using mRNA sequencing, ribosome profiling (Ribo-seq), and mass spectrometry (MS) on six tissues (each with at least two replicates). To identify maize sORFs and sPeptides from these data, we set up a robust bioinformatics pipeline and performed a genome-wide scan. This scan uncovered 9,388 sORFs encoding peptides of 2–100 aa. These sORFs showed distinct genomic features, such as different Kozak region sequences, higher specificity of translation, and high translational efficiency, compared with the canonical protein-coding genes. Furthermore, the MS data verified 2,695 sPeptides. These sPeptides perfectly discriminated all the tissues and were highly associated with their parental genes. Interestingly, the parental genes of sPeptides were significantly enriched in multiple functional gene ontology terms related to abiotic stress and development, suggesting the potential roles of sPeptides in the regulation of their parental genes. Overall, this study lays out guidelines for genome-wide scans of sORFs and sPeptides in plants by integrating Ribo-seq and MS data, provides a more comprehensive resource of functional sPeptides in maize, and gives a new perspective on the complex biological systems of plants.


Author(s):  
Nina Moravčíková ◽  
Radovan Kasarda ◽  
Ondrej Kadlečík ◽  
Anna Trakovická ◽  
Marko Halo ◽  
...  

The aim of this study was to analyse the genome-wide distribution of runs of homozygosity (ROH) segments in the genome of the Norik of Muran horse and to identify the regions under strong selection pressure. Overall, 25 animals genotyped by the GGP Equine70k chip were included in the study. After SNP pruning, 54479 SNPs (75.72%) covering 2.25 Gb of the autosomal genome were retained for the scan of ROH segment distribution. ROHs were present in the genome of all animals and covered on average 13.17% (295.29 Mb) of the autosomal genome expressed by the SNP loci. The highest number of ROHs was identified on autosome 1 (404), while ECA31 showed the lowest proportion of the autosome residing in ROH (38). The footprints of selection, characterized by SNPs with extreme frequency in ROHs across specific genomic regions, were defined by the top 0.01 percentile of signals. Overall, nine genomic regions located on seven autosomes (3, 6, 9, 11, 15, 23) were identified. Three autosomes, ECA3, ECA9 and ECA11, showed the strongest signals of selection. The protein-coding genes located within these regions suggested that the identified footprints of selection are most likely consequences of intensive breeding for traits of interest during the grading-up process of the Norik of Muran horse.


2021 ◽  
Author(s):  
Roman Hillje ◽  
Lucilla Luzi ◽  
Stefano Amatori ◽  
Mirco Fanelli ◽  
Pier Giuseppe Pelicci ◽  
...  

Abstract To reveal epigenetic drift with the passing of time, we determined the genome-wide distributions of mono- and tri-methylated lysine 4 and acetylated and tri-methylated lysine 27 of histone H3 in the livers of healthy 3-, 6- and 12-month-old C57BL/6 mice. The comparison of histone H3 mark profiles at different ages revealed a global redistribution of histone H3 modifications with time, in particular in intergenic regions and near transcription start sites, as well as altered correlation between the profiles of different histone modifications. Moreover, feeding mice a caloric restriction diet, a treatment known to retard aging, preserved a younger state of histone H3 in these genomic regions.


2017 ◽  
Author(s):  
Jörn M. Schmiedel ◽  
Debora S. Marks ◽  
Ben Lehner ◽  
Nils Blüthgen

Abstract microRNAs are pervasive post-transcriptional regulators of protein-coding genes in multicellular organisms. Two fundamentally different models have been proposed for the function of microRNAs in gene regulation. In the first model, microRNAs act as repressors, reducing protein concentrations by accelerating mRNA decay and inhibiting translation. In the second model, in contrast, the role of microRNAs is not to reduce protein concentrations per se but to reduce fluctuations in these concentrations. Here we present genome-wide evidence that mammalian microRNAs frequently function as noise controllers rather than repressors. Moreover, we show that post-transcriptional noise control has been widely adopted across species from bacteria to animals, with microRNAs specifically employed to reduce noise in regulatory and context-specific processes in animals. Our results substantiate the detrimental nature of expression noise, reveal a universal strategy to control it, and suggest that microRNAs represent an evolutionary innovation for adaptive noise control in animals.

Highlights
Genome-wide evidence that microRNAs function as noise controllers for genes with context-specific functions
Post-transcriptional noise control is universal from bacteria to animals
Animals have evolved noise control for regulatory and context-specific processes


2014 ◽  
Author(s):  
Josep M Comeron

The constant removal of deleterious mutations by natural selection causes a reduction in neutral diversity and efficacy of selection at genetically linked sites (a process called Background Selection, BGS). Population genetic studies, however, often ignore BGS effects when investigating demographic events or the presence of other types of selection. To obtain a more realistic evolutionary expectation that incorporates the unavoidable consequences of deleterious mutations, we generated high-resolution landscapes of variation across the Drosophila melanogaster genome under a BGS scenario independent of polymorphism data. We find that BGS plays a significant role in shaping levels of variation across the entire genome, including long introns and intergenic regions distant from annotated genes. We also find that a very large percentage of the observed variation in diversity across autosomes can be explained by BGS alone, up to 70% across individual chromosome arms, thus indicating that BGS predictions can be used as a baseline to infer additional types of selection and demographic events. This approach allows detecting several outlier regions with signals of recent adaptive events and selective sweeps. The use of a BGS baseline, however, is particularly appropriate to investigate the presence of balancing selection, and our study exposes numerous genomic regions with the predicted signature of higher polymorphism than expected when a BGS context is taken into account. Importantly, we show that these conclusions are robust to the mutation and selection parameters of the BGS model. Finally, analyses of protein evolution, together with previous comparisons of genetic maps between Drosophila species, suggest temporally variable recombination landscapes and thus local BGS effects that may differ between extant and past phases.
Because genome-wide BGS and temporal changes in linkage effects can skew approaches to estimate demographic and selective events, future analyses should incorporate BGS predictions and capture local recombination variation across genomes and along lineages.


2020 ◽  
Author(s):  
Manisha Ray ◽  
Saurav Sarkar ◽  
Surya Narayan Rath ◽  
Mukund Namdev Sable

Abstract The COVID-19 pandemic is having a devastating effect on the healthcare system and the economy of the world. The unavailability so far of a specific treatment regime and a candidate vaccine opens up scope for new approaches and drug discoveries to mitigate the suffering of humankind due to the disease. In the present study, whole-genome sequences of SARS-CoV-2 isolated from 11 different nations were subjected to an evolutionary study and a genome-wide association study through in silico approaches, including multiple sequence alignment and phylogenetic study through MEGA7, and were analyzed through DnaSP. These investigations identified the nucleotide variations and single nucleotide mutations/polymorphisms in the genomic regions as well as the protein-coding regions. The resulting mutations have diversified the genomic contents of SARS-CoV-2 according to the altered nucleotides found in the 11 genome sequences. India and Nepal were found to have progressively more distinct species of SARS-CoV-2, with variations in the Spike protein and Nucleocapsid protein-coding sites. These genomic variations might explain the lower case fatality rate of India and Nepal relative to their populations. The insights from this investigation add to the knowledge of genomic medicine and might be useful in the design of antibodies against SARS-CoV-2.

